Embedded Soft-Core Processor-Based Built-In Self-Test of Field Programmable Gate Arrays by Bradley Fletcher Dutton A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science Auburn, Alabama May 14, 2010 Keywords: Built-In Self-Test, Field Programmable Gate Array, Fault Tolerance, Single-Event Upset Detection and Correction Copyright 2010 by Bradley Fletcher Dutton Approved by Charles E. Stroud, Chair, Professor of Electrical and Computer Engineering Vishwani D. Agrawal, Professor of Electrical and Computer Engineering Victor P. Nelson, Professor of Electrical and Computer Engineering ii Abstract The exponential growth in the number of transistors on very large scale integration (VLSI) integrated circuits (ICs), coupled with increasing device interface bandwidth and new surface mount and low profile packaging technologies, have made testing of ICs increasingly difficult and costly at all levels of the testing process. Field programmable gate arrays (FPGAs) pose a particularly difficult problem for test engineers due to their programmable nature, overall size and complexity, limited number of inputs/outputs (I/O), and large number and variety of embedded cores. In addition to manufacturing defects, ?soft? errors due to single event upsets (SEUs) have become a serious problem because of the increasing size of the configuration memory in FPGAs and shrinking design rules, even in fault-tolerant systems operating at ground level. Building on previous work, this thesis uses built-in self-test (BIST) as a solution to the testing problem for Xilinx Virtex-5 FPGAs. BIST configurations are presented for the configurable logic blocks (CLBs), I/O Tiles, and SEU detection/correction cores in Xilinx Virtex-5 FPGAs. In addition, this thesis presents a novel approach to BIST that uses a soft-core processor configured in the fabric of the device under test to perform reconfiguration of the resources under test, control the BIST execution, and perform fault diagnosis. This approach is particularly useful for in-system testing of FPGAs in fault-tolerant or high-reliability systems because it greatly reduces the amount and complexity of external hardware required for test. To combat the problem of ?soft? errors due to SEUs that can occur in the FPGA configuration memory during normal operation, an approach for on-line detection and correction of SEUs in iii the configuration memory of Xilinx Virtex-4 and Virtex-5 FPGAs is also presented. While not entirely immune to SEU effects, this approach greatly reduces the probability of an SEU induced failure in the user logic, and no single error from an SEU can cause a complete system failure. iv Acknowledgments First, I would like to thank Dr. Stroud for three great years of guidance, encouragement, employment, and education. You have taught me most of what I know about being an engineer, and what I appreciate most in hindsight is that you?ve always challenged me to be the best. I might not have even gone to graduate school if not for you. I also would like to thank the many students that I?ve had a chance to work with and learn from while in the BIST lab. Lee, Daniel, and Bobby: I learned a lot from you guys and, honestly, the lab was never the same without you (Bobby, you especially: I can?t help laughing even as I write this). To the students that came later ? Jia, Mary, Brooks, and Joey ? thanks for being good friends through thick and thin and for making time spent at work more fun. To Joseph and Jie: for being the best engineers my age that I?ve ever met, and, therefore, inspiring me to always work a little harder. I would also like especially to thank my mom and dad for always being supportive in everything that I?ve done. Robbie: for being my best and oldest friend and future business partner (or future landlord, if engineering doesn?t work out). And Bo and Samantha, thanks for dragging me out of my room and keeping me up late, regardless of projects or exams, and for teaching me some things that cannot be learned in a classroom. v Table of Contents Abstract ........................................................................................................................................... ii Acknowledgments.......................................................................................................................... iv List of Tables ................................................................................................................................. ix List of Figures ................................................................................................................................ xi List of Abbreviations .................................................................................................................... xv Chapter One. Introduction .............................................................................................................. 1 1.1 Overview of Built-In Self-Test ............................................................................................. 2 1.2 Introduction to Field Programmable Gate Arrays (FPGAs) ................................................. 4 1.3 Overview of Virtex-5 FPGAs ............................................................................................... 7 1.4 BIST for FPGAs ................................................................................................................. 10 1.5 Single Event Upsets in FPGAs ........................................................................................... 11 1.6 Verification by Fault Injection ............................................................................................ 13 1.7 Thesis Statement ................................................................................................................. 14 1.8 Thesis Format ..................................................................................................................... 15 1.9 References ........................................................................................................................... 15 Chapter Two. Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs ................... 18 2.1 Introduction And Background ............................................................................................ 18 2.2 Overview of Virtex-5 CLBs ............................................................................................... 20 2.3 BIST Approach And Architecture ...................................................................................... 22 2.4 Experimental Results .......................................................................................................... 26 vi 2.5 Summary And Conclusions ................................................................................................ 32 2.6 Acknowledgements ............................................................................................................. 33 2.7 References ........................................................................................................................... 34 Chapter Three. Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs ...... 35 3.1 Introduction ......................................................................................................................... 35 3.2 Prior Work .......................................................................................................................... 37 3.3 Overview of Virtex-5 I/O Tiles .......................................................................................... 38 3.4 Overview of BIST Architecture .......................................................................................... 39 3.5 Configurations for I/O Logic Modes .................................................................................. 43 3.6 Configurations for I/O SerDes Modes ................................................................................ 43 3.7 Experimental Results .......................................................................................................... 45 3.8 BIST for Programmable I/O buffers ................................................................................... 48 3.9 Conclusions ......................................................................................................................... 49 3.10 Acknowledgements ........................................................................................................... 50 3.11 References ......................................................................................................................... 50 Chapter Four. Built-In Self-Test of SEU Detection Cores in Virtex-4 and Virtex-5 FPGAs ...... 52 4.1 Introduction ......................................................................................................................... 52 4.2 Frame ECC and ICAP Logic .............................................................................................. 54 4.3 Test Algorithm .................................................................................................................... 57 4.4 BIST Approach ................................................................................................................... 59 4.4.1 Test Pattern Generator ..................................................................................................... 60 4.4.2 Output Response Analyzer .............................................................................................. 62 4.4.3 Additional Logic .............................................................................................................. 64 4.5 Implementation Results ...................................................................................................... 64 4.6 Conclusions ......................................................................................................................... 70 vii 4.7 Acknowledgements ............................................................................................................. 70 4.8 References ........................................................................................................................... 71 Chapter Five. Embedded Processor Based Fault Injection and SEU Emulation for FPGAs ....... 73 5.1 Introduction and Background ............................................................................................. 73 5.2 Hard Core Processor Case Study ........................................................................................ 75 5.3 Soft Core Processor Case Study ......................................................................................... 79 5.3.1 Overview of Approach ..................................................................................................... 80 5.3.2 Architecture and Operation .............................................................................................. 85 5.3.3 Implementation Results ................................................................................................... 88 5.4 Summary and Conclusions ................................................................................................. 92 5.5 Acknowledgements ............................................................................................................. 92 5.6 References ........................................................................................................................... 93 Chapter Six. Soft-Core Embedded Processor-Based Built-In Self-Test of FPGAs ...................... 95 6.1 Introduction ......................................................................................................................... 95 6.2 Background ......................................................................................................................... 96 6.3 Embedded BIST Architecture ........................................................................................... 100 6.4 Software Development ..................................................................................................... 104 6.5 Design Flow and Implementation Results ........................................................................ 109 6.6 Conclusions ....................................................................................................................... 111 6.7 Acknowledgements ........................................................................................................... 111 6.8 References ......................................................................................................................... 112 Chapter Seven. Soft-Core Embedded Processor-Based Built-In Self-Test of FPGAs Case Study ..................................................................................................................................................... 113 7.1 Introduction ....................................................................................................................... 113 7.2 Background ....................................................................................................................... 114 7.3 Results of Implementation in Virtex-5 ............................................................................. 118 viii 7.4 Future Improvements ........................................................................................................ 123 7.5 Other Applications ............................................................................................................ 124 7.6 Conclusions ....................................................................................................................... 125 7.7 Acknowledgements ........................................................................................................... 127 7.8 References ......................................................................................................................... 127 Chapter Eight. On-line Single Event Upset Detection and Correction in Field Programmable Gate Array Configuration Memories .......................................................................................... 129 8.1 Introduction ....................................................................................................................... 129 8.2 Background ....................................................................................................................... 134 8.3 Operation of SEU Detect and Correct .............................................................................. 137 8.4 SEU Detect and Correct Architecture ............................................................................... 140 8.5 Implementation Results .................................................................................................... 145 8.6 Experimental Results ........................................................................................................ 149 8.7 Conclusions ....................................................................................................................... 153 8.8 Acknowledgements ........................................................................................................... 154 8.9 References ......................................................................................................................... 155 Chapter Nine. Summary and Conclusions .................................................................................. 157 9.1 Summary of Work ............................................................................................................ 157 9.2 Future Work ...................................................................................................................... 160 Bibliography ............................................................................................................................... 162 ix List of Tables Table 2.1: List of acronyms .......................................................................................................... 20 Table 2.2: SliceL logic BIST configurations ................................................................................ 25 Table 2.3: SliceM BIST configurations ........................................................................................ 26 Table 2.4: CLB BIST totals (17 configurations) .......................................................................... 32 Table 3.1: I/O tile BIST totals (15 configurations) ....................................................................... 48 Table 4.1: Frame ECC codes ........................................................................................................ 55 Table 4.2: Hamming parity matrix example ................................................................................. 57 Table 4.3: ICAP and Frame ECC BIST summary ........................................................................ 70 Table 5.1: Embedded fault injection run time analysis for AT94K40 .......................................... 78 Table 5.2: Parity bit encoding, where X = don?t care ................................................................... 87 Table 5.3: Embedded fault list format .......................................................................................... 87 Table 5.4: Embedded fault injection core resources ..................................................................... 88 Table 5.5: Fault/SEU injection core I/O descriptions ................................................................... 91 Table 6.1: BIST control registers ................................................................................................ 103 Table 6.2: Compressed partial reconfiguration data size ............................................................ 107 Table 7.1: Test configurations developed for various FPGAs .................................................... 115 Table 8.1: Memory resources in two Virtex-5 FPGAs ............................................................... 130 Table 8.2: Frame ECC error codes [25][26] ............................................................................... 135 Table 8.3: Hamming bit error diagnosis [25][26] ....................................................................... 143 x Table 8.4: SEU controller resource utilization in Virtex-4 devices ............................................ 148 Table 8.5: SEU controller resource utilization in Virtex-5 devices ............................................ 148 Table 8.6: SEU emulation results ............................................................................................... 152 Table 8.7: Approximate number of configuration bits for common resources [5] ..................... 153 xi List of Figures Figure 1.1: Basic BIST architecture [3] .......................................................................................... 3 Figure 1.2: Typical custom ASIC, standard cell ASIC, and FPGA cost vs. volume ...................... 5 Figure 1.3: Typical FPGA architecture [12] ................................................................................... 6 Figure 1.4: Simplified basic logic element ..................................................................................... 7 Figure 1.5: Virtex-5 configurable logic block [15] ......................................................................... 8 Figure 1.6: Virtex-5 6-Input LUT [16] ........................................................................................... 9 Figure 1.7: Illustration of a single-event effect in a CMOS inverter ............................................ 12 Figure 2.1: Simplified basic logic element ................................................................................... 21 Figure 2.2: Virtex-5 configurable logic block [11] ....................................................................... 21 Figure 2.3: Circular comparison architecture ............................................................................... 23 Figure 2.4: Equivalent ORA architecture ..................................................................................... 24 Figure 2.5: SliceL fault coverage (simulation) ............................................................................. 29 Figure 2.6: SliceL fault coverage (fault injection) ........................................................................ 29 Figure 2.7: SliceM fault coverage (simulation) ............................................................................ 30 Figure 2.8: SliceM fault coverage (fault injection) ....................................................................... 30 Figure 2.9: Boundary Scan interface test time .............................................................................. 31 Figure 2.10: 32-bit parallel interface test time .............................................................................. 31 Figure 3.1: Simplified programmable I/O cell .............................................................................. 37 Figure 3.2: Virtex-5 programmable I/O tile .................................................................................. 38 xii Figure 3.3: Column oriented circular comparison ........................................................................ 40 Figure 3.4: Virtex-5 equivalent ORA architecture ....................................................................... 41 Figure 3.5: Bitslip synchronizer circuit ........................................................................................ 45 Figure 3.6: 50 MHz Boundary Scan configuration interface test time ......................................... 47 Figure 3.7: 100 MHz 32-bit parallel configuration interface test time ......................................... 47 Figure 4.1: Frame ECC and ICAP primitives ............................................................................... 56 Figure 4.2: Sequential Hamming bit calculation .......................................................................... 60 Figure 4.3: Test pattern write sequence via ICAP interface ......................................................... 61 Figure 4.4: Test pattern read sequence via ICAP interface .......................................................... 61 Figure 4.5: ICAP and Frame ECC BIST architecture. .................................................................. 65 Figure 4.6: BIST VHDL component declaration. ......................................................................... 66 Figure 4.7: Virtex-4 FX12 with ICAP/Frame ECC BIST ............................................................ 68 Figure 4.8: Virtex-5 LX20T with ICAP/Frame ECC BIST .......................................................... 69 Figure 5.1: AT94K series SoC architecture .................................................................................. 76 Figure 5.2: AT94K routing architecture ....................................................................................... 77 Figure 5.3: SliceL simulation stuck-at fault coverage .................................................................. 83 Figure 5.4: SliceL fault injection stuck-at fault coverage ............................................................. 83 Figure 5.5: Total CLB test time via Boundary Scan ..................................................................... 84 Figure 5.6: Frame read-modify-write flowchart ........................................................................... 85 Figure 5.7: Block diagram of fault injection core ......................................................................... 88 Figure 5.8: Routed embedded fault inject core (right) with half-array of routed CLB BIST (left) in Virtex-5 LX20T ........................................................................................................................ 90 Figure 5.9: Fault inject core component declaration .................................................................... 91 xiii Figure 6.1: Configurable logic block (CLB) BIST architecture ................................................... 97 Figure 6.2: Embedded soft core processor based BIST architecture .......................................... 103 Figure 6.3: Embedded processor BIST algorithms ..................................................................... 105 Figure 6.4: Compressed BIST partial reconfiguration structure in C ......................................... 107 Figure 6.5: Original reconfiguration file sizes and compressed data structure sizes for one CRC BIST and a set of 5 I/O Logic BIST partial reconfigurations ..................................................... 108 Figure 6.6: Embedded processor BIST design implementation ................................................. 110 Figure 7.1: Simplified soft-core processor-based BIST architecture .......................................... 117 Figure 7.2: Unrouted embedded processor-based BIST configuration for top configurable logic blocks (CLB) in Virtex-5 LX30T viewed in FPGA Editor ........................................................ 119 Figure 7.3: CLB BIST test time for external configuration (full compressed and partial compressed bitstreams) and embedded processor test time ........................................................ 120 Figure 7.4: Contribution to embedded processor-based CLB BIST test time by initial external configuration and by five internal partial reconfigurations ........................................................ 122 Figure 7.5: Comparison of CLB BIST ORA read back times with embedded processor-based approach and external Boundary Scan interface ......................................................................... 122 Figure 7.6: 32-bit, 100 MHz interface test time for full chip CLB west or east with one full compressed configuration and five partial reconfigurations ....................................................... 124 Figure 8.1: FIT rate (corrected for sea-level New York, NY) versus Xilinx device family, initial release year, and minimum feature size [6] where the center line represents the nominal value and the span of the line represents the upper and lower 95% confidence levels ........................ 131 Figure 8.2: Frame ECC and ICAP primitives ............................................................................. 137 Figure 8.3: SEU controller VHDL component declaration ........................................................ 137 xiv Figure 8.4: SEU controller behavioral pseudocode .................................................................... 139 Figure 8.5: SEU controller block diagram .................................................................................. 142 Figure 8.6: SEU controller LOG cycle time vs. Virtex-4 device ................................................ 146 Figure 8.7: SEU controller cycle time vs. Virtex-5 device ......................................................... 147 Figure 8.8: Routed SEU controller implemented in Virtex-5 LX20T device............................. 150 xv List of Abbreviations ATE Automatic Test Equipment BIST Built-In Self Test BRAM Block RAM BSCAN Boundary Scan BUT Block under Test CAD Computer-aided Design CLB Configurable Logic Block CMOS Complementary Metal-oxide-semiconductor CUT Circuit under Test DFT Design for Testability DSP Digital Signal Processor DUT Device Under Test ECC Error Correction Code FF Flip-flop FIFO First-in First-out FPGA Field Programmable Gate Array FSM Finite State Machine GUI Graphical User Interface HDL Hardware Description Language xvi I/O Input / Output IC Integrated Circuit ICAP Internal Configuration Access Port IP Intellectual Property LUT Look-up Table LSB Least Significant Bit MSB Most Significant Bit ORA Output Response Analyzer PIP Programmable Interconnect Point PLB Programmable Logic Block RAM Random Access Memory SERDES Serializer / Deserializer SEU Single Event Upset SoC System-on-Chip SRAM Static Random Access Memory TCK Test Clock TDI Test Data In TDO Test Data Out TMS Test Mode Select TPG Test Pattern Generator VLSI Very Large Scale Integration 1 Chapter One. Introduction Moore?s law, which predicts a doubling of integrated circuit (IC) transistor density every 18 to 24 months, has been an accurate predictor of the exponential growth in the number of transistors in ICs since it was first observed by Gordon Moore in 1965 [1]. According to the most recent International Technology Roadmap for Semiconductors (ITRS) report, minimum feature size is expected to continue to decrease by a factor of two (e.g. transistor density will increase by a factor of two) every two years until 2022 [2]. With very large-scale integration (VLSI) circuits already surpassing the one billion transistor mark in 2008, this report, in accordance with Moore?s law, predicts that the number of transistors on a single IC of comparable physical area will exceed 128 billion by 2022. Increasing transistor count and density and increasing device interface bandwidth, coupled with new surface mount and low profile packaging technologies, have made testing of integrated circuits increasingly difficult and costly at all levels of the testing process [3] [4]. In addition, larger device sizes and smaller feature sizes have increased both the number and type of faults that can occur [4]. Testing embedded resources in VLSI devices is especially difficult because their embedded nature makes them difficult to control and observe from the external chip I/O; furthermore, the number of external I/O is continually decreasing in proportion to the number of transistors on a single die [4]. While the number of I/O has increased by an order of magnitude for most VLSI devices, the number of transistors on a single die increased by more than 4 orders of magnitude over the same time period [4]. (This trend is commonly called Rent?s Rule, for E. F. Rent of IBM, who was the first to investigate a relationship between the number 2 of I/O and the number of internal logic blocks in 1960 [5]). Due to the limited number of external I/O in proportion to the number of transistors on a chip, and without the inclusion of any additional test circuitry, the controllability and observability of most VLSI designs are severely limited during testing. Another factor affecting testing of VLSI ICs is the cost of automatic test equipment (ATE). While the cost of manufacturing transistors in VLSI circuits has continued to decrease with each new technology node, the cost of testing has increased both in absolute terms and in proportion to overall manufacturing cost. In fact, the cost of testing a single transistor already exceeds its cost of production [3], and due to the ever increasing density and bandwidth of integrated circuits, testing costs will continue to rise. It is expected that by the year 2014, the cost of a leading edge VLSI test machine will exceed twenty million dollars [4]. Consequently, design for testability (DFT) methods, which incorporate additional test circuitry during the design phase to increase circuit controllability and observability during testing, are included in some form in virtually every VLSI design. Two of the most common DFT techniques are scan design and built-in self-test (BIST). Another DFT method, known as Boundary Scan or JTAG (Joint Test Action Group) [6], is usually included to facilitate board-level testing of systems with high pin-count and surface mount components [3] [7]. A recent offshoot of Boundary Scan, IEEE standard 1500-2005 [8], describes a scalable wrapper architecture and control mechanism for testing embedded cores in System-on-Chip (SOC) devices and the interconnect between cores [3]. The primary focus of this thesis will be on BIST as a solution for testing VLSI ICs. 1.1 Overview of Built-In Self-Test BIST was introduced around 1980 as a way to test embedded cores in VLSI devices [4]. The basic idea of BIST is to incorporate extra circuitry and functionality in the device under test 3 such that the circuit can test itself [3] [4]. This implies that the circuit is capable of generating test patterns and compacting output responses. Therefore, BIST, in contrast to other techniques such as scan design which relies on externally applied test patterns, does not require costly ATE hardware. In addition, many BIST techniques are applicable at every level of the testing process, from wafer-level manufacturing test to board-level and in-system test. Another advantage of some BIST approaches when compared to scan-based test techniques is that patterns can be applied to the circuit under test and the output responses monitored at system speeds, which facilitates the detection of delay and coupling faults [4] [9]. A simple BIST architecture, shown in Figure 1.1, consists of a test pattern generator (TPG), output response analyzer (ORA), circuit under test (CUT), and some additional control circuitry [3] [4]. For system-level use of BIST, input isolation circuitry and a dedicated BIST controller must be included. The BIST controller can be used to initiate the BIST, initialize the CUT, activate the input isolation circuitry, and provide an indication when the test is complete. During off-line tests, the TPG generates a set of test patterns which are applied to the circuit under test (CUT) to sensitize potential fault sites, and the ORA compacts the output response of the CUT. At the conclusion of the test, the results are determined by examination of the ORA contents (generally, by comparison to the fault-free circuit ?signature?) [4]. Figure 1.1: Basic BIST architecture [3] Test Pattern Generator (TPG) Output Response Analyzer (ORA) Circuit Under Test (CUT) Input Isolation Circuit System Inputs Pass/Fail System Outputs 4 There are some costs associated with BIST that must be taken into consideration. In ASICs, BIST requires additional circuitry and functionality that results in area and performance penalties. This additional circuitry is shown in gray in Figure 1.1. Typically, the performance penalty is minimal, amounting to no more than a multiplexer delay in the primary input data path and additional fan-out in the primary output data path of the circuit under test. The area penalty varies depending on the exact BIST architecture used (which is, in turn, usually a function of desired fault coverage and the type of circuit under test). This additional area is disadvantageous because larger chip areas result in fewer chips per wafer, and, therefore, higher cost per chip due to lower yield [4]. Also, some additional I/O pins may be required for activation of the BIST circuitry and results retrieval [4]. The inclusion of BIST also increases the design effort and risk to the project, because, on top of designing the system function, the BIST circuitry must also be designed and verified. However, most case studies have found that the benefits of BIST usually outweigh the costs (including addition design time and overhead) when included in a project [4], and many computer-aided design (CAD) tools now support automatic insertion of pre- engineered BIST circuitry during the design phase, which reduces the design effort and risk to the project. 1.2 Introduction to Field Programmable Gate Arrays (FPGAs) Field Programmable Gate Arrays (FPGAs) are pre-fabricated semiconductor devices that can be programmed (i.e. configured) after manufacturing to perform complex sequential or combinational logic functions. Compared to standard-cell or custom ASIC designs, FPGAs provide lower non-recurring engineering costs and faster time-to-market [10]. The non-recurring engineering costs associated with the design and manufacture of FPGAs are initially absorbed by the manufacturer and are passed to the customer in the form of a higher price-per-part. This cost, 5 coupled with the cost of the additional logic required for programming of the device, makes the recurring costs of designs with FPGAs higher than those with ASICs. For these reasons, FPGAs are commonly used for rapid prototyping of designs prior to first silicon and in low-volume, highly-specialized digital systems (where the FPGA is used in lieu of an ASIC). An illustration of the total cost (i.e. recurring plus non-recurring costs) as a function of volume (number of parts) for a design implemented as a standard-cell ASIC, as a custom ASIC, and in an FPGA is shown in Figure 1.2 [10]. Figure 1.2: Typical custom ASIC, standard cell ASIC, and FPGA cost vs. volume Due to the programmable nature of FPGAs, area, power and performance penalties are incurred for designs implemented in FPGAs when compared to the same design implemented as an ASIC. For several benchmark circuits implemented in both a 90 nm FPGA and 90 nm standard-cell ASIC, the FPGA implementation required between 18 and 35 times greater silicon area, and the critical path delay of the circuit increased by 3 to 4 times versus the ASIC implementation [11]. A typical FPGA is composed of an array of programmable logic blocks (PLB) (also called configurable logic blocks, or CLB) and input/output (I/O) cells connected by a Custom ASIC Standard-Cell ASIC FPGA Number of parts Total cost FPGA & Standard Cell ASIC break- even point 6 programmable interconnect network, as illustrated in Figure 1.3 [12]. Most modern FPGAs also include ?hard? cores such as reduced instruction set computer (RISC) or complex instruction set computer (CISC) processors, digital signal processors (DSPs), random access memories (RAMs), and high-speed serializer/deserializer (SERDES) input/output (I/O) cells. These ?hard? cores can perform certain common functions, such as multiply/accumulate or serialization/deserialization, with greater efficiency than can be achieved by implementing the same function in CLBs, which helps to reduce the performance/area penalties when compared with ASICs [11]. Figure 1.3: Typical FPGA architecture [12] The front-end of the FPGA design process is identical to that for a standard-cell ASIC. However, the post synthesis design flow is much less complex for FPGA implementations. After behavioral simulation and functional verification, computer aided design (CAD) tools (usually supplied by the FPGA manufacturer, but also available through third parties) translate the digital designs in Hardware Description Language (HDL) or schematic form to a device specific netlist which maps the design into the FPGA?s configurable logic and programmable routing network. PLB PLB PLB PLB PLB PLB PLB PLB PLB PLB PLB PLB PLB PLB PLB PLB Input/Output Cell (I/O Cell) Programmable Interconnect Network 7 A configuration bit-file is generated from this netlist and downloaded to the configuration memory of the FPGA to implement the desired user function. 1.3 Overview of Virtex-5 FPGAs This body of work is primarily concerned with Xilinx Virtex-5 FPGAs. Virtex-5 FPGAs are fabricated in a 1.0 V, 65 nm CMOS copper process with 12 metal layers [13]. The number of flip-flops and LUTs in a single Virtex-5 device ranges from 12,480 up to 207,360. As many as 1,200 user I/O are available in the highest pin-count package [13]. The configuration memory in all Virtex-5 devices is a large static random access memory (SRAM), ranging in size from 4.94 Mb (4,935,744 bits) to 82.7 Mb (82,687,488 bits) [14]. Each CLB in an FPGA consists of one or more basic logic elements. The Virtex-5 basic logic element, illustrated in Figure 1.4, comprises a six-input look-up table (LUT), a configurable flip-flop/latch (FF/LAT), a multiplexor to control the combinational output, and a multiplexor to control the registered output (FF/LAT input) [15]. Figure 1.4: Simplified basic logic element Additional dedicated carry logic is included to perform special logic and arithmetic functions. In some slices, the LUT can be configured as a small RAM, called a free RAM or LUT RAM, with an independent read and shared write address input. Four such logic elements LUT/ RAM Carry Logic FF/ LAT 6 CIN COUT 8 are grouped to form a slice, and two slices are grouped to form a complete configurable logic block (CLB), as illustrated in Figure 1.5. The logic blocks are replicated and tiled in columns and rows, as in Figure 1.4, and are connected via programmable switch-boxes to local and global routing resources. Larger devices include more CLBs, but the structure of the CLB is identical across all devices in the FPGA family [15]. Figure 1.5: Virtex-5 configurable logic block [15] The LUTs in Virtex-5 devices are designed with two outputs each. The primary output can utilize the full 64-bit LUT to implement any six variable Boolean function. The second output can be used to control the carry chain, or both outputs can implement two five variable Boolean functions for five shared inputs. Both outputs can be selected by the multiplexors for the registered or combinatorial CLB output paths. A block diagram of the Virtex-5 6-input LUT is shown in Figure 1.6 [16]. Select slices also support RAM and shift register modes of operation. Each LUT can be configured as a simple 64 x 1-bit or 32 x 2-bit RAM. Dynamic multiplexors in each slice allow for Shannon expansion of the four slice LUTs to form a 256 x 1-bit RAM. Additionally, the four slice LUTs can share address inputs to form a 32 x 8-bit RAM. Each LUT can also form a single 32-bit or two 16-bit shift registers. The four LUTs in the slice can be cascaded to form a 128-bit shift register or can operate in parallel form a 16 x 8-bit shift register in a slice [15] [16]. COUT COUT Switch Matrix Memory Slice(0) SliceM Logic Slice(1) SliceL CLB CIN CIN 9 Figure 1.6: Virtex-5 6-Input LUT [16] In addition to CLBs, every device in the Virtex-5 family includes DSP and Block RAM ?hard? cores. Each DSP core can perform 25 x 18 2?s complement multiplication, and includes an adder/subtractor/accumulator block. The DSP can also perform bit-wise logic operations including NOR, OR, AND, NAND, XNOR, and XOR. Up to five pipeline registers may be configured for use in the data path for increased throughput (up to 550 MHz) in high performance applications [17]. Each Block RAM core is 36 Kbit in size, with true dual-port read/write access to each memory element. Each of the read and write ports are configurable, such that the address and data bus widths can vary from 32K x 1-bit to 1K x 72-bit. In addition, the Block RAM can operate in a FIFO mode (with configurable data width and programmable almost-full and almost-empty flags) and/or in an error correction code (ECC) mode [15]. Some devices in the Virtex-5 family also include other ?hard? cores such as gigabit transceivers, Ethernet MACs, PCI Express blocks, and/or Power PC processors [13]. A2 A3 A4 A5 A6 LUT5 A2 A3 A4 A5 A6 LUT5 O5 O6 LUT6 A6 A5 A4 A3 A2 A1 10 1.4 BIST for FPGAs Testing FPGAs is difficult when compared to testing ASICs because of their programmable nature and overall complexity [9]. Each of the programmable resources must be tested in all modes of operation to achieve high fault coverage. This implies that multiple re- configurations of the device are required during testing. Because the total test time is usually dominated by the time spent configuring the device under test, the size of FPGA configuration memories is also a factor in testing [9]. FPGAs are, in general, not well-suited for scan-based testing methods. However, the programmable nature of FPGAs allows for the creation of test circuitry in the programmable logic during testing. In addition, the regular structure of FPGAs makes pseudo-exhaustive test methods highly efficient [4] [9] [17] [18] [19]. BIST for FPGAs exploits the re-programmability of FPGAs to create BIST circuitry in the FPGA fabric during manufacturing and system-level off-line testing [4] [9] [17] [18] [19]. The only overhead is the external memory required to store the BIST configurations along with the time required to download and execute the BIST. No area overhead or performance penalties are incurred in the user function because the BIST logic is replaced by the intended system function after testing is complete. The BIST configurations are applicable to all levels of testing because they are independent of the intended system function and require no specialized external test fixture or equipment. Most research and development in BIST for FPGAs has focused on reducing the number of test configurations, reducing the size of test configuration files, and decreasing BIST execution time [4] [7] [8] [23]. Other research has focused on developing BIST techniques for the complex embedded cores included in many modern FPGAs, such as DSPs [24] and RAMs [3] [5]. This thesis presents new BIST approaches for the CLBs, I/O Tiles, and SEU detection cores in Virtex-5 FPGAs. 11 This thesis also presents a new approach to BIST for FPGAs that utilizes a soft-core processor configured in the fabric of the FPGA under test to execute the BIST sequence, including retrieval and analysis (fault diagnosis) of BIST results and reconfiguration of the FPGA for subsequent BIST configurations. The approach reduces the required number of configurations for BIST of any logic resource to a maximum of four, and by moving the complex BIST controller logic into the FPGA fabric, the external hardware requirements for BIST of FPGAs is greatly reduced. This approach is particularly useful in high-reliability and fault- tolerant applications, especially when fault-diagnosis is required. 1.5 Single Event Upsets in FPGAs BIST is typically targeted at detecting manufacturing defects or ?hard? faults that appear during normal operation. However, ?soft? errors, known as Single Event Upsets (SEUs), are known to affect the configuration memory and other memory elements of FPGAs during normal operation. These errors are caused when charged particles, such as heavy ions or protons, travel through the FPGA, as illustrated in Figure 1.7 [27]. These particles can alter the state of any static memory element, resulting in an SEU [27] [28] [29]. While SEUs occur more frequently in high radiation environments such as space, they have also been experimentally observed in FPGAs at ground level [28] [29] [30]. Because the configuration memory of an FPGA establishes the overall system function performed by the FPGA, an SEU in the configuration memory can alter the FPGA functionality. This, coupled with the large size of the configuration memory, makes SEUs a significantly greater concern in FPGAs than in typical ASICs [31]. 12 Figure 1.7: Illustration of a single-event effect in a CMOS inverter Several methods exist to mitigate the effects of SEUs in FPGAs. The most common methods include power cycling, triple modular redundancy, redundant devices, and active configuration memory scrubbing [27]. Power cycling is essentially the simplest form of configuration memory scrubbing, because the entire configuration memory is refreshed (from a radiation hardened memory) each time that power is cycled off and on. When a power cycling mitigation scheme is employed, SEUs can persist in memory elements for a period of time equal to the power-cycling period. This approach is usually sufficient for non-critical applications in low radiation environments [27]. Triple modular redundancy creates three identical copies of the user function in the FPGA fabric and adds majority voters on the inputs to all flip-flops and on all primary outputs of the circuit [32]. This approach is very robust: any single SEU cannot cause the circuit to malfunction, and multiple SEUs must alter the same flip-flop input or primary output in two circuit copies on the same clock cycle in order for the error to propagate. However, the area penalty for any TMR approach is greater than 200% of the original circuit size, which increases system cost and power requirements. Also, circuit performance can be adversely impacted due 13 to the increased size of the circuit and inclusion of majority voters in critical paths [27]. Duplicating the user function in multiple FPGAs and performing voting on the outputs of the FPGAs in a radiation hardened device is the most robust from of SEU mitigation. However, designing systems with multiple FPGAs is both costly and difficult, and requires special design considerations such that the FPGAs remain synchronized after an SEU is repaired in any one of the devices [27]. Active configuration memory management (also called active configuration memory scrubbing) utilizes error correction code (ECC) stored with configuration data in the configuration memory to actively detect and repair SEUs [14]. The ECC, in conjunction with some additional user-accessible dedicated logic, can be used to detect SEUs in the configuration memory [15]. This approach incurs minimal area overhead, and SEUs persist for only a small window of time. The configuration management hardware may be hosted on an external radiation hardened FPGA, microprocessor, ASIC, or in the FPGA itself. However, in the latter case, the circuitry responsible for the repair of SEUs is also susceptible to SEUs [31]. Therefore, the area of the detection and repair circuitry should be minimized to decrease the probability of an SEU in that logic. An active configuration memory management approach for Xilinx Virtex- 4 and Virtex-5 FPGAs that requires no additional external hardware is described in this thesis. 1.6 Verification by Fault Injection During the development of BIST approaches for FPGAs, it is necessary to verify the fault coverage of the BIST configurations. It is difficult to find actual faulty devices and their usefulness is limited due to the fixed nature of the faults. Physical faults can be created by etching the packaged device and creating opens or shorts in routing resources that lie at the top level of interconnect metal for example, but once again the usefulness of these devices is limited. 14 A more efficient approach is to manipulate the configuration memory bits to emulate physical faults in the device [33] [34] [35] [36]. For example, a stuck-at fault in a look-up table (LUT) bit can be emulated by overwriting the particular configuration memory bit and setting it to the desired stuck-at fault value. SEUs, on the other hand, can be emulated by flipping the value of bits in the configuration memory. Shorts and opens in the interconnect network can be emulated along with almost any fault in the logic resources that can be controlled by configuration memory bits. An approach for the emulation of stuck-at faults and SEUs in the configuration memory of Virtex-4 and Virtex-5 FPGAs is presented in this thesis. 1.7 Thesis Statement Testing FPGAs is difficult due to their high complexity, the limited observability and controllability of embedded cores, and their programmable nature. Also, the increasing density and large size of the configuration memory has made transient and on-line faults due to SEUs more common and of greater concern, even in fault-tolerant applications that operate at ground level. This work considers both ?hard? faults due to manufacturing defects and device ageing as well as transient or ?soft? faults induced by SEUs in Virtex-5 FPGAs. Furthermore, this work considers ?hard? faults that may affect the detection and correction of SEUs by corrupting the dedicated SEU detection hardware in Virtex-5 FPGAs, and presents BIST approaches for this hardware. Other BIST methods are proposed as a solution to detect ?hard? faults and manufacturing defects that can affect the configuration memory and programmable resources in Virtex-5 FPGAs, including the CLBs and I/O Tiles. A novel BIST approach for FPGAs that utilizes a soft-core processor configured in the fabric of the FPGA under test to perform complex functions such as reconfiguration of resources under test and fault diagnosis is also presented. Finally, a method for active detection and correction of temporary or ?soft? errors by active 15 configuration memory management and without the requirement of additional external hardware is presented for Xilinx Virtex-4 and Virtex-5 FPGAs. 1.8 Thesis Format This thesis is written in ?publication format? as suggested by the Auburn University Graduate School Electronic Thesis and Dissertation Guide, and consists of conference and journal papers that were published (or accepted for publication) during the course of research conducted by the author while in the graduate program at Auburn University. A majority of the actual research and the writing of all published papers included in this thesis represents the efforts of the primary student author and not collaborators. Each paper is presented ?as published?, with the exception of an acknowledgments section at the end of each chapter that provides the name, location, and date of publication of the original paper along with any information regarding relevant published papers that do not appear in this thesis. The papers are reformatted to comply with the guidelines set forth by the Graduate School. References are organized as follows: Each chapter in the body of the thesis contains its original list of references (numbered consecutively beginning at 1), such that the chapter may stand-alone and as it appears in the original published paper. In addition, a cumulative bibliography of all references cited in the thesis is included at the end of the thesis. 1.9 References [1] G. Moore, ?Cramming More Components onto Integrated Circuits,? Proc. of the IEEE, vol. 86, no. 1, pp. 82-85, 1998. [2] Semiconductor Industry Association, International Technology Roadmap for Semiconductors: 2007 edition, http://public.itrs.net. [3] Y. Min and C. Stroud, ?Introduction,? in VLSI Test Principles and Architectures, L-T Wang, C-W Wu, and X. Wen, Eds., San Francisco: Morgan Kaufmann, 2006, pp. 1-33. 16 [4] C. Stroud, A Designer?s Guide to Built-In Self-Test, Boston: Springer, 2002. [5] P. Christie, D. Stroobandt, ?The Interpretation and Application of Rent?s Rule,? IEEE Trans. on VLSI Systems, vol. 8, no. 6, pp. 639-648, 2000. [6] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1- 2001, New York, 2001. [7] M. Bushnell and V. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, New York: Springer, 2000. [8] IEEE Standard Testability Method for Embedded Core-Based Integrated Circuits, IEEE Std. 1500-2005, New York, 2005. [9] L-T Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures, San Francisco: Morgan Kaufmann, 2007. [10] M. Smith, Application-Specific Integrated Circuits, Addison-Wesley, 1997. [11] I. Kuon and J. Rose, ?Measuring the Gap Between FPGAs and ASICs,? IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol.26, no.2, pp.203-215, 2007 [12] S. Brown and J. Rose, ?FPGA and CPLD architectures: a tutorial,? IEEE Design & Test of Computers, vol.13, no.2, pp.42-57, 1996 [13] Virtex-5 Family Overview, DS100 (v5.0), Xilinx Inc., 2009. [14] Virtex-5 FPGA Configuration User Guide, UG191 (v3.2), Xilinx Inc., 2008. [15] Virtex-5 FPGA User Guide, UG190 (v 4.2), Xilinx Inc., 2008. [16] A. Cosoroaba and F. Rivoallon, ?Achieving Higher System Performance with the Virtex- 5 Family of FPGAs,? Xilinx Inc., San Jose, CA, 2006. [17] Virtex-5 FPGA ExtremeDSP Design Considerations: User Guide, UG193 (v3.3), Xilinx Inc., 2009. [18] M. Abramovici and C. Stroud, ?BIST-based test and diagnosis of FPGA logic blocks,? IEEE Trans. on VLSI Syst., vol. 9, no. 1, pp. 159-172, 2001. [19] S. Toutounchi and A. Lai, ?FPGA test and coverage,? Proc. IEEE Int. Test Conf., pp. 599-607, 2002. [20] J Sunwoo and C. Stroud, ?BIST of Configurable Cores in SoCs Using Embedded Processor Dynamic Reconfiguration,? Proc. Int. SoC Design Conf., pp. 174-177, 2005. [21] B. Dutton and C. Stroud, ?Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 230-234, 2009. 17 [22] B. Dutton and C. Stroud, ?Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 235-239, 2009. [23] C. Stroud, S. Konala, P. Chen, and M. Abramovici, ?Built-in self-test of logic blocks in FPGAs,? Proc. IEEE VLSI Test Symp., pp.387-392, 1996. [24] M. Pulukuri and C. Stroud, ?Built-In Self-Test of Digital Signal Processors in Virtex-4 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 34-38, 2009. [25] C. Stroud, S. Garimella and J. Sunwoo, ?On-Chip BIST-Based Diagnosis of Embedded Programmable Logic Cores in System-On-Chip Devices,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 308-313, 2005. [26] B. Garrison, D. Milton, and C. Stroud, ?Built-In Self-Test for Memory Resources in Virtex-4 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 63- 68, 2009. [27] B. Bridgford, C. Carmichael, and C. Tseng, ?Single-Event Upset Mitigation Selection Guide,? XAPP987 (v1.0), Xilinx Inc., 2008. [28] E. Normand, ?Single Event Upset at Ground Level,? IEEE Trans. on Nuclear Science, vol. 43, pp. 2742-2750, 1996. [29] A. Lesea and P. Alfke, ?Xilinx FPGAs Overcome the Side Effects of Sub-90 nm Technology,? WP256 (v1.0.1), Xilinx Inc., 2007. [30] A. Lesea, ?Continuing Experiments of Atmospheric Neutron Effects on Deep Submicron Integrated Circuits,? WP286 (v1.0), Xilinx Inc., 2008. [31] K. Chapman and L. Jones, ?SEU Stratagies for Virtex-5 Devices,? XAPP864 (v1.0.1), Xilinx Inc., 2009. [32] Xilinx TRMTool User Guide: TMRTool Software Version 9.2i, UG156 (v2.2), Xilinx Inc., 2009. [33] P. Ellervee, J. Raik, K. Tammem?e and R. Ubar, ?Environment for FPGA-based Fault Emulation,? Proc. Estonian Acad. Sci. Eng., vol. 12, pp. 323?335, 2006. [34] T. Slaughter, C. Stroud, J. Emmert and B. Skaggs, ?Fault Injection Emulation for Field Programmable Gate Arrays,? Proc. Int. Society for Optical Eng., vol. 4525, pp. 1-9, 2001. [35] E. Johnson, M. Caffrey, P. Graham, N. Rollins and M. Wirthlin, ?Accelerator Validation of an FPGA SEU Simulator,? IEEE Trans. on Nuclear Sci., vol. 50, no. 6, pp. 2147-2157, 2003. [36] F. Kastensmidt, L. Carro and R. Reis, Fault-Tolerance Techniques for SRAM-based FPGAs, The Netherlands: Springer, 2006. 18 Chapter Two. Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs A Built-In Self-Test (BIST) approach is presented for the configurable logic blocks (CLBs) in Xilinx Virtex-5 Field Programmable Gate Arrays (FPGAs). A total of 17 configurations were developed to completely test the full functionality of the CLBs, including distributed RAM modes of operation. These configurations cumulatively detect 100% of stuck- at faults in every CLB. There is no area overhead or performance penalty and the approach is applicable to all levels of FPGA testing (wafer, package, and in-system). A novel output response analyzer (ORA) design, which is efficiently implemented in FPGAs, provides both an overall single-bit pass/fail result and optimal diagnostic resolution when faults are detected. The implementation of the BIST approach in all Virtex-5 FPGAs and experimental results are discussed. 2.1 Introduction And Background Built-In Self-Test (BIST) for Field Programmable Gate Arrays (FPGAs) is typically targeted at manufacturing defects and operational faults that can appear at any point in the product life-cycle. As a result, BIST for FPGAs employs a defect-oriented test strategy [1]. Ideally, a BIST approach would be applicable to all levels of testing, from manufacturing test to in-system test, and would be entirely independent of the end user function. Additionally, the BIST would achieve maximal stuck-at fault coverage and would be executed at-speed to provide high fault coverage for a variety of fault models. When possible, high diagnostic resolution of detected faults is desired for fault-tolerant applications. This chapter presents a BIST approach 19 for the configurable logic blocks (CLBs) in Virtex-5 FPGAs that represents the culmination of over 15 years of work in FPGA BIST to address these concerns. The first BIST for the configurable logic in FPGAs was proposed in [2]. The approach exploits the re-programmability of FPGAs to create BIST circuitry in the FPGA fabric during off-line testing. The only overhead is the external memory required to store the BIST and system function configurations along with the time required to download and execute the BIST. No area overhead or performance penalties are incurred since the BIST logic ?disappears? after the test session. Furthermore, the tests are applicable at all levels of testing since they are independent of the system function and require no external test fixture or equipment. The basic idea for the BIST is to configure some of the CLBs as Test Pattern Generators (TPGs) and Output Response Analyzers (ORAs) while configuring other CLBs as blocks under test (BUTs). The BUTs are repeatedly configured until they have been tested in every mode of operation [1]. These tests achieve maximal fault coverage by applying pseudo-exhaustive test patterns such that each sub-circuit of the BUT is exhaustively tested [2]. Several examples of BIST for the CLBs in FPGAs have been published, with each offering some improvement over the previous approach. Reference [3] introduced Boundary Scan as a means of controlling the BIST sequence. Xilinx engineers, in [4], introduced a set of iterative array logic tests with similarities to the approach presented in [2] and [3]. The general BIST approach, which is independent of the CLB array size, can also be adapted for on-line BIST techniques, as discussed in [5]. Previous examples of the implementation of this BIST approach on Xilinx 4000, Spartan, Virtex-I, Spartan-II and Atmel FPGAs are contained in [6], [7], and [8]. Partial reconfiguration was used in [9] to reduce the overall download and test times as well as system down time. 20 The BIST approach for Virtex-5 FPGAs builds primary on the previous work in [2], [3], [8], and [10]. However, our approach offers an improved ORA architecture and fewer total test configurations. We also improve the accuracy of the fault simulation models and add verification of the configurations on the target device via configuration memory bit fault injection. The remainder of this chapter is organized as follows. Section 2.2 gives an overview of the CLB architecture in Virtex-5 FPGAs. Section 2.3 describes the BIST approach and implementation specific to Virtex-5 FPGAs. Section 2.4 describes the experimental result and verification of the BIST. Section 2.5 summarizes and concludes the chapter. Table 2.1: List of acronyms Acronym Definition Acronym Definition CLB Configurable Logic Block BUT Block Under Test BIST Built-in Self-test LUT Look-Up Table ORA Output Response Analyzer SliceL Logic Slice TPG Test Pattern Generator SliceM Memory Slice 2.2 Overview of Virtex-5 CLBs The basic Virtex-5 logic element, illustrated in Figure 2.1, is composed of a 6-input look- up table (LUT), a configurable flip-flop/latch, and multiplexers to control the combinational logic output and the registered output (flip-flop/latch input). Additional dedicated fast carry logic is included to perform special logic and arithmetic functions. In some slices, the LUT can be configured as a small RAM, called a distributed RAM or LUT RAM, or as a shift register [11]. Four such basic logic elements are grouped to form a slice, and two slices are grouped to form a complete CLB, as shown in Figure 2.2 [11]. Each CLB is connected by a switch matrix to local and global programmable routing resources. Identical CLBs are tiled in columns and rows with larger devices including more columns and/or rows of CLBs. Additionally, the structure of the CLB is identical across all devices in the Virtex-5 family. The 6-input LUTs are 21 designed with two outputs each. The primary output, O6, can utilize the full 64-bit LUT to implement any 6-variable Boolean function. The secondary output, O5, can be used to initialize the carry chain, or both the O5 and O6 output can implement an independent 5-variable Boolean function for five shared inputs. Either LUT output can be selected by the configuration multiplexers for the registered or combinatorial CLB output paths [11]. Figure 2.1: Simplified basic logic element Figure 2.2: Virtex-5 configurable logic block [11] Some slices (specifically the lower slice in every other column of CLBs and both columns to the left of a digital signal processor column) also support RAM and shift register modes of operation. The LUT RAMs in each slice have independent read address inputs and share a set of write address inputs. The independent read inputs facilitate the construction of dual-port RAMs within a slice. Each LUT can be configured as a simple 64?1-bit or 32?2-bit RAM. Dynamically controlled multiplexers in each slice allow the four LUTs to form a 256?1- LUT/ RAM Carry Logic FF/ LAT 6 CIN COUT COUT COUT Switch Matrix Memory Slice(0) SliceM Logic Slice(1) SliceL CLB CIN CIN 22 bit RAM. Additionally, the four LUTs can share five read address inputs and utilize eight independent data inputs to form a 32?8-bit RAM. Each LUT can also form a single 32-bit or two 16-bit shift registers. The four LUTs can be cascaded to form a 128-bit shift register or can operate in parallel form a 16?8-bit shift register bank [11]. 2.3 BIST Approach And Architecture The BIST approach takes advantage of the regular structure of FPGAs by using comparison-based ORAs to compare the outputs of multiple identical BUTs. This detects all faults affecting any combination of BUTs (since all fault-free BUTs must produce the same pattern) so long as all of the BUTs compared by a set of ORAs do not fail identically and at the same time [3]. Since a faulty TPG could cause a faulty BUT to escape detection, multiple identical TPGs are used to drive alternating BUTs. This eliminates the assumption that the TPGs are fault-free because, with multiple identical TPGs, a faulty TPG will cause the outputs of some of the BUTs to disagree, resulting in ORAs reporting failures. The CLB BIST architectures can be divided into two categories based on the slice mode being tested. The first set of configurations tests every CLB in the FPGA in SliceL (logic) mode of operation. The second set of configurations tests every SliceM. Only those slices which support SliceM (memory) mode are tested during the second set of configurations. In SliceL BIST architecture, alternating columns of CLBs are configured as ORAs and BUTs, as illustrated in Figure 2.3. The set of BIST configurations is repeated twice with the roles of the CLBs reversed such that every CLB serves both as ORA and as BUT. Two outputs of each BUT are compared by an ORA with the outputs of two adjacent identically configured BUTs in the same row, as shown in Figure 2.4. A mismatch of two identically configured BUT outputs latches a logic 0 in the ORA flip-flop. Otherwise, a logic 1 is retained in the ORA and is 23 interpreted as a passing result at the end of the test sequence. Traditionally, the results of the BIST are recovered via partial configuration memory readback where the contents of every ORA are retrieved from the configuration memory. However, we use a new ORA design that utilizes the dedicated carry logic in the CLB to form an iterative-OR of the ORA outputs. In each ORA, a passing result of logic 1 selects the Carry-in input, which is the Pass/Fail result of the previous ORA. Figure 2.3: Circular comparison architecture The Carry-in input of the first ORA in the iterative-OR chain is connected to Boundary Scan Test Data In (TDI), with the output of the last ORA connected to Test Data Out (TDO). If any ORA in the chain registers a failure, a logic 0 on the output of that ORA will select the logic 1 input of the carry chain multiplexer which translates to a logic 1 on TDO. Otherwise, TDO passes the state of TDI such that by toggling TDI and observing TDO, the integrity of the iterative-OR chain can be verified at the end of the BIST sequence. If the output of the OR chain indicates a failure (TDO is a logic 1 regardless of the state of TDI), the contents of the ORAs can be retrieved via partial configuration memory readback to determine the location(s) of the failing BUT(s). This facilitates the single-bit pass/fail indication for faster test time without sacrificing diagnostic resolution for fault-tolerant applications. TPG TPG BUT ORA TPG 24 Figure 2.4: Equivalent ORA architecture In Virtex-5 FPGAs, the carry-in of the bottom CLB and the carry-out of the top CLB in each column are not connected. To continue the carry chain, the carry-out of the top ORA in one column is connected to the D output and is routed to the AX input of the bottom ORA in an adjacent column. The AX input is selected as the carry-chain input in the bottom ORA in each column. In the ORA, each LUT is programmed with the hexadecimal value 0x90090000FFFFFFFF. By tying the A6 LUT input to logic 1, the O6 LUT output reads only the upper 32-bits of the LUT which implements the comparison ORA equation shown in Equation 2.1, while the O5 output reads only the lower 32-bits of the LUT (which controls the carry chain multiplexer for the iterative-OR chain). 5)43()21(6 ?????????=O (2.1) The architecture of the Virtex-5 CLBs requires a minimum of six configurations to test each of the 6 inputs to the flip-flop input multiplexers, (A-C)FFMUX. The first five of these configurations can also test the 5 inputs to the combinational logic output multiplexers (A- D)OUTMUX. Alternating XOR and XNOR functions in the LUTs detects every LUT stuck-at fault in two BIST configurations. Multiple identical TPGs are implemented in a column of embedded digital signal processors (DSPs) and drive alternating columns of BUTs. This reduces loading on the TPGs in large devices and eliminates the assumption that the TPG is fault-free. The DSPs are configured to accumulate a large prime number placed on the DSP inputs. This 1 BUT j outputy BUTk outputy 0 1 ORAk carry-out BUTj outputx BUTk outputx ORAj carry-out 25 number, 0xCA6691, was shown in [12] to produce an exhaustive sequence of 12-bit test patterns in 212 clock cycles with a relatively high number of transitions in the most significant bits of the accumulator output. Virtex-5 CLBs require at least 12 TPG lines for pseudo-exhaustive testing, and, therefore, 4,096 clock cycles for the exhaustive set of test patterns to be produced by the accumulator. Six of the TPG outputs fan out to the inputs of each of the four LUTs. Adjacent LUTs are alternately programmed with XOR and XNOR functions such that adjacent LUTs will produce opposite logic values. Another six TPG lines exercise the AX, BX, CX, DX, CE, and SR slice inputs with pseudo-exhaustive test patterns. A total of 12 SliceL BIST configurations are generated, such that every CLB is a BUT for six configurations and an ORA for another six configurations. A summary of the SliceL BIST configurations is given in Table 2.2. Table 2.2: SliceL logic BIST configurations ConFigure# A-D LUTs FF/Latch CYINIT CLKIINV #1 XOR/XNOR FF INIT1 #OFF CLK #2 XNOR/XOR FF INIT0 AX CLK #3 XOR/XNOR FF INIT0 0 CLK #4 XNOR/XOR LAT INIT1 1 CLK #5 XOR/XNOR FF INIT0 0 CLK #6 XNOR/XOR FF INIT1 AX CLK_B ConFigure# A-D FFMUX A-D MUX #1 O6, O6, O6, O6 CY, CY, CY, CY #2 O5, O5, O5, O5 XOR, XOR, XOR, XOR #3 AX, BX, CX, DX O5, O5, O5, O5 #4 XOR, XOR, XOR, XOR O6, O6, O6, O6 #5 CY, CY, CY, CY F7, F8, F7, CY #6 F7, F8, F7, DX F7, F8, F7, CY Every other CLB column contains a SliceM. In addition, the CLB column to the left of a DSP column contains a SliceM and, in SX devices, the second CLB column to the right of a DSP column contains a SliceM. In columns containing SliceMs, only the bottom slice in each CLB is a SliceM. Therefore, every SliceM can be tested simultaneously since there is at least one SliceL 26 for every SliceM (located in the same CLB) that can serve as an ORA. The ORAs for the SliceM BIST architecture are the same as those used in the SliceL BIST architecture, including the iterative-OR chain. However, the circular comparison chain is formed along each column containing SliceMs by comparing the outputs of each BUT with the identically configured BUT in an adjacent row. A 2048?18-bit block RAM, effectively configured as a ROM, is used to store deterministic test patterns and, in conjunction with a DSP configured as an address counter, forms a TPG. Multiple identical TPGs are configured to drive alternating rows of BUTs. The SliceM BIST configurations are summarized in Table 2.3. To test the LUT RAMs in single-port modes (configurations #1 and #2), the block RAMs are initialized with the test patterns for a March Y test algorithm. A March Y RAM test requires 8N test patterns, where N is the number of address locations [10] [13]. For the remaining configurations, the block RAMs are initialized with test patterns for a dual-port RAM test algorithm [1] [6]. Table 2.3: SliceM BIST configurations ConFigure# RAM mode DI1MUX WEMUX FFMUX #1 SPRAM64 DX CE O6 #2 SPRAM32 A-DX CE O6 #3 DPRAM32 DX WE O5 #4 SRL32 MC31 WE MC31 #5 SRL16 A-DX WE O6 ConFigure# OUTMUX WA8used WA7used BIST CCs #1 O6 0 0 2,048 #2 O6 #OFF #OFF 2,048 #3 O6 #OFF #OFF 2,048 #4 O6 #OFF #OFF 2,048 #5 MC31 #OFF #OFF 2,048 2.4 Experimental Results The BIST configurations were developed using accurate gate-level models of the Virtex- 5 CLB. The SliceL and SliceM were modeled separately for fault simulation. For both SliceL 27 and SliceM, the BIST configurations and their associated fault coverage were first optimized using these gate-level models. The single stuck-at gate-level fault coverage for SliceL and SliceM BIST configurations obtained from fault simulations of these models are summarized in Figure 2.5 and Figure 2.7, respectively. The BIST configurations were then verified on Virtex-5 LX30T and SX35T devices via configuration memory bit fault injection. Using the fault injection approach, configuration memory bits can be manipulated to emulate physical faults in the FPGA core including shorts and opens in programmable interconnect as well as almost any fault in logic resources controlled by a configuration memory bit. Configuration bits controlling the SliceLs and SliceMs were injected with faults and the BIST configurations were executed with the faulty configuration on the device. The BIST results of the faulty configuration are retrieved via partial configuration memory readback. The fault injection results show that the 17 BIST configurations cumulatively detect every configuration memory bit fault in every CLB. The results of the fault injection for SliceL BIST are shown in Figure 2.6. The similarity of the fault injection results and fault simulation results serve as a good indicator of the accuracy of the gate-level fault models, which include every stuck-at fault in the CLB (including configuration memory bits). Figure 2.7 and Figure 2.8 summarize the fault simulation results and the results of configuration memory bit fault injection, respectively, for the SliceM BIST configurations. It should be noted that three of the SliceM faults are detected by SliceL configurations. There are two methods by which the results of the BIST sequence can be obtained. First, the single bit pass/fail result can be determined via the TDO output of the ORA iterative-OR chain. However, the location of failing BUTs cannot be determined using this method. Another option is to perform a partial configuration memory readback to determine the contents of each 28 ORA at the end of the BIST. By this method, the location of the failing BUT(s) can be easily determined with diagnostic resolution of LUT or flip-flop. To minimize test time and achieve maximum fault resolution, a combination of the two methods is used. First, the pass/fail status of the BIST is determined by observing TDO. If TDO presents a logic 1 regardless of the state of TDI, at least one ORA has observed a failure. Partial configuration memory readback can then be used to obtain the locations of the failing ORA(s) and, thereby, determine the location(s) of the faulty BUT(s). We have developed two C programs that automatically generate the 17 BIST configurations for all Virtex-5 LX, LXT, SXT, and FXT devices. Table 2.4 summarizes the total download file size for the 17 BIST configurations, the maximum BIST clock frequency, and the total number of BIST clock cycles for full chip tests on several Virtex-5 devices. The total full chip test time for serial and parallel configuration interfaces is summarized in Figure 2.9 and Figure 2.10. The calculated test time assumes a 40 MHz BIST clock for all configurations and devices. However, on most devices, the BIST configurations can operate at higher clock frequencies. 29 0 500 1000 1500 2000 2500 3000 1 2 3 4 5 6 Configuration # # F au lts D ete cte d 0 10 20 30 40 50 60 70 80 90 100 Individual FC Cumulative FC Figure 2.5: SliceL fault coverage (simulation) 0 100 200 300 400 500 600 1 2 3 4 5 6 Configuration # # F au lts D ete cte d 0 10 20 30 40 50 60 70 80 90 100 Figure 2.6: SliceL fault coverage (fault injection) 30 0 1000 2000 3000 4000 5000 6000 7000 8000 1 2 3 4 5 Configuration # # F au lts D ete cte d 0 10 20 30 40 50 60 70 80 90 100 Figure 2.7: SliceM fault coverage (simulation) 0 10 20 30 40 50 60 70 80 1 2 3 4 5 Configuration # # F au lts D ete cte d 0 10 20 30 40 50 60 70 80 90 100 Figure 2.8: SliceM fault coverage (fault injection) 31 0 200 400 600 800 1000 1200 1400 1600 1800 2000 LX 20T LX 30T LX 50T LX 85T LX 110 T SX3 5T SX5 0T SX9 5T Ti me (m s) Readback Execution Configuration Figure 2.9: Boundary Scan interface test time 0 5 10 15 20 25 30 35 LX 20T LX 30T LX 50T LX 85T LX 110 T SX3 5T SX5 0T SX9 5T Ti me (m s) Readback Execution Configuration Figure 2.10: 32-bit parallel interface test time 32 In early FPGAs, all LUTs were able to function as small RAMs such that the first BIST configuration applied typically tested the LUTs in the RAM mode of operation. Using this approach, the first BIST configuration was able to detect most faults that could affect the LUT [2]. When combined with a simultaneous test of the flip-flop, the first BIST configuration was able to achieve around 80% fault coverage. A similar characteristic can be observed in the first SliceM BIST configuration in Figure 2.7, which achieves greater than 70% fault coverage. However, current FPGAs, such as Virtex-4 and Virtex-5, limit the number of LUTs that can function as small RAMs. Therefore, two BIST configurations are required (with alternate XOR and XNOR programming) to detect most of the faults in all LUTs. This can be observed in Figure 2.5, where the cumulative fault coverage after the first configuration reaches 51% and after two configurations exceeds 92%. Table 2.4: CLB BIST totals (17 configurations) Device Total ConFigure Size (kB) Max. BIST Clock Freq. BIST CCs LX20T 1,762 90.7 MHz 59,392 LX30T 2,630 74.0 MHz 59,392 LX50T 3,930 74.4 MHz 59,392 LX85T 6,265 58.2 MHz 59,392 LX110T 8,837 58.0 MHz 59,392 SX35T 3,378 59.2 MHz 59,392 SX50T 5,041 61.1 MHz 59,392 SX95T 8,818 44.7 MHz 59,392 2.5 Summary And Conclusions A BIST approach for testing the CLBs in Virtex-5 FPGAs was presented. A total of 17 test configurations were developed to achieve 100% stuck-at fault coverage in every CLB. Twelve of these configurations pseudo-exhaustively test every SliceL and every SliceM in the SliceL mode. Another five configurations test every SliceM in their RAM and shift register 33 modes of operation. The BIST configurations were developed using accurate gate-level fault models of the CLB and verified using configuration memory bit fault injection. A novel ORA design provides a single bit pass/fail result for each BIST sequence and is independent of the configuration interface. Optional partial configuration memory readback provides optimal diagnostic resolution for fault-tolerant applications when the pass/fail output indicates failures. As a result, the BIST approach is applicable to all levels of FPGA testing including manufacturing testing and in-system testing for fault-tolerant applications. We modified SliceL BIST to support FXT devices by creating two circular comparison chains across rows directly above the PowerPC core because CLBs above the PowerPC have no carry-in routing. We have also applied this approach to Virtex-4 devices resulting in 20 and 5 BIST configurations for SliceL and SliceM tests, respectively, compared to 31 total configurations for Virtex-4 CLBs reported in [8]. Our Virtex-4 CLB BIST also includes the new ORA design for single bit pass/fail indication. 2.6 Acknowledgements The contents of this chapter were published under the title ?Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs? in Proceedings of the 41st IEEE Southeast Symposium on System Theory, 2009, pp. 230-234. Prof. Charles Stroud is a co-author on the paper. The design of the ORA presented in this paper is protected by U.S. Provisional Patent #61/196,964, 2008, ?Output Response Analyzer for System-Level Test of Field Programmable Gate Arrays?. The student author and committee chair Prof. Charles Stroud are co-applicants on the provisional patent. A majority of the actual research and the writing of the published paper represents the efforts of the primary student author and not collaborators, and the research represents work performed while in the graduate program at Auburn University. 34 2.7 References [1] L-T Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures, Morgan Kaufmann, 2007. [2] C. Stroud, S. Konala, P. Chen, and M. Abramovici, ?Built-in self-test of logic blocks in FPGAs,? Proc. IEEE VLSI Test Symp., pp.387-392, 1996. [3] M. Abramovici and C. Stroud, ?BIST-based test and diagnosis of FPGA logic blocks,? IEEE Trans. on VLSI Syst., vol. 9, no. 1, pp. 159-172, 2001. [4] S. Toutounchi and A. Lai, ?FPGA test and coverage,? Proc. IEEE Int. Test Conf., pp. 599-607, 2002. [5] M. Abramovici, C. Stroud, and J. Emmert, ?Online BIST and BIST-based diagnosis of FPGA logic blocks,? IEEE Trans. on Very Large Scale Integr. (VLSI) Syst., vol.12, no.12, pp. 1284-1294, 2004. [6] C. Stroud, K. Leach, and T. Slaughter, ?BIST for Xilinx 4000 and Spartan series FPGAs: a case study,? Proc. IEEE Int. Test Conf., pp. 1258-1267, 2003. [7] C. Stroud, J. Harris, S. Garimella, and J. Sunwoo, ?Built-in self-test for system-on-chip: a case study,? Proc. IEEE Int. Test Conf., pp. 837-846, 2004. [8] S. Dhingra, D. Milton, and C. Stroud, ?BIST for logic and memory resources in Virtex-4 FPGAs,? Proc. IEEE North Atlantic Test Workshop, pp. 19-27, 2006. [9] S. Dhingra, S. Garimella, A. Newalker, and C. Stroud, ?Built-in self-test of Virtex and Spartan II FPGAs using partial reconfiguration,? Proc. IEEE North Atlantic Test Workshop, pp. 7-14, 2005. [10] C. Stroud and S. Garimella, ?BIST and diagnosis of multiple embedded cores in SoCs,? Proc. Int. Conf. on Embedded Systems and Applications, pp. 130-136, 2005. [11] Virtex-5 FPGA User Guide, UG190 (v 4.2), Xilinx Inc., San Jose, CA, May 2008. [12] S. Gupta, J. Rajski, and J. Tyszer, ?Test pattern generation based on arithmetic operations,? Proc. IEEE Int. Conf. on Computer-Aided Design, pp. 117-124, 1994. [13] A. van de Goor, Testing Semiconductor Memories Theory and Practice, John Wiley and Sons, 1991. 35 Chapter Three. Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs A Built-In Self-Test (BIST) approach is presented for the logic resources in the programmable input/output (I/O) tiles in Virtex-5 field programmable gate arrays (FPGAs). A total of 15 BIST configurations were developed to test the I/O cell programmable logic resources in all modes of operation. The approach utilizes dedicated I/O buffer bypass routing in the I/O tile such that the BIST is package independent and applicable to all levels of testing from wafer- level to system-level. The approach offers control of BIST execution and maximal diagnostic resolution of faulty I/O tiles for device and package independent testing. Either the Boundary Scan interface or a simple system-level interface may be used for BIST execution, control, and diagnosis independent of the configuration interface. Experimental results are presented including fault detection capabilities. 3.1 Introduction The input/output (I/O) buffers of JTAG compliant devices are typically tested using the Boundary Scan EXTEST feature [1]. However, field programmable gate arrays (FPGAs) have a significant amount of configurable logic resources associated with the I/O buffers that cannot be tested in this manner. These configurable logic resources typically include multiplexers and flip- flops/latches, as illustrated in Figure 3.1, for improving system timing specifications such as set- up and hold times as well as clock-to-output delay. Additional logic resources are included to support single data rate (SDR) and double data rate (DDR) transmission and reception as well as for serialization/de-serialization (SerDes) modes of operation. In Xilinx Virtex-5 FPGAs, for 36 example, there are at least 32 multiplexers and 47 flip-flops included in the configurable logic associated with each I/O cell to support various modes of operation. The Boundary Scan INTEST feature can be used to test the configurable logic resources in an I/O cell [1]. However, the INTEST feature is supported by few FPGA manufacturers. While there has been some prior work in testing I/O cells [2][3][4][5], previous work in Built-In Self-Test (BIST) for FPGAs has largely overlooked I/O cells and their associated logic resources. However, it has been observed that the programmable logic in unused or un-bonded I/O cells is sometimes used by FPGA synthesis tools for implementing system logic functions [5]. The work presented in this chapter builds primarily on the prior work in [5], in which an I/O cell BIST architecture was proposed and implemented for Atmel AT40K series FPGAs and Atmel AT94K series programmable system-on-a-chip (SoC) [6]. However, this chapter offers several improvements over that previous BIST approach. In addition, this chapter describes the actual implementation, operation, and verification of BIST configurations developed for Virtex-5 FPGAs [7] whose I/O cells are much more complex than those found in the AT40K and AT94K devices [6]. The BIST configurations presented here test the full functionality of logic resources included in the Virtex-5 I/O cells including input logic (ILOGIC), output logic (OLOGIC), as well as input and output Serializer/Deserializer (SerDes) operation. The chapter begins with an overview of the prior work in I/O cell BIST in Section 3.2, followed in Section 3.3 by an overview of Virtex-5 I/O tiles. The overall BIST approach is described in Section 3.4, and details of the specific BIST configurations are discussed for Logic and SerDes modes in Sections 3.5 and 3.6, respectively. We present experimental results from actual implementation in Virtex- 5 FPGAs in Section 3.7. Section 3.8 discusses a BIST approach for the configurable I/O buffers before the summary and conclusion in Section 3.9. 37 Figure 3.1: Simplified programmable I/O cell 3.2 Prior Work There has been limited prior work in the area of testing I/O cells in, or applicable to, FPGAs [2] [3] [4] [5]. In [5], a system-level BIST architecture is presented for the I/O cells of Atmel FPGAs. The overall BIST approach was similar to that used for configurable logic resources in the FPGA core [8]. The BIST architecture in [5] consists of a single TPG implemented in configurable logic blocks (CLBs) sourcing test vectors to the I/O cells under test. A single TPG was implemented under the assumption that internal FPGA resources had already been tested and found to be fault-free. The I/O cells under test are identically configured with bidirectional I/O buffers such that the output responses are sent back into the FPGA internal resources. However, for in-system testing, this requires that all connecting devices be tri-stated during testing. The output responses of the I/O cells are monitored by CLBs configured as comparison-based output response analyzers (ORAs). While presenting a general architecture applicable to any FPGA or configurable SoC with an FPGA core and bidirectional I/O buffers, [5] implemented 27 BIST configurations applicable to the Atmel AT94K SoC and AT40K FPGA only. Boundary Scan Access Tri-state Control Output Data Input Data PAD to/from internal configurable routing resources 38 3.3 Overview of Virtex-5 I/O Tiles The I/O cells in Virtex-5 FPGAs include an output logic block (OLOGIC), input logic block (ILOGIC), I/O delay block, and a bidirectional I/O buffer, as illustrated in Figure 3.2 [7] The number of I/O cells in Virtex-5 ranges from 360 to 1,200 depending on the size of the particular FPGA. Figure 3.2: Virtex-5 programmable I/O tile Each OLOGIC includes registers for improving system clock-to-output timing and supporting SDR and DDR transmission of data. The OLOGIC can also perform parallel-to-serial conversion of output data for widths between 2 and 6-bits when operating in SerDes mode. The ILOGIC includes registers for improving system set-up and hold times and supporting SDR and DDR reception of data. It can also perform serial-to-parallel conversion of input data for widths between 2 and 6-bits when operating in SerDes mode. The ILOGIC also incorporates a Bitslip To/From Device Resources Master I/O Cell Output Logic (OLOGIC) Input Logic (ILOGIC) Slave I/O Cell Output Logic (OLOGIC) Input Logic (ILOGIC) From Device Resources To/From Device Resources From Device Resources 39 sub-module for synchronizing serial interfaces that include a training pattern. Invoking the Bitslip input re-orders the data on the parallel outputs of the input logic block in a barrel-shifter operation [7]. In Virtex-5 FPGAs, two I/O cells are grouped to form an I/O tile, as illustrated in Figure 3.2. Each I/O tile includes dedicated shift routing to support expanded SerDes data widths. In master/slave mode, two I/O cells in the same I/O tile are connected via the dedicated shift routing to support data widths of 7, 8 and 10-bits [7]. Each I/O cell also includes dedicated routing (also shown in Figure 3.2) directly from the OLOGIC to the ILOGIC that bypasses the I/O buffer. 3.4 Overview of BIST Architecture Our BIST approach for I/O tiles is similar to other BIST approaches that we have developed for testing CLBs in Virtex-4 and Virtex-5 FPGAs [9]. A set of deterministic test patterns is stored in 36-kbit block random access memories (RAMs) in the FPGA fabric. The outputs of the block RAMs are connected directly to the inputs of alternating rows of I/O tiles under test. One block RAM is configured for every 5 rows of I/O tiles under test. One digital signal processor (DSP) per block RAM is configured as a counter to sequentially address the block RAM. Collectively, one 36-kbit block RAM and one DSP form the TPG for every I/O tile BIST configuration. However, the block RAM contents are modified for some configurations to target specific resources/functions under test. The advantage of configuring multiple TPGs is twofold: first, multiple TPGs reduce loading, thereby maximizing the BIST execution frequency in large devices, and, secondly, configuring multiple identical TPGs eliminates the assumption that the TPG logic resources are fault-free. Any fault affecting the behavior of a TPG will be detected by the comparison-based ORAs monitoring the I/O cells at the boundaries of any faulty and fault-free TPG. 40 BIST of I/O cells is well suited for circular comparison-based ORAs since many identical I/O cells are tested simultaneously. The outputs of each I/O cell under test are monitored by two ORAs and compared with the outputs of two other identically configured I/O cells in an adjacent row, as shown in Figure 3.3. To complete the circular comparison, I/O cells in the top row of the test area are compared with I/O cells under test in the bottom row of the test area. Figure 3.3: Column oriented circular comparison The circular comparison approach does not suffer from aliasing effects as long as all of the BUTs being compared do not fail identically and at the same time. Furthermore, circular comparison improves diagnostic resolution [4]. An output response mismatch between two identically configured I/O cell outputs is latched as a logic 0 in the ORA flip-flop for the duration of the test session. Otherwise, logic 1 is retained in the ORA and is interpreted as a passing result at the conclusion of the BIST sequence. In previous implementations of the comparison-based ORA, the dedicated carry logic and routing resources in the ORA CLBs were un-used [4]. However, in all BIST configurations that we have developed for Virtex-5 FPGAs, BUT ORA TPG TPG TPG 41 these resources are utilized to form an iterative-OR chain of every ORA in the test area. In each ORA, a passing result of logic 1 selects the Carry-in input to the CLB, which is the Pass/Fail result of an adjacent ORA. The carry-in input of the first MUX in the iterative-OR chain is connected to a system input, with the carry-out of the last ORA connected to a system output. If any ORA in the chain records a failure (e.g. mismatch), a logic 0 on the output of that ORA will select a logic 1 as the input to the carry MUX, as illustrated in Figure 3.4. Figure 3.4: Virtex-5 equivalent ORA architecture If no failure is observed in the ORA, the carry-in input is propagated through the CLB. If no ORAs in the iterative-OR chain observe failures, the carry-in input to the first ORA in the chain will propagate through every ORA slice to the carry-out output of the final ORA such that an overall pass/fail result is obtained without reading back the configuration memory to obtain the contents of the ORA flip-flops. By toggling the OR-chain input and observing the OR-chain output at the end of each BIST sequence, the integrity of the iterative OR-chain is verified. If the output of the iterative OR-chain indicates failures were detected, the contents of the ORAs can be retrieved via partial configuration memory readback for precise fault diagnosis. Another important difference between our I/O tile BIST architecture and the prior work is in the configuration of the I/O tiles under test. Previous approaches have relied on bidirectional I/O buffers to provide the return path for test patterns exiting the output logic and returning to the ORAs via input logic [5] [10]. However, the reliance on bi-directionally configured I/O buffers 1 BUT j outputy BUTk outputy 0 1 ORAk carry-out BUTj outputx BUTk outputx ORAj carry-out 42 severely limits the applicability of this type of BIST for in-system testing. With every I/O buffer configured in the path of the logic under test, we required that all connecting devices be tri-stated during in-system testing. Connecting passive devices, such as termination resistors or light emitting diodes (LEDs), introduce another problem since these devices cannot be disconnected or tristated during in-system tests. In [9], the authors observed that, at certain frequencies, LEDs connected to I/O buffers under test caused the comparison ORAs to erroneously report failures for otherwise fault-free I/O tiles. These failures were observed at frequencies as low as 325 kHz [9], which is unacceptable for an at-speed test of the logic resources. As a result, the generality of the BIST is compromised. Fortunately, the I/O tiles in Virtex-4 and Virtex-5 FPGAs include dedicated routing from the OLOGIC to the ILOGIC that bypasses the I/O buffer [7]. Using this feedback routing instead of the I/O buffer means that no signals from the FPGA under test can reach, and therefore be influenced by, external devices. Furthermore, bypassing the I/O buffer does not sacrifice fault coverage in the I/O tile logic resources. With the I/O buffers removed from all tests for logic resources, these tests may be applied without concern for the external test environment, thus making our approach applicable to all levels of FPGA testing. The obvious disadvantage of this approach is that it does not concurrently test the I/O buffer. However, we have developed a stand-alone BIST architecture for the I/O buffers that is applicable to device and wafer-level testing. This architecture tests the programmable analog features of the I/O buffers in every bidirectional mode of operation. Additionally, the Boundary Scan EXTEST feature may be used for in-system tests of the I/O buffers in their system mode of operation. 43 3.5 Configurations for I/O Logic Modes Six test configurations are required to fully test the I/O tile logic resource in all ILOGIC/OLOGIC modes of operation. The I/O delay module is concurrently tested in these I/O Logic mode tests in two of three modes of operation. Feedback routing from the OLOGIC to the ILOGIC has two possible routes: one through the I/O delay module and one dedicated route which bypasses the I/O delay module. The route through the I/O delay module allows for testing of the output delay functionality in all supported delay modes (fixed delay, variable delay, and default). However, testing delay of input and output signals simultaneously is not possible without configuring the I/O buffers in bidirectional mode. Three of the six I/O logic BIST configurations test the DDR transmit and receive modes of operation, including, in the OLOGIC, opposite-edge, same-edge, and same-edge pipelined output modes. The fourth and fifth configurations test the flip-flop and latch functionality of the primary registers. In the sixth and final configuration, the combinatorial (un-registered) path through the I/O tile logic resources is tested. Programmable initialization values, set/reset values, and synchronous/ asynchronous reset/toggle inputs are concurrently tested. The number of clock cycles for BIST execution is 2048 for all I/O Logic BIST configurations. 3.6 Configurations for I/O SerDes Modes A total of nine configurations are required to fully test the I/O tile logic resource in the SerDes modes of operation. Six of these configurations test the I/O SerDes logic configured for data widths of 2, 3, 4, 5, and 6-bits. Two configurations are included for the 4-bit data width to test the programmable active level on the tri-state inputs of the OLOGIC. Another three configurations test the master/slave SerDes modes for data widths of 7, 8, and 10-bits. Two of the nine configurations test the SerDes in DDR mode, with the other seven configurations testing 44 SDR modes of operation. SerDes operations require two clocks: a high speed clock for serial data and a divided clock for the FPGA fabric. The amount of clock division is an integer equal to the data width when testing SDR modes, and is half of the data width when testing DDR modes. We use regional clock buffers with integrated clock division, called BUFRs [7], to provide the divided clock for the ORAs and TPGs in SerDes configurations. The BUFR has programmable clock division, from 1 to 8, and BYPASS modes. There are also clear (CLR) and clock enable (CE) inputs to the BUFR. We connect the CLR and CE inputs of every BUFR to the TPGs to achieve a simultaneous test of the BUFRs and the I/O SerDes logic. Concurrent testing of the BUFRs is beneficial since they would likely be used in conjunction with SerDes. Since each BUFR clocks only one adjacent clock region, a faulty BUFR will cause failures in the ORAs along at least one boundary of an adjacent clock region. As with the I/O tiles under test, a faulty BUFR can only escape detection if every BUFR in the test area fails identically and on the same clock cycle(s). One addition to the BIST architecture for SerDes mode testing stems from the need for synchronization of the serial bit streams before executing the BIST sequence. In SerDes mode, the positioning of deserialized data on the parallel side of the OLOGIC is initially indeterminate. Due to the nature of comparison-based ORAs, data on the parallel outputs of every I/O cell under test must be synchronized. To ensure identical alignment of deserialized test patterns, the SerDes BIST architecture adds a Bitslip synchronizer circuit, illustrated in Figure 3.5. Upon download of any SerDes mode configuration, the ORAs are held disabled and the TPGs are held in reset. A training pattern, stored in the programmable set/reset values of the block RAM output registers, is presented to the inputs of the I/O cells under test. The training pattern positions a single zero in a field of ones on the parallel side of the output logic block. The Bitslip 45 synchronizer circuit monitors the Q2 parallel I/O tile output and one-shots the Bitslip control line until the zero is shifted into the Q2 position. As a result of the clock division and Bitslip latency, synchronization will be obtained in no more than 4N2?4N clock cycles, where N is the SerDes data width for the configuration. Each I/O cell has a dedicated Bitslip synchronizer circuit that will continue to one-shot the Bitslip control line until the training pattern is positioned with the single zero at the Q2 output, thereby identically aligning the test patterns for the comparison- based ORAs. The synchronizer is then disabled by the TPG during the BIST execution. Figure 3.5: Bitslip synchronizer circuit For SerDes configurations, the number of BIST clock cycles is equal to 1024 times the amount of clock division used during that configuration plus the worst case synchronization time for the data width being tested. It should also be noted that the number of BIST clock cycles is independent of the size of the array, and independent of the number of I/O cells under test. 3.7 Experimental Results All of the BIST configurations are automatically generated for any size and family of Virtex-5 FPGAs by a set of ANSI C programs that we have developed. Two programs are used to generate the six configurations for the I/O logic modes of operation described in Section 3.5. Another set of two programs generates all nine of the configurations to test the I/O SerDes modes of operation described in Section 3.6. Our first program in each set generates a template BIST configuration in Xilinx Description Language (XDL) and then converts the template to to ISERDES BITSLIP from ISERDES Q2 CLKDIV Synchronizer Enable TPG Bitslip test pattern X Y Z 46 Native Circuit Description (NCD) format using Xilinx?s conversion tool, XDL.exe. The BIST template is routed by Xilinx?s place and route software, PAR.exe, before conversion back to XDL format. Our second program modifies the routed XDL file to produce the various BIST configurations, and converts those files back to NCD format. The final download configuration files are created using Xilinx?s bitstream generation software, BitGen.exe. Table 3.1 summarizes the total size of the 15 I/O BIST configuration files, the maximum BIST clock frequency, and the total number of BIST clock cycles for all Virtex-5 LXT and SXT devices. Note that the total number of BIST clock cycles is device-independent due to concurrent testing of I/O cells by the BIST architecture. The totals shown in Table 3.1 were used to calculate the best- and worst-case total test times, which are dependent on the configuration interface. The total test time for Boundary Scan and SelectMap 32-bit parallel configuration interfaces are shown in Figure 3.6 and Figure 3.7, respectively. A 50 MHz BIST clock is assumed for all configurations and all devices. Readback time is for partial configuration memory readback of the ORA contents after every configuration for diagnosis of failing BIST configurations. However, when diagnosis is not required, or there are no failures, the single bit pass/fail result can be determined via the ORA iterative-OR chain. To minimize the test time and achieve maximum fault resolution, a combination of the two methods is used. First, the pass/fail status of the BIST is determined by observing the output of the ORA iterative-OR chain. If the OR chain indicates failures, partial configuration memory readback can be used to obtain the locations of the failing ORA(s) and, thereby, determine the location(s) of the failing I/O Tile(s). 47 0 200 400 600 800 1000 1200 1400 LX 20T LX 30T LX 50T LX 85T LX 110 T LX 155 T LX 220 T LX 330 T SX3 5T SX5 0T SX9 5T Ti me (m s) Readback Execution Configuration Figure 3.6: 50 MHz Boundary Scan configuration interface test time 0 5 10 15 20 25 LX 20T LX 30T LX 50T LX 85T LX 110 T LX 155 T LX 220 T LX 330 T SX3 5T SX5 0T SX9 5T Ti me (m s) Readback Execution Configuration Figure 3.7: 100 MHz 32-bit parallel configuration interface test time 48 Table 3.1: I/O tile BIST totals (15 configurations) Device Total Config. Size (kB) Max. BIST Clock Freq. BIST CCs LX20T 862 102.8 MHz 47112 LX30T 1482 89.38 MHz 47112 LX50T 2186 102.4 MHz 47112 LX85T 2726 73.96 MHz 47112 LX110T 3641 74.40 MHz 47112 LX155T 4181 66.10 MHz 47112 LX220T 4706 58.75 MHz 47112 LX330T 6985 56.17 MHz 47112 SX35T 1740 91.19 MHz 47112 SX50T 2511 75.17 MHz 47112 SX85T 3923 69.59 MHz 47112 3.8 BIST for Programmable I/O buffers In addition to the BIST approach presented for I/O Logic and SerDes modes of operation, we have developed a stand-alone BIST approach for the I/O buffers in FPGAs. The approach tests the I/O buffers in all bidirectional modes of operation and associated I/O standards, requiring 77 configurations for Virtex-5 FPGAs. The approach is directly applicable to device and wafer-level testing, and is applicable to in-system testing with some customization of configurations. The bidirectional buffers configured during in-system tests can be expected to have different load characteristics in the system, depending on the way they are terminated and whether they are normally an input, output, or bidirectional port during system operation. For example, we would expect the I/O buffers that are connected to large external loads to fail if they are tested at a high frequency. For in-system testing, all of the I/O buffers can be tested at a single low frequency that is guaranteed to be sufficiently slow to allow fault-free I/O buffers to pass. However, this may result in faulty I/O buffers escaping detection in the case of delay 49 faults. Alternatively, the I/O buffers can be grouped together by loading characteristics to be tested independently and at different frequencies. 3.9 Conclusions A BIST approach for testing the programmable logic resources of I/O cells in FPGAs was presented including the actual development for and implementation in Xilinx Virtex-5 FPGAs. Six BIST configurations were developed to test the input and output logic resources in ILOGIC and OLOGIC modes. Another nine configurations test the SerDes functionality of the I/O logic resources for all supported data widths. By testing the I/O buffers separately, the logic resources in the I/O tiles may be tested in-system in all modes of operation. The BIST configurations are package independent because they can test I/O tiles with both bonded and unbonded I/O buffers. This is important since FPGA synthesis tools sometimes use I/O logic and routing resources to implement the system function. All of these BIST configurations have been generated, downloaded, and verified on LX30T, LX50T, SX35T, and SX50T FPGAs. Due to similarities in architectures, features, and operational modes of the I/O cells in Xilinx Virtex-4 and Virtex-5 FPGAs, we have also applied the BIST approach described in this chapter to Virtex-4 FPGAs where a total of five I/O Logic, nine I/O SerDes, and 76 I/O buffer BIST configurations were developed, downloaded, and verified on LX60, SX35, and FX12 FPGAs. The iterative-OR ORA provides a simple interface for BIST results retrieval that is very fast relative to partial configuration memory readback and is independent of the configuration interface. However, for fault-tolerant applications, maximal diagnostic resolution of faulty I/O tiles can still be obtained via partial configuration memory readback. The BIST configurations can detect faults in the configuration memory bits associated with I/O tile logic and routing excluding the I/O buffer. 50 Clocking at system speeds during testing could potentially improve parametric fault coverage in the I/O delay element. 3.10 Acknowledgements The contents of this chapter were published under the title ?Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs? in Proceedings of the 41st IEEE Southeast Symposium on System Theory, 2009, pp. 235-239. Prof. Charles Stroud is a co-author on the paper. Prior to publication, a preliminary version of the paper was presented at the 2008 IEEE North Atlantic Test Workshop. The proceedings of the IEEE North Atlantic Test Workshop are not published. As of this writing, a paper detailing the I/O Buffer BIST approach (describe briefly in Section 3.8) is pending publication under the title ?On System-Level Use of BIST for Programmable Input/Output Buffers in FPGAs,? in Proc. of the 2010 IEEE Southeast Regional Conference. A majority of the actual research and the writing of the published paper presented in this chapter represents the efforts of the primary student author and not collaborators, and the research represents work performed while in the graduate program at Auburn University. 3.11 References [1] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1- 2001, 2001. [2] C. Jia and L. Milor, ?A BIST Solution for the Test of I/O Speed,? Proc. IEEE Int. Test Conf., pp. 1023-1030, 2003. [3] L. Zhao, D. Walker and F. Lombardi, ?IDDQ Testing of Input/Output Resources of SRAM-Based FPGAs,? Proc. Asian Test Symp., pp. 375-380, 1999. [4] L-T Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures, Morgan Kaufmann, 2007. [5] S. Vemula and C. Stroud, ?Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs?, Proc. IEEE Southeastern Symp. on System Theory, pp. 534-538, 2006. 51 [6] AT94K Series Field Programmable System Level Integrated Circuit, Data Sheet, Atmel Corp., 2001. [7] Virtex-5 FPGA User Guide, UG190 (v 4.2), Xilinx Inc., San Jose, CA, May 2008. [8] D. Milton, S. Dhingra, and C. Stroud, ?Embedded Processor Based Built-In Self-Test and Diagnosis of Logic and Memory Resources in FPGAs,? Proc. Int. Conf. on Embedded Systems and Applications, pp. 87-93, 2006. [9] L. Lerner, S. Vemula, and C. Stroud, ?System-Level BIST for Programmable I/O Buffers in FPGAs and SoCs,? Proc. IEEE North Atlantic Test Workshop, pp. 1-9, 2006. [10] L. Lerner, ?Built-In Self-Test for Input/Output Tiles in Field Programmable Gate Arrays,? M.S. thesis, Dept. of Elect. and Comput. Eng., Auburn Univ., Auburn, AL, Dec. 2007. 52 Chapter Four. Built-In Self-Test of SEU Detection Cores in Virtex-4 and Virtex-5 FPGAs A Built-In Self-Test (BIST) approach is presented for the Internal Configuration Access Port (ICAP) and Frame Error Correcting Code (ECC) logic cores embedded in Xilinx Virtex-4 and Virtex-5 Field Programmable Gate Arrays (FPGAs). The Frame ECC logic facilitates the detection of Single Event Upsets (SEUs) in the FPGA configuration memory. The ICAP provides read and write access to the configuration memory from within the FPGA fabric, enabling embedded dynamic reconfiguration and fault-tolerant applications with memory scrubbing. Therefore, the fault-free operation of the ICAP and Frame ECC logic is critical for space and fault-tolerant applications that require detection and repair of SEUs. The BIST approach presented is applicable to all Virtex-4 and Virtex-5 FPGAs for both manufacturing and system-level testing of the ICAP and Frame ECC logic. The actual implementation of the BIST approach in Virtex-4 and Virtex-5 FPGAs and associated experimental results are discussed. 4.1 Introduction The increased use of Field Programmable Gate Arrays (FPGAs) for implementing digital logic applications over the past two decades has been accompanied by increased concern about radiation effects; in particular, the effects of Single Event Upsets (SEUs). In addition to memory elements, such as flip-flops and random access memories (RAMs), the contents of the static random access memory (SRAM) used as the configuration memory to establish the overall application performed by the FPGA is also susceptible to SEUs. An SEU induced bit-flip in the SRAM configuration memory can alter the functionality of the FPGA. This makes SEUs of significantly more concern in FPGAs than in traditional application specific integrated circuits 53 (ASICs). Radiation experiments indicate the SEU rate in FPGAs increased by a factor of 4.74 when design rules decreased from 600nm to 350nm with a corresponding reduction in Vcc supply voltage from 5V to 3.3V [1]. Xilinx Virtex-4 FPGAs are reported to have SEU FIT (failures in 109 hours) rates of 246 per million bits of configuration memory, and only 151 in Virtex-5 FPGAs [2]. This reduction in SEU FIT rate from Virtex-4 to Virtex-5 indicates that Xilinx is designing FPGA configuration memories to be more robust, as suggested in [3]. However, the largest FPGAs currently have configuration memories with up to 160 million bits [4]. As a result, some recent FPGAs, like Virtex-4 and Virtex-5, have incorporated additional logic that enables the detection of SEUs in the configuration memory. This logic can be used in conjunction with user-defined circuitry in the FPGA core to correct erroneous configuration memory bits that result from SEUs [5]. Approaches for on-line SEU detection and correction for Virtex-4 FPGAs have been proposed in [5] and [6] and for Virtex-5 FPGAs in [6] and [7]. All of these approaches assume that the embedded specialized cores for SEU detection, including the Internal Configuration Access Port (ICAP) and Frame Error Correcting Code (ECC) modules, are fault-free. This chapter presents an off-line BIST approach which completely tests the internal hardware mechanisms used for SEU detection and correction in the configuration memory of Xilinx Virtex-4 and Virtex-5 FPGAs. Since the FPGA is reconfigured for BIST only when testing is desired or required, there is no area or performance penalty incurred by the system application(s) normally executed in the FPGA. The only overhead for the BIST approach is the memory required to store one additional configuration used to configure the target device for BIST. The BIST approach is VHDL-based and is applicable to all production Virtex-4 and Virtex-5 devices. Furthermore, the BIST can be used for both manufacturing and system-level 54 testing of the ICAP and Frame ECC logic. The chapter begins with an overview of the ICAP and Frame ECC circuitry included in Virtex-4 and Virtex-5 FPGAs in Section 4.2. The test algorithm employed by the BIST approach to detect faults in parity-based ECC circuits is described in Section 4.3. Section 4.4 describes the method for generating and applying the test patterns to the ICAP and Frame ECC logic as well as the method used for output response analysis. Section 4.5 describes the actual implementation of the BIST approach in the fabric of Virtex-4 and Virtex-5 FPGAs along with experimental results. The chapter is summarized and concludes in Section 4.6. 4.2 Frame ECC and ICAP Logic Like any RAM, the configuration memory of an FPGA is partitioned into words, also referred to as frames, which represent the smallest addressable unit of the configuration memory for write and read operations. Virtex-4 and Virtex-5 frames consist of 1,312 bits [8]-[11]. Each frame includes a 12-bit field of 11 Hamming bits and an overall parity bit for to provide the potential for single error correction (SEC) as well as double error detection (DED) in the frame data. The parity and Hamming bits are generated external to the FPGA by the configuration bitstream generation software and are subsequently downloaded with the application specific configuration data to the FPGA configuration memory. An overall cyclic redundancy check (CRC) performed on the device during the download verifies the integrity of configuration data during download. However, system memory data subject to change during the operation of the FPGA, such as contents of block RAMs and look-up tables (LUTs) used as distributed RAMs, are not covered by the overall parity and Hamming bits. Virtex-4 and Virtex-5 FPGAs provide a specialized core, called Frame ECC, for detection and identification of single-bit errors and detection of double-bit errors in the frame 55 data [9][11]. The Frame ECC primitive, illustrated in Figure 4.1, has 11 syndrome outputs, an error output, and syndrome valid output. Each time that a frame is read from the configuration memory the Frame ECC module calculates the Hamming bits as well as overall parity for the frame data, and compares these bits with the Hamming bits and parity stored for that frame in the configuration memory. Based on this comparison, the Frame ECC module produces indications for no error, single-bit error, and double-bit error in addition to a syndrome indicating the location of single-bit errors. System memory element contents (for example, block RAMs, LUT RAMs, and flip-flops) are masked from the internal parity and Hamming calculation by the Frame ECC. The error codes for the Frame ECC are summarized in Table 4.1. Table 4.1: Frame ECC codes Error Type Condition (when syndromevalid = 1) No bit error Hamming match w/ no parity error 1-bit correctable error (SEC) Hamming mismatch w/ parity error 2-bit error detection (DED) Hamming mismatch w/ no parity error A Hamming mismatch with an overall parity error indicates that a single-bit correctable error has occurred. In this case, the bit-wise exclusive-OR of the stored Hamming code and the regenerated Hamming code, which is called the syndrome, gives the location of the single-bit error. A Hamming mismatch (non-zero syndrome) and no overall parity error indicate a non- correctable double-bit error has occurred. In the case of a double-bit error, the frame data must be repaired with data from a reliable external source. Single-bit errors in the configuration memory can be repaired with additional user logic implemented in the FPGA fabric to flip the bit in error as was done in [5], [6], and [7]. 56 Figure 4.1: Frame ECC and ICAP primitives The SYNDROMEVALID output is asserted for one clock cycle per frame during a frame read operation to indicate that the SYNDROME and ERROR outputs are valid for the current frame [9][11]. The most significant bit of the SYNDROME[11:0] bus is the overall parity error indication. The ERROR output is asserted when a single-bit or double-bit error is detected. To distinguish between single-bit correctable errors and double-bit non-correctable errors, the user must add logic to determine the result based on the scenarios in the last two entries in Table 4.1. The ICAP provides access to status and control registers as well as to the configuration memory from the FPGA fabric [9][11]. The ICAP works like the external SelectMAP configuration interface except that it has separate 32-bit read and write buses, as opposed to a bidirectional 32-bit bus. The maximum operating frequency of the ICAP is 100 MHz, and it supports 8-bit, 16-bit, and 32-bit word sizes. Every device includes two ICAPs. However, both ports cannot be used simultaneously. A bit in a control register is used to select whether the upper or lower ICAP is the active port. ERROR Frame ECC SYNDROME[11:0] SYNDROMEVALID ICAP_OUT[31:0] BUSY ICAP CLK_EN ICAP_IN[31:0] WRITE CLK 57 4.3 Test Algorithm Hamming bits are parity calculated over a certain subset of bits in the configuration frame data. For example, the Hamming parity matrix in Table 4.2 can be extended to any number of data bits (D#) where the Hamming bits (H#) occupy the power-of-2 number locations in the counting sequence. Each Hamming bit is calculated by exclusive-ORing the data bits that have a logic 1 in the same row as that Hamming bit, yielding the logic equations shown in the lower half of the table for this example. Table 4.2: Hamming parity matrix example H1 H2 D1 H3 D2 D3 D4 H4 D5 D6 D7 D8 D9 D10 D11 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 H1 = D1 ? D2 ? D4 ? D5 ? D7 ? D9 ? D11 H2 = D1 ? D3 ? D4 ? D6 ? D7 ? D10 ? D11 H3 = D2 ? D3 ? D4 ? D8 ? D9 ? D10 ? D11 H4 = D5 ? D6 ? D7 ? D8 ? D9 ? D10 ? D11 As a result, the Frame ECC logic consists mainly of parity generators. A parity generator is simply an exclusive-OR tree, and can be arranged in linear tree or balanced tree forms; both arrangements are C-testable with four test patterns if and only if the exact parity tree construction and interconnections are known for every gate in the tree [13][14]. However, for cases where the parity tree structure is unknown, a pseudo-exhaustive test set to detect all gate level single and multiple stuck-at faults is: 1) walk a single one through a field of zeros, and 2) all combinations of two ones in a field of zeros [15]. This set of test patterns also detects all bridging faults in the Hamming generation circuit and overall parity generation circuit [16]. Therefore, the number of test vectors, NTV, required in terms of the number of inputs, N, to test any parity generator (regardless of structure) is given by: 58 NTV = 22 2 NN NN +=+?? ? ? ??? ? (4.1) For the Virtex-4 and Virtex-5 Frame ECC logic, which calculates Hamming and parity over 1312-bits, the number of test patterns required by Equation 4.1 is NTV = 861,328. It is interesting to note that the parity calculations could be performed sequentially (32- bits at a time), as opposed to in parallel based on the entire 1312-bit frame. This leads to a significant reduction in the amount of logic for the calculation of Hamming code bits and overall parity. By masking appropriate bits from the parity trees (forcing bits to logic 0 using a mask LUT in conjunction with AND gates) the entire set of calculations can be performed sequentially, one 32-bit word at a time, as illustrated in Figure 4.2. The sequential Hamming generator requires twelve 32-input parity trees (one for each Hamming bit and one for the overall parity bit) with the cumulative parity calculations stored in 12 flip-flops. The Hamming and overall parity bits stored in the middle word of the frame are latched for comparison with the regenerated bits to produce the syndrome and overall parity error. This sequential parity generation would require only about 372 XOR gates and 352 AND gates for the masks. Parallel calculation over the entire 1312 frame bits, on the other hand, would require approximately 8,516 XOR gates. It is possible that the number of test vectors for the sequential Hamming and parity bit calculation circuit might be reduced from that given by Equation 4.1. However, the set of test vectors described previously will also ensure complete testing of the word counter, masking circuit, and flip-flops/latches used to perform the sequential Hamming calculation. This means the test pattern sequence is independent of the actual architecture of the Frame ECC circuit. In addition, the walking patterns in the set of test vectors will detect stuck-at and bridging faults in the ICAP. 59 4.4 BIST Approach Our approach to testing the Frame ECC logic is to implement a customized embedded core in the FPGA fabric that will repetitively write and read a single frame of configuration memory via the ICAP with the set of test patterns described in Section 4.3. The target frame for the BIST is arbitrarily located in the programmable interconnect network to avoid any configuration memory bits that are masked from the Frame ECC circuitry as a result of potentially legitimate changes to LUT-RAMs and flip-flop contents [9][11]. The basic procedure is as follows: (1) Write a configuration memory frame with a test pattern via the ICAP. (2) Read the frame containing the test pattern, compacting the ICAP output response. (3) Compact the output response of the Frame ECC when the syndrome is valid. (4) Generate the next test pattern and repeat Steps 1 through 3 for all 861,328 test vectors. Even using the 32-bit ICAP interface, this test sequence is time-intensive because each frame write and read requires a significant amount of overhead in terms of clock cycles. In our implementation of the BIST, there are 318 clock cycles of overhead for each of the 861,328 test patterns. Therefore, the actual test time is 318 times the number of test patterns (as will be discussed in Section 4.1), or 273,902,304 clock cycles. However, the amount of logic that is tested is not insignificant, and the Frame ECC logic is critical for space and fault-tolerant applications that rely on the detection and correction of SEUs during on-line operation. 60 Figure 4.2: Sequential Hamming bit calculation 4.4.1 Test Pattern Generator The test pattern generator (TPG) used to generate the parity tree test patterns is the largest component of the BIST architecture. It requires two 1,312-bit shift registers, 1,312 two-input OR gates, and a 32-bit 64-to-1 multiplexor array (the TPG is identical for both Virtex-4 and Virtex-5). In all, the TPG occupies about 1000 slices in Virtex-5 ? 90% percent of all of the resources occupied by the BIST circuitry. Virtex-4 and Virtex-5 FPGAs incorporate several configuration registers to provide write/read access to the configuration memory. The Frame Address Register (FAR) stores the memory address to/from which frame data is written/read. The Frame Data Register Input (FDRI) and Frame Data Register Output (FDRO) registers facilitate input/output data to/from the configuration memory. There are other registers such as the status (STAT) register, the cyclic redundancy check (CRC) register, and the command (CMD) register which stores the next register operation to perform such as ?Write FAR? or ?Read FDR0?. To write/read to/from the configuration memory, a combination of these registers must be used. In Virtex-4 and Virtex-5, the frame write and read instructions for the BIST are stored in a single 512?32-bit block RAM. The complete set of write and read instructions utilize about 10% of the Block RAM. The procedure for writing/reading to/from the configuration 1312 ConFigure Memory Word Counter Mask LUT FFs LATs FRAME ECC 32?12 12 12 12 12 32 32 Syndrome and parity error 61 memory in the context of the BIST is illustrated in the pseudocode of Figure 4.3 and Figure 4.4, respectively. Write_Test_Pattern (Test_Pattern, FRAME_ADDR){ Write to Command RESET_CRC Write to ID Register DEVICE_ID Write to Command WCFG WRITE_CONFIG_MEM Write to Frame Address FAR FRAME_ADDR Write to Frame Data Input FDRI 82 words for(i=0; i<41; i++){ Write word(i) of Test_Pattern } for(i=0; i<41; i++){ Write pad word 0x00000000 } Write NO-OP Write NO-OP Write to CRC 0x0000DEFC } Figure 4.3: Test pattern write sequence via ICAP interface Read_Test_Pattern (FRAME_ADDR){ Write to Command READ_CONFIG_MEM WCFG Write to Frame Address FAR FRAME_ADDR Read Frame Data Output FDRO 82 words for(i=0; i<41; i++){ // Discard pad frame } for(i=0; i<41; i++){ // Enable MISR to compact output // of FrameECC and ICAP } Write NO-OP Write NO-OP } Figure 4.4: Test pattern read sequence via ICAP interface In both Virtex-4 and Virtex-5, the frame address selected as the write/read destination for the test patterns cannot contain LUT-RAM or flip-flop configuration bits because these bit locations are masked in the Frame ECC logic during read back (due to the fact that these bits can change after configuration if the capture command is decoded via the configuration interface or 62 if the capture input to the capture primitive is asserted) [9][11]. Additionally, no BIST logic or routing resources can be located in the target reconfiguration memory region. Otherwise, the test logic could overwrite and modify parts of its own architecture. To eliminate the risk of overwriting the configuration of BIST logic or routing, the target configuration memory frame is located in the routing resources in the leftmost column of I/O Tiles (however, any frame containing only routing resources and not utilized for the BIST logic could be used). In Virtex-4, the target configuration frame is arbitrarily located in the leftmost column of I/O Tiles in the 16 rows below the center line. In Virtex-5, the target configuration frame is arbitrarily located in the lower 20 rows of the leftmost column of I/O Tiles. To avoid the target frame resources, the BIST logic is physically constrained to the right half of the target device during placement and routing. Additionally, before synthesizing the BIST, the Block RAM contents may require a minor modification. The Block RAM contents are device dependent, since the correct device ID must be written to the ID register before data can be written to the configuration memory via the ICAP. This is to ensure that a configuration file formatted for one device is not written, by mistake, to the wrong device. 4.4.2 Output Response Analyzer Since only one Frame ECC component is included in every Virtex-4 and Virtex-5 device, comparison-based output response analysis of identical blocks under test (BUT) is not possible. Furthermore, comparison with stored good circuit output responses is not practical, since the 861,328 12-bit syndromes could not be stored on the device. Instead, a 32-bit multiple input signature register (MISR) with internal feedback and primitive characteristic polynomial is employed to compact the Frame ECC output responses into a final signature. The MISR characteristic polynomial, P(x), is given by: 63 1)( 272832 ++++= xxxxxP (4.2) At the conclusion of the BIST, the signature in the MISR is compared with the known good circuit signature stored in the BIST logic, producing a single-bit pass/fail output. Additionally, the MISR is configured in a scan chain such that the signature can be retrieved via Boundary Scan for comparison with the good circuit signature. Any mismatch of the good circuit signature and the signature obtained by the BIST indicates a faulty circuit response. It should be noted that all MISRs have some probability of signature aliasing and fault escape. Signature aliasing occurs when a faulty circuit produces the same signature as the fault-free circuit. However, signature aliasing is extremely unlikely for properly designed MISRs. The classical approximation for the probability of fault aliasing is 2-n, where n is the degree of the MISR?s primitive polynomial [17]. Therefore, the probability of signature aliasing is approximately 1 in 4.3 billion for the 32-bit MISR described by Equation 4.2. The ICAP is tested by adding another identical 32-bit MISR to observe the ICAP outputs during the BIST sequence. This MISR, which is enabled when the ICAP read input is asserted, will detect any stuck-at faults as well as any bridging faults in the ICAP inputs and outputs. The MISR used to detect faults in the ICAP uses a similar on-chip comparison with the known good ICAP signature to produce a pass/fail output that is logically ORed with the pass/fail output of the Frame ECC MISR and comparison circuit, as illustrated in Figure 4.5. A simultaneous test of the ICAP and Frame ECC is logical since the ICAP would almost certainly be used for any space or fault-tolerant application that actively detects and corrects SEUs. However, because each device includes two ICAPs, only one of the ICAPs may be tested per BIST configuration in our current approach. Both ICAPs can be tested by simply generating, downloading, and executing two BIST configurations that alternate between the two ICAPs. It may also be possible to modify the BIST architecture such that both ICAPs are tested during the same 64 configuration by using the top ICAP for the first half of the BIST sequence and switching to the bottom ICAP during the remainder of the BIST sequence, for example. This would require two additional instructions to write a logic 1 to the ICAP_SELECT bit in the control register, enabling access via the lower ICAP. 4.4.3 Additional Logic In addition to the TPG and MISRs, the BIST architecture includes a custom soft-core embedded processor to control the BIST sequence execution. The processor is modeled in VHDL and is implemented entirely in configurable logic blocks. It controls the ICAP read/write signal and clock enable, the TPG/Block RAM multiplexor select inputs, and the TPG clock enable. The processor also includes three counters for addressing the instruction Block RAM, the TPG multiplexor, and for frame read timing. A block diagram of the ICAP and Frame ECC BIST architecture, including (from left to right) the TPG, circuits under test (CUT) and MISR output response analyzers, is shown in Figure 4.5. The input/output behavior of the architecture is discussed in Section 4.5. 4.5 Implementation Results The entire BIST circuit is implemented in VHDL, and only one configuration download is required for the BIST application. Some minor architectural differences between Virtex-4 and Virtex-5 devices require changes to the VHDL model for the two families of devices. First, before writing to the configuration memory, a device ID check must be performed by writing the correct device ID to the IDCODE register. This prevents accidental configuration with a bitstream formatted for another device. Any attempt to write the configuration memory without a successful device ID check will cause the FPGA to attempt a fallback reconfiguration [9][11]. The device IDs are kept in a look-up table specific to Virtex-4 or Virtex-5 and are synthesized 65 with the design as a constant. Second, the frame address register is formatted differently for Virtex-4 and Virtex-5, requiring a modification to the stored target frame address. Finally, the input/output ordering for the ICAP in Virtex-5 is byte-swapped, compared to the Virtex-4 ICAP. Therefore, we maintain two VHDL BIST models, one for Virtex-4 and one for Virtex-5 with each model supporting all devices within that particular family. Figure 4.5: ICAP and Frame ECC BIST architecture. There are six primary inputs and three primary outputs for the BIST architecture. The VHDL component declaration illustrating these primary inputs and outputs of the BIST configuration is given in Figure 4.6. It should be noted that the four inputs associated with the MISR scan chain, Scan_Clock, Scan_Mode, Scan_In, and Scan_Out, are included only for design verification. Therefore, only three primary inputs and two primary outputs are required for a typical application. The Clock input can be a free-running system clock or can be supplied by the Boundary Scan interface via TCK (DRCK internally). The maximum BIST clock frequency when the clock is supplied externally is 100 MHz, which corresponds to the maximum ICAP clock Frame ECC ICAP 32-bit MIS R 32-bit MIS 12 32 32 32 Good Signatur e Good Signatur e TD SYNDROMEVALID Scan_In 32 CUT Done TDO Start TPG (Generates 861,328 test patterns) Counter 32 32 32 10 Counter Block RAM 6 Scan_Out (Signatures) 66 frequency. When the clock is supplied by Boundary Scan, the maximum BIST clock frequency is limited to 50 MHz which corresponds to the maximum TCK clock frequency. It should be noted, however, that the BIST logic in the FPGA fabric can actually operate well above the maximum configuration frequency of 100 MHz in all Virtex-4 and Virtex-5 devices based on timing analysis of the synthesized and routed design. Figure 4.6: BIST VHDL component declaration. The Start signal is an active-high, asynchronous signal which enables the execution of the BIST sequence. The Start signal should be asserted for a minimum of three cycles of Clock to begin the BIST sequence, but then may be de-asserted or may be left asserted. The BIST will start and run automatically to completion after download by tying the Start signal to logic 1 in the top-level VHDL model. Toggling the Start signal low and then high after the completion of the BIST will clear the MISRs and cause the entire BIST sequence to repeat. This feature can be used to check for reproducible BIST results during design verification. The Scan_Mode input places both MISRs in a scan mode. With Scan_Mode asserted, the Scan_In input is an optional input to the MISR scan chain, which can be used in conjunction with Scan_Out (the output of the MISR scan chain) for loading and retrieving signatures during design verification. The input TDI and output TDO provide a single-bit pass/fail result for the BIST. As illustrated in Figure component Frame_ECC_BIST is port( Clock : in std_logic; TDI : in std_logic; Start : in std_logic; Scan_In : in std_logic; Scan_Mode : in std_logic; Scan_Clock : in std_logic; TDO : out std_logic; Done : out std_logic; Scan_Out : out std_logic); end component Frame_ECC_BIST; 67 4.6, TDI is one input to a 3-input OR gate, with the other two inputs coming from the outputs of the MISR signature comparators. When both MISRs contain the good circuit signatures, TDO (the output of the OR gate) will equal TDI. However, if either MISR does not contain the good circuit signature, the output of the functional OR will be logic 1, regardless of the state of TDI. The Done output is asserted when the BIST sequence is complete. When the Done signal is asserted, the pass/fail result is valid on the TDO output. The BIST sequence, after download (and without tying Start to logic 1), is as follows: (1) Assert the Start input. (2) Wait for the Done signal to be asserted. (3) Drive TDI low, poll TDO (should be logic 0). (4) Drive TDI high, poll TDO (should be logic 1). The BIST is interpreted as passing if the TDO output presents a logic 0 in Step 3 and a logic 1 in Step 4. This ensures that the TDO output is not stuck in the fault-free state due to a fault in the FPGA. Optionally, the contents of the two 32-bit MISRs may be scanned out and verified by external comparison to the known good circuit signatures. The total execution time for the BIST with an external 100 MHz clock is 2.739 seconds. The BIST has been downloaded, executed and verified on Virtex-4 FX12, SX35, and LX60 devices and on Virtex-5 LX30T, LX50T, SX35T, and SX50T devices using both Boundary Scan and external clock and control. Due to the differences in the configuration interfaces, Virtex-4 and Virtex-5 produce different good circuit signatures, as reflected in Table 4.3. Figures 4.7 and 4.8 show the ICAP and Frame ECC BIST implemented in the smallest Virtex-4 (FX12) and Virtex-5 (LX20T) devices, respectively. As can be seen in both figures, the BIST circuitry easily fits in programmable logic resources in the right hand half of the array. This shows that the BIST can be implemented in all other Virtex-4 and Virtex-5 devices, all of which have larger 68 arrays than those illustrated in Figures 4.7 and 4.8. The target configuration frame areas that should be avoided by constraining the design placement are also illustrated in the figures. Figure 4.7: Virtex-4 FX12 with ICAP/Frame ECC BIST Target Reconfiguration Area 69 Figure 4.8: Virtex-5 LX20T with ICAP/Frame ECC BIST Table 4.3 summarizes the actual implementation of BIST circuitry in Virtex-4 and Virtex-5 FPGAs. This includes the number of slices occupied by the BIST circuitry, the number of lines of VHDL code for the complete BIST circuit, and the total test time (excluding initial configuration time) at the maximum operating frequency of 100 MHz. The primary reason for the difference in the number of logic slices is due to the fact that Virtex-5 incorporates four 6- input LUTs and four flip-flops per slice while Virtex-4 slices incorporate only two 4-input LUTs and two flip-flops. As a result, a Virtex-5 slice has twice the logic of a Virtex-4 slice ? hence, Virtex-4 requires at least twice the number of slices. The smaller LUTs in Virtex-4 account for the additional slices. The 32-bit good circuit signatures for the Frame ECC and ICAP modules are also included in Table 4.3. Target Reconfiguration Area 70 Table 4.3: ICAP and Frame ECC BIST summary Virtex-4 Virtex-5 # of logic slices 2546 1010 # lines of VDHL 1125 1125 Total test time 2.739 sec. 2.739 sec. Frame ECC signature 0x9BC92CDB 0x969C47DD ICAP signature 0xB3FFB18B 0x31D989BD 4.6 Conclusions This chapter has presented a BIST approach for the ICAP and Frame ECC modules in Virtex-4 and Virtex-5 FPGAs. These modules are critical components used for SEU detection and correction in the configuration memory of FPGAs for space and fault-tolerant applications. The BIST approach was developed in VHDL and is applicable to all Virtex-4 and Virtex-5 devices, and the only overhead is the memory required to store the BIST configuration and downtime for the test application. The total test time is independent of the size of the FPGA. However, when using compressed configuration bitstream files, the download time can vary with the size of the FPGA depending on the physical constraints applied during synthesis. The BIST can be periodically downloaded and executed in systems which rely on the Frame ECC and ICAP logic for on-line detection and correction of SEUs to guarantee the fault-free operation of these resources. The approach has been implemented, downloaded, and verified on a variety of Virtex-4 and Virtex-5 devices. 4.7 Acknowledgements The contents of this chapter were published under the title ?BIST of Embedded SEU Detection and Correction Cores in Virtex-4 & Virtex-5 FPGAs? in Proceedings of the International Conference on Embedded Systems and Applications, 2009, pp. 149-155. Prof. Charles Stroud is a co-author on the paper. A majority of the actual research and the writing of 71 the published paper represents the efforts of the primary student author and not collaborators, and the research represents work performed while in the graduate program at Auburn University. 4.8 References [1] M. Ohlsson, P. Dyreklev and K. Johansson, ?Neutron Single Event Upsets in SRAM- Based FPGAs,? Proc. IEEE Nuclear and Space Radiation Effects Conf., pp. 177-180, 1998. [2] A. Lesea, ?Continuing Experiments of Atmospheric Neutron Effects on Deep Submicron Integrated Circuits,? WP286 (v1.0), Xilinx Inc., 2008. [3] A. Lesea, P. Alfke, ?Xilinx FPGAs Overcome the Side Effects of Sub-90 nm Technology,? WP256 (v1.0.1), Xilinx Inc., March 2007. [4] Virtex-6 Family Overview, DS150 (v1.0), Xilinx Inc., 2009. [5] L. Jones, ?Single Event Upset (SEU) Detection and Correction Using Virtex-4 Devices,? Application Note XAPP714 (v 1.5), Xilinx Inc., 2007. [6] B. Dutton and C. Stroud, ?Single Event Upset Detection and Correction in Virtex-4 and Virtex-5 FPGAs,? Proc. ISCA International Conf. on Computers and Their Applications, pp. 57-62, 2009. [7] K. Chapman and L. Jones, ?SEU Stratagies for Virtex-5 Devices,? XAPP864 (v1.0.1), Xilinx Inc., March 2009. [8] Virtex-4 FPGA User Guide, UG070 (v2.5), Xilinx Inc., 2008. [9] Virtex-4 FPGA Configuration User Guide, UG071 (v1.1), Xilinx Inc., 2008. [10] Virtex-5 FPGA User Guide UG190 (v4.2), Xilinx Inc., 2008. [11] Virtex-5 FPGA Configuration User Guide, UG191 (v3.2), Xilinx Inc., 2008. [12] J. Heiner, N. Collins, and M. Wirthlin, ?Fault-tolerant ICAP Controller for High-Reliable Internal Scrubbing,? IEEE Aerospace Conf., pp. 1-10, 2008. [13] D. Bossen, D. Ostapko, and A. Patel, ?Optimum test patterns for parity networks,? Proc. AFIPS Fall 1970 Joint Comput. Conf., pp. 63-68, 1970. [14] W-B Jone and C-J Wu, "Multiple fault detection in parity checkers," IEEE Trans. on Computers, vol.43, no.9, pp.1096-1099, 1994. [15] S. Mourad and E. McCluskey, ?Testability of parity checkers,? IEEE Trans. on Industrial Electronics, vol. 36, no. 2, pp. 254-262, 1989. 72 [16] L-T Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures, San Francisco, CA: Morgan Kaufmann, 2007. [17] C. Stroud, A Designer?s Guide to Built-In Self-Test, Boston, MA: Springer, 2002. 73 Chapter Five. Embedded Processor Based Fault Injection and SEU Emulation for FPGAs Two embedded processor based fault injection case studies are presented which are applicable to Field Programmable Gate Arrays (FPGAs) and FPGA cores in configurable System-on-Chip (SoC) implementations. The case studies include embedded hard core and soft core processors which manipulate configuration memory bits to emulate physical and transient faults in the FPGA core including shorts and opens in programmable interconnect and many different faults in logic resources. The emulated faults are used to evaluate fault detection capabilities of Built-In Self-Test (BIST) approaches, including fault identification capabilities of diagnostic procedures, and to evaluate the effect of Single Event Upsets (SEUs), including their detection and correction. Embedded processor based approaches provide significant improvement over previous fault injection techniques and, in turn, enable a more thorough analysis of BIST, diagnosis, and SEU mitigation. 5.1 Introduction and Background There are a number of Field Programmable Gate Array (FPGA) applications that can make use of the presence of physical faults. These applications include Built-In Self-Test (BIST) of the FPGA itself [1], some fault-tolerant design techniques [2], and Single Event Upset (SEU) detection/correction techniques for FPGA configuration memories [3]. These applications target FPGA devices as well as FPGA cores in configurable System-on-Chip (SoC) implementations. Verification, analysis, and evaluation of these applications can be performed with the ability to inject or emulate physical faults in the FPGA. 74 It is difficult to find actual faulty devices and their usefulness is limited due to the fixed nature of the fault [1]. Physical faults can be created by etching the packaged device and creating opens in routing resources that lie at the top level of interconnect metal for example, but once again the usefulness of these devices is limited. A more efficient approach is to manipulate the configuration memory bits to emulate physical faults in the device [4]. For example, a stuck- at fault in a look-up table (LUT) bit can be emulated by overwriting the particular configuration memory bit and setting it to the desired stuck-at fault value. SEUs on the other hand can be emulated by flipping the value of bits in the configuration memory. Shorts and opens in the interconnect network can be emulated along with almost any fault in the logic resources that can be controlled by configuration memory bits. When downloading the intended system configuration, the faults to be emulated can be injected in the configuration data just prior to the actual download process [1]. Alternatively, the intended configuration can be downloaded with subsequent partial reconfiguration used to inject and emulate the fault. One of the first FPGA applications to use fault injection emulation was hardware acceleration techniques for fault simulation [4]. However, the download time for fault injection detracted from the hardware acceleration to the extent that the manipulation of configuration bits was abandoned and replaced by fault emulation circuitry that was modeled and downloaded with the circuit to be simulated [5][6]. The overhead of the additional fault emulation circuitry and its associated routing was significant but acceptable in the case of fault simulation [7]. The additional circuitry and routing was not acceptable in the case of BIST approaches since the goal was to maximize the resources under test in any given configuration such that there are no remaining resources available to emulate faults. As a result, fault injection via configuration memory bit manipulation has been used extensively to debug, verify, and analyze development 75 of BIST configurations and diagnostic procedures for FPGAs [1][8]. Similarly, analysis of the affects of SEUs [3] as well as SEU detection and correction in FPGA configuration memories [9] can use manipulation of configuration memory bits and has been shown to be effective in emulating 97% of the SEUs induced and observed in radiation chamber experiments [3]. In this chapter, we present two case studies of embedded processors used to manipulate FPGA configuration memory bits for FPGA BIST and SEU detection/correction applications. The first case study uses a hard core embedded processor that has dedicated program and data memories with write access to the configuration memory of an FPGA core in a configurable SoC. In this case study, described in Section 5.2, the device is the Atmel AT9K series Field Programmable System Level Integrated Circuit (FPSLIC). The second case study uses a soft core embedded processor in an FPGA for manipulation of configuration memory bits via an internal configuration access port (ICAP). The soft core processor is downloaded with the application to be injected with faults. In this case study, described in Section 5.3, the devices include Xilinx Virtex-4 and Virtex-5 FPGAs. Each case study includes an overview of the device architectures, description of the fault injection emulation technique, and experimental results of the actual implementation. The chapter is summarized and concludes in Section 5.4. 5.2 Hard Core Processor Case Study The Atmel AT94K series configurable SoC consists of an FPGA core, various RAM cores, and an 8-bit Advanced Virtual RISC (AVR) microcontroller core as shown in Figure 5.1 [10]. Three types of memory resources include [10]: 1) many small 32?4-bit RAMs distributed throughout the FPGA core, 2) a 4-Kbyte to 16-Kbyte dual-port data RAM shared by AVR microcontroller and the FPGA core, and 3) a 20-Kbyte to 32-Kbyte program memory accessible only by the AVR microcontroller and used for storing machine code. 76 The AVR core is an 8-bit RISC architecture with 32 general purpose registers including a number of peripherals like watchdog timer, UART, etc [10]. There are two 8-bit bi-directional general purpose I/O ports. An 8-bit bi-directional data bus between the FPGA and AVR (controlled by the AVR) provides communications between the two cores. Whenever 8-bit data is written to (or read from) the data bus by the AVR, a strobe signal to the FPGA core is generated on FPGAIOWE (or FPGAIORE) along with one of 16 decoded select lines to the FPGA. There are four external interrupts to the AVR along with 16 interrupts from the FPGA. Figure 5.1: AT94K series SoC architecture The FPGA core is constructed as a symmetrical N?N array of programmable logic blocks (PLBs), where N=48 for the AT94K40 device (the largest AT94K series SoC) [10]. Each PLB contains two 3-input LUTs, a D flip-flop, and additional multiplexers/gates. Every PLB has dedicated diagonal (X) and orthogonal (Y) local routing resources to its neighboring PLBs, as shown in Figure 5.2a [10]. As shown in Figure 5.2b, the vertical and horizontal global routing resources associated with each PLB traverse a total of four PLBs (?4 lines) and eight PLBs (?8 lines). Vertical and horizontal bus repeaters are placed at the boundaries of every 4?4 array of PLBs (shown in Figure 5.2c for the horizontal bus) to prevent signal degradation in lengthy and/or heavily loaded signal nets. The repeaters also facilitate connections between ?4 and ?8 lines as seen in Figure 5.2d. =RAM =PLB =repeater AVR Processor FPGA core Data RAM Program Memory Peripheral Units 8 data read, write,18 select lines 16 interrupts 16 address 2 control 8 data 8 data 3 cont 16 address 77 Figure 5.2: AT94K routing architecture The AVR microcontroller core can write to (but not read from) the FPGA core configuration memory such that the FPGA can be dynamically reconfigured (either fully or partially) by the AVR core during normal system operation [10]. The FPGA configuration memory access is via a 24-bit address bus and 8-bit data bus. The address bus is partitioned into three 8-bit components referred to as FPGAX, FPGAY, and FPGAZ. FPGAX and FPGAY correspond to horizontal and vertical location of the programmable resource in the array while FPGAZ corresponds to specific logic/routing resources within the specified programmable resource. A write to the 8-bit data bus, FPGAD, results in a write cycle to a byte of the FPGA configuration memory. Sets of BIST configurations were developed to test the various programmable resources in the FPGA core including PLBs, RAMs, and the programmable interconnect network with horizontal and vertical repeaters [11]. During the verification and analysis of the sets of BIST Y Y Y Y X X X X PLB = Programmable Interconnect Point (PIP) (a) local routing (b) global routing (1 PLB) (c) horizontal repeaters in global routing 4 PLBs 8 PLBs d) repeater connections ?4 line ?8 line PLB = ?4 line = ?8 line =repeater 78 configurations, every configuration bit associated with the specified resource under test was injected in turn with a stuck-at-0 fault and a stuck-at-1 fault. For each fault injected, the BIST configurations that target that resource were applied (with the injected fault present). The BIST results indicate which BIST configurations, if any, detected the emulated fault. Because of the large number of faults to be emulated (twice the number of configuration bits) for each BIST configuration, injecting the faults in the configuration download file prior to each download takes considerable time as indicated by the ?download run time? in Table 5.1. Note that bank clock and set/reset lines are associated with the vertical repeaters, hence, the larger number of configuration bits when compared to the horizontal repeaters and associated routing. Table 5.1: Embedded fault injection run time analysis for AT94K40 Resource BIST Configs Config Bits Total Faults Download Run Time Processor Run Time PLB with flip-flops 8 81 162 4 hr 29 min 4 min 34 sec Vertical Repeaters 20 71 142 3 hr 55 min 4 min 1 sec Horizontal Repeaters 20 65 130 3 hr 36 min 3 min 40 sec Free RAM 3 4 8 13 min 14 sec BIST configurations can also be generated and executed by the embedded AVR processor [11]. In this case, fault injection emulation is somewhat more difficult since the processor core has write-only access to the FPGA configuration memory. If the processor core could also read the configuration memory, it could perform a read-modify-write (RMW) operation to inject a fault at any desired configuration memory bit. With write-only access, one must also know the normal BIST configuration data for each configuration memory byte in order to inject a single fault without disturbing the other seven bits of configuration data; otherwise, we could be injecting eight faults at a time. When the embedded processor is generating the BIST 79 configuration, the information is contained within that resident program. As a result, the fault injection emulation can more realistically be performed from the embedded processor, although the development effort is greater without the RMW capability. Table 5.1 gives the run time when using the embedded processor core to perform fault injection emulation along with the BIST configuration generation and execution. A speed-up of almost a factor of 60 is obtained when the embedded processor core performs the fault injection emulation analysis including BIST configuration generation, BIST sequence execution, and BIST results retrieval. 5.3 Soft Core Processor Case Study The configuration memories of Virtex-4 [12] and Virtex-5 [13] FPGAs are partitioned into frames, where each frame has a fixed length of 1,312 bits, or forty-one 32-bit words. A frame is the smallest addressable segment of the configuration memory; therefore all memory write/read operations must be performed on whole frames. In Virtex-4 devices, a frame contains the configuration data for 16 rows of configurable logic blocks (CLBs) and input/output (I/O) tiles, or four rows of block random access memories (RAMs) and digital signal processors (DSPs) tiles in the same column [12]. In Virtex-5 devices, a frame covers 20 rows of CLBs and I/O tiles or five rows of block RAMs and DSPs tiles [13]. This means that individual FPGA resources cannot be reconfigured without also providing explicit configuration data for other FPGA resources that occupy the same frame. Virtex-4 and Virtex-5 FPGAs incorporate several configuration registers to provide write/read access to the configuration memory. The Frame Address Register (FAR) stores the memory address to/from which frame data is written/read. The Frame Data Register Input (FDRI) and Frame Data Register Output (FDRO) registers facilitate input/output data to/from the configuration memory. There are other registers such as the status (STAT) register, the cyclic 80 redundancy check (CRC) register, and the command (CMD) register which stores the next register operation to perform such as ?Write FAR? or ?Read FDR0?. To write/read to/from the configuration memory, a combination of these registers must be used. These registers are accessible from both Boundary Scan and SelectMAP configuration interfaces as well as the internal configuration access port (ICAP) located in, and accessible from, the FPGA fabric. Emulated SEUs, or faults injected for BIST, require the reconfiguration of a single configuration memory bit after system configuration, or each BIST configuration, is downloaded. Furthermore, the contents of the frame, which configure multiple rows of resources, must be preserved during reconfiguration for emulated SEU/fault injection. Our approach takes advantage of partial reconfiguration and read back capabilities of Virtex-4 and Virtex-5 FPGAs to implement RMW for bit-level partial reconfiguration. 5.3.1 Overview of Approach The basic approach begins with locating the frame containing the target bit for fault or SEU emulation. The frame is read in its entirety and stored. Next, the target bit is located within the frame, and overwritten with the desired stuck-at value in the case of a fault. This approach also supports emulation of SEUs by simply inverting the target bit. Finally, the modified frame is written back to the same location in the configuration memory from which it was read. Optionally, a subsequent read back of the frame can be used to verify the frame RMW results. The frame address and index of the bit targeted for fault/SEU emulation are stored in a list of faults/SEUs to be emulated. For each fault in the list, the BIST configuration is downloaded, executed with the fault on the device, and the results retrieved. If any of the output response analyzers (ORAs) record a failure, indicating a faulty block under test (BUT), the fault has been detected [9]. However, most tests of a specific FPGA resource require multiple BIST 81 configurations to test its programmability and achieve high fault coverage. Given N BIST configurations and M faults in the fault list, the total number of downloads, executions, and retrievals of BIST results is N?M. The main reason why this many downloads are required is that there is no way to reset the ORAs once a fault is detected such that failures are latched until a new configuration is downloaded. Partial reconfiguration can be used to reduce download time, but it does not reset the ORAs between two consecutive BIST configurations. Therefore, once a fault is detected, the ORAs return failure indications for the remaining BIST configurations that may not detect the fault. Even though ORA failure indications imply a fault was detected, it is not clear which configuration detected the fault for proper evaluation. Since the BIST approach pseudo-exhaustively tests multiple identically configured BUTs, the fault coverage in one BUT may be assumed to be the overall fault coverage for all BUTs. This assumption greatly reduces the number of faults, M, that need to be emulated to obtain accurate fault coverage. For example, consider Figure 5.3, which shows the simulated individual and cumulative single stuck-at fault coverage for our BIST configurations for Virtex-5 CLBs in SliceL mode of operation. The simulation results are based on gate-level models of the CLB. The simulation results show that six BIST configurations are required to cumulatively detect 100% of single stuck-at faults in the CLB in SliceL mode of operation. However, as discussed in [14], the SliceL configurations must be applied twice such that every CLB serves both as a BUT and an ORA. A total of 3,006 collapsed stuck-at faults were found for the SliceL and another 8,462 faults for SliceM, all of which were cumulatively detected in fault simulation. These comprehensive fault lists include all faults affecting the CLB, including configuration memory bit stuck-at faults. Therefore, by using fault injection to emulate a subset of the complete fault 82 list (specifically, those faults affecting the configuration memory bits), both the quality of the BIST configurations and the accuracy of the gate-level fault simulation models can be gauged. Less than 100% fault coverage from fault injection would suggest inaccuracies in the simulation model and potentially lower fault coverage than the fault simulations suggest. Of the 3,006 faults in the SliceL, 614 represent configuration memory bit stuck-at faults. These faults were emulated using the RMW approach previously described, with results shown in Figure 5.4. Using fault injection, 100% of the configuration memory bit faults affecting the SliceL mode of operation were detected, confirming the simulation results in Figure 5.3. Furthermore, the similarity of the fault coverage trends in Figures 5.3 and 5.4 helps to verify the accuracy of simulation models. The biggest drawback of prior fault injection approaches is the large number (N?M) of downloads required to emulate a sufficient sample of configuration memory bit faults. To obtain the results shown in Figure 5.4, a total of 614?6 = 3,684 downloads, fault injections, BIST executions, and results retrievals were required. Additionally, any revision to a BIST configuration requires the complete fault list be run again to ensure that the modified configuration does not jeopardize fault detection capabilities. The total time required for fault injection can be calculated by multiplying the test time for the set of BIST configurations by the number of faults in the fault list. Figure 5.5 shows the total test time for the set of all CLB BIST configurations using compressed downloads via a 50MHz Boundary Scan interface. Consider the set of CLB BIST configurations for the mid-sized LX50T, which requires 3,147 ms using the 50 MHz Boundary Scan interface from Figure 5.5. For the complete list of 698 configuration memory bit faults (which includes SliceM mode configuration bits), the fault injection time is 698?3.147 = 2,197 seconds. The more realistic fault injection time that we experienced, using a 83 333 kHz PC parallel port interface to Boundary Scan, was approximately 150?2,197 = 81,666 seconds, or 91.53 hours. This lengthy application time prompted us to develop the embedded soft core processor based fault injection approach which greatly improves the test time by both increasing the achievable configuration interface frequency and by increasing the configuration interface word size using the ICAP. 0 500 1000 1500 2000 2500 3000 1 2 3 4 5 6 Configuration # # F au lts D ete cte d 0 10 20 30 40 50 60 70 80 90 100 Individual FC Cumulative FC Figure 5.3: SliceL simulation stuck-at fault coverage 0 100 200 300 400 500 600 1 2 3 4 5 6 Configuration # # F au lts D ete cte d 0 10 20 30 40 50 60 70 80 90 100 Figure 5.4: SliceL fault injection stuck-at fault coverage 84 0 1000 2000 3000 4000 5000 6000 7000 8000 LX 20T LX 30T LX 50T LX 85T LX 110 T SX3 5T SX5 0T SX9 5T Ti me (m s) Readback Execution Configuration Figure 5.5: Total CLB test time via Boundary Scan The ICAP provides access to configuration registers and the configuration memory internally from the FPGA fabric. The ICAP works like the external SelectMAP interface except that it has separate 32-bit write and read buses, as opposed to a bidirectional 32-bit bus. The maximum operating frequency of the ICAP is 100 MHz, and it supports 8-bit, 16-bit, and 32-bit word sizes [12][13]. Every device includes two ICAPs; however, both ports can not be used simultaneously. A configuration bit in the configuration interface control register selects between the upper and lower ICAPs. The basic idea of an embedded fault/SEU emulation approach is to embed all of the logic required for frame RMW operations in the FPGA with the BIST or SEU controller configuration, using the ICAP to access the configuration memory. The benefit of embedded fault/SEU emulation approach is a minimum 32 times speed up over the external Boundary Scan configuration interface operating at the same frequency. In addition, configuration frequencies of 100 MHz are achievable within the FPGA fabric. 85 5.3.2 Architecture and Operation In our embedded fault/SEU emulation approach, a configuration containing both the BIST and SEU controller architecture and some additional logic is downloaded to the device. A list of fault/SEU sites (configuration memory address and bit indexes) is loaded into the embedded fault/SEU emulation logic in the FPGA either with the download or via an external interface after download. The embedded system proceeds by reading the configuration frame containing the first fault/SEU site. The frame is temporarily stored in the FPGA fabric while the target bit is located and the fault/SEU injected. Next, the frame is written back into the configuration memory and the BIST is allowed to execute as normal. When the BIST has run to completion, a single-bit pass/fail result for the configuration is stored. Normally, using the external interface, the BIST would proceed to the next configuration. However, the embedded logic can correct the previously injected fault, reset the ORAs, and then inject the next fault in the fault list, as can be seen in the flowchart in Figure 5.6. This approach has been implemented in Virtex-4 and Virtex-5 FPGAs. The implementation is discussed in the remainder of this section. Figure 5.6: Frame read-modify-write flowchart IDLE Read Frame Modify Bit Write Frame EOF ? Pause? Reset Fault List Pointer Yes No No Yes Start Fault List 86 The embedded fault/SEU emulation core is entirely implemented in CLBs and two block RAMs in the FPGA fabric. A central component of the architecture is the dual port 18-kbit block RAM. Block RAMs have two independently configurable read and write ports (A port and B port); only the stored data is shared [12][13]. One block RAM is used to temporarily store frames during the RMW procedure. To accomplish the RMW, the B port is configured for 32-bit reads/writes and the B port input data bus is connected directly to the ICAP 32-bit data output bus. The B port data output bus is connected to the ICAP inputs via a 32-bit 2-to-1 multiplexor. A frame read is initiated at the configuration memory frame address specified by the current fault and as the frame is read it is stored in the first forty-one 32-bit words in the block RAM. Next, the A port, configured for 1-bit read/write operations, is used to locate the target bit in the location specified by the fault list entry. In the case of a stuck-at 1/stuck-at 0 fault, a 1/0 is written at the specified bit. However, for SEU emulation, the contents of the specified bit address are read, inverted, and then written back to the same address. Finally, the modified frame is written back to the same address from which it was read via the 32-bit B port output data bus. The fault list is stored in a second dual-port 18-kbit block RAM. The block RAM is configured with independent 512?36-bit read and write ports. The write port is connected to a Boundary Scan user access register with some additional logic for controlling the address bus; namely, a 32-bit shift register and address counter. The read port output bus of the block RAM is connected to the embedded fault/SEU injection logic and state machine. This block RAM structure allows a fault list to be written into the block RAM after the device is configured, and the list is immediately accessible by the fault/SEU injection logic and state-machine. However, the block RAM contents can also be initialized with a fault list in the VHDL model, eliminating 87 the need to shift in the fault list via the Boundary Scan user access register. The block RAM is capable of storing up to 512 faults. The core must be capable of facilitating any length fault list up to the maximum of 512 faults. Therefore, an end-of-file delimiter is required. Each 32-bit word in the block RAM has four parity bits which we use to store the file delimiters as well as control bits for stuck-at faults and bit-flips (SEU emulation). The ability to inject multiple faults simultaneously is also desirable. This requires the inclusion of a ?pause? delimiter in addition to the ?end-of-file? delimiter. Our solution is to use the two least significant bits of the parity word to encode the fault type (stuck-at 1, stuck-at 0, or bit-flip) and to use the two most significant parity bits to store delimiters. The encoding scheme for these bits is shown in Table 5.2, and the overall fault list format for the 32-bit data word and 4-bit parity word is shown in Table 5.3. Table 5.2: Parity bit encoding, where X = don?t care Parity[3:2] Description Parity[1:0] Description 00 Continue to next fault 00 Stuck-at zero 01 Pause at fault 01 Stuck-at one 1X End-of-file (EOF) 1X Bit-flip (SEU) Table 5.3: Embedded fault list format 35:34 33:32 32:21 20:0 Delimiters Fault Code Bit Index Frame Address The other significant component of the architecture is a 40?256-bit ROM implemented in LUTs in the FPGA fabric. This ROM is used to store all 32-bit ICAP instructions required for the frame RMW process. Another eight control bits control the ICAP write and clock enable inputs, and serve as inputs to the state machine logic. Instructions are stored in the ROM in the order in which they are written to the block RAM such that the block RAM may be sequentially 88 addressed to initiate new frame reads and writes. The two block RAMs, instruction ROM, and ICAP are connected by an assortment of glue logic, including the large 32-bit 2-to-1 multiplexor. A block diagram of the overall embedded fault/SEU injection core appears in Figure 5.7. Figure 5.7: Block diagram of fault injection core 5.3.3 Implementation Results The total number of slices used in Virtex-4 and Virtex-5 FPGAs is shown in Table 5.4. The primary reason for the difference in the number of logic slices is due to the fact that Virtex-5 incorporates four 6-input LUTs and four flip-flops per slice while Virtex-4 slices incorporate only two 4-input LUTs and two flip-flops. As a result, a Virtex-5 slice has twice the logic of a Virtex-4 slice ? hence, Virtex-4 requires at least twice the number of slices. The smaller LUTs in Virtex-4 account for the additional slices. Table 5.4: Embedded fault injection core resources Attribute Virtex-4 Virtex-5 # lines of VHDL ~950 ~950 # block RAMs 2 2 # slices 228 67 BSCAN I C A P GO FaultList Block RAM EOF PAUSED Frame RMW Block RAM ROM & FSM VHDL Generic: Device Name 89 The entire embedded fault/SEU emulation core is modeled in VHDL. For VHDL-based designs to be faulted, the fault/SEU emulation core may be instantiated in the top level of the design and synthesized with the intended system function to be faulted. Our BIST configurations are not modeled in VHDL, and in this case the fault injection core is added later in the design flow. Because our BIST configurations are modeled in Xilinx Design Language (XDL), the fault/SEU emulation core is synthesized and converted to XDL. The XDL of the embedded core and the BIST can then be combined and the design flow continued. In either case, it will be necessary to constrain the placement of the design to an area of the FPGA not targeted for fault injection. For example, if the fault injection core is embedded with a block RAM BIST configuration [15], the two fault injection core block RAMs must be constrained to an area of the device away from the BIST configuration. Furthermore, the fault list must not contain the address of fault sites located in the embedded fault/SEU emulation core?s block RAMs. If any configuration memory frame addresses in the fault list happen to correspond with any of the embedded core?s resources, the core could overwrite a bit controlling the functionality of its own resources, resulting in likely failure. An example of a properly constrained design is shown in Figure 5.8. In the figure, a partial array of test pattern generators ORAs and CLBs under test is placed in the left half of the device with the embedded fault injection core is constrained to the right half of the device. The embedded fault injection core is loaded with fault addresses residing only in the left half of the array. The component declaration for the embedded fault/SEU injection core is shown in Figure 5.9. There are two primary inputs and two primary outputs for the model, as well as a generic which specifies the device. It should be noted that the Boundary Scan access to the fault list block RAM is embedded in the VHDL model, so these I/O do not appear in the top level 90 component declaration. While the top level component declaration is identical for Virtex-4 and Virtex-5, we maintain separate VHDL models for Virtex-4 and Virtex-5 because of some minor architectural differences between the device families. First, before writing to the configuration memory, a device ID check must be performed by writing the correct device ID to the IDCODE register. (This prevents accidental configuration with a bitstream formatted for another device.) The device IDs are kept in a LUT specific to Virtex-4 or Virtex-5 and are synthesized with the design as a constant; all Virtex-4 and Virtex-5 devices are supported. The generic device in the top level model is used to locate the correct device ID in the VHDL LUT. Second, the frame address register is formatted differently for Virtex-4 and Virtex-5, requiring small changes in the ordering of the fault list block RAM data output bus. Finally, the input/output ordering for the ICAP in Virtex-5 is byte-swapped, compared to Virtex-4 ICAP. Figure 5.8: Routed embedded fault inject core (right) with half-array of routed CLB BIST (left) in Virtex-5 LX20T 91 Table 5.5: Fault/SEU injection core I/O descriptions Name Direction Description CLK Input Clock input up to 100MHz (ICAP max) GO Input Digital 1-shot input asserted to start injection of 1 or more faults separated by ?pause? delimiters. PAUSED Output Asserted to indicate injection of 1 or more faults separated by ?pause? delimiters is complete. EOF Output End-of-file asserted when end of fault list is reached. Figure 5.9: Fault inject core component declaration The details of the primary inputs and outputs of the embedded core are summarized in Table 5.5. The normal embedded fault injection process with a free running system clock (up to 100 MHz) is as follows: (1) Download BIST configuration with embedded fault injection core. (Optionally load fault list via Boundary Scan user access register). (2) Toggle the GO input. Fault injection begins and runs to completion or until a ?pause at fault? is encountered. (3) Monitor the PAUSED and EOF outputs. When PAUSED is asserted, execute the BIST configuration and record results. Repeat steps 2 and 3 until both PAUSED and EOF are asserted, then go to step 4. (4) Execute the BIST for a final time and record results. The end of fault file is reached and fault injection is complete. The embedded fault injection core has been verified on Virtex-4 and Virtex-5 devices. The core was initially verified by synthesizing only the core, loading a fault list, and executing the fault injection. To verify the injection of faults and bit-flips, the contents of the configuration memory were read back via the Boundary Scan interface and compared line-by-line to the component fltinject is generic(DEVICE : string(1 to 6):="LX110T"); port( GO : in std_logic; CLK : in std_logic; EOF : out std_logic; PAUSED : out std_logic); end component fltinject; 92 original configuration download file. The core is capable of injecting stuck-at faults and SEU bit-flips anywhere in the configuration memory except block RAM contents. It is possible, however, to modify the architecture to support injection of faults in block RAM contents. Transient faults can be emulated by back-to-back SEU bit-flips such that the fault exists for a minimum of 3 ?s - the minimum RMW time for a single frame. By incorporating two back-to- back bit-flips with a ?pause? delimiter, the user can control a transient fault for longer periods. 5.4 Summary and Conclusions We have presented case studies for two embedded processor approaches for SEU and fault injection emulation in FPGA and FPGA cores in reconfigurable SoCs. In the first case, a dedicated hard core processor was used to inject emulated faults in the FPGA core configuration memory via a write-only interface. The lack of read access to the configuration memory increased the development effort and difficulty for use in the evaluation and analysis of BIST configurations for the FPGA. In the second case, a soft core processor was developed which was capable of read-modify-write access to the FPGA configuration memory. This facilitates the emulation of single and multiple stuck-at faults as well as bit-flipping for emulation of single and multiple SEUs. Hence, the embedded SEU/fault emulation processor supports a wide variety of fault types with no download penalty for more efficient and thorough evaluation of BIST and SEU mitigation. It should be noted that the fault injection is used in a fault-free device to analyze SEU detection/correction and BIST development and is not part of the manufacturing or system-level operation or test. 5.5 Acknowledgements The contents of this chapter were published under the title ?Embedded Processor Based Fault Injection and SEU Emulation for FPGAs? in Proceedings of the International Conference 93 on Embedded Systems and Applications, 2009, pp. 183-189. Prof. Charles Stroud and former Auburn University Department of Electrical and Computer Engineering students Mustafa Ali and John Sunwoo are co-authors on the paper. A majority of the actual research and the writing of the published paper represents the efforts of the primary student author and not collaborators, and the research represents work performed while in the graduate program at Auburn University. 5.6 References [1] C. Stroud, J. Nall, M. Lashinsky and M. Abramovici, ?BIST-Based Diagnosis of FPGA Interconnect,? Proc. IEEE Int. Test Conf., pp. 618-627, 2002. [2] F. Kastensmidt, L. Carro and R. Reis, Fault-Tolerance Techniques for SRAM-based FPGAs, Springer, 2006. [3] E. Johnson, M. Caffrey, P. Graham, N. Rollins and M. Wirthlin, ?Accelerator Validation of an FPGA SEU Simulator,? IEEE Trans. on Nuclear Sci., vol. 50, no. 6, pp. 2147-2157, 2003. [4] P. Ellervee, J. Raik, K. Tammem?e and R. Ubar, ?Environment for FPGA-based Fault Emulation,? Proc. Estonian Acad. Sci. Eng., vol. 12, pp. 323?335, 2006. [5] S. Hwang, J. Hong and C. Wu, ?Sequential Circuit Fault Simulation Using Logic Emulation,? IEEE Trans. on CAD of ICs and Systems, vol. 17, no. 8, pp. 724-736, 1998. [6] P. Civera, L. Macchiarulo, M. Rebaudengo, M. Reorda and M. Violante, ?An FPGA- Based Approach for Speeding-Up Fault Injection Campaigns on Safety-Critical Circuits,? Journal of Electronic Testing: Theory and Applications, vol. 18, pp, 261?271, 2002. [7] R. Sedaghat, ?Routability estimation of FPGA-based fault injection,? Electronics Letters, vol. 41, no. 14, pp. 790-792, 2005. [8] T. Slaughter, C. Stroud, J. Emmert and B. Skaggs, ?Fault Injection Emulation for Field Programmable Gate Arrays,? Proc. Int. Society for Optical Eng., vol. 4525, pp. 1-9, 2001. [9] B. Dutton and C. Stroud, ?Single Event Upset Detection and Correction in Virtex-4 and Virtex-5 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 57- 62, 2009. [10] AT94K Series Field Programmable System Level Integrated Circuit, Datasheet, Atmel Corp., 2001. 94 [11] J. Sunwoo and C. Stroud, ?Built-In Self-Test of Configurable Cores in SoCs Using Embedded Processor Dynamic Reconfiguration,? Proc. Int. SoC Design Conf., pp. 174- 177, 2005. [12] Virtex-4 FPGA Configuration Guide, UG071 (v1.5), Xilinx Inc., 2007. [13] Virtex-5 FPGA Configuration User Guide, UG191 (v2.7), Xilinx Inc., 2008. [14] B. Dutton and C. Stroud, ?Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 235-249, 2009. [15] B. Garrison, D. Milton, and C. Stroud, ?Built-In Self-Test for Memory Resources in Virtex-4 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 63- 68, 2009. 95 Chapter Six. Soft-Core Embedded Processor-Based Built-In Self-Test of FPGAs This chapter presents the first implementation of Built-In Self-Test (BIST) of Field Programmable Gate Arrays (FPGAs) using a soft core embedded processor for reconfiguration of the FPGA resources under test, control of BIST execution, retrieval of BIST results, and fault diagnosis. The approach was implemented in Xilinx Virtex-5 FPGAs but is applicable to any FPGA that contains an internal configuration memory access port 6.1 Introduction Built-In Self-Test (BIST) for Field Programmable Gate Arrays (FPGAs) exploits the re- programmability of FPGAs to create BIST circuitry in the FPGA fabric during manufacturing and system-level off-line testing [1]. The only overhead is the external memory required to store the BIST configurations along with the time required to download and execute the BIST. No area overhead or performance penalties are incurred in the user function because the BIST logic is replaced by the intended system function after testing is complete. The BIST configurations are applicable to all levels of testing because they are independent of the intended system function and require no specialized external test fixture or equipment. Most research and development in BIST for FPGAs has focused on reducing the number of test configurations, reducing the size of test configuration files, and decreasing BIST execution time [2]-[8]. But the ever increasing complexity and level of integration in FPGAs has, with few exceptions, resulted in longer test times, more downloads, and more memory required for storing BIST configurations for each new generation of FPGA. However, the increasing size and complexity of FPGAs have also created opportunities for innovation in FPGA testing. 96 This chapter presents the first implementation of BIST for FPGAs using a soft core embedded processor synthesized into the fabric of the FPGA under test. The approach reduces the number of configuration files required for BIST by exploiting the regularity of BIST structures to significantly compress and store partial configuration data in the embedded processor?s program memory. The embedded processor controls and executes the BIST sequence, including retrieval and analysis (fault diagnosis) of BIST results, and reconfiguration of the FPGA for subsequent BIST configurations. This embedded processor based BIST approach is possible for two reasons: first, the growing size and complexity of FPGAs facilitates the inclusion of complex circuitry that only occupies a small percentage of the total configurable resources, leaving adequate area for BIST logic; and, secondly, the ability to access the configuration memory from inside the FPGA fabric has made possible internal reconfiguration and read back. The approach has been successfully implemented in Xilinx Virtex-5 but is applicable to any FPGA with internal configuration memory access. 6.2 Background A number of BIST approaches have been developed for the configurable logic and memory resources in FPGAs [1]. Due to the programmable nature of resources to be tested, all BIST approaches for FPGAs require multiple configurations in order to obtain high fault coverage. Generally, a BIST approach is organized into test sessions and phases [2]. Each test session consists of a set of test phases (test configurations) for a particular resource under test in order to test that resource in all modes of operation. For example, BIST of configurable logic blocks (CLBs) requires two test sessions. In the first test session, half of the CLBs are configured as blocks under test (BUTs), with the remaining half serving as comparison-based output response analyzers (ORAs) and test pattern generators (TPGs). In recent CLB BIST approaches, 97 the TPGs are implemented in non-CLB resources freeing CLBs to function as additional ORAs such that circular comparison can be implemented, as illustrated in Figure 6.1, where the outputs of each BUT in a row or column are monitored by two ORAs and compared to the outputs of two other identically configured BUTs [1]. This circular comparison in conjunction with multiple identically configured TPGs provides high diagnostic resolution with low probability of fault escape [1]. In the second test session, the positions of the BUTs and ORAs are swapped, such that every CLB is configured as a BUT in one test session and as ORA in the other test session. Figure 6.1: Configurable logic block (CLB) BIST architecture BIST control, including downloading the initial BIST configuration, executing the BIST sequence, retrieval of results, fault diagnosis based on failing results, and reconfiguration of subsequent BIST phases, has traditionally been achieved via interface to an external BIST controller. However, the increased complexity of FPGAs, large number of test configurations associated with various programmable resources, and speed limitations of external download interfaces result in long manufacturing test times and limit practicality of system-level testing. Various approaches have been investigated to reduce the overall test time while achieving high quality tests. Beyond minimizing the number of test phases, partial reconfiguration reduces test time by reconfiguring only the resources under test for various modes of operation once the TPG TPG BUT ORA TPG 98 overall test structure has been downloaded into the device. BIST configurations that have been recently developed for Virtex-4 and Virtex-5 FPGAs include a single-bit pass/fail output to eliminate retrieval of ORA contents for passing test phases or when fault diagnosis is not desired [5]-[8]. When failures are observed, partial configuration memory read back can be used to obtain the ORA contents to diagnose the faulty resource(s) for fault tolerant applications. Beyond these techniques, the only new development in FPGA BIST has been introduction of embedded processor based approaches. Prior work in embedded processor based BIST includes system-on-chip (SoC) testing with hard core microprocessors [9] but did not address testing of FPGAs or FPGA cores in SoCs. The first embedded processor based BIST approach for FPGAs was developed to minimize test time, number of downloads, and complexity of the external BIST controller by relocating BIST reconfiguration, control, and diagnosis to the dedicated hard core embedded processor in the Atmel AT94K series configurable SoC [3][4]. The device consists of an FPGA core, various RAMs, and an 8-bit Advanced Virtual RISC (AVR) microcontroller [10]. Sets of BIST configurations were developed to test each of the various programmable resources in the FPGA core including CLBs, RAMs, IOBs, and programmable routing network [3][4]. The embedded processor was used to configure the FPGA for each test session, execute the BIST sequence, retrieve BIST results from the ORAs, and perform diagnosis based on failing BIST results. This embedded processor based BIST approach achieved a total test time speed-up of about 43.5 over the tradition approach of downloading each BIST configuration [4]. External memory requirements for storing BIST configurations were reduced by a factor of about 158 because only a single program needed to be downloaded into the AVR program memory, from which all BIST configurations were generated and executed. 99 While this embedded processor based BIST approach was practical for system-level testing, the approach was developed specifically for AT94K devices such that application to other FPGAs is limited due to reliance on the hard core processor with dedicated program memory. Some hardcore processors (such as the PowerPC in Virtex-4 and Virtex-5 FX series FPGAs) do not have a dedicated program memory and must use programmable resources in the FPGA. Soft core processors, on the other hand, can be implemented in most FPGAs such that a soft core processor based approach would be applicable to a wider range and variety of FPGAs and applications. The primary requirement is that the FPGA include an internal configuration access port (ICAP) to provide processor access to the configuration memory. The configuration memories of Virtex-4 [11] and Virtex-5 [12] FPGAs are partitioned into frames, where each frame has a fixed length of 1,312 bits, or forty-one 32-bit words. A frame is the smallest addressable segment of the configuration memory; therefore all memory read/write operations must be performed on whole frames. This means that individual FPGA resources cannot be reconfigured without also providing explicit reconfiguration data for other FPGA resources that occupy the same frame. In Virtex-5, a frame contains the configuration data for 20 rows of CLBs and (I/O) tiles, or 5 rows of block RAMs and DSPs tiles in the same column. In Virtex-4, a frame contains configuration data for 16 rows of CLBs and I/O tiles, or 4 rows of block RAMs and DSP tiles. Both Virtex-4 and Virtex-5 FPGAs include several configuration registers to access the configuration memory, including Frame Address Register (FAR), Frame Data Register Input (FDRI), and Frame Data Register Output (FDRO) which facilitate writing/reading data to/from a specific frame of configuration memory. There are other registers for functions such as status (STAT), cyclic redundancy check (CRC), command (CMD), etc. To access the configuration 100 memory, a combination of these registers must be used. These registers are normally accessible from both Boundary Scan and SelectMAP configuration interfaces but are also accessible via the ICAP located inside fabric. The ICAP works like the external SelectMAP configuration interface except that it has separate 32-bit write and read buses, as opposed to a bidirectional 32-bit bus. The maximum ICAP clock frequency is 100 MHz. 6.3 Embedded BIST Architecture The soft core embedded processor based BIST approach for FPGAs incorporates additional logic in the FPGA fabric along with the BIST logic to perform tasks typically assigned to an external BIST controller or computer. The embedded BIST approach offers several advantages over the external BIST approach. First, the 32-bit ICAP configuration interface is used for reconfiguration, eliminating the test time penalties associated with the lower speed serial Boundary Scan interface. Secondly, the total number of external download configurations is reduced to one per test session. In addition, all control of the BIST configurations and sequences can be implemented in the embedded controller. Diagnostic procedures can also be performed by the embedded BIST controller, further reducing the complexity of the external BIST controller in fault tolerant applications and providing considerable speed-up when compared to Boundary Scan based read back and diagnosis. The implementation of the embedded processor BIST approach in Virtex-5 FPGAs incorporates elements of both hardware and software design to achieve an architecture that is general enough for any Virtex-5 device as well as for any BIST approach for the resources in Virtex-5 FPGAs. The design is applicable to any Virtex-5 device with only minor modifications to system software and no modifications to system hardware. Furthermore, the design can easily be extended to Virtex-4 devices for similar improvements in test time. To minimize the number 101 of external downloads per test session, the embedded processor based BIST hardware must fit in one half of the smallest supported device. The embedded processor core must also be capable of storing configuration data for all of the subsequent test phases for each test session in memory in the FPGA fabric using Block RAMs or distributed RAMs. Finally, the core must support interfaces for connecting with the ICAP and BIST circuitry. There are a variety of designs which can be used for the embedded processor ranging from fast, full-custom register transfer level (RTL) designs, to highly configurable general purpose soft core microprocessors. While RTL level designs are useful for simple repetitive tasks, this approach is not very efficient for supporting multiple device architectures of a variety of BIST approaches. Such an approach requires a different hardware configuration for each device and for each BIST session, which requires a significant amount of hardware development time when compared with other, more general purpose software based approaches. Another option is to use a general purpose processor in the form of a ?soft? intellectual property (IP) core. One of the simplest and most efficient general purpose architectures available for Xilinx FPGAs is the PicoBlaze 8-bit microcontroller [13]. The PicoBlaze occupies one block RAM and approximately 50 slices in Virtex-5 FPGAs ? much less than half of an array in the smallest Virtex-5 device. The PicoBlaze is supported by a simple assembler and software simulator. However, the program memory in the PicoBlaze is limited to 1024 stored instructions and scratch-pad memory is limited to 64 Bytes. The 8-bit architecture also creates timing penalties when interfacing with the 32-bit ICAP port because each ICAP write requires a minimum of four PicoBlaze instructions of two clock cycles each. To improve timing for ICAP operations, a 32-bit architecture is best for embedded BIST applications in Virtex-4 and Virtex-5 devices. One IP core that meets the requirement for a BIST controller is the MicroBlaze soft core processor, which is a highly configurable 32-bit general 102 purpose RISC microprocessor for Xilinx FPGAs [14]. The MicroBlaze also includes an optional, pre-engineered, interrupt driven ICAP hardware interface. The MicroBlaze can be configured with up to 64 kB of combined program and initializable data memory in Virtex-5 FPGAs that is implemented in the FPGA fabric in Block RAMs. The processor can be modified by the addition of custom peripherals on the processor local bus (PLB). These features led to selection of MicroBlaze as the embedded processor in our implementation. The basic architecture for the embedded processor BIST approach is illustrated in Figure 6.2 for CLB BIST where half of the FPGA array is used for processor and additional hardware resources and the other half of the array contains the CLB BIST configuration. Custom memory- mapped registers are included in the MicroBlaze VHDL model for interfacing with the BIST circuitry. The processor interfaces directly with the ICAP for reconfiguration of the BIST array and read back of BIST results. To test all CLBs in the FPGA, a second configuration is generated with locations of BIST logic and embedded processor swapped, as shown in Figure 6.2b. For some resources, such as I/O tiles or CRC modules, it is possible to test all of the resources simultaneously by placing the MicroBlaze around the BIST circuitry. One memory mapped write-only (WO) register, shown in Table 6.1, is included for control of the BIST circuitry and sequence. The outputs of the register are connected directly to inputs to the BIST logic, but not all of the register bits are utilized in any one BIST session. One read-only (RO) register, also shown in Table 6.1, is included at the same memory-mapped address as the output register. The inputs to this register are connected directly to outputs of the BIST logic. Each register is general enough to be applicable to all BIST configurations that have been developed for Virtex-5. 103 Because many BIST configurations must be executed for a different minimum number of clock cycles to achieve the intended fault coverage, there is the need for a hardware timer for BIST execution. Therefore, a 16-bit down counter is included in addition to the RO and WO BIST control registers. The counter is initialized by writing to the lower 16-bits of the BIST control register. The counter automatically counts down to zero, setting the cnt_eq bit when zero. The cnt_eq bit is used to enable the BIST logic and can be polled in software to determine when the BIST is complete. The counter clock and BIST clock can share the MicroBlaze clock or can be clocked independently at a higher clock frequency by connecting the BIST clock input and 16-bit counter clock to an independent BIST clock source. Figure 6.2: Embedded soft core processor based BIST architecture Table 6.1: BIST control registers Write-only register Read-only register 31:24 23:21 20 19 18 17 16 15:0 31:3 2 1 0 control reserved done reset start tdi clk_en cnt_init reserved (read as 0) cnt_eq BIST_done tdo (a) session #1 (b) session #2 BIST Area MicroBlaze Soft core Processor WO Register RO Register ICAP RO Register WO Register ICAP MicroBlaze Soft core Processor BIST Area 104 6.4 Software Development One important feature of this BIST approach stems directly from the generality of the embedded processor. Namely, that only the software changes from one BIST session to the next; the hardware remains unchanged for any and all BIST sessions. The software can be efficiently constructed in a manner that exploits the regularity of BIST configurations, and only the code for a particular BIST session need be compiled and programmed in the MicroBlaze program memory since a new download is performed at the start of each test session. Each BIST for a specific resource is composed of a set of phases, with each phase corresponding to a reconfiguration of the resources under test. Each phase comprises writes of entire set of frames of data to configuration columns that control the resources under test. Therefore, only certain portions of the partial reconfiguration files must be stored because the array-half, row, and column locations of the resources under test can be determined algorithmically based on the particular device in which BIST is implemented. The algorithm for the embedded BIST reconfiguration process is shown in Figure 6.3. The algorithm for frame address generation using multi-frame write operations, given configuration row and destination minor address for frame data previously written to FDRI, is also shown in Figure 6.3. No modification to the MicroBlaze hardware is required for support of other BIST approaches such as DSP and Block RAM BIST [5][6]. However, the control bits [31:24] of the WO BIST register are used during these test sessions to control the TPG mode. The outputs of these register bits are connected directly to the mode control inputs of the TPG when the MicroBlaze hardware and BIST hardware are merged. Block RAM and DSP embedded BIST architectures are otherwise arranged identically to the CLB BIST architecture shown in Figure 6.2 with the BIST circuitry occupying one-half of the device. 105 Reconfiguration files are generated in a manner that allows full or partial reconfiguration from an external memory without the need for an ?intelligent? controller. While ideal for systems containing only non-volatile memory and an FPGA, the partial reconfiguration files are too large to be directly stored in the program or data memory of an embedded processor. For example, the total size of the 5 partial reconfiguration files for CLB BIST in half of an array of a small Virtex-5 device (LX30T) is 41,360 Bytes ? exceeding the maximum 32 kB of data memory that can be allocated for MicroBlaze. Partial reconfiguration files are also device dependent since the size of the reconfiguration file is proportional to the device size. Hence, compression of partial reconfiguration files is required for the embedded processor. Overall BIST algorithm Addressing algorithm w/ multi-frame write for all test phases do for all configuration columns do for all configuration rows in BIST half do if column is block under test then for all frames in reconfiguration structure do for all minor addresses in compressed config do construct configuration frame multi-frame write to row, column, & minor muti-frame write to all RUTs in half & row end for end for end if end for end for execute BIST phase get BIST results end for set done bit in WO control register Figure 6.3: Embedded processor BIST algorithms Our compression scheme exploits four features of Virtex-5 partial reconfiguration files to compress the data for storage in the embedded processor program memory and eliminate device dependencies. First, each configuration file contains certain instructions, such as those for multi- frame writes to the configuration memory, which are repeated many times during download. Since, in the embedded processor BIST approach, the download is executed under the control of the embedded processor, ICAP instructions can be stored once and regenerated when needed. Second, multi-frame writes can only occur in one configuration row in Virtex-5 devices. For BIST configurations, which create identical configurations in BUTs and ORAs in every 106 configuration row, the overhead of multi-frame write instructions can be eliminated by storing frame data only once for one configuration row; the structure of the partial reconfiguration file can be regenerated by repetitively writing the frame data and frame addresses to the ICAP inside of a software loop for all configuration rows. Third, because one frame of configuration data spans 20 rows of identical resources under test, 2 to 4 words of frame data are repeated 10 to 20 times (in a repeating sequence) in each 41-word frame for BIST. Therefore, only 2 to 4 words of configuration data need to be stored for each frame in the partial reconfiguration file. The frame can be reconstructed in its entirety from the smallest repeating set of 32-bit words. Finally, the partial reconfiguration file includes the addresses of every frame to which frame data must be written for each configuration row. Again, due to the regularity of the BIST structure, only the minor addresses in the first BUT column for each configuration frame need to be stored. The remaining addresses can be regenerated algorithmically (Figure 6.3) given the locations of resources under test in the FPGA fabric. We constructed a program to automatically extract only the essential data from every partial reconfiguration files for any BIST session using the compression methods described above. The program generates a C header file with a data structure containing only essential data from the compressed partial reconfiguration file. The data structure declaration is shown in Figure 6.4, where the constant NRECONFIG is the number of test phases for the BIST session. When the compression program was used to compress the 5 partial reconfiguration files for a CLB BIST session in a Virtex-5 LX30T, the total size of the files reduced from 41,360 Bytes to 820 Bytes. Table 6.2 shows the size of the original compressed partial reconfiguration files and the size of the essential data in compressed form for different BIST sessions for Virtex- 5. The original file size given in the table is for an LX30T and the size of the file will increase in 107 proportion to the number of configuration rows in a given device. However, the size of the essential data in compressed format is independent of the device size. Figure 6.5 illustrates these device dependencies of reconfiguration file sizes for the smallest and largest devices in each Virtex-5 family of devices (LXT, SXT, FXT, and TXT). struct framedata { unsigned int numword; //# of words unsigned int word[MAXWORD]; //config data unsigned int numminor; //# of addresses unsigned int minor[MAXMINOR];//minor addr }; struct partialconfig { unsigned int numframe; //# frames struct framedata frame[MAXFRAME];//frames } config[NRECONFIG] = { //compressed frame data placed here by program }; Figure 6.4: Compressed BIST partial reconfiguration structure in C Table 6.2: Compressed partial reconfiguration data size BIST Session Number of BIST Sessions Number of BIST Reconfigurations Original File Size (Bytes) Compressed Size (Bytes) CLB East 2 5 41,360 820 CLB West 2 5 41,360 820 LUT-RAM 2 4 10,944 1,232 I/O Logic 1 5 11,308 1,236 I/O SerDes 1 8 94,432 2,680 CRC 1 1 4,716 184 DSP 1 9 28,836 1152 Block RAM 2 5 285,040 4920 ECC RAM 2 2 19,384 1200 FIFO 2 3 29,076 1800 FIFO ECC 2 1 9,692 600 108 0 20 40 60 80 100 120 140 LX 30T LX 330 T SX 35T SX 240 T FX 30T FX 200 T TX 150 T TX 240 T To tal Si ze (k B) CRC CRC Compressed I/O Logic I/O Logic Compressed Figure 6.5: Original reconfiguration file sizes and compressed data structure sizes for one CRC BIST and a set of 5 I/O Logic BIST partial reconfigurations Read back and diagnosis of BIST phases is performed by software in the embedded BIST processor when fault diagnosis is desired for a given application. The basic idea is to read back every frame of configuration memory containing an ORA flip-flop. The ORA flip-flop contents are then stored in an array in the processor data memory. An ORA contains a logic 0 when a failure is detected, otherwise a logic 1. Since the locations of ORAs are known for every BIST session in any device, the frame addresses of ORA flip-flops can be generated algorithmically during read back. The diagnostic algorithm [1] for circular comparison is easily implemented in the embedded processor to identify faulty resources. When combined with the 32-bit parallel access to the configuration memory, read back and diagnosis via the embedded processor provides a substantial improvement in test time when compared to serial access via Boundary Scan. 109 6.5 Design Flow and Implementation Results The embedded BIST processor design flow, illustrated in Figure 6.6a, is more complex than the traditional BIST design flow due to the inclusion of the MicroBlaze processor and BIST session specific software. Generating the embedded processor based BIST configurations requires inputs from three sets of source files. First, the C source file for the specific BIST approach (e.g. CLB, DSP, block RAM, etc.) is compiled to an executable linkable file (ELF) format. The MicroBlaze hardware is modeled in VHDL and synthesized using the Xilinx ISE design flow. The placement of the MicroBlaze logic is constrained to one half of the device. The placed, unrouted design is then converted to Xilinx Design Language (XDL) format. The BIST logic is generated concurrently by the BIST generation program which produces an unrouted XDL description of the BIST circuitry. The BIST array is constrained to the other half of the FPGA. The BIST XDL description and the MicroBlaze XDL description are merged by concatenating the two XDL files and connecting primary inputs and outputs of the BIST logic to the WO and RO BIST control registers included in the MicroBlaze logic to form the complete unrouted embedded processor based BIST configuration in XDL format. Finally, the complete hardware portion of the design is converted to an NCD format and routed, from which the bitstream configuration file is generated using the Xilinx BitGen program. At this point, the compiled software in ELF format is translated into Block RAM initialization values in the bitstream download file using the Xilinx Data2Mem program. 110 (a) design and verification process (b) implementation in LX30T device Figure 6.6: Embedded processor BIST design implementation The embedded processor based BIST approach has been successfully implemented for BIST in Virtex-5 FPGAs. The unrouted embedded processor based BIST configuration for the CLBs implemented in the top of a Virtex-5 LX30T is shown in Figure 6.6b. Two such configurations are implemented to fully test the CLBs with the locations of the BUTs and ORAs swapped, and another two configurations are required to test the bottom half CLBs. For the purpose of embedded BIST, the MicroBlaze processor is configured with a hardware integer multiplier, five stage pipeline, and 64 kB of on-chip program and data memory (configured in Block RAMs). In Virtex-5 devices, the MicroBlaze with ICAP interface and BIST control registers occupies three DSPs, 16 block RAMs, and 400 CLBs. The percentage of utilized BIST Program BitGen.exe BIT file XDL file NCD file XDL.exe Verification on FPGA XDL file Merge XDL Files XDL file VHDL files FPGA Editor ELF file C files C Compiler Data2MEM.exe XST Synthesis Hardware DRC Fault Injection SDK Development Software Debug download 111 resources is less than 50% in the smallest Virtex-5 device (LX20T). Timing analysis indicates that the maximum operating frequency of the MicroBlaze processor when constrained to one- half of a device is greater than 100 MHz in all Virtex-5 devices. Therefore, all ICAP operations can be performed at the maximum frequency of 100 MHz. 6.6 Conclusions We have presented the first embedded soft core processor based FPGA BIST approach. The approach reduces the number of external configurations of the FPGA during any BIST session to a maximum of two (one for each half of the array); however, many resources can be tested in a single BIST session. The embedded processor performs reconfiguration of the resources under test at the maximum allowable clock frequency and data width. Read back of ORA contents can be performed when fault diagnosis is desired for fault-tolerant applications. The soft core processor approach was implemented in Virtex-5 FPGAs using the MicroBlaze processor. However, the overall approach is applicable to any FPGA with internal write and read access to the configuration memory. 6.7 Acknowledgements The contents of this chapter were published under the title ?Soft Core Embedded Processor Based Built-In Self-Test of FPGAs? in Proceedings of the 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2009, pp. 29-37. Prof. Charles Stroud is a co-author on the paper. A majority of the actual research and the writing of the published paper represents the efforts of the primary student author and not collaborators, and the research represents work performed while in the graduate program at Auburn University. 112 6.8 References [1] L-T Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures, San Francisco, CA: Morgan Kaufmann, 2007. [2] M. Abramovici and C. Stroud, ?BIST-Based Test and Diagnosis of FPGA Logic Blocks, IEEE Trans. on VLSI Systems, vol. 9, no. 1, pp. 159-172, 2001. [3] C. Stroud, S. Garimella and J. Sunwoo, ?On-Chip BIST-Based Diagnosis of Embedded Programmable Logic Cores in System-On-Chip Devices,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 308-313, 2005. [4] J Sunwoo and C. Stroud, ?BIST of Configurable Cores in SoCs Using Embedded Processor Dynamic Reconfiguration,? Proc. Int. SoC Design Conf., pp. 174-177, 2005. [5] B. Garrison, et. al., ?Built-In Self-Test for Memory Resources in Virtex-4 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 63-68, 2009. [6] M. Pulukuri and C. Stroud, ?Built-In Self-Test of Digital Signal Processors in Virtex-4 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 34-38, 2009. [7] B. Dutton and C. Stroud, ?Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 230-234, 2009. [8] B. Dutton and C. Stroud, ?Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 235-239, 2009. [9] R. Rajsuman, ?Testing a System-On-Chip with Embedded Microprocessor,? Proc. IEEE Int. Test Conf., pp. 499-508, 1999. [10] AT94K Series Field Programmable System Level Integrated Circuit, DS1138, 2005. [11] Virtex-4 FPGA Configuration User Guide, UG071 (v1.1), Xilinx, 2008. [12] Virtex-5 FPGA Configuration User Guide, UG191 (v2.7), Xilinx, 2008. [13] PicoBlaze 8-bit Embedded Microcontroller User Guide, UG129 (v1.1.2), Xilinx, 2008. [14] MicroBlaze Processor Reference Guide, UG081 (v.9.0), Xilinx, 2008. 113 Chapter Seven. Soft-Core Embedded Processor-Based Built-In Self-Test of FPGAs Case Study This chapter presents the results of a case study which investigates the use of an embedded soft-core processor to perform Built-In Self-Test (BIST) of the logic resources in Xilinx Virtex-5 Field Programmable Gate Arrays (FPGAs). We show that the approach reduces the complexity of an external BIST controller and the number of external reconfigurations, making it particularly appealing for in-system testing of high-reliability and fault-tolerant systems with FPGAs. However, the overall test time is not improved due to an increase in the size of the required configuration files as a consequence of the inclusion of the soft-core embedded processor logic, whose relative irregularity results in less effective compression of configuration data files. 7.1 Introduction This chapter presents the results of the first implementation of Built-In Self-Test (BIST) for Field Programmable Gate Arrays (FPGAs) using a soft-core embedded processor synthesized into the configurable fabric of the FPGA under test. The approach, as originally proposed in [1], reduces the number of configuration files required for BIST by exploiting the regularity of BIST architectural structures to significantly compress and store partial configuration data in the embedded processor?s program memory. The embedded processor controls and executes the BIST sequence, including retrieval and analysis (fault diagnosis) of BIST results, and performs partial reconfiguration of the FPGA for subsequent BIST test phases [1]. This embedded processor-based BIST approach is possible for two reasons: first, the growing size and 114 complexity of FPGAs facilitates the inclusion of complex circuitry that only occupies a small percentage of the total configurable resources, leaving adequate area for BIST logic and routing; and, secondly, the ability to access the configuration memory from inside the FPGA fabric has made possible internal reconfiguration and read back of FPGA logic and routing resources. The approach was successfully implemented in Xilinx Virtex-5 [2] but is applicable to any FPGA with internal configuration memory access. The remainder of the chapter is organized as follows: Section 7.2 presents an overview of BIST for FPGAs and the previously proposed soft-core processor-based BIST technique. Section 7.3 presents the results of our implementation of soft-core embedded processor-based BIST in Virtex-5 FPGAs, including test time analysis and comparisons with other BIST approaches for FPGAs. Section 7.4 discussed ways in which the proposed approach might be improved, with Section 7.5 covering other potential applications of the approach. The chapter is summarized in Section 7.6. 7.2 Background BIST for FPGAs exploits the re-programmability of FPGAs to create test circuitry in the FPGA fabric during off-line testing [3]. The only overhead is the external memory required to store the BIST configurations along with the time required to download and execute the numerous BIST configurations. No area overhead or performance penalties are incurred because the BIST logic is reconfigured with the intended system function after testing is complete. The BIST configurations are applicable to all levels of testing because they are independent of the end-user system function and require no specialized external test fixture or equipment. Over the past 15 years, a number of BIST approaches have been developed for the configurable logic and routing resources in FPGAs. Due to the programmable nature of FPGAs, all BIST approaches for FPGAs require multiple configurations of the resources under test in all of their modes of 115 operation in order to obtain high fault coverage. Some of these BIST approaches are summarized in Table 7.1, where the number of BIST configurations is given for each type of resource including configurable logic blocks (CLBs), input/output (I/O) tiles, random access memories (RAMs), digital signal processors (DSPs), and programmable routing resources. Table 7.1: Test configurations developed for various FPGAs FPGA CLBs Routing I/O RAMs DSPs References ORCA 2C 9 27 - 0 0 [5][6] ORCA 2CA 14 41 - 0 0 [5][6] Delta 39K 20 419 - 11 0 [7] 4000/Spartan 12 128 - 0 0 [8] 4000XL/XLA 12 206 - 0 0 [8] AT40K/AT94K 4 56 27 3 0 [9] - [11] Virtex/Spartan-2 12 283 7 5 0 [11][12] Virtex-4 10+5 84 14 15 5 [13] - [17] Virtex-5 6+5 ? 15 16 11 [17][18] Most research and development in BIST for FPGAs has focused on reducing the number of test configurations, reducing the size of test configuration files, and decreasing BIST execution time [2]-[8]. But the ever increasing complexity and level of integration in FPGAs has, with few exceptions, resulted in longer test times, more downloads, and more memory required for storing BIST configurations for each new generation of FPGA. However, the increasing size and complexity of FPGAs has also created opportunities for innovation in FPGA testing. In [1], we proposed an embedded processor-based approach which exploits some of these features of current FPGAs in an attempt to improve test time and reduce the complexity of BIST. The soft-core embedded processor-based BIST approach for FPGAs incorporates additional logic in the FPGA fabric along with the BIST logic to perform tasks typically assigned to an external controller or computer. The new approach offers several advantages over the traditional external BIST approach. First, the 32-bit internal configuration access port (ICAP) is 116 used for reconfiguration of the resources under test, eliminating the test time penalties associated with the lower speed, serial Boundary Scan interface. Secondly, the total number of configurations that are downloaded via the external configuration interface is reduced to one per test session. In addition, all control of the BIST configurations and test procedures can be implemented in the embedded processor. Finally, fault diagnosis procedures can also be performed by the embedded processor, further reducing the complexity of the external BIST controller in fault-tolerant applications and providing considerable speed-up when compared to Boundary Scan based readback and diagnosis. The basic architecture of the embedded BIST approach for CLBs is illustrated in Figure 7.1 [1]. In this particular BIST approach, one-half of the FPGA array is configured with the BIST circuitry, including multiple Test Pattern Generators (TPGs), comparison-based Output Response Analyzers (ORAs), and the Blocks Under Test (BUTs). The TPGs are constructed from CLBs or other logic resources such as DSPs, RAMs, etc. The TPGs provide identical test patterns to alternating rows or columns of identically configured BUTs whose outputs are monitored by two ORAs and compared with the outputs of two other BUTs in a circular comparison arrangement, as shown in Figure 7.1. The ORAs are constructed from CLBs such that only half of the CLBs can be BUTs in a given test session, and the positions of the BUTs and ORAs must be swapped during a subsequent test session in order to test all of the CLBs in half of the array. The second half of the FPGA array is reserved for a MicroBlaze soft-core processor and any additional hardware resources associated with the processor [19]. Custom memory-mapped registers are included in the MicroBlaze VHDL model for interfacing with the BIST circuitry. One memory mapped write-only (WO) register is included for control of the BIST circuitry. The 117 outputs of the register are connected directly to all inputs to the BIST logic. One read-only (RO) register is included at the same memory-mapped address as the output register. The inputs to this register are connected directly to the outputs of the BIST logic. Each register is general enough to be applicable to all BIST configurations that we have developed for Virtex-5. Finally, the MicroBlaze interfaces directly with the FPGAs ICAP for partial reconfiguration of the BIST array and for read back of output responses. To test all of the resources in the FPGA, a second configuration is generated with the location of the BIST logic and embedded processor swapped. For BIST of some resources, such as input/output (I/O) tiles and cyclic redundancy check (CRC) circuits, it is possible to test all of the target resources simultaneously by placing the MicroBlaze around the BIST circuitry. Figure 7.1: Simplified soft-core processor-based BIST architecture (b) BIST session #2 BIST Area MicroBlaze Soft-Core Processor WO Register RO Register ICAP (a) BIST session #1 RO Register WO Register ICAP MicroBlaze Soft-Core Processor BIST Area 118 7.3 Results of Implementation in Virtex-5 The embedded processor-based BIST approach was implemented for BIST of Virtex-5 FPGAs using the MicroBlaze soft processor [19]. The unrouted embedded processor-based BIST configuration for the top CLBs implemented in the Virtex-5 LX30T is shown in Figure 7.2. Note that two such configurations are implemented to fully test the top CLBs with the locations of the BUTs and ORAs swapped, and another two configurations are required to test the bottom half CLBs. For the purpose of embedded BIST, the MicroBlaze processor is configured with a hardware integer multiplier, five stage pipeline, and 64 KB of on-chip program and data memory (configured in Block RAMs). In Virtex-5 devices, the MicroBlaze with ICAP interface and BIST control registers occupies three DSPs, 16 block RAMs, and 400 CLBs. The percentage of utilized resources is less than 50% in the smallest Virtex-5 device such that the approach works for all FPGAs in the Virtex-5 family. Timing analysis indicates that the maximum operating frequency of the MicroBlaze processor when constrained to one-half of a device is greater than 100 MHz in all Virtex-5 devices. Therefore, all ICAP operations can be performed at the maximum ICAP configuration clock frequency of 100 MHz. For accurate measurements of test time and to obtain experimental results with the MicroBlaze processor, an additional 32-bit hardware timer/counter was included in the MicroBlaze VHDL model. By starting the timer/counter at the beginning of a test phase, and stopping it at the end of the test phase, the exact number of clock cycles for reconfiguration of the resources under test, test execution, and ORA read back can be determined. To extract the value in the timer/counter at the end of each test, the MicroBlaze performs a read of the timer/counter value and reports this number via a UART interface to a connected PC, where it is displayed and logged in a terminal program. 119 Figure 7.2: Unrouted embedded processor-based BIST configuration for top configurable logic blocks (CLB) in Virtex-5 LX30T viewed in FPGA Editor Figure 7.3 shows the total test time for one session of CLB testing in several Virtex-5 devices for external configuration with full compressed configuration and partial compressed reconfiguration bitstreams downloaded and controlled via the 50 MHz Boundary Scan configuration interface and for the MicroBlaze embedded processor approach. These test times 120 take into account all of the configurations required to achieve 100% fault coverage in the CLB in SliceL mode, as reported in [7], which used traditional external reconfiguration techniques. However, these times double to achieve 100% fault coverage in every CLB, because a second set of identically sized configurations are required with the locations of the BUTs and ORAs swapped. The optimized external reconfiguration provides the fastest overall test time when compared with the other two approaches since the entire array is tested concurrently. This approach is about three times as fast as the embedded processor approach on average, but is device dependent, as can be seen in Figure 7.3. However, the embedded approach is significantly faster than external configuration with full or compressed bitstream download files. 0 0.5 1 1.5 2 2.5 3 Te st Ti me (s ec on ds ) Full Compressed Partial Compressed Embedded Top Embedded Top & Bottom Figure 7.3: CLB BIST test time for external configuration (full compressed and partial compressed bitstreams) and embedded processor test time By studying the configuration file sizes for the two BIST approaches, the cause for the increase in test time for the embedded processor approach becomes clear. Consider Figure 7.4, 121 which shows the contributions to test time for one session of CLB BIST with the embedded processor approach. The contribution from the initial compressed full configuration download (using the 50 MHz external Boundary Scan configuration interface) is shown on bottom and the contribution from the five subsequent partial reconfigurations by the embedded processor (using the 100 MHz 32-bit ICAP) is shown on top. The overall test time is dominated by the initial download time. This is due, in part, to the slower serial Boundary Scan configuration interface; however, the main contributor to the overall test time is an increase in the size of the initial configuration file (relative to the traditional BIST approach). The cause of the size increase is due to the inclusion of the MicroBlaze configuration data in the configuration file, the irregularity of which reduces the effectiveness of the configuration file compression (see Figure 7.2). We observed that the inclusion of the MicroBlaze logic increased the size of the first compressed configuration file size by 2100 kB (which is approximately constant for all devices). The additional 2100 kB of configuration data is larger than the next five partial reconfiguration files combined, and, assuming Boundary Scan configuration, increases the time for initial configuration by 336 ms. While it is possible to improve the timing for internal reconfiguration of the resources under test, there is no way to improve timing for the first compressed configuration download. The potential for savings in test time does exists for systems which require fault diagnosis, and, therefore, read back of ORA contents at the end of each test phase. In this case, the embedded approach provides a speed-up of 5.4 times during read back of ORA results versus read back via Boundary Scan, as can be seen in Figure 7.5. 122 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Te st Ti me (s ec on ds ) 5 Partial Reconfigurations 1 Compressed Config Figure 7.4: Contribution to embedded processor-based CLB BIST test time by initial external configuration and by five internal partial reconfigurations 0 10 20 30 40 50 60 70 OR A Re ad ba ck T im e ( ms ) Boundary Scan Embedded Processor Figure 7.5: Comparison of CLB BIST ORA read back times with embedded processor-based approach and external Boundary Scan interface 123 7.4 Future Improvements The overall reconfiguration times for the embedded processor-based BIST approach can be reduced by modeling a custom processor for reconfiguration and test control. When a full custom embedded fault injection approach was compared to the MicroBlaze based fault injection presented in this work, a speed-up factor of almost 12 was observed for the FSM hardware-only approach versus the general-purpose processor-based approach. However, a hardware only implementation requires a different hardware configuration for every device and BIST session, as reported in [20]. Ultimately, with custom hardware, the reconfiguration time could approach the minimum achievable test time for the 100 MHz, 32-bit SelectMAP or ICAP configuration interface. This best case timing occurs when one word is read or written on each active edge of the clock, as is the case for configuration from a dedicated memory. The best case timing for CLB BIST east or west configurations is shown in Figure 7.6 (doubling the time shown in the figure yields the total test time for all CLBs in SliceL mode). However, these times should not be directly compared to those in Figure 7.4 for the embedded processor-based approach, because Figure 7.4 assumes initial configuration from the Boundary Scan interface. Another possibility is to clock the MicroBlaze at a frequency greater than 100 MHz, using a divided clock equal to 100 MHz for the ICAP and portions of the ICAP interface logic. This will, however, require the development of a custom ICAP interface. Based on timing analysis, clock frequencies around 150 MHz are attainable in the MicroBlaze processor when constrained to one-half of the FPGA. Therefore, a speed-up of approximately 1.5 times could be achieved using a multiple clock approach. 124 0 2 4 6 8 10 12 14 16 LX 20T LX 30T LX 50T LX 85T LX 110 T SX3 5T SX5 0T SX9 5T Tim e ( ms ) Readback Execution Configuration Figure 7.6: 32-bit, 100 MHz interface test time for full chip CLB west or east with one full compressed configuration and five partial reconfigurations 7.5 Other Applications An approach similar to the embedded processor-based BIST could be applied to an external processor or microcontroller connected to the SelectMAP 32-bit configuration interface. Conceptually, the approach is similar to the approach for Atmel SoCs [3][4], except that the processor and FPGA are integrated at the board level, rather than at the chip level. The only overhead required above that for the traditional BIST approach is processor downtime for the test, additional circuit board interconnections, additional processor I/O, and a portion of the processor program memory (16,558 Bytes for one session of CLB BIST) for storing the embedded BIST software and reconfiguration data. The impact to the system could be minimized by performing tests of the FPGA as a low priority, background task, at the expense of increased test time. The approach could provide the 5.4 times speed-up of the embedded processor during reconfiguration and read back using the 32-bit, 100 MHz SelectMAP configuration interface without the penalty incurred by testing the FPGA in two sessions, one for 125 each half. The size of the initial download is also reduced when compared to the embedded processor-based approach due to the highly optimized structure of the BIST circuitry. Furthermore, the memory required to store the BIST configurations can be reduced at the expense of some additional program memory in the hard processor. The embedded processor-based BIST approach for Virtex-5 FPGAs is directly applicable to Virtex-4 FPGAs [21] with some modification to the BIST specific software (including device specific subroutines such as algorithmic resource under test frame address generation) and stored configuration data. Differences between the Virtex-4 and Virtex-5 ICAP interfaces, such as byte-swapping on the Virtex-5 ICAP, are accounted for during synthesis of the MicroBlaze and associated ICAP interface circuitry based on the targeted device family. The frame address register is also arranged differently in Virtex-4 and Virtex-5, but this can be accounted for in software [22][23]. The overall test times for Virtex-4 relative to external reconfiguration closely match those results obtained in Virtex-5. 7.6 Conclusions We have presented the results of a case study which implements the first soft-core embedded processor-based BIST approach. The approach is applicable to any FPGA with write/read access to the configuration memory from within the FPGA fabric and with sufficient configurable resources to implement both the soft-core processor and the BIST circuitry. The number of external configurations of the FPGA during any BIST session is reduced to a maximum of two (one for each half of the array) and internal reconfiguration of the resources under test are performed at the maximum allowable clock frequency and data width. Read back of ORA contents and diagnosis of faulty resources under test can be performed by the embedded soft-core processor when fault diagnosis is desired, for fault-tolerant applications for example, 126 providing a speed-up of 5.4 versus readback via the Boundary Scan interface. The approach can significantly decrease the overall test time in systems with a relatively slow external configuration interface, as was the case for the previous implementation of embedded processor- based BIST using a dedicated hard-core processor [4]. The soft-core processor approach was implemented in Virtex-5 FPGAs using the MicroBlaze processor for BIST reconfiguration, control of execution, fault injection, and fault diagnosis. Reconfiguration of the resources under test is achieved via the ICAP port in the FPGA fabric. When implemented in Virtex-5, the approach requires more testing time when compared with optimized external reconfiguration using compressed partial reconfiguration bitstreams. This is primarily due to the fact that the overall BIST approach has been architected for optimum configuration file compression. This includes orienting the BIST architecture with the configuration memory for maximizing the effectiveness of compressed download files with multi-frame write features, partial reconfiguration of the resources under test by maintaining constant placement and routing between test phases, and a single pass/fail indication to avoid partial configuration memory read back for BIST results. This is a testament to the advanced state of FPGA BIST techniques as well as the features and capabilities offered by FPGA manufacturers to decrease configuration times. However, the soft-core processor approach is significantly faster than configuration with full or compressed configuration bitstreams alone. Only two downloads are required for each BIST session when the embedded processor-based approach is used, compared to six configurations for CLB east/west tests and nine for SerDes tests for example. BIST control, execution and fault diagnosis implemented in the embedded processor eliminate the need for complex external test equipment for manufacturing testing and intelligent external BIST 127 controllers for in-system testing and diagnosis in fault-tolerant applications. The architecture is applicable to any BIST for Virtex-4 and Virtex-5 FPGAs without modification of the embedded processor hardware; only the MicroBlaze program memory contents need to be changed. 7.7 Acknowledgements The contents of this chapter are accepted for publication in Proc. IEEE Southeastern Symposium on System Theory, 2010. Prof. Charles Stroud is a co-author on the paper. A majority of the actual research and the writing of the published paper represents the efforts of the primary student author and not collaborators, and the research represents work performed while in the graduate program at Auburn University. 7.8 References [1] B. Dutton and C. Stroud, ?Soft-core Embedded Processor Based Built-In Self-Test of FPGAs,? Proc. IEEE Int. Symp. On Defect and Fault Tolerence in VLSI Sys., pp. 29-37, 2009. [2] Virtex-5 FPGA User Guide, UG190(v4.2), Xilinx, 2008. [3] L-T Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures, San Francisco, CA: Morgan Kaufmann, 2007. [4] S. Toutounchi and A. Lai, ?FPGA Test Coverage,? Proc. IEEE Int. Test Conf., pp. 1248- 1257, 2003. [5] M. Abramovici and C. Stroud, ?BIST-Based Test and Diagnosis of FPGA Logic Blocks, IEEE Trans. on VLSI Systems, vol. 9, no. 1, pp. 159-172, 2001. [6] C. Stroud, J. Nall, M. Lashinsky and M. Abramovici, ?BIST-Based Diagnosis of FPGA Interconnect,? Proc. IEEE Int. Test Conf., pp. 618-627, 2002. [7] C. Stroud and J. Bailey, ?Bridging Fault Extraction from Physical Design Data for Manufacturing Test Development, Proc. IEEE Int. Test Conf., pp. 760-769, 2000. [8] C. Stroud, K. Leach and T. Slaughter, ?BIST for Xilinx 4000 and Spartan Series FPGAs: A Case Study, Proc. IEEE Int. Test Conf., pp. 1258-1267, 2003. 128 [9] C. Stroud, S. Garimella and J. Sunwoo, ?On-Chip BIST-Based Diagnosis of Embedded Programmable Logic Cores in System-On-Chip Devices,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 308-313, 2005. [10] J Sunwoo and C. Stroud, ?Built-In Self-Test of Configurable Cores in SoCs Using Embedded Processor Dynamic Reconfiguration,? Proc. Int. SoC Design Conf., pp. 174- 177, 2005. [11] S. Vemula and C. Stroud, Built-In Self-Test for Programmable I/O Buffers in FPGAs and SOCs, Proc. IEEE Southeastern Symp. on System Theory, pp. 534-538, 2006. [12] S. Dhingra, S. Garimella, A. Newalkar and C. Stroud, ?Built-In Self-Test for Virtex and Spartan II FPGAs Using Partial Reconfiguration,? Proc. IEEE North Atlantic Test Workshop, pp. 7-14, 2005. [13] D. Milton, S. Dhingra and C. Stroud, ?Embedded Processor Based Built-In Self-Test and Diagnosis of Logic and Memory Resources in FPGAs,? Proc. Int. Conf. on Embedded Systems and Applications, pp 87-93, 2006. [14] B. Garrison, D. Milton, and C. Stroud, ?Built-In Self-Test for Memory Resources in Virtex-4 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Apps., pp. 63-68, 2009. [15] M. Pulukuri and C. Stroud, ?Built-In Self-Test of Digital Signal Processors in Virtex-4 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 34-38, 2009. [16] J. Yao, B. Dixon, C. Stroud and V. Nelson, ?Built-In Self-Test of Programmable Interconnect in Virtex-4 FPGAs,? Proc. IEEE Southeastern Symp, on System Theory, pp. 29-33, 2009. [17] B. Dutton and C. Stroud, ?Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 230-234, 2009. [18] B. Dutton and C. Stroud, ?Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 235-239, 2009. [19] MicroBlaze Processor Reference Guide, UG081(v.9.0), Xilinx, 2008. [20] B. Dutton and C. Stroud, ?Embedded Processor Based Fault Injection and SEU Emulation for FPGAs,? Proc. Int. Conf. on Emb. Systems and Apps., pp. 183-189, 2009. [21] Virtex-4 FPGA User Guide, UG070 (v2.5), Xilinx, 2008. [22] Virtex-4 FPGA Configuration User Guide, UG071(v1.1), Xilinx, 2008. [23] Virtex-5 FPGA Configuration User Guide, UG191(v2.7), Xilinx, 2008. [24] R. Rajsuman, ?Testing a System-On-Chip with Embedded Microprocessor,? Proc. IEEE Int. Test Conf., pp. 499-508, 1999. 129 Chapter Eight. On-line Single Event Upset Detection and Correction in Field Programmable Gate Array Configuration Memories Larger field programmable gate array (FPGA) configuration memories and shrinking design rules have raised concerns about single event upsets (SEUs), especially for high- reliability, high-availability systems that use FPGAs. We present a design for the on-line detection and correction of SEUs in the configuration memory of Xilinx Virtex-4 and Virtex-5 FPGAs. The design corrects all single-bit errors and detects all double-bit errors in the configuration memory at maximum speed and with minimal overhead and power dissipation. A method for SEU emulation in the configuration memory of FPGAs is presented which enables the experimental verification of the approach. The results of SEU emulation in Xilinx FPGAs are discussed. 8.1 Introduction The increased use of field programmable gate arrays (FPGAs) for implementing digital logic applications over the past two decades has been accompanied by increased concern about radiation effects, and, in particular, the effects of single event upsets (SEUs). In addition to memory elements such as flip-flops, look-up tables (LUTs), and random access memory (RAM) cores, FPGAs contain a large static random access memory (SRAM), referred to as the configuration memory, which establishes the overall system application performed by the FPGA. An SEU induced bit-flip in the SRAM configuration memory can therefore alter the functionality of the FPGA. This, coupled with the large size of the configuration memory, makes SEUs of significantly more concern in FPGAs than in traditional application specific integrated circuits 130 (ASICs). In Xilinx Virtex-5 FPGAs, for example, the configuration memory alone represents greater than 99% of all memory elements in a given device, as summarized in Table 8.1, where the LX30T represents one of the smallest FPGAs in the Virtex-5 family and the LX330T represents one of the largest [26][27]. Table 8.1: Memory resources in two Virtex-5 FPGAs Number of Memory Elements Memory Type LX30T % of Total LX330T % of Total Flip-Flops 19,200 0.2% 207,360 0.25% LUT RAM Bits 327,680 3.5% 3,502,080 4.22% Block RAM Bits 1,327,104 14.1% 11,943,936 14.41% Configuration Bits 9,362,432 99.8% 82,687,488 99.75% Total 9,381,632 100.00% 82,894,848 100.00% Finding an accurate measurement of the susceptibility of SRAM configuration memories to SEUs has been the focus of much research, including that in [15], [18], [24] and [30]. Accelerator testing conducted with Xilinx 4000 series FPGAs indicates the SEU frequency increased by a factor of 4.74 when design rules decreased from 600nm to 350nm with a corresponding reduction in power supply voltage from 5V to 3.3V [18]. On the other hand, 90nm Xilinx Virtex-4 FPGAs are reported to have SEU FIT (failures in 109 hours) rates of 246 per 106 bits of configuration memory, while 65nm Virtex-5 FPGAs have a lower SEU FIT rate of 151 per 106 bits (adjusted for sea-level in New York, NY) [6][15]. The FIT rate per Mb of configuration memory in Xilinx FPGAs has actually been decreasing since the Virtex-II series in the year 2000. This reduction in SEU FIT rate by a factor of about 3.5 from Virtex-II to Virtex- 5, despite drastic reductions in feature size and supply voltage, indicates that Xilinx is incorporating architecture dependent SEU hardening techniques in the design of the configuration memory. This trend can be seen in Figure 8.1, where the SEU rate for each Xilinx FPGA family is plotted along with the initial release year and minimum feature size. According 131 to [16], since 2002 Xilinx has designed the configuration memory to be more robust in an attempt to reduce soft failure rates even as the size and density of the FPGA grows. That this attempt has been successful is supported by the fact that the FIT rates reported for Xilinx FPGAs are low when compared to typical SRAMs [18]. A more robust SRAM design is possible because the SRAM configuration memory remains static a majority of the time, in contrast to typical SRAM memories which are designed to be as small and as fast as possible [5][18]. 0 50 100 150 200 250 300 350 400 450 500 Virt ex Virt ex- E Virt ex- II Virt ex- II P ro Spa rtan -3 Virt ex- 4 Spa rtan -3E Virt ex- 5 FI T/ Mb C on fig M em ory Figure 8.1: FIT rate (corrected for sea-level New York, NY) versus Xilinx device family, initial release year, and minimum feature size [6] where the center line represents the nominal value and the span of the line represents the upper and lower 95% confidence levels However, even the relatively low FIT rates of Xilinx FPGAs can become problematic when considering the design of high reliability, high availability systems or systems which operate at high altitude or in space. The largest commercially available Xilinx FPGAs currently have configuration memories with more than 80 Mb [27] and in the next generation of devices the largest available FPGAs will include configuration memories of over 160 Mb in size [28]. For the 80 Mb Virtex-5 device, the FIT rate per device is 10,960 failures in 1 billion hours, or ?98 ?99 ?00 ?01 ?02 ?03 ?04 ?05 ?06 year 250 180 150 130 90 90 90 65 nm 132 mean time between failure (MTBF) of (114,155 years/10,960 FIT) = 10.4 years at sea level. At the 95% confidence level, the FIT rate is between 100-183, or MTBF between 7.8-14.3 years. However, it should be noted that an SEU in the configuration memory does not always correspond to a failure of the system. It is estimated that only between 10% [21] and 40% [24] of the configuration bits used in any given design actually affect the design functionality. Therefore, for a more accurate estimate of the MTBF the sensitivity of the design based on analysis of ?care? versus ?don?t care? configuration bits should be taken into account. Nevertheless, for SRAM FPGAs to be adopted for critical avionics and space applications where little or no risk is acceptable, an effective SEU mitigation plan must be implemented. In addition, systems operating in high-radiation environments may require an SEU mitigation plan even if some risk is tolerable. As an example of the variance of SEU occurrence with altitude, consider that the neutron flux density increases by a factor of 383 at the typical commercial flight cruising altitude of 36,000 ft (relative to sea-level New York, NY) [9]. Techniques for hardening digital circuits against SEUs can be categorized as architecture dependent or architecture independent. An architecture dependent technique is one that requires a modification to the physical design of an integrated circuit; for example, high reliability systems can employ hardware redundancy in latches [3]. In FPGAs, however, architecture dependent SEU hardening techniques are only available if implemented by the FPGA manufacturer. Therefore, for a typical SRAM FPGA, any SEU hardening implemented by the user must be one that is architecture independent. One widely known architecture independent technique used in FPGAs is triple modular redundancy (TMR). The TMR approach triplicates all of the user logic and adds majority voters at the inputs to all flip-flops and on all primary outputs. By eliminating all single point failures, the design can be guaranteed to tolerate an SEU 133 in any of the three circuit copies. However, the overhead for a TMR approach can be prohibitive because it is greater than 200%. Therefore, to implement a TMR approach, the required size of the FPGA (in terms of resources) would necessarily be more than three times the size of the original, non-TMR design. TMR also consumes more power (approximately three times as much) and incurs a performance penalty. The implementation of TMR for designs in Xilinx FPGAs can be entirely automated using the Xilinx TMR Tool, which guarantees full SEU and single-event transient (SET) immunity [29]. However, without some additional form of configuration memory scrubbing, the accumulation of multiple SEUs over time can cause system failure even in designs with full TMR [23][29]. Another architecture independent method, configuration memory scrubbing, periodically refreshes the contents of the configuration memory without attempting to determine if an SEU has occurred. Power-cycling is essentially the simplest form of configuration memory scrubbing because the entire configuration memory can be refreshed each time the FPGA is power-cycled if the FPGA is set in master configuration mode [1]. A more intelligent approach is to externally read back words of configuration memory contents, comparing each word to a copy in a ?golden? configuration bitstream. This approach has the advantage of being able to detect any number of SEUs in the configuration memory (when compared to error correcting codes). Any mismatch between the ?golden? copy and the configuration memory contents should cause the erroneous configuration memory to be overwritten by the ?golden? configuration data. However, both approaches require a radiation hardened external configuration management unit (microprocessor or ASIC) and a radiation hardened ?golden? copy of the configuration data. The second approach also doubles the required amount of memory, because both the ?golden? 134 bitstream and a mask file, which is used to mask configuration bits which are subject to change during normal system operation, must be stored in the system [1]. Some FPGAs, including Virtex-4 and Virtex-5, incorporate a Hamming error correction code (ECC) in the configuration memory. The ECC, in conjunction with some additional user- accessible dedicated logic can be used to detect SEUs in the configuration memory. With additional user-defined circuitry in the FPGA core, erroneous configuration memory bits that result from SEUs can be not only detected, but also corrected [12]. It is this method that is the focus of this chapter where we present an efficient SEU correction circuit that works in combination with existing SEU detection mechanisms in Virtex-4 and Virtex-5 FPGAs to correct SEUs in the FPGA configuration memory. This circuit can be synthesized and incorporated with any user-defined digital application in any Virtex-4 or Virtex-5 FPGA for detection and correction of SEUs during normal on-line system operation. We begin with an overview of existing SEU detection mechanisms in Section 8.2 along with an overview of previous work in on-line SEU detection and correction in Virtex-4 FPGAs. The operation and architecture of the proposed SEU detection and correction circuit are presented in Sections 8.3 and 8.4, respectively. Experimental results and analysis from the actual implementation of the SEU detection and correction circuit in Virtex-4 and Virtex-5 FPGAs are presented in Sections 8.5 and 8.6 along with a comparison to prior work. The chapter concludes with a summary in Section 8.7. 8.2 Background Like any other RAM, the configuration memory of an FPGA is partitioned into words, also called frames, which represent the smallest addressable unit of the memory for write and read operations. Virtex-4 and Virtex-5 frames consist of 1,312 bits [25][26]. Each frame 135 includes a 12-bit field of eleven Hamming bits and an overall parity bit for the frame data. The eleven Hamming bits provide the potential for single error correction (SEC), and the overall parity bit enables double error detection (DED) over the frame data. The parity and Hamming bits are generated external to the FPGA by the configuration bitstream generation software and are subsequently downloaded with the application specific configuration data (an internal CRC check verifies the integrity of the downloaded data). However, system memory data subject to change during the operation of the FPGA, such as the contents of block RAMs and look-up tables (LUTs) used as distributed RAMs, are not covered by the parity and Hamming bits [4]. Virtex-4 and Virtex-5 provide a specialized core, called Frame ECC, for detection and identification of single and double-bit errors in the frame data [25][26]. For each frame read from the configuration memory, the Frame ECC module calculates the Hamming bits as well as the overall parity for the frame data, and compares these bits with the Hamming bits and parity for that frame stored in the configuration memory. Based on this comparison, the Frame ECC module produces indications for no error, single-bit error, and double-bit error conditions in addition to a syndrome indicating the location of single-bit errors. The error conditions for the Frame ECC core are summarized in Table 8.2. System memory contents?block RAMs and LUT RAMs, for example?are masked from the internal parity and Hamming calculation by the Frame ECC. Table 8.2: Frame ECC error codes [25][26] Error Type Condition (syndromevalid = 1) No bit error Hamming match, no parity error 1-bit correctable error (SEC) Hamming mismatch, parity error 2-bit error detection (DED) Hamming mismatch, no parity error 136 The Frame ECC function is performed each time a frame is read from the external serial Boundary Scan interface or parallel SelectMAP configuration interface [25][26]. In addition to these external configuration interfaces, Virtex-4 and Virtex-5 include a 32-bit internal configuration access port (ICAP), illustrated in Figure 8.2, that provides write/read access to/from the configuration memory from within the FPGA core. As is the case with the external interfaces, the Frame ECC function is performed each time a frame is read via the ICAP. Because the Frame ECC does not provide circuitry to perform error correction, some additional logic must be implemented in the FPGA fabric that uses the ICAP and Frame ECC modules to cycle through all frames of the configuration memory to detect SEUs and to correct those SEUs. Virtex-5 FPGAs also include dedicated circuitry in the FPGA that can automatically detect SEUs using built-in cyclic redundancy check (CRC) circuitry [26]. When Readback CRC is enabled (by setting the POST_CRC configuration option to ENABLE), the contents of the configuration memory are continuously read back in the background of the user design operation to calculate and check the CRC of the configuration memory contents. An SEU anywhere in the configuration memory will cause the re-calculated CRC to disagree with the stored CRC. The mismatch is signaled by asserting the CRC Error output of the Frame ECC (only present in Virtex-5 and not shown in Figure 8.2). Optionally, the external INIT_B output pin of the FPGA may also be driven low when the error is detected [26]. The Readback CRC will begin to run automatically upon a successful configuration of the FPGA and will continue to run as long as no configuration interfaces are in use; a configuration interface is considered to be in use after the synchronize (SYNC) command is decoded and until the de-synchronize (DESYNC) command is decoded [26]. Similar background CRC read back circuitry has been incorporated in recent Altera [21] and Lattice [14] FPGAs to support SEU detection. 137 Figure 8.2: Frame ECC and ICAP primitives An implementation of internal SEU detection and correction using the Frame ECC and ICAP logic in Virtex-4 devices was reported in [12]. The design uses an 8-bit PicoBlaze [19] soft-core processor with additional circuitry and RAM in the FPGA fabric for interfacing to the ICAP to read and write the configuration memory. The design can operate in a detection only mode, or can detect and correct single-bit errors. The design was later implemented in triple modular redundancy (TMR) [10]. While both [10] and [12] are applicable only to Virtex-4, the approach in [12] was recently extended in [5] to support Virtex-5 FPGAs. 8.3 Operation of SEU Detect and Correct Our SEU detect and correct circuit, or SEU controller as it is referred to in this chapter, is designed to be integrated into any existing VHDL-based user design with minimal effort. At the top level, there are only two inputs?clock and reset?and one output? error. The VHDL component declaration for the SEU controller is given in Figure 8.3. component seu_controller is generic(device : string(1 to 6)); port( rst : in std_logic; clock : in std_logic; error : out std_logic); end component seu_controller; Figure 8.3: SEU controller VHDL component declaration The generic device is a text string that specifies the device in which the SEU controller will be implemented, such as ?LX330T? for example. All Virtex-4 and Virtex-5 devices are ICAP_OUT[31:0] BUSY ICAP CLK_EN ICAP_IN[31:0] WRITE CLK ERROR Frame ECC SYNDROME[11:0] SYNDROMEVALID 138 supported such that only this generic need be specified by the user to indicate the target FPGA for synthesis. The error output is asserted when the first multiple-bit error is detected, and should trigger a reconfiguration of the FPGA from a reliable external memory since multiple bit errors cannot be corrected by the SEU controller. The clock input is directly connected to the ICAP clock and the SEU controller. It is limited by the maximum ICAP clock frequency of 100 MHz, but can operate at any frequency below 100 MHz. In Virtex-5 devices, the ICAP and SEU controller clock can be supplied by an internal 50 MHz oscillator [26]. The synchronous active high reset input forces the SEU controller into an inactive state, releasing the configuration interface for use by other applications. Asserting the reset input also resets the frame address to the first frame of configuration memory and clears the error output. When reset is released, the SEU controller will resume normal operation from the first frame of the configuration memory on the next rising edge of clock. The reset input may be tied to logic 0 for free-running SEU detection and correction in user designs that do not require access to the configuration memory during normal system operation. The operation of the SEU controller consists of the following steps: 1. A 1312-bit frame of configuration memory is read through the ICAP as forty-one 32-bit words and the frame data is stored in a block RAM. 2. If an error is indicated by the outputs of the Frame ECC primitive, the type of error is determined as shown in Table 8.2. If the error indicates a double-bit error, the error output of the SEU controller is latched high and read back continues with the next frame of configuration memory. If a single-bit error is indicated, the location of the bit is determined from the syndrome and the erroneous bit is corrected (i.e. inverted) in the frame data stored in the block RAM. 139 3. If a single-bit error was indicated in Step 2, the repaired frame is now written back into the configuration memory at the same frame address from which it was read. 4. If a single-bit error was indicated in Step 2, read back resumes with the first frame in the configuration column containing the newly repaired frame. 5. When a configuration column has been completely read and repaired (as determined by no single-bit error indications for any frames in for that configuration column), the SEU controller advances to the next configuration row/column in the array and repeats the process starting at Step 1. This SEU controller behavior is summarized by the pseudocode of Figure 8.4. Load starting frame address while (reset == 0) { Read single frame from configuration memory Read Frame ECC outputs if (single bit error is detected) { Translate syndrome to bit index in frame Read erroneous bit Write inverted (corrected) bit to same location Write frame back to configuration memory break } else if (double bit error is detected) { Assert ERROR output } Increment Frame Address } Figure 8.4: SEU controller behavioral pseudocode In Virtex-5 devices, the SEU controller may utilize the Read Back CRC feature of the Frame ECC module for the initial detection of an SEU with a small modification to the design. By enabling the Read Back CRC (in the design constraints file) and using the complement of the 140 CRC Error output of the Frame ECC circuit as the reset input to the SEU controller, the SEU controller will remain idle (held in active high reset) with the CRC Read Back circuit operating in the background (at the frequency of the ICAP input clock [26]). When a CRC mismatch is detected, the CRC Error output of the Frame ECC circuit is asserted, de-asserting the reset input to the SEU controller. The SEU controller will begin normal operation, cycling through the configuration memory detecting and correcting all single-bit errors. However, after the last frame of configuration memory is reached, the SEU controller will return to the reset state and wait for a falling edge on the reset input before resuming operation. By entering the reset state and releasing the ICAP configuration interface via a DESYNC command, the internal CRC Read Back will resume. This approach has the disadvantage of doubling the cycle time in the worst case since both the CRC Read Back circuit and SEU controller may require a complete cycle to detect and then repair the SEU. As observed in [5], however, this approach may offer some additional immunity to SEUs in the detection phase because the CRC Read Back circuit is implemented as dedicated logic at the physical circuit level, as opposed to the SEU controller, which is implemented in configurable resources. The INIT_B signal could be used to externally verify the correction of the SEU by ensuring the INIT_B output pin of the FPGA does not remain low longer than a predetermined time period (approximately three complete scan cycles of the FPGA configuration memory). If, however, the INIT_B remains low or the error output of the Frame ECC is asserted, the error is not repaired and the configuration memory should be refreshed from a radiation hardened ?golden? copy. 8.4 SEU Detect and Correct Architecture Our SEU controller is implemented entirely in configurable logic blocks (CLBs) and one 18 Kb block RAM in the FPGA fabric. It is constructed primarily around the ICAP and Frame 141 ECC primitives [25][26]. The operation of the SEU controller, described in the previous section, is managed with a finite state machine (FSM) implemented in CLB logic slices. The FSM initiates reads and writes to the FPGA internal configuration memory and control registers via the 32-bit ICAP interface. A set of sixty-four 32-bit instructions are stored in a 32?64-bit read- only memory (ROM) formed in 32 LUTs (6-inputs each) in Virtex-5 and 128 LUTs (4-inputs each) in Virtex-4. The 32?64-bit LUT ROM is addressed by a counter that is enabled by combinational logic from the FSM current state. The FSM also generates the frame address for reads and writes of the configuration memory. All reads from and writes to the configuration memory are 32-bits. The logic for the frame address counter is device dependent since every device has different numbers of rows and/or columns. Furthermore, the arrangement of different types of columns (e.g. CLB, DSP, RAM, etc.) can vary depending on the device. The generic device (shown in Figure 8.3) is used to determine and synthesize the correct frame address logic for the target device. The central component of the SEU controller architecture is the dual-port block RAM (at least two columns of 18 Kb block RAMs are included in every Virtex-4 and Virtex-5 device). A single block RAM is used to store each frame as it is read from the configuration memory. The A port of the block RAM is configured for 32-bit read/write access, and the B port is configured for 1-bit read/write access, as illustrated in Figure 8.5. The data inputs of the A port are connected directly to the outputs of the ICAP, and the A port data outputs are connected to the ICAP inputs via a 32-bit 2-to-1 multiplexer. 142 Figure 8.5: SEU controller block diagram The A port address inputs are controlled by a counter in the FSM. Every frame that is read from the ICAP is stored in the first forty-one 32-bit words of the block RAM. Single-bit errors are corrected via the 1-bit B port interface. The B port address inputs are connected to combinational logic which provides the bit offset of the bit in error based on the Frame ECC syndrome outputs. The 1-bit B port data output is inverted connected to the 1-bit B port input. The B port write enable is controlled by combinational logic from the syndromevalid and ECC error Frame ECC outputs in conjunction with the FSM. The location of single-bit errors within the frame is indicated by the syndrome[10:0] outputs of the Frame ECC primitive, however some additional combinational computational logic is required to determine the exact bit-offset of the error within the configuration frame. An equation for determining the bit-offset of the error in the range 0-1311 is given by: BRAM DINA DOUTA WEA ADDRB WEB DINB DOUTB ADDRA 12 32 syndromevalid 32 SEU Controller Logic & Instruction ROM 32 Frame ECC Clock Reset ICAP Error 11 16 ECC error 143 offset = {S[10:5] ? 6'd22 ? S[10], S[5:0]} (8.1) where S[10:0] are the Frame ECC syndrome outputs [25][26]. Otherwise, if the binary value of syndrome[10:0] is 0 or a power of 2, then the error is located in one of the Hamming bits, in which case the location of the bit error is determined as shown in Table 8.3. The output of the syndrome combinational logic is tied to the B port address inputs. In this manner, the erroneous bit, as indicated by syndrome[11:0], is inverted when the block RAM B port write enable is asserted. The repaired frame is then written back into the configuration memory via the A port 32-bit output to the ICAP. Table 8.3: Hamming bit error diagnosis [25][26] syndrome[11:0] offset syndrome[11:0] offset 100000000001 640 100001000000 646 100000000010 641 100010000000 647 100000000100 642 100100000000 648 100000001000 643 101000000000 649 100000010000 644 110000000000 650 100000100000 645 100000000000 651 A rare, but potentially problematic situation can arise when an odd number of bit errors occur in a single frame of configuration memory. These errors will cause both a syndrome mismatch and overall parity mismatch, which aliases as a correctable single-bit error (refer to Table 8.2). However, in this case, the syndrome outputs do not necessarily indicate the location of any of the actual errors, and can erroneously point anywhere in the range 0 to 211?1 (2047). Since the actual frame data only exists in the range 0 to 1311, the following two scenarios are possible. First, the odd-multiple bit error aliases as a single-bit error with the syndrome outputs pointing in the valid range of the frame data 0 to 1311. In response to the single-bit error 144 indication, the SEU controller will invert the frame-bit pointed to by the syndrome, which may satisfy the Hamming code by creating a valid distance code word, and the modified frame will be written back into the configuration memory. The SEU controller will resume read back at the start of the configuration column containing the still damaged frame. When the erroneous frame, now containing an even number of multiple errors, is read, the valid code word will cause a Hamming code match and an overall parity-bit match such that a ?no bit error? indication is obtained. However, by incorporating the CRC Read Back mechanism with the SEU controller, as described in Section 8.3, this multiple bit error can be detected because the CRC will continue to indicate a CRC Error with the SEU controller indicating no error. In the second scenario, when the frame containing an odd number of errors greater than one is read, the syndrome indicates an error bit location in the range from 1312 to 2047. This range, while a valid address in the larger block RAM, lies outside of the range of valid frame data. Therefore, if events are allowed to proceed as in the first scenario, unmodified frame data would be written back into the configuration memory, effectively creating an infinite loop, since the same frame would be continually read from and written to the configuration memory without modification. Our solution is to include a greater-than comparator in the SEU controller which detects when the syndrome points outside of the range of valid frame data (0 to 1311). When this condition occurs, the SEU controller ignores the syndrome and asserts the error output, indicating the existence of a multiple-bit error and that the FPGA configuration bitstream data should be reloaded from a reliable external memory. 145 8.5 Implementation Results The greatest benefit of our SEU controller when compared to other approaches is the relatively high speed at which errors are detected and corrected. SEUs should be corrected with a minimum amount of latency so that errors in the programming of the user logic persist for the shortest possible period of time. Figure 8.6 shows the time required for one full cycle of single- bit error correction and double-bit error detection in Virtex-4 devices for the Xilinx SEU controller described in [12] and our SEU controller, where a cycle is defined as the time to perform the operation over every configuration memory frame in the device, excluding frames containing block RAM contents. The cycle time also corresponds to the maximum amount of time that one SEU can persist in the configuration memory. The Xilinx Virtex-4 SEU controller can operate in two modes: single and double-bit error detection only mode, and single-bit error correction and double-bit error detection mode [12]. As shown in Figure 8.6, the Xilinx ?detect only? cycle time is nearly identical to our detect and correct mode. However, when single-bit error correction is enabled, the total cycle time for the Xilinx Virtex-4 SEU controller increases to about 20 times that of our normal detect and correct cycle time. On average, our SEU controller reduces the total cycle time for SEC and DED with respect to the Xilinx SEU controller by 94.7%. Figure 8.7 shows the total cycle time for our SEU controller in Virtex-5 devices. The cycle time is increased by an average of 17 ?s for each SEU detected and corrected. The repair time for one frame is negligible. However, the cycle time would double if there were one SEU present in every configuration column. 146 Figure 8.6: SEU controller LOG cycle time vs. Virtex-4 device 110 10 0 10 00 LX 15 LX 25 LX 40 LX 60 LX 80 LX 100 LX 160 LX 200 SX 25 SX 35 SX 55 FX 12 FX 20 FX 40 FX 60 FX 100 FX 140 C y c l e T i m e ( m s ) De te ct an d Co rre ct Xi lin x De te ct On ly Xi lin x De te ct an d Co rre ct 147 Figure 8.7: SEU controller cycle time vs. Virtex-5 device 04812162024 L X30 LX 50 LX 85 L X11 0 LX 155 LX 220 LX 330 LX T20 LX T30 LX T50 LX T85 L XT 110 LX T15 5 LX T22 0 LX T33 0 SX T35 SX T50 SX T95 C y c l e T i m e ( m s ) Ov er he ad Fr am e Re ad 148 To increase the reliability of the Xilinx SEU controller, the authors of [10] used the Xilinx TMR Tool [29] to implement the Xilinx Virtex-4 SEU controller with full TMR. However, as the results in [10] show, this approach may be impractical for some applications because of its high area overhead. A comparison of the device utilization for the Xilinx SEU controller [12], the Xilinx SEU controller with TMR [10], and our approach implemented in Virtex-4 is summarized in Table 8.4. While the Xilinx approach uses 23 fewer slices, we use one less block RAM and complete each cycle of the configuration memory an average of 20 times faster. The Xilinx Virtex-4 SEU controller with TMR utilizes 1,308 logic slices and 6 block RAMs [10] ? a 770% increase in area versus the non-TMR SEU controller. Table 8.4: SEU controller resource utilization in Virtex-4 devices Resource Xilinx [12] Xilinx TMR [10] SEU Controller # Slices 149 1308 182 # Block RAMs (18 Kb) 2 6 1 Avg Cycle (ms) 105.5 105.5 5.603 # Lines VHDL 3656 -- 1051 A comparison of our SEU controller with the recently proposed Xilinx Virtex-5 SEU controller [5] is given in Table 8.5. Our approach uses one less block RAM and 30 fewer slices. The cycle time for the Xilinx Virtex-5 SEU controller approach was not reported. However, due to the similarity of the Virtex-4 [12] and Virtex-5 [5] SEU controller architectures, our Virtetx-5 SEU controller is likely to have a speed-up factor similar to that observed in Virtex-4. Table 8.5: SEU controller resource utilization in Virtex-5 devices Resource Xilinx [5] SEU Controller # Slices 95 65 # Block RAMs (18 Kb) 2 1 Average Cycle Time (ms) -- 9.338 # Lines VHDL 2625 945 149 Our SEU controller could also be implemented using the Xilinx TMR Tool to mitigate the risk of failure due to an SEU, as was done in [10] for the Xilinx Virtex-4 SEU controller TMR design. This approach would essentially allow two error-free SEU controllers to correct an SEU affecting the third SEU controller. However, the configurable routing resources surrounding the ICAP and Frame ECC cores could still be vulnerable to SEUs since these modules and their interfaces cannot be replicated. 8.6 Experimental Results Our SEU controller has been synthesized for all Virtex-4 and Virtex-5 FPGAs. Furthermore, the SEU controller has been downloaded and verified on Virtex-4 FX12, SX35, and LX60 devices as well as Virtex-5 LX30T, LX50T, SX35T, SX50T, FX30T and FX70T devices. The number of utilized CLB logic slices has been observed to vary by ?3 slices in both Virtex-4 and Virtex-5 devices depending on the device and the area optimization used with the place and route software. During synthesis, the SEU controller logic and block RAM may be constrained to any area of the FPGA or may be left unconstrained for automatic placement with the user?s system function. The routed SEU controller in a Virtex-5 LX30T device is shown in Figure 8.8 where its location was constrained to the area shown. The dynamic power dissipation of the SEU controller was measured on both Virtex-4 and Virtex-5 FPGAs and found to be less than 5 mW at 100 MHz. Power requirements for the previous approaches in [5], [10] and [12] were not reported. 150 Figure 8.8: Routed SEU controller implemented in Virtex-5 LX20T device For design verification and analysis, we developed an approach to emulate SEUs in the configuration memories of Virtex-4 and Virtex-5 FPGAs using a configuration memory read- modify-write process [8] similar to the approach described in [11]. The read-modify-write process is executed by an external computer connected to the FPGA via the Boundary Scan configuration interface. A list of configuration bit addresses is generated by software we developed to select random locations for SEU injection. Our SEU list generation software also allows for control of the locations of the SEUs to either a specific region or the entire configuration memory. Additionally, a rectangular area of the FPGA can be masked such that SEUs are randomly located outside of the mask area. Our approach is capable of injecting any 151 number of errors in the configuration memory, simultaneously or individually, as determined by the length of the SEU target list [8]. This SEU emulation approach was shown in [11] to reproduce 97% of actual SEU and SET induced faults in radiation chamber experiments. Furthermore, because the entire configuration memory is accessible, greater than 99% of all possible SEUs in the configuration memory of a given Virtex-5 FPGA can be emulated with this approach (refer to Table 8.1). The analysis process begins by configuring the target device with the error-free SEU controller configuration. The SEU controller is held in reset while the SEUs are injected into the configuration memory via the Boundary Scan interface. For each SEU in the list, the corresponding frame of configuration memory is read back from the target device to the external computer. The SEU emulation bit in the frame is inverted, and the frame is written back to the same location in the configuration memory. After injection of the SEU(s), the SEU controller is released from reset and executed for one or more complete cycles. The number of single-bit and multiple-bit errors reported by the SEU controller are recorded by internal counters included for analysis and verification only, and these count values are read via the Boundary Scan interface at the end of the error detection/correction cycles. The success of the SEU controller is determined by comparing the values in the counters to the number of SEUs contained in the original list. Emulated configuration memory SEUs are classified in two categories. The first category includes all SEUs that are detected and corrected normally, as verified by a comparison of the retrieved count values and the original SEU list. The second category encompasses any SEU that affects the operation of the SEU controller such that either the SEU cannot be detected and corrected or the values contained in the counters are incorrect or cannot be retrieved for verification. Note that a slight penalty is incurred for the inclusion of the counters, which are 152 susceptible to SEUs, and could produce a failing pattern despite the correction of the emulated SEU. A total of 8,000 randomly generated SEUs were individually injected in the configuration memory of a Virtex-5 LX50T and the result of each trail was recorded. Our trials showed that, of the 8,000 random SEUs, all but 178 were detected and corrected in the first full execution cycle, yielding a probability of detection and correction of 97.78%. Considering the SEU locations to be randomly distributed, independent samples, the lower bound for the probability of correction of SEUs at the 99% confidence level is 97.30% [22]. Therefore, the likely probability of detection and correction of any number of simultaneous SEUs greater than one is given by: Pr(correction) = [1?Pr(failure)]N where N is the number of simultaneously occurring SEUs. The results of SEU emulation for 1000 SEUs in four Virtex-5 devices are shown in Table 8.6. In our trials, 100% of SEUs that lie outside of the area of the configuration memory that controls the functionality of the SEU controller are corrected. The experimental success rates for [10] and [12] were not reported. Table 8.6: SEU emulation results Device Slice Count Pop. Size (Mb) Corrected/Injected Pr(correction) 99% Confidence LX30T 59 7.29 950/1000 93.22% SX35T 60 9.26 955/1000 93.46% LX50T 59 10.9 980/1000 96.86% SX50T 60 13.9 967/1000 94.96% LX50T 59 10.9 7822/8000 97.35% In general, the percentage of correctable SEUs is positively correlated to the size of the configuration memory of the given device because the number of configuration bits affecting the SEU controller functionality are fixed in relation to the total size of the configuration memory. According to the data provided in [5], the adjusted FIT rate, considering only the vulnerable bits which implement the SEU controller functionality, may be approximately calculated based on 153 the number of resources in use by the SEU controller and the number of configuration bits affecting the programming of each type of resource (shown in Table 8.7). For the Xilinx Virtex- 5 SEU controller, the approximate number of sensitive configuration bits was reported to be 113,365 bits, or 0.108 Mb, yielding a nominal FIT rate of 16.33, or MTBF of approximately 6,992 years [5]. For our SEU controller, which utilizes less logic resources in Virtex-5, there are approximately [(65 ? 1,181) + (1 ? 585)] = 77,350 bits, or 0.0738 Mb, that are sensitive to SEUs. Therefore, the adjusted FIT rate for our SEU controller is 11.14, or MTBF of approximately 10,247 years. As was observed in the SEU emulation results, the adjusted FIT rate for designs protected by the SEU controller is independent of the device size because the size of the SEU controller is approximately device independent. Table 8.7: Approximate number of configuration bits for common resources [5] Resource Approximate number of configuration bits Logic Slice 1,181 Block RAM (36 Kb) 1,170 Block RAM (18 Kb) 585 I/O Tile 2,657 DSP48E Slice 4,592 8.7 Conclusions The increased use of FPGAs for implementing digital systems, in conjunction with their larger configuration memories and shrinking design rules, has raised concerns about the effects of SEUs, particularly for high-altitude and space applications as well as for high-reliability, high- availability applications. As a result, some FPGA manufacturers are reducing the FIT rate through their design of the configuration memory and by incorporating modules that support SEU detection, such as the Frame ECC and ICAP in recent Xilinx FPGAs [25][26] and CRC background check circuitry in recent Altera [21] and Lattice [14] FPGAs. We have presented an 154 SEU controller applicable to all Xilinx Virtex-4 and Virtex-5 FPGAs that is capable of correcting single-bit errors and detecting double-bit errors in the FPGA configuration memory, which represents greater than 99% of all memory elements susceptible to SEUs. Note that block RAMs account for the second largest percentage (approximately 14%) of memory elements susceptible to SEUs. However, recent Xilinx [25][26] and Altera [21] FPGAs include RAMs cores with user optional ECC modes of operation. The SEU controller VHDL is easily integrated with any existing user design with minimal resource overhead and power dissipation. Our approach detects and corrects errors in the configuration memory 20 times faster than other reported approaches in [10] and [12]. In addition, our design is less susceptible to SEU induced failure because it uses less logic resources, which results in a failure rate improvement of about 46.6% for Virtex-5 FPGAs. Finally, TMR techniques can be used to prevent SEUs that occur within the configuration bits that establish the SEU controller logic from causing the SEU controller to fail in high-reliability, high-availability applications. 8.8 Acknowledgements The contents of this chapter are published under the title ?On-line Single Event Upset Detection and Correction in FPGAs Configuration Memories? in The ISCA International Journal on Computers and Their Applications, Vol. 17, No. 2. Prof. Charles Stroud is a co-author on the journal article. The journal article is an extended version of the work previously published in Proceedings of the ISCA International Conference on Computers and Their Applications, 2009, pp. 57-62, under the title ?Single Event Upset Detection and Correction in Virtex-4 and Virtex-5 FPGAs?. A majority of the actual research and the writing of the published paper represents the efforts of the primary student author and not collaborators, and the research represents work performed while in the graduate program at Auburn University. 155 8.9 References [1] B. Bridgford, C. Carmichael, and C. Tseng, ?Single-Event Upset Mitigation Selection Guide,? XAPP987 (v1.0), Xilinx Inc., March 2008. [2] M. Caffrey, P. Graham, E. Johnson, M. Wirthlin, C. Carmichael, ?Single-Event Upsets in SRAM FPGAs,? Military and Aerospace Programmable Logic Devices Conf., Sept. 2002. [3] T. Calin, M. Nicolaidis, and R. Velazco, ?Upset Hardened Memory Design for Submicron CMOS Technology,? IEEE Trans. on Nuclear Science, vol. 43, no. 6, pp. 2874-2878, Dec. 1996. [4] C. Carmichael and C. Wei Tseng, ?Correcting SEUs in Virtex-4 Platform FPGA Configuration Memory,? XAPP988, (v1.0), Xilinx Inc., March 2008. [5] K. Chapman and L. Jones, ?SEU Strategies for Virtex-5 Devices,? XAPP864 (v1.0.1), Xilinx Inc., March 2009. [6] Device Reliability Report: Fourth Quarter 2008, UG116 (v5.3) , Xilinx Inc., Feb. 2009. [7] B. Dutton and C. Stroud, ?Single Event Upset Detection and Correction in Virtex-4 and Virtex-5 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 57- 62, April 2009. [8] B. Dutton, M. Ali, J. Sunwoo and C. Stroud, ?Embedded Processor Based Fault Injection and SEU Emulation for FPGAs,? Proc. Int. Conf. on Embedded Systems and Applications, pp. 183-189, July 2009. [9] ?Flux Calculation,? , April 2009. [10] J. Heiner, N. Collins, and M. Wirthlin, ?Fault Tolerant ICAP Controller for High- Reliable Internal Scrubbing,? Proc. IEEE Aerospace Conf., pp. 1-10, March 2008. [11] E. Johnson, M. Caffrey, P. Graham, N. Rollins, and M. Wirthlin, ?Accelerator Validation of an FPGA SEU Simulator,? IEEE Trans. on Nuclear Science, vol. 50, no. 6, pp. 2147- 2157, Dec. 2003. [12] L. Jones, ?Single Event Upset (SEU) Detection and Correction Using Virtex-4 Devices,? XAPP714 (v 1.5), Xilinx Inc., Jan. 2007. [13] F. Kastensmidt, L. Carro, and R. Reis, Fault-Tolerance Techniques for SRAM-based FPGAs, Frontiers in Electronic Testing, Vol. 32, Dordrecht, The Netherlands, Springer, 2006. [14] ?LatticeECP3 Soft Error Detection (SED) Usage Guide,? TN1184 (v1.0), Lattice Semiconductor Inc., Feb. 2009. 156 [15] A. Lesea, ?Continuing Experiments of Atmospheric Neutron Effects on Deep Submicron Integrated Circuits,? WP286 (v1.0), Xilinx Inc., March 2008. [16] A. Lesea and P. Alfke, ?Xilinx FPGAs Overcome the Side Effects of Sub-90 nm Technology,? WP256 (v1.0.1), Xilinx Inc., March 2007. [17] A. Lesea, S. Drimer, J. Fabula, C. Carmichael, and P. Alfke, ?The Rosetta Experiment: Atmospheric Soft Error Rate Testing in Differing Technology FPGAs,? IEEE Trans. on Device and Materials Reliability, Vol. 5, No. 3, pp. 317-328, Sept. 2005. [18] M. Ohlsson, P. Dyreklev, and K. Johansson, ?Neutron Single Event Upsets in SRAM- based FPGAs,? Proc. IEEE Radiation Effects Data Workshop, pp. 177-180, July 1998. [19] PicoBlaze 8-bit Embedded Microcontroller User Guide, UG129 (v1.1.2), Xilinx Inc., June 2008. [20] H. Quinn, P. Graham, K. Morgan, M. Caffrey and J. Krone, "A Test Methodology for Determining Space-Readiness of Xilinx SRAM-based FPGA Designs," Proc. IEEE Automatic Test Conf. (AUTOTESTCON), pp. 252-258, Sept. 2008. [21] ?Robust SEU Mitigation with Stratix III FPGAs,? WP-01012-1.0, Altera Inc., Jan. 2007. [22] J. Sauro and J.R. Lewis, ?Estimating Completion Rates From Small Samples Using Binomial Confidence Intervals,? Proc. Human Factors and Ergonomics Society, pp. 2100-2104, 2005, available at . [23] L. Sterpone and M. Violante, ?A Design Flow for Protecting FPGA-based Systems Against Single Event Upsets," Proc. IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems, pp. 436-444, Oct. 2005. [24] P. Sundararajan, S. McMillan, B. Blodget, C. Carmichael, and C. Patterson, ?Estimation of Single Event Upset Probability Impact of FPGA Designs,? Military and Aerospace Programmable Logic Devices Conf., Sept. 2003. [25] Virtex-4 FPGA Configuration User Guide, UG071 (v1.10), Xilinx Inc., April 2008. [26] Virtex-5 FPGA Configuration User Guide, UG191 (v3.6), Xilinx Inc., Feb. 2009. [27] Virtex-5 Family Overview, DS100 (v5.0), Xilinx Inc., Feb. 2009. [28] Virtex-6 Family Overview, DS150 (v1.0), Xilinx Inc., Feb. 2009. [29] Xilinx TRMTool User Guide: TMRTool Software Version 9.2i, UG156 (v2.2), Xilinx Inc., 2009. [30] C. Yui, G. Swift, and C. Carmichael, ?Single Event Upset Susceptibility Testing of the Xilinx Virtex-II FPGA,? Military and Aerospace Programmable Logic Devices Conf., Sept. 2002. 157 Chapter Nine. Summary and Conclusions This chapter concludes and summarizes the thesis. First, a summary of the work presented in this thesis is provided, followed by suggestions for future research and any improvements to the work. 9.1 Summary of Work A BIST approach was presented for the CLBs in Virtex-5 FPGAs. A total of 17 configurations were used to obtain 100% stuck-at fault coverage in every CLB in any Virtex-5 device. Gate level fault simulation and configuration memory fault emulation were used for the development and verification of test configurations and for calculating fault coverage. A new ORA design was introduced which provides a single-bit pass/fail result for all of the resources under test. This ORA design has since been used in every BIST configuration that has been developed for Virtex-4 and Virtex-5 FPGAs. The overall test time is minimized by using partial reconfiguration of the resources under test and the single-bit pass/fail indication at the conclusion of each test session. However, for fault diagnosis, the contents of every ORA may be retrieved via partial configuration memory readback, and the locations of faults determined algorithmically based on the locations of the failing ORAs. This thesis also presented a BIST approach for the I/O Tiles in Virtex-5 FPGAs. This approach shares many features of the approach for CLBs, including pseudo-exhaustive testing of the embedded resources and comparison-based output response analysis (using the improved ORA design with single-bit pass/fail). One interesting difference with the I/O BIST approach is the ability to apply a limited number of deterministic test patterns using block RAMs in the 158 FPGA fabric to store the test pattern set. For Virtex-4, 512 test patterns could be stored in a block RAM, and in Virtex-5 the number increased to 1024. However, due to the lack of any gate-level description of the I/O Tiles in Xilinx devices, it is difficult to evaluate the effectiveness of the test patterns. One of the most significant contributions of this work is the use of dedicated feedback routing in the I/O Tile to bypass the I/O buffer (and pad) during tests of the digital logic resources in the I/O Tile. This effectively separates the digital logic portion of the I/O tiles from the external ?analog? environment, making the approach applicable to board-level and in-system testing. Consequently, independent tests for the I/O buffers were developed. These BIST configurations are also package independent because they can test I/O tiles with both bonded and unbonded I/O buffers, which is important because synthesis tools will sometimes use the logic resources in an I/O Tile with an un-bonded I/O buffer to implement a portion of the system function. Next, a BIST approach was presented for the embedded cores in Xilinx Virtex-4 and Virtex-5 FPGAs that are used for the detection and correction of SEUs in the configuration memory of these devices. This work is related to the SEU controller that is presented later in the thesis in that the SEU controller uses these cores for detection and correction of SEUs; therefore, the fault-free operation of the cores is essential. One interesting difference between this BIST approach and the approaches presented for CLBs and I/O Tiles is that this approach was developed entirely in VHDL (as opposed to an XDL netlist). A VHDL-based approach is possible because there is only one circuit to test, and, therefore, no redundant TPG or ORA logic and no placement restrictions for the CUT. Fault injection is a well known method for emulating faults or SEUs in the configuration memory of FPGAs. However, this thesis improves upon the existing approach by performing 159 fault-injection using a soft-processor configured in the fabric of the FPGA. This approach can be used during the development of BIST for FPGA resources or for verification of SEU mitigation schemes (but not as part of the manufacturing or system-level test). For example, the fault- injection core could ?inject? a list of random SEUs while monitoring the behavior of the system function. Based upon the occurrence of errors in the system function, the actual FIT rate of the user function in any environment could be estimated, and several different SEU mitigation schemes could be quickly evaluated. The next two chapters of the thesis present a new approach for BIST of FPGAs. This approach uses a soft-core processor configured in the fabric of the FPGA under test to perform reconfiguration of the BUTs, control the BIST sequence, and even perform fault diagnosis. However, the irregularity of the embedded processor makes configuration files too large to compete with the highly optimized BIST configurations. This thesis shows that the overall test time is significantly less when performing partial reconfiguration of the full FPGA array from an external BIST controller. However, the approach may still be useful for in-system testing, especially in fault tolerant applications, because it significantly reduces the complexity of the external BIST control hardware. For example, the embedded processor can perform all of the reconfigurations of the BUTs and determine the results of the BIST, reporting a single-bit pass/fail result to the system for all of the resources under test. Finally, an approach for the on-line detection and correction of SEUs in the configuration memory of Virtex-4 and Virtex-5 FPGAs is presented. This chapter shows that no external hardware is required for the approach, because readback of configuration data and error detection and correction are all performed by additional logic included in the FPGA fabric. While greatly reducing the probability of an SEU, experimental results are provided to show that the approach 160 is not entirely immune to an SEU induced error. However, no single SEU can permanently corrupt the user function, and SEUs can only persist in the user function for a period of time equal to the cycle period of the SEU controller (i.e. the amount of time for the SEU controller to read every frame of configuration data in a given device). The thesis also shows that the cycle time and probability of an SEU induced failure are functions of the device size, with larger devices having a longer cycle time and lower probability of failure. In addition, a quantitative method for estimating the FIT rate in devices protected by the SEU controller is provided based on an approach in the previous work. 9.2 Future Work The BIST approaches presented for the CLBs and I/O Tiles in Virtex-5 FPGAs can be adapted to Virtex-6 devices with few architectural modifications. The TPGs and ORAs can be implemented in a similar manner in Virtex-6 devices (which include DSPs and Block RAMs), but the detailed test configurations will need to be modified for the new device architectures. The embedded BIST approach can also be updated to support Virtex-6 devices, but larger configuration file sizes for these devices may make the approach impractical. However, in systems with an intelligent BIST controller (embedded processor, PC, etc?) the configuration file compression methods presented in this thesis are applicable and potentially very useful for saving memory, especially for in-system testing. The SEU controller is becoming more important due to the increasing size of the configuration memory and shrinking design rules. The configuration memory size in Virtex-6 devices is on average double that of Virtex-5 devices; and because the SEU controller cycle time is a function of the size of the configuration memory, the average cycle time can be expected to double. Testing the Frame ECC logic is also more important in Virtex-6 devices. Due to the 161 doubling of the configuration frame size, there is more logic in the Frame ECC that must be tested. 162 Bibliography [1] M. Abramovici and C. Stroud, ?BIST-Based Test and Diagnosis of FPGA Logic Blocks, IEEE Trans. on VLSI Systems, vol. 9, no. 1, pp. 159-172, 2001. [2] M. Abramovici, C. Stroud, and J. Emmert, ?Online BIST and BIST-based diagnosis of FPGA logic blocks,? IEEE Trans. on Very Large Scale Integr. (VLSI) Syst., vol.12, no.12, pp. 1284-1294, 2004. [3] AT94K Series Field Programmable System Level Integrated Circuit, DS1138, Atmel Corp., 2001. [4] J. Bailey et. al., ?Bridging Fault Extraction from Physical Design Data for Manufacturing Test Development,? Proc. IEEE Int. Test Conf., pp. 760-769, 2000. [5] D. Bossen, D. Ostapko, and A. Patel, ?Optimum test patterns for parity networks,? Proc. AFIPS Fall 1970 Joint Comput. Conf., pp. 63-68, 1970. [6] B. Bridgford, C. Carmichael, and C. Tseng, ?Single-Event Upset Mitigation Selection Guide,? XAPP987 (v1.0), Xilinx Inc., 2008. [7] S. Brown and J. Rose, ?FPGA and CPLD architectures: a tutorial,? IEEE Design & Test of Computers, vol.13, no.2, pp.42-57, 1996. [8] M. Bushnell and V. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, New York: Springer, 2000. [9] M. Caffrey, P. Graham, E. Johnson, M. Wirthlin, C. Carmichael, ?Single-Event Upsets in SRAM FPGAs,? Military and Aerospace Programmable Logic Devices Conf., 2002. [10] T. Calin, M. Nicolaidis, and R. Velazco, ?Upset Hardened Memory Design for Submicron CMOS Technology,? IEEE Trans. on Nuclear Science, vol. 43, no. 6, pp. 2874-2878, 1996. [11] C. Carmichael and C. Wei Tseng, ?Correcting SEUs in Virtex-4 Platform FPGA Configuration Memory,? XAPP988, (v1.0), Xilinx Inc., 2008. [12] K. Chapman and L. Jones, ?SEU Stratagies for Virtex-5 Devices,? XAPP864 (v1.0.1), Xilinx Inc., 2009. 163 [13] P. Christie, D. Stroobandt, ?The Interpretation and Application of Rent?s Rule,? IEEE Trans. on VLSI Systems, vol. 8, no. 6, pp. 639-648, 2000. [14] P. Civera, L. Macchiarulo, M. Rebaudengo, M. Reorda and M. Violante, ?An FPGA- Based Approach for Speeding-Up Fault Injection Campaigns on Safety-Critical Circuits,? Journal of Electronic Testing: Theory and Applications, vol. 18, pp, 261?271, 2002. [15] A. Cosoroaba and F. Rivoallon, ?Achieving Higher System Performance with the Virtex- 5 Family of FPGAs,? Xilinx Inc., 2006. [16] Device Reliability Report: Fourth Quarter 2008, UG116 (v5.3) , Xilinx Inc., 2009. [17] S. Dhingra, D. Milton, and C. Stroud, ?BIST for logic and memory resources in Virtex-4 FPGAs,? Proc. IEEE North Atlantic Test Workshop, pp. 19-27, 2006. [18] S. Dhingra, S. Garimella, A. Newalker, and C. Stroud, ?Built-in self-test of Virtex and Spartan II FPGAs using partial reconfiguration,? Proc. IEEE North Atlantic Test Workshop, pp. 7-14, 2005. [19] B. Dutton, M. Ali, J. Sunwoo and C. Stroud, ?Embedded Processor Based Fault Injection and SEU Emulation for FPGAs,? Proc. Int. Conf. on Embedded Systems and Applications, pp. 183-189, 2009. [20] B. Dutton and C. Stroud, ?Built-In Self-Test of Configurable Logic Blocks in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 230-234, 2009. [21] B. Dutton and C. Stroud, ?Built-In Self-Test of Programmable Input/Output Tiles in Virtex-5 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 235-239, 2009. [22] B. Dutton and C. Stroud, ?Single Event Upset Detection and Correction in Virtex-4 and Virtex-5 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 57- 62, 2009. [23] B. Dutton and C. Stroud, ?Soft-core Embedded Processor Based Built-In Self-Test of FPGAs,? Proc. IEEE Int. Symp. On Defect and Fault Tolerence in VLSI Systems, pp. 29- 37, 2009. [24] P. Ellervee, J. Raik, K. Tammem?e and R. Ubar, ?Environment for FPGA-based Fault Emulation,? Proc. Estonian Acad. Sci. Eng., vol. 12, pp. 323?335, 2006. [25] ?Flux Calculation,? , April 2009. [26] B. Garrison, D. Milton, and C. Stroud, ?Built-In Self-Test for Memory Resources in Virtex-4 FPGAs,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 63- 68, 2009. [27] S. Gupta, J. Rajski, and J. Tyszer, ?Test pattern generation based on arithmetic operations,? Proc. IEEE Int. Conf. on Computer-Aided Design, pp. 117-124, 1994. 164 [28] J. Heiner, N. Collins, and M. Wirthlin, ?Fault-tolerant ICAP Controller for High-Reliable Internal Scrubbing,? IEEE Aerospace Conf., pp. 1-10, 2008. [29] S. Hwang, J. Hong and C. Wu, ?Sequential Circuit Fault Simulation Using Logic Emulation,? IEEE Trans. on CAD of ICs and Systems, vol. 17, no. 8, pp. 724-736, 1998. [30] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1- 2001, New York, 2001. [31] IEEE Standard Testability Method for Embedded Core-Based Integrated Circuits, IEEE Std. 1500-2005, New York, 2005. [32] C. Jia and L. Milor, ?A BIST Solution for the Test of I/O Speed,? Proc. IEEE Int. Test Conf., pp. 1023-1030, 2003. [33] E. Johnson, M. Caffrey, P. Graham, N. Rollins, and M. Wirthlin, ?Accelerator Validation of an FPGA SEU Simulator,? IEEE Trans. on Nuclear Science, vol. 50, no. 6, pp. 2147- 2157, Dec. 2003. [34] W-B Jone and C-J Wu, "Multiple fault detection in parity checkers," IEEE Trans. on Computers, vol.43, no.9, pp.1096-1099, 1994. [35] L. Jones, ?Single Event Upset (SEU) Detection and Correction Using Virtex-4 Devices,? Application Note XAPP714 (v 1.5), Xilinx Inc., 2007. [36] F. Kastensmidt, L. Carro, and R. Reis, Fault-Tolerance Techniques for SRAM-based FPGAs, Frontiers in Electronic Testing, Vol. 32, Dordrecht, The Netherlands: Springer, 2006. [37] I. Kuon and J. Rose, ?Measuring the Gap Between FPGAs and ASICs,? IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol.26, no.2, pp.203-215, 2007 [38] K. Leach et. al., ?BIST for Xilinx 4000 and Spartan Series FPGAs: A Case Study,? Proc. IEEE Int. Test Conf., pp. 1258-1267, 2003. [39] ?LatticeECP3 Soft Error Detection (SED) Usage Guide,? TN1184 (v1.0), Lattice Semiconductor Inc., 2009. [40] A. Lesea, ?Continuing Experiments of Atmospheric Neutron Effects on Deep Submicron Integrated Circuits,? WP286 (v1.0), Xilinx Inc., 2008. [41] A. Lesea and P. Alfke, ?Xilinx FPGAs Overcome the Side Effects of Sub-90 nm Technology,? WP256 (v1.0.1), Xilinx Inc., 2007. [42] A. Lesea, S. Drimer, J. Fabula, C. Carmichael, and P. Alfke, ?The Rosetta Experiment: Atmospheric Soft Error Rate Testing in Differing Technology FPGAs,? IEEE Trans. on Device and Materials Reliability, Vol. 5, No. 3, pp. 317-328, 2005. 165 [43] L. Lerner, ?Built-In Self-Test for Input/Output Tiles in Field Programmable Gate Arrays,? M.S. thesis, Dept. of Elect. and Comput. Eng., Auburn Univ., Auburn, AL, Dec. 2007. [44] L. Lerner, S. Vemula, and C. Stroud, ?System-Level BIST for Programmable I/O Buffers in FPGAs and SoCs,? Proc. IEEE North Atlantic Test Workshop, pp. 1-9, 2006. [45] MicroBlaze Processor Reference Guide, UG081(v.9.0), Xilinx Inc., 2008. [46] D. Milton, S. Dhingra, and C. Stroud, ?Embedded Processor Based Built-In Self-Test and Diagnosis of Logic and Memory Resources in FPGAs,? Proc. Int. Conf. on Embedded Systems and Applications, pp. 87-93, 2006. [47] G. Moore, ?Cramming More Components onto Integrated Circuits,? Proc. of the IEEE, vol. 86, no. 1, pp. 82-85, 1998. [48] S. Mourad and E. McCluskey, ?Testability of parity checkers,? IEEE Trans. on Industrial Electronics, vol. 36, no. 2, pp. 254-262, 1989. [49] E. Normand, ?Single Event Upset at Ground Level,? IEEE Transs on Nuclear Science, vol. 43, pp. 2742-2750, 1996. [50] M. Ohlsson, P. Dyreklev and K. Johansson, ?Neutron Single Event Upsets in SRAM- Based FPGAs,? Proc. IEEE Nuclear and Space Radiation Effects Conf., pp. 177-180, 1998. [51] PicoBlaze 8-bit Embedded Microcontroller User Guide, UG129 (v1.1.2), Xilinx Inc., 2008. [52] M. Pulukuri and C. Stroud, ?Built-In Self-Test of Digital Signal Processors in Virtex-4 FPGAs,? Proc. IEEE Southeastern Symp. on System Theory, pp. 34-38, 2009. [53] H. Quinn, P. Graham, K. Morgan, M. Caffrey and J. Krone, "A Test Methodology for Determining Space-Readiness of Xilinx SRAM-based FPGA Designs," Proc. IEEE Automatic Test Conf. (AUTOTESTCON), pp. 252-258, 2008. [54] R. Rajsuman, ?Testing a System-On-Chip with Embedded Microprocessor,? Proc. IEEE Int. Test Conf., pp. 499-508, 1999. [55] ?Robust SEU Mitigation with Stratix III FPGAs,? WP-01012-1.0, Altera Inc., Jan. 2007. [56] J. Sauro and J.R. Lewis, ?Estimating Completion Rates From Small Samples Using Binomial Confidence Intervals,? Proc. Human Factors and Ergonomics Society, pp. 2100-2104, 2005, available at . [57] R. Sedaghat, ?Routability estimation of FPGA-based fault injection,? Electronics Letters, vol. 41, no. 14, pp. 790-792, 2005. 166 [58] Semiconductor Industry Association, International Technology Roadmap for Semiconductors: 2007 edition, http://public.itrs.net. [59] T. Slaughter, C. Stroud, J. Emmert and B. Skaggs, ?Fault Injection Emulation for Field Programmable Gate Arrays,? Proc. Int. Society for Optical Eng., vol. 4525, pp. 1-9, 2001. [60] M. Smith, Application-Specific Integrated Circuits, Addison-Wesley, 1997. [61] L. Sterpone and M. Violante, ?A Design Flow for Protecting FPGA-based Systems Against Single Event Upsets," Proc. IEEE Int. Symp. on Defect and Fault Tolerance in VLSI Systems, pp. 436-444, 2005. [62] C. Stroud, A Designer?s Guide to Built-In Self-Test, Boston: Springer, 2002. [63] C. Stroud, S. Konala, P. Chen, and M. Abramovici, ?Built-in self-test of logic blocks in FPGAs,? Proc. IEEE VLSI Test Symp., pp.387-392, 1996. [64] C. Stroud and S. Garimella, ?BIST and diagnosis of multiple embedded cores in SoCs,? Proc. Int. Conf. on Embedded Systems and Applications, pp. 130-136, 2005. [65] C. Stroud, S. Garimella and J. Sunwoo, ?On-Chip BIST-Based Diagnosis of Embedded Programmable Logic Cores in System-On-Chip Devices,? Proc. ISCA Int. Conf. on Computers and Their Applications, pp. 308-313, 2005. [66] C. Stroud, J. Harris, S. Garimella, and J. Sunwoo, ?Built-in self-test for system-on-chip: a case study,? Proc. IEEE Int. Test Conf., pp. 837-846, 2004. [67] C. Stroud, K. Leach, and T. Slaughter, ?BIST for Xilinx 4000 and Spartan series FPGAs: a case study,? Proc. IEEE Int. Test Conf., pp. 1258-1267, 2003. [68] C. Stroud, J. Nall, M. Lashinsky and M. Abramovici, ?BIST-Based Diagnosis of FPGA Interconnect,? Proc. IEEE Int. Test Conf., pp. 618-627, 2002. [69] P. Sundararajan, S. McMillan, B. Blodget, C. Carmichael, and C. Patterson, ?Estimation of Single Event Upset Probability Impact of FPGA Designs,? Military and Aerospace Programmable Logic Devices Conf., 2003. [70] J. Sunwoo and C. Stroud, ?Built-In Self-Test of Configurable Cores in SoCs Using Embedded Processor Dynamic Reconfiguration,? Proc. Int. SoC Design Conf., pp. 174- 177, 2005. [71] S. Toutounchi and A. Lai, ?FPGA test and coverage,? Proc. IEEE Int. Test Conf., pp. 599-607, 2002. [72] A. van de Goor, Testing Semiconductor Memories Theory and Practice, Hoboken: John Wiley and Sons, 1991. 167 [73] S. Vemula and C. Stroud, ?Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs?, Proc. IEEE Southeastern Symp. on System Theory, pp. 534-538, 2006. [74] Virtex-4 FPGA Configuration User Guide, UG071 (v1.1), Xilinx Inc., 2008. [75] Virtex-4 FPGA User Guide, UG070 (v2.5), Xilinx Inc., 2008. [76] Virtex-5 Family Overview, DS100 (v5.0), Xilinx Inc., 2009. [77] Virtex-5 FPGA Configuration User Guide, UG191 (v3.2), Xilinx Inc., 2008. [78] Virtex-5 FPGA ExtremeDSP Design Considerations: User Guide, UG193 (v3.3), Xilinx Inc., 2009. [79] Virtex-5 FPGA User Guide, UG190(v4.2), Xilinx Inc., 2008. [80] Virtex-6 Family Overview, DS150 (v1.0), Xilinx Inc., 2009. [81] L-T Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures, San Francisco: Morgan Kaufmann, 2007. [82] L-T Wang, C-W Wu, and X. Wen, VLSI Test Principles and Architectures, San Francisco: Morgan Kaufmann, 2006. [83] Xilinx TRMTool User Guide: TMRTool Software Version 9.2i, UG156 (v2.2), Xilinx Inc., 2009. [84] XPS HWICAP Product Specification, DS586(v1.00.a),. Xilinx Inc., 2007. [85] J. Yao et. al., ?Built-In Self-Test of Programmable Interconnect in Virtex-4 FPGAs,? Proc. IEEE Southeastern Symp, on System Theory, pp. 29-33, 2009. [86] C. Yui, G. Swift, and C. Carmichael, ?Single Event Upset Susceptibility Testing of the Xilinx Virtex-II FPGA,? Military and Aerospace Programmable Logic Devices Conf., 2002. [87] L. Zhao, D. Walker and F. Lombardi, ?IDDQ Testing of Input/Output Resources of SRAM-Based FPGAs,? Proc. Asian Test Symp., pp. 375-380, 1999.