Digital Phase Accumulation for Direct Digital Frequency Synthesis

by

Joseph Dominic Cali

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 5, 2013

Keywords: DDS, DDFS, DCDO, DAC, Phase Truncation Errors, CORDIC

Copyright 2013 by Joseph Dominic Cali

Approved by
Fa Dai, Chair, Professor of Electrical and Computer Engineering
Richard Jaeger, Ginn Distinguished Professor of Electrical and Computer Engineering
Robert Dean, Associate Professor of Electrical and Computer Engineering
Stanley Reeves, Professor of Electrical and Computer Engineering

Abstract

This work explores direct digital frequency synthesis (DDFS) theory and design and its application in radar systems. Though there is nothing particularly novel about DDFS in general, recent designs have been revolutionized by advances in CMOS processes and SiGe BiCMOS integration from 2000 to the present day. Many of the performance limitations highlighted in early literature, such as the area and power of the sinusoidal read-only memory (ROM), no longer apply to designs in modern integrated circuit (IC) processes. The digitally-controlled digital oscillator (DCDO) of the DDFS can now produce signals with spectral purity far beyond the capabilities of the digital-to-analog converter (DAC). CMOS miniaturization allows high dynamic range sinusoids to be generated with CORDICs instead of lossy compressed sine and cosine ROMs. Parallelization in the accumulator and modulation paths eliminates the need for power-hungry, current-mode logic (CML) pipeline accumulators. Noise shaping is better understood than at any point prior to this moment, which allows us to mitigate the quantization noise that arises from phase or amplitude truncation. However, alarmingly few DDFS designs published in the past five years have taken note of the radical shift in the design landscape.
Of equal importance are the new challenges that have arisen in small feature size geometries. In a way, this document is an attempt to consolidate the state of the art in DDFS design and to propose improvements from the study. To this end, the dissertation is organized into two distinct sections, the DCDO and the DAC. Digital phase accumulation and sinusoid generation are approached from number theory and real analysis, respectively. An exact computation of the spurs generated through phase truncation is developed that results in closed-form expressions for the DCDO spectrum. Current switches and architectures for improved DAC performance are presented qualitatively.

Acknowledgments

Journeying down the path of higher education can rarely be attributed to the will power or foresight of the individual in pursuit. In recent years, I have appreciated the support of the faculty and staff of Auburn University, who have guided me through a challenging five years of graduate school. In addition, I benefitted from the assistance of my fellow graduate students, with whom all my designs have interfaced in some manner. I acknowledge my major advisor, Dr. Fa Dai, for taking me on as a graduate student and funding eight integrated circuit designs through my stint as a graduate student. I also must mention the members of my committee, Dr. Dean, Dr. Jaeger and Dr. Reeves, for their specialized assistance through many challenging design problems. I cannot fail to mention Dr. Niu, as his passionate and skilled teaching of semiconductor physics from his deep knowledge of the subject has proven helpful dozens of times on the job in my short time in the workforce. There are countless teachers who, from kindergarten through my undergraduate degree at Louisiana State University (LSU), have devoted their energy and time to teaching me and putting up with my relentless questions with regard to the "hows" and the "whys" of this world.
Without the prodding of my professors at LSU, I may have never considered an advanced degree. Above these teachers stand the two greatest teachers in my life, my mother and father, who have patiently raised me and provided emotional and financial support throughout my academic journey. They sacrificed many conveniences for me to attend a private school in preparation for college. Lastly, I must thank my wife, Alison, for supporting me through the endless nights of class work, the many weekends of research, my late night existential crises (now why am I in graduate school again?), and tough medical challenges. She has certainly done more to shape the outcome of this work than any other person in my life.

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables
List of Theorems
List of Abbreviations
1 Introduction to Phase Accumulators
1.1 Explanation of Notation
1.1.1 Number Theory Axioms and Notation
1.1.2 Binary Arithmetic
1.2 Overview of Direct Digital Frequency Synthesis
1.3 Advantages of DDFS
1.3.1 Digital Phase Modulation
1.3.2 Digital Frequency Modulation
1.3.3 Digital Amplitude Modulation
1.3.4 Fine Frequency Resolution and Fast Switching
1.4 Summary of Contributions and Chapter Breakdown
2 Background of Phase Truncation Analysis
2.1 Mehrgardt's Analysis (1983)
2.2 Nicholas's Analysis (1985)
2.3 Jenq's Analysis (1988)
2.3.1 Jenq's Observation
2.3.2 Jenq's Results
2.4 Torosyan's Analysis (2001)
3 Phase Accumulator Sequences from Number Theory
3.1 Phase Accumulator Sequence
3.2 Phase Accumulator Period
3.3 Truncated Phase Sequences
3.4 Relationships Between Sequences
3.5 Comments on Mathematical Structure
4 Spectrum of Truncated Phase Sequences
4.1 Intuitive Understanding
4.2 Characteristics of Truncated Phase Sequences
4.3 Spectrum in the Presence of Phase Truncation
4.4 Interpreting Results
4.4.1 Ideal SCMF Example
4.5 Numerical Verification of Theory
4.6 SFDR and SNR in the Presence of Phase Truncation
4.6.1 SFDR
4.6.2 Worst Case SFDR
4.6.3 Spur Locations
4.6.4 SNR
4.7 Architecture Changes for Improved Spurious Response
4.7.1 Force Coprime FCWs
4.7.2 Phase Accumulator with Prime Number of States
5 Parallelization of Phase Accumulator
5.1 Pipelined Accumulator
5.2 Parallel Accumulator
5.2.1 Prior Art
5.2.2 Derivation of LFM Enabled Architecture
5.2.3 Area and Power Growth Analysis
5.2.4 Hardware Implementation
5.3 Multiplexer Upconversion Analysis
5.4 Behavioral HDL Synthesis
5.4.1 Problems with Existing Techniques
5.4.2 A Simple Example
5.4.3 EDA Scripts
5.4.4 Optimization
6 Radar Application
6.1 Previous DDFS Designs
6.1.1 Sine Wave Symmetry
6.1.2 MTM DDFS
6.1.3 BTM DDFS
6.1.4 Output Response Analyzer
6.2 Overview of Basic Radar Theory
6.3 Overview of Stretch Processing
6.3.1 Single Chip Radar
6.4 CORDIC
6.4.1 Basic Theory
6.4.2 Conventional CORDIC
6.4.3 Optimizing the CORDIC Algorithm for DDFS
6.4.4 Partial Dynamic Rotation CORDIC
6.5 Stretch Processing DDFS Architecture
6.5.1 Inverse Sinc Filter
6.5.2 Radar Controller
6.6 Design of 12-bit CMOS DAC
6.7 Measurements
7 Digital-To-Analog Converters (DAC)
7.1 Basic Sampling Theory
7.2 DAC Fundamentals
7.3 DAC Performance Metrics
7.3.1 Static DAC Performance
7.3.2 INL
7.3.3 DAC Models
7.4 Dynamic DAC Performance
7.5 DAC Architectures
7.5.1 R-2R DACs
7.5.2 Thermometer Coded and Segmented DACs
7.5.3 Return-to-Zero (RTZ)
7.5.4 Translinear Output Buffers and Non-Linear DACs
7.6 Current Steering Cell Architectures
8 Conclusions
Bibliography

List of Figures

1.1 Basic DDFS Block Diagram
1.2 Gate Logic for One's Complement
1.3 Phase Accumulator State Plots (Circle)
1.4 BPSK Waveforms
1.5 Simple Chirp Accumulator Diagram
1.6 10 ns Chirp Waveform
2.1 Sawtooth Approximation
2.2 Error Sequence Waveform Components
3.1 Phase Accumulator State Plots
4.1 Spectrums from Two Adjacent FCWs
4.2 Simple Estimates for Worst Case SFDR due to Phase Truncation
4.3 Window Function from Example
4.4 Window Function from Example
4.5 Numerical Validation
4.6 Numerical Validation
4.7 SFDR Function (Magnitude)
4.8 Forcing Coprime FCWs
4.9 Modification SFDR Improvement
4.10 Forcing Coprime FCWs (Modification)
4.11 Mersenne Prime (17) Spectrum
5.1 Phase Accumulator with LFM
5.2 Block Diagram of Pipeline Accumulator
5.3 Block Diagram of Pipeline Accumulator with LFM
5.4 [1] Architecture
5.5 FSM Chirp-Enabled DDFS with Parallel Processing Path
5.6 Finite State Machine for Parallel Processing Path
5.7 Proposed DDFS Using Novel Parallel Accumulator
5.8 Frequency and Phase Predictive Step
5.9 Parallel Phase Accumulator using Predictive Step
5.10 4-to-1 Upconverting Multiplexer
5.11 CML Multiplexer
6.1 Quadrature, Quarter Sine Compression
6.2 MTM DDFS Block Diagram
6.3 MTM Block Diagram
6.4 MTM DDFS GDSII (130 nm BiCMOS)
6.5 BTM DDFS Block Diagram
6.6 Phase Accumulator State Plots
6.7 BTM ROM Block Diagram
6.8 BTM, CORDIC, ORA and DACs (130 nm BiCMOS)
6.9 Galois 18-Bit LFSR
6.10 Phase Accumulator State Plots
6.11 BTM Simulation Versus Prediction
6.12 Two Tone Generation
6.13 ORA Block Diagram
6.14 Example of Stretch Processing Signals
6.15 Radar-On-Chip Block Diagram
6.16 Die Photograph of RoC
6.17 CORDIC Vector Rotations
6.18 CORDIC Coverage Requirement
6.19 Conventional CORDIC Stage
6.20 arctan Small Angle
6.21 CORDIC Bit Resolution
6.22 PDR CORDIC Architecture
6.23 PDR CORDIC Stage
6.24 Block Diagram for Radar DDFS
6.25 Die Photograph of RoC (DDFS Zoomed)
6.26 Inverse Sinc FIR Filter (Block Diagram)
6.27 Block Diagram of 12-Bit CMOS DAC
6.28 DAC Current Source Sizing
6.29 Synchronization Circuit for 12-Bit CMOS DAC
6.30 Clock Tree for 12-Bit CMOS DAC
6.31 Inverse Sinc Filter
6.32 DDFS with Single Tone Output
7.1 Rectangle Function Plots
7.2 INL Curves for Thermometer-Coded DAC Models with Finite Output Impedance Current Sources
7.3 Graphical Explanation of Gain and Offset Errors
7.4 Graphical Explanation of INL and DNL
7.5 Simple Single-Ended Binary-Weighted Model
7.6 Simple Single-Ended Thermometer Model
7.7 Single-Ended Single Bit Active
7.8 INL Curves for Thermometer-Coded DAC Models with Finite Output Impedance Current Sources
7.9 Simple Differential Thermometer Model
7.10 Glitch Versus Device Size (1 µm to 10 µm)
7.11 R-2R with Binary Scaling (Emitter Network)
7.12 R-2R with Binary Attenuation (Collector Network)
7.13 Segmented R-2R Binary with Thermometer MSBs
7.14 Differential Pair
7.15 Padé Sine Approximation
7.16 Translinear Sine Implementations
7.17 Differential Translinear Cosine Implementation (Ideal Current Sources)
7.18 Quadrature Translinear DDFS
7.19 Simple Current Steering Cells
7.20 Current Steering Cells with Cascoding
7.21 Current Steering Cell with Cascode Output and Keep Alive
7.22 Current Steering Cell with Cascode, Keep Alive and RTZ

List of Tables

1.1 Built-in Barker Codes
2.1 Table of Truncated Phase States (4-bit)
4.1 List of Mersenne Primes for Phase Accumulation
5.1 Comparison of Accumulators
6.1 Table of Initial Values
6.2 Example BTM Compression
6.3 Summary of DDFS Designs
6.4 DDFS Performance Summary
7.1 Published RTZ DACs
7.2 SFDR of NRTZ DACs

List of Symbols

P Current State of Phase Accumulator
A Current Amplitude Output of DCDO
BP Number of Bits in Phase Accumulator
BA Number of Bits of Amplitude Resolution in DCDO
NP Number of States in Phase Accumulator
ΛP Least Period of Phase Accumulator Sequence
F Frequency Control Word
F̂ Reduced Frequency Control Word
ω Discrete Time Continuous Angular Frequency
NE Number of Truncation Error States
ΛE Least Period of Truncated Sequence
NQ Number of unique states in the truncated phase word
fT Unity Gain Bandwidth Product
δ(x) Dirac delta function
δT(t) Dirac comb function
Ω Continuous Time Angular Frequency
t Time
f Ordinary Frequency
VA Early Voltage of Transistor
gm Transconductance of Bipolar Transistor
VT Thermal Voltage

List of Theorems

1.1 Principle (Mathematical Induction)
1.2 Principle (Well-Ordering Principle)
1.1 Definition (Divides)
1.2 Definition (Least Common Multiple)
1.1 Theorem (Binary Number Representation)
1.1 Lemma (Dropping Modulo Operation in Sinusoids)
1.2 Lemma (Geometric Series)
1.3 Lemma (When the Complex Exponential Equals 1)
2.1 Definition (Fourier Series of Real-Valued Function)
2.1 Theorem (Nicholas Number of Spurs)
2.2 Theorem (Nicholas Spur Index)
2.3 Theorem (Nicholas Spur Magnitude)
2.4 Theorem (Nicholas Spur Phase)
2.5 Theorem (Jenq's Non-Uniform Sampling Theorem)
2.2 Definition (Parseval's Relation)
3.1 Theorem (The Division Algorithm)
3.1 Definition (Congruence)
3.2 Theorem (Phase Accumulator Sequence)
3.2 Definition (Greatest Common Divisor)
3.3 Definition (Relatively Prime)
3.1 Lemma (GCD Divisibility)
3.2 Lemma (Linear Modulo Normalization)
3.3 Theorem (Phase Accumulator Periodicity)
3.3 Lemma (Alternative Phase Accumulator Expression)
3.4 Lemma (Sum of Two Integers Modulo N)
3.4 Definition (Truncation)
3.4 Theorem (Truncated Phase Sequence)
3.5 Lemma (Least Period of the Modulo of a Modulo Sequence)
3.5 Theorem (Periodicity of Phase Truncation Error Sequence)
3.6 Theorem (Periodicity of the Difference of Two Modulo Sequences)
3.7 Theorem (Truncated Phase Sequence Period)
3.6 Lemma (GCD and Linear Diophantine Equations)
3.8 Theorem (Multiplicative Inverse in Modulo Arithmetic)
3.9 Theorem (FCW Time Sequence Permutation Relationship)
3.5 Definition (Groups)
4.1 Definition (Taylor Series)
4.1 Theorem (Delta Phase Steps)
4.2 Definition (Kronecker Delta Function)
4.2 Theorem (Sub-Sequences of a Finite Sequence)
4.3 Theorem (Interchanging Summations for Finite Sequences)
4.4 Theorem (Adjacent Truncated Phase Elements)
4.5 Theorem (When Truncated Values Repeat)
4.1 Lemma (Special Sub-Sequence Arrangement for Periodic Sequences)
4.3 Definition (Discrete Fourier Transform)
4.4 Definition (Inverse Discrete Fourier Transform)
4.6 Theorem (Spectrum of Truncated Phase Sequence)
4.7 Theorem (DCDO Spectrum with Phase Truncation and Arbitrary ROM)
4.8 Theorem (FCW Frequency Sequence Permutation Relationship)
4.9 Theorem (Number of Phase Accumulator Least Periods)
4.2 Lemma (DFT Periodicity)
4.3 Lemma (Window Function Periodicity)
4.4 Lemma (Period of Amplitude Spectrum with Phase Truncation)
6.1 Definition (Convergent Series (Real))
6.1 Theorem (Cauchy Convergence Criterion (Real))
6.1 Lemma (Sequences for Convergent Series)
6.2 Theorem (CORDIC Convergence Theorem)
6.2 Definition (Conventional CORDIC Iteration)
7.1 Definition (Dirac Delta)
7.1 Theorem (Nyquist-Shannon Sampling Theorem)
7.2 Definition (Convolution)
7.2 Theorem (Fourier Convolution Theorem)

List of Abbreviations

BIST Built-In Self-Test
BPSK Binary Phase Shift Keying
BTM Bipartite Table Method
CDMA Code Division Multiple Access
CML Current Mode Logic
CORDIC COordinate Rotation DIgital Computer
CS Current Steering
CW Continuous Wave
DAC Digital-to-Analog Converter
DDFS Direct Digital Frequency Synthesis
DEM Dynamic Element Matching
DFF D-Flip-Flop
DFT Discrete Fourier Transform
DNL Differential Non-Linearity
DSP Digital Signal Processing
ECL Emitter Coupled Logic
ENOB Effective Number of Bits
FCW Frequency Control Word
FIR Finite Impulse Response
GCD Greatest Common Divisor
IDE Integrated Development Environment
INL Integral Non-Linearity
KCL Kirchhoff Current Law
KVL Kirchhoff Voltage Law
LFM Linear Frequency Modulation
LSB Least Significant Bit
MSB Most Significant Bit
MTM Multipartite Table Method
NRTZ Non-Return-to-Zero
ORA Output Response Analyzer
PLL Phase-Locked Loop
PM Phase Modulation
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase Shift Keying
RAM Random Access Memory
ROM Read-Only Memory
RTZ Return-to-Zero
SCMF Sine or Cosine Mapping Function
SFDR Spurious Free Dynamic Range
SINAD Signal to Noise Ratio and Distortion
SNDR Signal to Noise and Distortion Ratio
SNR Signal to Noise Ratio
SCL Source Coupled Logic
SPI Serial Peripheral Interface
SSM Static Mismatch Shaping
SSPA Switching Sequence Post Adjustment
THD Total Harmonic Distortion
TSPC True Single Phase Clock
WLAN Wireless Local Area Network

Chapter 1
Introduction to Phase Accumulators

In this chapter, Direct Digital Frequency Synthesis (DDFS) is introduced as an important component in modern 21st century communication systems, and its fundamental operating principles are presented. Wireless cellular communication techniques such as code division multiple access (CDMA) and spread spectrum wireless local area networks (WLAN) [2] require fast frequency switching, an attribute in which DDFS excels over conventional analog frequency synthesis approaches. As integrated circuit processes advance, DDFS is also emerging as a critical component in commercial radar systems, agile clock synthesizers [3] and high speed testing equipment [4], opening up new opportunities in industries outside of telecommunications, the automotive industry being one of the more exciting [5],[6]. In DDFS systems, the amplitude, frequency, and phase of synthesized waveforms can be modulated digitally and nearly instantaneously; depending upon the operating frequency of the technology and the level of pipelining in the digital core, this can mean a latency of less than a few nanoseconds. By contrast, the lock time of a standard analog phase-locked loop (PLL) can be on the order of several hundred microseconds as a result of the slow settling time of the loop filter [7]. The ability to directly modulate the signal also allows for arbitrary, high-bandwidth waveform synthesis, varying from simple phase-shift keying used in low cost data transmission systems to complex non-linear frequency sweeps used in radar systems [8]. One of the better published results for an arbitrary waveform generator is presented by Van de Sande et al. [4] in the year of this writing, indicating that research in the field remains active.
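The frequency agility described above can be made concrete with a toy model: the output frequency is set by a digital word, so changing that word retunes the synthesizer on the very next clock edge. The following is a minimal illustrative sketch (the function name, bit widths, and FCW values are this example's own choices, not taken from the text):

```python
import math

def ddfs_output(fcw_stream, bp=8, ba=8):
    """Toy DDFS model: a bp-bit overflowing phase accumulator drives an
    ideal sine mapping quantized to ba bits. Changing the frequency
    control word (FCW) between clock cycles retunes the output with no
    settling time, unlike a PLL whose loop filter must re-lock."""
    n_states = 1 << bp              # 2**bp accumulator states
    phase = 0
    samples = []
    for fcw in fcw_stream:
        phase = (phase + fcw) % n_states        # overflow is free, modulo 2**bp
        amplitude = math.sin(2 * math.pi * phase / n_states)
        samples.append(round(amplitude * ((1 << (ba - 1)) - 1)))
    return samples

# Double the synthesized frequency mid-stream simply by changing the FCW:
out = ddfs_output([8] * 32 + [16] * 32)
```

With an 8-bit accumulator clocked at fs, an FCW of 8 synthesizes fs·8/256; switching the word to 16 doubles the output frequency with no settling transient.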
The DDFS operates not by digitally controlling an analog oscillator component but by numerically computing a complex digital signal and directly converting it to a physical electrical quantity through a digital-to-analog converter (DAC).

[Figure 1.1: Basic DDFS Block Diagram — phase accumulator (BP-bit phase register and adder), sine/cosine mapping (BA bits), and DAC producing y(t)]

The phase, frequency, and amplitude of a DDFS system are themselves digital codes that can be modulated in the digital domain through simple multiplication and addition operations. These operations, excluding quantization, are completely linear and thus superior to the equivalent analog operations, which apply unwanted harmonic distortion and spurious mixing to the signal. The earliest implementation of a circuit in an academic publication that resembles the modern DDFS appears in 1971 by Joseph Tierney et al. [9]. Figure 1.1 shows the architecture of the DDFS proposed in [9], excluding the quadrature sine and cosine outputs and the analog reconstruction filter after the DAC, which is the basic architecture of modern DDFS devices. The focus of this dissertation is explaining how to generate spectrally pure sinusoids with such a device in an efficient manner by gathering published results for the various components comprising the device and inserting mathematical explanations when necessary or helpful. Care is taken in an attempt to place the analysis of DDFS systems on a clear mathematical foundation and perhaps to illuminate some of the more difficult concepts such as the rise of spurs through phase truncation. For readers not familiar with the terminology, a spur is unwanted coherent spectral energy, harmonically related to the intended synthesized signal or otherwise. This implies that, were one to attach a spectrum analyzer to capture the generated waveform, one would see a distinct tone that did not decrease in power when averaged over time, hence the use of coherency in the definition.
In the digital domain, one would find that increasing the length of the Discrete Fourier Transform also had no influence over the magnitude or phase of the unwanted tones. The fundamental components of the DDFS are a clock receiver and distribution tree, an overflowing accumulator, a sine and/or cosine mapping function (SCMF), one or more DACs, and a reconstruction filter at the DAC(s) output. The overflowing accumulator is often called the phase accumulator, as its cyclic overflowing is analogous to the phase of a sinusoid. The accumulator is incremented by a value known as the frequency control word (FCW). The reconstruction filter is not studied in detail, but several DAC clocking methodologies that reduce the stringent requirements of the filter are presented during the DAC architecture survey (Section 7.5). The term SCMF is used instead of the more common read-only memory (ROM) or "lookup table" (LUT). The terminology is borrowed from Torosyan's dissertation and publications [10] and is general enough to encompass the wide range of techniques available for sinusoidal phase to amplitude conversion. The name choice also separates the functional behavior of the component from its realization on silicon. The cost of digital memory has become so remarkably inexpensive, both in area and power, that some designs use random access memory (RAM) as opposed to a ROM to implement the SCMF function. Bipartite and multipartite table methods, BTM [11] and MTM [12] respectively, and the COordinate Rotation DIgital Computer (CORDIC) [13] are implemented and studied in this work as effective techniques for implementing the SCMF.

1.1 Explanation of Notation

The conventions used in this document are described in this section for reference. This is particularly important in mixed-signal systems such as a DDFS, as many of the analyses of the behavior of the device transition between discrete-time and continuous-time representations of the signal.
The same conceptual entity crosses several processing domains. In order to clearly denote when a digital variable, or some non-digitized discrete-time sequence, is intended, an upper case English letter glyph is used. For instance, P represents the phase state of the phase accumulator and A represents the amplitude output of the SCMF. An immense effort was put into the writing, attempting:

- To provide consistency in notation. The author wants to avoid incessant flipping between this section and subsequent sections.
- To avoid collisions with important variables in the literature. For instance, repeatedly using a variable Q in a text about passive filters in a manner unrelated to the quality factor of an inductor or energy storage tank can be confusing.

The nth element in the sequence P is denoted with square brackets, P[n] being the state of the phase accumulator at clock cycle n. Parentheses are used to denote continuous functions, where y = x(t) is the value of the function x corresponding to the argument t. The phase accumulator at time $nT_s$ is given as $P(nT_s)$, where $T_s$ is the period of the clock driving the DDFS. Some mathematics texts [14] use a more general and formal notation to represent the same function concept, $x : t \mapsto y$, where $t \in S_T$, $y \in S_Y$, and $S_T$ and $S_Y$ are sets. The notation $x \in S$ means that the element x is contained in the set S. There are some commonly used sets in mathematics that appear frequently in DDFS analysis. Instead of repeatedly listing the elements that form each set, a list of all sets used in this document is presented below (many used by [15] in his Modern Algebra text):

$\emptyset$ = The empty set (i.e. the set containing no elements)  (1.1)
$\mathbb{B}$ = The set containing only 0 and 1  (1.2)
$\mathbb{P}$ = The set of all positive integers, also known as the natural numbers  (1.3)
$\mathbb{P}_0$ = The set of all positive integers including zero  (1.4)
$\mathbb{Z}$ = The set of all integers  (1.5)
$\mathbb{Z}_n$ = The set $\{x \in \mathbb{Z} : 0 \le x < n\}$  (1.6)

To show that every positive integer can be represented by an unsigned binary number, let $S$ be the set of positive integers that can be so represented and proceed by induction. For the base case, 1 can be represented by setting $b_0 = 1$ and $b_i = 0$ for $i > 0$:

$x = \sum_{i=0}^{\infty} 2^i b_i = 2^0 b_0 = 1, \quad b_i = 0,\ i > 0$  (1.11)

Thus $1 \in S$.
Now we take the induction step. Assume that $x \in S$ and every positive integer from 1 to $x$ can be represented by a binary number. We must now show that $x + 1 \in S$. We can do this by finding a binary representation of $x + 1$, or equivalently, finding $c_i$ such that

$x + 1 = \sum_{i=0}^{\infty} 2^i c_i$  (1.12)

Let us briefly consider some of the properties of $x$. By our induction step, we know that $x$ can be written as an unsigned binary number:

$x = \sum_{i=0}^{\infty} 2^i b_i = 2^0 b_0 + 2^1 b_1 + \cdots = 2^0 b_0 + 2\left(b_1 + 2^1 b_2 + \cdots\right)$  (1.13)

Clearly, the second term is a multiple of 2 and is therefore an even number by definition. If $b_0 = 1$, then the first term evaluates to 1, and adding 1 to an even number yields an odd number. So for $x$ to be even, $b_0 = 0$; otherwise $x$ is odd. Now let us tackle the case of $x + 1$ assuming $x$ is even.

$x + 1 = \sum_{i=0}^{\infty} 2^i c_i = \sum_{i=0}^{\infty} 2^i b_i + 1$
$2^0 c_0 + 2^1 c_1 + \cdots = \left(2^0 b_0 + 2^1 b_1 + \cdots\right) + 1$  (1.14)

Since $x$ is assumed even, $b_0 = 0$. Applying this knowledge and rearranging, we get

$2^0 c_0 - 1 = \left(2^1 b_1 + 2^2 b_2 + \cdots\right) - \left(2^1 c_1 + 2^2 c_2 + \cdots\right)$  (1.15)

Setting $c_0 = 1$ and setting $b_i = c_i$ for $i \in \{1, 2, \ldots\}$ sets both the left and right hand sides of the equation to zero, and the equality holds. Thus $x + 1 \in S$ whenever $x$ is even. Now let us consider the case when $x$ is odd. If $x$ is odd, then $x + 1$ is even. Since $x + 1$ is even, $2 \mid (x + 1)$ by definition, and there exists a number $d$ such that $2d = (x + 1)$; in this case $d$ is a positive integer, since $x + 1$ is an even positive integer. If we can show that $d$ can be written as an unsigned binary number, then $x + 1$ can be written as an unsigned binary number. We can show this by proving that $d \le x$ and thus, by our induction hypothesis (every positive integer from 1 to $x$ can be represented by a binary number), $d \in S$:

$d \le x \iff$
(x+ 1)2 ?x?(x+ 1)?2x (1.16) Since the least integer in our set S is 1, it is clear that the previous inequality holds (if this is not satisfactory, then apply induction to the inequality). Since d?S, we can now find the binary representation of x+ 1. (x+ 1) = 2d ?summationdisplay n=0 2ici = 2 parenleftBigg?summationdisplay n=0 2ibi parenrightBigg parenleftBig 20c0 + 21c1 + 22c2 +??? parenrightBig = parenleftBig 21b0 + 22b1 +??? parenrightBig parenleftBig 21c1 + 22c2 +??? parenrightBig = parenleftBig 21b0 + 22b1 +??? parenrightBig (1.17) Since x+ 1 is even, c0 = 0. It is clear from Equation 1.17 that setting ci = bi?1 for all i> 0 causes the equality to hold. Therefore x + 1 ?S whenever x is odd. Since the induction step holds for all x + 1, the set S = P and we have shown that all positive integers can be represented by an unsigned binary number. 10 In two?s complement representation, the value of B is vB =?2N?1bN?1 + N?2summationdisplay i=0 2ibi. (1.18) Theminimumandmaximumvaluesofthetwo?scomplementrepresentationcanbecomputed similarly to the unsigned binary case. Using Equation 1.18, the maximum value is obtained by setting bN?1 = 0 and bN?2 down to b0 to 1. The minimum value is obtained by setting bN?1 = 1 and bN?2 down to b0 to 0. max{vB}= N?2summationdisplay i=0 2i = 2N?1?1 (1.19) min{vB}=?2N?1 (1.20) The conversion of an unsigned full-scale binary number to the two?s complement number system such that the zero from the unsigned representation maps to the lowest two?s com- plement value and the maximum valued in unsigned representation maps to the maximum two?s complement value involves only inverting the most significant bit (MSB) of the un- signed number. A technique used commonly in DDFS designs to approximate the negation of the value of an integer is to take a one?s complement of the two?s complement binary representation of the number. In one?s complement, all the bits of B are inverted. 
This operation is popular because of its efficient hardware implementation, as only $N$ XOR gates are required for the inversion of an $N$-bit word. The architectures presented in Chapter 6 utilize this technique in the sinusoidal compression algorithm. Figure 1.2 is a gate level block diagram of a conditional one's complement operation. The bit $a$ inverts all the $b_i$ bits when asserted high but does not affect the value of $b_i$ when asserted low.

[Figure 1.2: Gate Logic for One's Complement]

One must carefully evaluate the approximation of negation using one's complement in the system. So consider the effect of one's complement on a word $B$ in a two's complement binary system. The resulting one's complement value $v_{B1}$ is given in Equation 1.21.

$v_{B1} = -2^{N-1} \overline{b}_{N-1} + \sum_{i=0}^{N-2} 2^i \overline{b}_i$  (1.21)

where $\overline{b}_i$ is the complement of $b_i$, meaning that if $b_i = 0$ then $\overline{b}_i = 1$, and if $b_i = 1$ then $\overline{b}_i = 0$. From this one can see that the one's complement does not negate $v_B$, which is to say $-v_B \ne v_{B1}$.

$v_{B1} + v_B = -2^{N-1}\left(b_{N-1} + \overline{b}_{N-1}\right) + \sum_{i=0}^{N-2} 2^i \left(b_i + \overline{b}_i\right) = -2^{N-1} + \sum_{i=0}^{N-2} 2^i = -1$

since $b_i + \overline{b}_i = 1$ by the definition of a complement in a binary number system. From the previous equation, we see that $v_{B1} = -v_B - 1$.

1.2 Overview of Direct Digital Frequency Synthesis

The DDFS of Figure 1.1 operates by incrementing an accumulator at the clock frequency $f_{clk}$ by the value $F$, where $F \in \mathbb{Z}_{N_P}$ and $\mathbb{Z}_{N_P} = \{0, 1, 2, \ldots, N_P - 1\}$ is the set of all integers between 0 and $N_P - 1$ inclusive. $F$ is generally constrained between zero and the maximum value of the phase accumulator, though there are exceptions to this rule when reducing the area overhead of the adder and control logic by a minuscule margin is critical [13]. $F$ represents the FCW and will be used as its symbol in mathematical notation.
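A behavioral sketch of this overflowing accumulation follows; the word width and FCW are illustrative choices, not values taken from the text.

```python
BP = 8            # accumulator width in bits (illustrative)
NP = 1 << BP      # number of phase states
F = 48            # frequency control word (illustrative)

def phase_states(fcw, n_cycles, p0=0, n_states=NP):
    """Return the first n_cycles states of the overflowing phase accumulator."""
    p, states = p0, []
    for _ in range(n_cycles):
        states.append(p)
        p = (p + fcw) % n_states   # the natural adder overflow implements the modulo
    return states
```

With F = 48 the accumulator wraps roughly every NP/F ≈ 5.3 cycles; as described next, this overflow rate is what sets the synthesized frequency.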
The accumulator has a finite bit resolution $B_P$, and therefore the accumulator will overflow periodically as $F$ is continually added to the previous accumulator state. The rate of overflow of the phase accumulator is thus dependent on $F$. $F$ thereby controls the frequency of the synthesized DDFS waveform, suggesting that both the phase accumulator and FCW are aptly named. The periodic overflowing of the accumulator is a remarkably efficient technique for implementing the periodic phase behavior of a sinusoid. The phase accumulator maps $[0, N_P)$ to $[0, 2\pi)$ when driving a lookup table that maps $P \in [0, N_P)$ to $\sin(2\pi P / N_P)$. Referring to Equation 1.10, the maximum integer value that can be stored in the phase accumulator, when treating the phase accumulator value as an unsigned integer, is $2^{B_P} - 1$, so the accumulator natively provides $N_P = 2^{B_P}$ phase states. With additional hardware, the number of phase states can be set to any positive integer no greater than $2^{B_P}$. One of the new achievements of this work is finding closed form equations for the spectrum of a DCDO for any $N_P$. The relationship between the current integer phase state of the accumulator, $P$, and the analogous normalized phase value, $\phi$, is then

$\phi[n] = \frac{2\pi}{N_P} P[n]$  (1.22)

where $\phi$ is in radians. This maps the phase states uniformly across $[0, 2\pi)$ in $N_P$ steps. Figure 1.3a demonstrates the mapping between the phase states of a 3-bit accumulator and the corresponding phase in radians. But this particular mapping need not be the case. The phase accumulator could, in fact, map to any closed interval in $\mathbb{R}$. This feature will prove useful in implementing sinusoidal quarter-wave compression in Section 6.1.1. Figure 1.3b shows a 3-bit phase accumulator with a 1/2 least significant bit (LSB) offset, in which case $P \in [0, N_P)$ maps to $[\pi/8, 2\pi + \pi/8)$.
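The two 3-bit mappings just described can be tabulated directly. This is a small sketch; the half-LSB offset is modeled here as replacing P with P + 1/2 in Equation 1.22, which reproduces the stated [π/8, 2π + π/8) interval.

```python
import math

BP = 3
NP = 1 << BP   # 8 phase states, as in Figure 1.3

# Figure 1.3a: the uniform mapping of Equation 1.22
plain = [2 * math.pi * p / NP for p in range(NP)]

# Figure 1.3b: the same mapping with a 1/2 LSB offset
offset = [2 * math.pi * (p + 0.5) / NP for p in range(NP)]
```

In both lists the step between adjacent states is the same 2π/8 = π/4; only the starting point moves.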
Note that in both figures the phase step remains uniform.

[Figure 1.3: Phase Accumulator State Plots (Circle); (a) Phase Mapping Circle, (b) Phase Mapping Circle (1/2 LSB Offset)]

The adjustment in the mapping happens through the SCMF and will be discussed in more detail in Chapter 6. The SCMF of the DDFS takes the value of $P$ (Equation 3.4) and maps it to the appropriate sine or cosine value $A$. Equation 1.23 shows the SCMF mapping for an untruncated phase accumulator word to an ideal sine function.

$A[n] = \sin\left(\frac{2\pi}{N_P} P[n]\right) = \sin\left(\frac{2\pi}{N_P} \langle nF + P_0 \rangle_{N_P}\right) = \sin\left(\frac{2\pi F}{N_P} n + \frac{2\pi}{N_P} P_0\right)$  (1.23)

The modulo $N_P$ arithmetic in the argument can be dropped, since the period of the sinusoid is $2\pi$ and implicitly executes the modulo operation. This can be demonstrated through the following simple lemma.

Lemma 1.1 (Dropping the Modulo Operation in Sinusoids). The modulo operator can be dropped within the sine and cosine functions with argument $2\pi P[n]/N_P$, where $P[n]$ is given by Equation 3.1. Equivalently,

$\sin\left(\frac{2\pi}{N_P} \langle nF + P_0 \rangle_{N_P}\right) = \sin\left(\frac{2\pi}{N_P} (nF + P_0)\right)$  (1.24)

Proof. Using the definition of the modulo operation described in Section 3.1, for arbitrary $n \in \mathbb{Z}$ and $P_0 \in \mathbb{Z}$ there exists an integer $d$ such that

$\langle nF + P_0 \rangle_{N_P} = nF + P_0 - dN_P = r$  (1.25)

where $0 \le r < N_P$.
Plugging $nF + P_0 - dN_P$ in for $P[n]$ and applying the trigonometric difference identity for sine yields:

$\sin\left(\frac{2\pi}{N_P}(nF + P_0 - dN_P)\right) = \sin\left(\frac{2\pi}{N_P}(nF + P_0)\right)\cos\left(\frac{2\pi}{N_P}(dN_P)\right) - \sin\left(\frac{2\pi}{N_P}(dN_P)\right)\cos\left(\frac{2\pi}{N_P}(nF + P_0)\right)$  (1.26)

This can be further reduced by observing that

$\cos\left(\frac{2\pi}{N_P}(dN_P)\right) = \cos(2\pi d) = 1$  (1.27)
$\sin\left(\frac{2\pi}{N_P}(dN_P)\right) = \sin(2\pi d) = 0$  (1.28)

Finally, substituting Equation 1.27 and Equation 1.28 back into Equation 1.26,

$\sin\left(\frac{2\pi}{N_P}(nF + P_0 - dN_P)\right) = \sin\left(\frac{2\pi}{N_P}(nF + P_0)\right)$  (1.29)

The spectrum of such a sinusoid (Equation 1.23) can be calculated by executing a discrete Fourier transform over one period of the waveform, which is $N_P$ in this particular case. The discrete Fourier transform is defined later in Section 4.3.

$\mathcal{F}\{A\}[k] = \sum_{n=0}^{N_P-1} \sin\left(\frac{2\pi F}{N_P} n\right) e^{-j2\pi kn/N_P}, \quad 0 \le k \le N_P - 1$
$= \sum_{n=0}^{N_P-1} \frac{1}{2j}\left[e^{j2\pi Fn/N_P} - e^{-j2\pi Fn/N_P}\right] e^{-j2\pi kn/N_P}$
$= \sum_{n=0}^{N_P-1} \frac{1}{2j}\left[e^{j2\pi n(F-k)/N_P} - e^{-j2\pi n(F+k)/N_P}\right]$  (1.30)

Euler's formula, given in Equation 1.31, was applied to the sine function to simplify the expression.

$e^{jx} = \cos(x) + j\sin(x)$  (1.31)

Euler's formula can be used to write the sine and cosine functions as the sum of two complex exponentials as follows (to verify, substitute Equation 1.31 for each of the complex exponential terms in the equations below):

$\sin(x) = \frac{1}{2j}\left[e^{jx} - e^{-jx}\right]$  (1.32)
$\cos(x) = \frac{1}{2}\left[e^{jx} + e^{-jx}\right]$  (1.33)

Equation 1.30 is the difference of two geometric series. A technique for finding the closed form solution of a geometric series is presented here.

Lemma 1.2 (Geometric Series).
A summation of the form

$\sum_{n=0}^{N-1} r^n$

is called a finite geometric series and can be written as the ratio of two numbers,

$\sum_{n=0}^{N-1} r^n = \frac{1 - r^N}{1 - r}$  (1.34)

if $r \ne 1$.

Proof. Let $r \ne 1$. The problem is solved by expanding the summation

$\sum_{n=0}^{N-1} r^n = 1 + r + r^2 + \cdots + r^{N-1}$

then multiplying both sides by $1 - r$ and using the distributive property of multiplication over addition.

$(1 - r)\sum_{n=0}^{N-1} r^n = (1 - r)\left(1 + r + r^2 + \cdots + r^{N-1}\right) = \left(1 + r + r^2 + \cdots + r^{N-1}\right) - \left(r + r^2 + r^3 + \cdots + r^N\right) = 1 - r^N$

Then dividing both sides of the previous equation by $1 - r$, which can be done since $r \ne 1$, gives

$\sum_{n=0}^{N-1} r^n = \frac{1 - r^N}{1 - r}$  (1.35)

and the proof is complete.

The summation of $e^{j2\pi an/N_P}$ for $a \in \mathbb{Z}$ over one period is zero if $N_P \nmid a$. This can be shown by directly computing the summation using Equation 1.34,

$\sum_{n=0}^{N_P-1} e^{j2\pi an/N_P} = \sum_{n=0}^{N_P-1} \left(e^{j2\pi a/N_P}\right)^n = \frac{1 - e^{j2\pi aN_P/N_P}}{1 - e^{j2\pi a/N_P}} = \frac{1 - e^{j2\pi a}}{1 - e^{j2\pi a/N_P}} = \frac{1 - 1}{1 - e^{j2\pi a/N_P}} = 0$

The summation of the geometric series works because $e^{j2\pi a/N_P} \ne 1$ when $N_P \nmid a$.

Lemma 1.3 (When the Complex Exponential Equals 1).

$e^{j2\pi a/N_P} = 1$  (1.36)

if and only if $N_P \mid a$.

Proof. The forward proof is simple, as $N_P \mid a$ means that there exists an integer $d$ such that $dN_P = a$. Substituting this value for $a$ in Equation 1.36 yields:

$e^{j2\pi dN_P/N_P} = e^{j2\pi d}.$  (1.37)

Applying Euler's formula (Equation 1.31) to Equation 1.37,

$e^{j2\pi d} = \cos(2\pi d) + j\sin(2\pi d) = 1 + j0 = 1$  (1.38)

Now one must show that if Equation 1.36 holds, then $N_P \mid a$. Applying Euler's formula to Equation 1.36 yields

$e^{j2\pi a/N_P} = \cos\left(\frac{2\pi a}{N_P}\right) + j\sin\left(\frac{2\pi a}{N_P}\right) = 1$  (1.39)

The right hand side of Equation 1.39 only equals 1 when the cosine term equals 1 and the sine term equals 0. Cosine only equals 1 when the argument is $2\pi n$ for $n \in \mathbb{Z}$.
Then

$2\pi n = \frac{2\pi a}{N_P}$  (1.40)
$a = N_P n$  (1.41)

Then $N_P \mid a$. Plugging the solved value of $a$ into the sine argument gives 0, and the equation holds. Therefore, Equation 1.36 is true if and only if $N_P \mid a$.

Finally, the value of the summation for when $F - k = 0$ must be computed:

$\sum_{n=0}^{N_P-1} \frac{1}{2j} e^{j2\pi n(F-k)/N_P} = \frac{1}{2j}\sum_{n=0}^{N_P-1} e^{j2\pi n \cdot 0/N_P}$  (1.42)
$= \frac{1}{2j}\sum_{n=0}^{N_P-1} 1 = \frac{N_P}{2j}$  (1.43)

Gathering together the results of the analysis gives the final DFT result.

$\mathcal{F}\{A\}[k] = \begin{cases} \frac{N_P}{2j} & k = F \\ -\frac{N_P}{2j} & k = -F \text{ (equivalently } k = N_P - F \text{)} \\ 0 & \text{otherwise} \end{cases}$  (1.44)

The results of the DFT indicate that all the spectral energy generated by a DDFS, assuming no phase truncation and an ideal SCMF, is located at a single frequency bin $F$, which is precisely the FCW. This leads to what should perhaps be referred to as the fundamental equation for DDFS systems. Noting the relationship between the DFT and CTFT for a uniform sampling interval $T_{clk}$, an equation for the output frequency of the DDFS can be formulated:

$f_0 = \frac{F}{N_P} f_{clk}$  (1.45)

where $f_0$ is the frequency of the fundamental tone generated by the DDFS system.

1.3 Advantages of DDFS

As is the custom for Auburn University dissertations and publications concerning DDFS devices, a brief explanation of the advantages of DDFS over traditional PLLs is supplied. The benefits ultimately reduce to the benefits of operating on signals in the digital domain rather than the analog domain:

- Straightforward, efficient implementation of complex modulation schemes.
- Direct manipulation of parameters that are often difficult to precisely control in the equivalent analog circuit.
- Digital signals are not corrupted by noise as easily as their analog domain counterparts.
- Arithmetic operations are highly linear and tolerant to device mismatch.

In addition to these, several other benefits of DDFS already mentioned earlier in the introduction are summarized.
- Very fast frequency switching, several orders of magnitude faster than that of a traditional PLL [7].
- Fine frequency resolution (see Equation 1.45).
- Broadband frequency synthesis, as the same device with a sufficiently fast input clock can synthesize signals from the order of several kilohertz to the order of several gigahertz.

1.3.1 Digital Phase Modulation

Phase modulation (PM) is the process of adding, subtracting, multiplying or otherwise affecting the output of the phase accumulator. The phase signal after phase modulation is

$P_m[n] = P[n] + M[n] \quad \text{or} \quad P_m[n] = M[n]P[n]$  (1.46)

where $P[n]$ is the phase accumulator state, $M[n]$ is the modulation sequence and $P_m[n]$ is the resulting modulated phase word. Typically this is used to efficiently generate various phase shift keying techniques such as Binary Phase Shift Keying (BPSK) and Quadrature Phase Shift Keying (QPSK). By controlling the phase of the synthesized waveform, information can be encoded in the synthesized signal for transmission over noisy mediums. Figures 1.4a and 1.4b show the transient waveforms of continuous wave (CW) signals modulated with BPSK sequences. This is not to be confused with frequency modulation, which is discussed next in Section 1.3.2.

[Figure 1.4: BPSK Waveforms; (a) BPSK for 1110010 Sequence, (b) BPSK for 1010101 Sequence]

In the radar architecture described in Chapter 6, Barker codes are hardwired into the PM circuitry for testing. Table 1.1 shows the list of Barker codes [8] built into the DDFS. These and other more complex codes were implemented in the radar system for "short-range", low-power pulse compression operating modes.

Table 1.1: Built-in Barker Codes

Number of Bits | Code
3  | 110
5  | 11101
7  | 1110010
11 | 11100010010
13 | 1111100110101

The phase is flipped by 180 degrees every $k$ clock cycles if a binary "1" is encountered and is not flipped if a "0"
is encountered. The hardware implementation is as simple as toggling the MSB of the phase accumulator, which only requires an XOR gate.

1.3.2 Digital Frequency Modulation

Linear frequency modulation (LFM), also known as chirp modulation, can be implemented by adding a frequency accumulator before the phase accumulator. Figure 1.5 shows a block diagram of the modification that should be made to the accumulator to allow for LFM.

[Figure 1.5: Simple Chirp Accumulator Diagram]

A full derivation of LFM is provided in Section 5.2.2, but in short, the frequency of the DDFS increments by $F_0$ every clock cycle. While not shown explicitly in the figure, the frequency register is typically initialized to some start frequency, and other control circuitry watches the frequency register value to issue a stop command. The chirp rate is controlled with the same precision and agility as the frequency of a traditional phase accumulator. Figure 1.6 shows a 10 ns duration chirp waveform with frequency accelerating from around 500 MHz to 5 GHz. The chirp waveform is well suited for radar applications requiring pulse compression. In Chapter 6, LFM is implemented in a stretch processing pulse compression radar. Figure 6.14a shows several conceptual chirp waveforms in the explanation of stretch processing. The fine control of the chirp rate allows the radar system to dynamically tailor its output waveform based on the distance of the target under investigation.

[Figure 1.6: 10 ns Chirp Waveform]

1.3.3 Digital Amplitude Modulation

By placing a multiplier at the output of the SCMF, the amplitude of the output can be digitally modulated.
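A minimal behavioral sketch of this multiplier-based amplitude modulation follows; the word width, FCW, and gain values are illustrative choices, not parameters from the text.

```python
import math

BP = 8          # accumulator width (illustrative)
NP = 1 << BP
F = 16          # FCW (illustrative)

def ddfs_am(gain_seq, n_states=NP, fcw=F):
    """Multiply the (ideal) SCMF output by a per-sample gain word."""
    p, out = 0, []
    for g in gain_seq:
        amplitude = math.sin(2 * math.pi * p / n_states)  # ideal SCMF
        out.append(g * amplitude)                         # AM multiplier
        p = (p + fcw) % n_states
    return out
```

With `gain_seq` drawn from the amplitude levels of a constellation, this forms the amplitude half of a QAM modulator.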
This operation is important for implementing quadrature amplitude modulation (QAM) schemes such as QAM16 and QAM64, which are commonly used in digital communications systems such as fiber and cable internet [18]. The operation is performed in the digital domain, so implementing higher order (such as QAM256) or more complex modulation schemes is feasible without much added overhead to the DDFS system, though the DAC and filter requirements become increasingly difficult to meet. In more complex systems, a finite impulse response (FIR) filter can be added to the output of the SCMF. This filter can be used to pre-distort the signal before driving the DAC to compensate for the non-idealities of the following analog circuitry, or for the non-linearity of the DAC itself. In Section 6.5.1, the design of an inverse sinc filter is shown that is applied to the radar system of Chapter 6. The inverse sinc FIR compensates for the zero-order hold operation of a traditional current steering DAC.

1.3.4 Fine Frequency Resolution and Fast Switching

One of the most cited benefits of DDFS designs is fine frequency tuning [7]. Equation 1.45 provides the frequency of a DDFS given an FCW $F$. Informally, the resolution of a DDFS device is the difference in synthesized frequency between two adjacent FCWs. This can be calculated by setting $F = 1$:

$f_0 = \frac{F}{N_P} f_{clk} = \frac{f_{clk}}{N_P}$  (1.47)

A 32-bit phase accumulator is a value commonly found in commercial DDFS parts [19],[20]. For a 1 GHz clock frequency, this results in a frequency resolution of $10^9/2^{32} \approx 0.23$ Hz. This also implies that the LFM discussed in Section 1.3.2 is capable of generating remarkably smooth chirp waveforms when given enough bits of resolution. The DDFS can also rapidly switch between different output frequencies. This is of critical importance in spread spectrum applications, where the speed of the frequency switching directly impacts the performance of the system.
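The resolution figure quoted above is a one-line computation; the sketch below reproduces it for the 32-bit, 1 GHz example from the text.

```python
BP = 32                          # accumulator width (per the text's example)
fclk = 1e9                       # 1 GHz clock
resolution = fclk / (1 << BP)    # Equation 1.47 with F = 1, in Hz
```

`resolution` evaluates to roughly 0.2328 Hz, matching the approximately 0.23 Hz quoted above.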
Using the DDFS as a local oscillator allows one to quickly switch to a different band, as quickly as the analog filter can respond to the changes. Combining the fine frequency resolution and fast frequency switching makes for a versatile solution to demanding problems from a host of fields [19], ranging from biomedical to military.

1.4 Summary of Contributions and Chapter Breakdown

This section summarizes the contributions of the author to the state of the art in the analysis of phase truncation spurs and to DDFS design in general. In Chapter 3, a complete approach for calculating the least period of the output sequence of a phase accumulator with truncation is derived. This approach is completely general and does not require the number of states in the phase accumulator to be a power of two. To the author's knowledge, there are no publications that perform this analysis. In Chapter 2, the most frequently cited
Application of the theory is applied to the author?s previous published design [21] with suggestions for improvements. In Chapter 5, a novel approach to parallelizing a phase accumulator with frequency modulation is presented. It covers several patents on the topic, as there is not much in the way of academic publications. In Chapter 6, the DDFS designs at Auburn University and their application in a simple single-chip radar system is explored [22]. The design attempted the challenging task of co-locating a radar transmitter and receiver onto the same silicon substrate. A quadrature DDFSgeneratedtheradarwaveformandpulsecompressionsequence. Lastly, inChapter7, a surveyofliteraturehighlightsmanyoftheproblemsintheDACdesignsatAuburnUniversity and offers a suggestions for improving DAC designs based on observations made during the survey. Some of the analysis and observations in the chapter have not been published to the author?s knowledge. 25 Chapter 2 Background of Phase Truncation Analysis In this chapter, an overview of the literature surrounding the analysis of the perfor- mance of DDFS devices is presented. Despite widely cited publications by Nicholas [23], Jenq [24] and Torosyan [10] on the analysis of phase truncation spurs in DDFS, papers are still published (or submitted for publishing) with either old approximations of the spurious behavior for which concise closed-form equations exist, or, worse yet, incorrect reasoning related to the location, magnitude and optimization of such spurs. Oftentimes early, widely cited papers continue to propagate while newer analyses go unnoticed. Two authors, Jenq and Torosyan, through two separate mathematical techniques, provide a closed-form solution to the spurs generated from phase truncation. Jenq?s analysis can be elegantly used to compute the signal-to-noise ratio in the presence of phase truncation but does not compute the location of the spurs. 
Torosyan goes a step further, presenting an elegant algorithm for efficiently calculating all of the spurs that result from phase truncation, in order of magnitude. An attempt is also made in this chapter to unify several of the previously published techniques by using a consistent notation across them. The publications on phase truncation errors span nearly three decades and use various mathematical approaches for deriving the spurious content. The techniques are presented chronologically by publication date.

2.1 Mehrgardt's Analysis (1983)

Mehrgardt [25] attempts to explain the non-intuitive spectral output of a signal generated from a finite length sinusoidal lookup table. The critical observation in the analysis is that the phase word can be written as the difference of two sawtooth waveforms, the full phase word and the discarded portion:

$\hat{\phi}[n] = \phi[n] - \phi_E[n]$  (2.1)

where $\hat{\phi}[n]$ is the phase word after truncation, $\phi[n]$ is given by Equation 1.22 and $\phi_E[n]$ is the value of the truncated bits after mapping. The truncated phase word drives an $N_Q$ entry sinusoidal lookup table with values

$A[n] = \sin\left(\frac{2\pi}{N_Q} n\right), \quad n \in \{0, \ldots, N_Q - 1\}$  (2.2)

where $N_Q = N_P/N_E$ as in Chapter 3. Note that no amplitude truncation is applied to the table values, meaning the analysis uses an ideal SCMF. The spurious response will therefore be generated entirely by phase truncation. Mehrgardt decides to tackle the problem by considering an analogous system in the continuous time domain. Consider the function below:

$S(t) = \sin\left(\frac{2\pi}{N_P}\left[N_P f t - N_E x_{sw}(t)\right]\right) = \sin\left(2\pi f t - \frac{2\pi}{N_Q} x_{sw}(t)\right)$  (2.3)

where $x_{sw}(t)$ is a sawtooth waveform of amplitude 1 and frequency $N_Q f$, and where $f = \frac{F}{N_P} f_{clk}$ is the desired frequency of the synthesized tone. $N_E$ is the number of error states and $N_P$ is the number of phase states, as discussed in Chapter 3.
The sawtooth waveform can be represented mathematically as

$x_{sw}(t) = N_Q f t - \lfloor N_Q f t \rfloor$  (2.4)

where $\lfloor \cdot \rfloor$ is the real domain truncation operation that maps a real number $r$ to the greatest integer less than or equal to $r$. That this is a reasonable approximation of the behavior of the phase accumulator can be shown without much analysis. As an example, let $T_{clk} = 10^{-9}$ s, $F = 3$, $N_P = 2^7$ and $N_E = 2^5$. Figure 2.1 is a plot of the phase accumulator values along with the continuous time approximation of the phase from Equation 2.3. The phase accumulator overflows approximately every $T_{clk} N_P / F$ seconds, or at a rate of $F/(T_{clk} N_P)$. Notice that the actual digital error values look like samples of the continuous time function used in the analysis.

[Figure 2.1: Sawtooth Approximation]

Applying the trigonometric difference identity for sine, Equation 2.3 can be rewritten as

$S(t) = \sin(2\pi f t)\cos\left(\frac{2\pi}{N_Q} x_{sw}(t)\right) - \cos(2\pi f t)\sin\left(\frac{2\pi}{N_Q} x_{sw}(t)\right)$  (2.5)

Now in general, $N_E \ll N_P$, making $N_Q$ large, and therefore the small angle approximations for sine and cosine can be applied to Equation 2.5:

$S(t) \approx \sin(2\pi f t) - \cos(2\pi f t)\left[\frac{2\pi}{N_Q} x_{sw}(t)\right]$  (2.6)

Since $x_{sw}(t)$ is periodic, its Fourier series can be computed, but it must be noted that $x_{sw}(t)$ is not a Dirichlet sawtooth, because it does not take the value of the midpoint at discontinuities. Consequently, the Fourier series does not actually exist for $x_{sw}(t)$ as described by Equation 2.4. This oversight is corrected in Nicholas's analysis by adding a periodic pulse train that forces the sawtooth to take the value of the midpoint at discontinuities. Regardless, $x_{sw}(t)$ is almost a Dirichlet sawtooth, and thus the Fourier series of the sawtooth is used to represent it. First, the definition of the Fourier series is presented.
Definition 2.1 (Fourier Series of a Real-Valued Function). The Fourier series of a real-valued function is defined as

$$\mathcal{F}_s\{f(x)\} = \sum_{n=0}^{\infty}\left[a_n\cos(nx) + b_n\sin(nx)\right] \tag{2.7}$$

where the coefficients of Equation 2.7 are computed using the inner products

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx,\quad n \ge 0 \tag{2.8}$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx,\quad n \ge 0 \tag{2.9}$$

Without performing the full calculation, the Fourier series of the sawtooth waveform is

$$x_{sw}(t) \approx \sum_{k=1}^{\infty}(-1)^k\frac{\sin(2\pi k N_Q f t)}{\pi k} \tag{2.10}$$

Equation 2.10 is an approximation because the constant component has been ignored, as it does not contribute to the spurious response of the waveform. Substituting the Fourier series of $x_{sw}(t)$ back into Equation 2.6, the derivation of the spectrum of the continuous time analogy is complete. Note that the product-to-sum identity was used in the derivation.

$$S(t) \approx \sin(2\pi f t) - \sum_{k=1}^{\infty}\frac{(-1)^k}{N_Q k}\left[\sin\left(2\pi(kN_Q+1)ft\right) + \sin\left(2\pi(kN_Q-1)ft\right)\right] \tag{2.11}$$

At this point, $S(t)$ must be sampled to get back to the discrete time DCDO case. Mehrgardt does this in stages, starting with the sawtooth waveform. The sampling process is executed by replacing $f$ with its discrete value and substituting $n/f_{clk}$ for $t$. Applying the sampling process to $x_{sw}(t)$ first, the following equations are derived:

$$x_{sw}[n] = \sum_{k=1}^{\infty}(-1)^k\frac{\sin\left(2\pi k N_Q\left(\frac{F}{N_P}f_{clk}\right)\left(\frac{n}{f_{clk}}\right)\right)}{\pi k} \tag{2.12}$$

$$= \sum_{k=1}^{\infty}(-1)^k\frac{\sin(2\pi k F n/N_E)}{\pi k} \tag{2.13}$$

Now $F$ and $N_E$ can be reduced by removing common factors, using the modular arithmetic of the previous chapter, such that $F/N_E = \Lambda_E/\lambda_E$ (Lemma 3.2). If following along in Mehrgardt's publication, one will notice that the analysis in this work has diverged. In particular, he states that the finite precision frequency can be written as $2\pi a/b$, where $a$ and $b$ are not defined but exist, and places the final results in these terms.
In this analysis, the introduction of the extra terms is not necessary because the meaning of the symbols has been carefully tracked. Note that the sinusoid in the sawtooth equation is periodic in $k$ with $\lambda_E$. After several summations and trigonometric identities, which are left as an exercise to the reader in Mehrgardt's original publication and so will also be left as one here, a final result is achieved:

$$S[n] = \sin\left(2\pi\frac{F}{N_P}n\right) - \frac{\pi}{N_Q}\sum_{m=1}^{(\tilde{N}_P-1)/2}\beta_{k_m}\left\{\sin\left(2\pi\left(\frac{m}{\tilde{N}_P}+\frac{F}{N_P}\right)n\right) + \sin\left(2\pi\left(\frac{m}{\tilde{N}_P}-\frac{F}{N_P}\right)n\right)\right\} \tag{2.14}$$

where $k_m$ is calculated through the relation

$$k_m = \langle m k'\rangle_{\tilde{N}_P} \tag{2.15}$$

and where $k'$ is the solution for $k$ in the linear congruence relation

$$\left\langle k\,\tilde{N}_Q F\right\rangle_{\tilde{N}_P} = 1 \tag{2.16}$$

Lastly, the coefficients are of the form

$$\beta_k = \begin{cases}\dfrac{(-1)^k}{\tilde{N}_P\sin(\pi k/\tilde{N}_P)} & \text{for } \tilde{N}_P \text{ odd}\\[2ex] \dfrac{(-1)^k}{\tilde{N}_P\tan(\pi k/\tilde{N}_P)} & \text{for } \tilde{N}_P \text{ even}\end{cases} \tag{2.17}$$

The result is never related back to the FCW to characterize the behavior of the DCDO with phase truncation or to calculate the SFDR or SNR. The result here shows the relationship between the FCW and the spurs because of the changes made in symbolic notation during the derivation. Only qualitative observations, such as the number of spurs, the magnitude of the spurs, and that such spurs should be expected, are presented. In summary, this technique provides:

- the number of phase truncation spurs and
- the magnitude of the phase truncation spurs.

It misses some spurs because of its failure to use a Dirichlet sawtooth waveform in the analysis. The author notes the discrepancy between his analysis and simulation in the later paragraphs of his paper.

2.2 Nicholas's Analysis (1985)

H. T. Nicholas provides [23] one of the most well-known analyses of phase truncation spurs.
The analysis provides equations for the location, phase and amplitude of the spurs generated through phase truncation. The general idea is similar to Mehrgardt's analysis described in Section 2.1, in that the phase error is thought of as a sawtooth waveform. Nicholas takes a more formal mathematical approach and finds several clever trigonometric reduction techniques to bring about a final result that is concise, accurate and efficient in implementation. The results are summarized by the following theorems, which are presented without proofs. To summarize the steps taken by Nicholas:

1. Find the analogous continuous time representation of the phase and the truncation error sequence. This takes the form of a pulse train

$$p_e(t) = \frac{N_E}{\lambda_E}\sum_{k=1}^{\infty}\frac{\lambda_E}{\pi k}\sin\left(\frac{\pi k}{\lambda_E}\right)\cos\left(2\pi k\frac{F}{N_E}t\right) + \frac{1}{2} \tag{2.18}$$

and a sawtooth that meets the Dirichlet conditions

$$x_{sw}(t) = \sum_{k=1}^{\infty}\frac{N_E}{\pi k}\sin\left(2\pi k\frac{F}{N_E}t\right) + \frac{N_E}{2} \tag{2.19}$$

Note that these waveforms are slightly different from those shown in the publication. If plotting the waveforms exactly as given in the publication, one will not get the correct error sequence. This is because Nicholas removed the DC term from both the pulse train, which is mentioned in the publication, and the sawtooth waveform. However, in the plots in the publication, the DC term is added back.

2. Sample the continuous time representation at the DDFS sample rate, which is performed in the same manner described in Section 2.1:

$$p_e[n] = \frac{N_E}{\lambda_E}\sum_{k=1}^{\infty}\frac{\lambda_E}{\pi k}\sin\left(\frac{\pi k}{\lambda_E}\right)\cos\left(2\pi k\frac{F}{N_E}n\right) + \frac{1}{2} \tag{2.20}$$

and the Dirichlet sawtooth waveform

$$x_{sw}[n] = \sum_{k=1}^{\infty}\frac{N_E}{\pi k}\sin\left(2\pi k\frac{F}{N_E}n\right) + \frac{N_E}{2} \tag{2.21}$$

The truncation error sequence can then be reconstructed by subtracting the pulse train from the sawtooth waveform:

$$e_p[n] = x_{sw}[n] - p_e[n] \tag{2.22}$$

Figure 2.2a shows the sawtooth waveform and pulse train waveform for $N_E = 64$ and $F = 3$. Figure 2.2b shows the subtraction of the pulse train from the sawtooth waveform, yielding the very familiar phase truncation error sequence plotted in Figure 2.1.

Figure 2.2: Error Sequence Waveform Components ((a) Nicholas' sawtooth and pulse train; (b) Nicholas' phase error sequence)

3. Performing a spectacularly complex set of trigonometric manipulations of the kind found in mathematics texts dedicated to the task, Nicholas arrives at his final error sequence equation

$$e_p[n] = -\frac{N_E}{\lambda_E}\sum_{k=1}^{\lambda_E/2}\left[\cot\left(\frac{\pi k}{\lambda_E}\right)\sin\left(2\pi k\frac{F}{N_E}n\right) - \cos\left(2\pi k\frac{F}{N_E}n\right)\right] \tag{2.23}$$

4. Use number theory to relate the spurs in $k$ to the frequency control word.

5. Plug the error sequence into the original sine expression and use the small angle approximation. This is where Nicholas' approach becomes an approximation to the spurious behavior (albeit a very good one).

6. Perform an enormous amount of trigonometric manipulation and number theory to arrive at the final answer.

The results from Nicholas' work are summarized in the following set of theorems. The number of spurs due to phase truncation is provided in Theorem 2.1.

Theorem 2.1 (Nicholas Number of Spurs). An accumulator with $B_P$ bits whose least significant $B_T$ bits are truncated has $(\lambda_E/2 - 1)$ spurs.
Or, written in the notation used previously in this document, an accumulator with $N_P$ phase states and $N_E$ error states such that $N_E \mid N_P$ has $(\lambda_E/2 - 1)$ spurs. The mapping from the $\rho$ (spur index) value to the frequency control word is provided by Theorem 2.2.

Theorem 2.2 (Nicholas Spur Index). The spur index, $\rho$, for the spur located at DFT frequency bin $k$ is, if $2 \mid (k - \lambda_E/2)$,

$$\rho = \left\langle \frac{k}{2^{B_P-B_T}}\,\Lambda_E^{(\lambda_E/2-1)} \right\rangle_{\lambda_E} \tag{2.24}$$

where $\Lambda_E$ is the reduced frequency control word for the error sequence (Lemma 3.3),

$$\Lambda_E = \frac{F}{\mathrm{GCD}(F, N_E)} \tag{2.25}$$

and $N_E = 2^{B_T}$ is the modulo number for the error sequence. If $2 \mid (-k - \lambda_E/2)$,

$$\rho = \left\langle \frac{-k}{2^{B_P-B_T}}\,\Lambda_E^{(\lambda_E/2-1)} \right\rangle_{\lambda_E} \tag{2.26}$$

The magnitude of the spurs given a spur index is provided by Theorem 2.3.

Theorem 2.3 (Nicholas Spur Magnitude). The magnitude of the spur at spur index $\rho$ is

$$m_\rho = \frac{\pi\, 2^{B_T-B_P}}{\lambda_E}\csc\left(\frac{\rho\pi}{\lambda_E}\right) \tag{2.27}$$

Theorem 2.4 (Nicholas Spur Phase). The phase of the spur at spur index $\rho$ is

$$\theta_\rho = -\cot\left(\frac{\rho\pi}{\lambda_E}\right) \tag{2.28}$$

As mentioned above, the analysis starts in a similar vein to Mehrgardt's, but with one critical difference in the specification of the error sawtooth waveform. Nicholas notes that the Dirichlet conditions for the Fourier series of a function with discontinuities require that the function take on the average value of the function at the point of discontinuity. He thus adds a separate pulse train function to create a sawtooth error function that can be analyzed using the Fourier series. In Mehrgardt's publication, he notes that there are extra terms in the result arising from insufficiencies of the sawtooth approximation to the actual error function behavior. Nicholas avoids those terms by creating a pair of continuous time functions that have convergent Fourier series.
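The spur magnitude prediction lends itself to a quick numerical check. Below is a minimal Python sketch (the parameter values are illustrative, not from the source) that builds an ideal-SCMF DCDO output for a worst-case FCW, one with $\lambda_E = 2$, and compares the largest measured spur against Theorem 2.3 evaluated at $\rho = 1$; the small residual difference is the small-angle approximation error discussed in step 5 above.

```python
import math, cmath

B_P, B_T = 8, 4                      # illustrative: 8-bit accumulator, 4 LSBs dropped
N_P, N_E = 2 ** B_P, 2 ** B_T
F = 3 * N_E // 2                     # GCD(F, N_E) = N_E/2, so lambda_E = 2 (worst case)
lam_E = N_E // math.gcd(F, N_E)
assert lam_E == 2

# Ideal-SCMF DCDO output over one full period, with the B_T LSBs truncated
s = [math.sin(2 * math.pi * (((n * F) % N_P >> B_T) << B_T) / N_P)
     for n in range(N_P)]

# One-sided DFT magnitudes (direct sum; N_P is small enough)
mag = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / N_P) for n, x in enumerate(s)))
       for k in range(N_P // 2)]

measured = max(m for k, m in enumerate(mag) if k != F) / mag[F]   # largest spur, dBc-free ratio
predicted = (math.pi * 2 ** (B_T - B_P) / lam_E) / math.sin(math.pi / lam_E)
assert abs(measured - predicted) / predicted < 0.01   # agrees to within small-angle error
```

For $\lambda_E = 2$ the error sequence alternates between two phase values, and the exact spur-to-carrier ratio works out to $\tan(\pi/(2N_Q))$, which the prediction $\pi/(2N_Q)$ matches to first order.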
To demonstrate the importance of this dissertation's contribution beyond avoiding trigonometric small-angle approximations, consider the final discrete time equation in Nicholas' analysis:

$$A[n] = \sin\left(2\pi\frac{F}{N_P}n\right) - \sum_{k=1}^{\lambda_E/2}\left(\frac{\pi}{\lambda_E N_Q}\csc\left(\frac{k\pi}{\lambda_E}\right)\right)\cdot\left(e^{j2\pi\,\mathrm{GCD}(F,N_E)(\Lambda_E - k\lambda_E N_Q)n/N_P} + e^{-j2\pi\,\mathrm{GCD}(F,N_E)(\Lambda_E - k\lambda_E N_Q)n/N_P}\right)\cdot e^{-j\cot(k\pi/\lambda_E)} \tag{2.29}$$

Compare this equation with the closed form solution from this document in Chapter 4 under Theorem 4.6. The small angle approximation prevents simplification to an elegant final form. Lastly, the worst case spur magnitude is predicted by the following equation, which flows from the observation that the spur magnitude is a strictly decreasing function of the spur index. Selecting a spur index of 1 yields

$$m_{wc} = \frac{1}{N_Q}\,\frac{\pi/\lambda_E}{\sin(\pi/\lambda_E)} \tag{2.30}$$

2.3 Jenq's Analysis (1988)

Y. C. Jenq published a series of papers on analyzing the spectrum of non-uniformly sampled signals [26, 24, 27, 28]. Re-deriving the analysis as it pertains to modern DCDOs is worthwhile, as the notation used in [24] is decidedly different from that commonly used in DDFS literature. There is one critical difference between Jenq's analysis and the following derivation: Jenq uses a continuous time domain representation for the output of the DCDO. Using the notation for a uniformly sampled discrete time signal, the analysis is more easily accessible to one familiar with digital signal processing (DSP), without having to concern one's self with integrals and the Dirac delta function. Before presenting the theorem, the discrete time Fourier transform (DTFT) is given in Equation 2.31.

$$X(\omega) = \sum_{n=-\infty}^{\infty} x[n]e^{-j\omega n} \tag{2.31}$$

The DTFT can be derived from the continuous time Fourier transform, or just Fourier transform, described in Section 7.2.
The DTFT is used to compute the spectrum of waveforms discretized in the time domain, which is precisely the type of waveform considered in this work.

Theorem 2.5 (Jenq's Non-Uniform Sampling Theorem). The DTFT of a signal formed by non-uniformly sampling a waveform $g(t)$ that is band-limited to $\left(-\frac{1}{2T}, \frac{1}{2T}\right)$ in such a way that the overall sampling process repeats with period $MT$ is given by

$$G(\omega) = \frac{1}{MT}\sum_{m=0}^{M-1}\left[\sum_{k=-\infty}^{\infty} G_0\left(\omega - k\frac{2\pi}{MT}\right)e^{-j2\pi k t_m/(MT)}\right]e^{jm\omega T} \tag{2.32}$$

where $t_m$ is the sampling instant of the $m$th uniformly sampled sub-sequence of $g(t)$, formed by taking every $M$th sample (it is uniform because the sampling process is periodic with $MT$), $\omega$ is the angular frequency of the waveform in radians and $G_0$ is the CTFT of $g(t)$.

This theorem is not difficult to show. The reason that Jenq gets cited in DDFS literature is an interesting observation made about the behavior of the truncated phase accumulator output.

2.3.1 Jenq's Observation

The key observation made by Jenq in his analysis (Section 2.3) is that in the presence of phase truncation the phase code sent to the SCMF is not uniformly spaced. Effectively this "looks like" a non-uniform sampling operation on the generated sinusoid. But the phase accumulator is of finite length, so regardless of phase truncation, it is periodic, and therefore the truncated phase word is also periodic. In Section 1.2, the periodicity of the phase accumulator given a frequency control word $F$ is given in Equation 3.12. Now consider a four bit phase accumulator incremented by $F = 3$ that is truncated to three bits and fed into an ideal SCMF (i.e., assume there is no quantization in the amplitude values stored in the SCMF). Thus in this example, the number of bits truncated from the phase word is $B_T = 1$, the number of bits in the phase accumulator is $B_P = 4$ and the number of kept bits after truncation is $B_{PT} = 3$.
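This example is small enough to generate and check directly. The following Python sketch (variable names are illustrative) reproduces the sequence tabulated in Table 2.1 and verifies the non-uniform phase step that Jenq's observation rests on.

```python
B_P, B_T, F = 4, 1, 3                 # 4-bit accumulator, 1 bit truncated, FCW = 3
N_P = 2 ** B_P
N_PT = N_P >> B_T                     # number of phase states after truncation

phase = [(n * F) % N_P for n in range(N_P + 1)]    # full accumulator states
trunc = [p >> B_T for p in phase]                  # truncated phase words (drop B_T LSBs)
steps = [(trunc[n] - trunc[n - 1]) % N_PT for n in range(1, N_P + 1)]

assert phase[-1] == phase[0] == 0     # periodic with N_P = 16 clock cycles
assert set(steps) == {1, 2}           # the truncated phase step is non-uniform
```

The `steps` list alternates between 1 and 2, matching the rightmost column of the table below.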
Table 2.1 shows the state of the phase accumulator, the truncated phase word, and the phase step between adjacent phase words. Note that the leftmost column of the table matches the predicted untruncated phase sequence of Theorem 3.2 and the second column shows the predicted truncated phase sequence of Theorem 3.4. While the proofs should be sufficient, working through an example provides a helpful check on the result and some intuition into the behavior of the devices modeled by the mathematical analysis. There are a few observations that can be made from the sequence.

Table 2.1: Table of Truncated Phase States (4-bit)

Phase Accumulator | Truncated Phase | Truncated Phase Step
0000 (00) | 000 (0) | -
0011 (03) | 001 (1) | 1
0110 (06) | 011 (3) | 2
1001 (09) | 100 (4) | 1
1100 (12) | 110 (6) | 2
1111 (15) | 111 (7) | 1
0010 (02) | 001 (1) | 2
0101 (05) | 010 (2) | 1
1000 (08) | 100 (4) | 2
1011 (11) | 101 (5) | 1
1110 (14) | 111 (7) | 2
0001 (01) | 000 (0) | 1
0100 (04) | 010 (2) | 2
0111 (07) | 011 (3) | 1
1010 (10) | 101 (5) | 2
1101 (13) | 110 (6) | 1
0000 (00) | 000 (0) | 2

- As is the case in Figure 3.1b, the phase accumulator is periodic with $N_P = 2^4 = 16$ clock cycles.
- The truncated phase sequence is also periodic with the phase accumulator. While already stated, working through a simple example helps visualize the periodicity.
- The truncated phase step is not uniform. It varies between the values of 1 and 2.

An interesting feature follows from this analysis. Since the truncated phase sequence is periodic, the delta phase cycle is also periodic, and the periodicity of the delta phase cycle is the same as the periodicity of the phase error sequence from truncation. From Lemma 3.5, if $2^{B_T} \mid N_P$, which it does in this example, then

$$\lambda_E = \frac{2^{B_T}}{\mathrm{GCD}(F, 2^{B_T})} = \frac{2}{\mathrm{GCD}(3, 2)} = 2 \tag{2.33}$$

2.3.2 Jenq's Results

Jenq does not deal with spur locations, magnitudes or phases in any of his publications. Instead, the non-uniform sampling theorem is applied and then Parseval's relation is used to find a noise power bound.
This allows for a completely general calculation of the signal to noise ratio due to phase truncation spurs. He describes the problem differently from the other authors, by thinking of the phase accumulator value as an integer value plus a coprime rational number. Variables $W$, $L$ and $M$ are introduced to describe the output of the phase accumulator, where $L$ and $M$ are coprime:

$$d = W + L/M \tag{2.34}$$

Looking at Equation 3.38, the expression can be rewritten in the notation used in this work. $M$ is the periodicity of the error sequence ($\lambda_E$), as multiplying $d$ by $M$ yields an integer value. $L$ is the reduced FCW over $M$, that is to say $L = \Lambda_E$. Applying the non-uniform sampling theorem gives an answer in the frequency domain:

$$G(\omega) = \frac{2\pi}{T\lambda_E}\sum_{k=-\infty}^{\infty}\left[\sum_{m=0}^{\lambda_E-1}e^{-j2\pi r_m f_0/f_{clk}}\,e^{-j2\pi k m/\lambda_E}\right]\delta\!\left[\omega - \omega_0 - k\frac{2\pi}{\lambda_E T}\right] \tag{2.35}$$

Only the amplitude of the spurs is considered at this point. Sampling the waveform yields

$$A(k) = \frac{1}{\lambda_E}\sum_{m=0}^{\lambda_E-1}e^{-j2\pi\langle m\Lambda_E\rangle_{\lambda_E}/(\lambda_E N_Q)}\,e^{-j2\pi m k/\lambda_E} \tag{2.36}$$

Definition 2.2 (Parseval's Relation). Parseval's relation for the sequence $g[n]$ of length $N$ is

$$\sum_{n=0}^{N-1}|g[n]|^2 = \frac{1}{N}\sum_{k=0}^{N-1}|G[k]|^2 \tag{2.37}$$

Applying Parseval's relation yields the final result for the signal to noise ratio

$$\mathrm{SNR} = 10\log_{10}\left[\frac{|A(k)|^2}{1 - |A(k)|^2}\right] \tag{2.38}$$

Several observations can be made about how the function increases and decreases with various values of $\Lambda_E$ and $\lambda_E$. Using this information, the best and worst case signal to noise ratios are computed. The worst case is

$$\mathrm{SNR}_{wc} = 10\log_{10}\left[\frac{\left[\sin\left(\frac{\pi}{N_Q}\right)\left(\frac{N_Q}{\pi}\right)\right]^2}{1 - \left[\sin\left(\frac{\pi}{N_Q}\right)\left(\frac{N_Q}{\pi}\right)\right]^2}\right] \tag{2.39}$$

and the best case is derived as

$$\mathrm{SNR}_{bc} = 20\log_{10}\left[\cot\left(\frac{\pi}{2N_Q}\right)\right] \tag{2.40}$$

For the details of the derivation, consider reading Jenq's series of non-uniform sampling papers.

2.4 Torosyan's Analysis (2001)

Although an excellent analysis of the location and magnitude of phase truncation spurs existed at the time of his publications, Arthur Torosyan and Alan Willson of UCLA provided an exact, clear and practical means for an analytical understanding of phase truncation spurs [29, 10]. Instead of working from analogous time domain functions as Nicholas and Mehrgardt did, Torosyan approaches the problem using elementary number theory. The critical observation is that for any two FCWs that generate periodic phase sequences of the same length, the resulting sequences can be related through a simple rearrangement, and the frequency responses of sequences related in this manner are also simple rearrangements of each other. The exact same observation is actually made by Nicholas in [23] but is not used in his analysis.

The implication is enormous for DCDO analysis, as the DFT need only be run once per possible least period (a handful of times, determined by the prime factorization of $N_P$), and all other frequency domain responses can be generated by permuting the frequency response. Consider the case of a $B_P = 32$ bit accumulator. If every FCW were to be tested, then the DFT would need to run $N_P = 2^{32} = 4294967296$ times on sequences of which at least half are periodic with $N_P$. The current state of computer technology simply cannot handle the computational requirements of such a task in a manageable amount of time (i.e., if each DFT of such a sequence took one second, 4294967296 s is approximately 136 years). It is great news, then, that mathematicians have shown that such computations are not required to fully characterize DCDOs. Much of the work in Chapter 4 will be used in this analysis, so reading that chapter before continuing may be helpful.
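The rearrangement property is easy to verify numerically. The minimal Python sketch below (illustrative parameters, not from the source) checks that two FCWs with the same least period produce phase-truncated DCDO outputs whose DFT magnitude spectra contain exactly the same values, merely permuted.

```python
import math, cmath

N_P, B_T = 64, 2                     # 6-bit accumulator, 2 LSBs truncated
F1, F2 = 3, 5                        # both odd, so both have least period N_P

def dcdo(F):
    """Ideal-SCMF DCDO output with phase truncation over one full period."""
    return [math.sin(2 * math.pi * (((n * F) % N_P >> B_T) << B_T) / N_P)
            for n in range(N_P)]

def dft_mags(s):
    """Direct-sum DFT magnitudes (N_P is small enough for brute force)."""
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / N_P)
                    for n, x in enumerate(s))) for k in range(N_P)]

m1, m2 = dft_mags(dcdo(F1)), dft_mags(dcdo(F2))
# Same multiset of spectral magnitudes: the spectra are permutations of each other
assert all(abs(a - b) < 1e-8 for a, b in zip(sorted(m1), sorted(m2)))
```

The underlying reason is that the time sequences are related by the index map $n \mapsto \langle \sigma n\rangle_{N_P}$ for a unit $\sigma$, and the DFT turns that multiplicative index permutation into a permutation of frequency bins.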
The analysis begins by showing that FCWs of the same period generate sequences that are simple permutations of each other (Theorem 3.9). It is then shown that the frequency domain representations of such sequences are also simple permutations of each other (Theorem 4.8). Then the DFT of the $\Lambda = 1$ case is computed (Theorem 4.6 with $\Lambda = 1$). Since all sequences can be generated as permutations of the $\Lambda = 1$ case, the analysis is complete. To summarize the results:

1. Only one frequency control word for each possible phase accumulator least period needs to be considered. Let this FCW be chosen such that, after reducing the word modulo the number of phase accumulator states, the resulting number is 1 (i.e., $\Lambda = 1$).

2. All other frequency control words for a given least period are permutations of the previously described FCW.

3. The DFT happens to commute with the set of permutations associated with a given least period, and therefore the frequency spectra are also permutations. This leads to the "window function" after simplifying the result.

4. An interesting simplification can be made when observing the reduced word 1, allowing the grouping of repeating terms and resulting in a simple final expression for the phase truncation spurs.

Chapter 3
Phase Accumulator Sequences from Number Theory

In this chapter, a complete analysis of the sequences generated by a phase accumulator is developed from elementary number theory. The analysis begins by proving the well-known untruncated phase accumulator state equation using basic modulo arithmetic (Section 3.1). The number theoretic techniques may seem excessive at this point, but the motivation for approaching the problem from a mathematical standpoint will become apparent in Chapter 4. Several concepts from number theory are presented to support the analysis and will be used in later proofs as the work progresses.
After determining the expected phase accumulator sequence, the periods of such sequences are explored (Section 3.2). Several Greek letters are given special meaning, such as the reduced FCW $\Lambda$ and the least period length $\lambda$. The effects of truncation on the phase accumulator sequence are explored in Section 3.3. The tools developed for deriving the period of an untruncated phase sequence are applied to the truncation problem, resulting in a general theory for the periodicity of both the truncated phase sequence and the error sequence. Developing a pure mathematical framework for analyzing the phase accumulator prevents common mistakes in calculating the periodicity of DDFS waveforms. The relationship between both untruncated and truncated phase sequences of different FCWs is established in Section 3.4. Lastly, some comments on abstract algebraic structures that can be used to fully describe the operation of the phase accumulator are presented in Section 3.5.

3.1 Phase Accumulator Sequence

The state of the phase accumulator at clock cycle $n$ can be written as a function of the FCW and the initial phase,

$$P[n] = \langle nF + P_0\rangle_{N_P} \tag{3.1}$$

where $P_0$ is the initial state of the phase accumulator, $F$ is the FCW and $\langle a\rangle_m$ represents the smallest non-negative integer remainder when dividing $a$ by $m$ (sometimes called the least residue). This is commonly referred to as the modulo $m$ operator on the integer $a$. The modulo operation is, by definition, intimately related to integer division. The division algorithm for positive integers is stated below [30]:

Theorem 3.1 (The Division Algorithm). Given non-negative integers $a$ and $b$, $b \neq 0$, there exist unique integers $q$ and $r$, with $0 \le r < b$, such that $a = qb + r$.

Lemma 3.1 (Reduced GCD). If $\mathrm{GCD}(a,b) = d$, then $\mathrm{GCD}(a/d, b/d) = 1$.

Proof. Suppose instead that

$$k = \mathrm{GCD}\left(\frac{a}{d}, \frac{b}{d}\right),\quad k > 1. \tag{3.14}$$

Then by Definition 3.2, $k \mid \frac{a}{d}$ and $k \mid \frac{b}{d}$.
By Definition 1.1, there exist $c_0, c_1 \in \mathbb{Z}$ such that

$$c_0 k = \frac{a}{d} \Rightarrow c_0 k d = a \tag{3.15}$$

$$c_1 k = \frac{b}{d} \Rightarrow c_1 k d = b \tag{3.16}$$

Clearly then $kd \mid a$ and $kd \mid b$, but since $k > 1$, $kd > d$, making $kd$ a common divisor larger than $d$, which is a contradiction since $d$ by choice is the greatest common divisor of $a$ and $b$. Assuming $k < 1$ and not equal to zero also leads to a contradiction in a similar manner. Therefore $k = 1$.

Now we show another helpful theorem [30] that will prove useful in the derivation of the phase accumulator least period.

Lemma 3.2 (Linear Modulo Normalization). If $ac \equiv bc \pmod{m}$ and $\mathrm{GCD}(c,m) = d$, then $a \equiv b \pmod{\frac{m}{d}}$.

Proof. Let $ac \equiv bc \pmod{m}$ and $\mathrm{GCD}(c,m) = d$. From the definition of congruence given above, $m \mid (ac - bc) \Rightarrow m \mid c(a-b)$. Since $d$ divides both $c$ and $m$ (from the definition of the GCD), $\frac{m}{d} \mid \frac{c}{d}(a-b)$. Now from Lemma 3.1, we know that $\mathrm{GCD}\left(\frac{c}{d}, \frac{m}{d}\right) = 1$, so $\frac{m}{d} \nmid \frac{c}{d}$ and it must be that $\frac{m}{d} \mid (a-b)$. From Definition 3.1,

$$a \equiv b \pmod{\frac{m}{d}} \tag{3.17}$$

Here we note that nothing in the theorem or proof prevents $d = 1$, so it immediately follows that

$$ac \equiv bc \pmod{m} \Rightarrow a \equiv b \pmod{m} \tag{3.18}$$

if $\mathrm{GCD}(c,m) = 1$.

As the DDFS is designed to generate spectrally pure signals, it is of critical importance to determine the periods of the sequences generated by the DCDO. Unwanted periodic behavior generates deterministic spurs and will become a major topic in the discussions of Chapter 4. Now consider, yet again, the fundamental phase accumulator expression, Equation 3.1. In the example case of $F = 1$, the sequence generated is

$$P = [0, 1, 2, \ldots, N_P - 1, 0, 1, 2, \ldots] \tag{3.19}$$

Clearly then the length of the period for $F = 1$ is $N_P$. Now, in general, Equation 3.12 can be shown to be true for arbitrary $F$ and $P_0$.

Theorem 3.3 (Phase Accumulator Periodicity). The least period of the sequence generated by applying the FCW $F$ to a phase accumulator with $N_P$ states without phase truncation is

$$\lambda_P \triangleq \frac{N_P}{\mathrm{GCD}(F, N_P)} \tag{3.20}$$

Proof.
Recall that the state of the phase accumulator at clock cycle $n$ is

$$P[n] = \langle Fn + P_0\rangle_{N_P} \tag{3.21}$$

If $\mathrm{GCD}(F, N_P) = 1$, then the period of the generated sequence is $N_P$. This can be shown by noting that for $n = 0$, $P[0] = P_0$, and then finding $k > 0$ such that $P[k] = P_0$. Equivalently, $P[k] = P[0] = \langle P_0\rangle_{N_P}$, which means that

$$kF + P_0 \equiv P_0 \pmod{N_P} \Rightarrow N_P \mid (kF + P_0 - P_0) \Rightarrow N_P \mid kF \tag{3.22}$$

But $N_P \nmid F$ since $\mathrm{GCD}(F, N_P) = 1$, and therefore $N_P \mid k$, so an integer $d$ exists such that $dN_P = k$. The simplification was made using Lemma 3.4. We then wish to find the smallest positive integer $d$ such that $dN_P = k$ holds. Let $d = 1$; then $k = N_P$ and

$$P[N_P] = \langle N_P F + P_0\rangle_{N_P} = \left\langle \langle N_P F\rangle_{N_P} + \langle P_0\rangle_{N_P}\right\rangle_{N_P} = \langle P_0\rangle_{N_P} \tag{3.23}$$

so $d = 1$ satisfies the equality and the period of the sequence for $\mathrm{GCD}(F, N_P) = 1$ is $N_P$. Now consider $F$ such that $\mathrm{GCD}(F, N_P) = d$. We are again looking for $k$ such that

$$kF + P_0 \equiv P_0 \pmod{N_P} \Rightarrow kF \equiv 0 \pmod{N_P} \tag{3.24}$$

as $P[0] = P_0$ still holds for this $F$. From Lemma 3.2 we know that

$$kF \equiv 0 \pmod{N_P} \Rightarrow k\frac{F}{d} \equiv 0 \pmod{\frac{N_P}{d}} \tag{3.25}$$

From Lemma 3.1, $\mathrm{GCD}\left(\frac{F}{d}, \frac{N_P}{d}\right) = 1$, and from the previous analysis of the periodicity for relatively prime $F$ and $N_P$, it follows that the period is $\frac{N_P}{d}$. So we have shown the origin of Equation 3.12 and proved it to be true for all $F$.

It is interesting to note that typically $N_P$ is a power of 2, so $\mathrm{GCD}(F, N_P)$ will be a power of 2 if $F$ is not relatively prime to $N_P$. Thus looking at the position of the first '1' in the bit string of $F$, counting from the LSB, gives the period of the sequence generated by the phase accumulator. Consider a 4-bit accumulator as a simple example. If $F = 6$, its binary representation is $0110$. The first 1 appears in the second bit position, meaning that the largest power-of-2 divisor of $F$ is $2^1$ and the period of the phase accumulator is $2^4/2^1 = 2^3 = 8$.
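Theorem 3.3 and the lowest-set-bit shortcut are simple to confirm by brute force; the Python sketch below (illustrative parameters) does so for a small accumulator.

```python
import math

def least_period(F, N_P):
    """Smallest k > 0 with <k*F>_{N_P} = 0, found by brute force."""
    k = 1
    while (k * F) % N_P != 0:
        k += 1
    return k

N_P = 16                                   # 4-bit accumulator
for F in range(1, N_P):
    assert least_period(F, N_P) == N_P // math.gcd(F, N_P)   # Theorem 3.3

# For N_P a power of two, F & -F isolates the lowest set bit of F,
# which is exactly the largest power-of-2 divisor of F:
F = 6                                      # 0110b, lowest set bit = 2^1
assert least_period(F, N_P) == N_P // (F & -F) == 8
```

The `F & -F` trick is the bit-level form of the "first 1 counting from the LSB" observation above.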
3.3 Truncated Phase Sequences

Applying Lemma 3.2 to Theorem 3.2, it is possible to write the phase accumulator sequence Equation 3.4 in a "reduced" form. The reduced form is useful when deriving techniques for minimizing the number of computations required for a DFT of a sequence.

Lemma 3.3 (Alternative Phase Accumulator Expression). The phase accumulator Equation 3.4 with $N_P$ states and FCW $F$ can be written as

$$P[n] = d\langle\Lambda_P n\rangle_{\lambda_P} \tag{3.26}$$

where $d = \mathrm{GCD}(F, N_P)$, $\Lambda_P = F/d$ and $\lambda_P$ is the least period of $\langle Fn\rangle_{N_P}$, which can be calculated from Theorem 3.3.

Proof. Let $d = \mathrm{GCD}(F, N_P)$ and $\langle Fn\rangle_{N_P} = r_0$. Then by Definition 3.2, $d \mid F$ and $d \mid N_P$, and there exist integers $\Lambda_P = F/d$ and $\lambda_P = N_P/d$. By the definition of the modulo operator, there exists $c_0 \in \mathbb{Z}$ such that

$$\langle Fn\rangle_{N_P} = r_0 \Rightarrow Fn - c_0 N_P = r_0 \tag{3.27}$$

$$\Rightarrow d\Lambda_P n - c_0 d\lambda_P = r_0 \tag{3.28}$$

$$\Rightarrow \Lambda_P n - c_0\lambda_P = \frac{r_0}{d} \tag{3.29}$$

where $0 \le r_0 < N_P$.

A series is said to converge to the value $X$ if for every $\epsilon > 0$, $\epsilon \in \mathbb{R}$, there exists an integer $N > 0$ such that

$$\left|X - \sum_{i=1}^{n} x_i\right| < \epsilon \quad \text{for all } n > N.$$

Convergence is sometimes written as a single limit statement for conciseness:

$$X = \lim_{n\to\infty}\sum_{i=1}^{n} x_i. \tag{6.57}$$

For the algorithm to be useful for computing sine and cosine, $\theta^s_n$ must converge to any arbitrary angle between $0$ and $\pi/2$. More precisely, for any arbitrary $\theta \in [0, \pi/2)$ and $\epsilon > 0$ there exists an integer $N > 0$ such that

$$\left|\theta - \sum_{i=1}^{n}\delta_i\tan^{-1}(|\sigma_i|)\right| < \epsilon \quad \text{for all } n > N. \tag{6.58}$$

The properties of convergent series are set forth using the Cauchy convergence criterion [14], which is

Theorem 6.1 (Cauchy Convergence Criterion (Real)). A series

$$\sum_{i=1}^{\infty} x_i \tag{6.59}$$

converges if and only if, given any $\epsilon > 0$, $\epsilon \in \mathbb{R}$, there exists an integer $N$ such that

$$|x_{m+1} + \cdots + x_n| < \epsilon \quad \text{for all } n > m \ge N. \tag{6.60}$$

Let us consider the sequences for which the series converges.
This can be inferred from the Cauchy convergence criterion or proved simply as below:

Lemma 6.1 (Sequences for Convergent Series). If the sum of $\{x_1, x_2, \ldots\} \subset \mathbb{R}$ converges, then

$$\lim_{n\to\infty} x_n = 0 \tag{6.61}$$

Proof. Let $X = \lim_{n\to\infty}\sum_{i=1}^{n} x_i$. Note that $X = \lim_{n\to\infty}\sum_{i=1}^{n+1} x_i$ also. Then

$$\lim_{n\to\infty}\left(\sum_{i=1}^{n+1} x_i - \sum_{i=1}^{n} x_i\right) = \lim_{n\to\infty} x_{n+1} = 0 \tag{6.62}$$

since the limits of both series are the same.

Note that the converse of Lemma 6.1 does not guarantee convergence of the series, but Theorem 6.1 does. The harmonic series is a good example of a series whose elements approach zero toward infinity but whose sum diverges. From the previous lemma,

$$\lim_{n\to\infty}\delta_n\tan^{-1}(|\sigma_n|) = \lim_{n\to\infty}\tan^{-1}(|\sigma_n|) = 0 \tag{6.63}$$

The sign $\delta_n$ has no impact on the convergence of the sequence to zero. So the $\sigma_i$ must be chosen such that the inverse tangent of the sequence has a limit of zero. There are several ways to easily show this, but perhaps the easiest is to note that the Taylor series expansion of the inverse tangent leads to a small angle approximation similar to that of sine (i.e., $\tan^{-1}(x) \approx x$ for $x \ll 1$):

$$\tan^{-1}(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{2n+1}x^{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots \tag{6.64}$$

Note that the higher order $x$ terms rapidly become negligible for small $x$.

There are multiple tests that can be used to determine convergence of a series; one such test is the ratio test [14]. If

$$\lim_{n\to\infty}\left|\frac{\tan^{-1}(\sigma_{n+1})}{\tan^{-1}(\sigma_n)}\right| < 1 \tag{6.65}$$

then the series converges absolutely. It has already been shown that $\tan^{-1}(\sigma_i)$ is required to approach zero for large values of $i$. Using the small angle approximation as $n$ tends toward infinity, Equation 6.65 can be rewritten as

$$\lim_{n\to\infty}\left|\frac{\sigma_{n+1}}{\sigma_n}\right| < 1 \tag{6.66}$$

Now it need only be shown that the algorithm described in Definition 6.2 can converge to any angle between some convergence limits $\theta_{max}$ and $\theta_{min}$. This would complete the description of an algorithm capable of simultaneously computing a scaled sine and cosine to arbitrary precision given enough iterations. In order to cover the entire interval, a second, less obvious property must be met:

$$|\sigma_n| \le \sum_{i=n+1}^{\infty}|\sigma_i| \tag{6.67}$$

For the series of $\sigma_i$ to take any value on the interval, the size of the previous step must be less than or equal to the sum of the remaining steps. This is more obvious when shown graphically, as in Figure 6.18. Unless the sum of the remaining steps is equal to or greater than the size of the previous step, there is a gap in the obtainable values from the algorithm. A gap in $\sigma_i$ would result in a gap in $\tan^{-1}(\sigma_i)$, and there would be unobtainable angles of rotation.

Figure 6.18: CORDIC Coverage Requirement

The CORDIC algorithm works by keeping track of the angle of rotation at each iteration to determine if the rotations have passed the desired target angle. In CORDIC literature, the variable $z_i$ is used to denote the angle information at iteration $i$. One could start $z_0$ at zero, sum the $\Delta_i$ components, and compute $\delta_i$ by subtracting $\theta$ from the running sum to determine the direction of rotation. A more efficient approach is to start $z_0 = \theta$ and subtract the $\Delta_i$ while checking the sign of the residual angle at each iteration. This is how $\delta_i$ is selected. Define the iterative relationship

$$z_{i+1} = z_i - \Delta_i = z_i - \delta_i\tan^{-1}(|\sigma_i|) \tag{6.68}$$

where $\delta_i = 1$ if $z_i \ge 0$ and $\delta_i = -1$ otherwise. At the $n$th iteration,
n?1summationdisplay i=0 ?i tan?1 (|?i|) = z0??sn (6.69) If zn converges to zero at the limit for n as it approaches infinity, then ?sn = z0 and from Equations 6.54 and 6.55 one sees that the algorithm computes the sine and cosine of ?sn. Clearly then, proving that zn converges to zero from a given z0 = ? is sufficient to prove convergence for the CORDIC algorithm. Theorem 6.2 (CORDIC Convergence Theorem). Let tan?1 (?1),tan?1 (?2),... be a se- quence of real numbers whose series is convergent by the Cauchy convergence criterion for 163 series (Theorem 6.1). If vextendsinglevextendsingle vextendsingletan?1 (?n) vextendsinglevextendsingle vextendsingle? ?summationdisplay k=n+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.70) for any n?P, then for any ??R constrained by ? ?summationdisplay i=0 vextendsinglevextendsingle vextendsingletan?1 (?i) vextendsinglevextendsingle vextendsingle??? ?summationdisplay i=0 vextendsinglevextendsingle vextendsingletan?1 (?i) vextendsinglevextendsingle vextendsingle (6.71) The CORDIC z recurrence relation when seeded with z0 = ? converges, i.e. limn??zn = 0 (6.72) where zn is defined by Equation 6.68. Proof. Let ? be chosen according to Equation 6.71. Let S be a non-empty subset of P0. Let us show that ? ?summationdisplay k=n vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?zn? ?summationdisplay k=n vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.73) for all n by induction. Since the series ?summationdisplay i=1 tan?1 (?i) (6.74) converges by the Cauchy criterion (Theorem 6.1), for any arbitrary ?> 0, ??R there exists an integer N > 0 such that vextendsinglevextendsingle vextendsingletan?1 (?m+1) +???+ tan?1 (?n) vextendsinglevextendsingle vextendsinglem?N. This implies limn?? 
?summationdisplay k=n vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle = 0 (6.76) and thus showing Equation 6.73 to be true would prove convergence by making the upper and lower limit of zn equal to zero. First let us check that 0?S by setting n = 0 and using Equation 6.73. ? ?summationdisplay k=0 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?z0 ? ?summationdisplay k=0 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.77) Since z0 = ? and ? has been chosen according to Equation 6.71, 0 ?S. Now we take the induction step. Assume x?S. It must be shown that x+ 1?S by showing that ? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?zx+1 ? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle. (6.78) zx+1 is computed as follows: zx+1 = zx??x tan?1 (|?x|) (6.79) Since we have assumed x?S, ? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?zx? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.80) There are two cases to examine, zx ? 0 for which ?x = 1, ?x ? 0 and zx < 0 for which ?x =?1, ?x < 0 . Assume zx?0, then ?x = 1 and zx+1 = zx?tan?1 (?x) = zx? vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.81) 165 The upper bound is found by substituting the upper bound of Equation 6.80 into Equation 6.81 and adjusting the equality to an inequality. zx+1 ? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle? vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.82) ? 
?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.83) The lower bound is found by substituting Equation 6.70 into Equation 6.81 and adjusting the equality to an inequality. zx+1 = zx? vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle?zx+1 ?zx? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.84) But zx is positive or zero, so zx+1 ?? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.85) and the zx+1 is bounded for the case zx?0. Now consider zx < 0, then ?x =?1. zx+1 = zx + vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.86) ThelowerboundiscomputedbysubstitutingthelowerboundofEquation6.80intoEquation 6.86 zx+1 ?? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle+ vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.87) ?? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.88) The upper bound is computed by substituting Equation 6.70 into Equation 6.86 zx+1 = zx + vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle?zx+1 ?zx + ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.89) 166 But zx is negative, so zx+1 ? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.90) Thus x+ 1?S and the proof is complete. Note that this proof for the CORDIC algorithm is different from Walther?s proof (per- haps a little more formal) [54]. As far is the author is aware, this is an original proof. 
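Theorem 6.2 can be checked numerically. The Python sketch below runs only the z-path recurrence of Equation 6.68, using the hardware-friendly choice |δ_i| = 2^{−i} (introduced formally in the next section), and confirms that the residual angle shrinks toward zero for any seed inside the convergence interval:

```python
import math

def z_residual(theta, n_iter=32):
    """Run the z-path recurrence z_{i+1} = z_i - sigma_i * atan(|delta_i|)
    of Equation 6.68 with |delta_i| = 2**-i; return the residual angle."""
    z = theta
    for i in range(n_iter):
        sigma = 1 if z >= 0 else -1
        z -= sigma * math.atan(2.0 ** -i)
    return z

# Any seed inside (-theta_max, theta_max) drives the residual toward zero;
# after 32 iterations it is bounded by the tail sum, roughly 2**-31 rad.
theta_max = sum(math.atan(2.0 ** -i) for i in range(60))
residuals = [abs(z_residual(t)) for t in (0.0, 0.5, -1.2, 0.99 * theta_max)]
```

The residual bound after n iterations is exactly the tail sum of Equation 6.76, which is the invariant carried through the induction above.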
The conventional CORDIC algorithm has been fully derived at this point. Definition 6.2 summarizes the results of the analysis thus far.

Definition 6.2 (Conventional CORDIC Iteration). The conventional CORDIC iteration is defined as

    x_{i+1} = x_i − σ_i δ_i y_i    (6.91)
    y_{i+1} = y_i + σ_i δ_i x_i    (6.92)
    z_{i+1} = z_i − σ_i tan⁻¹(δ_i)    (6.93)

where σ_i = 1 if z_i ≥ 0 or σ_i = −1 otherwise. Performing successive CORDIC iterations to compute some desired value is called the CORDIC algorithm.

There are infinitely many δ_i that can satisfy Equation 6.66 and Equation 6.67. What is desired is an efficient hardware implementation of Equation 6.41 and Equation 6.42. As has already been mentioned in the BTM and MTM sections, a multiplication or division by 2 is merely a shift operation, which is remarkably efficient in hardware (no combinatorial logic is required). Let |δ_i| = 2^{−i}, a shift-by-i operation in hardware. Now let us check whether this series converges using the ratio test:

    lim_{n→∞} | 2^{−(n+1)} / 2^{−n} | = lim_{n→∞} 1/2 = 1/2 < 1    (6.94)

Clearly, the series converges. Now let us calculate the interval of convergence. The maximum angle of rotation is

    θ_max = Σ_{i=0}^{∞} tan⁻¹(2^{−i})    (6.95)
          = tan⁻¹(1) + tan⁻¹(1/2) + tan⁻¹(1/4) + ⋯    (6.96)
          = 1.7432865... ≈ 0.55π    (6.97)

The minimum obtainable rotation angle is θ_min = −θ_max = −1.7432865.... Since quarter sine compression is typically used in a DDFS, the 0.55π range is more than sufficient for implementing the SCMF. From this point forward, |δ_i| is assumed to be a power of 2.

6.4.2 Conventional CORDIC

Figure 6.19 shows the hardware implementation of the conventional CORDIC iteration. Thus three full adders are required. Since z_i is a two's complement number, the sign detection block is merely a check on the MSB of z_i.
The hard-wired shifts are "free" operations, requiring no additional hardware. The multiplication by −1 can be implemented using the one's complement technique described in Figure 1.2. Overall this is rather low hardware overhead for a technique that can compute a sinusoid to arbitrary precision. One of the drawbacks of conventional CORDIC implementations is the scalar K from the iterative rotations. As n tends towards infinity,

    K_∞ = Π_{i=0}^{∞} √(1 + 2^{−2i}) = 1.64676...    (6.98)

Since K_n is a function of the number of iterations, the conventional CORDIC typically uses a fixed number of iterations. One brute force method is to initialize x_0 = 1/K_n, so that the final computation results in cos(θ) and sin(θ). Other methods to correct for this unwanted scaling, and the cases in which the scaling can be ignored, will be discussed later in this chapter.

Figure 6.19: Conventional CORDIC Stage

Note that the upper bound of the remaining phase error, assuming vectoring mode operation with z_0 = θ, after n rotations is

    Σ_{i=n+1}^{∞} |tan⁻¹(2^{−i})|    (6.99)

The remaining phase error is bounded by the small angle approximation for arctangent. Figure 6.20a shows the value of arctangent at each CORDIC iteration, as well as the small angle approximation of arctangent at each value. The better question is how the quality of the small angle approximation changes with each iteration. Figure 6.20b shows this information by taking the magnitude of the error of the small angle approximation for arctangent at a given iteration and dividing it by the arctangent value at that iteration.
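Both per-iteration quantities just discussed — the scaling factor of Equation 6.98 and the approximation-quality ratio plotted in Figure 6.20b — are easy to reproduce numerically (a Python sketch; function names are ours):

```python
import math

def K(n):
    """Scaling factor K_n of Equation 6.98 after n CORDIC iterations."""
    p = 1.0
    for i in range(n):
        p *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return p

def approx_quality(i):
    """Relative error of tan^-1(2^-i) ~= 2^-i, the ratio in Figure 6.20b.
    From Equation 6.64 this is approximately (2^-i)^2 / 3."""
    a = math.atan(2.0 ** -i)
    return (2.0 ** -i - a) / a

k_inf = K(40)          # ~= 1.64676, already settled to double precision
ratio3 = approx_quality(3)   # ~= (2**-3)**2 / 3
```

The 2^{−2i} settling of K_n is what later allows the scaling to be pre-computed once and absorbed into the initialization.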
This will prove helpful in calculating upper error bounds on the phase and amplitude in the CORDIC algorithm in the following section.

Figure 6.20: arctan Small Angle Approximation. (a) Phase step (degrees), 2^{−i} and tan⁻¹(2^{−i}), versus iteration; (b) quality of the approximation, (2^{−i} − tan⁻¹(2^{−i})) / tan⁻¹(2^{−i}), versus iteration.

6.4.3 Optimizing the CORDIC Algorithm for DDFS

The conventional CORDIC algorithm, when started at zero phase and initialized such that the scaling factor is normalized, covers a phase range of −0.55π to 0.55π. Since DDFS systems use quarter wave sinusoidal compression, this is over twice the required range for convergence. We only wish to compute angles in the interval [0, π/2). This means that the CORDIC algorithm always takes the same first step, since z_0 = θ is always greater than or equal to zero. Figure 6.21a shows the upper bound of the "remaining phase error" at iteration i (computed using Equation 6.99). Starting at iteration 0, θ_0 = 0, the phase error upper boundary is the entire π/2, or 90°, range. After one iteration, the maximum phase error is 54.9°. The question remains of how x_0, y_0 and z_0 should be initialized for the optimization to work correctly. Using Equations 6.52 and 6.53, it is clear that x_0 = cos(θ_0) = cos(π/4) = √2/2 and y_0 = sin(π/4) = √2/2. These values can be pre-computed and stored in a ROM for initialization.
Using Equation 6.68, it is clear that z must be set to

    z_0 = θ − π/4    (6.100)

Figure 6.21: CORDIC Bit Resolution. (a) Remaining phase error (degrees) and phase resolution (bits) versus iteration; (b) K scaling error and amplitude resolution (bits) versus iteration.

A brute force approach would be to directly compute this number and use it as a seed for the z-path of the CORDIC. Starting at θ_0 = π/4 means that the CORDIC algorithm only needs to cover the interval [−π/4, π/4], or [−0.25π, 0.25π]. Skipping the first CORDIC stage (i.e. starting at i = 1) converges over the interval [−0.305π, 0.305π], which is sufficient to cover the required interval.

The optimization of starting the vector at an angle of π/4 and skipping the first CORDIC iteration effectively eliminates one CORDIC iteration. The number of rotations can be further reduced by extending this idea to a general purpose look-up table that initializes the CORDIC algorithm to the mth iteration. The green line of Figure 6.21a shows the number of bits of phase resolution obtained for a given upper bound of phase error. From Figure 6.20a and Figure 6.20b, along with Equation 6.64 for the small angle approximation, it is evident that each CORDIC iteration gains a single bit of phase accuracy. This can be directly related to an amplitude error and consequently an effective number of bits. Let θ_r be the remaining phase error. The sine difference formula, which has been used in nearly every chapter of this document, yields

    sin(θ ± θ_r) ≈ sin(θ) ± θ_r cos(θ)  for θ_r ≪ 1    (6.101)

The error is then θ_r cos(θ), but θ ∈ [0, π/2), and therefore the error is bounded by the maximum value of cosine over that interval, cos(0) = 1. The maximum amplitude error is then θ_r, which is precisely the remaining phase error.
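A behavioral sketch of this optimization in Python (floating point stands in for the fixed-point datapath, and names are ours): seed the vector at the quadrant midpoint, seed z with θ − π/4, skip the i = 0 stage, and pre-divide the seed by the scale factor of the remaining stages.

```python
import math

def cordic_pi4_seeded(theta, n=32):
    """CORDIC over [0, pi/2) seeded at the quadrant midpoint:
    x0 = y0 = (sqrt(2)/2)/K, z0 = theta - pi/4, stage i = 0 skipped.
    K is the scale factor of stages 1..n-1 only (Equation 6.98)."""
    K = 1.0
    for i in range(1, n):
        K *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = y = math.sqrt(2.0) / 2.0 / K
    z = theta - math.pi / 4.0
    for i in range(1, n):
        sigma = 1 if z >= 0 else -1
        x, y = x - sigma * (2.0 ** -i) * y, y + sigma * (2.0 ** -i) * x
        z -= sigma * math.atan(2.0 ** -i)
    return x, y  # converges to (cos(theta), sin(theta))

errs = [max(abs(cordic_pi4_seeded(t)[0] - math.cos(t)),
            abs(cordic_pi4_seeded(t)[1] - math.sin(t)))
        for t in (0.0, 0.3, math.pi / 4, 1.2, 1.57)]
```

The residual amplitude error tracks the residual phase error, consistent with the bound derived from Equation 6.101.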
So the green line of Figure 6.21a also predicts the number of bits of amplitude resolution provided by the CORDIC after i error-free iterations. Figure 6.22 shows how x_0 and y_0 are derived by sending the MSBs of the DDFS phase word to a look-up table. Assume that B_LUT bits are used for the look-up table. Then the π/2 CORDIC search range is reduced to:

    θ_ROT = π / 2^{B_LUT + 1}    (6.102)

If an offset is introduced into the LUT in the same manner as described in the π/4 optimization offset discussion, an extra iteration can be removed. Thus a six-bit LUT with a half-LSB offset eliminates 7 CORDIC operations; that is to say, the first iteration would be i = 8 using Figure 6.21a.

An interesting phenomenon happens with the scaling factor when initializing the CORDIC algorithm this way. Figure 6.21b shows the scaling factor value at the ith iteration. The green line shows the number of amplitude bits of resolution required at the output before the K_n scaling factor introduces an error above quantization. From Equation 6.44, it should come as no surprise that the amplitude error decreases at a rapid rate, particularly with the 2^{−2i} rate of decrease in error per term.

6.4.4 Partial Dynamic Rotation CORDIC

The generalized PDR CORDIC architecture with support for conventional CORDIC stages is shown in Figure 6.22. This component is used in the radar DDFS system described in this chapter. Consider a conventional CORDIC with θ_0 = 0 and let θ = 5°. The first step would be tan⁻¹(2^0) = 45°. This step far overshoots the desired angle of rotation. There is enough information to avoid this overshoot, because the starting angle is known, the desired angle is known, and the amount of rotation for a given 2^{−i} is known. Keep in mind that θ is a binary word. Looking at the MSBs of the word gives an idea of how much rotation is required. In this case, starting at iteration i = 3 yields tan⁻¹(2^{−3}) = 7.125°.

Figure 6.22: PDR CORDIC Architecture

One of the major drawbacks of the PDR CORDIC is that K_n for a fixed number of stages changes based on the requested angle θ. It has already been shown in Figure 6.21b that if the CORDIC is initialized before rotations begin, the impact of K_n on the magnitude of the output is negligible. Figure 6.23 shows the block diagram for a PDR CORDIC rotation stage. The dynamic rotation selection (DRS) logic shown in Figure 6.23 becomes quite simple if the CORDIC stages are seeded with enough resolution. Recall from the small angle approximation in Figure 6.20a that tan⁻¹(2^{−i}) ≈ 2^{−i} for large i. Then the check to find the appropriate rotation angle is merely a check on the MSB position of the remaining phase (z_i). This is because we are comparing against a 2^{−i} number, which is a value represented by a single bit.

Figure 6.23: PDR CORDIC Stage Design

Table 6.3: Summary of DDFS Designs

    Design         B_A  B_P  SINAD  SFDR (W.C.)  Frequency  Area
    BTM            12   32   66     70 dBc       680 MHz    0.013 mm²
    MTM            11   32   N/A    58 dBc       1.0 GHz    0.008 mm²
    CORDIC (RoC)   12   32   64     66 dBc       1.1 GHz    0.011 mm²
    CORDIC (ORA)   12   32   73     78 dBc       680 MHz    0.054 mm²

Table 6.3 summarizes the various designs implemented and fabricated. In all cases, the SFDR and SNR of the DAC were several orders of magnitude worse than those of the digital code word produced by the DCDO. Therefore the frequency is a measured operating frequency, but the SINAD and SFDR are HDL-simulated DCDO outputs.

6.5 Stretch Processing DDFS Architecture

One pulse compression technique for which a DDFS with LFM is well suited is stretch processing [42].
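The dechirp principle at the heart of stretch processing — mixing the received echo against a reference chirp of the same rate so that target delay maps to a constant beat frequency — can be sketched numerically. All parameter values below are illustrative, not the implemented radar's:

```python
import numpy as np

fs = 1.0e9            # sample rate (Hz), illustrative
gamma = 1.0e12        # chirp rate (Hz/s), illustrative
tau = 200e-9          # round-trip delay of the target echo (s)
t = np.arange(0, 50e-6, 1 / fs)

rx = np.exp(1j * np.pi * gamma * (t - tau) ** 2)   # delayed echo chirp
ref = np.exp(1j * np.pi * gamma * t ** 2)          # "destretch" reference chirp
beat = rx * np.conj(ref)                           # mixer (dechirp) output

# The beat phase is pi*gamma*(tau**2 - 2*tau*t), linear in t, so the
# instantaneous frequency is the constant -gamma*tau (range maps to tone).
phase = np.unwrap(np.angle(beat))
f_inst = np.diff(phase) / (2 * np.pi) * fs
beat_err = np.max(np.abs(f_inst + gamma * tau))
```

With these values the beat tone sits at γτ = 200 kHz; a longer delay (more distant target) moves the tone proportionally higher, which is why the destretch bandwidth, not the transmit bandwidth, sets the DDFS requirement.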
In fixed chirp rate stretch processing, a high bandwidth linear chirp of chirp rate γ with a fixed time length T_TX is transmitted into the environment. The transmit period multiplied by the chirp rate yields the effective bandwidth of the transmitted chirp (β_TX). This bandwidth sets the range resolution (i.e. the radar's ability to uniquely distinguish closely spaced targets). A first order approximation of the range resolution R_RES of a stretch processing system is given in Equation 6.25.

During reception of the signal reflected from a target, a "destretch" signal of the same chirp rate γ as the transmitted signal, but with a longer time duration T_RX and consequently wider bandwidth, is used to demodulate the signal. The difference in time duration between the transmit and receive chirps sets the range interval of the radar. The range interval is the "window" through which the radar can detect objects. It is the bandwidth of the destretch signal (β_RX), and not that of the transmitted pulse, that sets the system bandwidth requirement for the DDFS.

Figure 6.24 shows the top level DDFS architecture used for the radar system. The components of the radar directly related to this dissertation are the DDFS and the corresponding control circuitry. Two DACs, for in-phase and quadrature phase sinusoid generation, are implemented in the DDFS system.

Figure 6.24: Block Diagram for Radar DDFS

The additive dithering technique that was used in both the MTM DDFS and BTM DDFS was also employed for the RoC DDFS. However, the phase truncation spurs rested so far beneath the noise floor of the DAC that no spurious improvement was detected. Figure 6.25 is the die photo of the RoC chip, zoomed in around the DDFS.

Figure 6.25: Die Photograph of RoC (DDFS Zoomed)
The DAC and DCDO/SPI blocks are labelled to help make sense of the silicon.

6.5.1 Inverse Sinc Filter

The high speed DAC implementation used in the DDFS inherently applies a zero order hold (ZOH) operation to the output waveform. The ZOH transfer function is a sinc function in the frequency domain; its origin is described in Section 7.2. As stated in Section 6.5, the DDFS generates a wide bandwidth "destretch" chirp signal that performs the pulse compression step of the radar. It is important that the amplitude of the generated chirp not fluctuate with frequency (and hence distort the radar measurement). One solution [56] is to apply an inverse sinc operation using a finite impulse response (FIR) filter to shape the waveform before sending it to the DAC. In the DDFS, two FIR filters with 9-bit coefficient resolution, one for the I-path and one for the Q-path, were implemented after the PDR CORDIC. Pipelining the FIR filter was essential to allow it to reach 1 GHz operation. Figure 6.26 shows a block diagram of the inverse sinc filter component as implemented, where c_0 = −1, c_1 = 4, c_2 = −16, and c_3 = 192 for nine-bit coefficient resolution. Note that after each addition the result is stored in a pipeline register.

Figure 6.26: Inverse Sinc FIR Filter (Block Diagram)

The coefficients from Samueli's work were verified to be optimal in a minimax sense using a linear programming (LP) algorithm. An LP algorithm for finding optimal coefficients was developed in Python. The coefficients were found to be in full agreement with previously published work [56]. The measured results match the theory quite well (Figure 6.31a). To test the filters, a 90 MHz sweep with the clock frequency at 200 MHz was generated using the DDFS.
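The compensation these coefficients provide can be checked numerically. The sketch below assumes a 7-tap symmetric impulse response with c_3 as the center tap and a unity-DC normalization; both the tap ordering and the normalization are our assumptions, not stated in the text:

```python
import numpy as np

# 7-tap symmetric inverse-sinc FIR built from the stated 9-bit coefficients
# c0=-1, c1=4, c2=-16, c3=192 (center tap), normalized to unity DC gain.
h = np.array([-1, 4, -16, 192, -16, 4, -1], dtype=float)
h /= h.sum()

f = np.linspace(1e-6, 0.4, 400)                    # fraction of f_clk
E = np.exp(-2j * np.pi * np.outer(f, np.arange(7)))
H = np.abs(E @ h)                                  # FIR magnitude response
zoh = np.abs(np.sinc(f))                           # ZOH droop, sin(pi f)/(pi f)
flatness = np.max(np.abs(H * zoh - 1))             # residual ripple after correction
```

The cascade H(f)·sinc(f) stays flat to within roughly 0.1 dB out to 0.4·f_clk, versus about 2.4 dB of uncorrected ZOH droop at the same frequency, which is consistent with the marker measurements reported below.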
Four markers were placed equally across the waveform and the amplitude was measured. With the inverse sinc filter deactivated, the measured values across the spectrum were −23.60 dBm, −24.03 dBm, −24.87 dBm and −26.37 dBm. This roll-off indicates that the DACs do indeed exhibit ZOH behavior. Next, the same 90 MHz sweep with the clock frequency at 200 MHz was generated with the inverse sinc filter active. The four markers then read −27.24 dBm, −27.17 dBm, −27.09 dBm and −27.30 dBm. Here the gain variation is less than 0.21 dB.

6.5.2 Radar Controller

The DDFS is tightly integrated with a digital radar controller. Several default modes of operation are programmable for the DDFS depending on the radar operating environment. The default mode is stretch processing mode for longer range target acquisition. A BPSK mode is available for detecting objects close to the radar, with built-in Barker code modulation schemes. This mode is required because the RoC chip must operate in half-duplex mode from a single antenna, and thus a long chirp would prevent detection of close-in targets. There are also QPSK and general LPM modes for experimenting with different detection techniques in the lab. Several device characterization and test modes for the DACs, filters and DCDO were also implemented, such as a single tone mode.

The basic operation of the radar transceiver in stretch processing mode is described by the following algorithm:

1. Initialize common analog components such as the PLL and bandgaps.
2. Deactivate the analog receiver circuitry.
3. Activate the analog transmitter circuitry.
4. Load the transmitter frequency control words into the start frequency, stop frequency and step frequency DDFS registers.
5. Clock the start frequency state into the frequency accumulator. Clock the start phase state into the phase accumulator.
6. Run the DDFS until the transmitter stop frequency control word is reached.
7. Load the transmitter timer, store the old receiver wait time and start waiting.
8. While waiting, activate the analog receiver circuitry.
9. Deactivate the analog transmitter circuitry.
10. Load the receiver frequency control words into the start frequency, stop frequency and step frequency DDFS registers.
11. Run the DDFS until the receiver stop frequency control word is reached.
12. Load the receiver timer, store the previous transmitter wait time, and start waiting.
13. Proceed to state (2).

The LPM modes operate with a similar algorithm, except that all the LFM behavior is deactivated.

6.6 Design of a 12-bit CMOS DAC

Two fully differential, current steering, 12-bit, 1 GHz CMOS DACs convert the digital output of the DCDO to a voltage. The DACs use a segmented architecture with 6 bits of thermometer coding for the MSBs and 6 bits of binary coding for the LSBs. Figure 6.27 is a block diagram of the DAC. The output of the DAC has 20 dB of digitally controlled, programmable gain. The gain is programmed by modifying the value of the current reference, thus reducing the magnitude of the current through the current steering switches. This reduces the DAC's operating frequency, as described in Section 7.5.1. The DAC uses a triple-centroid switching scheme, made popular in [57], that randomizes spurs due to current cell mismatch. The clock tree is a balanced H-tree in an attempt to minimize clock skew along the wire paths.

Figure 6.27: Block Diagram of 12-Bit CMOS DAC

A single base transistor size was chosen such that the variation exhibited through Monte Carlo simulations kept the DAC DNL within bounds. Figure 6.28 demonstrates how the single unit transistor is used to build the current source network.
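The 6+6 segmentation of Figure 6.27 can be sketched behaviorally. The decode below (names ours) maps a 12-bit code to the 63 thermometer cells T[62:0] and the binary cells B[5:0], and checks that the cell weights reconstruct the code:

```python
def dac_segment_decode(code):
    """Segmented decode sketch for the 12-bit DAC of Figure 6.27:
    upper 6 bits drive 63 thermometer cells (weight 64 each),
    lower 6 bits drive binary-weighted cells (weights 32..1)."""
    assert 0 <= code < 4096
    msb, lsb = code >> 6, code & 0x3F
    thermo = [1 if i < msb else 0 for i in range(63)]   # T[62:0]
    binary = [(lsb >> b) & 1 for b in range(6)]         # B[5:0]
    # Reconstruct the total weight contributed by all enabled cells:
    total = 64 * sum(thermo) + sum(bit << b for b, bit in enumerate(binary))
    return thermo, binary, total
```

Because a thermometer step only ever enables one additional cell, the MSB segment is monotonic by construction, which is the usual motivation for spending decoder area on it.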
The thermometer coded current sources have a cascode transistor added to increase the output impedance of the DAC. The DAC also implements custom high speed latches that convert the single-ended digital input to a differential signal.

Figure 6.28: DAC Current Source Sizing

Figure 6.29: Synchronization Circuit for 12-Bit CMOS DAC

These latches aid in synchronizing the digital bits sent to the DAC from the synthesized digital component. The synchronization reduces timing mismatch and consequently improves the spurious response of the DAC. The total area of the DAC, including the digital front-end, is 400 μm × 500 μm. The SFDR of the DAC is approximately 55 dBc (better than 60 dBc at certain frequencies) through about two thirds of the Nyquist frequency. The measured narrowband noise, where narrowband means measured below the third harmonic of the fundamental tone, is approximately 90 dBc.

The DAC high speed clock distribution network uses an H-tree, as shown in Figure 6.30. This technique equalizes the static delay between current steering cells on the clock distribution network. Any static mismatch will directly translate into deterministic spurs when a periodically varying signal drives the DAC.

Figure 6.30: Clock Tree for 12-Bit CMOS DAC

6.7 Measurements

The performance of the DDFS is summarized in Table 6.4. A previous implementation showed that the digital component can run without errors up to 1.1 GHz. However, due to process manufacturing issues with this particular run, the system could only run at 650 MHz. Another version will be resubmitted without modification that should allow it to reach the proper operating frequency.
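The role of the Monte Carlo-bounded unit mismatch mentioned above can be illustrated behaviorally. In the sketch below (a 2% unit-current sigma is illustrative, not the fabricated value), each thermometer code step adds exactly one unit cell, so the segment is monotonic and its DNL is bounded near the single-cell mismatch; binary weighting, by contrast, switches many cells at once at major carries, which is why the MSBs are thermometer coded:

```python
import numpy as np

rng = np.random.default_rng(1)
units = 1.0 + 0.02 * rng.standard_normal(63)       # 63 mismatched unit cells

# Thermometer transfer curve: each step enables one more unit cell.
thermo_out = np.concatenate(([0.0], np.cumsum(units)))

def dnl(out):
    """DNL in LSB, using the endpoint-fit LSB size."""
    lsb = (out[-1] - out[0]) / (len(out) - 1)
    return np.diff(out) / lsb - 1.0

thermo_dnl = np.max(np.abs(dnl(thermo_out)))
```

With per-cell mismatch this small the thermometer segment cannot miscode, whereas a binary major carry error is a sum over dozens of mismatched cells and typically dominates the DNL of an unsegmented design.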
The static DNL and INL of the DAC cannot be measured, as the inputs of the DAC are not accessible from the pad frame of the chip. The SFDR, however, indicates that the DAC performs poorly in INL. This is likely due to the programmable gain stage of the DAC, as the third order harmonic was strongly dependent on the gain state.

Table 6.4: DDFS Performance Summary

    Parameter                              Value
    f_clk                                  650 MHz
    SFDR (low)                             55 dBc (at 1.26 MHz)
    SFDR (mid)                             60 dBc (at 88 MHz)
    SNR (narrowband)                       91 dBc
    Power (Analog)                         150 mW
    Power (Digital)                        700 mW
    DAC Area                               400 μm × 500 μm (×2)
    Digital Area (inc. SPI/control logic)  400 μm × 800 μm

Lastly, Figure 6.31a shows a 100 MHz chirp with the inverse sinc filter activated. The attenuation at low frequencies is from the test setup, in which the output of the packaged RoC chip is AC coupled to the spectrum analyzer. Figure 6.31b shows the waveform without the inverse sinc filter activated. Because of the scale of the waveforms, it is difficult to visually determine the impact of the filter. However, looking at the readings from the markers, the impact of the inverse sinc filter is more easily understood. The output spectrum with the inverse sinc filter activated does not attenuate at higher frequencies. The deviation between the waveforms becomes even more pronounced as the output of the DAC approaches the Nyquist frequency.

Figure 6.31: Inverse Sinc Filter. (a) Chirp with inverse sinc on; (b) chirp with inverse sinc off.

Figure 6.32 shows the DDFS operating in single tone mode. The main tone is located at 85.8 MHz, and the largest spur, the fourth order harmonic of the main tone at 343.2 MHz, is 57 dB down. This tone is measured before the low pass filter. This work briefly describes a fully functional DDFS for stretch processing radar applications.
The performance of the digital logic of the DDFS is competitive with other published DDFS implementations given the feature size of the technology, particularly when frequency resolution and features are considered.

Figure 6.32: DDFS with Single Tone Output

Chapter 7
Digital-to-Analog Converters (DAC)

As digital circuitry evolves to smaller geometry nodes that yield higher speeds, lower power and smaller area, more functionality is relegated to the digital processing domain. The transition from the discrete-time digital domain to the continuous-time analog domain is performed by a digital-to-analog converter (DAC). In modern designs, the performance of the DAC dominates the performance of the DDFS [4], since small feature size CMOS allows spectrally pure digital sinusoids to be generated with little overhead at sufficiently high speeds. It is not uncommon for the digital SFDR and SNR to be 10 to 20 dB better than what can be achieved by a DAC in the same technology. The term DAC is general, spanning small devices that tune static circuit parameters to massive RF DACs such as those designed by Analog Devices [58] or e2v [43]. The DACs discussed in this thesis are >1 GSample/s current steering (CS) designs. The design issues discussed include clock and data timing errors and frequency dependent non-linearities that do not plague DC DACs. However, the discussion of current source mismatch, static INL, static DNL and segmented architectures is relevant to DC CS DACs as well as high speed CS DACs.

7.1 Basic Sampling Theory

The DDFS is a sampled-data system, and thus a brief explanation of the sampling process is beneficial in understanding the behavior of the device. It will also benefit the later analysis in Section 7.2, in which different DAC switching schemes are discussed. An important operator must be defined to aid in the discussion of sampling theory, namely, the Dirac delta "function."
Here the required mathematical theory to formalize the Dirac delta as a distribution is ignored and a more axiomatic treatment is provided [59]. Definition 7.1 (Dirac Delta). The Dirac delta in a one-dimensional real space can be defined as a heuristic function, symbollicaly denoted as ?(x), such that ?(x) , ?? ?? ??? +?, x = 0 0, xnegationslash= 0 (7.1) and satisfies the identity given in Equation 7.2 integraldisplay ? ?? ?(x)dx = 1 (7.2) Of course, the following ?proofs? about the properties of the Dirac delta all leave some- thing to be desired, as the definition of the Dirac delta used in this work cannot be considered mathematically formal. However, if one assumes that the Dirac delta operates similar to a function within an integral, then these ?proofs? hold. Now consider the behavior of the Dirac delta when used in an integral, as its usefulness in mathematically describing the sample operation becomes important. integraldisplay ? ?? x(t)?(t)dt = integraldisplay ? ?? x(0)?(t)dt = x(0) integraldisplay ? ?? ?(t)dt = x(0) (7.3) Equation7.3holdsassuming, ofcourse, thatx(t) isdefinedatzero. Theintegralofafunction multiplied by the Dirac delta takes on the value of the function at which the Dirac delta is non-zero. Ideal sampling ?captures? the value of a continuous function at an instant in time, which certainly sounds similar to the mathematical operation of the Dirac delta. If one wishes to acquire the value ofx(t) once everyT seconds, or sample the signalx(t) with an interval of T, then a series of Dirac deltas equally spaced in time is needed. Let us 185 then define a ?pulse train? of Dirac deltas as follows, ?T (t) , ?summationdisplay k=?? ?(t?kT) (7.4) where T is the period of sampling and k is an integer. It is interesting to note that ?T is periodic with T. This can be shown as by proving that ?T(t) = ?T(t+nT). ?T (t+nT) = ?summationdisplay k=?? ?(t+nT?kT) (7.5) = ?summationdisplay k=?? 
                = Σ_{k=−∞}^{∞} δ(t + (n − k)T) = Δ_T(t)      (7.6)

The last step is possible because the summation limits tend to infinity, so shifting the sequence by a finite number of periods T in either direction results in the same function. Since Δ_T(t) is periodic with T, it can be represented by its Fourier series. The real-valued Fourier series was described by Definition 2.1. Here the complex Fourier series is used to simplify notation.

    Δ_T(t) = Σ_{n=−∞}^{∞} [ (1/T) ∫_{−T/2}^{T/2} Σ_{k=−∞}^{∞} δ(t − kT) e^{−j2πnt/T} dt ] e^{j2πnt/T}      (7.7)

           = Σ_{n=−∞}^{∞} (1/T) e^{0} e^{j2πnt/T} = (1/T) Σ_{n=−∞}^{∞} e^{j2πnt/T}      (7.8)

The summation over k in Equation 7.7 was dropped because the Dirac delta is non-zero only for k = 0 as t varies from −T/2 to T/2. Now the Fourier transform of Δ_T(t) can be computed in a straightforward manner. This leads to one of the most interesting results in sampling theory.

    F{Δ_T(t)} = ∫_{−∞}^{∞} [ (1/T) Σ_{n=−∞}^{∞} e^{j2πnt/T} ] e^{−j2πft} dt
              = (1/T) Σ_{n=−∞}^{∞} ∫_{−∞}^{∞} e^{−j2πt(f − n/T)} dt
              = (1/T) Σ_{n=−∞}^{∞} δ(f − n/T)      (7.9)

So the Fourier transform of the Dirac comb function is another Dirac comb, but in the frequency domain. The spacing of the impulses is the sampling frequency (1/T) of the original sampling operation. One of the theorems related to sampling, fundamentally important to DAC design and universally taught to electrical engineering students, is the Nyquist-Shannon sampling theorem. The first publication of this powerful theorem as it relates to the field of communication is provided by Shannon in 1948 [60]. The Nyquist-Shannon sampling theorem as given by Bernard Widrow [61] is described in Theorem 7.1.

Theorem 7.1 (Nyquist-Shannon Sampling Theorem). If the sampling radian frequency ω_s is high enough so that

    |X(jω)| = 0 for |ω| ≥ ω_s/2      (7.10)

where X(jω) is the CTFT of x(t), then the sampling condition is met, and x(t) is perfectly recoverable from its samples.
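The comb-to-comb transform pair of Equation 7.9 has an exact discrete analogue that is easy to check numerically. The following is an illustrative Python/NumPy sketch, not part of the original derivation; the lengths N and M are arbitrary choices.

```python
import numpy as np

# A periodic impulse train of period M samples, analogous to the Dirac comb
# Delta_T(t) of Equation 7.4 (N and M are arbitrary illustrative choices).
N, M = 64, 8
comb = np.zeros(N)
comb[::M] = 1.0                  # impulses at n = 0, M, 2M, ...

spectrum = np.fft.fft(comb)

# Per the discrete analogue of Equation 7.9, the non-zero bins occur every
# N/M = 8 bins, i.e. the spectrum is again an impulse train.
nonzero = np.flatnonzero(np.abs(spectrum) > 1e-9)
print(nonzero)                                  # [ 0  8 16 24 32 40 48 56]
print(np.allclose(spectrum[nonzero], N / M))    # True
```

Shrinking the time-domain spacing M widens the frequency-domain spacing N/M, mirroring the 1/T relationship in Equation 7.9.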
In more common parlance, Theorem 7.1 states that a bandlimited signal x(t) can be perfectly reconstructed from its samples if the sample rate is at least twice the bandwidth. In the following section, the foundations laid by the Nyquist-Shannon sampling theorem will be built upon.

7.2 DAC Fundamentals

As part of this work, we will briefly discuss the fundamentals of DAC behavior. A DAC transforms a digital code word into a physical, electrical quantity. Typically this electrical quantity is a voltage (i.e. a low impedance output) or a current (i.e. a high impedance output). After the DAC generates the physical quantity, it is filtered by a low pass analog reconstruction filter or something that approximates such a filter. This need not be the case, however, as output signals with frequencies higher than the first Nyquist zone can be synthesized by applying a bandpass filter to DACs with certain types of responses. The following analysis makes heavy use of convolution, so in keeping with the spirit of this thesis, it is presented here along with one of its more important properties (Theorem 7.2) [59].

Definition 7.2 (Convolution). The convolution of f(t) and g(t), denoted (f ⋆ g)(t), is defined mathematically as

    (f ⋆ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ      (7.11)

Theorem 7.2 (Fourier Convolution Theorem). Let x(t) and y(t) be continuous functions of t. Then

    F{(x ⋆ y)(t)} = F{x(t)} · F{y(t)}      (7.12)

    F{x(t) · y(t)} = F{x(t)} ⋆ F{y(t)}      (7.13)

This is generally referred to as the convolution theorem.

Proof. Let x(t) and y(t) be continuous functions whose Fourier transforms exist. Then the Fourier transform of the convolution of x(t) and y(t) is

    F{(x ⋆ y)(t)} = ∫_{−∞}^{∞} (x ⋆ y)(t) e^{−jωt} dt = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} x(τ) y(t − τ) dτ ] e^{−jωt} dt      (7.14)

The order of the integral operations can be rearranged, provided that Fubini's theorem [62] is satisfied by the double integral of Equation 7.14, to yield
    F{(x ⋆ y)(t)} = ∫_{−∞}^{∞} x(τ) [ ∫_{−∞}^{∞} y(t − τ) e^{−jωt} dt ] dτ      (7.15)

Substituting r = t − τ for t into Equation 7.15 and noting that dt = dr gives the final result

    F{(x ⋆ y)(t)} = ∫_{−∞}^{∞} x(τ) [ ∫_{−∞}^{∞} y(r) e^{−jωr} e^{−jωτ} dr ] dτ
                  = [ ∫_{−∞}^{∞} x(τ) e^{−jωτ} dτ ] [ ∫_{−∞}^{∞} y(r) e^{−jωr} dr ]
                  = F{x(t)} · F{y(t)}      (7.16)

The ideal DAC response is a series of weighted impulses [63]

    y_IDEAL(t) = Σ_{n=−∞}^{∞} α x[n] δ(t − nT) + β      (7.17)

where δ(t) is the Dirac delta function defined in Equation 7.1, α is the gain of the DAC (see Section 7.3.1) and β is the offset of the DAC (also see Section 7.3.1). Here x[n] is understood to be the value of the signal x at time nT, and thus x(nT) = x[n]. As the DAC metrics considered in this work are measures of the effects of non-linearity on the input signal, the linear terms α and β can be ignored by setting α = 1 and β = 0. Note that Equation 7.17 can be derived from the multiplication of the signal with the Dirac comb of Equation 7.4 in Section 7.1, ignoring the DC offset β and the linear gain.

    x(t) · Δ_T(t) = x(t) Σ_{n=−∞}^{∞} δ(t − nT) = Σ_{n=−∞}^{∞} x(t) δ(t − nT)      (7.18)

In communications, the spectrum of the generated signals is critical, and as DACs are the central actuators for such systems, it is also critical in the analysis of this chapter. Thus the mathematical tool for computing the spectrum of continuous functions is presented. The continuous time Fourier transform (CTFT) is given in Equation 7.19,

    X(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt      (7.19)

    X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt      (7.20)

where x(t) is a complex-valued function of time, ω ∈ R is the continuous-time angular frequency and X(ω) is the transformed signal, which is generally complex. The second equation expresses the Fourier transform as a function of the ordinary frequency f. The CTFT, as with other variants of the Fourier transform, is invertible. The inverse (backwards) transform
Theinverse(backwards)transform is given by Equation 7.21. x(t) = 12pi integraldisplay ? ?? X (?)ej?td? (7.21) As an example, consider the following rectangle function (Equation 7.22, also known as the normalized box car function. Figure 7.1a shows the time domain response of the the rectangle function. rect(x) = ?? ???? ?? ???? ??? 0 |x|> 0.5 0.5 |x|= 0.5 1 |x|< 0.5 (7.22) 190 0 0.2 0.4 0.6 0.8 1 -1 -0.5 0 0.5 1 rect (x/T s) x (a) Rectangle Function (Time) -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -1 -0.5 0 0.5 1 R(2 pif ) Ordinary Frequency (Hz) Ts = 1T s = 2T s = 4 (b) Rectangle Function (Spectrum) Figure 7.1: Rectangle Function Plots Since we will soon be discussing sampling, we will scale the rectangle function to a single sample period Ts. We can then apply the continuous time Fourier Transform (7.19) on the modified rectangle function. R(?) = integraldisplay ? ?? rect parenleftbigg t Ts parenrightbigg e?j?tdt = integraldisplay 0.5Ts ?0.5Ts 1e?j?tdt = 1?j? parenleftBig e?j?t parenrightBigvextendsinglevextendsingle vextendsingle0.5Ts?0.5T s = 1?j? bracketleftBig e?0.5j?Ts?e0.5j?Ts bracketrightBig (7.23) 191 Applying Euler?s formula (Equation 1.31) to Equation 7.23, the expression simplifies to a scaled sinc function. R(?) = 1?j? bracketleftBig e?0.5j?Ts?e0.5j?Ts bracketrightBig = 1?j? bracketleftBigg cos parenleftBigg ?Ts?2 parenrightBigg +jsin parenleftBigg ?Ts?2 parenrightBigg ?cos parenleftBigg Ts?2 parenrightBigg ?jsin parenleftBigg Ts?2 parenrightBiggbracketrightBigg = 1?j? bracketleftBigg ?2jsin parenleftBigg Ts?2 parenrightBiggbracketrightBigg = sin parenleftBig Ts?2 parenrightBig ? 2 (7.24) This analysis will become important in following paragraphs when the output spectrum of various DACs are considered. The unnormalized sinc function is defined as sinc(x) , sin (x)x (7.25) and is both non-causal and infinite. Setting Ts = 1, it follows that R(?) = sinc(?/2). Figure 7.1b shows the frequency response of the rectangular function for various Ts. 
Now we compute the Fourier transform of the more complex y_IDEAL(t) of Equation 7.17.

    F_CTFT{y_IDEAL(t)} = F{ Σ_{n=−∞}^{∞} x[n] δ(t − nT) }
                       = ∫_{−∞}^{∞} Σ_{n=−∞}^{∞} x[n] δ(t − nT) e^{−j2πft} dt
                       = ∫_{−∞}^{∞} [ x[0]δ(t) + x[1]δ(t − T) + x[2]δ(t − 2T) + ··· ] e^{−j2πft} dt
                       = x[0] + x[1] e^{−j2πfT} + x[2] e^{−j2πf2T} + ···
                       = Σ_{n=−∞}^{∞} x[n] e^{−j2πfnT}      (7.26)

Notice that Equation 7.26, normalizing T = 1, is exactly the same as the DTFT shown in Equation 2.31. Thus what was stated in words as time-domain sampling now has a mathematical representation. The conventional CS DAC updates the data output with a new code word once every sampling interval and holds the value until the DAC is updated again. This is sometimes called a zero-order hold or a sample and hold. A DAC that implements such a hold is called a non-return-to-zero (NRTZ) DAC. Figure 7.2a provides an example of the time domain output of an NRTZ DAC. If the DAC returns to zero (RTZ) after waiting T0 seconds, then the DAC is referred to as an RTZ DAC. Figure 7.2b shows the output of an RTZ DAC with a 50% duty cycle, which is equivalent to setting T0 = Ts/2 in Equation 7.33.

Figure 7.2: (a) Non-Return-to-Zero DAC Output (4 Bits); (b) Return-to-Zero DAC Output (4 Bits)

There are two methods for describing how the hold effect shapes the response of the DAC. As noted by Doris et al. [64], the NRTZ DAC response can be formulated as the convolution of the unit step response and a variation of the DAC response given in Equation 7.17,

    y_NRTZ(t) = u(t) ⋆ Σ_{n=−∞}^{∞} (x[n] − x[n−1]) δ(t − nT)      (7.27)

where u(t) is the unit step response, also known as the Heaviside step function, which is defined in Equation 7.28.
    u(t) = { 1,  t ≥ 0
           { 0,  t < 0      (7.28)

An alternative representation, noted by the author of this work, is the convolution of a rectangle function (Equation 7.22) of width equal to the sample period with the ideal DAC response

    y_NRTZ(t) = rect(t/Ts) ⋆ Σ_{n=−∞}^{∞} x[n] δ(t − nTs)      (7.29)

Referring back to Theorem 7.2, convolution in the time domain is equivalent to multiplication in the frequency domain (the Fourier transform domain). The Fourier transform of rect(t/Ts) was calculated in Equation 7.24. The right-hand term is simply the output spectrum of an ideal DAC (or the DTFT of the sequence synthesized by the DAC).

    F_CTFT{y_NRTZ(t)} = F{rect(t/Ts)} · F{y_IDEAL(t)}      (7.30)
                      = [ sin(Tsω/2) / (ω/2) ] · F{y_IDEAL(t)}      (7.31)

The output is therefore weighted by a sinc-shaped response. This attenuation was the reason for the inverse sinc filter of the radar DDFS, described in Section 6.5.1. Likewise, an RTZ DAC output response can be formulated as the convolution of the unit step and two time-shifted Dirac delta functions [64]

    y_RTZ(t) = u(t) ⋆ Σ_{n=−∞}^{∞} x[n] (δ(t − nT) − δ(t − T0 − nT))      (7.32)

or, again, as the convolution with a rectangle function, now of width T0 < Ts (typically Ts/2 in the literature).

    F_CTFT{y_RTZ(t)} = F{rect(t/T0)} · F{y_IDEAL(t)}      (7.33)
                     = [ sin(T0ω/2) / (ω/2) ] · F{y_IDEAL(t)}      (7.34)

From Figure 7.1b, the shorter the pulse width, the less attenuation of the output spectrum due to value holding. This means that the inverse sinc filter requirement can be removed, or the order of the filter reduced, by changing the DAC output characteristic. To quantify the effect, let T0 = Ts/2:
    F{y_RTZ(t)} / F{y_NRTZ(t)} = [ sin(T0ω/2) / (ω/2) ] / [ sin(Tsω/2) / (ω/2) ] = sin(T0ω/2) / sin(Tsω/2)      (7.35)

Performing a Taylor series expansion (Definition 4.1) on the numerator and denominator,

    sin(T0ω/2) / sin(Tsω/2) = [ T0ω/2 − T0³ω³/(8·3!) + T0⁵ω⁵/(32·5!) − ··· ] / [ Tsω/2 − Ts³ω³/(8·3!) + Ts⁵ω⁵/(32·5!) − ··· ]      (7.36)
                            = [ T0 − T0³ω²/(4·3!) + T0⁵ω⁴/(16·5!) − ··· ] / [ Ts − Ts³ω²/(4·3!) + Ts⁵ω⁴/(16·5!) − ··· ]      (7.37)

Now we substitute Ts = 2T0 into the previous equation and compute the final result.

    F{y_RTZ(t)} / F{y_NRTZ(t)} = [ T0 − T0³ω²/(4·3!) + T0⁵ω⁴/(16·5!) − ··· ] / [ 2T0 − 8T0³ω²/(4·3!) + 32T0⁵ω⁴/(16·5!) − ··· ]      (7.38)
                               = [ 1 − T0²ω²/(4·3!) + T0⁴ω⁴/(16·5!) − ··· ] / [ 2 − 2T0²ω²/3! + 2T0⁴ω⁴/5! − ··· ]      (7.39)

Since T0 and Ts are generally much less than one (a 1 GHz DAC has a 1 ns clock period), the attenuation is approximately 1/2 when returning to zero. Note that if T0 becomes sufficiently small, higher Nyquist zones of the DAC can be used.

7.3 DAC Performance Metrics

DACs operate in a wide range of environments with an equally wide range of requirements. A control DAC for a microelectromechanical system (MEMS) may need only operate at a few kilohertz sample frequency but may require a large output voltage and monotonicity over process variation. A DAC for a high-speed communications link may only require a few hundred millivolts of output swing but may have to operate up to several gigahertz. The qualities of both these DACs can be described by their static and dynamic performance. The remaining analysis of this chapter aims to aid designers in creating high performance DACs. When writing about "high performance" devices, we want to be precise in describing the measure of that performance. For the purposes of this dissertation, the static measures of concern are integral non-linearity, differential non-linearity, and static power consumption.
The dynamic measures of concern are spurious free dynamic range, signal to noise ratio, sample frequency and total harmonic distortion. These performance metrics are influenced by a wide variety of effects.

7.3.1 Static DAC Performance

Static errors are both the simplest DAC design errors to understand and the simplest to correct. They therefore serve as an adequate starting place for DAC performance analysis. It is entirely possible to degrade the overall performance of the DAC by failing to weight individual DAC elements such that the contribution of static errors to the output is zero. This can be particularly bad in high speed CS DAC designs, where device sizing is small and device mismatch becomes significant. The two static performance numbers discussed in this chapter are integral non-linearity (INL) and differential non-linearity (DNL). Both of these errors are direct causes of harmonic distortion in DACs. While not nearly as important in high-speed CS DACs as the non-linear errors, two linear errors are mentioned for completeness. Offset error is defined as the linear deviation of the DAC output from the intended output, applied identically to every DAC code. Figure 7.3a presents a graphical explanation of offset error in a DAC. Gain error is defined as the deviation of the gain of the DAC from the intended design target. Figure 7.3b provides a graphical explanation of gain error. Note that neither gain nor offset errors contribute to the spurious performance of the DAC.

Figure 7.3: Graphical Explanation of Gain and Offset Errors. (a) Offset Error (3 Bits); (b) Gain Error (3 Bits)

7.3.2 INL

Integral non-linearity (INL) is a measure of the deviation of the static transfer function of the DAC from some ideal linear transfer function.
The measure is generally normalized to the LSB value of the DAC for reporting in publications. The two most widely used methods for determining the ideal linear transfer characteristic are end-point to end-point and least-squares linear ("best") fit. Both of these techniques compensate for linear errors, typically gain and offset error, as required by the IEEE definition of INL [65]. The following is the official description of INL for an N-bit DAC.

    INL[k] = (I_out[k] − k·I_lsb) / I_lsb;    I_lsb = I_out[2^N − 1] / (2^N − 1)      (7.40)

where I_out[2^N − 1] is the maximum output of the DAC (i.e. it is assumed that the DAC output increases in magnitude with increasing k) and I_lsb is the LSB step of the DAC using the end-point to end-point line approximation. For the equation of a line, we use

    y = mx + b      (7.41)

where m is the slope of the line and b is the y-intercept. For the DAC, x is the code word, m is the LSB value of the DAC used in our INL and DNL equations and b is the offset error. Figure 7.4 provides a graphical explanation of INL using a 3-bit DAC. The solid line is an end-point to end-point fitted line, the solid dots are the actual DAC output values, and the x-axis is the DAC code input.

Figure 7.4: Graphical Explanation of INL and DNL (in the figure, d_111 = y_111 − y_110 and DNL_111 = d_111 − LSB)

The solid line from Equation 7.41 can be rewritten in discrete form as

    I_out[n] = I_lsb · A[n] + b      (7.42)

where A[n] is the DAC input code word at the nth sample. This is in keeping with the output code from the DCDO in previous chapters. The end-point to end-point method takes the difference of the output of the DAC at the maximum and minimum output code words and then adjusts out the offset error. The linear least squares fit, described in detail in Section 6.1.3, finds the line that minimizes the mean square error between itself and the actual DAC output data points.
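The two fitting conventions just described translate directly into a few lines of code. The sketch below is a Python/NumPy illustration; the function names and the 3-bit example data are assumptions for demonstration, not from the original text. It reports INL in LSBs against both the end-point line (in the spirit of Equation 7.40, with the offset removed via the first code) and a least-squares fit:

```python
import numpy as np

def inl_endpoint(iout):
    """INL in LSBs against the end-point to end-point line (offset removed
    using the first code, gain using the first and last codes)."""
    codes = np.arange(len(iout))
    ilsb = (iout[-1] - iout[0]) / (len(iout) - 1)   # end-point LSB estimate
    return (iout - (iout[0] + ilsb * codes)) / ilsb

def inl_bestfit(iout):
    """INL in LSBs against the least-squares ("best") fit line."""
    codes = np.arange(len(iout))
    m, b = np.polyfit(codes, iout, 1)
    return (iout - (m * codes + b)) / m

# 3-bit example: an ideal 1-LSB/code ramp plus a small bowing error.
iout = np.arange(8) + 0.1 * np.array([0, 1, 2, 2, 2, 2, 1, 0])
print(np.round(inl_endpoint(iout), 3))   # [0.  0.1 0.2 0.2 0.2 0.2 0.1 0. ]
```

Note the end-point method pins the INL to zero at the first and last codes, while the best-fit line generally reports a smaller peak INL for the same data.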
Several common sources of static INL (and DNL, for that matter) in CS DACs have been described in the literature:

- Finite output impedance of the DAC current sources [66], [67].
- Mismatch in current sources due to local process variation, transistor and resistor mismatch, etc. [68].
- Voltage and temperature dependent resistive load variation (particularly if a polycrystalline silicon resistor [69], [66] is used).

Static INL degrades the spectral purity of the generated signal. This can be demonstrated by applying a sinusoid through the DAC transfer characteristic, which will be performed in subsequent sections. In order to start quantifying the static error for current steering DACs, static DAC models must be created.

7.3.3 DAC Models

Performing a first order analysis of a current steering DAC provides insight into a few of the major error sources that arise from the basic architecture. Figure 7.5 shows a single-ended, binary weighted current-mode DAC architecture with quite a few assumptions and simplifications. There are no output frequency dependent impedances, as no load or source capacitance is considered; for the moment, ignore C_L in the diagram. The switches are ideal and switch instantaneously. The on resistance of the current source, r_o, is assumed to scale linearly with the increasing current. We will see that even some of the simplest models of the CS DAC have a non-linear transfer function and thus produce a spurious response when driven with a periodically repeating input. In Figure 7.5, R_L is the load resistance, I_u is the LSB value of the current source, b_i is the ith bit of the code word A that is fed to the DAC, b_0 is the least significant bit (LSB) and b_n, where n = B_A − 1, is the most significant bit (MSB).

Figure 7.5: Simple Single-Ended Binary-Weighted Model
We note that this model can be transformed into an equivalent thermometer-coded DAC (Figure 7.6) and the transfer function analysis will remain the same. This is possible since the current sources add in parallel and the resistances scale in the binary model.

Figure 7.6: Simple Single-Ended Thermometer Model

For Figure 7.6, the number of switches closed at sample n is equal to the value of A[n] from Equation 7.42. The total number of switches is N_A = 2^{B_A} − 1, where B_A is the number of bits required to represent the DAC input code. Using this model, we can calculate the transfer function of the DAC. When A[n] = 0, all of the bits t_i are zero. For this exercise, a bit value of zero indicates an open switch. In that case, no current is drawn and the output V_out = V_CC. Now consider the scenario when A[n] = 1. In this case, t_0 = 1 and t_1 = ··· = t_n = 0, so the circuit in Figure 7.6 reduces to Figure 7.7. This is effectively one output impedance in parallel with the resistive load.

Figure 7.7: Single-Ended Single Bit Active

Now we note that as the switches close, the resistors combine in parallel. From circuit theory, the parallel combination of k resistors of the same value R is R/k.

    I_out[k] = k·I_u + V_out[k] / (r_o/k)

    (V_CC − V_out[k]) / R_L = k·I_u + k·V_out[k] / r_o

    V_out[k] = r_o·V_CC / (r_o + k·R_L) − k·R_L·r_o·I_u / (r_o + k·R_L)      (7.43)

Figure 7.8a shows the INL of Figure 7.6 using Equation 7.43 for values of B_A = 10, R_L = 50 Ω, r_o = 100 kΩ, I_u = 20 µA and V_CC = 2 V.
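As a quick numerical check (a Python sketch using the same parameter values quoted above for Figure 7.8a), Equation 7.43 can be evaluated over every code to show how the single-ended output droops non-linearly as switches close:

```python
import numpy as np

# Parameter values quoted for Figure 7.8a.
BA = 10
NA = 2**BA - 1          # 1023 thermometer switches
RL = 50.0               # load resistance (ohms)
ro = 100e3              # per-source output resistance (ohms)
Iu = 20e-6              # LSB current (amps)
VCC = 2.0               # supply (volts)

k = np.arange(NA + 1)   # number of closed switches
vout = ro * VCC / (ro + k * RL) - k * RL * ro * Iu / (ro + k * RL)  # Eq. 7.43

print(f"Vout[0]  = {vout[0]:.4f} V")    # 2.0000 V (no current drawn)
print(f"Vout[NA] = {vout[-1]:.4f} V")   # 0.6464 V, well below the 0.9770 V
                                        # an ideal (ro -> inf) source would give
```

With all 1023 sources on, the combined output resistance r_o/N_A ≈ 98 Ω is comparable to R_L, which is exactly why the finite-impedance INL of Figure 7.8a is so large.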
Mathematically, the INL can be computed from Equation 7.43 as

    INL_SE[k] = I_u·R_L²·k·(k − N_A) / r_o      (7.44)

This is equivalent to the single-ended INL derivation used by several authors, which can be found in Razavi's popular converter design book, Principles of Data Conversion System Design [66]. The worst case INL from Equation 7.44 is

    INL_SE,max = I_u·R_L²·N² / (4·r_o)      (7.45)

Figure 7.8: INL Curves for Thermometer-Coded DAC Models with Finite Output Impedance Current Sources. (a) Single-Ended Thermometer-Coded INL; (b) Differential Thermometer-Coded INL

Fortunately, the situation can be improved by using a differential DAC architecture. Current steering designs are inherently differential, so a close look at the INL of the architecture is important. Figure 7.9 shows a simple thermometer-coded DAC with a differential output. In this architecture a switch connects to the V_outp wire when the t_i bit value is one and to the V_outm wire when the t_i bit value is zero. Using Equation 7.43 for each single-ended output independently,

    V_outp[k] = r_o·V_CC / (r_o + k·R_L) − k·R_L·r_o·I_u / (r_o + k·R_L)      (7.46)

    V_outm[k] = r_o·V_CC / (r_o + (N_A − k)·R_L) − (N_A − k)·R_L·r_o·I_u / (r_o + (N_A − k)·R_L)      (7.47)

Figure 7.9: Simple Differential Thermometer Model

The output is taken as the difference between V_outp and V_outm. Therefore, after algebraic manipulation, the output voltage is found to be

    V_out[k] = V_outp[k] − V_outm[k]      (7.48)
             = (N_A − 2k)·(r_o·R_L·V_CC + I_u·r_o²·R_L) / [ (k·N_A − k²)·R_L² + r_o·N_A·R_L + r_o² ]      (7.49)

The INL can then be computed using Equation 7.48. Figure 7.8b shows the differential INL; notice the improvement is significant. As stated in the previous section, the effect of INL on the output spectrum can be shown by driving a sinusoid through the transfer function. Figure 7.10a shows the resulting spectrum using the single-ended DAC INL and Figure 7.10b shows the spectrum using the differential DAC INL.
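That demonstration can be scripted directly. The sketch below (Python/NumPy, with the illustrative parameter values used above and a coherently sampled test tone; none of it is from the original text) drives a full-scale digital sinusoid through the single-ended and differential characteristics of Equations 7.43 and 7.46-7.48 and reports the second and third harmonics:

```python
import numpy as np

BA, RL, ro, Iu, VCC = 10, 50.0, 100e3, 20e-6, 2.0
NA = 2**BA - 1

def vout_se(k):
    """Single-ended transfer characteristic, Equation 7.43."""
    return ro * VCC / (ro + k * RL) - k * RL * ro * Iu / (ro + k * RL)

def vout_diff(k):
    """Differential output, Equations 7.46-7.48."""
    return vout_se(k) - vout_se(NA - k)

Npts, cycles = 4096, 31          # coherent sampling: 31 whole cycles
n = np.arange(Npts)
codes = np.round((NA / 2) * (1 + np.sin(2 * np.pi * cycles * n / Npts)))

def harmonic_dbc(x, h):
    """Level of harmonic h relative to the fundamental, in dBc."""
    X = np.abs(np.fft.rfft((x - np.mean(x)) * np.hanning(len(x))))
    return 20 * np.log10(X[h * cycles] / X[cycles])

for name, tf in (("single-ended", vout_se), ("differential", vout_diff)):
    y = tf(codes)
    print(f"{name:13s} HD2 = {harmonic_dbc(y, 2):6.1f} dBc, "
          f"HD3 = {harmonic_dbc(y, 3):6.1f} dBc")
```

The single-ended characteristic produces a strong second harmonic, tens of dB above the differential case, consistent with the qualitative comparison of Figures 7.10a and 7.10b.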
Note that the differential DAC has no even order harmonic distortion, whereas the single-ended DAC suffers from a large second order spur. From this analysis it is clear that DAC current source architectures should be chosen such that the output impedance is large. Some publications refer to the INL/DNL degradation due to the output impedance changing with the DAC input code state as code dependent load variation (CDLV) [70].

Figure 7.10: (a) Single-Ended Thermometer-Coded Spectrum; (b) Differential Thermometer-Coded Spectrum

7.4 Dynamic DAC Performance

One of the earliest papers specifically identifying the causes of dynamic performance degradation, from Van den Bosch et al. [71], captures many of the dynamic problems:

1. The imperfect synchronization of the control signals of the current switches.
2. The digital signal feed-through via the C_gd of the switch transistors.
3. The voltage variation at the drain of the current source transistors.
4. The variation in the output impedance of the current sources.

In addition to these, one of the other major issues is inter-symbol interference. Each of these will be briefly described before offering suggested solutions. Non-linearities from the finite output impedance of CS DACs have been carefully analyzed by several authors. One of the better works discussing the problems arising from the dynamic effects of a frequency dependent finite output impedance is provided by Lin et al. [72]. Lin designed a 2.9 GS/s DAC in a 65 nm CMOS process with excellent linearity. Small feature size CMOS generally does not provide a large output impedance at high frequencies when compared to the latest SiGe or InP bipolar devices, which makes the design remarkably interesting.
If the r_o from Section 7.3.3 is replaced by a complex impedance and the C_L of the load is not ignored, then the same non-linearities experienced with static output impedance mismatches apply to the dynamic case. If the frequency of the synthesized tone is large, then Z_o and Z_L become small and there is a significant degradation in the performance of the DAC [71]. Mismatches from process variation, or outright nominal static timing errors, in the delay of a signal path can cause harmonic distortion and spurs. The mismatch creates a glitch at the output of the DAC from an off-timing transition. If the DAC is generating a periodic signal, then this mismatch occurs in a periodic manner, generating a spur.

Intersymbol interference (ISI) describes the phenomenon of a previous DAC code word (symbol) affecting the output characteristics of the current DAC code word. Ideally, the output of the DAC would be dependent only on the current code word. Three significant causes of ISI are as follows:

1. The data value of a current switch is dependent on previous states because the switches themselves do not recover to a memoryless state in the allotted time.
2. Dependency of the connected bias circuitry on the DAC code word.
3. Dependency of the output voltage of the DAC on the switching.

An example of when (2) becomes an issue is when the current source transistors are influenced by the switching action of the current steering transistors. If the tail current does not return to its nominal operating state before transitioning to the next state, then a code dependent effect will be observed. This effect might be observed when operating near the frequency limits of the technology (i.e. the current source is simply not "fast" enough) or with an improperly designed current source (e.g. the switching pushes transistors into saturation, which can take a significant amount of time to recover from).
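The spur-generating nature of ISI can be illustrated with a deliberately crude behavioral model. The one-tap memory term below is entirely an assumption for demonstration, not a circuit-level model: each output sample is perturbed by a small amount that depends on the previous code word, and the FFT of the resulting periodic output shows a discrete spur.

```python
import numpy as np

eps = 0.01                                   # strength of the memory effect (assumed)
N, cycles = 4096, 33                         # coherent sampling of the test tone
n = np.arange(N)
x = np.sin(2 * np.pi * cycles * n / N)       # ideal periodic DAC output sequence

# One-tap memory: the error on each sample depends on the *previous* sample,
# which is the defining signature of intersymbol interference.
y = x + eps * np.roll(x, 1) ** 2

X = np.abs(np.fft.rfft(y * np.hanning(N)))
carrier = X[cycles]

mask = np.ones(len(X), dtype=bool)
mask[:3] = False                             # drop DC, also produced by the squaring
mask[cycles - 1: cycles + 2] = False         # drop the carrier's window bins
spur = np.max(X[mask])
print(f"worst ISI-induced spur: {20*np.log10(spur/carrier):.1f} dBc")  # about -46 dBc
```

Because the input is periodic, the code-dependent error repeats periodically as well, so the ISI energy lands in discrete spurs (here at twice the tone frequency) rather than a broadband floor.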
Figure 7.10 shows an output glitch dependent on the device size of the switches of the current source. This is related to the charge feedthrough described in [64], but it is also simply a function of not adjusting the driving cells of the current switches for scaling.

Figure 7.10: Glitch Versus Device Size (1 µm to 10 µm)

Now that the main static and dynamic sources of error have been presented, DAC and current steering architectures will be presented that help mitigate these errors.

7.5 DAC Architectures

Current steering DACs can be divided into categories based on the architecture chosen in the design. This section provides a brief overview of several important DAC architectures that should be considered when designing a CS DAC. The two main types of DACs considered for CS are binary-weighted DACs and thermometer-coded DACs. A B_A-bit binary weighted CS DAC consists of B_A scaled current sources. The simplest such current steering design scales the transistors in the DAC by a binary weight.

7.5.1 R-2R DACs

The classic R-2R DAC is generally presented in an operational amplifier configuration, but an analog exists for CS architectures [73][74]. In [73], an R-2R ladder network is used in the design of a 10-bit DAC in a bipolar process. The main motivation for the architecture is to avoid the challenging issue of scaling resistors to achieve binary weighting. Figure 7.11 shows a differential version of the architecture (as [73] drove the signal single-ended into an operational amplifier). Note that the resistor network is formed at the emitters of the current sources.

Figure 7.11: R-2R with Binary Scaling (Emitter Network)

Also observe that the emitter area of the devices must be scaled with
So this architecture, while it relaxes the resistor sizing requirements, still suffers from a difficult NPN device scaling problem as the number of bits in the design grows large. This particularly a problem at high speeds, as the minimum size device must have a sufficient amount of current to operate at the target frequency. The metric used as a measure of the operating speed of the device at a specific current density is the unity-gain bandwidth product. For an HBT or bipolar transistor, this is (to the first order) [75]: fT = 12pi gmC pi +C? (7.50) 207 where Cpi and C?, the Miller capacitance, can be found in the hybrid-pi model of a bipolar transistor. gm is the transconductance of the transistor and is linearly related to the current through the device (Equation 7.82). The authors [74] introduce an alternative R-2R. The language used to distinguish be- tween the two types of R-2R circuits are binary attenuation (the architecture proposed by the author) and binary scaling, the previously described R-2R architecture. The naming convention is borrowed for this work. In the binary attenuation architecture, the devices Figure 7.12: R-2R with Binary Attenuation (Collector Network) Q0 RR RERE QBA?2 2R2R RE QBA?1 2R2R Vb RR RL RR RL VCC need not be scaled to achieve current scaling at the output. The currents driven by the sources are attenuated through a resistor network to achieve binary weighting. The R-2R ladder is located at the output of the DAC, which are the collectors of the transistor current cell switches, and divides the output current down as shown in Figure 7.12. The advantages of the binary scaling architecture are: ? The architecture requires half the number of resistors when compared against the binary attenuation architecture in a differential DAC setting. This is because the R-2R division must occur for both outputs. 208 ? Matching between resistors in the network result in common mode INL distortion in a differential architecture. ? 
- The current through the R-2R network is mostly constant and therefore does not suffer from temperature changes based on DAC state. Modern high speed DAC designs rarely mention this, as the thermal time constant is much, much longer than the switching period of the DAC; the error from device mismatch and timing will likely dominate that generated by temperature gradients. We also note that small feature sizes allow components to be placed in close proximity to each other, thus allowing a more uniform heat distribution across the R-2R ladder network.

The benefit of a smaller number of resistors is lessened because the devices must be scaled with increasing weight. The number and size of the devices can dominate the area of the resulting DAC. The advantages of the binary attenuation architecture are:

- The devices do not need to be scaled with binary weighting. This is significant, as larger or more devices result in higher parasitics, which is of critical importance in high speed designs.
- A simpler emitter degeneration that is more easily matched between current sources. In small geometry designs, the metal routing can cause significant (where significant depends on the resolution of the DAC) voltage drops.
- Scaling the current through an active current source decreases the output impedance. Finite output impedance of current sources is a major contributor to distortion [72],[67],[76]. In Section 7.3.3, the effects of finite output impedance were looked at more closely.

A pure R-2R DAC, regardless of the architecture chosen, is a binary-coded DAC.

7.5.2 Thermometer Coded and Segmented DACs

In binary-coded DACs, each current source is weighted by a factor of two, as discussed in Section 7.5. The implementation of such a DAC is efficient in that only as many active components as necessary are switched when a code word changes. However, binary DACs are highly susceptible to mismatch errors. Binary DACs also suffer from non-monotonicity in the presence of device mismatch.
The jump typically happens on the boundary to the next power-of-two bit, for instance from code 7 (4'b0111) to code 8 (4'b1000). To address this issue, designers have introduced thermometer-coded DAC architectures [77], [78], [79]. In fact, it would be more surprising to find a modern CS DAC architecture that did not include a thermometer-coded DAC as part of the design. In these designs, the value of the input code word A[n] determines the number of switches to close. Ideally, a designer would like all the benefits of thermometer-coded and binary-coded DACs simultaneously without suffering any of the drawbacks. One way to balance the area and speed benefits of binary-coded DACs with thermometer-coded DACs is to segment the design into portions. Figure 7.13 shows a generic segmented architecture using thermometer-coded current switches for the MSBs and a binary attenuation R-2R ladder for the LSBs.

Figure 7.13: Segmented R-2R Binary with Thermometer MSBs (schematic omitted)

7.5.3 Return-to-Zero (RTZ)

Return-to-zero switching is a common technique used to mitigate the impact of ISI and, sometimes, to reduce the impact of sinc roll-off at higher frequencies. The technique can also be used to take signals from higher Nyquist zones. Figure 7.2a shows an NRTZ DAC output and Figure 7.2b shows an RTZ DAC output with 50% duty cycle. Though the RTZ technique has been used extensively in DAC design for a significant period of time, it surprisingly does not show up in many academic DAC publications. Table 7.1 is a small collection of RTZ DACs in the literature.

Table 7.1: Published RTZ DACs

  Publication | Year | Frequency (GHz) | SFDR (dBc)
  [80]        | 2005 | 1.6             | 70
  [81]        | 2011 | 1.6             | 66
  [4]         | 2012 | 7.2             | 80
  [4]         | 2012 | 12              | 67

Compare these results to the NRTZ DACs listed in Table 7.2 below. Outside of several low-frequency DACs (where one would not have issues with ISI), there is a clear performance improvement over a majority of the NRTZ cases.
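The sinc roll-off relief behind part of this improvement can be quantified with a short calculation. A rectangular hold of duration d/fs has magnitude response |sinc(f·d/fs)|, so a 50% RTZ output (d = 0.5) droops far less across the first Nyquist zone than a full-period NRZ hold (d = 1), at the cost of 6 dB of absolute amplitude. A plain-Python sketch:

```python
import math

def hold_response_db(f_over_fs, duty):
    """Droop in dB, relative to the hold's own DC gain, of a
    rectangular hold of duration duty/fs at f = f_over_fs * fs."""
    x = math.pi * f_over_fs * duty
    return 20 * math.log10(abs(math.sin(x) / x)) if x else 0.0

for f in (0.4, 0.6):  # near the first-Nyquist edge, and an image in zone 2
    nrz = hold_response_db(f, 1.0)   # NRZ: full-period hold
    rtz = hold_response_db(f, 0.5)   # RTZ: 50% duty, half-period hold
    print(f"f = {f} fs: NRZ {nrz:.2f} dB, RTZ {rtz:.2f} dB")
```

At 0.4 fs the NRZ droop is about -2.4 dB versus roughly -0.6 dB for 50% RTZ, and the gap widens in the second Nyquist zone, which is why RTZ is attractive for taking images from higher zones.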
The requirements for an inverse sinc filter are also relaxed, as has already been discussed. RTZ addresses ISI by forcing the output of the DAC to a memoryless state (i.e., zero) before applying the next code word. This can clearly be seen in Figure 7.2b. It prevents the DAC from transitioning out of a code-dependent state, which would otherwise produce CDLV effects. RTZ also addresses the charge feedthrough problem from data switching, because the data is allowed to change while the output is in the zero state. This is illuminated further in Section 7.6.

Table 7.2: SFDR of NRTZ DACs

  Source | SFDR (Low)     | SFDR (Nyq.) | Area (mm2) | Power (mW) | fs (MHz)
  [82]   | 58 (9.6 MHz)   | N/A         | 5.000      | 730        | 1000
  [83]   | 56 (3.9 MHz)   | N/A         | 1.800      | 150        | 125
  [84]   | 49 (8.0 MHz)   | N/A         | 1.220      | 140        | 75
  [78]   | 87 (2.0 MHz)   | 71          | 16.00      | 650        | 100
  [85]   | 73 (8.0 MHz)   | 55          | 0.600      | 125        | 500
  [86]   | 71 (1.0 MHz)   | 55          | 3.200      | 320        | 300
  [57]   | 61 (5.0 MHz)   | <50         | 13.10      | 300        | 150
  [87]   | 70 (100.0 MHz) | 61.2        | 0.350      | 110        | 1000
  [64]   | 78             | 62          | 1.130      | 216        | 500
  [88]   | 75             | 63          | 30.60      | 6000       | 1200
  [72]   | 74             | 52          | 0.310      | 188        | 2900
  [89]   | 76             | 61          | 1.000      | 97         | 200
  [90]   | 67             | N/A         | 2.500      | 400        | 1400
  [91]   | 95 (1 MHz)     | <59         | 0.440      | 82         | 320
  [92]   | 71.68 (1 MHz)  | 43          | 0.800      | 25         | 250
  [93]   | 64 (1 MHz)     | <40         | 1.000      | 20         | 100
  [94]   | 82             | 72          | 11.83      | 180        | 100
  [95]   | 98 (10 MHz)    | 74          | 1.950      | 400        | 400
  [96]   | 60 (1 MHz)     | N/A         | 0.230      | N/A        | 800
  [79]   | 80.7 (1 MHz)   | 80.7        | 0.280      | N/A        | 10
  [97]   | 47.3 (30 MHz)  | 36.2        | 0.200      | 29         | 3000
  [33]   | 50 (91.7 MHz)  | 45          | 4.200      | 4800       | 8600

7.5.4 Translinear Output Buffers and Non-Linear DACs

Though the issue of implementing a high-speed phase accumulator has been addressed in Chapter 5, oftentimes one wishes to avoid the ROM compression circuitry or the complex multiplexer tree. Directly generating the high-speed phase with a pipeline phase accumulator avoids the multiplexer tree entirely. The size and area requirements for ROMs that operate at ultra-high frequencies have proven prohibitive. This has led to the creation of non-linear DACs [98, 33].
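Non-linear DACs weight the conversion elements so that the converter itself performs the phase-to-amplitude mapping. A toy numeric sketch of one such weighting, a hypothetical 16-segment quarter-wave sine weighting (this is an illustration of the principle, not the actual segmentation of [98] or [33]):

```python
import math

N = 16  # hypothetical number of thermometer segments per quarter wave
# Each segment carries the sine increment it must contribute, so the
# weights taper from large (near the zero crossing) to small (near the peak).
weights = [math.sin(math.pi * (k + 1) / (2 * N)) - math.sin(math.pi * k / (2 * N))
           for k in range(N)]

# A linear (thermometer) ramp through the segments then traces a quarter sine.
acc, samples = 0.0, []
for w in weights:
    acc += w
    samples.append(acc)

worst = max(abs(s - math.sin(math.pi * (m + 1) / (2 * N)))
            for m, s in enumerate(samples))
print(worst)  # the cumulative sums telescope onto the sine (to float precision)
```

The same idea extends to a full period by sign and mirror symmetry; in hardware, the cost is that each segment needs its own precisely ratioed current, which is exactly the matching burden the translinear alternative below tries to avoid.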
In these DACs, the current sources are generally "sine-weighted," such that a linear ramp through the DAC's input codes generates a sinusoidal output. An alternative technique is to run the phase output through a linear DAC that then drives a translinear device for sinusoid generation. The idea of using a non-linear device to transform a linear output into a sinusoid is not new in the DDFS literature; however, a recent DDFS by Yang et al. demonstrates remarkably good high-speed performance at low power [99]. Before delving into the more recent implementation, a review of earlier literature assists in the development.

Figure 7.14: Differential Pair — Q1 and Q2 loaded by R_L from V_CC, emitters joined through a resistor R carrying current i, each emitter biased by a tail current I_T; inputs v_ip, v_im, outputs v_op, v_om (schematic omitted)

In 1976, Meyer et al. [100] used a differential pair as a triangle-to-sine-wave converter. Figure 7.14 shows the architecture used by Meyer for his triangle-to-sine conversion analysis. The goal is to approximate a sinusoidal output at the terminals of the differential pair, given a triangular input, using the physical properties of a bipolar transistor, i.e.,

    v_od = v_op - v_om = a_1 sin(a_2 v_id)    (7.51)

where v_id represents the differential input voltage v_ip - v_im, and a_1 and a_2 are two linear coefficients that do not affect the spectral purity of the generated signal. Observing Figure 7.14, it becomes clear that the output can be written as a function of the current i through the resistor R. Firstly, Ohm's law is used to relate the collector currents of Q1 and Q2 to the output of the differential pair:

    v_op = V_CC - R_L I_C1    (7.52)
    v_om = V_CC - R_L I_C2    (7.53)
    v_od = R_L (I_C2 - I_C1)    (7.54)

where R_L is the resistive load and V_CC is the supply voltage. The emitter current is related to the collector current of the transistor through the relationship

    I_E = I_C + I_C/β_F = I_C (1 + β_F)/β_F = I_C/α_F    (7.55)

where β_F is the forward current gain of a bipolar transistor and α_F = β_F/(1 + β_F) is commonly used in microelectronics texts [101].
If β_F is sufficiently high, as is the case in SiGe HBTs, then I_C ≈ I_E since α_F → 1. In this particular analysis, α_F is kept throughout, which differentiates it from Meyer's analysis. This analysis also applies a differential input voltage, as opposed to driving the differential pair single-ended. Substituting I_C = α_F I_E into Equation 7.54,

    v_od = R_L α_F (I_E2 - I_E1)    (7.56)

Applying Kirchhoff's Current Law (KCL) at the emitters, I_E1 - i - I_T = 0 and I_E2 + i - I_T = 0, so

    I_E1 = I_T + i    (7.57)
    I_E2 = I_T - i    (7.58)

Substituting Equation 7.57 and Equation 7.58 into Equation 7.56,

    v_od = R_L α_F [(I_T - i) - (I_T + i)] = -2 R_L α_F i    (7.59)

Thus the output of the differential pair is a linear function of i. To solve for i, consider Kirchhoff's Voltage Law (KVL) around the loop containing the base-emitter junctions:

    -v_ip + V_BE1 + iR - V_BE2 + v_im = 0    (7.60)
    v_id = V_BE1 + iR - V_BE2    (7.61)

The base-emitter voltage can be written as a function of the collector current as shown in Equation 7.62,

    V_BE = V_T ln(I_C/I_S)    (7.62)

where I_S is the transport saturation current of the Gummel-Poon model [75] and V_T is the thermal voltage defined in Equation 7.83. The equation assumes that the forward Early voltage, V_A, of the device is infinite (i.e., the bipolar transistors have infinite output impedance, which is certainly not a valid assumption for high output frequencies). Using this relationship, the large-signal transfer function of the differential pair can be derived. Substituting Equation 7.62 into Equation 7.61 yields

    v_id = iR + V_T [ln(I_C1/I_S) - ln(I_C2/I_S)]    (7.63)
         = iR + V_T ln(I_C1/I_C2)    (7.64)

where the subtraction property of logarithms, ln(a) - ln(b) = ln(a/b), is used. Substituting Equation 7.57, Equation 7.58 and Equation 7.55 into Equation 7.64 (the α_F factors cancel in the ratio),

    v_id/V_T = iR/V_T + ln[α_F(I_T + i) / (α_F(I_T - i))]    (7.65)
             = iR/V_T + ln[(I_T + i)/(I_T - i)]    (7.66)

Expanding the logarithm in a Taylor series about i = 0 yields the series

    ln[(I_T + i)/(I_T - i)] = 2 [i/I_T + i^3/(3 I_T^3) + i^5/(5 I_T^5) + ···]    (7.67)
                            = 2 Σ_{n=0}^{∞} i^{2n+1} / ((2n+1) I_T^{2n+1})    (7.68)

Now, the desired transfer function of i as a function of v_id is

    i = b_1 sin(b_2 v_id)    (7.69)

where b_1 and b_2 are constants. Applying the inverse sine operation to both sides yields

    b_2 v_id = sin^{-1}(i/b_1)    (7.70)

Applying the Taylor series expansion of the inverse sine function about i = 0 yields

    b_2 v_id = i/b_1 + (1/6)(i/b_1)^3 + (3/40)(i/b_1)^5 + ···    (7.71)

Choosing b_1 and b_2 in Equation 7.71 such that the error between it and Equation 7.66 is minimized yields the final result

    b_1 = I_T    (7.72)
    b_2 = (1/V_T) · 1/(I_T R/V_T + 2) = 1/(I_T R + 2 V_T)    (7.73)

The resulting output is a triangle wave (or a sine wave with large odd-harmonic terms). But one can outperform a single differential pair with a few more transistors.

Using the Padé approximant, one can generate a rational function of low-degree polynomials that approximates transcendental functions such as sine or cosine quite well [102]. Equation 7.74 defines the Padé rational approximation of a real function f:

    f(x) ≈ f̂_p(x) = (Σ_{j=0}^{m} a_j x^j) / (1 + Σ_{k=1}^{n} b_k x^k)    (7.74)

where the first m + n derivatives of f equal those of the approximation f̂_p:

    f(0) = f̂_p(0)    (7.75)
    f'(0) = f̂_p'(0)    (7.76)
    f^(m+n)(0) = f̂_p^(m+n)(0)    (7.77)

Note that this approximation is closely related to the Maclaurin series of f; in fact, the Padé approximant often uses the Taylor series during its derivation. Now consider the following approximations for the sinusoidal functions.

1. Using the Padé
technique to approximate the sine function, we get

    sin(πx) ≈ x(1 - x^2) / (1 + x^2)    (7.78)

2. Using the Padé technique to approximate the cosine function, we get

    cos(πx) ≈ (1 - 4x^2)(2 - x^2) / (2 + x^2)    (7.79)

Equations 7.78 and 7.79 are versions of the Padé approximation with coefficients rounded to the nearest integer and amplitudes normalized. Figure 7.15 helps visualize the performance of the Padé approximation against the more commonly used Taylor series approximation.

Figure 7.15: Padé Sine Approximation — absolute error (percent) versus normalized phase for the third-order Padé, third-order Taylor and fifth-order Taylor approximations (plot omitted)

The Padé approximant is interesting for sinusoidal approximation for the following two reasons:

• Division is more easily implemented in a translinear circuit than a high-order polynomial [102].

• A third-order Padé approximant is roughly as complex to implement in a translinear circuit as a third-order Taylor series approximant [102].

Using the synthesis techniques described in [102], two translinear implementations of the Padé approximations were realized. Figure 7.16a and Figure 7.17 show ideal translinear circuits for the sine and cosine approximations respectively. Figure 7.16b shows a full transistor implementation of the translinear sine operation.

A novel quadrature DDFS architecture has been proposed to take advantage of the translinear output buffers described thus far in this section. Figure 7.18 provides a block diagram of the proposed DDFS. The design requires a current-scaling stage after the output of the DAC on the cosine path, since the transfer functions of the Padé approximations have been normalized; in practice, the output magnitudes of the sine circuit of Figure 7.16a and the cosine circuit of Figure 7.17 are different.

7.6 Current Steering Cell Architectures

One of the critical decisions in designing a current steering DAC is selecting an architecture for the current steering cells that comprise the DAC.
The output impedance of the DAC, the ISI and the sampling rate strongly depend on the performance of this single cell. In this section, several current steering cell architectures are analyzed, and the problems addressed, or raised, by each architecture are presented. The design decisions for this component center on trade-offs between performance, complexity, area and power.

Figure 7.16: Translinear Sine Implementations — (a) ideal current sources; (b) transistor implementation (schematics omitted)

The most primitive cell for differential current steering is a three-transistor differential pair. The current source transistor has no degeneration and there is no cascoding at any level. Figure 7.19a shows a schematic for the simple current steering cell. QSW1 and QSW2 are the current steering transistors, RL is the load resistor of the DAC and QCS is the current source transistor. The current sourced by QCS is steered through QSW1 or QSW2 depending on which transistor is switched on. Vsp and Vsm are the differential data driving signals with quick transition times. The differential signals transition in such a way as to keep the time that both transistors are active relatively small in comparison to the time the data is held. The simple architecture has several advantages:

• Low power, since the power supply voltage can be low.

• Small area, since the current cell requires only three transistors to implement.

Unfortunately, the drawbacks of this architecture are quite significant:

1. The current source output impedance is low and susceptible to data changes.
Figure 7.17: Differential Translinear Cosine Implementation (Ideal Current Sources) (schematic omitted)

Figure 7.18: Quadrature Translinear DDFS — the FCW drives a 24-bit phase accumulator whose truncated output feeds a 10-bit DAC, followed by translinear sine and cosine stages with current scaling and a time delay (block diagram omitted)

2. The voltage across the steering transistors is dependent on the output voltage of the DAC.

3. Glitches that capacitively feed through the switching pair are data dependent.

The high-speed data switching signal causes the tail current through QCS to change, since QCS has a finite output impedance. At first glance, this may appear to be a common-mode effect and therefore eliminated when taking the output differentially. With this configuration, however, that is not the case. When the glitch in the tail current occurs, it is reflected disproportionately through the active transistor of the current steering pair. The deactivated side is still in the process of activating, and thus a majority of the tail current fluctuation appears on a single side of the differential pair.

Figure 7.19: Simple Current Steering Cells — (a) simple current steering cell; (b) simple current steering cell with degeneration (schematics omitted)

This particular drawback is only important when the glitch power becomes significant with respect to the output of the DAC. For instance, if the DAC is clocked slowly for a given process, the glitch occupies only a tiny fraction of each output code period. Significance is also dictated by the required effective number of bits (ENOB) of the DAC. For a DAC that operates near the limits of a technology, we will argue that this is important. While we are not concerned with the bias structure of the DAC at this point, large fluctuations in the tail current of QCS will also propagate onto the bias line Vb.
If multiple current cells are tied to the same bias node, then the current cells have a negative, code-dependent interaction. As the rate of code changes is related to the signal being converted, this produces a non-linear, output-frequency-dependent distortion. Improving the output impedance of the current source mitigates this concern. As a reference, the small-signal output impedance of the current source QCS is approximately

    r_o ≈ V_A / I_C    (7.80)

where V_A is the Early voltage of the transistor and I_C the collector current. The size of the glitch at the emitters of the switching transistors depends on this value.

A simple step that may be used to improve the finite output impedance of the DAC is adding a resistor at the emitter of QCS, as shown in Figure 7.19b. This resistor is called a degeneration resistor and improves the output impedance approximately as

    r_o,dg = r_o (1 + g_m R_E)    (7.81)

where R_E is the value of the degeneration resistor and g_m is the transconductance of QCS. The transconductance is

    g_m = I_C / V_T    (7.82)

where I_C is the collector bias current through the transistor and V_T is the thermal voltage, defined as

    V_T = kT/q    (7.83)

where k ≈ 1.3806488 × 10^-23 J/K is Boltzmann's constant and q ≈ 1.602176565 × 10^-19 C is the elementary charge. At room temperature, T_r = 27 °C = 300 K, the thermal voltage is roughly V_T ≈ 0.026 V. Taking I_C = 1 mA as a reasonable value for biasing the current source and R_E = 200 Ω as a reasonable value for the degeneration resistor, the output impedance of the current source is improved by a factor of 1 + g_m R_E ≈ 8.7.

The drawback, though small, is that the supply voltage of the DAC must be increased to account for the voltage drop across the resistor. Resistors also require a non-negligible amount of area. The output impedance can be improved further by adding a cascode transistor to the current source. Figure 7.20a introduces the cascode transistor QCA1. We first consider adding the cascode without the degeneration resistor.
In that case,

    r_o,ca = r_o (1 + β_0 g_m r_o / (β_0 + g_m r_o)) ≈ β_0 r_o,   for g_m r_o ≫ β_0    (7.84)

where β_0 is the current gain of QCA1. In SiGe HBT processes, the current gain can be on the order of 200 [103]. Using I_C = 1 mA as the standard, this yields r_o,ca ≈ 200 r_o. Adding the degeneration resistor improves this situation further by replacing r_o with Equation 7.81; using the same 200 Ω resistor and 1 mA collector current, this results in r_o,ca,dg on the order of 1700 r_o.

Figure 7.20: Current Steering Cells with Cascoding — (a) current steering cell with cascode current source; (b) current steering cell with cascode output (schematics omitted)

Another dramatic improvement to the current source architecture can be achieved by adding a cascode transistor to the output of the switching transistors. Figure 7.20b provides an example of such a configuration. This configuration allows the switching transistors to drive a low-impedance node, the emitters of the cascode transistors QCA2 and QCA3. As shown in some of the DAC architectures of Section 7.5, thermometer-coded segments in particular can have the load of tens of transistors tied to the same node. This load impedance slows the performance of the DAC switches dramatically, but by adding the cascode transistors, the impedance is isolated from the switching transistors. Furthermore, the transistors QCA2 and QCA3 fix the voltage variation across the switches, across all DAC code choices, to a few hundred millivolts; otherwise, the DAC output voltage is directly applied across the terminals of the switching transistors. A new problem arises, though, from adding the cascode at the top of the switching transistors shown in Figure 7.20b. When the current is steered away from a cascode transistor, the inactive switch cuts off the current through that cascode.
This causes a delay from the switch to the output of the DAC, as the cascode must change from operating in an inactive region to being fully biased. To address this shortcoming, adding keep-alive transistors to the output cascode, as shown in Figure 7.21, keeps those transistors from shutting off completely. The drawback, of course, is higher output power. This trade-off is very often worth the increased performance; consider the performance of [72] or [4], which show some of the best published DAC results to date.

Figure 7.21: Current Steering Cell with Cascode Output and Keep Alive (schematic omitted)

All the techniques described thus far have done little to improve the non-linear glitching from data switching or the intersymbol interference. Both become considerable concerns as the operating frequency of the process extends higher. As has already been discussed in Section 7.5.3, using an RTZ architecture addresses both problems.

Figure 7.22: Current Steering Cell with Cascode, Keep Alive and RTZ (schematic omitted)

Figure 7.22 combines all of the techniques discussed thus far, including an RTZ switching quad. This RTZ pair incidentally further improves the isolation of the output from the data switches by acting as an extra cascode stage for the data switches.

The author believes that constructing a DAC using the segmented R-2R binary attenuation architecture with the current switch shown in Figure 7.22 would dramatically improve the dynamic performance of the DACs being designed at Auburn University. Many of the DACs at Auburn, including the CMOS design discussed in Section 6.6, show dramatic decreases in performance when synthesizing high-frequency signals, where "high" is relative to the sampling frequency of the DAC.
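As a numeric recap of the impedance-boosting steps in this section, the sketch below tallies Equations 7.80-7.84 for the running 1 mA / 200 Ω example. The Early voltage V_A = 50 V and β_0 = 200 are assumed illustrative values, not measured device data:

```python
def vt(temp_k=300.0):
    """Thermal voltage kT/q in volts (Eq. 7.83)."""
    k, q = 1.380649e-23, 1.602176634e-19
    return k * temp_k / q

# Running example from this section; VA and BETA0 are assumed values.
IC, RE, VA, BETA0 = 1e-3, 200.0, 50.0, 200.0   # A, ohm, V, (dimensionless)

gm = IC / vt()                 # current-source transconductance (Eq. 7.82)
ro = VA / IC                   # bare output resistance (Eq. 7.80)
ro_dg = ro * (1 + gm * RE)     # with emitter degeneration (Eq. 7.81)
ro_ca = BETA0 * ro_dg          # cascode atop the degenerated source
                               # (limit of Eq. 7.84, valid since gm*ro_dg >> beta0)
print(f"ro = {ro:.0f} ohm, degenerated = {ro_dg / ro:.1f}x, cascoded = {ro_ca / ro:.0f}x")
```

The degeneration factor lands near 9x and the cascode multiplies it by roughly β_0, in the same range as the estimates quoted in the text; the point of the tally is that each structural addition buys one multiplicative factor.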
The techniques described in this section provide a path to mitigate dynamic degradation effects before even considering calibration.

Chapter 8 Conclusions

In this work, an exact derivation of the spurs generated by phase truncation error in a phase accumulator was developed using elementary number theory. The spectral theory was developed from binary unsigned arithmetic through to the final computation of the discrete Fourier transform of the truncated phase sequence. The theory replaces the commonly cited work by Nicholas [23] and the less commonly cited work by Torosyan [31]. The particular derivation is well suited for teaching DDFS engineers, both qualitatively and quantitatively, the origin of phase truncation spurs, and would fit well in a textbook on the topic.

A novel parallel phase accumulator with linear frequency modulation was introduced, and its impact on the size of the DCDO was analyzed. It was compared to other parallel accumulators in the patent literature that perform a similar operation. In CMOS and BiCMOS processes with feature sizes less than or equal to 130 nm, the author argues that every DDFS design should parallelize the phase accumulator. This approach removes the need for non-linear DAC implementations entirely and allows designers to focus on the components that are actually limiting the performance of DDFS systems (i.e., DACs).

Lastly, the DDFS systems designed at Auburn University by the author were presented, culminating in the quadrature DDFS used in the X-band radar-on-a-chip design that was fabricated in a 130 nm BiCMOS process. A revision of the system correcting the errors found during testing was developed to final GDSII form, but the team members at Auburn University have since taken jobs, making testing impractical. The design had not been submitted for fabrication at the time of this writing.

There is a significant opportunity for future work derived from this dissertation.
Firstly, implementing the modified accumulators described in Section 4.7.1 and Section 4.7.2 on a low-cost FPGA from Xilinx feeding a low-cost DAC demonstration board from Analog Devices would allow for physical verification of the theory, since the theory was only numerically verified in this work. Secondly, using the theory to fully explain the spectral behavior of the output response analyzer, and subsequently implementing a new variable-state phase accumulator, would prove interesting. The new accumulator described could also be used to develop a DDFS with very fine frequency resolution. The mathematics fully developing the list of all acquirable frequencies also makes for exciting analysis. Either one of these tasks, if properly built from the work described in this dissertation, would be feasible for a master's student to perform; the first could even be accomplished as a senior project by an undergraduate student if the proper components were supplied.

An exact analysis of the spectrum of a partial dynamic rotation CORDIC would require a significant undertaking, but could also lead to insights into the device's behavior (and potentially techniques to improve it). The author believes that the CORDIC output stages can actually be used as an "error correction" stage at the output of a highly compressed ROM. To the author's knowledge, no one has ever taken a BTM or MTM LUT as the seed for a partial dynamic rotation CORDIC.

Lastly, the theory developed in Chapter 4 should be used in the analysis of other systems where truncation occurs, such as a fractional-N synthesizer. Some of the theory used in calculating the properties of the original phase truncation sequences may also be used in analyzing the sequences generated by LFSRs or ΔΣ modulators. Also, a more abstract, compact analysis producing the same results as this work would be instrumental in the field.

Bibliography

[1] L. K. Tan, E. Roth, G. Yee, and H.
Samueli, "An 800-MHz quadrature digital synthesizer with ECL-compatible output drivers in 0.8 µm CMOS," IEEE Journal of Solid-State Circuits, vol. 30, no. 12, pp. 1463-1473, Dec. 1995.

[2] A. Yamagishi, M. Ishikawa, T. Tsukahara, and S. Date, "A 2-V, 2-GHz low-power direct digital frequency synthesizer chip-set for wireless communication," IEEE Journal of Solid-State Circuits, vol. 33, pp. 210-217, 1998.

[3] B.-D. Yang, J.-H. Choi, S.-H. Han, L.-S. Kim, and H.-K. Yu, "An 800-MHz low-power direct digital frequency synthesizer with an on-chip D/A converter," IEEE Journal of Solid-State Circuits, vol. 39, no. 5, pp. 761-774, 2004.

[4] F. Van de Sande, N. Lugil, F. Demarsin, Z. Hendrix, A. Andries, P. Brandt, W. Anklam, J. S. Patterson, B. Miller, M. Rytting, M. Whaley, B. Jewett, J. Liu, J. Wegman, and K. Poulton, "A 7.2 GSa/s, 14 bit or 12 GSa/s, 12 bit signal generator on a chip in a 165 GHz fT BiCMOS process," IEEE Journal of Solid-State Circuits, vol. 47, no. 4, pp. 1003-1012, 2012.

[5] T. Nagasaku, K. Kogo, H. Shinoda, H. Kondoh, Y. Muto, A. Yamamoto, and T. Yoshikawa, "77 GHz low-cost single-chip radar sensor for automotive ground speed detection," in Proc. IEEE Compound Semiconductor Integrated Circuits Symp. (CSICS '08), 2008, pp. 1-4.

[6] Y.-A. Li, M.-H. Hung, S.-J. Huang, and J. Lee, "A fully integrated 77 GHz FMCW radar system in 65 nm CMOS," in Proc. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers (ISSCC), 2010, pp. 216-217.

[7] J. Rogers, C. Plett, and F. Dai, Integrated Circuit Design for High-Speed Frequency Synthesis. Artech House, 2006.

[8] M. Skolnik, Radar Handbook, 3rd ed. McGraw Hill, 2008.

[9] J. Tierney, C. Rader, and B. Gold, "A digital frequency synthesizer," IEEE Transactions on Audio and Electroacoustics, vol. 19, no. 1, pp. 48-57, 1971.

[10] A. Torosyan and A. N. Willson, "Exact analysis of DDS spurs and SNR due to phase truncation and arbitrary phase-to-amplitude errors," in Proc. IEEE Int. Frequency Control Symp.
and Exposition, 2005.

[11] D. D. Sarma and D. W. Matula, "Faithful bipartite ROM reciprocal tables," in Proceedings of the 12th Symposium on Computer Arithmetic, 1995, p. 17.

[12] F. de Dinechin and A. Tisserand, "Multipartite table methods," IEEE Transactions on Computers, vol. 54, no. 3, pp. 319-330, 2005.

[13] J. Qin, "Selective spectrum analysis and numerically controlled oscillator in mixed-signal built-in self-test," Ph.D. dissertation, Auburn University, December 2010.

[14] G. E. Shilov, Elementary Real and Complex Analysis. Dover Publications, Inc., 1973.

[15] R. F. Lax, Modern Algebra and Discrete Structures. Addison-Wesley Educational Publishers Inc., 1991.

[16] A. Torosyan, "Direct digital frequency synthesizers: Complete analysis and design guidelines," Ph.D. dissertation, University of California, Los Angeles, 2003.

[17] J. F. Wakerly, Digital Design Principles and Practices, 3rd ed. Prentice Hall, 2001.

[18] J. R. Barry, E. A. Lee, and D. G. Messerschmitt, Digital Communication. Springer, 2003.

[19] Analog Devices, "1 GSPS, 14-bit, 3.3 V CMOS direct digital synthesizer," 2012.

[20] Analog Devices, "3.5 GSPS direct digital synthesizer with 12-bit DAC," 2012.

[21] J. Qin, J. D. Cali, B. F. Dutton, G. J. Starr, F. F. Dai, and C. E. Stroud, "Selective spectrum analysis for analog measurements," IEEE Transactions on Industrial Electronics, vol. 58, no. 10, pp. 4960-4971, October 2011.

[22] J. Yu, F. Zhao, J. Cali, D. Ma, X. Geng, F. F. Dai, J. D. Irwin, and A. Aklian, "A single-chip X-band chirp radar MMIC with stretch processing," in CICC, 2012, pp. 1-4.

[23] H. T. Nicholas and H. Samueli, "An analysis of the output spectrum of direct digital frequency synthesizers in the presence of phase-accumulator truncation," in Proc. 41st Annual Symp. Frequency Control, 1987, pp. 495-502.

[24] Y. C. Jenq, "Digital spectra of nonuniformly sampled signals. II. Digital look-up tunable sinusoidal oscillators," IEEE Transactions on Instrumentation and Measurement, vol.
37, no. 3, pp. 358-362, 1988.

[25] S. Mehrgardt, "Noise spectra of digital sine-generators using the table-lookup method," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 4, pp. 1037-1039, 1983.

[26] Y.-C. Jenq, "Digital spectra of nonuniformly sampled signals: fundamentals and high-speed waveform digitizers," IEEE Transactions on Instrumentation and Measurement, vol. 37, no. 2, pp. 245-251, 1988.

[27] Y. C. Jenq, "Digital spectra of nonuniformly sampled signals: theories and applications - measuring clock/aperture jitter of an A/D system," IEEE Transactions on Instrumentation and Measurement, vol. 39, no. 6, pp. 969-971, 1990.

[28] Y.-C. Jenq, "Digital spectra of nonuniformly sampled signals: a robust sampling time offset estimation algorithm for ultrahigh-speed waveform digitizers using interleaving," IEEE Transactions on Instrumentation and Measurement, vol. 39, no. 1, pp. 71-75, 1990.

[29] A. Torosyan and A. N. Willson, "Analysis of the output spectrum for direct digital frequency synthesizers in the presence of phase truncation and finite arithmetic precision," in Proc. 2nd Int. Symp. Image and Signal Processing and Analysis (ISPA 2001), 2001, pp. 458-463.

[30] U. Dudley, Elementary Number Theory. Dover Publications, Inc., 1978.

[31] A. Torosyan, D. Fu, and A. N. Willson, "A 300-MHz quadrature direct digital synthesizer/mixer in 0.25-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 38, no. 6, pp. 875-887, 2003.

[32] K. Doris, A. van Roermund, and D. Leenaerts, Wide-Bandwidth High Dynamic Range D/A Converters. Springer, 2010.

[33] X. Geng, F. Dai, J. Irwin, and R. Jaeger, "An 11-bit 8.6 GHz direct digital synthesizer MMIC with 10-bit segmented sine-weighted DAC," IEEE Journal of Solid-State Circuits, vol. 45, no. 2, pp. 300-313, Feb. 2010.

[34] S. Turner and D. Kotecki, "Direct digital synthesizer with sine-weighted DAC at 32-GHz clock frequency in InP DHBT technology," IEEE Journal of Solid-State Circuits, vol. 41, no. 10, pp.
2284-2290, Oct. 2006.

[35] A. Gutierrez-Aitken, J. Matsui, E. Kaneshiro, B. Oyama, D. Sawdai, A. Oki, and D. Streit, "Ultrahigh-speed direct digital synthesizer using InP DHBT technology," IEEE Journal of Solid-State Circuits, vol. 37, no. 9, pp. 1115-1119, Sep. 2002.

[36] X. Yu, F. F. Dai, J. D. Irwin, and R. Jaeger, "A 9-bit quadrature direct digital synthesizer implemented in 0.18-µm SiGe BiCMOS technology," IEEE Transactions on Microwave Theory and Techniques, vol. 56, no. 5, pp. 1257-1266, May 2008.

[37] S. Pellerano, S. Levantino, C. Samori, and A. Lacaita, "A 13.5-mW 5-GHz frequency synthesizer with dynamic-logic frequency divider," IEEE Journal of Solid-State Circuits, vol. 39, no. 2, pp. 378-383, 2004.

[38] R. H. A. W. Kovalick, "Waveform synthesis using multiplexed parallel synthesizers," U.S. Patent 4,454,486, June 1984.

[39] B.-G. Goldberg, "Digital frequency synthesizer having multiple processing paths," U.S. Patent 4,958,310, Nov. 1990.

[40] P. A. D. B. L. Tise, "Multiplexed chirp waveform synthesizer," U.S. Patent 6,614,813, September 2003.

[41] S. Turner and D. Kotecki, "Direct digital synthesizer with ROM-less architecture at 13-GHz clock frequency in InP DHBT technology," IEEE Microwave and Wireless Components Letters, vol. 16, no. 5, pp. 296-298, May 2006.

[42] X. Geng, F. Dai, J. Irwin, and R. Jaeger, "24-bit 5.0 GHz direct digital synthesizer RFIC with direct digital modulations in 0.13 µm SiGe BiCMOS technology," IEEE Journal of Solid-State Circuits, vol. 45, no. 5, pp. 944-954, May 2010.

[43] e2v, "Low power 12-bit 3 GSps DAC with 4/2:1 MUX," October 2011.

[44] G. J. Starr, J. Qin, B. F. Dutton, C. E. Stroud, F. F. Dai, and V. P. Nelson, "Automated generation of built-in self-test and measurement circuitry for mixed-signal circuits and systems," in Proc. 24th IEEE Int. Symp. Defect and Fault Tolerance in VLSI Systems (DFT '09), 2009, pp. 11-19.

[45] "Scientific computing tools for Python - NumPy," Apr. 2012.

[46] "Welcome to Mako," Apr.
2012. [Online]. Available: http://www.makotemplates.org/
[47] D. De Caro, N. Petra, and A. G. M. Strollo, "Reducing lookup-table size in direct digital frequency synthesizers using optimized multipartite table method," IEEE Transactions on Circuits and Systems I, vol. 55, no. 7, pp. 2116–2127, Aug. 2008.
[48] M. J. Schulte and J. E. Stine, "Approximating elementary functions with symmetric bipartite tables," IEEE Transactions on Computers, p. 842, 1999.
[49] J. W. Eaton, D. Bateman, and S. Hauberg, GNU Octave Manual Version 3. Network Theory Limited, 2008.
[50] S. Axler, Linear Algebra Done Right, 2nd ed. Springer, 1997.
[51] R. M. Gray and T. G. Stockham, Jr., "Dithered quantizers," IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 805–812, 1993.
[52] T. E. C. III, G. M. Flewelling, D. S. Jansen, J. D. Cali, D. A. Chan, J. Freedman, M. Anthony, T. Dresser, F. Dai, and E. Gebarra, "Self-Healing in SiGe BiCMOS ICs for Low-SWAP Electronic Warfare Receivers," in 38th Annual GOMACTech Conference, March 11–14, 2013.
[53] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Transactions on Electronic Computers, no. 3, pp. 330–334, 1959.
[54] J. S. Walther, "A unified algorithm for elementary functions," in Proc. Spring Joint Computer Conf., 1971, pp. 379–385.
[55] J.-M. Muller, Elementary Functions: Algorithms and Implementation, 2nd ed. Birkhäuser, 2006.
[56] H. Samueli, "The design of multiplierless FIR filters for compensating D/A converter frequency response distortion," IEEE Transactions on Circuits and Systems, vol. 35, no. 8, pp. 1064–1066, 1988.
[57] G. A. M. Van der Plas, J. Vandenbussche, W. Sansen, M. S. J. Steyaert, and G. G. E. Gielen, "A 14-bit intrinsic accuracy Q2 random walk CMOS DAC," IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1708–1718, 1999.
[58] Analog Devices, "AD9737A: RF Digital-to-Analog Converters," 2012.
[59] S. K. Mitra, Digital Signal Processing: A Computer-Based Approach, 3rd ed. McGraw-Hill, 2006.
[60] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[61] B. Widrow and I. Kollár, Quantization Noise. Cambridge University Press, 2008.
[62] V. I. Bogachev, Measure Theory: Volume 1. Springer, 2007.
[63] D. Duttweiler and D. Messerschmitt, "Analysis of digitally generated sinusoids with application to A/D and D/A converter testing," IEEE Transactions on Communications, vol. 26, no. 5, pp. 669–675, 1978.
[64] K. Doris, J. Briaire, D. Leenaerts, M. Vertregt, and A. van Roermund, "A 12b 500 MS/s DAC with >70 dB SFDR up to 120 MHz in 0.18 µm CMOS," in Digest of Technical Papers, 2005 IEEE Int. Solid-State Circuits Conf. (ISSCC), 2005, pp. 116–588.
[65] IEEE Standard 746-1984: Performance Measurements of A/D and D/A Conversion Techniques and Their Applications, IEEE Std., 1984.
[66] B. Razavi, Principles of Data Conversion System Design, J. B. Anderson, Ed. New York: Wiley-IEEE Press, 1995.
[67] S. Luschas and H.-S. Lee, "Output impedance requirements for DACs," in Proc. Int. Symp. Circuits and Systems (ISCAS '03), vol. 1, 2003.
[68] G. I. Radulov, M. Heydenreich, R. W. van der Hofstad, J. A. Hegt, and A. H. M. van Roermund, "Brownian-bridge-based statistical analysis of the DAC INL caused by current mismatch," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54, no. 2, pp. 146–150, 2007.
[69] N. C.-C. Lu, L. Gerzberg, C.-Y. Lu, and J. D. Meindl, "Modeling and optimization of monolithic polycrystalline silicon resistors," IEEE Transactions on Electron Devices, vol. 28, no. 7, pp. 818–830, 1981.
[70] W.-H. Tseng, C.-W. Fan, and J.-T. Wu, "A 12-bit 1.25-GS/s DAC in 90 nm CMOS with >70 dB SFDR up to 500 MHz," IEEE Journal of Solid-State Circuits, vol. 46, pp. 2845–2856, 2011.
[71] A. Van den Bosch, M. Steyaert, and W. Sansen, "SFDR-bandwidth limitations for high speed high resolution current steering CMOS D/A converters," in Electronics, Circuits and Systems, 1999.
Proceedings of ICECS '99, The 6th IEEE International Conference on, vol. 3, 1999, pp. 1193–1196.
[72] C.-H. Lin, F. M. I. van der Goes, J. R. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci, X. Liu, and K. Bult, "A 12 bit 2.9 GS/s DAC with IM3 78 dBc, IM3