Digital Phase Accumulation for Direct Digital Frequency Synthesis

by

Joseph Dominic Cali

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 5, 2013

Keywords: DDS, DDFS, DCDO, DAC, Phase Truncation Errors, CORDIC

Copyright 2013 by Joseph Dominic Cali

Approved by
Fa Dai, Chair, Professor of Electrical and Computer Engineering
Richard Jaeger, Ginn Distinguished Professor of Electrical and Computer Engineering
Robert Dean, Associate Professor of Electrical and Computer Engineering
Stanley Reeves, Professor of Electrical and Computer Engineering

Abstract

This work explores direct digital frequency synthesis (DDFS) theory and design and its application in radar systems. Though there is nothing particularly novel about DDFS in general, recent designs have been revolutionized by advances in CMOS processes and SiGe BiCMOS integration from 2000 to the present day. Many of the performance limitations highlighted in early literature, such as the area and power of the sinusoidal read-only memory (ROM), no longer apply to designs in modern integrated circuit (IC) processes. The digitally-controlled digital oscillator (DCDO) of the DDFS can now produce signals with spectral purity far beyond the capabilities of the digital-to-analog converter (DAC). CMOS miniaturization allows high dynamic range sinusoids to be generated with CORDICs instead of lossy compressed sine and cosine ROMs. Parallelization in the accumulator and modulation paths eliminates the need for power-hungry, current-mode logic (CML) pipeline accumulators. Noise shaping is better understood than at any point prior to this moment, which allows us to mitigate the quantization noise that arises from phase or amplitude truncation. However, alarmingly few DDFS designs published in the past five years have taken note of the radical shift in the design landscape.
Of equal importance are the new challenges that have arisen in small feature size geometries. In a way, this document is an attempt to consolidate the state of the art in DDFS design and to propose improvements from the study. To this end, the dissertation is organized into two distinct sections, the DCDO and the DAC. Digital phase accumulation and sinusoid generation are approached from number theory and real analysis, respectively. An exact computation of the spurs generated through phase truncation is developed that results in closed-form expressions for the DCDO spectrum. Current switches and architectures for improved DAC performance are presented qualitatively.

Acknowledgments

Journeying down the path of higher education can rarely be attributed to the will power or foresight of the individual in pursuit. In recent years, I have appreciated the support of the faculty and staff of Auburn University, who have guided me through a challenging five years of graduate school. In addition, I benefitted from the assistance of my fellow graduate students, with whom all my designs have interfaced in some manner. I acknowledge my major advisor, Dr. Fa Dai, for taking me on as a graduate student and funding eight integrated circuit designs through my stint as a graduate student. I also must mention the members of my committee, Dr. Dean, Dr. Jaeger and Dr. Reeves, for their specialized assistance through many challenging design problems. I cannot fail to mention Dr. Niu, as his passionate and skilled teaching of semiconductor physics from his deep knowledge of the subject has proven helpful dozens of times on the job in my short time in the workforce. There are countless teachers who, from kindergarten through my undergraduate degree at Louisiana State University (LSU), have devoted their energy and time to teaching me and putting up with my relentless questions with regard to the "hows" and the "whys" of this world.
Without the prodding of my professors at LSU, I may have never considered an advanced degree. Above these teachers stand the two greatest teachers in my life, my mother and father, who have patiently raised me and provided emotional and financial support throughout my academic journey. They sacrificed many conveniences for me to attend a private school in preparation for college. Lastly, I must thank my wife, Alison, for supporting me through the endless nights of class work, the many weekends of research, my late night existential crises (now why am I in graduate school again?), and tough medical challenges. She has certainly done more to shape the outcome of this work than any other person in my life.

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables
List of Theorems
List of Abbreviations
1 Introduction to Phase Accumulators
1.1 Explanation of Notation
1.1.1 Number Theory Axioms and Notation
1.1.2 Binary Arithmetic
1.2 Overview of Direct Digital Frequency Synthesis
1.3 Advantages of DDFS
1.3.1 Digital Phase Modulation
1.3.2 Digital Frequency Modulation
1.3.3 Digital Amplitude Modulation
1.3.4 Fine Frequency Resolution and Fast Switching
1.4 Summary of Contributions and Chapter Breakdown
2 Background of Phase Truncation Analysis
2.1 Mehrgardt's Analysis (1983)
2.2 Nicholas's Analysis (1985)
2.3 Jenq's Analysis (1988)
2.3.1 Jenq's Observation
2.3.2 Jenq's Results
2.4 Torosyan's Analysis (2001)
3 Phase Accumulator Sequences from Number Theory
3.1 Phase Accumulator Sequence
3.2 Phase Accumulator Period
3.3 Truncated Phase Sequences
3.4 Relationships Between Sequences
3.5 Comments on Mathematical Structure
4 Spectrum of Truncated Phase Sequences
4.1 Intuitive Understanding
4.2 Characteristics of Truncated Phase Sequences
4.3 Spectrum in the Presence of Phase Truncation
4.4 Interpreting Results
4.4.1 Ideal SCMF Example
4.5 Numerical Verification of Theory
4.6 SFDR and SNR in the Presence of Phase Truncation
4.6.1 SFDR
4.6.2 Worst Case SFDR
4.6.3 Spur Locations
4.6.4 SNR
4.7 Architecture Changes for Improved Spurious Response
4.7.1 Force Coprime FCWs
4.7.2 Phase Accumulator with Prime Number of States
5 Parallelization of Phase Accumulator
5.1 Pipelined Accumulator
5.2 Parallel Accumulator
5.2.1 Prior Art
5.2.2 Derivation of LFM Enabled Architecture
5.2.3 Area and Power Growth Analysis
5.2.4 Hardware Implementation
5.3 Multiplexer Upconversion Analysis
5.4 Behavioral HDL Synthesis
5.4.1 Problems with Existing Techniques
5.4.2 A Simple Example
5.4.3 EDA Scripts
5.4.4 Optimization
6 Radar Application
6.1 Previous DDFS Designs
6.1.1 Sine Wave Symmetry
6.1.2 MTM DDFS
6.1.3 BTM DDFS
6.1.4 Output Response Analyzer
6.2 Overview of Basic Radar Theory
6.3 Overview of Stretch Processing
6.3.1 Single Chip Radar
6.4 CORDIC
6.4.1 Basic Theory
6.4.2 Conventional CORDIC
6.4.3 Optimizing the CORDIC Algorithm for DDFS
6.4.4 Partial Dynamic Rotation CORDIC
6.5 Stretch Processing DDFS Architecture
6.5.1 Inverse Sinc Filter
6.5.2 Radar Controller
6.6 Design of 12-bit CMOS DAC
6.7 Measurements
7 Digital-To-Analog Converters (DAC)
7.1 Basic Sampling Theory
7.2 DAC Fundamentals
7.3 DAC Performance Metrics
7.3.1 Static DAC Performance
7.3.2 INL
7.3.3 DAC Models
7.4 Dynamic DAC Performance
7.5 DAC Architectures
7.5.1 R-2R DACs
7.5.2 Thermometer Coded and Segmented DACs
7.5.3 Return-to-Zero (RTZ)
7.5.4 Translinear Output Buffers and Non-Linear DACs
7.6 Current Steering Cell Architectures
8 Conclusions
Bibliography

List of Figures

1.1 Basic DDFS Block Diagram
1.2 Gate Logic for One's Complement
1.3 Phase Accumulator State Plots (Circle)
1.4 BPSK Waveforms
1.5 Simple Chirp Accumulator Diagram
1.6 10 ns Chirp Waveform
2.1 Sawtooth Approximation
2.2 Error Sequence Waveform Components
3.1 Phase Accumulator State Plots
4.1 Spectrums from Two Adjacent FCWs
4.2 Simple Estimates for Worst Case SFDR due to Phase Truncation
4.3 Window Function from Example
4.4 Window Function from Example
4.5 Numerical Validation
4.6 Numerical Validation
4.7 SFDR Function (Magnitude)
4.8 Forcing Coprime FCWs
4.9 Modification SFDR Improvement
4.10 Forcing Coprime FCWs (Modification)
4.11 Mersenne Prime (17) Spectrum
5.1 Phase Accumulator with LFM
5.2 Block Diagram of Pipeline Accumulator
5.3 Block Diagram of Pipeline Accumulator with LFM
5.4 [1] Architecture
5.5 FSM Chirp-Enabled DDFS with Parallel Processing Path
5.6 Finite State Machine for Parallel Processing Path
5.7 Proposed DDFS Using Novel Parallel Accumulator
5.8 Frequency and Phase Predictive Step
5.9 Parallel Phase Accumulator using Predictive Step
5.10 4-to-1 Upconverting Multiplexer
5.11 CML Multiplexer
6.1 Quadrature, Quarter Sine Compression
6.2 MTM DDFS Block Diagram
6.3 MTM Block Diagram
6.4 MTM DDFS GDSII (130 nm BiCMOS)
6.5 BTM DDFS Block Diagram
6.6 Phase Accumulator State Plots
6.7 BTM ROM Block Diagram
6.8 BTM, CORDIC, ORA and DACs (130 nm BiCMOS)
6.9 Galois 18-Bit LFSR
6.10 Phase Accumulator State Plots
6.11 BTM Simulation Versus Prediction
6.12 Two Tone Generation
6.13 ORA Block Diagram
6.14 Example of Stretch Processing Signals
6.15 Radar-On-Chip Block Diagram
6.16 Die Photograph of RoC
6.17 CORDIC Vector Rotations
6.18 CORDIC Coverage Requirement
6.19 Conventional CORDIC Stage
6.20 arctan Small Angle
6.21 CORDIC Bit Resolution
6.22 PDR CORDIC Architecture
6.23 PDR CORDIC Stage
6.24 Block Diagram for Radar DDFS
6.25 Die Photograph of RoC (DDFS Zoomed)
6.26 Inverse Sinc FIR Filter (Block Diagram)
6.27 Block Diagram of 12-Bit CMOS DAC
6.28 DAC Current Source Sizing
6.29 Synchronization Circuit for 12-Bit CMOS DAC
6.30 Clock Tree for 12-Bit CMOS DAC
6.31 Inverse Sinc Filter
6.32 DDFS with Single Tone Output
7.1 Rectangle Function Plots
7.2 INL Curves for Thermometer-Coded DAC Models with Finite Output Impedance Current Sources
7.3 Graphical Explanation of Gain and Offset Errors
7.4 Graphical Explanation of INL and DNL
7.5 Simple Single-Ended Binary-Weighted Model
7.6 Simple Single-Ended Thermometer Model
7.7 Single-Ended Single Bit Active
7.8 INL Curves for Thermometer-Coded DAC Models with Finite Output Impedance Current Sources
7.9 Simple Differential Thermometer Model
7.10 Glitch Versus Device Size (1 µm to 10 µm)
7.11 R-2R with Binary Scaling (Emitter Network)
7.12 R-2R with Binary Attenuation (Collector Network)
7.13 Segmented R-2R Binary with Thermometer MSBs
7.14 Differential Pair
7.15 Padé Sine Approximation
7.16 Translinear Sine Implementations
7.17 Differential Translinear Cosine Implementation (Ideal Current Sources)
7.18 Quadrature Translinear DDFS
7.19 Simple Current Steering Cells
7.20 Current Steering Cells with Cascoding
7.21 Current Steering Cell with Cascode Output and Keep Alive
7.22 Current Steering Cell with Cascode, Keep Alive and RTZ

List of Tables

1.1 Built-in Barker Codes
2.1 Table of Truncated Phase States (4-bit)
4.1 List of Mersenne Primes for Phase Accumulation
5.1 Comparison of Accumulators
6.1 Table of Initial Values
6.2 Example BTM Compression
6.3 Summary of DDFS Designs
6.4 DDFS Performance Summary
7.1 Published RTZ DACs
7.2 SFDR of NRTZ DACs

List of Symbols

P Current State of Phase Accumulator
A Current Amplitude Output of DCDO
BP Number of Bits in Phase Accumulator
BA Number of Bits of Amplitude Resolution in DCDO
NP Number of States in Phase Accumulator
ΛP Least Period of Phase Accumulator Sequence
F Frequency Control Word
F̂ Reduced Frequency Control Word
ω Discrete Time Continuous Angular Frequency
NE Number of Truncation Error States
ΛE Least Period of Truncated Sequence
NQ Number of unique states in the truncated phase word
fT Unity Gain Bandwidth Product
δ(x) Dirac delta function
δT(t) Dirac comb function
Ω Continuous Time Angular Frequency
t Time
f Ordinary Frequency
VA Early Voltage of Transistor
gm Transconductance of Bipolar Transistor
VT Thermal Voltage

List of Theorems

1.1 Principle (Mathematical Induction)
1.2 Principle (Well-Ordering Principle)
1.1 Definition (Divides)
1.2 Definition (Least Common Multiple)
1.1 Theorem (Binary Number Representation)
1.1 Lemma (Dropping Modulo Operation in Sinusoids)
1.2 Lemma (Geometric Series)
1.3 Lemma (When the Complex Exponential Equals 1)
2.1 Definition (Fourier Series of Real-Valued Function)
2.1 Theorem (Nicholas Number of Spurs)
2.2 Theorem (Nicholas Spur Index)
2.3 Theorem (Nicholas Spur Magnitude)
2.4 Theorem (Nicholas Spur Phase)
2.5 Theorem (Jenq's Non-Uniform Sampling Theorem)
2.2 Definition (Parseval's Relation)
3.1 Theorem (The Division Algorithm)
3.1 Definition (Congruence)
3.2 Theorem (Phase Accumulator Sequence)
3.2 Definition (Greatest Common Divisor)
3.3 Definition (Relatively Prime)
3.1 Lemma (GCD Divisibility)
3.2 Lemma (Linear Modulo Normalization)
3.3 Theorem (Phase Accumulator Periodicity)
3.3 Lemma (Alternative Phase Accumulator Expression)
3.4 Lemma (Sum of Two Integers Modulo N)
3.4 Definition (Truncation)
3.4 Theorem (Truncated Phase Sequence)
3.5 Lemma (Least Period of the Modulo of a Modulo Sequence)
3.5 Theorem (Periodicity of Phase Truncation Error Sequence)
3.6 Theorem (Periodicity of the Difference of Two Modulo Sequences)
3.7 Theorem (Truncated Phase Sequence Period)
3.6 Lemma (GCD and Linear Diophantine Equations)
3.8 Theorem (Multiplicative Inverse in Modulo Arithmetic)
3.9 Theorem (FCW Time Sequence Permutation Relationship)
3.5 Definition (Groups)
4.1 Definition (Taylor Series)
4.1 Theorem (Delta Phase Steps)
4.2 Definition (Kronecker Delta Function)
4.2 Theorem (Sub-Sequences of a Finite Sequence)
4.3 Theorem (Interchanging Summations for Finite Sequences)
4.4 Theorem (Adjacent Truncated Phase Elements)
4.5 Theorem (When Truncated Values Repeat)
4.1 Lemma (Special Sub-Sequence Arrangement for Periodic Sequences)
4.3 Definition (Discrete Fourier Transform)
4.4 Definition (Inverse Discrete Fourier Transform)
4.6 Theorem (Spectrum of Truncated Phase Sequence)
4.7 Theorem (DCDO Spectrum with Phase Truncation and Arbitrary ROM)
4.8 Theorem (FCW Frequency Sequence Permutation Relationship)
4.9 Theorem (Number of Phase Accumulator Least Periods)
4.2 Lemma (DFT Periodicity)
4.3 Lemma (Window Function Periodicity)
4.4 Lemma (Period of Amplitude Spectrum with Phase Truncation)
6.1 Definition (Convergent Series (Real))
6.1 Theorem (Cauchy Convergence Criterion (Real))
6.1 Lemma (Sequences for Convergent Series)
6.2 Theorem (CORDIC Convergence Theorem)
6.2 Definition (Conventional CORDIC Iteration)
7.1 Definition (Dirac Delta)
7.1 Theorem (Nyquist-Shannon Sampling Theorem)
7.2 Definition (Convolution)
7.2 Theorem (Fourier Convolution Theorem)

List of Abbreviations

BIST Built-In Self-Test
BPSK Binary Phase Shift Keying
BTM Bipartite Table Method
CDMA Code Division Multiple Access
CML Current Mode Logic
CORDIC COordinate Rotation DIgital Computer
CS Current Steering
CW Continuous Wave
DAC Digital-to-Analog Converter
DDFS Direct Digital Frequency Synthesis
DEM Dynamic Element Matching
DFF D-Flip-Flop
DFT Discrete Fourier Transform
DNL Differential Non-Linearity
DSP Digital Signal Processing
ECL Emitter Coupled Logic
ENOB Effective Number of Bits
FCW Frequency Control Word
FIR Finite Impulse Response
GCD Greatest Common Divisor
IDE Integrated Development Environment
INL Integral Non-Linearity
KCL Kirchhoff Current Law
KVL Kirchhoff Voltage Law
LFM Linear Frequency Modulation
LSB Least Significant Bit
MSB Most Significant Bit
MTM Multipartite Table Method
NRTZ Non-Return-to-Zero
ORA Output Response Analyzer
PLL Phase-Locked Loop
PM Phase Modulation
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase Shift Keying
RAM Random Access Memory
ROM Read-Only Memory
RTZ Return-to-Zero
SCMF Sine or Cosine Mapping Function
SFDR Spurious Free Dynamic Range
SINAD Signal to Noise Ratio and Distortion
SNDR Signal to Noise and Distortion Ratio
SNR Signal to Noise Ratio
SCL Source Coupled Logic
SPI Serial Peripheral Interface
SSM Static Mismatch Shaping
SSPA Switching Sequence Post Adjustment
THD Total Harmonic Distortion
TSPC True Single Phase Clock
WLAN Wireless Local Area Network

Chapter 1
Introduction to Phase Accumulators

In this chapter, Direct Digital Frequency Synthesis (DDFS) is introduced as an important component in modern 21st century communication systems, and its fundamental operating principles are presented. Wireless cellular communication techniques such as code division multiple access (CDMA) and spread spectrum wireless local area networks (WLAN) [2] require fast frequency switching, an attribute in which DDFS excels over conventional analog frequency synthesis approaches. As integrated circuit processes advance, DDFS is also emerging as a critical component in commercial radar systems, agile clock synthesizers [3] and high speed testing equipment [4], opening up new opportunities in industries outside of telecommunications, the automotive industry being one of the more exciting [5],[6]. In DDFS systems, the amplitude, frequency, and phase of synthesized waveforms can be modulated digitally and nearly instantaneously; depending upon the operating frequency of the technology and the level of pipelining in the digital core, this can mean a latency of less than a few nanoseconds. By contrast, the lock time of a standard analog phase-locked loop (PLL) can be on the order of several hundred microseconds as a result of the slow settling time of the loop filter [7]. The ability to directly modulate the signal also allows for arbitrary, high-bandwidth waveform synthesis, varying from simple phase-shift keying used in low cost data transmission systems to complex non-linear frequency sweeps used in radar systems [8]. One of the better published results for an arbitrary waveform generator is presented by Van de Sande et al. [4] in the year of this writing, indicating that research in the field remains active.
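The frequency agility described above can be made concrete with a toy model: the output frequency is set by a digital word, so changing that word retunes the synthesizer on the very next clock edge. The following is a minimal illustrative sketch (the function name, bit widths, and FCW values are this example's own choices, not taken from the text):

```python
import math

def ddfs_output(fcw_stream, bp=8, ba=8):
    """Toy DDFS model: a bp-bit overflowing phase accumulator drives an
    ideal sine mapping quantized to ba bits. Changing the frequency
    control word (FCW) between clock cycles retunes the output with no
    settling time, unlike a PLL whose loop filter must re-lock."""
    n_states = 1 << bp              # 2**bp accumulator states
    phase = 0
    samples = []
    for fcw in fcw_stream:
        phase = (phase + fcw) % n_states        # overflow is free, modulo 2**bp
        amplitude = math.sin(2 * math.pi * phase / n_states)
        samples.append(round(amplitude * ((1 << (ba - 1)) - 1)))
    return samples

# Double the synthesized frequency mid-stream simply by changing the FCW:
out = ddfs_output([8] * 32 + [16] * 32)
```

With an 8-bit accumulator clocked at fs, an FCW of 8 synthesizes fs·8/256; switching the word to 16 doubles the output frequency with no settling transient.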
The DDFS operates not by digitally controlling an analog oscillator component but by numerically computing a complex digital signal and directly converting it to a physical electrical quantity through a digital-to-analog converter (DAC).

[Figure 1.1: Basic DDFS Block Diagram — phase accumulator (BP-bit phase register and adder), sine/cosine mapping (BA bits), and DAC producing y(t)]

The phase, frequency, and amplitude of a DDFS system are themselves digital codes that can be modulated in the digital domain through simple multiplication and addition operations. These operations, excluding quantization, are completely linear and thus superior to the equivalent analog operations, which apply unwanted harmonic distortion and spurious mixing to the signal. The earliest implementation of a circuit in an academic publication that resembles the modern DDFS appears in 1971 by Joseph Tierney et al. [9]. Figure 1.1 shows the architecture of the DDFS proposed in [9], excluding the quadrature sine and cosine outputs and the analog reconstruction filter after the DAC, which is the basic architecture of modern DDFS devices. The focus of this dissertation is explaining how to generate spectrally pure sinusoids with such a device in an efficient manner by gathering published results for the various components comprising the device and inserting mathematical explanations when necessary or helpful. Care is taken in an attempt to place the analysis of DDFS systems on a clear mathematical foundation and perhaps to illuminate some of the more difficult concepts such as the rise of spurs through phase truncation. For readers not familiar with the terminology, a spur is unwanted coherent spectral energy, harmonically related to the intended synthesized signal or otherwise. This implies that, were one to attach a spectrum analyzer to capture the generated waveform, one would see a distinct tone that did not decrease in power when averaged over time, hence the use of coherency in the definition.
In the digital domain, one would find that increasing the length of the Discrete Fourier Transform also had no influence over the magnitude or phase of the unwanted tones. The fundamental components of the DDFS are a clock receiver and distribution tree, an overflowing accumulator, a sine and/or cosine mapping function (SCMF), one or more DACs, and a reconstruction filter at the DAC(s) output. The overflowing accumulator is often called the phase accumulator, as its cyclic overflowing is analogous to the phase of a sinusoid. The accumulator is incremented by a value known as the frequency control word (FCW). The reconstruction filter is not studied in detail, but several DAC clocking methodologies that reduce the stringent requirements of the filter are presented during the DAC architecture survey (Section 7.5). The term SCMF is used instead of the more common read-only memory (ROM) or "lookup table" (LUT). The terminology is borrowed from Torosyan's dissertation and publications [10] and is general enough to encompass the wide range of techniques available for sinusoidal phase to amplitude conversion. The name choice also separates the functional behavior of the component from its realization on silicon. The cost of digital memory has become so remarkably inexpensive, both in area and power, that some designs use random access memory (RAM) as opposed to a ROM to implement the SCMF function. Bipartite and multipartite table methods, BTM [11] and MTM [12] respectively, and the COordinate Rotation DIgital Computer (CORDIC) [13] are implemented and studied in this work as effective techniques for implementing the SCMF.

1.1 Explanation of Notation

The conventions used in this document are described in this section for reference. This is particularly important in mixed-signal systems such as a DDFS, as many of the analyses of the behavior of the device transition between discrete-time and continuous-time representations of the signal.
The same conceptual entity crosses several processing domains. In order to clearly denote when a digital variable, or some non-digitized discrete-time sequence, is intended, an upper case English letter glyph is used. For instance, P represents the phase state of the phase accumulator and A represents the amplitude output of the SCMF. An immense effort was put into the writing, attempting:

- To provide consistency in notation. The author wants to avoid incessant flipping between this section and subsequent sections.
- To avoid collisions with important variables in the literature. For instance, repeatedly using a variable Q in a text about passive filters in a manner unrelated to the quality factor of an inductor or energy storage tank can be confusing.

The nth element in the sequence P is denoted with square brackets, P[n] being the state of the phase accumulator at clock cycle n. Parentheses are used to denote continuous functions, where y = x(t) is the value of the function x corresponding to the argument t. The phase accumulator at time $nT_s$ is given as $P(nT_s)$, where $T_s$ is the period of the clock driving the DDFS. Some mathematics texts [14] use a more general and formal notation to represent the same function concept, $x : t \mapsto y$, where $t \in S_T$, $y \in S_Y$, and $S_T$ and $S_Y$ are sets. The notation $x \in S$ means that the element x is contained in the set S. There are some commonly used sets in mathematics that appear frequently in DDFS analysis. Instead of repeatedly listing the elements that form each set, a list of all sets used in this document is presented below (many used by [15] in his Modern Algebra text):

$\emptyset$ = The empty set (i.e. the set containing no elements)  (1.1)
$\mathbb{B}$ = The set containing only 0 and 1  (1.2)
$\mathbb{P}$ = The set of all positive integers, also known as the natural numbers  (1.3)
$\mathbb{P}_0$ = The set of all positive integers including zero  (1.4)
$\mathbb{Z}$ = The set of all integers  (1.5)
$\mathbb{Z}_n$ = The set $\{x \in \mathbb{Z} : 0 \le x < n\}$  (1.6)

To show that every positive integer can be represented by an unsigned binary number, let $S$ be the set of positive integers that can be so represented and proceed by induction. For the base case, 1 can be represented by setting $b_0 = 1$ and $b_i = 0$ for $i > 0$:

$x = \sum_{i=0}^{\infty} 2^i b_i = 2^0 b_0 = 1, \quad b_i = 0,\ i > 0$  (1.11)

Thus $1 \in S$.
Now we take the induction step. Assume that $x \in S$ and every positive integer from 1 to $x$ can be represented by a binary number. We must now show that $x + 1 \in S$. We can do this by finding a binary representation of $x + 1$, or equivalently, finding $c_i$ such that

$x + 1 = \sum_{i=0}^{\infty} 2^i c_i$  (1.12)

Let us briefly consider some of the properties of $x$. By our induction step, we know that $x$ can be written as an unsigned binary number:

$x = \sum_{i=0}^{\infty} 2^i b_i = 2^0 b_0 + 2^1 b_1 + \cdots = 2^0 b_0 + 2\left(b_1 + 2^1 b_2 + \cdots\right)$  (1.13)

Clearly, the second term is a multiple of 2 and is therefore an even number by definition. If $b_0 = 1$, then the first term evaluates to 1, and adding 1 to an even number yields an odd number. So for $x$ to be even, $b_0 = 0$; otherwise $x$ is odd. Now let us tackle the case of $x + 1$ assuming $x$ is even.

$x + 1 = \sum_{i=0}^{\infty} 2^i c_i = \sum_{i=0}^{\infty} 2^i b_i + 1$
$2^0 c_0 + 2^1 c_1 + \cdots = \left(2^0 b_0 + 2^1 b_1 + \cdots\right) + 1$  (1.14)

Since $x$ is assumed even, $b_0 = 0$. Applying this knowledge and rearranging, we get

$2^0 c_0 - 1 = \left(2^1 b_1 + 2^2 b_2 + \cdots\right) - \left(2^1 c_1 + 2^2 c_2 + \cdots\right)$  (1.15)

Setting $c_0 = 1$ and setting $b_i = c_i$ for $i \in \{1, 2, \ldots\}$ sets both the left and right hand sides of the equation to zero, and the equality holds. Thus $x + 1 \in S$ whenever $x$ is even. Now let us consider the case when $x$ is odd. If $x$ is odd, then $x + 1$ is even. Since $x + 1$ is even, $2 \mid (x + 1)$ by definition, and there exists a number $d$ such that $2d = (x + 1)$; in this case $d$ is a positive integer, since $x + 1$ is an even positive integer. If we can show that $d$ can be written as an unsigned binary number, then $x + 1$ can be written as an unsigned binary number. We can show this by proving that $d \le x$ and thus, by our induction hypothesis (every positive integer from 1 to $x$ can be represented by a binary number), $d \in S$:

$d \le x \iff$
(x+ 1)2 ?x?(x+ 1)?2x (1.16) Since the least integer in our set S is 1, it is clear that the previous inequality holds (if this is not satisfactory, then apply induction to the inequality). Since d?S, we can now find the binary representation of x+ 1. (x+ 1) = 2d ?summationdisplay n=0 2ici = 2 parenleftBigg?summationdisplay n=0 2ibi parenrightBigg parenleftBig 20c0 + 21c1 + 22c2 +??? parenrightBig = parenleftBig 21b0 + 22b1 +??? parenrightBig parenleftBig 21c1 + 22c2 +??? parenrightBig = parenleftBig 21b0 + 22b1 +??? parenrightBig (1.17) Since x+ 1 is even, c0 = 0. It is clear from Equation 1.17 that setting ci = bi?1 for all i> 0 causes the equality to hold. Therefore x + 1 ?S whenever x is odd. Since the induction step holds for all x + 1, the set S = P and we have shown that all positive integers can be represented by an unsigned binary number. 10 In two?s complement representation, the value of B is vB =?2N?1bN?1 + N?2summationdisplay i=0 2ibi. (1.18) Theminimumandmaximumvaluesofthetwo?scomplementrepresentationcanbecomputed similarly to the unsigned binary case. Using Equation 1.18, the maximum value is obtained by setting bN?1 = 0 and bN?2 down to b0 to 1. The minimum value is obtained by setting bN?1 = 1 and bN?2 down to b0 to 0. max{vB}= N?2summationdisplay i=0 2i = 2N?1?1 (1.19) min{vB}=?2N?1 (1.20) The conversion of an unsigned full-scale binary number to the two?s complement number system such that the zero from the unsigned representation maps to the lowest two?s com- plement value and the maximum valued in unsigned representation maps to the maximum two?s complement value involves only inverting the most significant bit (MSB) of the un- signed number. A technique used commonly in DDFS designs to approximate the negation of the value of an integer is to take a one?s complement of the two?s complement binary representation of the number. In one?s complement, all the bits of B are inverted. 
This operation is popular because of its efficient hardware implementation, as only $N$ XOR gates are required for the inversion of an $N$-bit word. The architectures presented in Chapter 6 utilize this technique in the sinusoidal compression algorithm. Figure 1.2 is a gate level block diagram of a conditional one's complement operation. The bit $a$ inverts all the $b_i$ bits when asserted high but does not affect the value of $b_i$ when asserted low.

[Figure 1.2: Gate Logic for One's Complement]

One must carefully evaluate the approximation of negation using one's complement in the system. So consider the effect of one's complement on a word $B$ in a two's complement binary system. The resulting one's complement value $v_{B1}$ is given in Equation 1.21.

$v_{B1} = -2^{N-1} \overline{b}_{N-1} + \sum_{i=0}^{N-2} 2^i \overline{b}_i$  (1.21)

where $\overline{b}_i$ is the complement of $b_i$, meaning that if $b_i = 0$ then $\overline{b}_i = 1$, and if $b_i = 1$ then $\overline{b}_i = 0$. From this one can see that the one's complement does not negate $v_B$, which is to say $-v_B \ne v_{B1}$.

$v_{B1} + v_B = -2^{N-1}\left(b_{N-1} + \overline{b}_{N-1}\right) + \sum_{i=0}^{N-2} 2^i \left(b_i + \overline{b}_i\right) = -2^{N-1} + \sum_{i=0}^{N-2} 2^i = -1$

since $b_i + \overline{b}_i = 1$ by the definition of a complement in a binary number system. From the previous equation, we see that $v_{B1} = -v_B - 1$.

1.2 Overview of Direct Digital Frequency Synthesis

The DDFS of Figure 1.1 operates by incrementing an accumulator at the clock frequency $f_{clk}$ by the value $F$, where $F \in \mathbb{Z}_{N_P}$ and $\mathbb{Z}_{N_P} = \{0, 1, 2, \ldots, N_P - 1\}$ is the set of all integers between 0 and $N_P - 1$ inclusive. $F$ is generally constrained between zero and the maximum value of the phase accumulator, though there are exceptions to this rule when reducing the area overhead of the adder and control logic by a minuscule margin is critical [13]. $F$ represents the FCW and will be used as its symbol in mathematical notation.
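A behavioral sketch of this overflowing accumulation follows; the word width and FCW are illustrative choices, not values taken from the text.

```python
BP = 8            # accumulator width in bits (illustrative)
NP = 1 << BP      # number of phase states
F = 48            # frequency control word (illustrative)

def phase_states(fcw, n_cycles, p0=0, n_states=NP):
    """Return the first n_cycles states of the overflowing phase accumulator."""
    p, states = p0, []
    for _ in range(n_cycles):
        states.append(p)
        p = (p + fcw) % n_states   # the natural adder overflow implements the modulo
    return states
```

With F = 48 the accumulator wraps roughly every NP/F ≈ 5.3 cycles; as described next, this overflow rate is what sets the synthesized frequency.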
The accumulator has a finite bit resolution $B_P$, and therefore the accumulator will overflow periodically as $F$ is continually added to the previous accumulator state. The rate of overflow of the phase accumulator is thus dependent on $F$. $F$ thereby controls the frequency of the synthesized DDFS waveform, suggesting that both the phase accumulator and FCW are aptly named. The periodic overflowing of the accumulator is a remarkably efficient technique for implementing the periodic phase behavior of a sinusoid. The phase accumulator maps $[0, N_P)$ to $[0, 2\pi)$ when driving a lookup table that maps $P \in [0, N_P)$ to $\sin(2\pi P / N_P)$. Referring to Equation 1.10, the maximum integer value that can be stored in the phase accumulator, when treating the phase accumulator value as an unsigned integer, is $2^{B_P} - 1$, so the accumulator natively provides $N_P = 2^{B_P}$ phase states. With additional hardware, the number of phase states can be set to any positive integer no greater than $2^{B_P}$. One of the new achievements of this work is finding closed form equations for the spectrum of a DCDO for any $N_P$. The relationship between the current integer phase state of the accumulator, $P$, and the analogous normalized phase value, $\phi$, is then

$\phi[n] = \frac{2\pi}{N_P} P[n]$  (1.22)

where $\phi$ is in radians. This maps the phase states uniformly across $[0, 2\pi)$ in $N_P$ steps. Figure 1.3a demonstrates the mapping between the phase states of a 3-bit accumulator and the corresponding phase in radians. But this particular mapping need not be the case. The phase accumulator could, in fact, map to any closed interval in $\mathbb{R}$. This feature will prove useful in implementing sinusoidal quarter-wave compression in Section 6.1.1. Figure 1.3b shows a 3-bit phase accumulator with a 1/2 least significant bit (LSB) offset, in which case $P \in [0, N_P)$ maps to $[\pi/8, 2\pi + \pi/8)$.
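The two 3-bit mappings just described can be tabulated directly. This is a small sketch; the half-LSB offset is modeled here as replacing P with P + 1/2 in Equation 1.22, which reproduces the stated [π/8, 2π + π/8) interval.

```python
import math

BP = 3
NP = 1 << BP   # 8 phase states, as in Figure 1.3

# Figure 1.3a: the uniform mapping of Equation 1.22
plain = [2 * math.pi * p / NP for p in range(NP)]

# Figure 1.3b: the same mapping with a 1/2 LSB offset
offset = [2 * math.pi * (p + 0.5) / NP for p in range(NP)]
```

In both lists the step between adjacent states is the same 2π/8 = π/4; only the starting point moves.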
Note that in both figures the phase step remains uniform.

[Figure 1.3: Phase Accumulator State Plots (Circle); (a) Phase Mapping Circle, (b) Phase Mapping Circle (1/2 LSB Offset)]

The adjustment in the mapping happens through the SCMF and will be discussed in more detail in Chapter 6. The SCMF of the DDFS takes the value of $P$ (Equation 3.4) and maps it to the appropriate sine or cosine value $A$. Equation 1.23 shows the SCMF mapping for an untruncated phase accumulator word to an ideal sine function.

$A[n] = \sin\left(\frac{2\pi}{N_P} P[n]\right) = \sin\left(\frac{2\pi}{N_P} \langle nF + P_0 \rangle_{N_P}\right) = \sin\left(\frac{2\pi F}{N_P} n + \frac{2\pi}{N_P} P_0\right)$  (1.23)

The modulo $N_P$ arithmetic in the argument can be dropped, since the period of the sinusoid is $2\pi$ and implicitly executes the modulo operation. This can be demonstrated through the following simple lemma.

Lemma 1.1 (Dropping the Modulo Operation in Sinusoids). The modulo operator can be dropped within the sine and cosine functions with argument $2\pi P[n]/N_P$, where $P[n]$ is given by Equation 3.1. Equivalently,

$\sin\left(\frac{2\pi}{N_P} \langle nF + P_0 \rangle_{N_P}\right) = \sin\left(\frac{2\pi}{N_P} (nF + P_0)\right)$  (1.24)

Proof. Using the definition of the modulo operation described in Section 3.1, for arbitrary $n \in \mathbb{Z}$ and $P_0 \in \mathbb{Z}$ there exists an integer $d$ such that

$\langle nF + P_0 \rangle_{N_P} = nF + P_0 - dN_P = r$  (1.25)

where $0 \le r < N_P$.
Plugging $nF + P_0 - dN_P$ in for $P[n]$ and applying the trigonometric difference identity for sine yields:

$\sin\left(\frac{2\pi}{N_P}(nF + P_0 - dN_P)\right) = \sin\left(\frac{2\pi}{N_P}(nF + P_0)\right)\cos\left(\frac{2\pi}{N_P}(dN_P)\right) - \sin\left(\frac{2\pi}{N_P}(dN_P)\right)\cos\left(\frac{2\pi}{N_P}(nF + P_0)\right)$  (1.26)

This can be further reduced by observing that

$\cos\left(\frac{2\pi}{N_P}(dN_P)\right) = \cos(2\pi d) = 1$  (1.27)
$\sin\left(\frac{2\pi}{N_P}(dN_P)\right) = \sin(2\pi d) = 0$  (1.28)

Finally, substituting Equation 1.27 and Equation 1.28 back into Equation 1.26,

$\sin\left(\frac{2\pi}{N_P}(nF + P_0 - dN_P)\right) = \sin\left(\frac{2\pi}{N_P}(nF + P_0)\right)$  (1.29)

The spectrum of such a sinusoid (Equation 1.23) can be calculated by executing a discrete Fourier transform over one period of the waveform, which is $N_P$ in this particular case. The discrete Fourier transform is defined later in Section 4.3.

$\mathcal{F}\{A\}[k] = \sum_{n=0}^{N_P-1} \sin\left(\frac{2\pi F}{N_P} n\right) e^{-j2\pi kn/N_P}, \quad 0 \le k \le N_P - 1$
$= \sum_{n=0}^{N_P-1} \frac{1}{2j}\left[e^{j2\pi Fn/N_P} - e^{-j2\pi Fn/N_P}\right] e^{-j2\pi kn/N_P}$
$= \sum_{n=0}^{N_P-1} \frac{1}{2j}\left[e^{j2\pi n(F-k)/N_P} - e^{-j2\pi n(F+k)/N_P}\right]$  (1.30)

Euler's formula, given in Equation 1.31, was applied to the sine function to simplify the expression.

$e^{jx} = \cos(x) + j\sin(x)$  (1.31)

Euler's formula can be used to write the sine and cosine functions as the sum of two complex exponentials as follows (to verify, substitute Equation 1.31 for each of the complex exponential terms in the equations below):

$\sin(x) = \frac{1}{2j}\left[e^{jx} - e^{-jx}\right]$  (1.32)
$\cos(x) = \frac{1}{2}\left[e^{jx} + e^{-jx}\right]$  (1.33)

Equation 1.30 is the difference of two geometric series. A technique for finding the closed form solution of a geometric series is presented here.

Lemma 1.2 (Geometric Series).
A summation of the form

$\sum_{n=0}^{N-1} r^n$

is called a finite geometric series and can be written as the ratio of two numbers,

$\sum_{n=0}^{N-1} r^n = \frac{1 - r^N}{1 - r}$  (1.34)

if $r \ne 1$.

Proof. Let $r \ne 1$. The problem is solved by expanding the summation

$\sum_{n=0}^{N-1} r^n = 1 + r + r^2 + \cdots + r^{N-1}$

then multiplying both sides by $1 - r$ and using the distributive property of multiplication over addition.

$(1 - r)\sum_{n=0}^{N-1} r^n = (1 - r)\left(1 + r + r^2 + \cdots + r^{N-1}\right) = \left(1 + r + r^2 + \cdots + r^{N-1}\right) - \left(r + r^2 + r^3 + \cdots + r^N\right) = 1 - r^N$

Then dividing both sides of the previous equation by $1 - r$, which can be done since $r \ne 1$, gives

$\sum_{n=0}^{N-1} r^n = \frac{1 - r^N}{1 - r}$  (1.35)

and the proof is complete.

The summation of $e^{j2\pi an/N_P}$ for $a \in \mathbb{Z}$ over one period is zero if $N_P \nmid a$. This can be shown by directly computing the summation using Equation 1.34,

$\sum_{n=0}^{N_P-1} e^{j2\pi an/N_P} = \sum_{n=0}^{N_P-1} \left(e^{j2\pi a/N_P}\right)^n = \frac{1 - e^{j2\pi aN_P/N_P}}{1 - e^{j2\pi a/N_P}} = \frac{1 - e^{j2\pi a}}{1 - e^{j2\pi a/N_P}} = \frac{1 - 1}{1 - e^{j2\pi a/N_P}} = 0$

The summation of the geometric series works because $e^{j2\pi a/N_P} \ne 1$ when $N_P \nmid a$.

Lemma 1.3 (When the Complex Exponential Equals 1).

$e^{j2\pi a/N_P} = 1$  (1.36)

if and only if $N_P \mid a$.

Proof. The forward proof is simple, as $N_P \mid a$ means that there exists an integer $d$ such that $dN_P = a$. Substituting this value for $a$ in Equation 1.36 yields:

$e^{j2\pi dN_P/N_P} = e^{j2\pi d}.$  (1.37)

Applying Euler's formula (Equation 1.31) to Equation 1.37,

$e^{j2\pi d} = \cos(2\pi d) + j\sin(2\pi d) = 1 + j0 = 1$  (1.38)

Now one must show that if Equation 1.36 holds, then $N_P \mid a$. Applying Euler's formula to Equation 1.36 yields

$e^{j2\pi a/N_P} = \cos\left(\frac{2\pi a}{N_P}\right) + j\sin\left(\frac{2\pi a}{N_P}\right) = 1$  (1.39)

The right hand side of Equation 1.39 only equals 1 when the cosine term equals 1 and the sine term equals 0. Cosine only equals 1 when the argument is $2\pi n$ for $n \in \mathbb{Z}$.
Then

$2\pi n = \frac{2\pi a}{N_P}$  (1.40)
$a = N_P n$  (1.41)

Then $N_P \mid a$. Plugging the solved value of $a$ into the sine argument gives 0, and the equation holds. Therefore, Equation 1.36 is true if and only if $N_P \mid a$.

Finally, the value of the summation for when $F - k = 0$ must be computed:

$\sum_{n=0}^{N_P-1} \frac{1}{2j} e^{j2\pi n(F-k)/N_P} = \frac{1}{2j}\sum_{n=0}^{N_P-1} e^{j2\pi n \cdot 0/N_P}$  (1.42)
$= \frac{1}{2j}\sum_{n=0}^{N_P-1} 1 = \frac{N_P}{2j}$  (1.43)

Gathering together the results of the analysis gives the final DFT result.

$\mathcal{F}\{A\}[k] = \begin{cases} \frac{N_P}{2j} & k = F \\ -\frac{N_P}{2j} & k = -F \text{ (equivalently } k = N_P - F \text{)} \\ 0 & \text{otherwise} \end{cases}$  (1.44)

The results of the DFT indicate that all the spectral energy generated by a DDFS, assuming no phase truncation and an ideal SCMF, is located at a single frequency bin $F$, which is precisely the FCW. This leads to what should perhaps be referred to as the fundamental equation for DDFS systems. Noting the relationship between the DFT and CTFT for a uniform sampling interval $T_{clk}$, an equation for the output frequency of the DDFS can be formulated:

$f_0 = \frac{F}{N_P} f_{clk}$  (1.45)

where $f_0$ is the frequency of the fundamental tone generated by the DDFS system.

1.3 Advantages of DDFS

As is the custom for Auburn University dissertations and publications concerning DDFS devices, a brief explanation of the advantages of DDFS over traditional PLLs is supplied. The benefits ultimately reduce to the benefits of operating on signals in the digital domain rather than the analog domain:

- Straightforward, efficient implementation of complex modulation schemes.
- Direct manipulation of parameters that are often difficult to precisely control in the equivalent analog circuit.
- Digital signals are not corrupted by noise as easily as their analog domain counterparts.
- Arithmetic operations are highly linear and tolerant to device mismatch.

In addition to these, several other benefits of DDFS already mentioned earlier in the introduction are summarized.
- Very fast frequency switching, several orders of magnitude faster than that of a traditional PLL [7].
- Fine frequency resolution (see Equation 1.45).
- Broadband frequency synthesis, as the same device with a sufficiently fast input clock can synthesize signals from the order of several kilohertz to the order of several gigahertz.

1.3.1 Digital Phase Modulation

Phase modulation (PM) is the process of adding, subtracting, multiplying or otherwise affecting the output of the phase accumulator. The phase signal after phase modulation is

$P_m[n] = P[n] + M[n] \quad \text{or} \quad P_m[n] = M[n]P[n]$  (1.46)

where $P[n]$ is the phase accumulator state, $M[n]$ is the modulation sequence and $P_m[n]$ is the resulting modulated phase word. Typically this is used to efficiently generate various phase shift keying techniques such as Binary Phase Shift Keying (BPSK) and Quadrature Phase Shift Keying (QPSK). By controlling the phase of the synthesized waveform, information can be encoded in the synthesized signal for transmission over noisy mediums. Figures 1.4a and 1.4b show the transient waveforms of continuous wave (CW) signals modulated with BPSK sequences. This is not to be confused with frequency modulation, which is discussed next in Section 1.3.2.

[Figure 1.4: BPSK Waveforms; (a) BPSK for 1110010 Sequence, (b) BPSK for 1010101 Sequence]

In the radar architecture described in Chapter 6, Barker codes are hardwired into the PM circuitry for testing. Table 1.1 shows the list of Barker codes [8] built into the DDFS. These and other more complex codes were implemented in the radar system for "short-range", low-power pulse compression operating modes.

Table 1.1: Built-in Barker Codes

Number of Bits | Code
3  | 110
5  | 11101
7  | 1110010
11 | 11100010010
13 | 1111100110101

The phase is flipped by 180 degrees every $k$ clock cycles if a binary "1" is encountered and is not flipped if a "0"
is encountered. The hardware implementation is as simple as toggling the MSB of the phase accumulator, which only requires an XOR gate.

1.3.2 Digital Frequency Modulation

Linear frequency modulation (LFM), also known as chirp modulation, can be implemented by adding a frequency accumulator before the phase accumulator. Figure 1.5 shows a block diagram of the modification that should be made to the accumulator to allow for LFM.

[Figure 1.5: Simple Chirp Accumulator Diagram]

A full derivation of LFM is provided in Section 5.2.2, but in short, the frequency of the DDFS increments by $F_0$ every clock cycle. While not shown explicitly in the figure, the frequency register is typically initialized to some start frequency, and other control circuitry watches the frequency register value to issue a stop command. The chirp rate is controlled with the same precision and agility as the frequency of a traditional phase accumulator. Figure 1.6 shows a 10 ns duration chirp waveform with frequency accelerating from around 500 MHz to 5 GHz. The chirp waveform is well suited for radar applications requiring pulse compression. In Chapter 6, LFM is implemented in a stretch processing pulse compression radar. Figure 6.14a shows several conceptual chirp waveforms in the explanation of stretch processing. The fine control of the chirp rate allows the radar system to dynamically tailor its output waveform based on the distance of the target under investigation.

[Figure 1.6: 10 ns Chirp Waveform]

1.3.3 Digital Amplitude Modulation

By placing a multiplier at the output of the SCMF, the amplitude of the output can be digitally modulated.
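A minimal behavioral sketch of this multiplier-based amplitude modulation follows; the word width, FCW, and gain values are illustrative choices, not parameters from the text.

```python
import math

BP = 8          # accumulator width (illustrative)
NP = 1 << BP
F = 16          # FCW (illustrative)

def ddfs_am(gain_seq, n_states=NP, fcw=F):
    """Multiply the (ideal) SCMF output by a per-sample gain word."""
    p, out = 0, []
    for g in gain_seq:
        amplitude = math.sin(2 * math.pi * p / n_states)  # ideal SCMF
        out.append(g * amplitude)                         # AM multiplier
        p = (p + fcw) % n_states
    return out
```

With `gain_seq` drawn from the amplitude levels of a constellation, this forms the amplitude half of a QAM modulator.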
This operation is important for implementing quadrature amplitude modulation (QAM) schemes such as QAM16 and QAM64, which are commonly used in digital communications systems such as fiber and cable internet [18]. The operation is performed in the digital domain, so implementing higher order (such as QAM256) or more complex modulation schemes is feasible without much added overhead to the DDFS system, though the DAC and filter requirements become increasingly difficult to meet. In more complex systems, a finite impulse response (FIR) filter can be added to the output of the SCMF. This filter can be used to pre-distort the signal before driving the DAC to compensate for the non-idealities of the following analog circuitry, or for the non-linearity of the DAC itself. In Section 6.5.1, the design of an inverse sinc filter is shown that is applied to the radar system of Chapter 6. The inverse sinc FIR compensates for the zero-order hold operation of a traditional current steering DAC.

1.3.4 Fine Frequency Resolution and Fast Switching

One of the most cited benefits of DDFS designs is fine frequency tuning [7]. Equation 1.45 provides the frequency of a DDFS given an FCW $F$. Informally, the resolution of a DDFS device is the difference in synthesized frequency between two adjacent FCWs. This can be calculated by setting $F = 1$:

$f_0 = \frac{F}{N_P} f_{clk} = \frac{f_{clk}}{N_P}$  (1.47)

A 32-bit phase accumulator is a value commonly found in commercial DDFS parts [19],[20]. For a 1 GHz clock frequency, this results in a frequency resolution of $10^9/2^{32} \approx 0.23$ Hz. This also implies that the LFM discussed in Section 1.3.2 is capable of generating remarkably smooth chirp waveforms when given enough bits of resolution. The DDFS can also rapidly switch between different output frequencies. This is of critical importance in spread spectrum applications, where the speed of the frequency switching directly impacts the performance of the system.
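The resolution figure quoted above is a one-line computation; the sketch below reproduces it for the 32-bit, 1 GHz example from the text.

```python
BP = 32                          # accumulator width (per the text's example)
fclk = 1e9                       # 1 GHz clock
resolution = fclk / (1 << BP)    # Equation 1.47 with F = 1, in Hz
```

`resolution` evaluates to roughly 0.2328 Hz, matching the approximately 0.23 Hz quoted above.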
Using the DDFS as a local oscillator allows one to quickly switch to a different band, as quickly as the analog filter can respond to the changes. Combining the fine frequency resolution and fast frequency switching makes for a versatile solution to demanding problems from a host of fields [19], ranging from biomedical to military.

1.4 Summary of Contributions and Chapter Breakdown

This section summarizes the contributions of the author to the state of the art in the analysis of phase truncation spurs and to DDFS design in general. In Chapter 3, a complete approach for calculating the least period of the output sequence of a phase accumulator with truncation is derived. This approach is completely general and does not require the number of states in the phase accumulator to be a power of two. To the author's knowledge, there are no publications that perform this analysis. In Chapter 2, the most frequently cited
Application of the theory is applied to the author?s previous published design [21] with suggestions for improvements. In Chapter 5, a novel approach to parallelizing a phase accumulator with frequency modulation is presented. It covers several patents on the topic, as there is not much in the way of academic publications. In Chapter 6, the DDFS designs at Auburn University and their application in a simple single-chip radar system is explored [22]. The design attempted the challenging task of co-locating a radar transmitter and receiver onto the same silicon substrate. A quadrature DDFSgeneratedtheradarwaveformandpulsecompressionsequence. Lastly, inChapter7, a surveyofliteraturehighlightsmanyoftheproblemsintheDACdesignsatAuburnUniversity and offers a suggestions for improving DAC designs based on observations made during the survey. Some of the analysis and observations in the chapter have not been published to the author?s knowledge. 25 Chapter 2 Background of Phase Truncation Analysis In this chapter, an overview of the literature surrounding the analysis of the perfor- mance of DDFS devices is presented. Despite widely cited publications by Nicholas [23], Jenq [24] and Torosyan [10] on the analysis of phase truncation spurs in DDFS, papers are still published (or submitted for publishing) with either old approximations of the spurious behavior for which concise closed-form equations exist, or, worse yet, incorrect reasoning related to the location, magnitude and optimization of such spurs. Oftentimes early, widely cited papers continue to propagate while newer analyses go unnoticed. Two authors, Jenq and Torosyan, through two separate mathematical techniques, provide a closed-form solution to the spurs generated from phase truncation. Jenq?s analysis can be elegantly used to compute the signal-to-noise ratio in the presence of phase truncation but does not compute the location of the spurs. 
Torosyan goes a step further, presenting an elegant algorithm for efficiently calculating all of the spurs that result from phase truncation, in order of magnitude. An attempt is also made in this chapter to unify several of the previously published techniques by using a consistent notation across them. The publications on phase truncation errors span nearly three decades and use various mathematical approaches for deriving the spurious content. The techniques are presented chronologically by publication date.

2.1 Mehrgardt's Analysis (1983)

Mehrgardt [25] attempts to explain the non-intuitive spectral output of a signal generated from a finite length sinusoidal lookup table. The critical observation in the analysis is that the phase word can be written as the difference of two sawtooth waveforms, the full phase word and the discarded portion:

$\hat{\phi}[n] = \phi[n] - \phi_E[n]$  (2.1)

where $\hat{\phi}[n]$ is the phase word after truncation, $\phi[n]$ is given by Equation 1.22 and $\phi_E[n]$ is the value of the truncated bits after mapping. The truncated phase word drives an $N_Q$ entry sinusoidal lookup table with values

$A[n] = \sin\left(\frac{2\pi}{N_Q} n\right), \quad n \in \{0, \ldots, N_Q - 1\}$  (2.2)

where $N_Q = N_P/N_E$ as in Chapter 3. Note that no amplitude truncation is applied to the table values, meaning the analysis uses an ideal SCMF. The spurious response will therefore be generated entirely by phase truncation. Mehrgardt decides to tackle the problem by considering an analogous system in the continuous time domain. Consider the function below:

$S(t) = \sin\left(\frac{2\pi}{N_P}\left[N_P f t - N_E x_{sw}(t)\right]\right) = \sin\left(2\pi f t - \frac{2\pi}{N_Q} x_{sw}(t)\right)$  (2.3)

where $x_{sw}(t)$ is a sawtooth waveform of amplitude 1 and frequency $N_Q f$, and where $f = \frac{F}{N_P} f_{clk}$ is the desired frequency of the synthesized tone. $N_E$ is the number of error states and $N_P$ is the number of phase states, as discussed in Chapter 3.
The sawtooth waveform can be represented mathematically as

$x_{sw}(t) = N_Q f t - \lfloor N_Q f t \rfloor$  (2.4)

where $\lfloor \cdot \rfloor$ is the real domain truncation operation that maps a real number $r$ to the greatest integer less than or equal to $r$. That this is a reasonable approximation of the behavior of the phase accumulator can be shown without much analysis. As an example, let $T_{clk} = 10^{-9}$ s, $F = 3$, $N_P = 2^7$ and $N_E = 2^5$. Figure 2.1 is a plot of the phase accumulator values along with the continuous time approximation of the phase from Equation 2.3. The phase accumulator overflows approximately every $T_{clk} N_P / F$ seconds, or at a rate of $F/(T_{clk} N_P)$. Notice that the actual digital error values look like samples of the continuous time function used in the analysis.

[Figure 2.1: Sawtooth Approximation]

Applying the trigonometric difference identity for sine, Equation 2.3 can be rewritten as

$S(t) = \sin(2\pi f t)\cos\left(\frac{2\pi}{N_Q} x_{sw}(t)\right) - \cos(2\pi f t)\sin\left(\frac{2\pi}{N_Q} x_{sw}(t)\right)$  (2.5)

Now in general, $N_E \ll N_P$, making $N_Q$ large, and therefore the small angle approximations for sine and cosine can be applied to Equation 2.5:

$S(t) \approx \sin(2\pi f t) - \cos(2\pi f t)\left[\frac{2\pi}{N_Q} x_{sw}(t)\right]$  (2.6)

Since $x_{sw}(t)$ is periodic, its Fourier series can be computed, but it must be noted that $x_{sw}(t)$ is not a Dirichlet sawtooth, because it does not take the value of the midpoint at discontinuities. Consequently, the Fourier series does not actually exist for $x_{sw}(t)$ as described by Equation 2.4. This oversight is corrected in Nicholas's analysis by adding a periodic pulse train that forces the sawtooth to take the value of the midpoint at discontinuities. Regardless, $x_{sw}(t)$ is almost a Dirichlet sawtooth, and thus the Fourier series of the sawtooth is used to represent it. First, the definition of the Fourier series is presented.
Definition 2.1 (Fourier Series of a Real-Valued Function). The Fourier series of a real-valued function is defined as

$$\mathcal{F}_s\{f(x)\} = \sum_{n=0}^{\infty}\left[a_n\cos(nx) + b_n\sin(nx)\right] \tag{2.7}$$

where the coefficients of Equation 2.7 are computed using the inner products

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx,\quad n \ge 0 \tag{2.8}$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx,\quad n \ge 0 \tag{2.9}$$

Without performing the full calculation, the Fourier series of the sawtooth waveform is

$$x_{sw}(t) \approx \sum_{k=1}^{\infty}(-1)^k\frac{\sin(2\pi k N_Q f t)}{\pi k} \tag{2.10}$$

Equation 2.10 is an approximation because the constant component has been ignored, as it does not contribute to the spurious response of the waveform. Substituting the Fourier series of $x_{sw}(t)$ back into Equation 2.6, the derivation of the spectrum of the continuous time analogy is complete. Note that the product-to-sum identity was used in the derivation.

$$S(t) \approx \sin(2\pi f t) - \sum_{k=1}^{\infty}\frac{(-1)^k}{N_Q k}\left[\sin\left(2\pi(kN_Q+1)ft\right) + \sin\left(2\pi(kN_Q-1)ft\right)\right] \tag{2.11}$$

At this point, $S(t)$ must be sampled to get back to the discrete time DCDO case. Mehrgardt does this in stages, starting with the sawtooth waveform. The sampling process is executed by replacing $f$ with its discrete value and substituting $n/f_{clk}$ for $t$. Applying the sampling process to $x_{sw}(t)$ first, the following equations are derived:

$$x_{sw}[n] = \sum_{k=1}^{\infty}(-1)^k\frac{\sin\left(2\pi k N_Q\left(\frac{F}{N_P}f_{clk}\right)\left(\frac{n}{f_{clk}}\right)\right)}{\pi k} \tag{2.12}$$

$$= \sum_{k=1}^{\infty}(-1)^k\frac{\sin(2\pi k F n/N_E)}{\pi k} \tag{2.13}$$

Now $F$ and $N_E$ can be reduced by removing common factors, using the modular arithmetic of the previous chapter, such that $F/N_E = \Lambda_E/\lambda_E$ (Lemma 3.2). If following along in Mehrgardt's publication, one will notice that the analysis in this work has diverged. In particular, he states that the finite precision frequency can be written as $2\pi a/b$, where $a$ and $b$ are not defined but exist, and places the final results in these terms.
In this analysis, the introduction of the extra terms is not necessary because the meaning of the symbols has been carefully tracked. Note that the sinusoid in the sawtooth equation is periodic in $k$ with $\lambda_E$. After several summations and trigonometric identities, which are left as an exercise to the reader in Mehrgardt's original publication and so will also be left as one here, a final result is achieved:

$$S[n] = \sin\left(2\pi\frac{F}{N_P}n\right) - \frac{\pi}{N_Q}\sum_{m=1}^{(\tilde{N}_P-1)/2}\beta_{k_m}\left\{\sin\left(2\pi\left(\frac{m}{\tilde{N}_P}+\frac{F}{N_P}\right)n\right) + \sin\left(2\pi\left(\frac{m}{\tilde{N}_P}-\frac{F}{N_P}\right)n\right)\right\} \tag{2.14}$$

where $k_m$ is calculated through the relation

$$k_m = \langle m k'\rangle_{\tilde{N}_P} \tag{2.15}$$

and where $k'$ is the solution for $k$ in the linear congruence relation

$$\left\langle k\,\tilde{N}_Q F\right\rangle_{\tilde{N}_P} = 1 \tag{2.16}$$

Lastly, the coefficients are of the form

$$\beta_k = \begin{cases}\dfrac{(-1)^k}{\tilde{N}_P\sin(\pi k/\tilde{N}_P)} & \text{for } \tilde{N}_P \text{ odd}\\[2ex] \dfrac{(-1)^k}{\tilde{N}_P\tan(\pi k/\tilde{N}_P)} & \text{for } \tilde{N}_P \text{ even}\end{cases} \tag{2.17}$$

The result is never related back to the FCW to characterize the behavior of the DCDO with phase truncation or to calculate the SFDR or SNR. The result here shows the relationship between the FCW and the spurs because of the changes made in symbolic notation during the derivation. Only qualitative observations, such as the number of spurs, the magnitude of the spurs, and that such spurs should be expected, are presented. In summary, this technique provides:

- the number of phase truncation spurs and
- the magnitude of the phase truncation spurs.

It misses some spurs because of its failure to use a Dirichlet sawtooth waveform in the analysis. The author notes the discrepancy between his analysis and simulation in the later paragraphs of his paper.

2.2 Nicholas's Analysis (1985)

H. T. Nicholas provides [23] one of the most well-known analyses of phase truncation spurs.
The analysis provides equations for the location, phase and amplitude of the spurs generated through phase truncation. The general idea is similar to Mehrgardt's analysis described in Section 2.1, in that the phase error is thought of as a sawtooth waveform. Nicholas takes a more formal mathematical approach and finds several clever trigonometric reduction techniques to bring about a final result that is concise, accurate and efficient in implementation. The results are summarized by the following theorems, which are presented without proofs. To summarize the steps taken by Nicholas:

1. Find the analogous continuous time representation of the phase and the truncation error sequence. This takes the form of a pulse train

$$p_e(t) = \frac{N_E}{\lambda_E}\sum_{k=1}^{\infty}\frac{\lambda_E}{\pi k}\sin\left(\frac{\pi k}{\lambda_E}\right)\cos\left(2\pi k\frac{F}{N_E}t\right) + \frac{1}{2} \tag{2.18}$$

and a sawtooth that meets the Dirichlet conditions

$$x_{sw}(t) = \sum_{k=1}^{\infty}\frac{N_E}{\pi k}\sin\left(2\pi k\frac{F}{N_E}t\right) + \frac{N_E}{2} \tag{2.19}$$

Note that these waveforms are slightly different from those shown in the publication. If plotting the waveforms exactly as given in the publication, one will not get the correct error sequence. This is because Nicholas removed the DC term from both the pulse train, which is mentioned in the publication, and the sawtooth waveform. However, in the plots in the publication, the DC term is added back.

2. Sample the continuous time representation at the DDFS sample rate, which is performed in the same manner described in Section 2.1:

$$p_e[n] = \frac{N_E}{\lambda_E}\sum_{k=1}^{\infty}\frac{\lambda_E}{\pi k}\sin\left(\frac{\pi k}{\lambda_E}\right)\cos\left(2\pi k\frac{F}{N_E}n\right) + \frac{1}{2} \tag{2.20}$$

and the Dirichlet sawtooth waveform

$$x_{sw}[n] = \sum_{k=1}^{\infty}\frac{N_E}{\pi k}\sin\left(2\pi k\frac{F}{N_E}n\right) + \frac{N_E}{2} \tag{2.21}$$

The truncation error sequence can then be reconstructed by subtracting the pulse train from the sawtooth waveform:

$$e_p[n] = x_{sw}[n] - p_e[n] \tag{2.22}$$

Figure 2.2a shows the sawtooth waveform and pulse train waveform for $N_E = 64$ and $F = 3$. Figure 2.2b shows the subtraction of the pulse train from the sawtooth waveform, yielding the very familiar phase truncation error sequence plotted in Figure 2.1.

Figure 2.2: Error Sequence Waveform Components ((a) Nicholas' sawtooth and pulse train; (b) Nicholas' phase error sequence)

3. Performing a spectacularly complex set of trigonometric manipulations of the kind found in mathematics texts dedicated to the task, Nicholas arrives at his final error sequence equation

$$e_p[n] = -\frac{N_E}{\lambda_E}\sum_{k=1}^{\lambda_E/2}\left[\cot\left(\frac{\pi k}{\lambda_E}\right)\sin\left(2\pi k\frac{F}{N_E}n\right) - \cos\left(2\pi k\frac{F}{N_E}n\right)\right] \tag{2.23}$$

4. Use number theory to relate the spurs in $k$ to the frequency control word.

5. Plug the error sequence into the original sine expression and use the small angle approximation. This is where Nicholas' approach becomes an approximation to the spurious behavior (albeit a very good one).

6. Perform an enormous amount of trigonometric manipulation and number theory to arrive at the final answer.

The results from Nicholas' work are summarized in the following set of theorems. The number of spurs due to phase truncation is provided in Theorem 2.1.

Theorem 2.1 (Nicholas Number of Spurs). An accumulator with $B_P$ bits whose least significant $B_T$ bits are truncated has $(\lambda_E/2 - 1)$ spurs.
Or, written in the notation used previously in this document, an accumulator with $N_P$ phase states and $N_E$ error states such that $N_E \mid N_P$ has $(\lambda_E/2 - 1)$ spurs. The mapping from the $\rho$ (spur index) value to the frequency control word is provided by Theorem 2.2.

Theorem 2.2 (Nicholas Spur Index). The spur index, $\rho$, for the spur located at DFT frequency bin $k$ is, if $2 \mid (k - \lambda_E/2)$,

$$\rho = \left\langle \frac{k}{2^{B_P-B_T}}\,\Lambda_E^{(\lambda_E/2-1)} \right\rangle_{\lambda_E} \tag{2.24}$$

where $\Lambda_E$ is the reduced frequency control word for the error sequence (Lemma 3.3),

$$\Lambda_E = \frac{F}{\mathrm{GCD}(F, N_E)} \tag{2.25}$$

and $N_E = 2^{B_T}$ is the modulo number for the error sequence. If $2 \mid (-k - \lambda_E/2)$,

$$\rho = \left\langle \frac{-k}{2^{B_P-B_T}}\,\Lambda_E^{(\lambda_E/2-1)} \right\rangle_{\lambda_E} \tag{2.26}$$

The magnitude of the spurs given a spur index is provided by Theorem 2.3.

Theorem 2.3 (Nicholas Spur Magnitude). The magnitude of the spur at spur index $\rho$ is

$$m_\rho = \frac{\pi\, 2^{B_T-B_P}}{\lambda_E}\csc\left(\frac{\rho\pi}{\lambda_E}\right) \tag{2.27}$$

Theorem 2.4 (Nicholas Spur Phase). The phase of the spur at spur index $\rho$ is

$$\theta_\rho = -\cot\left(\frac{\rho\pi}{\lambda_E}\right) \tag{2.28}$$

As mentioned above, the analysis starts in a similar vein to Mehrgardt's, but with one critical difference in the specification of the error sawtooth waveform. Nicholas notes that the Dirichlet conditions for the Fourier series of a function with discontinuities require that the function take on the average value of the function at the point of discontinuity. He thus adds a separate pulse train function to create a sawtooth error function that can be analyzed using the Fourier series. In Mehrgardt's publication, he notes that there are extra terms in the result arising from insufficiencies of the sawtooth approximation to the actual error function behavior. Nicholas avoids those terms by creating a pair of continuous time functions that have convergent Fourier series.
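The spur magnitude prediction lends itself to a quick numerical check. Below is a minimal Python sketch (the parameter values are illustrative, not from the source) that builds an ideal-SCMF DCDO output for a worst-case FCW, one with $\lambda_E = 2$, and compares the largest measured spur against Theorem 2.3 evaluated at $\rho = 1$; the small residual difference is the small-angle approximation error discussed in step 5 above.

```python
import math, cmath

B_P, B_T = 8, 4                      # illustrative: 8-bit accumulator, 4 LSBs dropped
N_P, N_E = 2 ** B_P, 2 ** B_T
F = 3 * N_E // 2                     # GCD(F, N_E) = N_E/2, so lambda_E = 2 (worst case)
lam_E = N_E // math.gcd(F, N_E)
assert lam_E == 2

# Ideal-SCMF DCDO output over one full period, with the B_T LSBs truncated
s = [math.sin(2 * math.pi * (((n * F) % N_P >> B_T) << B_T) / N_P)
     for n in range(N_P)]

# One-sided DFT magnitudes (direct sum; N_P is small enough)
mag = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / N_P) for n, x in enumerate(s)))
       for k in range(N_P // 2)]

measured = max(m for k, m in enumerate(mag) if k != F) / mag[F]   # largest spur, dBc-free ratio
predicted = (math.pi * 2 ** (B_T - B_P) / lam_E) / math.sin(math.pi / lam_E)
assert abs(measured - predicted) / predicted < 0.01   # agrees to within small-angle error
```

For $\lambda_E = 2$ the error sequence alternates between two phase values, and the exact spur-to-carrier ratio works out to $\tan(\pi/(2N_Q))$, which the prediction $\pi/(2N_Q)$ matches to first order.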
To demonstrate the importance of this dissertation's contribution beyond avoiding trigonometric small-angle approximations, consider the final discrete time equation in Nicholas' analysis:

$$A[n] = \sin\left(2\pi\frac{F}{N_P}n\right) - \sum_{k=1}^{\lambda_E/2}\left(\frac{\pi}{\lambda_E N_Q}\csc\left(\frac{k\pi}{\lambda_E}\right)\right)\cdot\left(e^{j2\pi\,\mathrm{GCD}(F,N_E)(\Lambda_E - k\lambda_E N_Q)n/N_P} + e^{-j2\pi\,\mathrm{GCD}(F,N_E)(\Lambda_E - k\lambda_E N_Q)n/N_P}\right)\cdot e^{-j\cot(k\pi/\lambda_E)} \tag{2.29}$$

Compare this equation with the closed form solution from this document in Chapter 4 under Theorem 4.6. The small angle approximation prevents simplification to an elegant final form. Lastly, the worst case spur magnitude is predicted by the following equation, which flows from the observation that the spur magnitude is a strictly decreasing function of the spur index. Selecting a spur index of 1 yields

$$m_{wc} = \frac{1}{N_Q}\,\frac{\pi/\lambda_E}{\sin(\pi/\lambda_E)} \tag{2.30}$$

2.3 Jenq's Analysis (1988)

Y. C. Jenq published a series of papers on analyzing the spectrum of non-uniformly sampled signals [26, 24, 27, 28]. Re-deriving the analysis as it pertains to modern DCDOs is worthwhile, as the notation used in [24] is decidedly different from that commonly used in DDFS literature. There is one critical difference between Jenq's analysis and the following derivation: Jenq uses a continuous time domain representation for the output of the DCDO. Using the notation for a uniformly sampled discrete time signal, the analysis is more easily accessible to one familiar with digital signal processing (DSP), without having to concern one's self with integrals and the Dirac delta function. Before presenting the theorem, the discrete time Fourier transform (DTFT) is given in Equation 2.31.

$$X(\omega) = \sum_{n=-\infty}^{\infty} x[n]e^{-j\omega n} \tag{2.31}$$

The DTFT can be derived from the continuous time Fourier transform, or just Fourier transform, described in Section 7.2.
The DTFT is used to compute the spectrum of waveforms discretized in the time domain, which is precisely the type of waveform considered in this work.

Theorem 2.5 (Jenq's Non-Uniform Sampling Theorem). The DTFT of a signal formed by non-uniformly sampling a waveform $g(t)$ that is band-limited to $\left(-\frac{1}{2T}, \frac{1}{2T}\right)$ in such a way that the overall sampling process repeats with period $MT$ is given by

$$G(\omega) = \frac{1}{MT}\sum_{m=0}^{M-1}\left[\sum_{k=-\infty}^{\infty} G_0\left(\omega - k\frac{2\pi}{MT}\right)e^{-j2\pi k t_m/(MT)}\right]e^{jm\omega T} \tag{2.32}$$

where $t_m$ is the sampling instant of the $m$th uniformly sampled sub-sequence of $g(t)$, formed by taking every $M$th sample (it is uniform because the sampling process is periodic with $MT$), $\omega$ is the angular frequency of the waveform in radians and $G_0$ is the CTFT of $g(t)$.

This theorem is not difficult to show. The reason that Jenq gets cited in DDFS literature is an interesting observation made about the behavior of the truncated phase accumulator output.

2.3.1 Jenq's Observation

The key observation made by Jenq in his analysis (Section 2.3) is that in the presence of phase truncation the phase code sent to the SCMF is not uniformly spaced. Effectively this "looks like" a non-uniform sampling operation on the generated sinusoid. But the phase accumulator is of finite length, so regardless of phase truncation, it is periodic, and therefore the truncated phase word is also periodic. In Section 1.2, the periodicity of the phase accumulator given a frequency control word $F$ is given in Equation 3.12. Now consider a four bit phase accumulator incremented by $F = 3$ that is truncated to three bits and fed into an ideal SCMF (i.e., assume there is no quantization in the amplitude values stored in the SCMF). Thus in this example, the number of bits truncated from the phase word is $B_T = 1$, the number of bits in the phase accumulator is $B_P = 4$ and the number of kept bits after truncation is $B_{PT} = 3$.
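This example is small enough to generate and check directly. The following Python sketch (variable names are illustrative) reproduces the sequence tabulated in Table 2.1 and verifies the non-uniform phase step that Jenq's observation rests on.

```python
B_P, B_T, F = 4, 1, 3                 # 4-bit accumulator, 1 bit truncated, FCW = 3
N_P = 2 ** B_P
N_PT = N_P >> B_T                     # number of phase states after truncation

phase = [(n * F) % N_P for n in range(N_P + 1)]    # full accumulator states
trunc = [p >> B_T for p in phase]                  # truncated phase words (drop B_T LSBs)
steps = [(trunc[n] - trunc[n - 1]) % N_PT for n in range(1, N_P + 1)]

assert phase[-1] == phase[0] == 0     # periodic with N_P = 16 clock cycles
assert set(steps) == {1, 2}           # the truncated phase step is non-uniform
```

The `steps` list alternates between 1 and 2, matching the rightmost column of the table below.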
Table 2.1 shows the state of the phase accumulator, the truncated phase word, and the phase step between adjacent phase words. Note that the leftmost column of the table matches the predicted untruncated phase sequence of Theorem 3.2 and the second column shows the predicted truncated phase sequence of Theorem 3.4. While the proofs should be sufficient, working through an example provides a helpful check on the result and some intuition into the behavior of the devices modeled by the mathematical analysis. There are a few observations that can be made from the sequence.

Table 2.1: Table of Truncated Phase States (4-bit)

Phase Accumulator | Truncated Phase | Truncated Phase Step
0000 (00) | 000 (0) | -
0011 (03) | 001 (1) | 1
0110 (06) | 011 (3) | 2
1001 (09) | 100 (4) | 1
1100 (12) | 110 (6) | 2
1111 (15) | 111 (7) | 1
0010 (02) | 001 (1) | 2
0101 (05) | 010 (2) | 1
1000 (08) | 100 (4) | 2
1011 (11) | 101 (5) | 1
1110 (14) | 111 (7) | 2
0001 (01) | 000 (0) | 1
0100 (04) | 010 (2) | 2
0111 (07) | 011 (3) | 1
1010 (10) | 101 (5) | 2
1101 (13) | 110 (6) | 1
0000 (00) | 000 (0) | 2

- As is the case in Figure 3.1b, the phase accumulator is periodic with $N_P = 2^4 = 16$ clock cycles.
- The truncated phase sequence is also periodic with the phase accumulator. While already stated, working through a simple example helps visualize the periodicity.
- The truncated phase step is not uniform. It varies between the values of 1 and 2.

An interesting feature follows from this analysis. Since the truncated phase sequence is periodic, the delta phase cycle is also periodic, and the periodicity of the delta phase cycle is the same as the periodicity of the phase error sequence from truncation. From Lemma 3.5, if $2^{B_T} \mid N_P$, which it does in this example, then

$$\lambda_E = \frac{2^{B_T}}{\mathrm{GCD}(F, 2^{B_T})} = \frac{2}{\mathrm{GCD}(3, 2)} = 2 \tag{2.33}$$

2.3.2 Jenq's Results

Jenq does not deal with spur locations, magnitudes or phases in any of his publications. Instead, the non-uniform sampling theorem is applied and then Parseval's relation is used to find a noise power bound.
This allows for a completely general calculation of the signal to noise ratio due to phase truncation spurs. He describes the problem differently from the other authors, by thinking of the phase accumulator value as an integer value plus a coprime rational number. Variables $W$, $L$ and $M$ are introduced to describe the output of the phase accumulator, where $L$ and $M$ are coprime:

$$d = W + L/M \tag{2.34}$$

Looking at Equation 3.38, the expression can be rewritten in the notation used in this work. $M$ is the periodicity of the error sequence ($\lambda_E$), as multiplying $d$ by $M$ yields an integer value. $L$ is the reduced FCW over $M$, that is to say $L = \Lambda_E$. Applying the non-uniform sampling theorem gives an answer in the frequency domain:

$$G(\omega) = \frac{2\pi}{T\lambda_E}\sum_{k=-\infty}^{\infty}\left[\sum_{m=0}^{\lambda_E-1}e^{-j2\pi r_m f_0/f_{clk}}\,e^{-j2\pi k m/\lambda_E}\right]\delta\!\left[\omega - \omega_0 - k\frac{2\pi}{\lambda_E T}\right] \tag{2.35}$$

Only the amplitude of the spurs is considered at this point. Sampling the waveform yields

$$A(k) = \frac{1}{\lambda_E}\sum_{m=0}^{\lambda_E-1}e^{-j2\pi\langle m\Lambda_E\rangle_{\lambda_E}/(\lambda_E N_Q)}\,e^{-j2\pi m k/\lambda_E} \tag{2.36}$$

Definition 2.2 (Parseval's Relation). Parseval's relation for the sequence $g[n]$ of length $N$ is

$$\sum_{n=0}^{N-1}|g[n]|^2 = \frac{1}{N}\sum_{k=0}^{N-1}|G[k]|^2 \tag{2.37}$$

Applying Parseval's relation yields the final result for the signal to noise ratio

$$\mathrm{SNR} = 10\log_{10}\left[\frac{|A(k)|^2}{1 - |A(k)|^2}\right] \tag{2.38}$$

Several observations can be made about how the function increases and decreases with various values of $\Lambda_E$ and $\lambda_E$. Using this information, the best and worst case signal to noise ratios are computed. The worst case is

$$\mathrm{SNR}_{wc} = 10\log_{10}\left[\frac{\left[\sin\left(\frac{\pi}{N_Q}\right)\left(\frac{N_Q}{\pi}\right)\right]^2}{1 - \left[\sin\left(\frac{\pi}{N_Q}\right)\left(\frac{N_Q}{\pi}\right)\right]^2}\right] \tag{2.39}$$

and the best case is derived as

$$\mathrm{SNR}_{bc} = 20\log_{10}\left[\cot\left(\frac{\pi}{2N_Q}\right)\right] \tag{2.40}$$

For the details of the derivation, consider reading Jenq's series of non-uniform sampling papers.

2.4 Torosyan's Analysis (2001)

Although an excellent analysis of the location and magnitude of phase truncation spurs existed at the time of his publications, Arthur Torosyan and Alan Willson of UCLA provided an exact, clear and practical means for an analytical understanding of phase truncation spurs [29, 10]. Instead of working from analogous time domain functions as Nicholas and Mehrgardt did, Torosyan approaches the problem using elementary number theory. The critical observation is that for any two FCWs that generate periodic phase sequences of the same length, the resulting sequences can be related through a simple rearrangement, and the frequency responses of sequences related in this manner are also simple rearrangements of each other. The exact same observation is actually made by Nicholas in [23] but is not used in his analysis.

The implication is enormous for DCDO analysis, as the DFT need only be run once per possible least period (a handful of times, determined by the prime factorization of $N_P$), and all other frequency domain responses can be generated by permuting the frequency response. Consider the case of a $B_P = 32$ bit accumulator. If every FCW were to be tested, then the DFT would need to run $N_P = 2^{32} = 4294967296$ times on sequences of which at least half are periodic with $N_P$. The current state of computer technology simply cannot handle the computational requirements of such a task in a manageable amount of time (i.e., if each DFT of such a sequence took one second, 4294967296 s is approximately 136 years). It is great news, then, that mathematicians have shown that such computations are not required to fully characterize DCDOs. Much of the work in Chapter 4 will be used in this analysis, so reading that chapter before continuing may be helpful.
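The rearrangement property is easy to verify numerically. The minimal Python sketch below (illustrative parameters, not from the source) checks that two FCWs with the same least period produce phase-truncated DCDO outputs whose DFT magnitude spectra contain exactly the same values, merely permuted.

```python
import math, cmath

N_P, B_T = 64, 2                     # 6-bit accumulator, 2 LSBs truncated
F1, F2 = 3, 5                        # both odd, so both have least period N_P

def dcdo(F):
    """Ideal-SCMF DCDO output with phase truncation over one full period."""
    return [math.sin(2 * math.pi * (((n * F) % N_P >> B_T) << B_T) / N_P)
            for n in range(N_P)]

def dft_mags(s):
    """Direct-sum DFT magnitudes (N_P is small enough for brute force)."""
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / N_P)
                    for n, x in enumerate(s))) for k in range(N_P)]

m1, m2 = dft_mags(dcdo(F1)), dft_mags(dcdo(F2))
# Same multiset of spectral magnitudes: the spectra are permutations of each other
assert all(abs(a - b) < 1e-8 for a, b in zip(sorted(m1), sorted(m2)))
```

The underlying reason is that the time sequences are related by the index map $n \mapsto \langle \sigma n\rangle_{N_P}$ for a unit $\sigma$, and the DFT turns that multiplicative index permutation into a permutation of frequency bins.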
The analysis begins by showing that FCWs of the same period generate sequences that are simple permutations of each other (Theorem 3.9). It is then shown that the frequency domain representations of such sequences are also simple permutations of each other (Theorem 4.8). Then the DFT of the $\Lambda = 1$ case is computed (Theorem 4.6 with $\Lambda = 1$). Since all sequences can be generated as permutations of the $\Lambda = 1$ case, the analysis is complete. To summarize the results:

1. Only one frequency control word for each possible phase accumulator least period needs to be considered. Let this FCW be chosen such that, after reducing the word modulo the number of phase accumulator states, the resulting number is 1 (i.e., $\Lambda = 1$).

2. All other frequency control words for a given least period are permutations of the previously described FCW.

3. The DFT happens to commute with the set of permutations associated with a given least period, and therefore the frequency spectra are also permutations. This leads to the "window function" after simplifying the result.

4. An interesting simplification can be made when observing the reduced word 1, allowing the grouping of repeating terms and resulting in a simple final expression for the phase truncation spurs.

Chapter 3
Phase Accumulator Sequences from Number Theory

In this chapter, a complete analysis of the sequences generated by a phase accumulator is developed from elementary number theory. The analysis begins by proving the well-known untruncated phase accumulator state equation using basic modulo arithmetic (Section 3.1). The number theoretic techniques may seem excessive at this point, but the motivation for approaching the problem from a mathematical standpoint will become apparent in Chapter 4. Several concepts from number theory are presented to support the analysis and will be used in later proofs as the work progresses.
After determining the expected phase accumulator sequence, the periods of such sequences are explored (Section 3.2). Several Greek letters are given special meaning, such as the reduced FCW $\Lambda$ and the least period length $\lambda$. The effects of truncation on the phase accumulator sequence are explored in Section 3.3. The tools developed for deriving the period of an untruncated phase sequence are applied to the truncation problem, resulting in a general theory for the periodicity of both the truncated phase sequence and the error sequence. Developing a pure mathematical framework for analyzing the phase accumulator prevents common mistakes in calculating the periodicity of DDFS waveforms. The relationship between both untruncated and truncated phase sequences of different FCWs is established in Section 3.4. Lastly, some comments on abstract algebraic structures that can be used to fully describe the operation of the phase accumulator are presented in Section 3.5.

3.1 Phase Accumulator Sequence

The state of the phase accumulator at clock cycle $n$ can be written as a function of the FCW and the initial phase,

$$P[n] = \langle nF + P_0\rangle_{N_P} \tag{3.1}$$

where $P_0$ is the initial state of the phase accumulator, $F$ is the FCW and $\langle a\rangle_m$ represents the smallest non-negative integer remainder when dividing $a$ by $m$ (sometimes called the least residue). This is commonly referred to as the modulo $m$ operator on the integer $a$. The modulo operation is, by definition, intimately related to integer division. The division algorithm for positive integers is stated below [30]:

Theorem 3.1 (The Division Algorithm). Given non-negative integers $a$ and $b$, $b \neq 0$, there exist unique integers $q$ and $r$, with $0 \le r < b$, such that $a = qb + r$.

Lemma 3.1 (Reduced GCD). If $\mathrm{GCD}(a,b) = d$, then $\mathrm{GCD}(a/d, b/d) = 1$.

Proof. Suppose instead that

$$k = \mathrm{GCD}\left(\frac{a}{d}, \frac{b}{d}\right),\quad k > 1. \tag{3.14}$$

Then by Definition 3.2, $k \mid \frac{a}{d}$ and $k \mid \frac{b}{d}$.
By Definition 1.1, there exist $c_0, c_1 \in \mathbb{Z}$ such that

$$c_0 k = \frac{a}{d} \Rightarrow c_0 k d = a \tag{3.15}$$

$$c_1 k = \frac{b}{d} \Rightarrow c_1 k d = b \tag{3.16}$$

Clearly then $kd \mid a$ and $kd \mid b$, but since $k > 1$, $kd > d$, making $kd$ a common divisor larger than $d$, which is a contradiction since $d$ by choice is the greatest common divisor of $a$ and $b$. Assuming $k < 1$ and not equal to zero also leads to a contradiction in a similar manner. Therefore $k = 1$.

Now we show another helpful theorem [30] that will prove useful in the derivation of the phase accumulator least period.

Lemma 3.2 (Linear Modulo Normalization). If $ac \equiv bc \pmod{m}$ and $\mathrm{GCD}(c,m) = d$, then $a \equiv b \pmod{\frac{m}{d}}$.

Proof. Let $ac \equiv bc \pmod{m}$ and $\mathrm{GCD}(c,m) = d$. From the definition of congruence given above, $m \mid (ac - bc) \Rightarrow m \mid c(a-b)$. Since $d$ divides both $c$ and $m$ (from the definition of the GCD), $\frac{m}{d} \mid \frac{c}{d}(a-b)$. Now from Lemma 3.1, we know that $\mathrm{GCD}\left(\frac{c}{d}, \frac{m}{d}\right) = 1$, so $\frac{m}{d} \nmid \frac{c}{d}$ and it must be that $\frac{m}{d} \mid (a-b)$. From Definition 3.1,

$$a \equiv b \pmod{\frac{m}{d}} \tag{3.17}$$

Here we note that nothing in the theorem or proof prevents $d = 1$, so it immediately follows that

$$ac \equiv bc \pmod{m} \Rightarrow a \equiv b \pmod{m} \tag{3.18}$$

if $\mathrm{GCD}(c,m) = 1$.

As the DDFS is designed to generate spectrally pure signals, it is of critical importance to determine the periods of the sequences generated by the DCDO. Unwanted periodic behavior generates deterministic spurs and will become a major topic in the discussions of Chapter 4. Now consider, yet again, the fundamental phase accumulator expression, Equation 3.1. In the example case of $F = 1$, the sequence generated is

$$P = [0, 1, 2, \ldots, N_P - 1, 0, 1, 2, \ldots] \tag{3.19}$$

Clearly then the length of the period for $F = 1$ is $N_P$. Now, in general, Equation 3.12 can be shown to be true for arbitrary $F$ and $P_0$.

Theorem 3.3 (Phase Accumulator Periodicity). The least period of the sequence generated by applying the FCW $F$ to a phase accumulator with $N_P$ states without phase truncation is

$$\lambda_P \triangleq \frac{N_P}{\mathrm{GCD}(F, N_P)} \tag{3.20}$$

Proof.
Recall that the state of the phase accumulator at clock cycle $n$ is

$$P[n] = \langle Fn + P_0\rangle_{N_P} \tag{3.21}$$

If $\mathrm{GCD}(F, N_P) = 1$, then the period of the generated sequence is $N_P$. This can be shown by noting that for $n = 0$, $P[0] = P_0$, and then finding $k > 0$ such that $P[k] = P_0$. Equivalently, $P[k] = P[0] = \langle P_0\rangle_{N_P}$, which means that

$$kF + P_0 \equiv P_0 \pmod{N_P} \Rightarrow N_P \mid (kF + P_0 - P_0) \Rightarrow N_P \mid kF \tag{3.22}$$

But $N_P \nmid F$ since $\mathrm{GCD}(F, N_P) = 1$, and therefore $N_P \mid k$, so an integer $d$ exists such that $dN_P = k$. The simplification was made using Lemma 3.4. We then wish to find the smallest positive integer $d$ such that $dN_P = k$ holds. Let $d = 1$; then $k = N_P$ and

$$P[N_P] = \langle N_P F + P_0\rangle_{N_P} = \left\langle \langle N_P F\rangle_{N_P} + \langle P_0\rangle_{N_P}\right\rangle_{N_P} = \langle P_0\rangle_{N_P} \tag{3.23}$$

so $d = 1$ satisfies the equality and the period of the sequence for $\mathrm{GCD}(F, N_P) = 1$ is $N_P$. Now consider $F$ such that $\mathrm{GCD}(F, N_P) = d$. We are again looking for $k$ such that

$$kF + P_0 \equiv P_0 \pmod{N_P} \Rightarrow kF \equiv 0 \pmod{N_P} \tag{3.24}$$

as $P[0] = P_0$ still holds for this $F$. From Lemma 3.2 we know that

$$kF \equiv 0 \pmod{N_P} \Rightarrow k\frac{F}{d} \equiv 0 \pmod{\frac{N_P}{d}} \tag{3.25}$$

From Lemma 3.1, $\mathrm{GCD}\left(\frac{F}{d}, \frac{N_P}{d}\right) = 1$, and from the previous analysis of the periodicity for relatively prime $F$ and $N_P$, it follows that the period is $\frac{N_P}{d}$. So we have shown the origin of Equation 3.12 and proved it to be true for all $F$.

It is interesting to note that typically $N_P$ is a power of 2, so $\mathrm{GCD}(F, N_P)$ will be a power of 2 if $F$ is not relatively prime to $N_P$. Thus looking at the position of the first '1' in the bit string of $F$, counting from the LSB, gives the period of the sequence generated by the phase accumulator. Consider a 4-bit accumulator as a simple example. If $F = 6$, its binary representation is $0110$. The first 1 appears in the second bit position, meaning that the largest power-of-2 divisor of $F$ is $2^1$ and the period of the phase accumulator is $2^4/2^1 = 2^3 = 8$.
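Theorem 3.3 and the lowest-set-bit shortcut are simple to confirm by brute force; the Python sketch below (illustrative parameters) does so for a small accumulator.

```python
import math

def least_period(F, N_P):
    """Smallest k > 0 with <k*F>_{N_P} = 0, found by brute force."""
    k = 1
    while (k * F) % N_P != 0:
        k += 1
    return k

N_P = 16                                   # 4-bit accumulator
for F in range(1, N_P):
    assert least_period(F, N_P) == N_P // math.gcd(F, N_P)   # Theorem 3.3

# For N_P a power of two, F & -F isolates the lowest set bit of F,
# which is exactly the largest power-of-2 divisor of F:
F = 6                                      # 0110b, lowest set bit = 2^1
assert least_period(F, N_P) == N_P // (F & -F) == 8
```

The `F & -F` trick is the bit-level form of the "first 1 counting from the LSB" observation above.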
3.3 Truncated Phase Sequences

Applying Lemma 3.2 to Theorem 3.2, it is possible to write the phase accumulator sequence Equation 3.4 in a "reduced" form. The reduced form is useful when deriving techniques for minimizing the number of computations required for a DFT of a sequence.

Lemma 3.3 (Alternative Phase Accumulator Expression). The phase accumulator Equation 3.4 with $N_P$ states and FCW $F$ can be written as

$$P[n] = d\langle\Lambda_P n\rangle_{\lambda_P} \tag{3.26}$$

where $d = \mathrm{GCD}(F, N_P)$, $\Lambda_P = F/d$ and $\lambda_P$ is the least period of $\langle Fn\rangle_{N_P}$, which can be calculated from Theorem 3.3.

Proof. Let $d = \mathrm{GCD}(F, N_P)$ and $\langle Fn\rangle_{N_P} = r_0$. Then by Definition 3.2, $d \mid F$ and $d \mid N_P$, and there exist integers $\Lambda_P = F/d$ and $\lambda_P = N_P/d$. By the definition of the modulo operator, there exists $c_0 \in \mathbb{Z}$ such that

$$\langle Fn\rangle_{N_P} = r_0 \Rightarrow Fn - c_0 N_P = r_0 \tag{3.27}$$

$$\Rightarrow d\Lambda_P n - c_0 d\lambda_P = r_0 \tag{3.28}$$

$$\Rightarrow \Lambda_P n - c_0\lambda_P = \frac{r_0}{d} \tag{3.29}$$

where $0 \le r_0 < N_P$.

A series is said to converge to the value $X$ if for every $\epsilon > 0$, $\epsilon \in \mathbb{R}$, there exists an integer $N > 0$ such that

$$\left|X - \sum_{i=1}^{n} x_i\right| < \epsilon \quad \text{for all } n > N.$$

Convergence is sometimes written as a single limit statement for conciseness:

$$X = \lim_{n\to\infty}\sum_{i=1}^{n} x_i. \tag{6.57}$$

For the algorithm to be useful for computing sine and cosine, $\theta^s_n$ must converge to any arbitrary angle between $0$ and $\pi/2$. More precisely, for any arbitrary $\theta \in [0, \pi/2)$ and $\epsilon > 0$ there exists an integer $N > 0$ such that

$$\left|\theta - \sum_{i=1}^{n}\delta_i\tan^{-1}(|\sigma_i|)\right| < \epsilon \quad \text{for all } n > N. \tag{6.58}$$

The properties of convergent series are set forth using the Cauchy convergence criterion [14], which is

Theorem 6.1 (Cauchy Convergence Criterion (Real)). A series

$$\sum_{i=1}^{\infty} x_i \tag{6.59}$$

converges if and only if, given any $\epsilon > 0$, $\epsilon \in \mathbb{R}$, there exists an integer $N$ such that

$$|x_{m+1} + \cdots + x_n| < \epsilon \quad \text{for all } n > m \ge N. \tag{6.60}$$

Let us consider the sequences for which the series converges.
This can be inferred from the Cauchy convergence criterion or proved simply as below:

Lemma 6.1 (Sequences for Convergent Series). If the sum of $\{x_1, x_2, \ldots\} \subset \mathbb{R}$ converges, then

$$\lim_{n\to\infty} x_n = 0 \tag{6.61}$$

Proof. Let $X = \lim_{n\to\infty}\sum_{i=1}^{n} x_i$. Note that $X = \lim_{n\to\infty}\sum_{i=1}^{n+1} x_i$ also. Then

$$\lim_{n\to\infty}\left(\sum_{i=1}^{n+1} x_i - \sum_{i=1}^{n} x_i\right) = \lim_{n\to\infty} x_{n+1} = 0 \tag{6.62}$$

since the limits of both series are the same.

Note that the converse of Lemma 6.1 does not guarantee convergence of the series, but Theorem 6.1 does. The harmonic series is a good example of a series whose elements approach zero toward infinity but whose sum diverges. From the previous lemma,

$$\lim_{n\to\infty}\delta_n\tan^{-1}(|\sigma_n|) = \lim_{n\to\infty}\tan^{-1}(|\sigma_n|) = 0 \tag{6.63}$$

The sign $\delta_n$ has no impact on the convergence of the sequence to zero. So the $\sigma_i$ must be chosen such that the inverse tangent of the sequence has a limit of zero. There are several ways to easily show this, but perhaps the easiest is to note that the Taylor series expansion of the inverse tangent leads to a small angle approximation similar to that of sine (i.e., $\tan^{-1}(x) \approx x$ for $x \ll 1$):

$$\tan^{-1}(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{2n+1}x^{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots \tag{6.64}$$

Note that the higher order $x$ terms rapidly become negligible for small $x$.

There are multiple tests that can be used to determine convergence of a series; one such test is the ratio test [14]. If

$$\lim_{n\to\infty}\left|\frac{\tan^{-1}(\sigma_{n+1})}{\tan^{-1}(\sigma_n)}\right| < 1 \tag{6.65}$$

then the series converges absolutely. It has already been shown that $\tan^{-1}(\sigma_i)$ is required to approach zero for large values of $i$. Using the small angle approximation as $n$ tends toward infinity, Equation 6.65 can be rewritten as

$$\lim_{n\to\infty}\left|\frac{\sigma_{n+1}}{\sigma_n}\right| < 1 \tag{6.66}$$

Now it need only be shown that the algorithm described in Definition 6.2 can converge to any angle between some convergence limits $\theta_{max}$ and $\theta_{min}$. This would complete the description of an algorithm capable of simultaneously computing a scaled sine and cosine to arbitrary precision given enough iterations. In order to cover the entire interval, a second, less obvious property must be met:

$$|\sigma_n| \le \sum_{i=n+1}^{\infty}|\sigma_i| \tag{6.67}$$

For the series of $\sigma_i$ to take any value on the interval, the size of the previous step must be less than or equal to the sum of the remaining steps. This is more obvious when shown graphically, as in Figure 6.18. Unless the sum of the remaining steps is equal to or greater than the size of the previous step, there is a gap in the obtainable values from the algorithm. A gap in $\sigma_i$ would result in a gap in $\tan^{-1}(\sigma_i)$, and there would be unobtainable angles of rotation.

Figure 6.18: CORDIC Coverage Requirement

The CORDIC algorithm works by keeping track of the angle of rotation at each iteration to determine if the rotations have passed the desired target angle. In CORDIC literature, the variable $z_i$ is used to denote the angle information at iteration $i$. One could start $z_0$ at zero, sum the $\Delta_i$ components, and compute $\delta_i$ by subtracting $\theta$ from the running sum to determine the direction of rotation. A more efficient approach is to start $z_0 = \theta$ and subtract the $\Delta_i$ while checking the sign of the residual angle at each iteration. This is how $\delta_i$ is selected. Define the iterative relationship

$$z_{i+1} = z_i - \Delta_i = z_i - \delta_i\tan^{-1}(|\sigma_i|) \tag{6.68}$$

where $\delta_i = 1$ if $z_i \ge 0$ and $\delta_i = -1$ otherwise. At the $n$th iteration,
n?1summationdisplay i=0 ?i tan?1 (|?i|) = z0??sn (6.69) If zn converges to zero at the limit for n as it approaches infinity, then ?sn = z0 and from Equations 6.54 and 6.55 one sees that the algorithm computes the sine and cosine of ?sn. Clearly then, proving that zn converges to zero from a given z0 = ? is sufficient to prove convergence for the CORDIC algorithm. Theorem 6.2 (CORDIC Convergence Theorem). Let tan?1 (?1),tan?1 (?2),... be a se- quence of real numbers whose series is convergent by the Cauchy convergence criterion for 163 series (Theorem 6.1). If vextendsinglevextendsingle vextendsingletan?1 (?n) vextendsinglevextendsingle vextendsingle? ?summationdisplay k=n+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.70) for any n?P, then for any ??R constrained by ? ?summationdisplay i=0 vextendsinglevextendsingle vextendsingletan?1 (?i) vextendsinglevextendsingle vextendsingle??? ?summationdisplay i=0 vextendsinglevextendsingle vextendsingletan?1 (?i) vextendsinglevextendsingle vextendsingle (6.71) The CORDIC z recurrence relation when seeded with z0 = ? converges, i.e. limn??zn = 0 (6.72) where zn is defined by Equation 6.68. Proof. Let ? be chosen according to Equation 6.71. Let S be a non-empty subset of P0. Let us show that ? ?summationdisplay k=n vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?zn? ?summationdisplay k=n vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.73) for all n by induction. Since the series ?summationdisplay i=1 tan?1 (?i) (6.74) converges by the Cauchy criterion (Theorem 6.1), for any arbitrary ?> 0, ??R there exists an integer N > 0 such that vextendsinglevextendsingle vextendsingletan?1 (?m+1) +???+ tan?1 (?n) vextendsinglevextendsingle vextendsinglem?N. This implies limn?? 
?summationdisplay k=n vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle = 0 (6.76) and thus showing Equation 6.73 to be true would prove convergence by making the upper and lower limit of zn equal to zero. First let us check that 0?S by setting n = 0 and using Equation 6.73. ? ?summationdisplay k=0 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?z0 ? ?summationdisplay k=0 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.77) Since z0 = ? and ? has been chosen according to Equation 6.71, 0 ?S. Now we take the induction step. Assume x?S. It must be shown that x+ 1?S by showing that ? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?zx+1 ? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle. (6.78) zx+1 is computed as follows: zx+1 = zx??x tan?1 (|?x|) (6.79) Since we have assumed x?S, ? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle?zx? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.80) There are two cases to examine, zx ? 0 for which ?x = 1, ?x ? 0 and zx < 0 for which ?x =?1, ?x < 0 . Assume zx?0, then ?x = 1 and zx+1 = zx?tan?1 (?x) = zx? vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.81) 165 The upper bound is found by substituting the upper bound of Equation 6.80 into Equation 6.81 and adjusting the equality to an inequality. zx+1 ? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle? vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.82) ? 
?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.83) The lower bound is found by substituting Equation 6.70 into Equation 6.81 and adjusting the equality to an inequality. zx+1 = zx? vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle?zx+1 ?zx? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.84) But zx is positive or zero, so zx+1 ?? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.85) and the zx+1 is bounded for the case zx?0. Now consider zx < 0, then ?x =?1. zx+1 = zx + vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.86) ThelowerboundiscomputedbysubstitutingthelowerboundofEquation6.80intoEquation 6.86 zx+1 ?? ?summationdisplay k=x vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle+ vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle (6.87) ?? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.88) The upper bound is computed by substituting Equation 6.70 into Equation 6.86 zx+1 = zx + vextendsinglevextendsingle vextendsingletan?1 (?x) vextendsinglevextendsingle vextendsingle?zx+1 ?zx + ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.89) 166 But zx is negative, so zx+1 ? ?summationdisplay k=x+1 vextendsinglevextendsingle vextendsingletan?1 (?k) vextendsinglevextendsingle vextendsingle (6.90) Thus x+ 1?S and the proof is complete. Note that this proof for the CORDIC algorithm is different from Walther?s proof (per- haps a little more formal) [54]. As far is the author is aware, this is an original proof. 
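Theorem 6.2 can be checked numerically. The Python sketch below runs only the z-path recurrence of Equation 6.68, using the hardware-friendly choice |δ_i| = 2^{−i} (introduced formally in the next section), and confirms that the residual angle shrinks toward zero for any seed inside the convergence interval:

```python
import math

def z_residual(theta, n_iter=32):
    """Run the z-path recurrence z_{i+1} = z_i - sigma_i * atan(|delta_i|)
    of Equation 6.68 with |delta_i| = 2**-i; return the residual angle."""
    z = theta
    for i in range(n_iter):
        sigma = 1 if z >= 0 else -1
        z -= sigma * math.atan(2.0 ** -i)
    return z

# Any seed inside (-theta_max, theta_max) drives the residual toward zero;
# after 32 iterations it is bounded by the tail sum, roughly 2**-31 rad.
theta_max = sum(math.atan(2.0 ** -i) for i in range(60))
residuals = [abs(z_residual(t)) for t in (0.0, 0.5, -1.2, 0.99 * theta_max)]
```

The residual bound after n iterations is exactly the tail sum of Equation 6.76, which is the invariant carried through the induction above.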
The conventional CORDIC algorithm has been fully derived at this point. Definition 6.2 summarizes the results of the analysis thus far.

Definition 6.2 (Conventional CORDIC Iteration). The conventional CORDIC iteration is defined as

    x_{i+1} = x_i − σ_i δ_i y_i    (6.91)
    y_{i+1} = y_i + σ_i δ_i x_i    (6.92)
    z_{i+1} = z_i − σ_i tan⁻¹(δ_i)    (6.93)

where σ_i = 1 if z_i ≥ 0 or σ_i = −1 otherwise. Performing successive CORDIC iterations to compute some desired value is called the CORDIC algorithm.

There are infinitely many δ_i that can satisfy Equation 6.66 and Equation 6.67. What is desired is an efficient hardware implementation of Equation 6.41 and Equation 6.42. As has already been mentioned in the BTM and MTM sections, a multiplication or division by 2 is merely a shift operation, which is remarkably efficient in hardware (no combinatorial logic is required). Let |δ_i| = 2^{−i}, a shift-by-i operation in hardware. Now let us check whether this series converges using the ratio test:

    lim_{n→∞} | 2^{−(n+1)} / 2^{−n} | = lim_{n→∞} 1/2 = 1/2 < 1    (6.94)

Clearly, the series converges. Now let us calculate the interval of convergence. The maximum angle of rotation is

    θ_max = Σ_{i=0}^{∞} tan⁻¹(2^{−i})    (6.95)
          = tan⁻¹(1) + tan⁻¹(1/2) + tan⁻¹(1/4) + ⋯    (6.96)
          = 1.7432865... ≈ 0.55π    (6.97)

The minimum obtainable rotation angle is θ_min = −θ_max = −1.7432865.... Since quarter sine compression is typically used in a DDFS, the 0.55π range is more than sufficient for implementing the SCMF. From this point forward, |δ_i| is assumed to be a power of 2.

6.4.2 Conventional CORDIC

Figure 6.19 shows the hardware implementation of the conventional CORDIC iteration. Thus three full adders are required. Since z_i is a two's complement number, the sign detection block is merely a check on the MSB of z_i.
The hard-wired shifts are "free" operations, requiring no additional hardware. The multiplication by −1 can be implemented using the one's complement technique described in Figure 1.2. Overall this is rather low hardware overhead for a technique that can compute a sinusoid to arbitrary precision. One of the drawbacks of conventional CORDIC implementations is the scalar K from the iterative rotations. As n tends towards infinity,

    K_∞ = Π_{i=0}^{∞} √(1 + 2^{−2i}) = 1.64676...    (6.98)

Since K_n is a function of the number of iterations, the conventional CORDIC typically uses a fixed number of iterations. One brute force method is to initialize x_0 = 1/K_n, so that the final computation results in cos(θ) and sin(θ). Other methods to correct for this unwanted scaling, and the cases in which the scaling can be ignored, will be discussed later in this chapter.

Figure 6.19: Conventional CORDIC Stage

Note that the upper bound of the remaining phase error, assuming vectoring mode operation with z_0 = θ, after n rotations is

    Σ_{i=n+1}^{∞} |tan⁻¹(2^{−i})|    (6.99)

The remaining phase error is bounded by the small angle approximation for arctangent. Figure 6.20a shows the value of arctangent at each CORDIC iteration, as well as the small angle approximation of arctangent at each value. The better question is how the quality of the small angle approximation changes with each iteration. Figure 6.20b shows this information by taking the magnitude of the error of the small angle approximation for arctangent at a given iteration and dividing it by the arctangent value at that iteration.
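Both per-iteration quantities just discussed — the scaling factor of Equation 6.98 and the approximation-quality ratio plotted in Figure 6.20b — are easy to reproduce numerically (a Python sketch; function names are ours):

```python
import math

def K(n):
    """Scaling factor K_n of Equation 6.98 after n CORDIC iterations."""
    p = 1.0
    for i in range(n):
        p *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return p

def approx_quality(i):
    """Relative error of tan^-1(2^-i) ~= 2^-i, the ratio in Figure 6.20b.
    From Equation 6.64 this is approximately (2^-i)^2 / 3."""
    a = math.atan(2.0 ** -i)
    return (2.0 ** -i - a) / a

k_inf = K(40)          # ~= 1.64676, already settled to double precision
ratio3 = approx_quality(3)   # ~= (2**-3)**2 / 3
```

The 2^{−2i} settling of K_n is what later allows the scaling to be pre-computed once and absorbed into the initialization.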
This will prove helpful in calculating upper error bounds on the phase and amplitude in the CORDIC algorithm in the following section.

Figure 6.20: arctan Small Angle Approximation. (a) Phase step (degrees), 2^{−i} and tan⁻¹(2^{−i}), versus iteration; (b) quality of the approximation, (2^{−i} − tan⁻¹(2^{−i})) / tan⁻¹(2^{−i}), versus iteration.

6.4.3 Optimizing the CORDIC Algorithm for DDFS

The conventional CORDIC algorithm, when started at zero phase and initialized such that the scaling factor is normalized, covers a phase range of −0.55π to 0.55π. Since DDFS systems use quarter wave sinusoidal compression, this is over twice the required range for convergence. We only wish to compute angles in the interval [0, π/2). This means that the CORDIC algorithm always takes the same first step, since z_0 = θ is always greater than or equal to zero. Figure 6.21a shows the upper bound of the "remaining phase error" at iteration i (computed using Equation 6.99). Starting at iteration 0, θ_0 = 0, the phase error upper boundary is the entire π/2, or 90°, range. After one iteration, the maximum phase error is 54.9°. The question remains of how x_0, y_0 and z_0 should be initialized for the optimization to work correctly. Using Equations 6.52 and 6.53, it is clear that x_0 = cos(θ_0) = cos(π/4) = √2/2 and y_0 = sin(π/4) = √2/2. These values can be pre-computed and stored in a ROM for initialization.
Using Equation 6.68, it is clear that z must be set to

    z_0 = θ − π/4    (6.100)

Figure 6.21: CORDIC Bit Resolution. (a) Remaining phase error (degrees) and phase resolution (bits) versus iteration; (b) K scaling error and amplitude resolution (bits) versus iteration.

A brute force approach would be to directly compute this number and use it as a seed for the z-path of the CORDIC. Starting at θ_0 = π/4 means that the CORDIC algorithm only needs to cover the interval [−π/4, π/4], or [−0.25π, 0.25π]. Skipping the first CORDIC stage (i.e. starting at i = 1) converges over the interval [−0.305π, 0.305π], which is sufficient to cover the required interval.

The optimization of starting the vector at an angle of π/4 and skipping the first CORDIC iteration effectively eliminates one CORDIC iteration. The number of rotations can be further reduced by extending this idea to a general purpose look-up table that initializes the CORDIC algorithm to the mth iteration. The green line of Figure 6.21a shows the number of bits of phase resolution obtained for a given upper bound of phase error. From Figure 6.20a and Figure 6.20b, along with Equation 6.64 for the small angle approximation, it is evident that each CORDIC iteration gains a single bit of phase accuracy. This can be directly related to an amplitude error and consequently an effective number of bits. Let θ_r be the remaining phase error. The sine difference formula, which has been used in nearly every chapter of this document, yields

    sin(θ ± θ_r) ≈ sin(θ) ± θ_r cos(θ)  for θ_r ≪ 1    (6.101)

The error is then θ_r cos(θ), but θ ∈ [0, π/2), and therefore the error is bounded by the maximum value of cosine over that interval, cos(0) = 1. The maximum amplitude error is then θ_r, which is precisely the remaining phase error.
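A behavioral sketch of this optimization in Python (floating point stands in for the fixed-point datapath, and names are ours): seed the vector at the quadrant midpoint, seed z with θ − π/4, skip the i = 0 stage, and pre-divide the seed by the scale factor of the remaining stages.

```python
import math

def cordic_pi4_seeded(theta, n=32):
    """CORDIC over [0, pi/2) seeded at the quadrant midpoint:
    x0 = y0 = (sqrt(2)/2)/K, z0 = theta - pi/4, stage i = 0 skipped.
    K is the scale factor of stages 1..n-1 only (Equation 6.98)."""
    K = 1.0
    for i in range(1, n):
        K *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = y = math.sqrt(2.0) / 2.0 / K
    z = theta - math.pi / 4.0
    for i in range(1, n):
        sigma = 1 if z >= 0 else -1
        x, y = x - sigma * (2.0 ** -i) * y, y + sigma * (2.0 ** -i) * x
        z -= sigma * math.atan(2.0 ** -i)
    return x, y  # converges to (cos(theta), sin(theta))

errs = [max(abs(cordic_pi4_seeded(t)[0] - math.cos(t)),
            abs(cordic_pi4_seeded(t)[1] - math.sin(t)))
        for t in (0.0, 0.3, math.pi / 4, 1.2, 1.57)]
```

The residual amplitude error tracks the residual phase error, consistent with the bound derived from Equation 6.101.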
So the green line of Figure 6.21a also predicts the number of bits of amplitude resolution provided by the CORDIC after i error-free iterations. Figure 6.22 shows how x_0 and y_0 are derived by sending the MSBs of the DDFS phase word to a look-up table. Assume that B_LUT bits are used for the look-up table. Then the π/2 CORDIC search range is reduced to:

    θ_ROT = π / 2^{B_LUT + 1}    (6.102)

If an offset is introduced into the LUT in the same manner as described in the π/4 optimization offset discussion, an extra iteration can be removed. Thus a six-bit LUT with a half-LSB offset eliminates 7 CORDIC operations; that is to say, the first iteration would be i = 8 using Figure 6.21a.

An interesting phenomenon happens with the scaling factor when initializing the CORDIC algorithm this way. Figure 6.21b shows the scaling factor value at the ith iteration. The green line shows the number of amplitude bits of resolution required at the output before the K_n scaling factor introduces an error above quantization. From Equation 6.44, it should come as no surprise that the amplitude error decreases at a rapid rate, particularly with the 2^{−2i} rate of decrease in error per term.

6.4.4 Partial Dynamic Rotation CORDIC

The generalized PDR CORDIC architecture with support for conventional CORDIC stages is shown in Figure 6.22. This component is used in the radar DDFS system described in this chapter. Consider a conventional CORDIC with θ_0 = 0 and let θ = 5°. The first step would be tan⁻¹(2^0) = 45°. This step far overshoots the desired angle of rotation. There is enough information to avoid this overshoot, because the starting angle is known, the desired angle is known, and the amount of rotation for a given 2^{−i} is known. Keep in mind that θ is a binary word. Looking at the MSBs of the word gives an idea of how much rotation is required. In this case, starting at iteration i = 3 yields tan⁻¹(2^{−3}) = 7.125°.

Figure 6.22: PDR CORDIC Architecture

One of the major drawbacks of the PDR CORDIC is that K_n for a fixed number of stages changes based on the requested angle θ. It has already been shown in Figure 6.21b that if the CORDIC is initialized before rotations begin, the impact of K_n on the magnitude of the output is negligible. Figure 6.23 shows the block diagram for a PDR CORDIC rotation stage. The dynamic rotation selection (DRS) logic shown in Figure 6.23 becomes quite simple if the CORDIC stages are seeded with enough resolution. Recall from the small angle approximation in Figure 6.20a that tan⁻¹(2^{−i}) ≈ 2^{−i} for large i. Then the check to find the appropriate rotation angle is merely a check on the MSB position of the remaining phase (z_i). This is because we are comparing against a 2^{−i} number, which is a value represented by a single bit.

Figure 6.23: PDR CORDIC Stage Design

Table 6.3: Summary of DDFS Designs

    Design         B_A  B_P  SINAD  SFDR (W.C.)  Frequency  Area
    BTM            12   32   66     70 dBc       680 MHz    0.013 mm²
    MTM            11   32   N/A    58 dBc       1.0 GHz    0.008 mm²
    CORDIC (RoC)   12   32   64     66 dBc       1.1 GHz    0.011 mm²
    CORDIC (ORA)   12   32   73     78 dBc       680 MHz    0.054 mm²

Table 6.3 summarizes the various designs implemented and fabricated. In all cases, the SFDR and SNR of the DAC were several orders of magnitude worse than those of the digital code word produced by the DCDO. Therefore the frequency is a measured operating frequency, but the SINAD and SFDR are HDL-simulated DCDO outputs.

6.5 Stretch Processing DDFS Architecture

One pulse compression technique for which a DDFS with LFM is well suited is stretch processing [42].
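The dechirp principle at the heart of stretch processing — mixing the received echo against a reference chirp of the same rate so that target delay maps to a constant beat frequency — can be sketched numerically. All parameter values below are illustrative, not the implemented radar's:

```python
import numpy as np

fs = 1.0e9            # sample rate (Hz), illustrative
gamma = 1.0e12        # chirp rate (Hz/s), illustrative
tau = 200e-9          # round-trip delay of the target echo (s)
t = np.arange(0, 50e-6, 1 / fs)

rx = np.exp(1j * np.pi * gamma * (t - tau) ** 2)   # delayed echo chirp
ref = np.exp(1j * np.pi * gamma * t ** 2)          # "destretch" reference chirp
beat = rx * np.conj(ref)                           # mixer (dechirp) output

# The beat phase is pi*gamma*(tau**2 - 2*tau*t), linear in t, so the
# instantaneous frequency is the constant -gamma*tau (range maps to tone).
phase = np.unwrap(np.angle(beat))
f_inst = np.diff(phase) / (2 * np.pi) * fs
beat_err = np.max(np.abs(f_inst + gamma * tau))
```

With these values the beat tone sits at γτ = 200 kHz; a longer delay (more distant target) moves the tone proportionally higher, which is why the destretch bandwidth, not the transmit bandwidth, sets the DDFS requirement.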
In fixed chirp rate stretch processing, a high bandwidth linear chirp of chirp rate γ with a fixed time length T_TX is transmitted into the environment. The transmit period multiplied by the chirp rate yields the effective bandwidth of the transmitted chirp (β_TX). This bandwidth sets the range resolution (i.e. the radar's ability to uniquely distinguish closely spaced targets). A first order approximation of the range resolution R_RES of a stretch processing system is given in Equation 6.25.

During reception of the signal reflected from a target, a "destretch" signal of the same chirp rate γ as the transmitted signal, but with a longer time duration T_RX and consequently wider bandwidth, is used to demodulate the signal. The difference in time duration between the transmit and receive chirps sets the range interval of the radar. The range interval is the "window" through which the radar can detect objects. It is the bandwidth of the destretch signal (β_RX), and not that of the transmitted pulse, that sets the system bandwidth requirement for the DDFS.

Figure 6.24 shows the top level DDFS architecture used for the radar system. The components of the radar directly related to this dissertation are the DDFS and the corresponding control circuitry. Two DACs, for in-phase and quadrature phase sinusoid generation, are implemented in the DDFS system.

Figure 6.24: Block Diagram for Radar DDFS

The additive dithering technique that was used in both the MTM DDFS and BTM DDFS was also employed for the RoC DDFS. However, the phase truncation spurs rested so far beneath the noise floor of the DAC that no spurious improvement was detected. Figure 6.25 is the die photo of the RoC chip, zoomed in around the DDFS.

Figure 6.25: Die Photograph of RoC (DDFS Zoomed)
The DAC and DCDO/SPI blocks are labelled to help make sense of the silicon.

6.5.1 Inverse Sinc Filter

The high speed DAC implementation used in the DDFS inherently applies a zero order hold (ZOH) operation to the output waveform. The ZOH transfer function is a sinc function in the frequency domain; its origin is described in Section 7.2. As stated in Section 6.5, the DDFS generates a wide bandwidth "destretch" chirp signal that performs the pulse compression step of the radar. It is important that the amplitude of the generated chirp not fluctuate with frequency (and hence distort the radar measurement). One solution [56] is to apply an inverse sinc operation using a finite impulse response (FIR) filter to shape the waveform before sending it to the DAC. In the DDFS, two FIR filters with 9-bit coefficient resolution, one for the I-path and one for the Q-path, were implemented after the PDR CORDIC. Pipelining the FIR filter was essential to allow it to reach 1 GHz operation. Figure 6.26 shows a block diagram of the inverse sinc filter component as implemented, where c_0 = −1, c_1 = 4, c_2 = −16, and c_3 = 192 for nine-bit coefficient resolution. Note that after each addition the result is stored in a pipeline register.

Figure 6.26: Inverse Sinc FIR Filter (Block Diagram)

The coefficients from Samueli's work were verified to be optimal in a minimax sense using a linear programming (LP) algorithm. An LP algorithm for finding optimal coefficients was developed in Python. The coefficients were found to be in full agreement with previously published work [56]. The measured results match the theory quite well (Figure 6.31a). To test the filters, a 90 MHz sweep with the clock frequency at 200 MHz was generated using the DDFS.
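The compensation these coefficients provide can be checked numerically. The sketch below assumes a 7-tap symmetric impulse response with c_3 as the center tap and a unity-DC normalization; both the tap ordering and the normalization are our assumptions, not stated in the text:

```python
import numpy as np

# 7-tap symmetric inverse-sinc FIR built from the stated 9-bit coefficients
# c0=-1, c1=4, c2=-16, c3=192 (center tap), normalized to unity DC gain.
h = np.array([-1, 4, -16, 192, -16, 4, -1], dtype=float)
h /= h.sum()

f = np.linspace(1e-6, 0.4, 400)                    # fraction of f_clk
E = np.exp(-2j * np.pi * np.outer(f, np.arange(7)))
H = np.abs(E @ h)                                  # FIR magnitude response
zoh = np.abs(np.sinc(f))                           # ZOH droop, sin(pi f)/(pi f)
flatness = np.max(np.abs(H * zoh - 1))             # residual ripple after correction
```

The cascade H(f)·sinc(f) stays flat to within roughly 0.1 dB out to 0.4·f_clk, versus about 2.4 dB of uncorrected ZOH droop at the same frequency, which is consistent with the marker measurements reported below.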
Four markers were placed equally across the waveform and the amplitude was measured. With the inverse sinc filter deactivated, the measured values across the spectrum were −23.60 dBm, −24.03 dBm, −24.87 dBm and −26.37 dBm. This roll-off indicates that the DACs do indeed exhibit ZOH behavior. Next, the same 90 MHz sweep with the clock frequency at 200 MHz was generated with the inverse sinc filter active. The four markers then read −27.24 dBm, −27.17 dBm, −27.09 dBm and −27.30 dBm. Here the gain variation is less than 0.21 dB.

6.5.2 Radar Controller

The DDFS is tightly integrated with a digital radar controller. Several default modes of operation are programmable for the DDFS depending on the radar operating environment. The default mode is stretch processing mode for longer range target acquisition. A BPSK mode is available for detecting objects close to the radar, with built-in Barker code modulation schemes. This mode is required because the RoC chip must operate in half-duplex mode from a single antenna, and thus a long chirp would prevent detection of close-in targets. There are also QPSK and general LPM modes for experimenting with different detection techniques in the lab. Several device characterization and test modes for the DACs, filters and DCDO were also implemented, such as a single tone mode.

The basic operation of the radar transceiver in stretch processing mode is described by the following algorithm:

1. Initialize common analog components such as the PLL and bandgaps.
2. Deactivate the analog receiver circuitry.
3. Activate the analog transmitter circuitry.
4. Load the transmitter frequency control words into the start frequency, stop frequency and step frequency DDFS registers.
5. Clock the start frequency state into the frequency accumulator. Clock the start phase state into the phase accumulator.
6. Run the DDFS until the transmitter stop frequency control word is reached.
7. Load the transmitter timer, store the old receiver wait time and start waiting.
8. While waiting, activate the analog receiver circuitry.
9. Deactivate the analog transmitter circuitry.
10. Load the receiver frequency control words into the start frequency, stop frequency and step frequency DDFS registers.
11. Run the DDFS until the receiver stop frequency control word is reached.
12. Load the receiver timer, store the previous transmitter wait time, and start waiting.
13. Proceed to state (2).

The LPM modes operate with a similar algorithm, except that all the LFM behavior is deactivated.

6.6 Design of a 12-bit CMOS DAC

Two fully differential, current steering, 12-bit, 1 GHz CMOS DACs convert the digital output of the DCDO to a voltage. The DACs use a segmented architecture with 6 bits of thermometer coding for the MSBs and 6 bits of binary coding for the LSBs. Figure 6.27 is a block diagram of the DAC. The output of the DAC has 20 dB of digitally controlled, programmable gain. The gain is programmed by modifying the value of the current reference, thus reducing the magnitude of the current through the current steering switches. This reduces the DAC's operating frequency, as described in Section 7.5.1. The DAC uses a triple-centroid switching scheme, made popular in [57], that randomizes spurs due to current cell mismatch. The clock tree is a balanced H-tree in an attempt to minimize clock skew along the wire paths.

Figure 6.27: Block Diagram of 12-Bit CMOS DAC

A single base transistor size was chosen such that the variation exhibited through Monte Carlo simulations kept the DAC DNL within bounds. Figure 6.28 demonstrates how the single unit transistor is used to build the current source network.
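The 6+6 segmentation of Figure 6.27 can be sketched behaviorally. The decode below (names ours) maps a 12-bit code to the 63 thermometer cells T[62:0] and the binary cells B[5:0], and checks that the cell weights reconstruct the code:

```python
def dac_segment_decode(code):
    """Segmented decode sketch for the 12-bit DAC of Figure 6.27:
    upper 6 bits drive 63 thermometer cells (weight 64 each),
    lower 6 bits drive binary-weighted cells (weights 32..1)."""
    assert 0 <= code < 4096
    msb, lsb = code >> 6, code & 0x3F
    thermo = [1 if i < msb else 0 for i in range(63)]   # T[62:0]
    binary = [(lsb >> b) & 1 for b in range(6)]         # B[5:0]
    # Reconstruct the total weight contributed by all enabled cells:
    total = 64 * sum(thermo) + sum(bit << b for b, bit in enumerate(binary))
    return thermo, binary, total
```

Because a thermometer step only ever enables one additional cell, the MSB segment is monotonic by construction, which is the usual motivation for spending decoder area on it.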
The thermometer coded current sources have a cascode transistor added to increase the output impedance of the DAC. The DAC also implements custom high speed latches that convert the single-ended digital input to a differential signal.

Figure 6.28: DAC Current Source Sizing

Figure 6.29: Synchronization Circuit for 12-Bit CMOS DAC

These latches aid in synchronizing the digital bits sent to the DAC from the synthesized digital component. The synchronization reduces timing mismatch and consequently improves the spurious response of the DAC. The total area of the DAC, including the digital front-end, is 400 μm × 500 μm. The SFDR of the DAC is approximately 55 dBc (better than 60 dBc at certain frequencies) through about two thirds of the Nyquist frequency. The measured narrowband noise, where narrowband means measured below the third harmonic of the fundamental tone, is approximately 90 dBc.

The DAC high speed clock distribution network uses an H-tree, as shown in Figure 6.30. This technique equalizes the static delay between current steering cells on the clock distribution network. Any static mismatch will directly translate into deterministic spurs when a periodically varying signal drives the DAC.

Figure 6.30: Clock Tree for 12-Bit CMOS DAC

6.7 Measurements

The performance of the DDFS is summarized in Table 6.4. A previous implementation showed that the digital component can run without errors up to 1.1 GHz. However, due to process manufacturing issues with this particular run, the system could only run at 650 MHz. Another version will be resubmitted without modification that should allow it to reach the proper operating frequency.
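The role of the Monte Carlo-bounded unit mismatch mentioned above can be illustrated behaviorally. In the sketch below (a 2% unit-current sigma is illustrative, not the fabricated value), each thermometer code step adds exactly one unit cell, so the segment is monotonic and its DNL is bounded near the single-cell mismatch; binary weighting, by contrast, switches many cells at once at major carries, which is why the MSBs are thermometer coded:

```python
import numpy as np

rng = np.random.default_rng(1)
units = 1.0 + 0.02 * rng.standard_normal(63)       # 63 mismatched unit cells

# Thermometer transfer curve: each step enables one more unit cell.
thermo_out = np.concatenate(([0.0], np.cumsum(units)))

def dnl(out):
    """DNL in LSB, using the endpoint-fit LSB size."""
    lsb = (out[-1] - out[0]) / (len(out) - 1)
    return np.diff(out) / lsb - 1.0

thermo_dnl = np.max(np.abs(dnl(thermo_out)))
```

With per-cell mismatch this small the thermometer segment cannot miscode, whereas a binary major carry error is a sum over dozens of mismatched cells and typically dominates the DNL of an unsegmented design.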
The static DNL and INL of the DAC cannot be measured, as the inputs of the DAC are not accessible from the pad frame of the chip. The SFDR, however, indicates that the DAC performs poorly in INL. This is likely due to the programmable gain stage of the DAC, as the third order harmonic was strongly dependent on the gain state.

Table 6.4: DDFS Performance Summary

    Parameter                              Value
    f_clk                                  650 MHz
    SFDR (low)                             55 dBc (at 1.26 MHz)
    SFDR (mid)                             60 dBc (at 88 MHz)
    SNR (narrowband)                       91 dBc
    Power (Analog)                         150 mW
    Power (Digital)                        700 mW
    DAC Area                               400 μm × 500 μm (×2)
    Digital Area (inc. SPI/control logic)  400 μm × 800 μm

Lastly, Figure 6.31a shows a 100 MHz chirp with the inverse sinc filter activated. The attenuation at low frequencies is from the test setup, in which the output of the packaged RoC chip is AC coupled to the spectrum analyzer. Figure 6.31b shows the waveform without the inverse sinc filter activated. Because of the scale of the waveforms, it is difficult to visually determine the impact of the filter. However, looking at the readings from the markers, the impact of the inverse sinc filter is more easily understood. The output spectrum with the inverse sinc filter activated does not attenuate at higher frequencies. The deviation between the waveforms becomes even more pronounced as the output of the DAC approaches the Nyquist frequency.

Figure 6.31: Inverse Sinc Filter. (a) Chirp with inverse sinc on; (b) chirp with inverse sinc off.

Figure 6.32 shows the DDFS operating in single tone mode. The main tone is located at 85.8 MHz, and the largest spur, the fourth order harmonic of the main tone at 343.2 MHz, is 57 dB down. This tone is measured before the low pass filter. This work briefly describes a fully functional DDFS for stretch processing radar applications.
The performance of the digital logic of the DDFS is competitive with other published DDFS implementations given the feature size of the technology, particularly when frequency resolution and features are considered.

Figure 6.32: DDFS with Single Tone Output

Chapter 7
Digital-to-Analog Converters (DAC)

As digital circuitry evolves to smaller geometry nodes that yield higher speeds, lower power and smaller area, more functionality is relegated to the digital processing domain. The transition from the discrete-time digital domain to the continuous-time analog domain is performed by a digital-to-analog converter (DAC). In modern designs, the performance of the DAC dominates the performance of the DDFS [4], since small feature size CMOS allows spectrally pure digital sinusoids to be generated with little overhead at sufficiently high speeds. It is not uncommon for the digital SFDR and SNR to be 10 to 20 dB better than what can be achieved by a DAC in the same technology. The term DAC is general, spanning small devices that tune static circuit parameters to massive RF DACs such as those designed by Analog Devices [58] or e2v [43]. The DACs discussed in this thesis are >1 GSample/s current steering (CS) designs. The design issues discussed include clock and data timing errors and frequency dependent non-linearities that do not plague DC DACs. However, the discussion of current source mismatch, static INL, static DNL and segmented architectures is relevant to DC CS DACs as well as high speed CS DACs.

7.1 Basic Sampling Theory

The DDFS is a sampled-data system, and thus a brief explanation of the sampling process is beneficial in understanding the behavior of the device. It will also benefit the later analysis in Section 7.2, in which different DAC switching schemes are discussed. An important operator must be defined to aid in the discussion of sampling theory, namely, the Dirac delta "function."
Here the required mathematical theory to formalize the Dirac delta as a distribution is ignored and a more axiomatic treatment is provided [59]. Definition 7.1 (Dirac Delta). The Dirac delta in a one-dimensional real space can be defined as a heuristic function, symbollicaly denoted as ?(x), such that ?(x) , ?? ?? ??? +?, x = 0 0, xnegationslash= 0 (7.1) and satisfies the identity given in Equation 7.2 integraldisplay ? ?? ?(x)dx = 1 (7.2) Of course, the following ?proofs? about the properties of the Dirac delta all leave some- thing to be desired, as the definition of the Dirac delta used in this work cannot be considered mathematically formal. However, if one assumes that the Dirac delta operates similar to a function within an integral, then these ?proofs? hold. Now consider the behavior of the Dirac delta when used in an integral, as its usefulness in mathematically describing the sample operation becomes important. integraldisplay ? ?? x(t)?(t)dt = integraldisplay ? ?? x(0)?(t)dt = x(0) integraldisplay ? ?? ?(t)dt = x(0) (7.3) Equation7.3holdsassuming, ofcourse, thatx(t) isdefinedatzero. Theintegralofafunction multiplied by the Dirac delta takes on the value of the function at which the Dirac delta is non-zero. Ideal sampling ?captures? the value of a continuous function at an instant in time, which certainly sounds similar to the mathematical operation of the Dirac delta. If one wishes to acquire the value ofx(t) once everyT seconds, or sample the signalx(t) with an interval of T, then a series of Dirac deltas equally spaced in time is needed. Let us 185 then define a ?pulse train? of Dirac deltas as follows, ?T (t) , ?summationdisplay k=?? ?(t?kT) (7.4) where T is the period of sampling and k is an integer. It is interesting to note that ?T is periodic with T. This can be shown as by proving that ?T(t) = ?T(t+nT). ?T (t+nT) = ?summationdisplay k=?? ?(t+nT?kT) (7.5) = ?summationdisplay k=?? 
                = Σ_{k=−∞}^{∞} δ(t + (n − k)T) = Δ_T(t)      (7.6)

The last step is possible because the summation limits tend to infinity, so shifting the sequence by a finite number of periods T in either direction results in the same function. Since Δ_T(t) is periodic with T, it can be represented by its Fourier series. The real-valued Fourier series was described by Definition 2.1. Here the complex Fourier series is used to simplify notation.

    Δ_T(t) = Σ_{n=−∞}^{∞} [ (1/T) ∫_{−T/2}^{T/2} Σ_{k=−∞}^{∞} δ(t − kT) e^{−j2πnt/T} dt ] e^{j2πnt/T}      (7.7)

           = Σ_{n=−∞}^{∞} (1/T) e^{0} e^{j2πnt/T} = (1/T) Σ_{n=−∞}^{∞} e^{j2πnt/T}      (7.8)

The summation over k in Equation 7.7 was dropped because the Dirac delta is non-zero only for k = 0 as t varies from −T/2 to T/2. Now the Fourier transform of Δ_T(t) can be computed in a straightforward manner. This leads to one of the most interesting results in sampling theory.

    F{Δ_T(t)} = ∫_{−∞}^{∞} [ (1/T) Σ_{n=−∞}^{∞} e^{j2πnt/T} ] e^{−j2πft} dt
              = (1/T) Σ_{n=−∞}^{∞} ∫_{−∞}^{∞} e^{−j2πt(f − n/T)} dt
              = (1/T) Σ_{n=−∞}^{∞} δ(f − n/T)      (7.9)

So the Fourier transform of the Dirac comb function is another Dirac comb, but in the frequency domain. The spacing of the impulses is the sampling frequency (1/T) of the original sampling operation. One of the theorems related to sampling, fundamentally important to DAC design and universally taught to electrical engineering students, is the Nyquist-Shannon sampling theorem. The first publication of this powerful theorem as it relates to the field of communication is provided by Shannon in 1948 [60]. The Nyquist-Shannon sampling theorem as given by Bernard Widrow [61] is described in Theorem 7.1.

Theorem 7.1 (Nyquist-Shannon Sampling Theorem). If the sampling radian frequency ω_s is high enough so that

    |X(jω)| = 0 for |ω| ≥ ω_s/2      (7.10)

where X(jω) is the CTFT of x(t), then the sampling condition is met, and x(t) is perfectly recoverable from its samples.
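The comb-to-comb transform pair of Equation 7.9 has an exact discrete analogue that is easy to check numerically. The following is an illustrative Python/NumPy sketch, not part of the original derivation; the lengths N and M are arbitrary choices.

```python
import numpy as np

# A periodic impulse train of period M samples, analogous to the Dirac comb
# Delta_T(t) of Equation 7.4 (N and M are arbitrary illustrative choices).
N, M = 64, 8
comb = np.zeros(N)
comb[::M] = 1.0                  # impulses at n = 0, M, 2M, ...

spectrum = np.fft.fft(comb)

# Per the discrete analogue of Equation 7.9, the non-zero bins occur every
# N/M = 8 bins, i.e. the spectrum is again an impulse train.
nonzero = np.flatnonzero(np.abs(spectrum) > 1e-9)
print(nonzero)                                  # [ 0  8 16 24 32 40 48 56]
print(np.allclose(spectrum[nonzero], N / M))    # True
```

Shrinking the time-domain spacing M widens the frequency-domain spacing N/M, mirroring the 1/T relationship in Equation 7.9.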
In more common parlance, Theorem 7.1 states that a bandlimited signal x(t) can be perfectly reconstructed from its samples if the sample rate is at least twice the bandwidth. In the following section, the foundations laid by the Nyquist-Shannon sampling theorem will be built upon.

7.2 DAC Fundamentals

As part of this work, we will briefly discuss the fundamentals of DAC behavior. A DAC transforms a digital code word into a physical, electrical quantity. Typically this electrical quantity is a voltage (i.e. a low impedance output) or a current (i.e. a high impedance output). After the DAC generates the physical quantity, it is filtered by a low pass analog reconstruction filter or something that approximates such a filter. This need not be the case, however, as output signals with frequencies higher than the first Nyquist zone can be synthesized by applying a bandpass filter to DACs with certain types of responses. The following analysis makes heavy use of convolution, so in keeping with the spirit of this thesis, it is presented here along with one of its more important properties (Theorem 7.2) [59].

Definition 7.2 (Convolution). The convolution of f(t) and g(t), denoted (f ⋆ g)(t), is defined mathematically as

    (f ⋆ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ      (7.11)

Theorem 7.2 (Fourier Convolution Theorem). Let x(t) and y(t) be continuous functions of t. Then

    F{(x ⋆ y)(t)} = F{x(t)} · F{y(t)}      (7.12)

    F{x(t) · y(t)} = F{x(t)} ⋆ F{y(t)}      (7.13)

This is generally referred to as the convolution theorem.

Proof. Let x(t) and y(t) be continuous functions whose Fourier transforms exist. Then the Fourier transform of the convolution of x(t) and y(t) is

    F{(x ⋆ y)(t)} = ∫_{−∞}^{∞} (x ⋆ y)(t) e^{−jωt} dt = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} x(τ) y(t − τ) dτ ] e^{−jωt} dt      (7.14)

The order of the integral operations can be rearranged, provided that Fubini's theorem [62] is satisfied by the double integral of Equation 7.14, to yield
    F{(x ⋆ y)(t)} = ∫_{−∞}^{∞} x(τ) [ ∫_{−∞}^{∞} y(t − τ) e^{−jωt} dt ] dτ      (7.15)

Substituting r = t − τ for t into Equation 7.15 and noting that dt = dr gives the final result

    F{(x ⋆ y)(t)} = ∫_{−∞}^{∞} x(τ) [ ∫_{−∞}^{∞} y(r) e^{−jωr} e^{−jωτ} dr ] dτ
                  = [ ∫_{−∞}^{∞} x(τ) e^{−jωτ} dτ ] [ ∫_{−∞}^{∞} y(r) e^{−jωr} dr ]
                  = F{x(t)} · F{y(t)}      (7.16)

The ideal DAC response is a series of weighted impulses [63]

    y_IDEAL(t) = Σ_{n=−∞}^{∞} α x[n] δ(t − nT) + β      (7.17)

where δ(t) is the Dirac delta function defined in Equation 7.1, α is the gain of the DAC (see Section 7.3.1) and β is the offset of the DAC (also see Section 7.3.1). Here x[n] is understood to be the value of the signal x at time nT, and thus x(nT) = x[n]. As the DAC metrics considered in this work are measures of the effects of non-linearity on the input signal, the linear terms α and β can be ignored by setting α = 1 and β = 0. Note that Equation 7.17 can be derived from the multiplication of the signal with the Dirac comb of Equation 7.4 in Section 7.1, ignoring the DC offset β and the linear gain.

    x(t) · Δ_T(t) = x(t) Σ_{n=−∞}^{∞} δ(t − nT) = Σ_{n=−∞}^{∞} x(t) δ(t − nT)      (7.18)

In communications, the spectrum of the generated signals is critical, and as DACs are the central actuators for such systems, it is also critical in the analysis of this chapter. Thus the mathematical tool for computing the spectrum of continuous functions is presented. The continuous time Fourier transform (CTFT) is given in Equation 7.19,

    X(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt      (7.19)

    X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt      (7.20)

where x(t) is a complex-valued function of time, ω ∈ R is the continuous-time angular frequency and X(ω) is the transformed signal, which is generally complex. The second equation expresses the Fourier transform as a function of the ordinary frequency f. The CTFT, as with other variants of the Fourier transform, is invertible. The inverse (backwards) transform
Theinverse(backwards)transform is given by Equation 7.21. x(t) = 12pi integraldisplay ? ?? X (?)ej?td? (7.21) As an example, consider the following rectangle function (Equation 7.22, also known as the normalized box car function. Figure 7.1a shows the time domain response of the the rectangle function. rect(x) = ?? ???? ?? ???? ??? 0 |x|> 0.5 0.5 |x|= 0.5 1 |x|< 0.5 (7.22) 190 0 0.2 0.4 0.6 0.8 1 -1 -0.5 0 0.5 1 rect (x/T s) x (a) Rectangle Function (Time) -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -1 -0.5 0 0.5 1 R(2 pif ) Ordinary Frequency (Hz) Ts = 1T s = 2T s = 4 (b) Rectangle Function (Spectrum) Figure 7.1: Rectangle Function Plots Since we will soon be discussing sampling, we will scale the rectangle function to a single sample period Ts. We can then apply the continuous time Fourier Transform (7.19) on the modified rectangle function. R(?) = integraldisplay ? ?? rect parenleftbigg t Ts parenrightbigg e?j?tdt = integraldisplay 0.5Ts ?0.5Ts 1e?j?tdt = 1?j? parenleftBig e?j?t parenrightBigvextendsinglevextendsingle vextendsingle0.5Ts?0.5T s = 1?j? bracketleftBig e?0.5j?Ts?e0.5j?Ts bracketrightBig (7.23) 191 Applying Euler?s formula (Equation 1.31) to Equation 7.23, the expression simplifies to a scaled sinc function. R(?) = 1?j? bracketleftBig e?0.5j?Ts?e0.5j?Ts bracketrightBig = 1?j? bracketleftBigg cos parenleftBigg ?Ts?2 parenrightBigg +jsin parenleftBigg ?Ts?2 parenrightBigg ?cos parenleftBigg Ts?2 parenrightBigg ?jsin parenleftBigg Ts?2 parenrightBiggbracketrightBigg = 1?j? bracketleftBigg ?2jsin parenleftBigg Ts?2 parenrightBiggbracketrightBigg = sin parenleftBig Ts?2 parenrightBig ? 2 (7.24) This analysis will become important in following paragraphs when the output spectrum of various DACs are considered. The unnormalized sinc function is defined as sinc(x) , sin (x)x (7.25) and is both non-causal and infinite. Setting Ts = 1, it follows that R(?) = sinc(?/2). Figure 7.1b shows the frequency response of the rectangular function for various Ts. 
Now we compute the Fourier transform of the more complex y_IDEAL(t) of Equation 7.17.

    F_CTFT{y_IDEAL(t)} = F{ Σ_{n=−∞}^{∞} x[n] δ(t − nT) }
                       = ∫_{−∞}^{∞} Σ_{n=−∞}^{∞} x[n] δ(t − nT) e^{−j2πft} dt
                       = ∫_{−∞}^{∞} [ x[0]δ(t) + x[1]δ(t − T) + x[2]δ(t − 2T) + ··· ] e^{−j2πft} dt
                       = x[0] + x[1] e^{−j2πfT} + x[2] e^{−j2πf2T} + ···
                       = Σ_{n=−∞}^{∞} x[n] e^{−j2πfnT}      (7.26)

Notice that Equation 7.26, normalizing T = 1, is exactly the same as the DTFT shown in Equation 2.31. Thus what was stated in words as time-domain sampling now has a mathematical representation. The conventional CS DAC updates the data output with a new code word once every sampling interval and holds the value until the DAC is updated again. This is sometimes called a zero-order hold or a sample and hold. A DAC that implements such a hold is called a non-return-to-zero (NRTZ) DAC. Figure 7.2a provides an example of the time domain output of an NRTZ DAC. If the DAC returns to zero (RTZ) after waiting T0 seconds, then the DAC is referred to as an RTZ DAC. Figure 7.2b shows the output of an RTZ DAC with a 50% duty cycle, which is equivalent to setting T0 = Ts/2 in Equation 7.33.

Figure 7.2: (a) Non-Return-to-Zero DAC Output (4 Bits); (b) Return-to-Zero DAC Output (4 Bits)

There are two methods for describing how the hold effect shapes the response of the DAC. As noted by Doris et al. [64], the NRTZ DAC response can be formulated as the convolution of the unit step response and a variation of the DAC response given in Equation 7.17,

    y_NRTZ(t) = u(t) ⋆ Σ_{n=−∞}^{∞} (x[n] − x[n−1]) δ(t − nT)      (7.27)

where u(t) is the unit step response, also known as the Heaviside step function, which is defined in Equation 7.28.
    u(t) = { 1,  t ≥ 0
           { 0,  t < 0      (7.28)

An alternative representation, noted by the author of this work, is the convolution of a rectangle function (Equation 7.22) of width equal to the sample period with the ideal DAC response

    y_NRTZ(t) = rect(t/Ts) ⋆ Σ_{n=−∞}^{∞} x[n] δ(t − nTs)      (7.29)

Referring back to Theorem 7.2, convolution in the time domain is equivalent to multiplication in the frequency domain (the Fourier transform domain). The Fourier transform of rect(t/Ts) was calculated in Equation 7.24. The right-hand term is simply the output spectrum of an ideal DAC (or the DTFT of the sequence synthesized by the DAC).

    F_CTFT{y_NRTZ(t)} = F{rect(t/Ts)} · F{y_IDEAL(t)}      (7.30)
                      = [ sin(Tsω/2) / (ω/2) ] · F{y_IDEAL(t)}      (7.31)

The output is therefore weighted by a sinc-shaped response. This attenuation was the reason for the inverse sinc filter of the radar DDFS, described in Section 6.5.1. Likewise, an RTZ DAC output response can be formulated as the convolution of the unit step and two time-shifted Dirac delta functions [64]

    y_RTZ(t) = u(t) ⋆ Σ_{n=−∞}^{∞} x[n] (δ(t − nT) − δ(t − T0 − nT))      (7.32)

or, again, as the convolution with a rectangle function, now of width T0 < Ts (typically Ts/2 in the literature).

    F_CTFT{y_RTZ(t)} = F{rect(t/T0)} · F{y_IDEAL(t)}      (7.33)
                     = [ sin(T0ω/2) / (ω/2) ] · F{y_IDEAL(t)}      (7.34)

From Figure 7.1b, the shorter the pulse width, the less attenuation of the output spectrum due to value holding. This means that the inverse sinc filter requirement can be removed, or the order of the filter reduced, by changing the DAC output characteristic. To quantify the effect, let T0 = Ts/2:
    F{y_RTZ(t)} / F{y_NRTZ(t)} = [ sin(T0ω/2) / (ω/2) ] / [ sin(Tsω/2) / (ω/2) ] = sin(T0ω/2) / sin(Tsω/2)      (7.35)

Performing a Taylor series expansion (Definition 4.1) on the numerator and denominator,

    sin(T0ω/2) / sin(Tsω/2) = [ T0ω/2 − T0³ω³/(8·3!) + T0⁵ω⁵/(32·5!) − ··· ] / [ Tsω/2 − Ts³ω³/(8·3!) + Ts⁵ω⁵/(32·5!) − ··· ]      (7.36)
                            = [ T0 − T0³ω²/(4·3!) + T0⁵ω⁴/(16·5!) − ··· ] / [ Ts − Ts³ω²/(4·3!) + Ts⁵ω⁴/(16·5!) − ··· ]      (7.37)

Now we substitute Ts = 2T0 into the previous equation and compute the final result.

    F{y_RTZ(t)} / F{y_NRTZ(t)} = [ T0 − T0³ω²/(4·3!) + T0⁵ω⁴/(16·5!) − ··· ] / [ 2T0 − 8T0³ω²/(4·3!) + 32T0⁵ω⁴/(16·5!) − ··· ]      (7.38)
                               = [ 1 − T0²ω²/(4·3!) + T0⁴ω⁴/(16·5!) − ··· ] / [ 2 − 2T0²ω²/3! + 2T0⁴ω⁴/5! − ··· ]      (7.39)

Since T0 and Ts are generally much less than one (a 1 GHz DAC has a 1 ns clock period), the attenuation is approximately 1/2 when returning to zero. Note that if T0 becomes sufficiently small, higher Nyquist zones of the DAC can be used.

7.3 DAC Performance Metrics

DACs operate in a wide range of environments with an equally wide range of requirements. A control DAC for a microelectromechanical system (MEMS) may need only operate at a few kilohertz sample frequency but may require a large output voltage and monotonicity over process variation. A DAC for a high-speed communications link may only require a few hundred millivolts of output swing but may have to operate up to several gigahertz. The qualities of both these DACs can be described by their static and dynamic performance. The remaining analysis of this chapter aims to aid designers in creating high performance DACs. When writing about "high performance" devices, we want to be precise in describing the measure of that performance. For the purposes of this dissertation, the static measures of concern are integral non-linearity, differential non-linearity, and static power consumption.
The dynamic measures of concern are spurious free dynamic range, signal to noise ratio, sample frequency and total harmonic distortion. These performance metrics are influenced by a wide variety of effects.

7.3.1 Static DAC Performance

Static errors are both the simplest DAC design errors to understand and the simplest to correct. They therefore serve as an adequate starting place for DAC performance analysis. It is entirely possible to degrade the overall performance of the DAC by failing to weight individual DAC elements such that the contribution of static errors to the output is zero. This can be particularly bad in high speed CS DAC designs, where device sizing is small and device mismatch becomes significant. The two static performance numbers discussed in this chapter are integral non-linearity (INL) and differential non-linearity (DNL). Both of these errors are direct causes of harmonic distortion in DACs. While not nearly as important in high-speed CS DACs as the non-linear errors, two linear errors are mentioned for completeness. Offset error is defined as the linear deviation of the DAC output from the intended output, applied identically to every DAC code. Figure 7.3a presents a graphical explanation of offset error in a DAC. Gain error is defined as the deviation of the gain of the DAC from the intended design target. Figure 7.3b provides a graphical explanation of gain error. Note that neither gain nor offset errors contribute to the spurious performance of the DAC.

Figure 7.3: Graphical Explanation of Gain and Offset Errors. (a) Offset Error (3 Bits); (b) Gain Error (3 Bits)

7.3.2 INL

Integral non-linearity (INL) is a measure of the deviation of the static transfer function of the DAC from some ideal linear transfer function.
The measure is generally normalized to the LSB value of the DAC for reporting in publications. The two most widely used methods for determining the ideal linear transfer characteristic are end-point to end-point and least-squares linear ("best") fit. Both of these techniques compensate for linear errors, typically gain and offset error, as required by the IEEE definition of INL [65]. The following is the official description of INL for an N-bit DAC.

    INL[k] = (I_out[k] − k·I_lsb) / I_lsb;    I_lsb = I_out[2^N − 1] / (2^N − 1)      (7.40)

where I_out[2^N − 1] is the maximum output of the DAC (i.e. it is assumed that the DAC output increases in magnitude with increasing k) and I_lsb is the LSB step of the DAC using the end-point to end-point line approximation. For the equation of a line, we use

    y = mx + b      (7.41)

where m is the slope of the line and b is the y-intercept. For the DAC, x is the code word, m is the LSB value of the DAC used in our INL and DNL equations and b is the offset error. Figure 7.4 provides a graphical explanation of INL using a 3-bit DAC. The solid line is an end-point to end-point fitted line, the solid dots are the actual DAC output values, and the x-axis is the DAC code input.

Figure 7.4: Graphical Explanation of INL and DNL (in the figure, d_111 = y_111 − y_110 and DNL_111 = d_111 − LSB)

The solid line from Equation 7.41 can be rewritten in discrete form as

    I_out[n] = I_lsb · A[n] + b      (7.42)

where A[n] is the DAC input code word at the nth sample. This is in keeping with the output code from the DCDO in previous chapters. The end-point to end-point method takes the difference of the output of the DAC at the maximum and minimum output code words and then adjusts out the offset error. The linear least squares fit, described in detail in Section 6.1.3, finds the line that minimizes the mean square error between itself and the actual DAC output data points.
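The two fitting conventions just described translate directly into a few lines of code. The sketch below is a Python/NumPy illustration; the function names and the 3-bit example data are assumptions for demonstration, not from the original text. It reports INL in LSBs against both the end-point line (in the spirit of Equation 7.40, with the offset removed via the first code) and a least-squares fit:

```python
import numpy as np

def inl_endpoint(iout):
    """INL in LSBs against the end-point to end-point line (offset removed
    using the first code, gain using the first and last codes)."""
    codes = np.arange(len(iout))
    ilsb = (iout[-1] - iout[0]) / (len(iout) - 1)   # end-point LSB estimate
    return (iout - (iout[0] + ilsb * codes)) / ilsb

def inl_bestfit(iout):
    """INL in LSBs against the least-squares ("best") fit line."""
    codes = np.arange(len(iout))
    m, b = np.polyfit(codes, iout, 1)
    return (iout - (m * codes + b)) / m

# 3-bit example: an ideal 1-LSB/code ramp plus a small bowing error.
iout = np.arange(8) + 0.1 * np.array([0, 1, 2, 2, 2, 2, 1, 0])
print(np.round(inl_endpoint(iout), 3))   # [0.  0.1 0.2 0.2 0.2 0.2 0.1 0. ]
```

Note the end-point method pins the INL to zero at the first and last codes, while the best-fit line generally reports a smaller peak INL for the same data.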
Several common sources of static INL (and DNL, for that matter) in CS DACs have been described in the literature:

- Finite output impedance of the DAC current sources [66], [67].
- Mismatch in current sources due to local process variation, transistor and resistor mismatch, etc. [68].
- Voltage and temperature dependent resistive load variation (particularly if a polycrystalline silicon resistor [69], [66] is used).

Static INL degrades the spectral purity of the generated signal. This can be demonstrated by applying a sinusoid through the DAC transfer characteristic, which will be performed in subsequent sections. In order to start quantifying the static error for current steering DACs, static DAC models must be created.

7.3.3 DAC Models

Performing a first order analysis of a current steering DAC provides insight into a few of the major error sources that arise from the basic architecture. Figure 7.5 shows a single-ended, binary weighted current-mode DAC architecture with quite a few assumptions and simplifications. There are no output frequency dependent impedances, as no load or source capacitance is considered; for the moment, ignore C_L in the diagram. The switches are ideal and switch instantaneously. The on resistance of the current source, r_o, is assumed to scale linearly with the increasing current. We will see that even some of the simplest models of the CS DAC have a non-linear transfer function and thus produce a spurious response when driven with a periodically repeating input. In Figure 7.5, R_L is the load resistance, I_u is the LSB value of the current source, b_i is the ith bit of the code word A that is fed to the DAC, b_0 is the least significant bit (LSB) and b_n, where n = B_A − 1, is the most significant bit (MSB).

Figure 7.5: Simple Single-Ended Binary-Weighted Model
We note that this model can be transformed into an equivalent thermometer-coded DAC (Figure 7.6) and the transfer function analysis will remain the same. This is possible since the current sources add in parallel and the resistances scale in the binary model.

Figure 7.6: Simple Single-Ended Thermometer Model

For Figure 7.6, the number of switches closed at sample n is equal to the value of A[n] from Equation 7.42. The total number of switches is N_A = 2^{B_A} − 1, where B_A is the number of bits required to represent the DAC input code. Using this model, we can calculate the transfer function of the DAC. When A[n] = 0, all of the bits t_i are zero. For this exercise, a bit value of zero indicates an open switch. In that case, no current is drawn and the output V_out = V_CC. Now consider the scenario when A[n] = 1. In this case, t_0 = 1 and t_1 = ··· = t_n = 0, so the circuit in Figure 7.6 reduces to Figure 7.7. This is effectively one output impedance in parallel with the resistive load.

Figure 7.7: Single-Ended Single Bit Active

Now we note that as the switches close, the resistors combine in parallel. From circuit theory, the parallel combination of k resistors of the same value R is R/k.

    I_out[k] = k·I_u + V_out[k] / (r_o/k)

    (V_CC − V_out[k]) / R_L = k·I_u + k·V_out[k] / r_o

    V_out[k] = r_o·V_CC / (r_o + k·R_L) − k·R_L·r_o·I_u / (r_o + k·R_L)      (7.43)

Figure 7.8a shows the INL of Figure 7.6 using Equation 7.43 for values of B_A = 10, R_L = 50 Ω, r_o = 100 kΩ, I_u = 20 µA and V_CC = 2 V.
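As a quick numerical check (a Python sketch using the same parameter values quoted above for Figure 7.8a), Equation 7.43 can be evaluated over every code to show how the single-ended output droops non-linearly as switches close:

```python
import numpy as np

# Parameter values quoted for Figure 7.8a.
BA = 10
NA = 2**BA - 1          # 1023 thermometer switches
RL = 50.0               # load resistance (ohms)
ro = 100e3              # per-source output resistance (ohms)
Iu = 20e-6              # LSB current (amps)
VCC = 2.0               # supply (volts)

k = np.arange(NA + 1)   # number of closed switches
vout = ro * VCC / (ro + k * RL) - k * RL * ro * Iu / (ro + k * RL)  # Eq. 7.43

print(f"Vout[0]  = {vout[0]:.4f} V")    # 2.0000 V (no current drawn)
print(f"Vout[NA] = {vout[-1]:.4f} V")   # 0.6464 V, well below the 0.9770 V
                                        # an ideal (ro -> inf) source would give
```

With all 1023 sources on, the combined output resistance r_o/N_A ≈ 98 Ω is comparable to R_L, which is exactly why the finite-impedance INL of Figure 7.8a is so large.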
Mathematically, the INL can be computed from Equation 7.43 as

    INL_SE[k] = I_u·R_L²·k·(k − N_A) / r_o      (7.44)

This is equivalent to the single-ended INL derivation used by several authors, which can be found in Razavi's popular converter design book, Principles of Data Conversion System Design [66]. The worst case INL from Equation 7.44 is

    INL_SE,max = I_u·R_L²·N² / (4·r_o)      (7.45)

Figure 7.8: INL Curves for Thermometer-Coded DAC Models with Finite Output Impedance Current Sources. (a) Single-Ended Thermometer-Coded INL; (b) Differential Thermometer-Coded INL

Fortunately, the situation can be improved by using a differential DAC architecture. Current steering designs are inherently differential, so a close look at the INL of the architecture is important. Figure 7.9 shows a simple thermometer-coded DAC with a differential output. In this architecture a switch connects to the V_outp wire when the t_i bit value is one and to the V_outm wire when the t_i bit value is zero. Using Equation 7.43 for each single-ended output independently,

    V_outp[k] = r_o·V_CC / (r_o + k·R_L) − k·R_L·r_o·I_u / (r_o + k·R_L)      (7.46)

    V_outm[k] = r_o·V_CC / (r_o + (N_A − k)·R_L) − (N_A − k)·R_L·r_o·I_u / (r_o + (N_A − k)·R_L)      (7.47)

Figure 7.9: Simple Differential Thermometer Model

The output is taken as the difference between V_outp and V_outm. Therefore, after algebraic manipulation, the output voltage is found to be

    V_out[k] = V_outp[k] − V_outm[k]      (7.48)
             = (N_A − 2k)·(r_o·R_L·V_CC + I_u·r_o²·R_L) / [ (k·N_A − k²)·R_L² + r_o·N_A·R_L + r_o² ]      (7.49)

The INL can then be computed using Equation 7.48. Figure 7.8b shows the differential INL; notice the improvement is significant. As stated in the previous section, the effect of INL on the output spectrum can be shown by driving a sinusoid through the transfer function. Figure 7.10a shows the resulting spectrum using the single-ended DAC INL and Figure 7.10b shows the spectrum using the differential DAC INL.
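That demonstration can be scripted directly. The sketch below (Python/NumPy, with the illustrative parameter values used above and a coherently sampled test tone; none of it is from the original text) drives a full-scale digital sinusoid through the single-ended and differential characteristics of Equations 7.43 and 7.46-7.48 and reports the second and third harmonics:

```python
import numpy as np

BA, RL, ro, Iu, VCC = 10, 50.0, 100e3, 20e-6, 2.0
NA = 2**BA - 1

def vout_se(k):
    """Single-ended transfer characteristic, Equation 7.43."""
    return ro * VCC / (ro + k * RL) - k * RL * ro * Iu / (ro + k * RL)

def vout_diff(k):
    """Differential output, Equations 7.46-7.48."""
    return vout_se(k) - vout_se(NA - k)

Npts, cycles = 4096, 31          # coherent sampling: 31 whole cycles
n = np.arange(Npts)
codes = np.round((NA / 2) * (1 + np.sin(2 * np.pi * cycles * n / Npts)))

def harmonic_dbc(x, h):
    """Level of harmonic h relative to the fundamental, in dBc."""
    X = np.abs(np.fft.rfft((x - np.mean(x)) * np.hanning(len(x))))
    return 20 * np.log10(X[h * cycles] / X[cycles])

for name, tf in (("single-ended", vout_se), ("differential", vout_diff)):
    y = tf(codes)
    print(f"{name:13s} HD2 = {harmonic_dbc(y, 2):6.1f} dBc, "
          f"HD3 = {harmonic_dbc(y, 3):6.1f} dBc")
```

The single-ended characteristic produces a strong second harmonic, tens of dB above the differential case, consistent with the qualitative comparison of Figures 7.10a and 7.10b.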
Note that the differential DAC has no even order harmonic distortion, whereas the single-ended DAC suffers from a large second order spur. From this analysis it is clear that DAC current source architectures should be chosen such that the output impedance is large. Some publications refer to the INL/DNL degradation due to the output impedance changing with the DAC input code state as code dependent load variation (CDLV) [70].

Figure 7.10: (a) Single-Ended Thermometer-Coded Spectrum; (b) Differential Thermometer-Coded Spectrum

7.4 Dynamic DAC Performance

One of the earliest papers specifically identifying the causes of dynamic performance degradation, from Van den Bosch et al. [71], captures many of the dynamic problems:

1. The imperfect synchronization of the control signals of the current switches.
2. The digital signal feed-through via the C_gd of the switch transistors.
3. The voltage variation at the drain of the current source transistors.
4. The variation in the output impedance of the current sources.

In addition to these, one of the other major issues is inter-symbol interference. Each of these will be briefly described before offering suggested solutions. Non-linearities from the finite output impedance of CS DACs have been carefully analyzed by several authors. One of the better works discussing the problems arising from the dynamic effects of a frequency dependent finite output impedance is provided by Lin et al. [72]. Lin designed a 2.9 GS/s DAC in a 65 nm CMOS process with excellent linearity. Small feature size CMOS generally does not provide a large output impedance at high frequencies when compared to the latest SiGe or InP bipolar devices, which makes the design remarkably interesting.
If the r_o from Section 7.3.3 is replaced by a complex impedance and the C_L of the load is not ignored, then the same non-linearities experienced with static output impedance mismatches apply to the dynamic case. If the frequency of the synthesized tone is large, then Z_o and Z_L become small and there is a significant degradation in the performance of the DAC [71]. Mismatches from process variation, or outright nominal static timing errors, in the delay of a signal path can cause harmonic distortion and spurs. The mismatch creates a glitch at the output of the DAC from an off-timing transition. If the DAC is generating a periodic signal, then this mismatch occurs in a periodic manner, generating a spur.

Intersymbol interference (ISI) describes the phenomenon of a previous DAC code word (symbol) affecting the output characteristics of the current DAC code word. Ideally, the output of the DAC would be dependent only on the current code word. Three significant causes of ISI are as follows:

1. The data value of a current switch is dependent on previous states because the switches themselves do not recover to a memoryless state in the allotted time.
2. Dependency of the connected bias circuitry on the DAC code word.
3. Dependency of the output voltage of the DAC on the switching.

An example of when (2) becomes an issue is when the current source transistors are influenced by the switching action of the current steering transistors. If the tail current does not return to its nominal operating state before transitioning to the next state, then a code dependent effect will be observed. This effect might be observed when operating near the frequency limits of the technology (i.e. the current source is simply not "fast" enough) or with an improperly designed current source (e.g. the switching pushes transistors into saturation, which can take a significant amount of time to recover from).
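The spur-generating nature of ISI can be illustrated with a deliberately crude behavioral model. The one-tap memory term below is entirely an assumption for demonstration, not a circuit-level model: each output sample is perturbed by a small amount that depends on the previous code word, and the FFT of the resulting periodic output shows a discrete spur.

```python
import numpy as np

eps = 0.01                                   # strength of the memory effect (assumed)
N, cycles = 4096, 33                         # coherent sampling of the test tone
n = np.arange(N)
x = np.sin(2 * np.pi * cycles * n / N)       # ideal periodic DAC output sequence

# One-tap memory: the error on each sample depends on the *previous* sample,
# which is the defining signature of intersymbol interference.
y = x + eps * np.roll(x, 1) ** 2

X = np.abs(np.fft.rfft(y * np.hanning(N)))
carrier = X[cycles]

mask = np.ones(len(X), dtype=bool)
mask[:3] = False                             # drop DC, also produced by the squaring
mask[cycles - 1: cycles + 2] = False         # drop the carrier's window bins
spur = np.max(X[mask])
print(f"worst ISI-induced spur: {20*np.log10(spur/carrier):.1f} dBc")  # about -46 dBc
```

Because the input is periodic, the code-dependent error repeats periodically as well, so the ISI energy lands in discrete spurs (here at twice the tone frequency) rather than a broadband floor.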
Figure 7.10 shows an output glitch dependent on the device size of the switches of the current source. This is related to the charge feedthrough described in [64], but it is also simply a function of not adjusting the driving cells of the current switches for scaling.

Figure 7.10: Glitch Versus Device Size (1 µm to 10 µm)

Now that the main static and dynamic sources of error have been presented, DAC and current steering architectures will be presented that help mitigate these errors.

7.5 DAC Architectures

Current steering DACs can be divided into categories based on the architecture chosen in the design. This section provides a brief overview of several important DAC architectures that should be considered when designing a CS DAC. The two main types of DACs considered for CS are binary-weighted DACs and thermometer-coded DACs. A B_A-bit binary weighted CS DAC consists of B_A scaled current sources. The simplest such current steering design scales the transistors in the DAC by a binary weight.

7.5.1 R-2R DACs

The classic R-2R DAC is generally presented in an operational amplifier configuration, but an analog exists for CS architectures [73][74]. In [73], an R-2R ladder network is used in the design of a 10-bit DAC in a bipolar process. The main motivation for the architecture is to avoid the challenging issue of scaling resistors to achieve binary weighting. Figure 7.11 shows a differential version of the architecture (as [73] drove the signal single-ended into an operational amplifier). Note that the resistor network is formed at the emitters of the current sources.

Figure 7.11: R-2R with Binary Scaling (Emitter Network)

Also observe that the emitter area of the devices must be scaled with
So this architecture, while it relaxes the resistor sizing requirements, still suffers from a difficult NPN device scaling problem as the number of bits in the design grows large. This particularly a problem at high speeds, as the minimum size device must have a sufficient amount of current to operate at the target frequency. The metric used as a measure of the operating speed of the device at a specific current density is the unity-gain bandwidth product. For an HBT or bipolar transistor, this is (to the first order) [75]: fT = 12pi gmC pi +C? (7.50) 207 where Cpi and C?, the Miller capacitance, can be found in the hybrid-pi model of a bipolar transistor. gm is the transconductance of the transistor and is linearly related to the current through the device (Equation 7.82). The authors [74] introduce an alternative R-2R. The language used to distinguish be- tween the two types of R-2R circuits are binary attenuation (the architecture proposed by the author) and binary scaling, the previously described R-2R architecture. The naming convention is borrowed for this work. In the binary attenuation architecture, the devices Figure 7.12: R-2R with Binary Attenuation (Collector Network) Q0 RR RERE QBA?2 2R2R RE QBA?1 2R2R Vb RR RL RR RL VCC need not be scaled to achieve current scaling at the output. The currents driven by the sources are attenuated through a resistor network to achieve binary weighting. The R-2R ladder is located at the output of the DAC, which are the collectors of the transistor current cell switches, and divides the output current down as shown in Figure 7.12. The advantages of the binary scaling architecture are: ? The architecture requires half the number of resistors when compared against the binary attenuation architecture in a differential DAC setting. This is because the R-2R division must occur for both outputs. 208 ? Matching between resistors in the network result in common mode INL distortion in a differential architecture. ? 
- The current through the R-2R network is mostly constant and therefore does not suffer from temperature changes based on DAC state. Modern high speed DAC designs rarely mention this, as the thermal time constant is much, much longer than the switching period of the DAC; the error from device mismatch and timing will likely dominate that generated by temperature gradients. We also note that small feature sizes allow components to be placed in close proximity to each other, thus allowing a more uniform heat distribution across the R-2R ladder network.

The benefit of a smaller number of resistors is lessened because the devices must be scaled with increasing weight. The number and size of the devices can dominate the area of the resulting DAC. The advantages of the binary attenuation architecture are:

- The devices do not need to be scaled with binary weighting. This is significant, as larger or more devices result in higher parasitics, which is of critical importance in high speed designs.
- A simpler emitter degeneration that is more easily matched between current sources. In small geometry designs, the metal routing can cause significant (where significant depends on the resolution of the DAC) voltage drops.
- Scaling the current through an active current source decreases the output impedance. Finite output impedance of current sources is a major contributor to distortion [72],[67],[76]. In Section 7.3.3, the effects of finite output impedance were looked at more closely.

A pure R-2R DAC, regardless of the architecture chosen, is a binary-coded DAC.

7.5.2 Thermometer Coded and Segmented DACs

In binary-coded DACs, each current source is weighted by a factor of two, as discussed in Section 7.5. The implementation of such a DAC is efficient in that only as many active components as necessary are switched when a code word changes. However, binary DACs are highly susceptible to mismatch errors. Binary DACs also suffer from non-monotonicity in the presence of device mismatch.
The jump typically happens on the boundary to the next power-of-two bit, for instance from code 7 (4'b0111) to code 8 (4'b1000). To address this issue, designers have introduced thermometer-coded DAC architectures [77], [78], [79]. In fact, it would be more surprising to find a modern CS DAC architecture that did not include a thermometer-coded DAC as part of the design. In these designs, the value of the input code word A[n] determines the number of switches to close. Ideally, a designer would like all the benefits of thermometer-coded and binary-coded DACs simultaneously without suffering any of the drawbacks. One way to balance the area and speed benefits of binary-coded DACs with thermometer-coded DACs is to segment the design into portions. Figure 7.13 shows a generic segmented architecture using thermometer-coded current switches for the MSBs and a binary attenuation R-2R ladder for the LSBs.

Figure 7.13: Segmented R-2R Binary with Thermometer MSBs (schematic omitted)

7.5.3 Return-to-Zero (RTZ)

Return-to-zero switching is a common technique used to mitigate the impact of ISI and, sometimes, to reduce the impact of sinc roll-off at higher frequencies. The technique can also be used to take signals from higher Nyquist zones. Figure 7.2a shows an NRTZ DAC output and Figure 7.2b shows an RTZ DAC output with 50% duty cycle. Though the RTZ technique has been used extensively in DAC design for a significant period of time, it surprisingly does not show up in many academic DAC publications. Table 7.1 is a small collection of RTZ DACs in the literature.

Table 7.1: Published RTZ DACs

  Publication | Year | Frequency (GHz) | SFDR (dBc)
  [80]        | 2005 | 1.6             | 70
  [81]        | 2011 | 1.6             | 66
  [4]         | 2012 | 7.2             | 80
  [4]         | 2012 | 12              | 67

Compare these results to the NRTZ DACs listed in Table 7.2 below. Outside of several low-frequency DACs (where one would not have issues with ISI), there is a clear performance improvement over a majority of the NRTZ cases.
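The sinc roll-off relief behind part of this improvement can be quantified with a short calculation. A rectangular hold of duration d/fs has magnitude response |sinc(f·d/fs)|, so a 50% RTZ output (d = 0.5) droops far less across the first Nyquist zone than a full-period NRZ hold (d = 1), at the cost of 6 dB of absolute amplitude. A plain-Python sketch:

```python
import math

def hold_response_db(f_over_fs, duty):
    """Droop in dB, relative to the hold's own DC gain, of a
    rectangular hold of duration duty/fs at f = f_over_fs * fs."""
    x = math.pi * f_over_fs * duty
    return 20 * math.log10(abs(math.sin(x) / x)) if x else 0.0

for f in (0.4, 0.6):  # near the first-Nyquist edge, and an image in zone 2
    nrz = hold_response_db(f, 1.0)   # NRZ: full-period hold
    rtz = hold_response_db(f, 0.5)   # RTZ: 50% duty, half-period hold
    print(f"f = {f} fs: NRZ {nrz:.2f} dB, RTZ {rtz:.2f} dB")
```

At 0.4 fs the NRZ droop is about -2.4 dB versus roughly -0.6 dB for 50% RTZ, and the gap widens in the second Nyquist zone, which is why RTZ is attractive for taking images from higher zones.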
The requirements for an inverse sinc filter are also relaxed, as has already been discussed. RTZ addresses ISI by forcing the output of the DAC to a memoryless state (i.e., zero) before applying the next code word. This can clearly be seen in Figure 7.2b. It prevents the DAC from transitioning out of a code-dependent state, which would otherwise produce CDLV effects. RTZ also addresses the charge feedthrough problem from data switching, because the data is allowed to change while the output is in the zero state. This is illuminated further in Section 7.6.

Table 7.2: SFDR of NRTZ DACs

  Source | SFDR (Low)     | SFDR (Nyq.) | Area (mm2) | Power (mW) | fs (MHz)
  [82]   | 58 (9.6 MHz)   | N/A         | 5.000      | 730        | 1000
  [83]   | 56 (3.9 MHz)   | N/A         | 1.800      | 150        | 125
  [84]   | 49 (8.0 MHz)   | N/A         | 1.220      | 140        | 75
  [78]   | 87 (2.0 MHz)   | 71          | 16.00      | 650        | 100
  [85]   | 73 (8.0 MHz)   | 55          | 0.600      | 125        | 500
  [86]   | 71 (1.0 MHz)   | 55          | 3.200      | 320        | 300
  [57]   | 61 (5.0 MHz)   | <50         | 13.10      | 300        | 150
  [87]   | 70 (100.0 MHz) | 61.2        | 0.350      | 110        | 1000
  [64]   | 78             | 62          | 1.130      | 216        | 500
  [88]   | 75             | 63          | 30.60      | 6000       | 1200
  [72]   | 74             | 52          | 0.310      | 188        | 2900
  [89]   | 76             | 61          | 1.000      | 97         | 200
  [90]   | 67             | N/A         | 2.500      | 400        | 1400
  [91]   | 95 (1 MHz)     | <59         | 0.440      | 82         | 320
  [92]   | 71.68 (1 MHz)  | 43          | 0.800      | 25         | 250
  [93]   | 64 (1 MHz)     | <40         | 1.000      | 20         | 100
  [94]   | 82             | 72          | 11.83      | 180        | 100
  [95]   | 98 (10 MHz)    | 74          | 1.950      | 400        | 400
  [96]   | 60 (1 MHz)     | N/A         | 0.230      | N/A        | 800
  [79]   | 80.7 (1 MHz)   | 80.7        | 0.280      | N/A        | 10
  [97]   | 47.3 (30 MHz)  | 36.2        | 0.200      | 29         | 3000
  [33]   | 50 (91.7 MHz)  | 45          | 4.200      | 4800       | 8600

7.5.4 Translinear Output Buffers and Non-Linear DACs

Though the issue of implementing a high-speed phase accumulator has been addressed in Chapter 5, oftentimes one wishes to avoid the ROM compression circuitry or the complex multiplexer tree. Directly generating the high-speed phase with a pipeline phase accumulator avoids the multiplexer tree entirely. The size and area requirements for ROMs that operate at ultra-high frequencies have proven prohibitive. This has led to the creation of non-linear DACs [98, 33].
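Non-linear DACs weight the conversion elements so that the converter itself performs the phase-to-amplitude mapping. A toy numeric sketch of one such weighting, a hypothetical 16-segment quarter-wave sine weighting (this is an illustration of the principle, not the actual segmentation of [98] or [33]):

```python
import math

N = 16  # hypothetical number of thermometer segments per quarter wave
# Each segment carries the sine increment it must contribute, so the
# weights taper from large (near the zero crossing) to small (near the peak).
weights = [math.sin(math.pi * (k + 1) / (2 * N)) - math.sin(math.pi * k / (2 * N))
           for k in range(N)]

# A linear (thermometer) ramp through the segments then traces a quarter sine.
acc, samples = 0.0, []
for w in weights:
    acc += w
    samples.append(acc)

worst = max(abs(s - math.sin(math.pi * (m + 1) / (2 * N)))
            for m, s in enumerate(samples))
print(worst)  # the cumulative sums telescope onto the sine (to float precision)
```

The same idea extends to a full period by sign and mirror symmetry; in hardware, the cost is that each segment needs its own precisely ratioed current, which is exactly the matching burden the translinear alternative below tries to avoid.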
In these DACs, the current sources are generally "sine-weighted," such that a linear ramp through the DAC's input codes generates a sinusoidal output. An alternative technique is to run the phase output through a linear DAC that then drives a translinear device for sinusoid generation. The idea of using a non-linear device to transform a linear output into a sinusoid is not new in the DDFS literature; however, a recent DDFS by Yang et al. demonstrates remarkably good high-speed performance at low power [99]. Before delving into the more recent implementation, a review of earlier literature assists in the development.

Figure 7.14: Differential Pair — Q1 and Q2 loaded by R_L from V_CC, emitters joined through a resistor R carrying current i, each emitter biased by a tail current I_T; inputs v_ip, v_im, outputs v_op, v_om (schematic omitted)

In 1976, Meyer et al. [100] used a differential pair as a triangle-to-sine-wave converter. Figure 7.14 shows the architecture used by Meyer for his triangle-to-sine conversion analysis. The goal is to approximate a sinusoidal output at the terminals of the differential pair, given a triangular input, using the physical properties of a bipolar transistor, i.e.,

    v_od = v_op - v_om = a_1 sin(a_2 v_id)    (7.51)

where v_id represents the differential input voltage v_ip - v_im, and a_1 and a_2 are two linear coefficients that do not affect the spectral purity of the generated signal. Observing Figure 7.14, it becomes clear that the output can be written as a function of the current i through the resistor R. Firstly, Ohm's law is used to relate the collector currents of Q1 and Q2 to the output of the differential pair:

    v_op = V_CC - R_L I_C1    (7.52)
    v_om = V_CC - R_L I_C2    (7.53)
    v_od = R_L (I_C2 - I_C1)    (7.54)

where R_L is the resistive load and V_CC is the supply voltage. The emitter current is related to the collector current of the transistor through the relationship

    I_E = I_C + I_C/β_F = I_C (1 + β_F)/β_F = I_C/α_F    (7.55)

where β_F is the forward current gain of a bipolar transistor and α_F = β_F/(1 + β_F) is commonly used in microelectronics texts [101].
If β_F is sufficiently high, as is the case in SiGe HBTs, then I_C ≈ I_E since α_F → 1. In this particular analysis, α_F is kept throughout, which differentiates it from Meyer's analysis. This analysis also applies a differential input voltage, as opposed to driving the differential pair single-ended. Substituting I_C = α_F I_E into Equation 7.54,

    v_od = R_L α_F (I_E2 - I_E1)    (7.56)

Applying Kirchhoff's Current Law (KCL) at the emitters, I_E1 - i - I_T = 0 and I_E2 + i - I_T = 0, so

    I_E1 = I_T + i    (7.57)
    I_E2 = I_T - i    (7.58)

Substituting Equation 7.57 and Equation 7.58 into Equation 7.56,

    v_od = R_L α_F [(I_T - i) - (I_T + i)] = -2 R_L α_F i    (7.59)

Thus the output of the differential pair is a linear function of i. To solve for i, consider Kirchhoff's Voltage Law (KVL) around the loop containing the base-emitter junctions:

    -v_ip + V_BE1 + iR - V_BE2 + v_im = 0    (7.60)
    v_id = V_BE1 + iR - V_BE2    (7.61)

The base-emitter voltage can be written as a function of the collector current as shown in Equation 7.62,

    V_BE = V_T ln(I_C/I_S)    (7.62)

where I_S is the transport saturation current of the Gummel-Poon model [75] and V_T is the thermal voltage defined in Equation 7.83. The equation assumes that the forward Early voltage, V_A, of the device is infinite (i.e., the bipolar transistors have infinite output impedance, which is certainly not a valid assumption for high output frequencies). Using this relationship, the large-signal transfer function of the differential pair can be derived. Substituting Equation 7.62 into Equation 7.61 yields

    v_id = iR + V_T [ln(I_C1/I_S) - ln(I_C2/I_S)]    (7.63)
         = iR + V_T ln(I_C1/I_C2)    (7.64)

where the subtraction property of logarithms, ln(a) - ln(b) = ln(a/b), is used. Substituting Equation 7.57, Equation 7.58 and Equation 7.55 into Equation 7.64 (the α_F factors cancel in the ratio),

    v_id/V_T = iR/V_T + ln[α_F(I_T + i) / (α_F(I_T - i))]    (7.65)
             = iR/V_T + ln[(I_T + i)/(I_T - i)]    (7.66)

Expanding the logarithm in a Taylor series about i = 0 yields the series

    ln[(I_T + i)/(I_T - i)] = 2 [i/I_T + i^3/(3 I_T^3) + i^5/(5 I_T^5) + ···]    (7.67)
                            = 2 Σ_{n=0}^{∞} i^{2n+1} / ((2n+1) I_T^{2n+1})    (7.68)

Now, the desired transfer function of i as a function of v_id is

    i = b_1 sin(b_2 v_id)    (7.69)

where b_1 and b_2 are constants. Applying the inverse sine operation to both sides yields

    b_2 v_id = sin^{-1}(i/b_1)    (7.70)

Applying the Taylor series expansion of the inverse sine function about i = 0 yields

    b_2 v_id = i/b_1 + (1/6)(i/b_1)^3 + (3/40)(i/b_1)^5 + ···    (7.71)

Choosing b_1 and b_2 in Equation 7.71 such that the error between it and Equation 7.66 is minimized yields the final result

    b_1 = I_T    (7.72)
    b_2 = (1/V_T) · 1/(I_T R/V_T + 2) = 1/(I_T R + 2 V_T)    (7.73)

The resulting output is a triangle wave (or a sine wave with large odd-harmonic terms). But one can outperform a single differential pair with a few more transistors.

Using the Padé approximant, one can generate a rational function of low-degree polynomials that approximates transcendental functions such as sine or cosine quite well [102]. Equation 7.74 defines the Padé rational approximation of a real function f:

    f(x) ≈ f̂_p(x) = (Σ_{j=0}^{m} a_j x^j) / (1 + Σ_{k=1}^{n} b_k x^k)    (7.74)

where the first m + n derivatives of f equal those of the approximation f̂_p:

    f(0) = f̂_p(0)    (7.75)
    f'(0) = f̂_p'(0)    (7.76)
    f^(m+n)(0) = f̂_p^(m+n)(0)    (7.77)

Note that this approximation is closely related to the Maclaurin series of f; in fact, the Padé approximant often uses the Taylor series during its derivation. Now consider the following approximations for the sinusoidal functions.

1. Using the Padé
technique to approximate the sine function, we get

    sin(πx) ≈ x(1 - x^2) / (1 + x^2)    (7.78)

2. Using the Padé technique to approximate the cosine function, we get

    cos(πx) ≈ (1 - 4x^2)(2 - x^2) / (2 + x^2)    (7.79)

Equations 7.78 and 7.79 are versions of the Padé approximation with coefficients rounded to the nearest integer and amplitudes normalized. Figure 7.15 helps visualize the performance of the Padé approximation against the more commonly used Taylor series approximation.

Figure 7.15: Padé Sine Approximation — absolute error (percent) versus normalized phase for the third-order Padé, third-order Taylor and fifth-order Taylor approximations (plot omitted)

The Padé approximant is interesting for sinusoidal approximation for the following two reasons:

• Division is more easily implemented in a translinear circuit than a high-order polynomial [102].

• A third-order Padé approximant is roughly as complex to implement in a translinear circuit as a third-order Taylor series approximant [102].

Using the synthesis techniques described in [102], two translinear implementations of the Padé approximations were realized. Figure 7.16a and Figure 7.17 show ideal translinear circuits for the sine and cosine approximations respectively. Figure 7.16b shows a full transistor implementation of the translinear sine operation.

A novel quadrature DDFS architecture has been proposed to take advantage of the translinear output buffers described thus far in this section. Figure 7.18 provides a block diagram of the proposed DDFS. The design requires a current-scaling stage after the output of the DAC on the cosine path, since the transfer functions of the Padé approximations have been normalized; in practice, the output magnitudes of the sine circuit of Figure 7.16a and the cosine circuit of Figure 7.17 are different.

7.6 Current Steering Cell Architectures

One of the critical decisions in designing a current steering DAC is selecting an architecture for the current steering cells that comprise the DAC.
The output impedance of the DAC, the ISI and the sampling rate strongly depend on the performance of this single cell. In this section, several current steering cell architectures are analyzed, and the problems addressed, or raised, by each architecture are presented. The design decisions for this component center on trade-offs between performance, complexity, area and power.

Figure 7.16: Translinear Sine Implementations — (a) ideal current sources; (b) transistor implementation (schematics omitted)

The most primitive cell for differential current steering is a three-transistor differential pair. The current source transistor has no degeneration and there is no cascoding at any level. Figure 7.19a shows a schematic for the simple current steering cell. QSW1 and QSW2 are the current steering transistors, RL is the load resistor of the DAC and QCS is the current source transistor. The current sourced by QCS is steered through QSW1 or QSW2 depending on which transistor is switched on. Vsp and Vsm are the differential data driving signals with quick transition times. The differential signals transition in such a way as to keep the time that both transistors are active relatively small in comparison to the time the data is held. The simple architecture has several advantages:

• Low power, since the power supply voltage can be low.

• Small area, since the current cell requires only three transistors to implement.

Unfortunately, the drawbacks of this architecture are quite significant:

1. The current source output impedance is low and susceptible to data changes.
Figure 7.17: Differential Translinear Cosine Implementation (Ideal Current Sources) (schematic omitted)

Figure 7.18: Quadrature Translinear DDFS — the FCW drives a 24-bit phase accumulator whose truncated output feeds a 10-bit DAC, followed by translinear sine and cosine stages with current scaling and a time delay (block diagram omitted)

2. The voltage across the steering transistors is dependent on the output voltage of the DAC.

3. Glitches that capacitively feed through the switching pair are data dependent.

The high-speed data switching signal causes the tail current through QCS to change, since QCS has a finite output impedance. At first glance, this may appear to be a common-mode effect and therefore eliminated when taking the output differentially. With this configuration, however, that is not the case. When the glitch in the tail current occurs, it is reflected disproportionately through the active transistor of the current steering pair. The deactivated side is still in the process of activating, and thus a majority of the tail current fluctuation appears on a single side of the differential pair.

Figure 7.19: Simple Current Steering Cells — (a) simple current steering cell; (b) simple current steering cell with degeneration (schematics omitted)

This particular drawback is only important when the glitch power becomes significant with respect to the output of the DAC. For instance, if the DAC is clocked slowly for a given process, the glitch occupies only a tiny fraction of each output code period. Significance is also dictated by the required effective number of bits (ENOB) of the DAC. For a DAC that operates near the limits of a technology, we will argue that this is important. While we are not concerned with the bias structure of the DAC at this point, large fluctuations in the tail current of QCS will also propagate onto the bias line Vb.
If multiple current cells are tied to the same bias node, then the current cells have a negative, code-dependent interaction. As the rate of code changes is related to the signal being converted, this produces a non-linear, output-frequency-dependent distortion. Improving the output impedance of the current source mitigates this concern. As a reference, the small-signal output impedance of the current source QCS is approximately

    r_o ≈ V_A / I_C    (7.80)

where V_A is the Early voltage of the transistor and I_C the collector current. The size of the glitch at the emitters of the switching transistors depends on this value.

A simple step that may be used to improve the finite output impedance of the DAC is adding a resistor at the emitter of QCS, as shown in Figure 7.19b. This resistor is called a degeneration resistor and improves the output impedance approximately as

    r_o,dg = r_o (1 + g_m R_E)    (7.81)

where R_E is the value of the degeneration resistor and g_m is the transconductance of QCS. The transconductance is

    g_m = I_C / V_T    (7.82)

where I_C is the collector bias current through the transistor and V_T is the thermal voltage, defined as

    V_T = kT/q    (7.83)

where k ≈ 1.3806488 × 10^-23 J/K is Boltzmann's constant and q ≈ 1.602176565 × 10^-19 C is the elementary charge. At room temperature, T_r = 27 °C = 300 K, the thermal voltage is roughly V_T ≈ 0.026 V. Taking I_C = 1 mA as a reasonable value for biasing the current source and R_E = 200 Ω as a reasonable value for the degeneration resistor, the output impedance of the current source is improved by a factor of 1 + g_m R_E ≈ 8.7.

The drawback, though small, is that the supply voltage of the DAC must be increased to account for the voltage drop across the resistor. Resistors also require a non-negligible amount of area. The output impedance can be improved further by adding a cascode transistor to the current source. Figure 7.20a introduces the cascode transistor QCA1. We first consider adding the cascode without the degeneration resistor.
In that case,

    r_o,ca = r_o (1 + β_0 g_m r_o / (β_0 + g_m r_o)) ≈ β_0 r_o,   for g_m r_o ≫ β_0    (7.84)

where β_0 is the current gain of QCA1. In SiGe HBT processes, the current gain can be on the order of 200 [103]. Using I_C = 1 mA as the standard, this yields r_o,ca ≈ 200 r_o. Adding the degeneration resistor improves this situation further by replacing r_o with Equation 7.81; using the same 200 Ω resistor and 1 mA collector current, this results in r_o,ca,dg on the order of 1700 r_o.

Figure 7.20: Current Steering Cells with Cascoding — (a) current steering cell with cascode current source; (b) current steering cell with cascode output (schematics omitted)

Another dramatic improvement to the current source architecture can be achieved by adding a cascode transistor to the output of the switching transistors. Figure 7.20b provides an example of such a configuration. This configuration allows the switching transistors to drive a low-impedance node, the emitters of the cascode transistors QCA2 and QCA3. As shown in some of the DAC architectures of Section 7.5, thermometer-coded segments in particular can have the load of tens of transistors tied to the same node. This load impedance slows the performance of the DAC switches dramatically, but by adding the cascode transistors, the impedance is isolated from the switching transistors. Furthermore, the transistors QCA2 and QCA3 fix the voltage variation across the switches, across all DAC code choices, to a few hundred millivolts; otherwise, the DAC output voltage is directly applied across the terminals of the switching transistors. A new problem arises, though, from adding the cascode at the top of the switching transistors shown in Figure 7.20b. When the current is steered away from a cascode transistor, the inactive switch cuts off the current through that cascode.
This causes a delay from the switch to the output of the DAC, as the cascode must change from operating in an inactive region to being fully biased. To address this shortcoming, adding keep-alive transistors to the output cascode, as shown in Figure 7.21, keeps those transistors from shutting off completely. The drawback, of course, is higher output power. This trade-off is very often worth the increased performance; consider the performance of [72] or [4], which show some of the best published DAC results to date.

Figure 7.21: Current Steering Cell with Cascode Output and Keep Alive (schematic omitted)

All the techniques described thus far have done little to improve the non-linear glitching from data switching or the intersymbol interference. Both become considerable concerns as the operating frequency of the process extends higher. As has already been discussed in Section 7.5.3, using an RTZ architecture addresses both problems.

Figure 7.22: Current Steering Cell with Cascode, Keep Alive and RTZ (schematic omitted)

Figure 7.22 combines all of the techniques discussed thus far, including an RTZ switching quad. This RTZ pair incidentally further improves the isolation of the output from the data switches by acting as an extra cascode stage for the data switches.

The author believes that constructing a DAC using the segmented R-2R binary attenuation architecture with the current switch shown in Figure 7.22 would dramatically improve the dynamic performance of the DACs being designed at Auburn University. Many of the DACs at Auburn, including the CMOS design discussed in Section 6.6, show dramatic decreases in performance when synthesizing high-frequency signals, where "high" is relative to the sampling frequency of the DAC.
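As a numeric recap of the impedance-boosting steps in this section, the sketch below tallies Equations 7.80-7.84 for the running 1 mA / 200 Ω example. The Early voltage V_A = 50 V and β_0 = 200 are assumed illustrative values, not measured device data:

```python
def vt(temp_k=300.0):
    """Thermal voltage kT/q in volts (Eq. 7.83)."""
    k, q = 1.380649e-23, 1.602176634e-19
    return k * temp_k / q

# Running example from this section; VA and BETA0 are assumed values.
IC, RE, VA, BETA0 = 1e-3, 200.0, 50.0, 200.0   # A, ohm, V, (dimensionless)

gm = IC / vt()                 # current-source transconductance (Eq. 7.82)
ro = VA / IC                   # bare output resistance (Eq. 7.80)
ro_dg = ro * (1 + gm * RE)     # with emitter degeneration (Eq. 7.81)
ro_ca = BETA0 * ro_dg          # cascode atop the degenerated source
                               # (limit of Eq. 7.84, valid since gm*ro_dg >> beta0)
print(f"ro = {ro:.0f} ohm, degenerated = {ro_dg / ro:.1f}x, cascoded = {ro_ca / ro:.0f}x")
```

The degeneration factor lands near 9x and the cascode multiplies it by roughly β_0, in the same range as the estimates quoted in the text; the point of the tally is that each structural addition buys one multiplicative factor.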
The techniques described in this section provide a path to mitigate dynamic degradation effects before even considering calibration.

Chapter 8 Conclusions

In this work, an exact derivation of the spurs generated by phase truncation error in a phase accumulator was developed using elementary number theory. The spectral theory was developed from binary unsigned arithmetic through to the final computation of the discrete Fourier transform of the truncated phase sequence. The theory replaces the commonly cited work by Nicholas [23] and the less commonly cited work by Torosyan [31]. The particular derivation is well suited for teaching DDFS engineers, both qualitatively and quantitatively, the origin of phase truncation spurs, and would fit well in a textbook on the topic.

A novel parallel phase accumulator with linear frequency modulation was introduced, and its impact on the size of the DCDO was analyzed. It was compared to other parallel accumulators in the patent literature that perform a similar operation. In CMOS and BiCMOS processes with feature sizes less than or equal to 130 nm, the author argues that every DDFS design should parallelize the phase accumulator. This approach removes the need for non-linear DAC implementations entirely and allows designers to focus on the components that are actually limiting the performance of DDFS systems (i.e., DACs).

Lastly, the DDFS systems designed at Auburn University by the author were presented, culminating in the quadrature DDFS used in the X-band radar-on-a-chip design that was fabricated in a 130 nm BiCMOS process. A revision of the system correcting the errors found during testing was developed to final GDSII form, but the team members at Auburn University have since taken jobs, making testing impractical. The design had not been submitted for fabrication at the time of this writing.

There is a significant opportunity for future work derived from this dissertation.
Firstly, implementing the modified accumulators described in Section 4.7.1 and Section 4.7.2 on a low-cost FPGA from Xilinx feeding a low-cost DAC demonstration board from Analog Devices would allow for physical verification of the theory, since the theory was only numerically verified in this work. Secondly, using the theory to fully explain the spectral behavior of the output response analyzer, and subsequently implementing a new variable-state phase accumulator, would prove interesting. The new accumulator described could also be used to develop a DDFS with very fine frequency resolution. The mathematics fully developing the list of all acquirable frequencies also makes for exciting analysis. Either one of these tasks, if properly built from the work described in this dissertation, would be feasible for a master's student to perform; the first could even be accomplished as a senior project by an undergraduate student if the proper components were supplied.

An exact analysis of the spectrum of a partial dynamic rotation CORDIC would require a significant undertaking, but could also lead to insights into the device's behavior (and potentially techniques to improve it). The author believes that the CORDIC output stages can actually be used as an "error correction" stage at the output of a highly compressed ROM. To the author's knowledge, no one has ever taken a BTM or MTM LUT as the seed for a partial dynamic rotation CORDIC.

Lastly, the theory developed in Chapter 4 should be used in the analysis of other systems where truncation occurs, such as a fractional-N synthesizer. Some of the theory used in calculating the properties of the original phase truncation sequences may also be used in analyzing the sequences generated by LFSRs or ΔΣ modulators. Also, a more abstract, compact analysis producing the same results as this work would be instrumental in the field.

Bibliography

[1] L. K. Tan, E. Roth, G. Yee, and H.
Samueli, "An 800-MHz quadrature digital synthesizer with ECL-compatible output drivers in 0.8 µm CMOS," IEEE Journal of Solid-State Circuits, vol. 30, no. 12, pp. 1463-1473, Dec. 1995.

[2] A. Yamagishi, M. Ishikawa, T. Tsukahara, and S. Date, "A 2-V, 2-GHz low-power direct digital frequency synthesizer chip-set for wireless communication," IEEE Journal of Solid-State Circuits, vol. 33, pp. 210-217, 1998.

[3] B.-D. Yang, J.-H. Choi, S.-H. Han, L.-S. Kim, and H.-K. Yu, "An 800-MHz low-power direct digital frequency synthesizer with an on-chip D/A converter," IEEE Journal of Solid-State Circuits, vol. 39, no. 5, pp. 761-774, 2004.

[4] F. Van de Sande, N. Lugil, F. Demarsin, Z. Hendrix, A. Andries, P. Brandt, W. Anklam, J. S. Patterson, B. Miller, M. Rytting, M. Whaley, B. Jewett, J. Liu, J. Wegman, and K. Poulton, "A 7.2 GSa/s, 14 bit or 12 GSa/s, 12 bit signal generator on a chip in a 165 GHz fT BiCMOS process," IEEE Journal of Solid-State Circuits, vol. 47, no. 4, pp. 1003-1012, 2012.

[5] T. Nagasaku, K. Kogo, H. Shinoda, H. Kondoh, Y. Muto, A. Yamamoto, and T. Yoshikawa, "77 GHz low-cost single-chip radar sensor for automotive ground speed detection," in Proc. IEEE Compound Semiconductor Integrated Circuits Symp. (CSICS '08), 2008, pp. 1-4.

[6] Y.-A. Li, M.-H. Hung, S.-J. Huang, and J. Lee, "A fully integrated 77 GHz FMCW radar system in 65 nm CMOS," in Proc. IEEE Int. Solid-State Circuits Conf. Digest of Technical Papers (ISSCC), 2010, pp. 216-217.

[7] J. Rogers, C. Plett, and F. Dai, Integrated Circuit Design for High-Speed Frequency Synthesis. Artech House, 2006.

[8] M. Skolnik, Radar Handbook, 3rd ed. McGraw Hill, 2008.

[9] J. Tierney, C. Rader, and B. Gold, "A digital frequency synthesizer," IEEE Transactions on Audio and Electroacoustics, vol. 19, no. 1, pp. 48-57, 1971.

[10] A. Torosyan and A. N. Willson, "Exact analysis of DDS spurs and SNR due to phase truncation and arbitrary phase-to-amplitude errors," in Proc. IEEE Int. Frequency Control Symp.
and Exposition, 2005.

[11] D. D. Sarma and D. W. Matula, "Faithful bipartite ROM reciprocal tables," in Proceedings of the 12th Symposium on Computer Arithmetic, 1995, p. 17.

[12] F. de Dinechin and A. Tisserand, "Multipartite table methods," IEEE Transactions on Computers, vol. 54, no. 3, pp. 319-330, 2005.

[13] J. Qin, "Selective spectrum analysis and numerically controlled oscillator in mixed-signal built-in self-test," Ph.D. dissertation, Auburn University, December 2010.

[14] G. E. Shilov, Elementary Real and Complex Analysis. Dover Publications, Inc., 1973.

[15] R. F. Lax, Modern Algebra and Discrete Structures. Addison-Wesley Educational Publishers Inc., 1991.

[16] A. Torosyan, "Direct digital frequency synthesizers: Complete analysis and design guidelines," Ph.D. dissertation, University of California, Los Angeles, 2003.

[17] J. F. Wakerly, Digital Design Principles and Practices, 3rd ed. Prentice Hall, 2001.

[18] J. R. Barry, E. A. Lee, and D. G. Messerschmitt, Digital Communication. Springer, 2003.

[19] Analog Devices, "1 GSPS, 14-bit, 3.3 V CMOS direct digital synthesizer," 2012.

[20] Analog Devices, "3.5 GSPS direct digital synthesizer with 12-bit DAC," 2012.

[21] J. Qin, J. D. Cali, B. F. Dutton, G. J. Starr, F. F. Dai, and C. E. Stroud, "Selective spectrum analysis for analog measurements," IEEE Transactions on Industrial Electronics, vol. 58, no. 10, pp. 4960-4971, October 2011.

[22] J. Yu, F. Zhao, J. Cali, D. Ma, X. Geng, F. F. Dai, J. D. Irwin, and A. Aklian, "A single-chip X-band chirp radar MMIC with stretch processing," in CICC, 2012, pp. 1-4.

[23] H. T. Nicholas and H. Samueli, "An analysis of the output spectrum of direct digital frequency synthesizers in the presence of phase-accumulator truncation," in Proc. 41st Annual Symp. Frequency Control, 1987, pp. 495-502.

[24] Y. C. Jenq, "Digital spectra of nonuniformly sampled signals. II. Digital look-up tunable sinusoidal oscillators," IEEE Transactions on Instrumentation and Measurement, vol.
37, no. 3, pp. 358-362, 1988.

[25] S. Mehrgardt, "Noise spectra of digital sine-generators using the table-lookup method," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 4, pp. 1037-1039, 1983.

[26] Y.-C. Jenq, "Digital spectra of nonuniformly sampled signals: fundamentals and high-speed waveform digitizers," IEEE Transactions on Instrumentation and Measurement, vol. 37, no. 2, pp. 245-251, 1988.

[27] Y. C. Jenq, "Digital spectra of nonuniformly sampled signals: theories and applications - measuring clock/aperture jitter of an A/D system," IEEE Transactions on Instrumentation and Measurement, vol. 39, no. 6, pp. 969-971, 1990.

[28] Y.-C. Jenq, "Digital spectra of nonuniformly sampled signals: a robust sampling time offset estimation algorithm for ultrahigh-speed waveform digitizers using interleaving," IEEE Transactions on Instrumentation and Measurement, vol. 39, no. 1, pp. 71-75, 1990.

[29] A. Torosyan and A. N. Willson, "Analysis of the output spectrum for direct digital frequency synthesizers in the presence of phase truncation and finite arithmetic precision," in Proc. 2nd Int. Symp. Image and Signal Processing and Analysis (ISPA 2001), 2001, pp. 458-463.

[30] U. Dudley, Elementary Number Theory. Dover Publications, Inc., 1978.

[31] A. Torosyan, D. Fu, and A. N. Willson, "A 300-MHz quadrature direct digital synthesizer/mixer in 0.25-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 38, no. 6, pp. 875-887, 2003.

[32] K. Doris, A. van Roermund, and D. Leenaerts, Wide-Bandwidth High Dynamic Range D/A Converters. Springer, 2010.

[33] X. Geng, F. Dai, J. Irwin, and R. Jaeger, "An 11-bit 8.6 GHz direct digital synthesizer MMIC with 10-bit segmented sine-weighted DAC," IEEE Journal of Solid-State Circuits, vol. 45, no. 2, pp. 300-313, Feb. 2010.

[34] S. Turner and D. Kotecki, "Direct digital synthesizer with sine-weighted DAC at 32-GHz clock frequency in InP DHBT technology," IEEE Journal of Solid-State Circuits, vol. 41, no. 10, pp.
2284-2290, Oct. 2006.

[35] A. Gutierrez-Aitken, J. Matsui, E. Kaneshiro, B. Oyama, D. Sawdai, A. Oki, and D. Streit, "Ultrahigh-speed direct digital synthesizer using InP DHBT technology," IEEE Journal of Solid-State Circuits, vol. 37, no. 9, pp. 1115-1119, Sep. 2002.

[36] X. Yu, F. F. Dai, J. D. Irwin, and R. Jaeger, "A 9-bit quadrature direct digital synthesizer implemented in 0.18-µm SiGe BiCMOS technology," IEEE Transactions on Microwave Theory and Techniques, vol. 56, no. 5, pp. 1257-1266, May 2008.

[37] S. Pellerano, S. Levantino, C. Samori, and A. Lacaita, "A 13.5-mW 5-GHz frequency synthesizer with dynamic-logic frequency divider," IEEE Journal of Solid-State Circuits, vol. 39, no. 2, pp. 378-383, 2004.

[38] R. H. A. W. Kovalick, "Waveform synthesis using multiplexed parallel synthesizers," U.S. Patent 4,454,486, June 1984.

[39] B.-G. Goldberg, "Digital frequency synthesizer having multiple processing paths," U.S. Patent 4,958,310, Nov. 1990.

[40] P. A. D. B. L. Tise, "Multiplexed chirp waveform synthesizer," U.S. Patent 6,614,813, September 2003.

[41] S. Turner and D. Kotecki, "Direct digital synthesizer with ROM-less architecture at 13-GHz clock frequency in InP DHBT technology," IEEE Microwave and Wireless Components Letters, vol. 16, no. 5, pp. 296-298, May 2006.

[42] X. Geng, F. Dai, J. Irwin, and R. Jaeger, "24-bit 5.0 GHz direct digital synthesizer RFIC with direct digital modulations in 0.13 µm SiGe BiCMOS technology," IEEE Journal of Solid-State Circuits, vol. 45, no. 5, pp. 944-954, May 2010.

[43] e2v, "Low power 12-bit 3 GSps DAC with 4/2:1 MUX," October 2011.

[44] G. J. Starr, J. Qin, B. F. Dutton, C. E. Stroud, F. F. Dai, and V. P. Nelson, "Automated generation of built-in self-test and measurement circuitry for mixed-signal circuits and systems," in Proc. 24th IEEE Int. Symp. Defect and Fault Tolerance in VLSI Systems (DFT '09), 2009, pp. 11-19.

[45] "Scientific computing tools for Python - NumPy," Apr. 2012.

[46] "Welcome to Mako," Apr.
2012. [Online]. Available: http://www.makotemplates.org/
[47] D. De Caro, N. Petra, and A. G. M. Strollo, "Reducing lookup-table size in direct digital frequency synthesizers using optimized multipartite table method," IEEE Transactions on Circuits and Systems I, vol. 55, no. 7, pp. 2116–2127, Aug. 2008.
[48] M. J. Schulte and J. E. Stine, "Approximating elementary functions with symmetric bipartite tables," IEEE Transactions on Computers, p. 842, 1999.
[49] J. W. Eaton, D. Bateman, and S. Hauberg, GNU Octave Manual Version 3. Network Theory Limited, 2008.
[50] S. Axler, Linear Algebra Done Right, 2nd ed. Springer, 1997.
[51] R. M. Gray and T. G. Stockham, Jr., "Dithered quantizers," IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 805–812, 1993.
[52] T. E. C. III, G. M. Flewelling, D. S. Jansen, J. D. Cali, D. A. Chan, J. Freedman, M. Anthony, T. Dresser, F. Dai, and E. Gebarra, "Self-Healing in SiGe BiCMOS ICs for Low-SWAP Electronic Warfare Receivers," in 38th Annual GOMACTech Conference, March 11–14, 2013.
[53] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Transactions on Electronic Computers, no. 3, pp. 330–334, 1959.
[54] J. S. Walther, "A unified algorithm for elementary functions," in Proc. Spring Joint Computer Conf., 1971, pp. 379–385.
[55] J.-M. Muller, Elementary Functions: Algorithms and Implementation, 2nd ed. Birkhäuser, 2006.
[56] H. Samueli, "The design of multiplierless FIR filters for compensating D/A converter frequency response distortion," IEEE Transactions on Circuits and Systems, vol. 35, no. 8, pp. 1064–1066, 1988.
[57] G. A. M. Van der Plas, J. Vandenbussche, W. Sansen, M. S. J. Steyaert, and G. G. E. Gielen, "A 14-bit intrinsic accuracy Q2 random walk CMOS DAC," IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1708–1718, 1999.
[58] Analog Devices, "AD9737A: RF Digital-to-Analog Converters," 2012.
[59] S. K. Mitra, Digital Signal Processing: A Computer-Based Approach, 3rd ed. McGraw-Hill, 2006.
[60] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 1948.
[61] B. Widrow and I. Kollár, Quantization Noise. Cambridge University Press, 2008.
[62] V. I. Bogachev, Measure Theory: Volume 1. Springer, 2007.
[63] D. Duttweiler and D. Messerschmitt, "Analysis of digitally generated sinusoids with application to A/D and D/A converter testing," IEEE Transactions on Communications, vol. 26, no. 5, pp. 669–675, 1978.
[64] K. Doris, J. Briaire, D. Leenaerts, M. Vertregt, and A. van Roermund, "A 12b 500 MS/s DAC with >70 dB SFDR up to 120 MHz in 0.18 µm CMOS," in Digest of Technical Papers, 2005 IEEE Int. Solid-State Circuits Conf. (ISSCC), 2005, pp. 116–588.
[65] IEEE Standard 746-1984: Performance Measurements of A/D and D/A Conversion Techniques and Their Applications, IEEE Std., 1984.
[66] B. Razavi, Principles of Data Conversion System Design, J. B. Anderson, Ed. New York: Wiley-IEEE Press, 1995.
[67] S. Luschas and H.-S. Lee, "Output impedance requirements for DACs," in Proc. Int. Symp. Circuits and Systems (ISCAS '03), vol. 1, 2003.
[68] G. I. Radulov, M. Heydenreich, R. W. van der Hofstad, J. A. Hegt, and A. H. M. van Roermund, "Brownian-bridge-based statistical analysis of the DAC INL caused by current mismatch," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54, no. 2, pp. 146–150, 2007.
[69] N. C.-C. Lu, L. Gerzberg, C.-Y. Lu, and J. D. Meindl, "Modeling and optimization of monolithic polycrystalline silicon resistors," IEEE Transactions on Electron Devices, vol. 28, no. 7, pp. 818–830, 1981.
[70] W.-H. Tseng, C.-W. Fan, and J.-T. Wu, "A 12-bit 1.25-GS/s DAC in 90 nm CMOS with >70 dB SFDR up to 500 MHz," IEEE Journal of Solid-State Circuits, vol. 46, pp. 2845–2856, 2011.
[71] A. Van den Bosch, M. Steyaert, and W. Sansen, "SFDR-bandwidth limitations for high speed high resolution current steering CMOS D/A converters," in Electronics, Circuits and Systems, 1999.
Proceedings of ICECS '99, The 6th IEEE International Conference on, vol. 3, 1999, pp. 1193–1196.
[72] C.-H. Lin, F. M. I. van der Goes, J. R. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci, X. Liu, and K. Bult, "A 12 bit 2.9 GS/s DAC with IM3 78 dBc, IM3