Integrated Circuit Design for Ultrahigh Speed Frequency Synthesis:
Direct Digital Synthesizer and Variable Frequency Oscillator
by
Xueyang Geng
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
May 14, 2010
Keywords: direct digital synthesizer (DDS), ROM-less DDS, pipeline accumulator,
digital-to-analog converter (DAC), sine-weighted DAC, carry-look-ahead (CLA), ripple
carry adder, frequency modulation (FM), phase modulation (PM), voltage-controlled
oscillator (VCO), quadrature current-controlled oscillator (QCCO)
Copyright 2010 by Xueyang Geng
Approved by:
Fa Foster Dai, Chair, Professor of Electrical and Computer Engineering
Guofu Niu, Alumni Professor of Electrical and Computer Engineering
Richard C. Jaeger, Professor of Electrical and Computer Engineering
Bogdan M. Wilamowski, Professor of Electrical and Computer Engineering
Abstract
This dissertation presents the design and implementation of high-speed direct digital
frequency synthesizers (DDS) and a variable-frequency oscillator (VFO). DDS is a digital
technique for frequency synthesis, waveform generation, sensor excitation, and digital
modulation/demodulation in modern communication systems. The VFO can be used as the
reference clock of the DDS system, either standalone or combined with other phase-locked
loop (PLL) components.
DDS provides many advantages, including fine frequency-tuning resolution, continuous-phase
switching and accurately matched quadrature signals. A DDS can directly generate and
modulate signals at microwave frequencies. A high-speed DDS can significantly simplify
the transceiver architecture, which can considerably reduce the cost of radio and radar
systems.
Ultrahigh-speed DDSs operating in the GHz range are in demand for modern radar and
communication systems. This research focuses on designing ultrahigh-speed DDS chips with a
sine-weighted digital-to-analog converter (DAC) in silicon germanium (SiGe) BiCMOS
technology, using a VFO as the reference clock. A sine-weighted DAC is necessary for
ultrahigh-speed DDS design to overcome the speed limitation of the ROM lookup table (LUT) in
conventional DDS designs. The sine-weighted DAC replaces the ROM LUT and the linear DAC,
performing the phase-to-amplitude conversion (PAC) as well as the digital-to-analog
conversion. A segmented sine-weighted DAC is designed and implemented to achieve 10-bit
amplitude resolution.
Due to the code-dependent and frequency-dependent non-ideal effects of the sine-weighted
DAC, the unwanted harmonics and spurs of the DDS output have a more significant impact on
the whole system. This dissertation discusses the spurs and harmonics from different
sources, such as truncation errors, limited DAC amplitude resolution and non-ideal effects
of the DAC.
Four chips fabricated in SiGe BiCMOS technology are discussed in the dissertation: three
DDSs and one VFO. The first DDS is an 11-bit 8.6 GHz ROM-less DDS with a 10-bit segmented
sine-weighted DAC. The second is a 9-bit 2.9 GHz ROM-less DDS with direct digital
modulation capabilities. The third is a 24-bit 5.0 GHz ROM-less DDS with direct digital
modulation capabilities. Besides the DDS designs, an 8.7-13.8 GHz VFO, implemented as a
transformer-coupled current-controlled varactor-less oscillator with quadrature outputs, is
also presented. Circuit and layout designs of DDS building blocks such as current mode
logic (CML), the pipeline accumulator, the carry-look-ahead adder/accumulator, the
ripple-carry adder/accumulator, and segmented and non-segmented sine-weighted DACs are
presented. The quadrature current-controlled oscillator (QCCO) is discussed, as well as the
design and implementation of the on-chip transformer.
Acknowledgments
It has been a great pleasure working with the faculty, staff, and students at the Electrical
and Computer Engineering Department, Auburn University, during my tenure as a doctoral
student. Completing this work is definitely a high point in my academic career. I could
not have come this far without the assistance of many individuals and I want to express my
deepest appreciation to them.
My first and most earnest thanks go to my advisor, Dr. Fa Foster Dai, who guided and
encouraged me throughout my studies. His advice and research attitude have provided me
with a model for my entire future career.
I wish to thank my advisory committee members, Dr. Guofu Niu, Dr. Richard C.
Jaeger and Dr. Bogdan M. Wilamowski, for their guidance and advice on this work. Many
thanks to Dr. Richard O. Chapman who served as my outside reader for providing valuable
comments that improved the contents of this dissertation. I also wish to thank Dr. J. David
Irwin for his valuable comments on my paper publishing and endless support on my Ph.D.
study.
I would like to express my appreciation and sincere thanks to Dr. Yin Shi, the advisor
of my M.S. degree at the Chinese Academy of Sciences. Without his encouragement, help and
recommendation, I would not have pursued and completed my Ph.D. study.
Appreciation is also expressed to those who have made contributions to my research.
I am especially indebted to Desheng Ma, Yuan Yao, Wenting Deng, Dayu Yang, Vasanth
Kakani, Xuefeng Yu, Jianjun Yu, Yuehai Jin, William Souder, Mark Ray, Joseph Cali,
Michael Pukish and Jie Qin for their cooperation and continued assistance throughout the
course of this research.
My final, and most heartfelt, acknowledgment must go to my family members, especially
to my parents Xiuliang Geng and Jinglan Zhu, my wife Xueqin Lu, and my daughter Michelle
Q. Geng, for their continual encouragement and support throughout this work.
Table of Contents
Abstract
Acknowledgments
List of Figures
List of Tables
List of Abbreviations
1 Introduction
1.1 DDS Architectures
1.1.1 Conventional DDS
1.1.2 ROM-less DDS with Sine-weighted DAC
1.1.3 ROM-less DDS with Direct Digital Modulations
1.2 Direct Digital Synthesizer Used in Modern Radar Systems
1.3 DDS Spectral Purity
1.4 Outline and Contribution
2 Design and Analysis of Sine-weighted DAC
2.1 Sine-weighted DAC
2.2 Segmented Sine-Weighted DAC
2.2.1 Quantization and Segmentation of the Sine Wave
2.2.2 Approximation Error Analysis
2.2.3 Optimizing the Segmentation
3 An 11-bit 8.6 GHz DDS RFIC with 10-bit Segmented Sine-weighted DAC
3.1 Introduction
3.2 Circuit Implementation of the 11-bit ROM-less DDS
3.2.1 11-Bit Pipeline Accumulator
3.2.2 10-Bit Segmented Sine-Weighted DAC
3.2.3 Clock Distribution
3.3 Experimental Results
3.4 Conclusion
4 A 9-bit 2.9 GHz DDS RFIC with Direct Digital Modulations
4.1 Introduction
4.2 Circuit Implementation
4.2.1 9-bit Carry Look Ahead Adder/Accumulator
4.2.2 7-bit Sine-weighted DAC
4.3 Experimental Results
4.4 Conclusion
5 A 24-bit 5.0 GHz DDS RFIC with Direct Digital Modulations
5.1 Introduction
5.2 Ultrahigh Speed Adder Design
5.2.1 Wire Delay in the 0.13 µm SiGe BiCMOS Technology
5.2.2 Propagation Delay Comparison Between the CLA and RCA Accumulator/Adder
5.2.3 Circuit Implementation of the 24-bit 5.0 GHz RCA
5.3 10-Bit Segmented Sine-weighted DAC
5.3.1 Architecture of the 10-bit Sine-weighted DAC
5.3.2 Bandwidth Limitation of the DAC Switch Output Impedance
5.4 Experimental Results
5.5 Conclusion
6 An 8.7-13.8 GHz Transformer-coupled Varactor-less QCCO RFIC
6.1 Introduction
6.2 Analysis and Design of Transformer Coupled Quadrature Oscillator
6.2.1 Oscillation Analysis and Design
6.2.2 Quadrature Coupling Phase Accuracy and Phase Noise
6.3 Transformer Implementation
6.3.1 Geometry Design of Transformers
6.3.2 Transformer Equivalent Circuit and Parameters
6.4 Experimental Results
6.5 Conclusion
7 Summary and Future Work
7.1 Summary of Original Work
7.2 Possible Future Directions
Bibliography
List of Figures
1.1 Block diagram of the conventional ROM-based DDS.
1.2 Block diagram of the ROM-less DDS.
1.3 Block diagram of the ROM-less DDS with segmented sine-weighted DAC.
1.4 DDS block diagram with direct digital modulations.
1.5 DDS direct digital modulations (A) BFSK (FCW = 16, FCW = 32) (B) LFM (CCW = 2 or
FCW sweeps from 2 to 32) (C) BPSK (FCW = 255, PCW = $2^{15}$).
1.6 Simplified pulse compression radar with stretch processing.
1.7 Typical switching structure of current-steering DAC.
2.1 Block diagram of (P-1)-bit sine-weighted DAC.
3.1 Block diagram of the ROM-less DDS with 10-bit segmented sine-weighted DAC.
3.2 The 11-bit pipeline phase accumulator.
3.3 10-bit segmented sine-weighted DAC.
3.4 Coarse DAC thermometer decoder.
3.5 Fine DACs thermometer decoders.
3.6 Illustration of interpolating the two adjacent outputs of a coarse DAC using the fine
DAC current matrix.
3.7 Current switch circuit of the sine-weighted DAC.
3.8 Diagram of the current source matrix.
3.9 Simplified clock tree distribution.
3.10 Die photo of the 11-bit ROM-less DDS RFIC.
3.11 Evaluation board for the 11-bit ROM-less DDS RFIC.
3.12 Measured DDS output spectrum with a 4.2 MHz output and a maximum 8.6 GHz clock
(FCW = 1), illustrating about 50 dBc SFDR. The tone at 91.7 MHz is from the nearby
campus FM radio station.
3.13 Measured DDS output waveform with a 4.2 MHz output and an 8.6 GHz clock.
3.14 Measured DDS Nyquist output spectrum with a 4.2958 GHz output and a maximum 8.6 GHz
clock (FCW = 1023), illustrating about 45 dBc SFDR. The image tone is located at
4.3042 GHz.
3.15 Measured DDS output waveform with a 4.2958 GHz Nyquist output and an 8.6 GHz clock.
The 8.4 MHz envelope frequency results from mixing the output and its image.
3.16 The measured DDS SFDR versus FCW at a clock frequency of 7.2 GHz, illustrating a
worst-case SFDR of 33 dBc for the Nyquist band (3.6 GHz) and 42 dBc for the narrow
band (100 MHz), respectively.
3.17 The measured DDS phase noise at an output frequency of 1.57 GHz with a 7.2 GHz clock
input frequency. The input clock is generated from an Agilent E8257D analog signal
generator. The graph illustrates a -118.55 dBc/Hz phase noise at a 10 kHz frequency
offset.
4.1 Block diagram of 9-bit ROM-less DDS.
4.2 Block diagram of 9-bit CLA accumulator (full adder).
4.3 Block diagram of 7-bit sine-weighted DAC.
4.4 Diagram of DAC switch and current source matrix cell.
4.5 Measured DDS output spectrum with 509 MHz output under 2.5 GHz clock (FCW = 104),
showing about 48 dBc narrow band SFDR.
4.6 Measured DDS output spectrum with 1.444 GHz output and 2.9 GHz clock (FCW = 255),
showing about 35 dBc narrow band SFDR. The image tone is located at 1.455 GHz.
4.7 Measured DDS output waveform with 1.444 GHz output and 2.9 GHz clock (FCW = 255).
The envelope frequency is 12 MHz.
4.8 Measured DDS output with FCW = 2 frequency modulated by a frequency step of
ΔFCW = 1. The frequency before the step is 9.375 MHz with FCW = 2; after the step it
is 14.062 MHz with FCW = 3.
4.9 Measured DDS output with FCW = 2 phase modulated by a phase step of ΔPCW = 256,
corresponding to a 180° phase shift. The output frequency is 10 MHz with a 2.5 GHz
clock.
4.10 Die photo of the 9-bit DDS with direct digital modulations.
5.1 Block diagram of the 24-bit 5.0 GHz DDS RFIC.
5.2 Lumped RC model for a wire with length of L.
5.3 Test bench to simulate the wire propagation delay.
5.4 Simulated wire propagation delay versus length.
5.5 Diagram of N-bit RCA.
5.6 Estimated adder propagation delays with number of bits.
5.7 Block diagram of the 10-bit segmented sine-weighted DAC.
5.8 Diagram of the DAC switch and current source matrix cell.
5.9 DAC switch core circuit and its small signal equivalent circuit.
5.10 Measured DDS output with a 469.360351 MHz output and the maximum 5.0 GHz clock
(FCW = 0x180800), showing a 38 dBc Nyquist band SFDR.
5.11 Measured DDS output with a 1.246258914 GHz output and the maximum 5.0 GHz clock
(FCW = 0x3FCFE7), showing an 82 dBc narrow band SFDR.
5.12 Measured DDS LFM output with the FCW swept from 1 to 0x005AD9C, using a 300 MHz
clock.
5.13 Measured DDS output with FCW = 7 phase modulated by a phase step of ΔPCW = 0x800,
causing a 180° phase shift. The output frequency is 1.251 kHz with a 3.0 GHz clock.
5.14 Measured DDS narrow band SFDR versus output frequency within a 50 MHz bandwidth.
5.15 Die photo of the 24-bit DDS RFIC.
6.1 Quadrature VCO circuits with parallel coupling.
6.2 Schematic of transformer-coupled varactor-less QCCO.
6.3 AC equivalent circuit of the transformer tank.
6.4 Stacked octagonal transformer.
6.5 AC equivalent circuit of the varactor-less QCCO.
6.6 Equivalent circuits of the varactor-less QCCO.
6.7 Octagonal symmetrical transformer: (a) concentric, (b) inter-wound, and (c) stacked.
6.8 Diagram of the (a) three-dimension PGS substrate and (b) two-dimension deep trench
lattice.
6.9 Transformer time domain equivalent circuit model.
6.10 Simulated parameters of the transformer windings: (a) self-inductance L, (b) coupling
factor k, and (c) quality factor Q.
6.11 Simulated capacitance parallel with the transformer primary winding.
6.12 Fabricated QCCO RFIC die photo.
6.13 Measured QCCO tuning range.
6.14 Measured QCCO outputs at 10.5 GHz with tuning current of 1.5 mA and core current of
2 mA.
6.15 Measured QCCO phase noise with output frequency of 11.02 GHz.
List of Tables
2.1 Simulated Segmentation FOM for Different Segmentations with 11-bit Phase and 10-bit
Amplitude Resolutions
3.1 Performance Comparison of Ultrahigh Speed DDS RFICs with over 8 GHz Maximum Clock
Frequency
4.1 Current Source Matrix in Sine-weighted DAC
4.2 Selected Ultrahigh Speed DDS RFIC Performance Comparison
5.1 Ultrahigh Speed DDS RFIC Performance Comparison
6.1 QCCO Performance Summary
6.2 Performance Comparison of Variable-frequency Oscillators
List of Abbreviations
11B DDS the 11-Bit 8.6 GHz DDS
24B DDS the 24-Bit 5.0 GHz DDS
9B DDS the 9-Bit 2.9 GHz DDS
ADC Analog-to-Digital Converter
BFSK Binary Frequency Shift-Keying
BPSK Binary Phase Shift-Keying
CCO Current-Controlled Oscillator
CCW Chirp Control Word
CLA Carry-Look-Ahead
CLCC Ceramic Leadless Chip Carrier
CML Current Mode Logic
CMOS Complementary Metal-Oxide-Semiconductor
DAC Digital-to-Analog Converter
DDFS Direct Digital Frequency Synthesizer
DDS Direct Digital Synthesizer
DFF D-Flip-Flop
DT Deep Trench
ENOB Effective Number of Bits
FA Full Adder
FCW Frequency Control Word
FM Frequency Modulation
FOM Figure-of-Merit
GCD the Greatest-Common-Divisor
HBT Heterojunction Bipolar Transistor
IC Integrated Circuit
InP Indium Phosphide
LO Local Oscillator
LFM Linear Frequency Modulation
LPF Low-Pass Filter
LSB Least-Significant Bit
LUT Look-Up Table
MOSFET Metal-Oxide-Semiconductor Field-Effect Transistor
MSB Most-Significant Bit
P-QVCO Parallel Quadrature Voltage-Controlled Oscillator
PAC Phase-to-Amplitude Conversion/Converter
PCB Printed Circuit Board
PCW Phase Control Word
PGS Patterned Ground Shield
PLL Phase-Locked-Loop
PM Phase Modulation
PSAC-DAC Phase-to-Sine Amplitude Conversion Digital-to-Analog Converter
QCCO Quadrature Current-Controlled Oscillator
RCA Ripple Carry Adder
RFIC Radio Frequency Integrated Circuit
RMS Root-Mean-Square
ROM Read Only Memory
S-QVCO Series Quadrature Voltage-Controlled Oscillator
SFDR Spurious-Free-Dynamic-Range
SiGe Silicon Germanium
SINAD Signal-to-Noise and Total Harmonic Distortion
SMA SubMiniature version A
SNR Signal-to-Noise Ratio
THD Total Harmonic Distortion
VCO Voltage-Controlled Oscillator
VNA Vector Network Analyzer
Chapter 1
Introduction
Ultrahigh-speed¹ direct digital synthesizer (DDS)² RFICs³ will play key roles in
next-generation radar and communication systems. Recent developments in radar systems
require frequency synthesis with low power consumption, high output frequency, fine
frequency resolution, fast channel switching and versatile modulation capabilities. Linear
frequency modulation (LFM), or chirp modulation, is widely used in radars to achieve high
range resolution, while pulsed phase modulation (PM) can provide anti-jamming capability.
With fine frequency resolution, fast channel switching and versatile modulation
capabilities, the DDS provides frequency synthesis and direct modulation capabilities that
cannot be easily implemented by other synthesizer tools such as analog-based phase-locked
loop (PLL) synthesizers. It is difficult for conventional PLL-based frequency synthesizers
to meet these requirements due to internal loop delay, low resolution, modulation problems
and the limited tuning range of the voltage-controlled oscillator (VCO). Ultrahigh-speed
heterojunction bipolar transistors (HBT) allow a DDS to operate up to mm-wave frequencies,
making it a preferable solution for the synthesis of sine waveforms used in modern
ultrahigh-speed radar and other communication systems [1,2].
¹ Ultra high frequency (UHF) in the ITU radio band designation means 300 MHz to 3 GHz. In
this dissertation, ultrahigh speed means that the DDS output frequency is over 1 GHz.
² The term direct digital frequency synthesizer (DDFS) is sometimes used instead; both
terms refer to the same circuits and systems.
³ Usually, the term RFIC refers to a radio frequency or wireless integrated circuit
fabricated in Si/SiGe CMOS/BiCMOS technologies, while the term MMIC refers to a microwave
monolithic integrated circuit fabricated in GaAs/InP high-fT technology. With the
development of modern technology, the two terms have largely merged. In this dissertation,
both RFIC and MMIC therefore refer to an RF/microwave monolithic integrated circuit,
regardless of the technology in which it is fabricated.
1.1 DDS Architectures
1.1.1 Conventional DDS
A conventional DDS normally consists of a phase accumulator, a ROM lookup table (LUT) and
a linear digital-to-analog converter (DAC). The phase accumulator computes the phase angle
of the output sine wave by accumulating the input frequency control word (FCW) on each
clock cycle. If the size of the accumulator is N bits, as shown in Fig. 1.1, the maximum
phase value is $2\pi(2^{N}-1)/2^{N}$. To save power and reduce the complexity of the
sinusoidal LUT, the N-bit output of the accumulator may be truncated to P bits before
addressing the ROM. The ROM LUT performs the phase-to-amplitude conversion (PAC) of the
output sinusoidal wave. Once the amplitude information is obtained, it may be further
truncated to D bits, corresponding to the number of input bits of the DAC. The digital
amplitude codes are then fed into a linear DAC that generates an analog replica of the
synthesized waveform. A low-pass filter (LPF) usually follows the DAC to remove the
unwanted frequency components. The input clock frequency and the accumulator size
determine the frequency step size of the DDS as
$$\Delta f = \frac{f_{clk}}{2^{N}}, \tag{1.1}$$
and the output frequency of the DDS is given by
$$f_{out} = \frac{f_{clk} \cdot \mathrm{FCW}}{2^{N}}, \tag{1.2}$$
where $f_{clk}$ is the DDS clock frequency, FCW is the input frequency control word, and N
is the size of the phase accumulator. Based upon the Nyquist theorem, at least two samples
per output cycle are required to reconstruct a sinusoidal wave without aliasing. Thus, the
largest value of the FCW is $2^{N-1}$. Therefore, the maximum output frequency of the DDS
is limited to less than $f_{clk}/2$. However, the output frequency of the DDS is usually
constrained to be less than $f_{clk}/3$ to allow a practical implementation of the deglitch
LPF.
Figure 1.1: Block diagram of the conventional ROM-based DDS.
(Labeled blocks: FCW input, adder and registers, Sin/Cos LUT and linear DAC; bus widths N,
P and D; clock $f_{clk}$ and output $f_{out} = f_{clk} \cdot \mathrm{FCW} / 2^{N}$.)
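Equations (1.1) and (1.2) can be checked numerically with a short sketch. The Python below is illustrative only: the function names are hypothetical, the accumulator is an idealized behavioral model rather than the hardware adder and registers, and the 11-bit, 8.6 GHz operating point is borrowed from the measurement captions of Figs. 3.12 and 3.14.

```python
from typing import List


def frequency_step(f_clk_hz: float, n_bits: int) -> float:
    """Frequency step size of Eq. (1.1): f_clk / 2^N."""
    return f_clk_hz / (1 << n_bits)


def output_frequency(f_clk_hz: float, fcw: int, n_bits: int) -> float:
    """Output frequency of Eq. (1.2): f_clk * FCW / 2^N."""
    if not 0 <= fcw <= 1 << (n_bits - 1):
        raise ValueError("FCW must not exceed 2^(N-1) (Nyquist limit)")
    return f_clk_hz * fcw / (1 << n_bits)


def accumulate_phase(fcw: int, n_bits: int, n_cycles: int) -> List[int]:
    """Behavioral N-bit phase accumulator: add FCW modulo 2^N each clock."""
    mask = (1 << n_bits) - 1
    phase = 0
    states = []
    for _ in range(n_cycles):
        states.append(phase)
        phase = (phase + fcw) & mask  # overflow wraps around, as in an N-bit adder
    return states


if __name__ == "__main__":
    f_clk, n = 8.6e9, 11  # assumed operating point (11-bit DDS, 8.6 GHz clock)
    print(frequency_step(f_clk, n))          # step size in Hz
    print(output_frequency(f_clk, 1, n))     # lowest output tone in Hz
    print(output_frequency(f_clk, 1023, n))  # tone just below Nyquist in Hz
    print(accumulate_phase(3, 3, 5))         # a 3-bit accumulator wrapping around
```

With these assumed values the step size is about 4.2 MHz and FCW = 1023 gives roughly 4.2958 GHz, consistent with the output frequencies quoted in the captions of Figs. 3.12 and 3.14.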
1.1.2 ROM-less DDS with Sine-weighted DAC
The ROM size of the conventional DDS increases exponentially with an increase of
the number of phase bits used to address the LUT. In general, increasing the ROM size
results in higher power consumption and larger area in ROM-based DDS designs. Numerous
attempts have been made to compress or eliminate the ROM LUT in the PAC. Langlois has
published a comprehensive review of the PAC techniques [3], including angular decomposition
[4,5,6], angular rotation, sine amplitude LUT compression [7], polynomial approximation and
phase-to-sine amplitude conversion (PSAC)-DAC combinations. All the phase-to-amplitude
conversion methods with the exception of PSAC-DAC involve either a large ROM or a
complex architecture, yet operate at relatively low speed. To overcome the speed and power
performance limits of the ROM-based DDS with high resolution, a ROM-less DDS with
sine-weighted DAC (identified as PSAC-DAC by Langlois) has been developed in both low
speed and ultrahigh speed DDSs.
The conceptual block diagram of the ROM-less DDS employing a sine-weighted DAC is shown in
Fig. 1.2. The ROM-less DDS replaces the ROM and linear DAC with a sine-weighted DAC that
serves as a PAC block as well as a DAC. It eliminates the sine LUT, which is the speed and
area bottleneck for high-speed DDS implementations. However, achieving high resolution in
the sine-weighted DAC is a design challenge because of the required nonlinear segmentation
process.
Figure 1.2: Block diagram of the ROM-less DDS.
(Labeled blocks: FCW input, adder and registers, and sine-weighted DAC; bus widths N and P;
clock $f_{clk}$ and output $f_{out} = f_{clk} \cdot \mathrm{FCW} / 2^{N}$.)
Fig. 1.3 shows a ROM-less DDS with a segmented sine-weighted DAC [8,9]. The major parts of
the ROM-less DDS are an N-bit phase accumulator and a current-steering sine-weighted DAC.
Since the output frequency cannot exceed the Nyquist rate, the most-significant bit (MSB)
of the accumulator input is tied to zero. The N-bit FCW (including MSB = 0) feeds the
accumulator, which controls the output frequency of the synthesized sine wave. The two MSBs
of the accumulator output are used to determine the quadrant of the sine waveform. The
remaining (P-2) bits are used to control the segmented sine-weighted DAC in generating the
amplitude for a quarter phase (0 to $\pi/2$) of the sine wave. With the segmentation method
described in the following sections, the (a + b) MSBs are used to control the coarse DAC,
while the a MSBs and c least-significant bits (LSBs) are used to control the fine DACs.
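The quadrant decoding described above can be sketched behaviorally. This is a minimal Python model under stated assumptions, not the dissertation's circuit: it assumes the second MSB drives a 1's complementor that mirrors the phase within a quadrant, that the MSB selects the output polarity, and that amplitudes are ideal floating-point values sampled at a half-LSB phase offset rather than produced by segmented coarse and fine current sources. The function names are hypothetical.

```python
import math


def quarter_wave_sine(phase_word: int, p_bits: int) -> float:
    """Map a P-bit phase word to a sine amplitude using one stored quadrant."""
    quarter_bits = p_bits - 2
    quarter_mask = (1 << quarter_bits) - 1

    msb = (phase_word >> (p_bits - 1)) & 1         # assumed: selects the sign
    second_msb = (phase_word >> (p_bits - 2)) & 1  # assumed: mirrors the phase
    index = phase_word & quarter_mask              # remaining (P-2) bits

    if second_msb:
        index = ~index & quarter_mask  # 1's complement reverses the quadrant

    # Quarter-wave (0 to pi/2) amplitude; the half-LSB offset keeps the
    # complemented index symmetric about the quadrant boundary.
    angle = (index + 0.5) * (math.pi / 2) / (1 << quarter_bits)
    amplitude = math.sin(angle)
    return -amplitude if msb else amplitude


def full_wave_reference(phase_word: int, p_bits: int) -> float:
    """Direct evaluation over the full 0 to 2*pi range, for comparison."""
    return math.sin(2 * math.pi * (phase_word + 0.5) / (1 << p_bits))


if __name__ == "__main__":
    worst = max(
        abs(quarter_wave_sine(p, 8) - full_wave_reference(p, 8))
        for p in range(1 << 8)
    )
    print(worst)  # largest deviation between the two evaluations
```

Comparing the two functions over every phase word shows that one quadrant of amplitude data, plus the two MSBs, is enough to reproduce the full sine period in this idealized model.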
Figure 1.3: Block diagram of the ROM-less DDS with segmented sine-weighted DAC.
(Labeled blocks: N-bit pipeline accumulator with adder and DFFs, 1's complementor, coarse
DAC and fine DACs forming a (P-1)-bit sine-weighted DAC; signals FCW, MSB, 2nd MSB,
$f_{clk}$ and $f_{out}$; bus widths N, P, P-2, a, b and c.)
1.1.3 ROM-less DDS with Direct Digital Modulations
With proper design, a DDS can be used to implement modulations and generate waveforms such
as phase modulation (PM), linear frequency modulation (LFM), step frequency modulation
(frequency hopping), binary frequency shift-keying (BFSK), binary phase shift-keying (BPSK)
and other hybrid modulations. Fig. 1.4 shows a general architecture of a ROM-less DDS with
direct digital modulation capabilities designed for radar systems. The architecture has
four parts: a D-bit sine-weighted DAC, a P-bit adder used as a phase modulator, an N-bit
phase accumulator, and another N-bit accumulator used as an N-bit chirp ramp signal
generator. Chirp control words (CCW), frequency control words (FCW) and phase control words
(PCW) provide the control signals for the chirp accumulator, phase accumulator and phase
modulator, respectively. By using digital control words directly to change the values of
registers in the data path of the DDS, the frequency, phase, and amplitude of the output
waveforms can be precisely controlled. Since all the modulations are done in the digital
domain, many disadvantages associated with analog modulation can be avoided. In this
ROM-less DDS architecture, the sine-weighted DAC is responsible for phase-to-amplitude
conversion as well as digital-to-analog conversion. Without a ROM, which is usually the
speed bottleneck, this DDS architecture can produce waveforms at GHz frequencies. To
perform the direct digital modulations, the accumulators and modulator (full adder) must be
updated in every clock cycle. As a result, a pipeline accumulator is not suitable for the
modulation, and the carry-look-ahead (CLA) or ripple carry adder (RCA) architecture is used
instead, with an attendant sacrifice in speed.
Figure 1.4: DDS block diagram with direct digital modulations.
(Labeled blocks: three adder/register stages, a MUX and the sine-weighted DAC; inputs CCW,
FCW, PCW and $f_{CLK}$; output $f_{out}$; bus widths N, P and D.)
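The modulation data path can be illustrated with a small behavioral sketch. The Python below is an assumption-laden model, not the fabricated design: it assumes the chirp accumulator ramps the frequency word by CCW on every clock starting from FCW, that the N-bit phase is truncated to P bits before the PCW is added, and that every register updates once per clock. The function and parameter names are hypothetical.

```python
from typing import List


def dds_phase_words(
    n_bits: int,
    p_bits: int,
    fcw: int,
    n_cycles: int,
    ccw: int = 0,
    pcw: int = 0,
) -> List[int]:
    """Return the P-bit phase words that would drive the sine-weighted DAC."""
    n_mask = (1 << n_bits) - 1
    p_mask = (1 << p_bits) - 1

    freq_word = fcw & n_mask  # chirp accumulator state (assumed to start at FCW)
    phase = 0                 # phase accumulator state
    words = []
    for _ in range(n_cycles):
        truncated = phase >> (n_bits - p_bits)    # keep the P most significant bits
        words.append((truncated + pcw) & p_mask)  # P-bit adder as phase modulator
        phase = (phase + freq_word) & n_mask      # phase accumulation
        freq_word = (freq_word + ccw) & n_mask    # chirp ramp (LFM when ccw != 0)
    return words


if __name__ == "__main__":
    print(dds_phase_words(8, 8, fcw=3, n_cycles=4))           # fixed frequency
    print(dds_phase_words(8, 8, fcw=3, n_cycles=4, pcw=128))  # half-cycle phase offset
    print(dds_phase_words(8, 8, fcw=2, n_cycles=4, ccw=1))    # linearly rising frequency
```

Setting `pcw` to half of the P-bit range shifts every phase word by half a cycle, which corresponds to the 180° steps used for BPSK, while a nonzero `ccw` makes the phase increment grow linearly, as in LFM.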
Fig. 1.5 shows some direct digital modulation waveforms generated from a 16-bit phase
resolution DDS. This DDS has both 16-bit FM resolution as well as 14-bit PM capabilities.
Fig. 1.5(A) displays a BFSK modulation waveform. The input CCW switches between 16
and 32 for frequency f1 and f2 labeled in the waveform. Fig. 1.5(B) shows an LFM waveform
with CCW = 2, which performs as though FCW is swept from 2 to 32, repeatedly. Fig. 1.5(C)
shows a BPSK modulation waveform with FCW = 255 and PCW = 215 for a phase shift of
180 .
Figure 1.5: DDS direct digital modulations (A) BFSK (FCW = 16, FCW = 32) (B) LFM
(CCW = 2 or FCW sweeps from 2 to 32) (C) BPSK (FCW = 255, PCW = $2^{15}$).
[Each panel plots the output (V) versus time: panel (A) in ms, panels (B) and (C) in µs.]

1.2 Direct Digital Synthesizer Used in Modern Radar Systems
Range resolution is the ability of a radar system to distinguish between two or more
targets on the same bearing but at different distances. Weapon-control radar, which requires
great precision, should be able to distinguish between targets that are only yards apart.
Search radar is usually less precise and only distinguishes between targets that are hundreds
of yards or even miles apart. The degree of range resolution depends on the width of the
transmitted pulse, the types and sizes of the targets, and the efficiency of the receiver and
indicator. The range resolution of a simple single-pulse radar is $cT/2$, where $c$ is the
propagation velocity of the pulse and $T$ is the transmitted pulse width. In the pulse
compression radar shown in Fig. 1.6, with the help of a versatile modulated signal generated
by a DDS, such as LFM, nonlinear FM or phase-coded waveforms, the range resolution can
be improved to $c/(2B)$ without losing received pulse strength [10], where $B$ is the bandwidth
of the transmitted signal. Compared with the simple single-pulse radar, the range resolution
is improved by a factor of $TB$ while the transmitted signal maintains the same instantaneous
power. The quantity $TB$ is the pulse compression ratio, and it is usually much greater than 1.

Figure 1.6: Simplified pulse compression radar with stretch processing.
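The two range-resolution expressions above can be compared numerically. The following sketch simply evaluates $cT/2$ and $c/(2B)$; the pulse width and bandwidth are illustrative assumptions, not values taken from this work.

```python
C_LIGHT = 299_792_458.0          # propagation velocity in free space, m/s


def single_pulse_resolution(pulse_width_s):
    """Range resolution c*T/2 of an unmodulated pulse."""
    return C_LIGHT * pulse_width_s / 2.0


def compressed_resolution(bandwidth_hz):
    """Range resolution c/(2*B) of a pulse-compression waveform."""
    return C_LIGHT / (2.0 * bandwidth_hz)


# Assumed example: 10 us pulse carrying a 500 MHz wide LFM sweep.
T = 10e-6
B = 500e6
coarse = single_pulse_resolution(T)      # about 1.5 km
fine = compressed_resolution(B)          # about 0.3 m
ratio = coarse / fine                    # equals the time-bandwidth product T*B
```

For these assumed numbers the improvement equals the time-bandwidth product, $TB = 5000$.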
The traditional radar receiver uses a wide-bandwidth convolution processor with a
matched filter to process the received pulse compression signal. This requires high bandwidth
in the analog-to-digital converter (ADC) as well as in the back-end processing. In modern
radar systems, stretch processing is used to reduce the bandwidth requirement of the ADC
and back-end processing. Stretch processing is a technique for processing LFM, or other
modulated wideband waveforms, using a signal processor with a bandwidth that is much
smaller than the transmitted signal bandwidth, without losses in the signal-to-noise ratio
(SNR) or range resolution [11, 12]. As shown in Fig. 1.6, stretch processing can be implemented
in modern radar systems with the help of a simple mixer and a modulated reference
signal generated from the same DDS as in the transmit path.
1.3 DDS Spectral Purity
In order to achieve fine step size, a large phase accumulator is desired. However, the
phase accumulator output is normally truncated to save die area and power. For instance,
the output of the phase accumulator is truncated into P bits (P<N). The number of phase
bits (P) is chosen based on the power and area budgets, as well as the signal-to-noise ratio
(SNR) requirement of the DDS.
In the process of discrete phase accumulation and phase word truncation, spurs and
quantization noise are introduced into the DDS output spectrum; they can be modeled as
linear additive noise on the phase of the sinusoidal wave. Phase truncation error is periodic. If
only the P MSBs of an N-bit phase word are used to address the DAC or LUT, the resultant spurs
mix with the DDS output frequency, generating spurs at multiples of the frequency [13]
given by
\[
f_{spur} = f_{clk} \cdot \frac{\mathrm{GCD}\left(FCW,\, 2^{N-P}\right)}{2^{N-P}}, \tag{1.3}
\]
where GCD(A, B) denotes the greatest-common-divisor (GCD) of A and B.
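Eq. (1.3) is straightforward to evaluate. The sketch below is a direct transcription using Python's `math.gcd`; the function name and the example values of N, P and the clock frequency are assumptions made for illustration.

```python
from math import gcd


def truncation_spur_frequency(f_clk_hz, fcw, n_bits, p_bits):
    """Evaluate Eq. (1.3): f_clk * GCD(FCW, 2^(N-P)) / 2^(N-P)."""
    discarded = 1 << (n_bits - p_bits)       # 2^(N-P), weight of the dropped LSBs
    return f_clk_hz * gcd(fcw, discarded) / discarded


# Assumed example: N = 24 accumulator bits, P = 12 retained bits, 5 GHz clock.
odd_fcw = truncation_spur_frequency(5.0e9, 1, 24, 12)        # GCD = 1
aligned = truncation_spur_frequency(5.0e9, 4096, 24, 12)     # GCD = 2^(N-P)
```

When the FCW is a multiple of $2^{N-P}$, the discarded bits never change, the GCD equals $2^{N-P}$, and the expression reduces to $f_{clk}$.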
In addition to the spurious components, the DDS output waveform will suffer from
amplitude distortion due to the finite number of quantization levels in the DAC. The envelope
of the DDS output waveform is modulated by a sine wave with the frequency of
\[
f_{Envelope} = f_{clk} \cdot \frac{2^{N-1} \bmod FCW}{2^{N-1}}. \tag{1.4}
\]
Note that the envelope of the DDS output waveform is modulated by a low-frequency signal
except when the FCW is an integer power of 2. For a Nyquist output, the frequency of the
amplitude distortion, which looks like amplitude modulation (AM), is given by
\[
f_{Envelope} = f_{clk} \cdot \frac{2^{N-1} \bmod \left(2^{N-1}-1\right)}{2^{N-1}}
= f_{clk} \cdot \left(\frac{1}{2}\right)^{N-1}. \tag{1.5}
\]
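As a quick numerical check of Eqs. (1.4) and (1.5), the envelope frequency can be computed with integer arithmetic. This is a minimal sketch; the function name is hypothetical, and the 11-bit, 8.6 GHz values are used only as an example.

```python
def envelope_frequency(f_clk_hz, fcw, n_bits):
    """Evaluate Eq. (1.4): f_clk * (2^(N-1) mod FCW) / 2^(N-1)."""
    half_scale = 1 << (n_bits - 1)
    return f_clk_hz * (half_scale % fcw) / half_scale


F_CLK = 8.6e9
N = 11
power_of_two = envelope_frequency(F_CLK, 256, N)               # FCW = 2^8
nyquist = envelope_frequency(F_CLK, (1 << (N - 1)) - 1, N)     # FCW = 2^(N-1) - 1
```

For a power-of-2 FCW the remainder is zero, so no envelope modulation appears, while the Nyquist case gives $f_{clk}/2^{N-1}$ as in Eq. (1.5).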
In addition to the spurs that come from phase truncation, DAC spurs represent another
big source of error. Spurious-free-dynamic-range (SFDR) is one of the most important
specifications for the dynamic performance of a DAC, as well as a DDS. The sine-weighted DAC
shares many design challenges with the linear DAC. The most important factors affecting
linear and sine-weighted current-steering DACs are summarized below [14]:
a. imperfect synchronization of the control signal at the switches;
b. digital signal feed-through via the $C_{GD}$ or $C_{BC}$ of the switch transistors;
c. voltage variation at the drain or collector of the current source transistors;
d. finite output impedance of the current switches.
The first three problems can be minimized by careful layout to balance the delays of
the signal and clock paths such that the signals arriving at the switches are synchronized.
However, it is not easy to distribute high-frequency clocks across long distances. To
ensure clock synchronization, a specific clock distribution scheme, such as an H-tree or a
grid topology, needs to be employed.
SFDR is also affected by the output impedance of the DAC [15]. For an N-bit current-steering
DAC with a typical switching structure shown in Fig. 1.7, the SFDR can be estimated as
\[
SFDR \approx 20 \log\left(\frac{R_{unit}}{R_{load}}\right) - 6\,(N-2), \tag{1.6}
\]
where $R_{unit}$ is the output impedance at the drain, or collector, of each switch, and $R_{load}$ is the
load resistance for the DAC output. In addition to other factors, $R_{unit}$ must be maintained
as high as possible in order to obtain a high SFDR in the desired frequency bandwidth. A
cascode current source is a simple and effective way to increase the output impedance, and
is adopted in this ultrahigh-speed sine-weighted DAC design.
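To give a feel for the numbers in Eq. (1.6), the estimate can be evaluated as below. The impedance and load values are assumptions chosen only for illustration, and the logarithm is taken as base 10 since the result is expressed in dB.

```python
import math


def sfdr_estimate_db(r_unit_ohm, r_load_ohm, n_bits):
    """Evaluate Eq. (1.6): 20*log10(R_unit / R_load) - 6*(N - 2)."""
    return 20.0 * math.log10(r_unit_ohm / r_load_ohm) - 6.0 * (n_bits - 2)


# Assumed example: 1 MOhm unit-cell output impedance, 50 Ohm load, 10-bit DAC.
baseline = sfdr_estimate_db(1.0e6, 50.0, 10)
doubled = sfdr_estimate_db(2.0e6, 50.0, 10)      # doubling R_unit adds about 6 dB
```

Under this estimate every doubling of the unit-cell output impedance buys roughly 6 dB of SFDR, which is the motivation for the cascode current source.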
Figure 1.7: Typical switching structure of current-steering DAC

1.4 Outline and Contribution
This dissertation is organized into the following chapters, and the author's contributions
are as follows:
Chapter 1: This chapter discusses the fundamentals of the DDS architecture, including the
conventional ROM-based DDS and the ROM-less DDS with a sine-weighted DAC. A ROM-less
DDS with a segmented DAC is presented, too. Direct digital modulation capability is very
important when the DDS is used in modern radar and communication systems. How the DDS
with modulation capabilities works in stretch processing radar is evaluated. Finally, the
DDS spectral purity is summarized.
Chapter 2: Sine-weighted DAC is introduced in this chapter. A segmented sine-
weighted DAC used in ultrahigh speed ROM-less DDS is presented in the second part.
Chapter 3: An 11-bit 8.6 GHz DDS with a 10-bit sine-weighted DAC (11B DDS) is
presented in Chapter 3. It is a low-power, ultrahigh-speed and high-resolution SiGe DDS
RFIC with 11-bit phase and 10-bit amplitude resolutions. Using more than twenty thousand
transistors, including an 11-bit pipeline accumulator, a 6-bit coarse sine-weighted DAC and
eight 3-bit fine sine-weighted DACs, the core area of the DDS is 3 × 2.5 mm². The maximum
clock frequency was measured at 8.6 GHz with a 4.2958 GHz output. The DDS consumes
4.8 W of power using a single 3.3 V power supply. It achieves the best reported phase and
amplitude resolutions, as well as a leading power efficiency figure-of-merit (FOM) of 81.1
GHz·$2^{SFDR/6}$/W in the ultrahigh speed DDS design. The measured SFDR is approximately
45 dBc with a 4.2958 GHz Nyquist output, and 50 dBc with a 4.2 MHz output in the
Nyquist band at the maximum clock frequency of 8.6 GHz. Under a 7.2 GHz clock input,
the worst-case Nyquist band SFDR and narrow band SFDR are measured as 33 dBc and
42 dBc respectively. The measured phase noise with an output frequency of 1.57 GHz is
-118.55 dBc/Hz at a 10 kHz frequency offset with a 7.2 GHz clock input generated from an
Agilent E8257D analog signal generator. All the measurements were taken with the chips
bonded in a CLCC$^4$-52 package.
Chapter 4: A 9-bit 2.9 GHz DDS (9B DDS) with direct digital modulation capabilities
is presented in Chapter 4. It is a low-power, high-speed SiGe DDS RFIC with 9-bit
phase and 7-bit amplitude resolutions. This DDS is one of the first reported DDS RFICs
with GHz-range output and direct digital frequency and phase modulation capabilities. Using
more than eight thousand transistors, the DDS RFIC includes a 9-bit CLA accumulator
for phase accumulation, a 9-bit CLA adder for phase modulation and a 7-bit sine-weighted
DAC. The core area of the DDS occupies 1.7 × 2.0 mm². The DDS consumes only 2.0
W under a single 3.3 V power supply even with the added modulation blocks. The narrow
band SFDR is measured as 35 dBc with the maximum update frequency of 2.9 GHz. The
DDS RFIC is tested in a CLCC-44 package.
Chapter 5: A 24-bit 5.0 GHz DDS (24B DDS) with direct digital modulation capabilities
is presented in Chapter 5. This design is an ultrahigh-speed DDS with direct
digital modulation capabilities used in a pulse compression radar. It represents one
of the first DDS RFICs in the over-GHz output frequency range with direct digital modulation
capabilities. It adopts a ROM-less architecture and supports direct digital
frequency and phase modulation with 24-bit and 12-bit resolution, respectively. The DDS
includes a 24-bit RCA accumulator for phase accumulation, a 12-bit RCA for phase modulation
and a 10-bit segmented sine-weighted DAC for phase-to-amplitude conversion as well as
digital-to-analog conversion. The DDS core occupies 3.0 × 2.5 mm² and consumes 4.7 W of
power with a single 3.3 V power supply. This 24-bit DDS has more than 20,000 transistors
and achieves a maximum clock frequency of 5.0 GHz. The measured worst-case SFDR is 45
dBc under a 5.0 GHz clock frequency and within a 50 MHz bandwidth. At 1.246258914 GHz
output frequency, the 50 MHz narrow band SFDR is measured as 82 dBc. The best Nyquist
band SFDR is 38 dBc with a 469.360351 MHz output using a 5.0 GHz clock frequency. This
DDS was tested in a CLCC-68 package.
$^4$ CLCC represents ceramic lead-less chip carrier.
All the DDSs discussed in the above chapter outlines were developed in 0.13 µm silicon
germanium (SiGe) BiCMOS technology with $f_T/f_{MAX}$ = 200/250 GHz.
Chapter 6: Chapter 6 presents an 8.7-13.8 GHz transformer-coupled varactor-less
quadrature current-controlled oscillator (QCCO) RFIC. It incorporates a transformer-coupled
technique and is tuned by changing the operating current through the primary and secondary
windings. Fabricated in a 0.18 µm SiGe BiCMOS process, the prototype QCCO achieves a
45.3% tuning range. With two stacked octagonal transformers, the QCCO core circuit
occupies 0.4 × 0.5 mm² of chip area and draws 8-18 mA from a 1.8 V power supply. The
measured phase noise is about -86.83 dBc/Hz at 1 MHz offset and -110 dBc/Hz at 10 MHz
offset with 11.02 GHz quadrature outputs. The QCCO achieves a phase noise figure-of-merit
of -154 dBc/Hz.
Chapter 7: The dissertation concludes in Chapter 7 with suggested future research topics.
Chapter 2
Design and Analysis of Sine-weighted DAC
2.1 Sine-weighted DAC
The sine-weighted DAC combines the sine/cosine mapping block with the digital-to-analog
amplitude converter. The major difference between the linear DAC and the sine-weighted
DAC is that the linear DAC has identical current sources or power-of-2 weighted current
sources for each bit, depending upon the decoder scheme, while the sine-weighted DAC has a
variety of weighted current sources. Fig. 2.1 shows the structure of a sine-weighted DAC with
a thermometer decoder. For the P-bit phase word, the first two MSBs are used to determine
the quadrant of the sine wave, and the remaining P-2 bits are used to represent one
quarter phase (0 to π/2) of the sine wave. The current source matrix is calculated by Eq.
(2.1).
Figure 2.1: Block diagram of (P-1)-bit sine-weighted DAC.
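\[
I_k =
\begin{cases}
\left\lfloor \left(2^{M}-1\right) \sin\left( \dfrac{\pi}{2} \cdot \dfrac{0.5}{2^{P-2}} \right) \right\rfloor, & \text{for } k = 0 \\[2ex]
\left\lfloor \left(2^{M}-1\right) \sin\left( \dfrac{\pi}{2} \cdot \dfrac{k+0.5}{2^{P-2}} \right) - \displaystyle\sum_{n=0}^{k-1} I_n \right\rfloor, & 0 < k \le 2^{P-2}-1
\end{cases}
\tag{2.1}
\]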
In Eq. (2.1), P is the phase resolution of the sine-weighted DAC, i.e., the total number of
input bits of the sine-weighted DAC, and M is the amplitude resolution including the
mirroring effect of the MSB. Usually, M = P-1, generated by the (P-2)-bit quarter sine wave and
the mirroring of the MSB.
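The recursion in Eq. (2.1) can be sketched in a few lines. The code below is a minimal illustration, taking the full-scale factor as $2^M - 1$ and M = P-1 as stated above; the function name is hypothetical, and the routine is not claimed to reproduce the current-source values of the fabricated designs.

```python
import math


def sine_weighted_currents(p_bits, m_bits=None):
    """Unit-current weights I_k of Eq. (2.1) for one quarter of the sine wave."""
    if m_bits is None:
        m_bits = p_bits - 1                      # usual choice M = P - 1
    full_scale = (1 << m_bits) - 1               # assumed full-scale factor 2^M - 1
    steps = 1 << (p_bits - 2)                    # phase words per quarter wave
    currents = []
    running_sum = 0                              # sum of I_0 .. I_(k-1)
    for k in range(steps):
        target = math.floor(
            full_scale * math.sin(math.pi / 2.0 * (k + 0.5) / steps))
        currents.append(target - running_sum)    # floor(x - integer) = floor(x) - integer
        running_sum = target
    return currents


weights = sine_weighted_currents(p_bits=6)       # small example: 16 current sources
```

Because the quarter sine is monotonic, every weight is non-negative, and switching on sources 0 through k in thermometer fashion yields the quantized sine amplitude for phase word k.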
2.2 Segmented Sine-Weighted DAC
It is quite difficult to build a non-segmented DAC with more than 10-bit resolution
due to the exponential increase in area and power consumption that results from increasing
the DAC resolution. The problem is even more pronounced for sine-weighted DACs
than for linear DACs. In linear DAC design, high accuracy can be achieved using
segmentation. For instance, a 10-bit DAC can be segmented into a 5-bit coarse DAC and a
5-bit fine DAC, i.e., a 5+5 segmentation, while a 12-bit DAC can be segmented into an 8-bit
coarse DAC and a 4-bit fine DAC, i.e., an 8+4 segmentation [16,13]. Similarly, a sine-weighted
DAC can also be segmented into a coarse DAC and fine DACs [17].
2.2.1 Quantization and Segmentation of the Sine Wave
For the P-bit phase word, since the quadrant of the sine waveform is determined by
the two MSBs, only one quarter of the sine wave needs to be generated by the remaining P-2 bits.
If we further segment the remaining P-2 phase bits into three parts with a, b and c bits (a + b
+ c = P-2), there are $2^{a+b+c}$ phase words for one quarter of the sine wave. The phase word
can thus be represented as
\[
\phi = x \cdot 2^{b+c} + y \cdot 2^{c} + z \tag{2.2}
\]
with $0 \le x \le 2^{a}-1$, $0 \le y \le 2^{b}-1$ and $0 \le z \le 2^{c}-1$, where x, y and z are the phase
sequence numbers related to the segmented parts a, b and c. Thus, if the amplitude of the
sine wave is given by $A = 2^{M}-1$, where M is the number of amplitude bits, then for a specific
phase word $\phi$, the quarter sine wave can be represented as
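\[
\begin{aligned}
A \sin\left(\frac{\pi}{2} \cdot \frac{\phi}{2^{a+b+c}}\right)
&= \left(2^{M}-1\right) \sin\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c} + z}{2^{a+b+c}}\right) \\
&= \left(2^{M}-1\right) \sin\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right)
   \cos\left(\frac{\pi}{2} \cdot \frac{z}{2^{a+b+c}}\right) \\
&\quad + \left(2^{M}-1\right) \cos\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right)
   \sin\left(\frac{\pi}{2} \cdot \frac{z}{2^{a+b+c}}\right).
\end{aligned}
\tag{2.3}
\]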
Since
\[
z \ll x \cdot 2^{b+c} + y \cdot 2^{c} \le 2^{a+b+c}, \tag{2.4}
\]
we have
\[
\cos\left(\frac{\pi}{2} \cdot \frac{z}{2^{a+b+c}}\right) \approx 1. \tag{2.5}
\]
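Thus, the sine wave can be approximated as
\[
\begin{aligned}
A \sin\left(\frac{\pi}{2} \cdot \frac{\phi}{2^{a+b+c}}\right)
&\approx \left(2^{M}-1\right) \sin\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right) \\
&\quad + \left(2^{M}-1\right) \cos\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right)
   \sin\left(\frac{\pi}{2} \cdot \frac{z}{2^{a+b+c}}\right) \\
&= C(x,y) + F(x,y,z),
\end{aligned}
\tag{2.6}
\]
with
\[
C(x,y) = \left(2^{M}-1\right) \sin\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right), \tag{2.7}
\]
\[
F(x,y,z) = \left(2^{M}-1\right) \cos\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right)
\sin\left(\frac{\pi}{2} \cdot \frac{z}{2^{a+b+c}}\right), \tag{2.8}
\]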
where C(x, y) is the sinusoidal value to be stored in a coarse DAC, and F(x, y, z) denotes
the sinusoidal value to be stored in fine DACs. From the above decomposition,
two sub-DACs can be designed to convert a complete sine wave to its analog waveform. The
fine DAC data F(x, y, z) can be used to interpolate the coarse DAC data C(x, y). In order
to quantize C(x, y), the amplitude differences between two adjacent coarse phase words
are derived as shown in Eq. (2.9).
\[
\Delta C(x,y) =
\begin{cases}
\left\lfloor \left(2^{M}-1\right) \sin\left(\dfrac{2\pi\,(0.5)}{2^{P}}\right) \right\rfloor, & \text{for } x = y = 0 \\[2ex]
\left\lfloor \left(2^{M}-1\right) \sin\left(\dfrac{2\pi \left(x \cdot 2^{b+c} + y \cdot 2^{c}\right)}{2^{P}}\right)
 - \displaystyle\sum_{m=0}^{x-1} \sum_{n=0}^{2^{b}-1} \Delta C(m,n)
 - \sum_{n=0}^{y-1} \Delta C(x,n) \right\rfloor, & \text{for } 0 \le x \le 2^{a}-1,\ 1 \le y \le 2^{b}-1
\end{cases}
\tag{2.9}
\]
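To simplify the quantization of F(x, y, z), the average value of y, denoted $\bar{y}$, is used in
place of every y value, and F(x, y, z) is thus simplified to F(x, z). Hence, the amplitude
difference between two adjacent fine phase words for the fine DACs can be obtained as shown
in Eq. (2.10).
\[
\Delta F(x,z) =
\begin{cases}
\left\lfloor \left(2^{M}-1\right) \cos\left(\dfrac{2\pi \left(x \cdot 2^{b+c} + \bar{y} \cdot 2^{c}\right)}{2^{P}}\right)
 \sin\left(\dfrac{2\pi\,(0.5)}{2^{P}}\right) \right\rfloor, & \text{for } z = 0 \\[2ex]
\left\lfloor \left(2^{M}-1\right) \cos\left(\dfrac{2\pi \left(x \cdot 2^{b+c} + \bar{y} \cdot 2^{c}\right)}{2^{P}}\right)
 \sin\left(\dfrac{2\pi\,(z+0.5)}{2^{P}}\right)
 - \displaystyle\sum_{n=0}^{z-1} \Delta F(x,n) \right\rfloor, & \text{for } 1 \le z \le 2^{c}-1
\end{cases}
\tag{2.10}
\]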
In Eqs. (2.9) and (2.10), it should be pointed out that
a. P is the truncated phase resolution, P = a + b + c + 2;
b. $\lfloor A \rfloor$ denotes the rounding of number A down to the nearest integer toward zero;
c. $\bar{y} = \frac{0+1+\cdots+(2^{b}-1)}{2^{b}} = \frac{2^{b}-1}{2}$ is the average value of y; and
d. $F(x,z) = F(x,\bar{y},z) \approx F(x,y,z)$, where y is replaced with its averaged value.
With Eqs. (2.9) and (2.10), the sine function can be rewritten as
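\[
\begin{aligned}
\left(2^{M}-1\right) \sin\left(\frac{\pi}{2} \cdot \frac{\phi}{2^{a+b+c}}\right)
&\approx C(x,y) + F(x,z) \\
&= \underbrace{\sum_{i=0}^{x-1} \sum_{j=0}^{2^{b}-1} \Delta C(i,j) + \sum_{j=0}^{y} \Delta C(x,j)}_{\text{coarse DAC}}
 + \underbrace{\sum_{k=0}^{z} \Delta F(x,k)}_{\text{fine DACs}},
\end{aligned}
\tag{2.11}
\]
where the first group of terms denotes the data stored in the coarse DAC current sources and
the last term denotes the data stored in the fine DAC current sources.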
This trigonometric decomposition is similar to the ROM compression in the ROM-based
DDS. In the approaches by Sunderland [4] and Nicholas [5],
\[
\begin{aligned}
\sin(A+B+C) &= \sin(A+B)\cos(C) + \cos(A+B)\sin(C) \\
&\approx \sin(A+B) + \cos(A)\sin(C).
\end{aligned}
\tag{2.12}
\]
The following two approximations
\[
\begin{cases}
\cos(C) \approx 1 \\
\cos(A+B) \approx \cos(A)
\end{cases}
\tag{2.13}
\]
have been made, while in the approach adopted here, the approximation is improved by
using
\[
\begin{cases}
\cos(C) \approx 1 \\
\cos(A+B) \approx \cos(A+\bar{B}),
\end{cases}
\tag{2.14}
\]
where $\bar{B}$ is the mean value of B. The approximation error will be analyzed in the next
subsection.
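The benefit of replacing $\cos(A)$ by $\cos(A+\bar{B})$ can be checked by brute force. The sketch below sweeps every quarter-wave phase word of an a = b = c = 3 segmentation and records the worst-case error of the approximations in Eqs. (2.13) and (2.14) against the exact sine. It works on a unit-amplitude sine without quantization, so it illustrates only the trigonometric approximation; the function name is hypothetical.

```python
import math


def worst_case_errors(a=3, b=3, c=3):
    """Return (max error with cos(A), max error with cos(A + mean B))."""
    scale = math.pi / 2.0 / (1 << (a + b + c))       # radians per phase LSB
    b_mean = ((1 << b) - 1) / 2.0 * (1 << c) * scale
    worst_cos_a = 0.0
    worst_cos_mean = 0.0
    for x in range(1 << a):
        for y in range(1 << b):
            for z in range(1 << c):
                ang_a = x * (1 << (b + c)) * scale
                ang_b = y * (1 << c) * scale
                ang_c = z * scale
                exact = math.sin(ang_a + ang_b + ang_c)
                coarse = math.sin(ang_a + ang_b)
                err_a = abs(exact - (coarse + math.cos(ang_a) * math.sin(ang_c)))
                err_m = abs(exact - (coarse + math.cos(ang_a + b_mean) * math.sin(ang_c)))
                worst_cos_a = max(worst_cos_a, err_a)
                worst_cos_mean = max(worst_cos_mean, err_m)
    return worst_cos_a, worst_cos_mean


err_sunderland, err_mean_b = worst_case_errors()
```

For this segmentation the worst-case error with the mean-value substitution comes out at roughly half of that obtained with $\cos(A)$ alone.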
2.2.2 Approximation Error Analysis
In the previous subsection, two approximations are used for the coarse DAC and fine
DACs respectively. The first is represented in Eq. (2.5). The second is the use of the mean
value of y for the computation of F(x, y, z). Both the approximations lead to errors in the
computation of the sine wave's amplitude. For the coarse DAC the approximation error is
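\[
E_C = \cos\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right) \cdot 1. \tag{2.15}
\]
The maximum value of $E_C$ is
\[
\max\{E_C\} = \sin\left(\frac{\pi}{2} \cdot \frac{1}{2^{a+b}}\right), \tag{2.16}
\]
when $x = 2^{a}-1$ and $y = 2^{b}-1$.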
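For the fine DACs,
\[
E_F = \cos\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + y \cdot 2^{c}}{2^{a+b+c}}\right)
 - \cos\left(\frac{\pi}{2} \cdot \frac{x \cdot 2^{b+c} + \bar{y} \cdot 2^{c}}{2^{a+b+c}}\right), \tag{2.17}
\]
and the maximum value of $E_F$ is
\[
\max\{E_F\} \approx 2 \sin\left(\frac{\pi}{2} \cdot \frac{1}{2^{a+2}}\right), \tag{2.18}
\]
when $x = 2^{a}-1$, $y = 2^{b}-1$ and $\bar{y} = (2^{b}-1)/2$.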
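If the whole DAC requires a 9-bit amplitude resolution, excluding the MSB mirroring,
then the coarse DAC should have at least a 9-bit resolution and the fine DACs should have
c-bit resolution. From Eqs. (2.16) and (2.18),
\[
\begin{cases}
\sin\left(\dfrac{\pi}{2} \cdot \dfrac{1}{2^{a+b}}\right) \le \dfrac{1}{2^{9}} \\[2ex]
2 \sin\left(\dfrac{\pi}{2} \cdot \dfrac{1}{2^{a+2}}\right) \le \dfrac{1}{2^{c}}.
\end{cases}
\tag{2.19}
\]
From Eq. (2.19),
\[
\begin{cases}
a + b \ge 4 \\
c \le 6, & \text{when } a = 0 \\
c \le 10, & \text{when } a = 4.
\end{cases}
\tag{2.20}
\]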
As long as a, b and c are in the range of Eq. (2.20), the approximation errors are less than
the quantization noise and can be ignored.
2.2.3 Optimizing the Segmentation
From the above discussion, the quantization noise is significantly affected by the
segmentation. To optimize the segmentation for better performance, one or more optimization
parameters need to be considered. SFDR, power consumption and die area are the most
critical parameters in the ultrahigh speed DDS design. An optimized segmentation
figure-of-merit, normalized by the non-segmented values, is defined as
\[
FOM_{sg} = \left(SFDR_{ns} - SFDR_{sg}\right) \cdot \frac{P_{sg}}{P_{ns}} \cdot \frac{A_{sg}}{A_{ns}}, \tag{2.21}
\]
where $FOM_{sg}$, SFDR, P and A represent the segmentation figure-of-merit,
spurious-free-dynamic-range, power consumption and occupied area, respectively. The subscript "sg"
means segmented DAC and "ns" denotes non-segmented DAC. Unlike CMOS logic design,
where the power consumption results mainly from dynamic power, the primary power
consumed by the current-mode-logic (CML) circuits that are used in the ultrahigh speed DDS
designs is the static bias current in the CML current sources. Moreover, we assume that
both the DAC power consumption and area are proportional to the number of DAC switch
cells. If we segment the switch cells into a, b and c, the normalized number of switch cells is
given by
\[
\frac{2^{a+b} + 2^{a+c}}{2^{a+b+c}}, \tag{2.22}
\]
which can be used to represent either the normalized power consumption $P_{sg}/P_{ns}$ or the
normalized area $A_{sg}/A_{ns}$. For a sine-weighted DAC with a total of 9 input bits, Table 2.1 shows
the simulated SFDR, normalized power consumption or area and the $FOM_{sg}$. The results
in Table 2.1 demonstrate that with a larger a or b, a better SFDR can be achieved, but
power consumption and area will increase as well. Segmentation with a + b = 9 yields the
best SFDR, yet it also leads to the highest power consumption and largest area. This result
is understandable since a + b = 9 means a non-segmented DAC. The segmentation with
a = b = c = 3 results in a good power and area efficiency, and a relatively high SFDR.
Moreover, it achieves the best $FOM_{sg}$ of 0.47. Note that the simulated SFDR in Table 2.1
includes only the effect of static quantization errors of the sine-weighted DAC, whereas the
practical integrated circuit also suffers from other nonlinearities and distortions. As a result,
the measured SFDR will be worse than what is given in Table 2.1.

Table 2.1: Simulated Segmentation FOM for Different Segmentations with 11-bit Phase and
10-bit Amplitude Resolutions

Segmentation    SFDR     Normalized Power       FOMsg
a-b-c                    Consumption or Area
2-2-5           51.08    0.2813                 2.1895
2-3-4           57.73    0.1875                 0.7390
2-4-3           65.44    0.1875                 0.4679
3-2-4           64.03    0.3125                 1.4375
3-3-3           71.07    0.2500                 0.4675
3-4-2           72.19    0.3125                 0.6406
4-2-3           72.05    0.3750                 0.9422
4-3-2           70.87    0.3750                 1.1081
4-4-1           71.27    0.5625                 2.3667
4-5-0           78.75    1                      0
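The entries of Table 2.1 can be cross-checked from Eqs. (2.21) and (2.22). The sketch below assumes that both $P_{sg}/P_{ns}$ and $A_{sg}/A_{ns}$ equal the normalized switch-cell count of Eq. (2.22) and that the SFDR difference is taken relative to the non-segmented 4-5-0 row; the SFDR values are copied from the table, and the function names are hypothetical.

```python
# (a, b, c): (simulated SFDR, tabulated FOM_sg), copied from Table 2.1
TABLE_2_1 = {
    (2, 2, 5): (51.08, 2.1895),
    (2, 3, 4): (57.73, 0.7390),
    (2, 4, 3): (65.44, 0.4679),
    (3, 2, 4): (64.03, 1.4375),
    (3, 3, 3): (71.07, 0.4675),
    (3, 4, 2): (72.19, 0.6406),
    (4, 2, 3): (72.05, 0.9422),
    (4, 3, 2): (70.87, 1.1081),
    (4, 4, 1): (71.27, 2.3667),
}
SFDR_NON_SEGMENTED = 78.75           # 4-5-0 row of Table 2.1


def normalized_cells(a, b, c):
    """Eq. (2.22): normalized number of switch cells."""
    return (2 ** (a + b) + 2 ** (a + c)) / 2 ** (a + b + c)


def segmentation_fom(sfdr_sg, a, b, c, sfdr_ns=SFDR_NON_SEGMENTED):
    """Eq. (2.21) with power and area ratios both taken from Eq. (2.22)."""
    ratio = normalized_cells(a, b, c)
    return (sfdr_ns - sfdr_sg) * ratio * ratio


computed = {seg: segmentation_fom(sfdr, *seg) for seg, (sfdr, _) in TABLE_2_1.items()}
```

Under these assumptions the computed values agree with the tabulated $FOM_{sg}$ to within about 0.02 for every listed segmentation.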
Chapter 3
An 11-bit 8.6 GHz DDS RFIC with 10-bit Segmented Sine-weighted DAC
3.1 Introduction
Ultrahigh-speed HBTs allow a DDS to operate at up to mm-wave frequencies, making the DDS
a preferable solution for the synthesis of sine waveforms with fine frequency resolution, fast
channel switching and versatile modulation capability [1,2]. Several ultrahigh speed
DDS designs have been reported with clock frequencies from 9 GHz to 32 GHz and DAC
resolutions from 5 bits to a maximum of 8 bits [18, 19, 20]. These DDSs have been implemented
in indium phosphide (InP) HBT technology and only tested on-wafer [18, 19, 20]. The maximum
achieved SFDR in these DDS designs is less than 30 dBc, which is not sufficient for typical
radar and wireless applications. The low yield and high power consumption of InP HBTs
limit the InP HBT-based DDS from achieving higher resolution. Several DDSs have been
developed in SiGe BiCMOS technology with more robust and higher yield devices than the
InP counterpart [21,22]. However, these earlier versions of SiGe DDSs still suffer from less
than 30 dBc SFDR. Higher spectral purity and higher amplitude resolution are required
in modern radar and communication systems. With a segmented sine-weighted DAC, the
DDS presented in this chapter achieves 11-bit phase and 10-bit amplitude resolutions with a
maximum clock frequency of 8.6 GHz [8,9]. The DDS consumes 4.8 W with a leading power
efficiency FOM of 81.1 GHz·$2^{SFDR/6}$/W and the best reported Nyquist band worst-case SFDR
of 33 dBc in ultrahigh speed DDS designs.
The proposed DDS adopts a ROM-less architecture, which combines the sine/cosine
mapping and digital-to-analog conversion in a sine-weighted DAC [8,9]. The block
diagram of the ROM-less DDS, with 11-bit phase and 10-bit amplitude resolution, is shown
in Fig. 3.1. The major parts of the ROM-less DDS are an 11-bit pipeline phase accumulator
and a 10-bit current-steering segmented sine-weighted DAC. Since the output frequency
cannot exceed the Nyquist rate, the MSB of the accumulator input is tied to zero. The 11-bit
FCW (including MSB = 0) feeds the accumulator, which controls the output frequency of
the synthesized sine wave. The two MSBs of the accumulator output are used to determine
the quadrant of the sine waveform. The remaining 9 bits are used to control the segmented
sine-weighted DAC in generating the amplitude for a quarter phase (0 to π/2) sine wave.
With the segmentation method described in Chapter 2, 3+3 MSBs are used to control the
coarse DAC, while the 3-bit MSBs and 3-bit LSBs are used to control the fine DACs.
Figure 3.1: Block diagram of the ROM-less DDS with 10-bit segmented sine-weighted DAC.
3.2 Circuit Implementation of the 11-bit ROM-less DDS
With a 3.3 V power supply and a SiGe HBT base-collector voltage of 0.85 V to 0.9 V, all of
the digital logic is implemented using 3-level CML with differential output swings of 400 mV.
A trade-off has been made between the DDS operational speed and its power consumption.
For an 11-bit packaged DDS RFIC, power consumption is the primary concern. To save
power, each tail current in a CML current source is set to 0.3 mA, which is close to 40%
of the peak $f_T$ current. In contrast, traditional CML circuits are biased at 70-80% of
the peak $f_T$ current. A traditional implementation of the CML circuits would end up with
a DDS with power consumption larger than 9.0 W.
3.2.1 11-Bit Pipeline Accumulator
To achieve the maximum operating speed with a fixed FCW, a pipeline accumulator is
used in this design. It uses the most hardware, but achieves the fastest speed. The total
delay of the accumulator is one full adder (FA) propagation delay plus one D flip-flop (DFF)
propagation delay.
Fig. 3.2 illustrates the architecture of the 11-bit pipeline phase accumulator, which has
a total of 11 pipelined rows. Each row has a total of 12 DFF delay stages placed at the input
and output of a 1-bit FA. Eleven DFF stages are needed for an 11-bit pipeline accumulator.
One more DFF is used for each row to retime the signal for data synchronization. This
scheme retimes the signal to remove the timing mismatch due to the metal wire delays from
the accumulator output to its input. Obviously, the pipeline accumulator has a propagation
delay of 12 clock cycles, including a latency period of 11 clock cycles plus one retiming
clock cycle. Note that an accumulator requires at least one delay stage even without any
pipelined stages. So the pipeline architecture shown in Fig. 3.2 allows the 11-bit accumulator
to operate at the speed of a 1-bit accumulator consisting of an FA and a DFF.
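The timing argument above can be illustrated with a bit-level behavioral model. The sketch below is not the CML implementation: it models each row as a 1-bit full adder with registered sum and carry, lets row i start i clocks after row 0 (standing in for the input DFF chain), and de-skews the sum bits at the output. The extra retiming DFF is omitted, and the function name is hypothetical.

```python
def pipelined_accumulate(fcw, n_bits, n_steps):
    """Bit-pipelined accumulation of a fixed FCW; returns n_steps aligned words."""
    s = [0] * n_bits                     # registered sum bit of each row
    c = [0] * n_bits                     # registered carry-out of each row
    history = []                         # history[t - 1][i] = sum bit i after clock t
    total_clocks = n_steps + n_bits - 1
    for t in range(1, total_clocks + 1):
        new_s, new_c = s[:], c[:]
        for i in range(n_bits):
            # FCW bit i reaches row i only after i input delay stages
            a_bit = (fcw >> i) & 1 if t > i else 0
            carry_in = c[i - 1] if i > 0 else 0      # carry registered last clock
            b_bit = s[i]
            new_s[i] = a_bit ^ b_bit ^ carry_in
            new_c[i] = (a_bit & b_bit) | (a_bit & carry_in) | (b_bit & carry_in)
        s, c = new_s, new_c
        history.append(s[:])
    words = []
    for n in range(1, n_steps + 1):
        word = 0
        for i in range(n_bits):
            # bit i of accumulation step n is produced at clock n + i
            word |= history[n + i - 1][i] << i
        words.append(word)
    return words


phase_words = pipelined_accumulate(fcw=5, n_bits=11, n_steps=20)
```

Each clock, every row performs only a single-bit addition, yet after de-skewing the output sequence is identical to that of a conventional 11-bit accumulator, which is the point made about operating at the speed of a 1-bit accumulator.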
Figure 3.2: The 11-bit pipeline phase accumulator.

3.2.2 10-Bit Segmented Sine-Weighted DAC
The block diagram of the 10-bit sine-weighted DAC is shown in Fig. 3.3. It has a 9-bit
complementor and a current-steering sine-weighted DAC, which includes a 6-bit coarse DAC
and eight 3-bit fine DACs. The MSB of the DAC input is used to provide the proper mirroring
of the sine waveform about the π phase point. The 2nd MSB is used by the complementor to
invert the remaining 9 bits for the 2nd and 4th quadrants of the sine waveform. The outputs
of the complementor are applied to the segmented sine-weighted DAC to form a quarter of
the sine waveform. Because of the quadrant mirror, the total amplitude resolution of the
sine-weighted DAC is 10 bits, while a 9-bit segmented sine-weighted DAC is used to generate
the amplitude for a quarter phase (0 to π/2) sine wave.
Based on the discussion in Chapter 2, setting a = b = c = 3 results in a segmentation
with the best segmentation FOM. Therefore, the 9-bit sine-weighted DAC is divided into a
6-bit coarse sine-weighted DAC and eight 3-bit fine sine-weighted DACs. The first 6 bits
of the complementor output control the coarse sine-weighted DAC, and the highest 3 bits
also address the selection of the fine DACs. The lowest 3 bits of the complementor output
determine the output value of each of the fine DACs.
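The bit allocation described above can be summarized in a short behavioral sketch: the two MSBs select the quadrant, the 2nd MSB drives a 1's complement of the remaining 9 bits, and the complemented word is split into the 3-bit fields x, y and z. The function name and return layout are hypothetical.

```python
def split_phase_word(word, a=3, b=3, c=3):
    """Split an (a+b+c+2)-bit phase word into (msb, second_msb, x, y, z)."""
    quarter_bits = a + b + c
    mask = (1 << quarter_bits) - 1
    msb = (word >> (quarter_bits + 1)) & 1       # mirrors the sine about pi
    second_msb = (word >> quarter_bits) & 1      # selects 2nd/4th quadrant
    quarter = word & mask
    if second_msb:
        quarter ^= mask                          # 1's complementor output
    x = quarter >> (b + c)                       # highest 3 bits: coarse row, fine DAC select
    y = (quarter >> c) & ((1 << b) - 1)          # middle 3 bits: coarse column
    z = quarter & ((1 << c) - 1)                 # lowest 3 bits: fine DAC value
    return msb, second_msb, x, y, z


first_quadrant = split_phase_word(0b00_010_011_101)
second_quadrant = split_phase_word(0b01_000_000_000)
```

The first example decodes to x = 2, y = 3, z = 5; the second lies at the start of the 2nd quadrant, where the complementor maps the all-zero field to x = y = z = 7, i.e. the peak of the quarter sine.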
25
3-7
 R
o
w 
D
eco
d
er
3-7 T h er mo mete r  D eco d er
1'
s 
C
o
mp
leme
n
to
r
M SB
2
nd
 M SB
3-7 C o l u mn  D eco d er
< 4:6>
< 1:9>
< 7:9> < 1:3>
VC C
OU T3-8
 B
in
ar
y 
D
eco
d
er
(B ) (C )
(D )
(A )
< 1:9>
(A ) (B ) (C )
9 
b
(D )
10 
b
? 2?0
0 2 1 2 1 2 1 2
0 2 1 2 1 2 1 2
0 1 2 1 2 1 1 2
0 1 1 2 1 1 1 2
0 1 1 1 1 1 1 1
0 1 1 0 1 1 1 0
0 0 1 0 1 0 1 0
0 0 0 1 0 0 0 0
1 12 13 12 13 12 13 12
12 13 12 12 12 12 12 12
11 12 11 11 12 11 10 11
11 10 10 10 10 9 10 9
9 9 8 8 9 7 8 7
7 7 7 6 6 6 5 5
5 5 4 4 4 4 3 3
2 3 2 1 2 1 0 1
< 1:11 >
Figure 3.3: 10-bit segmented sine-weighted DAC.
With 11-bit phase and 10-bit amplitude resolutions, the weighted current sources of the
coarse DAC and fine DACs can be calculated from Eqs. (2.9) and (2.10). The numbers within
the coarse DAC and fine DACs in Fig. 3.3 represent the weights of the various current sources.
To describe the DAC core architecture and its operation, an operator is defined between two
8 × 8 square matrices.
\[
A \circledast B = a_{ij} \circledast b_{ij} = \sum_{i=0}^{7} \sum_{j=0}^{7} a_{ij}\, b_{ij}. \tag{3.1}
\]
To match the sine-weighted DAC description, the matrix indices start from 0 instead of 1.
As an example, for a specific phase word
\[
\phi = x \cdot 2^{b+c} + y \cdot 2^{c} + z = 64x + 8y + z, \tag{3.2}
\]
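the quarter sine wave is rebuilt using Eq. (2.11) and represented in Eq. (3.3),
\[
\begin{aligned}
\left(2^{M}-1\right) \sin\left(\frac{\pi}{2} \cdot \frac{\phi}{2^{a+b+c}}\right)
&= 1023 \sin\left(\frac{\pi}{2} \cdot \frac{64x + 8y + z}{2^{9}}\right) \\
&= \underbrace{\begin{bmatrix}
1 & 12 & 13 & 12 & 13 & 12 & 13 & 12 \\
12 & 13 & 12 & 12 & 12 & 12 & 12 & 12 \\
11 & 12 & 11 & 11 & 12 & 11 & 10 & 11 \\
11 & 10 & 10 & 10 & 10 & 9 & 10 & 9 \\
9 & 9 & 8 & 8 & 9 & 7 & 8 & 7 \\
7 & 7 & 7 & 6 & 6 & 6 & 5 & 5 \\
5 & 5 & 4 & 4 & 4 & 4 & 3 & 3 \\
2 & 3 & 2 & 1 & 2 & 1 & 0 & 1
\end{bmatrix}}_{\text{Coarse current matrix}}
\circledast
\underbrace{\begin{bmatrix}
c_{00} & c_{01} & c_{02} & c_{03} & c_{04} & c_{05} & c_{06} & c_{07} \\
c_{10} & c_{11} & c_{12} & c_{13} & c_{14} & c_{15} & c_{16} & c_{17} \\
c_{20} & c_{21} & c_{22} & c_{23} & c_{24} & c_{25} & c_{26} & c_{27} \\
c_{30} & c_{31} & c_{32} & c_{33} & c_{34} & c_{35} & c_{36} & c_{37} \\
c_{40} & c_{41} & c_{42} & c_{43} & c_{44} & c_{45} & c_{46} & c_{47} \\
c_{50} & c_{51} & c_{52} & c_{53} & c_{54} & c_{55} & c_{56} & c_{57} \\
c_{60} & c_{61} & c_{62} & c_{63} & c_{64} & c_{65} & c_{66} & c_{67} \\
c_{70} & c_{71} & c_{72} & c_{73} & c_{74} & c_{75} & c_{76} & c_{77}
\end{bmatrix}}_{\text{Coarse switch matrix}} \\
&\quad + \underbrace{\begin{bmatrix}
0 & 2 & 1 & 2 & 1 & 2 & 1 & 2 \\
0 & 2 & 1 & 2 & 1 & 2 & 1 & 2 \\
0 & 1 & 2 & 1 & 2 & 1 & 1 & 2 \\
0 & 1 & 1 & 2 & 1 & 1 & 1 & 2 \\
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}}_{\text{Fine current matrix}}
\circledast
\underbrace{\begin{bmatrix}
f_{00} & f_{01} & f_{02} & f_{03} & f_{04} & f_{05} & f_{06} & f_{07} \\
f_{10} & f_{11} & f_{12} & f_{13} & f_{14} & f_{15} & f_{16} & f_{17} \\
f_{20} & f_{21} & f_{22} & f_{23} & f_{24} & f_{25} & f_{26} & f_{27} \\
f_{30} & f_{31} & f_{32} & f_{33} & f_{34} & f_{35} & f_{36} & f_{37} \\
f_{40} & f_{41} & f_{42} & f_{43} & f_{44} & f_{45} & f_{46} & f_{47} \\
f_{50} & f_{51} & f_{52} & f_{53} & f_{54} & f_{55} & f_{56} & f_{57} \\
f_{60} & f_{61} & f_{62} & f_{63} & f_{64} & f_{65} & f_{66} & f_{67} \\
f_{70} & f_{71} & f_{72} & f_{73} & f_{74} & f_{75} & f_{76} & f_{77}
\end{bmatrix}}_{\text{Fine switch matrix}}
\end{aligned}
\tag{3.3}
\]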
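where $c_{ij}$ and $f_{ij}$ represent the operation state (0 means open and 1 means closed) of the
respective coarse DAC and fine DAC switches. Comparing Eqs. (2.11) and (3.3), we have
\[
c_{ij} =
\begin{cases}
1, & \text{when } i < x \\
1, & \text{when } i = x,\ j \le y \\
0, & \text{others,}
\end{cases}
\tag{3.4}
\]
\[
f_{ij} =
\begin{cases}
1, & \text{when } i = x,\ j \le z \\
0, & \text{others,}
\end{cases}
\tag{3.5}
\]
and
\[
0 \le x \le 7,\quad 0 \le y \le 7,\quad \text{and } 0 \le z \le 7. \tag{3.6}
\]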
From Eq. (3.4), the control bits of the coarse DAC switch matrix can be generated through
thermometer decoders. Fig. 3.4 shows the coarse DAC decoders. $d_{10} \sim d_{4}$ represent the input
bits to the coarse DAC and $e_{9} \sim e_{4}$ represent the complemented bits at the complementor
output. The full 6-bit thermometer decoder includes 3 parts: a column decoder, a row
decoder and second-level decoders. $e_{9} \sim e_{7}$ and $e_{6} \sim e_{4}$ are inputs to the column decoder
and row decoder, respectively. Following the second-level thermometer decoder, 6-bit binary
codes are converted to 64-bit thermometer codes represented by $c_{ij}$. As shown in Fig. 3.5,
the control bits of the fine DACs' switch matrix can be generated through a thermometer
decoder, a binary decoder and a second-level address-select decoder. $d_{10} \sim d_{7}$ and $d_{3} \sim d_{1}$
represent the input bits to the fine DACs. $e_{9} \sim e_{7}$ and $e_{3} \sim e_{1}$ represent the complemented
bits at the complementor output. $e_{3} \sim e_{1}$ are input to the thermometer decoder, which
converts the input bits for each fine DAC, and $e_{9} \sim e_{7}$ are input to the binary decoder
to generate the address-select code. The binary decoder and the address-select decoder work
together to select which fine DAC is used to interpolate the respective coarse DAC steps.
Through a combination of all the decoders, the 64-bit fine DAC control matrix is generated
and represented by $f_{ij}$ as described in Eq. (3.5).
Figure 3.4: Coarse DAC thermometer decoder.
[Figure: DFFs, complementors, a binary decoder, and a thermometer decoder; inputs d10 to d7
and d3 to d1, decoded signals e9 to e7 and e3 to e1, column signals C0 to C7, row signals
R0 to R7 (R0 = 0), and fine DAC control f_ij, 0 ≤ i, j ≤ 7, formed from C_i and R_j.]
Figure 3.5: Fine DACs thermometer decoders.
As shown in Fig. 3.6, the coarse DAC current source matrix provides 512 unit current
sources. Each fine DAC uses about 8 unit current sources to interpolate the two adjacent
outputs of the coarse DAC. For example, for a phase value represented by x = 2, y = 3 and
z = 5, the coarse DAC current output is the sum of all the numbers filled in the gray-shaded
boxes in the coarse DAC current matrix in Fig. 3.6. The fine DAC current output, which is
the sum of all the numbers filled in the gray-shaded boxes in the fine DAC current matrix,
is added to the coarse DAC output. As a result, the total current output of the DAC is the
sum of all the gray-shaded boxes and equal to 237 unit current sources. The unit current of
each current source is set at 26 µA. The largest current in the current source matrix of this
10-bit sine-weighted DAC is 338 µA, which is composed of 13 unit current sources.
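The bookkeeping in this paragraph can be checked numerically. The sketch below transcribes the two 8 × 8 matrices of Fig. 3.6 and sums them. The shading rule in `dac_units` is an assumption (the gray shading is not recoverable from the figure data): it takes all coarse cells up to and including position (x, y) in row-major order, plus cells 0 to z of fine row x. It is used here only because it reproduces the quoted total of 237 unit currents.

```python
# Unit-current counts transcribed from the two 8x8 matrices of Fig. 3.6.
COARSE = [
    [1, 12, 13, 12, 13, 12, 13, 12],
    [12, 13, 12, 12, 12, 12, 12, 12],
    [11, 12, 11, 11, 12, 11, 10, 11],
    [11, 10, 10, 10, 10, 9, 10, 9],
    [9, 9, 8, 8, 9, 7, 8, 7],
    [7, 7, 7, 6, 6, 6, 5, 5],
    [5, 5, 4, 4, 4, 4, 3, 3],
    [2, 3, 2, 1, 2, 1, 0, 1],
]
FINE = [
    [0, 2, 1, 2, 1, 2, 1, 2],
    [0, 2, 1, 2, 1, 2, 1, 2],
    [0, 1, 2, 1, 2, 1, 1, 2],
    [0, 1, 1, 2, 1, 1, 1, 2],
    [0, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
UNIT_CURRENT_UA = 26  # unit current quoted in the text


def dac_units(x, y, z):
    """Total unit currents for phase fields (x, y, z).

    ASSUMED shading rule: coarse cells up to and including (x, y) in
    row-major order, plus fine cells 0..z of fine row x.
    """
    coarse = sum(sum(row) for row in COARSE[:x]) + sum(COARSE[x][: y + 1])
    fine = sum(FINE[x][: z + 1])
    return coarse + fine


coarse_total = sum(sum(row) for row in COARSE)
fine_total = sum(sum(row) for row in FINE)
largest_cell = max(max(row) for row in COARSE)
example_units = dac_units(2, 3, 5)

print("coarse cells:", coarse_total)
print("fine cells:", fine_total)
print("largest cell current (uA):", largest_cell * UNIT_CURRENT_UA)
print("units for x=2, y=3, z=5:", example_units)
```

Under this rule the example output corresponds to 237 × 26 µA of current, and the largest cell (13 units) gives the quoted 338 µA.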
Coarse DAC current matrix (unit current sources per cell):
 1 12 13 12 13 12 13 12
12 13 12 12 12 12 12 12
11 12 11 11 12 11 10 11
11 10 10 10 10  9 10  9
 9  9  8  8  9  7  8  7
 7  7  7  6  6  6  5  5
 5  5  4  4  4  4  3  3
 2  3  2  1  2  1  0  1
Fine DACs current matrix (unit current sources per cell):
 0  2  1  2  1  2  1  2
 0  2  1  2  1  2  1  2
 0  1  2  1  2  1  1  2
 0  1  1  2  1  1  1  2
 0  1  1  1  1  1  1  1
 0  1  1  0  1  1  1  0
 0  0  1  0  1  0  1  0
 0  0  0  1  0  0  0  0
[Figure: both matrices are indexed 0 to 7 along each axis; the axes are labeled x, y, and z.]
Figure 3.6: Illustration of interpolating the two adjacent outputs of a coarse DAC using the
fine DAC current matrix.
The current switch contains two differential pairs with cascode current sources for
improved output impedance and current mirror accuracy. The current outputs are converted
to differential voltages by a pair of off-chip 15 Ω pull-up resistors. Fig. 3.7 shows that the
currents from the cascode current sources are fed to outputs OUTp and OUTm by pairs of
switches (Msw). The MSB controls the selection between different half periods. The
differential pairs use minimum size transistors, and a cascode transistor isolates the
current sources from the switches, which also improves the bandwidth of the switching
circuits.
Figure 3.7: Current switch circuit of the sine-weighted DAC.
In order to achieve current source matching in the layout, each current source is split into
four identical small current sources, each carrying a quarter of the required current. To
further improve this matching, all the current source transistors, including those in the
coarse DAC and fine DACs, are distributed in the current source matrix with a
pseudo-double-centroid switching scheme [23]. The coarse DAC and fine DACs use a total of
568 current sources. Therefore, an array of 24 rows by 24 columns of current source cells is
used to build the current matrix in Fig. 3.8. All the current sources are distributed through
a rotation from the matrix center to the edge. Of these cells, 511 are used for the coarse
DAC and 57 for the fine DACs. The remaining 8 current sources are used for bias. Four 24 by
24 current source matrices are placed around a common centroid. Two dummy rows and
columns are added around the current source matrix to avoid edge effects, so the complete
current matrix has 52 rows and 52 columns.
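The cell budget and the center-out ordering described above can be sketched as follows. The starting cell and the clockwise direction of `spiral_from_center` are read off the numbered fragment of Fig. 3.8 and are assumptions about the rest of the array. How the pseudo-double-centroid scheme of [23] assigns individual sources to these positions is not modeled here.

```python
COARSE_CELLS, FINE_CELLS, BIAS_CELLS = 511, 57, 8
SUB_MATRIX = 24        # rows and columns of one sub-matrix
DUMMY_PER_SIDE = 2     # dummy rows/columns on each edge


def spiral_from_center(n):
    """Number the cells of an n x n array (n even) from 1 to n*n in a
    clockwise spiral that starts just left of the array center.

    Returns a dict {cell_number: (row, col)}.
    """
    row, col = n // 2, n // 2 - 1
    order = {1: (row, col)}
    moves = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left
    number, leg, direction = 1, 1, 0
    while number < n * n:
        for _ in range(2):  # leg lengths go 1, 1, 2, 2, 3, 3, ...
            d_row, d_col = moves[direction % 4]
            for _ in range(leg):
                if number == n * n:
                    break
                row, col = row + d_row, col + d_col
                number += 1
                order[number] = (row, col)
            direction += 1
        leg += 1
    return order


cells_needed = COARSE_CELLS + FINE_CELLS + BIAS_CELLS
full_side = 2 * SUB_MATRIX + 2 * DUMMY_PER_SIDE
small = spiral_from_center(4)
full = spiral_from_center(SUB_MATRIX)

print("cells needed:", cells_needed, "cells available:", SUB_MATRIX ** 2)
print("complete matrix side:", full_side)
print("cell 1 of the 24 x 24 array sits at", full[1])
```

The 4 × 4 case reproduces the numbers visible in Fig. 3.8 (1 at the center, 10 to 13 along the top row), which is the only check available for the assumed ordering.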
[Figure: four sub-matrices of 24 rows by 24 columns, each numbered from 1 at its center
outward in a spiral up to 568, placed inside the complete array of 52 rows by 52 columns.]
Figure 3.8: Diagram of the current source matrix.
3.2.3 Clock Distribution
To synchronize the signals in high speed circuit design, numerous DFFs are used between
the logic elements. In the accumulator design, the number of DFFs in the pipeline
accumulator increases rapidly with the number of pipeline stages. Hence there are more
than 100 DFFs in the 11-bit pipeline accumulator. Counting the DFFs used in the
sine-weighted DAC to synchronize the current switches, the total is approximately 300
DFFs. All of these DFFs must be synchronized with a simultaneous clock edge. In order to
minimize the phase difference and maintain the same drive strength between the clock and
DFFs, an H-tree clock scheme is used to ensure that the clock signal reaches each block
simultaneously. Fig. 3.9 shows a simplified architecture of the "H"-shaped clock tree. The
actual clock tree is 3 times bigger than the simplified version shown in Fig. 3.9. The
external clock is buffered by a differential pair and then drives two emitter follower pairs,
which are used as level shifters as well as buffers. Each emitter follower pair drives two
or four differential pairs and each differential pair drives other emitter follower pairs,
until the clock reaches the leaves that finally drive the DFFs. The number of differential
pairs or emitter follower pairs driven by the previous stage depends on the driving strength
of the previous stage and is proportional to the CML tail current. To keep enough swing to
fully switch the next stage, a 1-driving-2 current ratio is maintained throughout the whole
clock buffer tree.
3.3 Experimental Results
The die photo of the SiGe DDS RFIC is shown in Fig. 3.10. This DDS design is quite
compact with an active area of 3 × 2.5 mm² and a total die area of 4 × 3.5 mm². The DDS
was tested in a CLCC-52 package. Fig. 3.11 shows the packaged chip soldered on a PCB
fabricated with RO4004 material. The clock signal is generated from an Agilent E8257D
analog signal generator and is converted to differential signals by a hybrid coupler. Two
SMA connectors and symmetrical tracks are used to send the clock signal to the DDS chip.
The DDS current outputs are converted to voltage outputs through a pair of 15 Ω on-board
resistors and connected to the spectrum analyzer or oscilloscope through SMA connectors
and RF cables.
[Figure: the clock input drives alternating levels of differential pairs (D0, D1) and
emitter follower pairs (E0, E1, E2).]
Figure 3.9: Simplified clock tree distribution.
The package has a thermal resistance of approximately 30 °C/W. With a 4.8 W power
consumption at room ambient temperature, the junction temperature of the SiGe devices can
theoretically reach as high as 180 °C. At such high temperature, the device performance is
greatly degraded and the DAC current switches are no longer synchronized due to increased
internal delays, which introduces noticeable distortion in the output waveform. When the
device is effectively cooled, the DDS operates at a maximum clock frequency of 8.6 GHz.
At room temperature, the packaged DDS operates at the maximum clock frequency of 7.2
GHz.
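The junction temperature estimate follows from the usual steady-state relation T_j = T_ambient + θ × P. The short sketch below evaluates it with the quoted thermal resistance and power. The 25 °C ambient is an assumption (the text only says "room ambient temperature"), so the result should be read as an order-of-magnitude check on the quoted figure of about 180 °C, not as a measured value.

```python
THETA_C_PER_W = 30.0   # package thermal resistance quoted in the text
POWER_W = 4.8          # DDS power consumption quoted in the text
AMBIENT_C = 25.0       # ASSUMED room ambient temperature


def junction_temperature_c(ambient_c, power_w=POWER_W, theta=THETA_C_PER_W):
    """Steady-state junction temperature without additional cooling."""
    return ambient_c + theta * power_w


temperature_rise_c = THETA_C_PER_W * POWER_W
junction_c = junction_temperature_c(AMBIENT_C)

print(f"temperature rise: {temperature_rise_c:.0f} C")
print(f"junction temperature at {AMBIENT_C:.0f} C ambient: {junction_c:.0f} C")
```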
[Figure: die photo with labeled blocks: pipeline accumulator (1.4 mm × 0.8 mm), fine DAC
switches (1.4 mm × 1.0 mm), coarse DAC switches (1.4 mm × 1.0 mm), and current source
matrix (2.8 mm × 0.5 mm).]
Figure 3.10: Die photo of the 11-bit ROM-less DDS RFIC.
[Figure: evaluation board with labeled FCW input (D0 to D9), clock input, DDS output, and
the DDS in a CLCC-52 package.]
Figure 3.11: Evaluation board for the 11-bit ROM-less DDS RFIC.
Figs. 3.12-3.15 illustrate the measured DDS output spectra and waveforms for different
outputs and clock frequencies. All measurements were done with packaged parts and without
calibrating the losses of the cables and PCB tracks. Fig. 3.12 presents a 4.2 MHz DDS output
spectrum with an 8.6 GHz clock input, and a minimum FCW of 1. The measured output
power is approximately -8.3 dBm, and the measured SFDR is about 50 dBc. The tone at
91.7 MHz was generated by the nearby campus FM radio station. To show the signal tone
clearly, a 100 MHz band spectrum plot is used instead of the full Nyquist band. However, the
worst-case spur is located within this band, so within both the Nyquist band and the narrow
band the SFDR is 50 dBc. Fig. 3.13 shows the waveform for the spectrum in Fig. 3.12.
 
Figure 3.12: Measured DDS output spectrum with a 4.2 MHz output and a maximum 8.6
GHz clock (FCW = 1), illustrating about 50 dBc SFDR. The tone at 91.7 MHz is from the
nearby campus FM radio station.
Figure 3.13: Measured DDS output waveform with a 4.2 MHz output and an 8.6 GHz clock.
Fig. 3.14 demonstrates the operation of the DDS at a maximum clock frequency of 8.6
GHz with Nyquist output (i.e., FCW = 1023). Thus, the output frequency is set as
\[ \frac{2^{10}-1}{2^{11}} \times f_{clk} = 4.2958\ \mathrm{GHz}. \tag{3.7} \]
The first-order image tone due to mixing the clock frequency and the DDS output frequency
occurs at
\[ 8.6\ \mathrm{GHz} - 4.2958\ \mathrm{GHz} = 4.3042\ \mathrm{GHz}. \tag{3.8} \]
The measured SFDR of the device is approximately 45 dBc. The tone at 91.7 MHz once
again appears in the spectrum. Fig. 3.15 illustrates the measured DDS output waveform
with a 4.2958 GHz Nyquist output and an 8.6 GHz clock. The signal envelope frequency
results from mixing the output and its image, which is
\[ \frac{2^{10}+1}{2^{11}} \times f_{clk} - \frac{2^{10}-1}{2^{11}} \times f_{clk} \approx 8.4\ \mathrm{MHz}. \tag{3.9} \]
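The frequencies in Eqs. (3.7) to (3.9), and the 4.2 MHz output at FCW = 1, all follow from f_out = FCW × f_clk / 2^N with N = 11. The sketch below evaluates them; the function and variable names are illustrative and do not come from the design.

```python
def dds_output_hz(fcw, phase_bits, f_clk_hz):
    """Output frequency of a DDS with an N-bit phase accumulator."""
    return fcw * f_clk_hz / 2 ** phase_bits


F_CLK_HZ = 8.6e9
PHASE_BITS = 11

f_min = dds_output_hz(1, PHASE_BITS, F_CLK_HZ)                # FCW = 1
f_nyquist = dds_output_hz(2 ** 10 - 1, PHASE_BITS, F_CLK_HZ)  # Eq. (3.7)
f_image = F_CLK_HZ - f_nyquist                                # Eq. (3.8)
f_envelope = f_image - f_nyquist                              # Eq. (3.9)

print(f"FCW = 1 output:    {f_min / 1e6:.3f} MHz")
print(f"FCW = 1023 output: {f_nyquist / 1e9:.4f} GHz")
print(f"image tone:        {f_image / 1e9:.4f} GHz")
print(f"envelope:          {f_envelope / 1e6:.2f} MHz")
```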
Fig. 3.16 shows the measured DDS SFDR in both the Nyquist band (3.6 GHz) and
the narrow band (100 MHz) versus the FCW with a clock frequency of 7.2 GHz. The
worst-case SFDR is 33 dBc and 42 dBc for the Nyquist band and narrow band, respectively.
Fig. 3.17 shows the measured DDS phase noise at an output frequency of 1.57 GHz with a
7.2 GHz clock input frequency. The phase noise is -118.55 dBc/Hz at a 10 kHz frequency
offset. The input clock is generated from an Agilent E8257D analog signal generator. The
spurs that appear in the measurement are not harmonically related to the synthesized output
frequency; they are related to the test environment.
Figure 3.14: Measured DDS Nyquist output spectrum with a 4.2958 GHz output and a
maximum 8.6 GHz clock (FCW = 1023), illustrating about 45 dBc SFDR. The image tone
is located at 4.3042 GHz.
Figure 3.15: Measured DDS output waveform with a 4.2958 GHz Nyquist output and an 8.6
GHz clock. The 8.4 MHz envelope frequency results from mixing the output and its image.
[Figure: plot titled "Measured SFDR vs. FCW"; x-axis frequency control word, 0 to 1000;
y-axis measured SFDR (dBc), 0 to 80; traces for the narrow band and the Nyquist band.]
Figure 3.16: The measured DDS SFDR versus FCW at a clock frequency of 7.2 GHz,
illustrating a worst-case SFDR of 33 dBc for the Nyquist band (3.6 GHz) and 42 dBc for the
narrow band (100 MHz).
Figure 3.17: The measured DDS phase noise at an output frequency of 1.57 GHz with a 7.2
GHz clock input frequency. The input clock is generated from an Agilent E8257D analog
signal generator. The graph illustrates a -118.55 dBc/Hz phase noise at a 10 kHz frequency
offset.
To evaluate the performance of ultrahigh speed DDSs, an easily measured and calculated
FOM must be defined from a combination of performance parameters. In the previous
literature [24], a power efficiency FOM has been defined as
\[ \mathrm{FOM} = \frac{\text{Max. Clock (GHz)}}{\text{Power (W)}}. \tag{3.10} \]
This previously defined FOM includes the maximum update frequency as well as the
power consumption, but does not consider the amplitude resolution information, which is
limited by the DAC. For an ultrahigh speed DDS, this lack of information is unfortunate since
the DAC is the most challenging part of these DDS designs. Thus, we define a new FOM
including the effective number of bits (ENOB) that measures the DAC spurious performance.
From [25], the signal to noise and total harmonic distortion (SINAD) is used to calculate
the ENOB as follows:
\[ \mathrm{ENOB} = \frac{\mathrm{SINAD}_{dB} - 1.76}{6.02}. \tag{3.11} \]
SINAD is the ratio of the root-mean-square (RMS) value of the sine wave (reconstructed
output of a DAC) to the RMS value of the noise plus the total harmonic distortion (THD) up
to the Nyquist frequency, excluding the fundamental and the DC offset. SINAD is typically
expressed in dB as
\[ \mathrm{SINAD} = \frac{S}{N + \mathrm{THD}}, \tag{3.12} \]
where S and N are the RMS energy values of the signal and noise; THD is the total harmonic
distortion defined as
\[
\begin{aligned}
\mathrm{THD} &= \frac{P_{HD1} + P_{HD2} + \cdots}{P_{signal}} \\
&= \frac{\text{the biggest spur power}}{P_{signal}}
 + \frac{\text{the sum of all other spurs' power except the biggest}}{P_{signal}} \\
&= \frac{1}{\mathrm{SFDR}}
 + \frac{\text{the sum of all other spurs' power except the biggest}}{P_{signal}}.
\end{aligned}
\tag{3.13}
\]
P_HD1, P_HD2, ... are the first, second, and higher harmonic distortion energies. P_signal
is the fundamental tone or signal tone energy.
Table 3.1: Performance Comparison of Ultrahigh Speed DDS RFICs with over 8 GHz Maximum
Clock Frequency
                              [18]      [19]        [20]        [21]      [this work]
Technology                    InP       InP         InP         SiGe      SiGe
fT/fMAX [GHz]                 137/267   300/300     300/300     100/120   200/250
Phase resolution [bit]        8         8           8           9         11
Amplitude resolution [bit]    7         7           5           8         10
Maximum clock [GHz]           9.2       13          32          12.3      8.6
Nyquist band SFDR [dBc]       <30       26.67       21.56       20        33
Power consumption [W]         15        5.42        9.45        1.9       4.8
Die area [mm²]                8 × 5     2.7 × 1.45  2.7 × 1.45  3 × 3     4 × 3.5
FOM [GHz·2^(SFDR/6)/W]        <16.0     42.6        34.8        65.3      81.1
Although the second term in Eq. (3.13) may be larger than the first term, the SFDR
is easily obtained since it can be read directly from the spectrum analyzer. Herein, we use
1/SFDR to represent the THD. In general, the RMS value of the noise is far below the
THD. As a result, the SFDR is used to represent SINAD to calculate the FOM, which can
be defined as
\[
\mathrm{FOM} = \frac{\text{Max. Clock (GHz)} \times 2^{(\mathrm{SFDR}_{dB} - 1.76)/6.02}}{\text{Power (W)}}
\approx \frac{\text{Max. Clock (GHz)} \times 2^{\mathrm{SFDR}_{dB}/6}}{\text{Power (W)}}.
\tag{3.14}
\]
SFDR_dB/6 represents the ENOB obtained from the SFDR measurement [26]. Although the
SFDR is defined in the Nyquist band, the narrow band SFDR is often more important since
wideband spurs can be removed relatively easily. Many applications are interested only in
a specific narrow band near the output, which is usually less than 1% of the update
frequency.
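As a worked instance of Eqs. (3.11) and (3.14), the sketch below evaluates the approximate form of the FOM (with the exponent SFDR_dB/6) for two rows of Table 3.1: this work and the 12.3 GHz SiGe DDS of [21]. The input numbers are taken from the table; the function names are illustrative.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits from SINAD in dB, Eq. (3.11)."""
    return (sinad_db - 1.76) / 6.02


def fom_approx(max_clock_ghz, sfdr_db, power_w):
    """Approximate FOM of Eq. (3.14), in GHz * 2^(SFDR/6) / W."""
    return max_clock_ghz * 2 ** (sfdr_db / 6) / power_w


fom_this_work = fom_approx(8.6, 33, 4.8)   # Table 3.1, [this work]
fom_ref_21 = fom_approx(12.3, 20, 1.9)     # Table 3.1, [21]
enob_this_work = enob_from_sinad(33)       # SFDR used in place of SINAD

print(f"FOM, this work: {fom_this_work:.1f}")
print(f"FOM, [21]:      {fom_ref_21:.1f}")
print(f"ENOB from 33 dB: {enob_this_work:.2f} bits")
```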
Table 3.1 is a performance comparison of ultrahigh speed DDS RFICs with more than
8 GHz maximum clock frequency. Compared to the InP DDS RFICs, this SiGe DDS
significantly improves the resolution, and it is the most complicated ultrahigh speed DDS
design, containing approximately twenty thousand transistors. Most of the InP DDS RFICs
were measured using probe stations [18, 19, 20], while this DDS RFIC was packaged. As
mentioned earlier, the package has a thermal resistance of approximately 30 °C/W, and at
room ambient temperature, the junction temperature of the SiGe devices can theoretically
reach as high as 180 °C. At such high temperature, the device performance is greatly
degraded and the DAC current switches are no longer synchronized due to increased internal
delays. When the device is effectively cooled, the DDS operates at a maximum clock
frequency of 8.6 GHz. At room temperature, the packaged DDS operates at the maximum clock
frequency of 7.2 GHz. When compared with the 9-bit 12.3 GHz DDS [21], this design achieves
two more bits for both phase and amplitude. As a result, this DDS achieves a 10 dB larger
SFDR.
3.4 Conclusion
This chapter presented an 11-bit 8.6 GHz SiGe DDS RFIC design with a 10-bit segmented
sine-weighted DAC, implemented in 0.13 µm SiGe BiCMOS technology with fT/fMAX
of 200/250 GHz. With Nyquist output, the DDS achieves a maximum clock frequency of 8.6
GHz. The power consumption of the DDS is approximately 4.8 W and the power efficiency
FOM is 81.1 GHz·2^(SFDR/6)/W. This DDS RFIC is the first ultrahigh speed DDS with 11-bit
phase and 10-bit DAC amplitude resolutions that achieves a record high SFDR of 33 dBc
with leading power efficiency.
Chapter 4
A 9-bit 2.9 GHz DDS RFIC with Direct Digital Modulations
4.1 Introduction
So far, no DDS with over-GHz output has provided the modulation capabilities desired
for next generation radar and communication systems [18, 19, 20, 27, 22, 21, 8]. To
achieve an over-GHz output frequency, all existing DDS RFICs use pipeline accumulators
that work only with a constant input FCW, and thus no FM can be performed [18, 19, 20,
27, 22, 21, 8]. To implement direct FM or PM, a CLA or RCA must be used, with the
attendant penalty of reduced speed. Ref. [28] reported a 9-bit DDS with an RCA
accumulator. It has the capability of FM, but only at low frequency because the FCW
cannot change too fast with the bipolar plus NMOS adder architecture, and no PM can be
performed. The 9B DDS, which uses a CLA accumulator and adder to implement the direct
digital modulation capabilities, is presented in this chapter. In the next chapter, the 24B
DDS, which uses an RCA accumulator and adder to implement the direct digital modulation
capabilities, is presented. These two DDS RFICs represent the first reported GHz range
output DDSs with direct digital frequency and phase modulation capabilities.
4.2 Circuit Implementation
The 9B DDS adopts a ROM-less architecture which combines both the sine/cosine mapping
and digital-to-analog conversion in a sine-weighted digital-to-analog converter
(DAC). The block diagram of the 9-bit 2.9 GHz ROM-less DDS is shown in Fig. 4.1. The
major parts of the ROM-less DDS are a 9-bit CLA phase accumulator, a 9-bit CLA full
adder and a 7-bit sine-weighted DAC. The 9-bit phase accumulator output is modulated by
the 9-bit PCW and truncated to 8 bits. After phase modulation and truncation, the highest
8 bits are fed into the sine-weighted DAC. The two MSBs of this 8-bit word are used to
determine the quadrant of the sine wave. The MSB is used to provide the proper mirroring
of the sine waveform about the π phase point. The 2nd MSB is used to invert the remaining
6 bits for the 2nd and 4th quadrants of the sine wave by a 1's complementor, and the
outputs of the complementor are applied to the sine-weighted DAC to form a quarter of the
sine waveform. Because of the π phase point mirroring, the total amplitude resolution of
the sine-weighted DAC is 7 bits.
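The phase-to-amplitude path described above can be sketched behaviorally. In the model below the quarter-wave table is a stand-in (a rounded sine scaled to 127), not the chip's current source matrix, and all names are illustrative. The model only shows how the PCW addition, the truncation to 8 bits, the 1's complement on the 2nd MSB, and the MSB mirroring fit together.

```python
import math

PHASE_BITS = 9


def quarter_wave_table(bits=6, full_scale=127):
    """Stand-in amplitude table for phases 0 to pi/2 (an assumption)."""
    n = 1 << bits
    return [math.floor(full_scale * math.sin(math.pi / 2 * (k + 0.5) / n) + 0.5)
            for k in range(n)]


TABLE = quarter_wave_table()


def dds_amplitude(acc, pcw=0):
    """Map a 9-bit accumulator value and PCW to a signed amplitude."""
    phase9 = (acc + pcw) % (1 << PHASE_BITS)  # 9-bit phase modulation adder
    phase8 = phase9 >> 1                      # keep the highest 8 bits
    msb = (phase8 >> 7) & 1                   # selects the half period
    second_msb = (phase8 >> 6) & 1            # selects 2nd/4th quadrant
    low6 = phase8 & 0x3F
    if second_msb:
        low6 ^= 0x3F                          # 1's complementor
    amplitude = TABLE[low6]
    return -amplitude if msb else amplitude


def run(fcw, pcw=0, samples=512):
    """Clock the accumulator `samples` times and collect the outputs."""
    acc, out = 0, []
    for _ in range(samples):
        out.append(dds_amplitude(acc, pcw))
        acc = (acc + fcw) % (1 << PHASE_BITS)
    return out


wave = run(fcw=2)
shifted = run(fcw=2, pcw=256)
print(wave[:8])
print(shifted[:8])
```

With PCW = 256 (half of 2^9) every sample changes sign, which is the 180° phase step used later in the measurements.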
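[Figure: the 9-bit FCW feeds a CLA accumulator with DFFs, the 9-bit PCW feeds a CLA adder
with DFFs, and the result drives the 7-bit sine-weighted DAC that produces f_out; all
blocks are clocked by f_clk.]
Figure 4.1: Block diagram of 9-bit ROM-less DDS.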
4.2.1 9-bit Carry Look Ahead Adder/Accumulator
To perform a direct digital modulation, the adder must have no latency. A pipelined
accumulator is not suitable because of its large latency, and it can only handle a fixed
FCW. In this design, a CLA adder is used to implement the direct digital modulations
because its delay is smaller than that of other zero-latency architectures. A 9-bit CLA
adder is used to implement the 9-bit accumulator. Fig. 4.2 shows the architecture of the
9-bit CLA adder.
[Figure: nine full adders (FA) with inputs A0 to A8 and B0 to B8 and sum outputs s0 to s8,
grouped in threes under three level I CLA blocks; a level II CLA block receives the group
signals P0, G0, P1, G1 and returns the carries C0 and C1; Cin = 0.]
Figure 4.2: Block diagram of 9-bit CLA accumulator (full adder).
The output and carry out for each bit are calculated as
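\[
\begin{cases}
\text{Carry out:} & c_i = g_i + p_i \cdot c_{i-1} \\
\text{Sum:} & s_i = p_i \oplus c_{i-1}
\end{cases}
\tag{4.1}
\]
where c_i is the carry out and c_{i-1} is the carry in or the carry out from the previous
bit. g_i and p_i are the carry generate and carry propagate in the level I CLA. The first
level carry out can be obtained by
\[
\begin{cases}
c_0 = g_0 + p_0 \cdot C_{in} \\
c_1 = g_1 + p_1 \cdot g_0 + p_1 \cdot p_0 \cdot C_{in}
\end{cases}
\tag{4.2}
\]
where
\[
\begin{cases}
\text{Carry generate:} & g_i = A_i \cdot B_i \\
\text{Carry propagate:} & p_i = A_i \oplus B_i
\end{cases}
\tag{4.3}
\]
and the second level carry out can be obtained by
\[
\begin{cases}
C_0 = G_0 + P_0 \cdot C_{in} \\
C_1 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_{in}
\end{cases}
\tag{4.4}
\]
where the second level propagates are obtained by
\[
\begin{cases}
P_0 = p_2 \cdot p_1 \cdot p_0 \\
P_1 = p_5 \cdot p_4 \cdot p_3
\end{cases}
\tag{4.5}
\]
and the second level generates are obtained by
\[
\begin{cases}
G_0 = g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0 \\
G_1 = g_5 + p_5 \cdot g_4 + p_5 \cdot p_4 \cdot g_3
\end{cases}
\tag{4.6}
\]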
In the above equations, every logic function must be implemented with gates of no more
than three inputs. This limit is a compromise between the power supply voltage and the CML
logic. Under a 3.3 V power supply and a SiGe HBT base-collector voltage of 0.85 V to 0.9 V,
all the digital logic is implemented using 3-level CML with differential output swings of
400 mV. Level shifters may be needed to shift between different voltage level inputs. A
level shifter usually runs much faster than the other CML gates, so it can be ignored when
counting the gate delays. Suppose the delay of an XOR gate is twice that of an AND gate.
The total delay can then be calculated from the equations and the diagram: (A) two gate
delays to calculate the level I carry generate g_i and propagate p_i in Eq. (4.3); (B) two
gate delays to calculate the level II carry generate G_i and propagate P_i in Eqs. (4.5)
and (4.6); (C) two gate delays to calculate the level II carry in Eq. (4.4); (D) two gate
delays to calculate the level I carry in Eq. (4.2); and (E) two gate delays to calculate
the sum and carry out from Eq. (4.1). Therefore the 9-bit CLA adder needs only 10 AND gate
delays, which is much less than the (2N-1) = 17 gate delays of the ripple carry adder,
especially for high resolution adders. This holds without considering the wire delay; the
effect of the wire delay will be discussed in Chapter 5. The CLA adder is, however, much
slower than its pipelined counterpart.
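Eqs. (4.1) to (4.6) can be exercised with a bit-level model. The sketch below is a behavioral illustration, not the CML implementation: it forms the per-bit generate and propagate signals, the group signals of each 3-bit block, and the two level II carries, and then checks the result against ordinary integer addition. The group signals for the third block are formed in the same way as G_0 and G_1, which is an extension assumed here to produce the carry out.

```python
def cla9_add(a, b, cin=0):
    """9-bit two-level carry look-ahead addition.

    Returns (sum modulo 512, carry out of bit 8).
    """
    A = [(a >> i) & 1 for i in range(9)]
    B = [(b >> i) & 1 for i in range(9)]
    g = [x & y for x, y in zip(A, B)]   # carry generate, Eq. (4.3)
    p = [x ^ y for x, y in zip(A, B)]   # carry propagate, Eq. (4.3)

    def group(k):
        """Group propagate and generate of 3-bit block k, Eqs. (4.5), (4.6)."""
        i = 3 * k
        P = p[i + 2] & p[i + 1] & p[i]
        G = g[i + 2] | (p[i + 2] & g[i + 1]) | (p[i + 2] & p[i + 1] & g[i])
        return P, G

    P0, G0 = group(0)
    P1, G1 = group(1)
    C0 = G0 | (P0 & cin)                    # carry into bit 3, Eq. (4.4)
    C1 = G1 | (P1 & G0) | (P1 & P0 & cin)   # carry into bit 6, Eq. (4.4)

    c = [0] * 9
    for k, carry_in in enumerate((cin, C0, C1)):
        i = 3 * k
        P, G = group(k)
        c[i] = g[i] | (p[i] & carry_in)                               # Eq. (4.2)
        c[i + 1] = g[i + 1] | (p[i + 1] & g[i]) | (p[i + 1] & p[i] & carry_in)
        c[i + 2] = G | (P & carry_in)       # block carry out

    carry_in_bits = [cin] + c[:8]
    s = [p[i] ^ carry_in_bits[i] for i in range(9)]                   # Eq. (4.1)
    return sum(bit << i for i, bit in enumerate(s)), c[8]


def accumulate(fcw, steps):
    """Use the adder as a 9-bit phase accumulator for a given FCW."""
    phase, trace = 0, []
    for _ in range(steps):
        phase, _ = cla9_add(phase, fcw)
        trace.append(phase)
    return trace


print(cla9_add(300, 250))      # 550 wraps to 38 with carry out 1
print(accumulate(104, 6))
```

Because the adder has no pipeline registers inside it, the FCW argument of `accumulate` could change on every step, which is the property the direct modulation relies on.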
Table 4.1: Current Source Matrix in Sine-weighted DAC
2 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3
3 3 3 2 3 3 3 2
3 2 3 2 3 2 2 3
2 2 2 2 2 2 2 2
1 2 2 1 2 1 1 2
1 1 1 1 1 1 1 0
1 0 1 0 1 0 0 0
4.2.2 7-bit Sine-weighted DAC
The structure of the sine-weighted DAC is shown in Fig. 4.3. Since the quadrant of
the sine waveform is determined by the two MSBs, the remaining 6 bits are used to control
the switch matrix and generate the amplitude for a quarter phase (0 to π/2) sine wave. The
current source matrix is calculated by the equation below and shown in Table 4.1.
\[
I_k =
\begin{cases}
\left\lfloor (2^{9}-1)\,\sin\!\left(\dfrac{\pi}{2}\cdot\dfrac{0.5}{2^{6}}\right) \right\rfloor, & \text{for } k = 0 \\[2ex]
\left\lfloor (2^{9}-1)\,\sin\!\left(\dfrac{\pi}{2}\cdot\dfrac{k+0.5}{2^{6}}\right) - \displaystyle\sum_{n=0}^{k-1} I_n \right\rfloor, & 0 < k \le 2^{6}-1
\end{cases}
\tag{4.7}
\]
[Figure: 1's complementor (6-bit input and output, controlled by the 2nd MSB), 3-7 row
decoder, 3-7 column decoder, and an 8 × 8 cell matrix labeled K = 0 to K = 63, with MSB,
IN, OUT, and VCC connections.]
Figure 4.3: Block diagram of 7-bit sine-weighted DAC.
The sine-weighted DAC current source matrix provides a total of 128 unit current sources.
The unit current of each current source is set to 105 µA. The largest current in the
current source matrix is 315 µA, which is composed of 3 unit current sources. The current
switch contains two differential pairs with cascode current sources for better isolation
and current mirror accuracy. The current outputs are converted to differential voltages by
a pair of off-chip 15 Ω pull-up resistors. Fig. 4.4 shows that the currents from the
cascode current sources are fed to outputs OUTp and OUTm by pairs of switches (Msw). The
MSB controls the selection between different half periods. The differential pairs use
minimum size transistors, and a cascode transistor isolates the current sources from the
switches, which improves the bandwidth of the switching circuits. For the layout, vertical
SiGe HBT devices are used with a degeneration resistor to improve the current source
matching. All the current source transistors are randomly distributed in the current source
matrix. Two dummy rows and columns have been added around the current source array
to avoid edge effects. In order to minimize the phase difference of the clock to the
flip-flops, an H-tree clock scheme is used to make the clock signal reach each block
simultaneously in both the adder/accumulator and DAC.
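The recursive structure of the current weights, where each cell supplies the increment between successive quantized sine levels, can be illustrated numerically. The sketch below is not a transcription of Eq. (4.7): it assumes a full scale of 2^7 - 1 = 127 unit currents and round-to-nearest quantization, two assumptions chosen because they reproduce the row totals of Table 4.1. The function name and parameters are hypothetical.

```python
import math

UNIT_CURRENT_UA = 105  # unit current quoted in the text


def sine_weights(full_scale=127, bits=6):
    """Per-cell unit-current counts for a quarter sine wave.

    ASSUMPTIONS: full scale of 127 units and round-to-nearest; each
    weight is the quantized sine level minus the sum of earlier weights.
    """
    n = 1 << bits
    weights, running = [], 0
    for k in range(n):
        target = full_scale * math.sin(math.pi / 2 * (k + 0.5) / n)
        step = math.floor(target - running + 0.5)
        weights.append(step)
        running += step
    return weights


weights = sine_weights()
rows = [weights[i:i + 8] for i in range(0, 64, 8)]
row_totals = [sum(row) for row in rows]

for row in rows:
    print(" ".join(str(w) for w in row))
print("row totals:", row_totals)
print("largest cell current (uA):", max(weights) * UNIT_CURRENT_UA)
```

Under these assumptions the largest cell holds 3 unit currents, which matches the quoted 315 µA.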
4.3 Experimental Results
Figure 4.4: Diagram of DAC switch and current source matrix cell.
Figs. 4.5-4.7 illustrate the measured DDS output spectra and waveforms for different
output and clock frequencies without modulation. Fig. 4.5 presents a 509 MHz DDS output
with a 2.5 GHz clock input and an FCW of 104. The measured output power
is approximately 0 dBm and the measured narrow band SFDR is approximately 48 dBc.
Fig. 4.6 gives the measured DDS output spectrum with a 1.444 GHz Nyquist output under a
2.9 GHz clock. Since FCW = 2^8 - 1 = 255, the output frequency is
\[ \frac{\mathrm{FCW}}{2^{N}} \times f_{clk} = \frac{255}{2^{9}} \times 2.9\ \mathrm{GHz} = 1.444\ \mathrm{GHz}. \]
The first-order image tone produced by mixing the clock frequency and the DDS output
frequency occurs at
\[ 2.9\ \mathrm{GHz} - 1.444\ \mathrm{GHz} = 1.456\ \mathrm{GHz}. \]
Fig. 4.7 shows the time domain waveform of Fig. 4.6. The envelope frequency of the waveform
is
\[ \frac{2^{8}+1}{2^{9}} \times f_{clk} - \frac{2^{8}-1}{2^{9}} \times f_{clk} \approx 12\ \mathrm{MHz}. \]
Figure 4.5: Measured DDS output spectrum with 509 MHz output under 2.5 GHz clock
(FCW = 104), showing about 48 dBc narrow band SFDR.
Figure 4.6: Measured DDS output spectrum with 1.444 GHz output and 2.9 GHz clock
(FCW = 255), showing about 35 dBc narrow band SFDR. The image tone is located at 1.455
GHz.
[Figure: plot of DDS output amplitude (V), -0.1 to 0.15, versus time (ns), -100 to 100.]
Figure 4.7: Measured DDS output waveform with 1.444 GHz output and 2.9 GHz clock
(FCW = 255). The envelope frequency is 12 MHz.
Fig. 4.8 shows the measured DDS output with FCW = 2 frequency modulated by a step
of ΔFCW = 1. The frequency before the step is 9.375 MHz with FCW = 2 and after the
step is 14.0625 MHz with FCW = 3. Fig. 4.9 shows the measured DDS output with FCW =
2 phase modulated by a step of ΔPCW = 256, which corresponds to a 180° phase shift. The
output frequency is 10 MHz with a 2.5 GHz clock.
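The numbers in this section can be related to the 9-bit tuning equations with a short calculation. In the sketch below, the 2.4 GHz clock used for the frequency step is an inference: it is the clock value that makes FCW = 2 and FCW = 3 give exactly 9.375 MHz and 14.0625 MHz with N = 9, and it is not stated in the text. The function names are illustrative.

```python
PHASE_BITS = 9


def dds_output_hz(fcw, f_clk_hz, phase_bits=PHASE_BITS):
    """Output frequency for a given frequency control word."""
    return fcw * f_clk_hz / 2 ** phase_bits


def phase_shift_deg(pcw, phase_bits=PHASE_BITS):
    """Phase offset produced by a given phase control word."""
    return 360.0 * pcw / 2 ** phase_bits


# Nyquist output under the 2.9 GHz clock (Figs. 4.6 and 4.7).
f_nyquist = dds_output_hz(255, 2.9e9)
f_image = 2.9e9 - f_nyquist
f_envelope = f_image - f_nyquist

# Frequency step of Fig. 4.8; the 2.4 GHz clock is INFERRED, not quoted.
F_CLK_FM_HZ = 2.4e9
f_before = dds_output_hz(2, F_CLK_FM_HZ)
f_after = dds_output_hz(3, F_CLK_FM_HZ)

# Phase step of Fig. 4.9.
pm_step_deg = phase_shift_deg(256)
f_pm = dds_output_hz(2, 2.5e9)

print(f"Nyquist output {f_nyquist / 1e9:.4f} GHz, image {f_image / 1e9:.4f} GHz")
print(f"envelope {f_envelope / 1e6:.2f} MHz")
print(f"FM step {f_before / 1e6:.4f} MHz -> {f_after / 1e6:.4f} MHz")
print(f"PM step {pm_step_deg:.0f} degrees at {f_pm / 1e6:.2f} MHz output")
```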
Figure 4.8: Measured DDS output with FCW = 2 frequency modulated by a frequency step
of ΔFCW = 1. The frequency before the step is 9.375 MHz with FCW = 2; after the step
it is 14.062 MHz with FCW = 3.
All measurements were done on CLCC-44 packaged parts without a deglitch filter and without
calibrating the losses of the cables and PCB tracks.
Figure 4.9: Measured DDS output with FCW = 2 phase modulated by a phase step of
ΔPCW = 256, which corresponds to a 180° phase shift. The output frequency is 10 MHz with a
2.5 GHz clock.
Table 4.2 compares mm-wave DDS RFIC performances. Although this DDS has a lower
clock frequency than the others, it is the first DDS with direct digital frequency and
phase modulation capabilities that has an output frequency above 1 GHz. Some commercial
parts have FM and PM capabilities, but all of these parts work at no more than 1 GHz and
can only output frequencies below 500 MHz. The die photo of the SiGe DDS RFIC is shown
in Fig. 4.10. This DDS design is quite compact with an active area of 1.7 × 2.0 mm² and a
total die area of 2.5 × 3.0 mm².
4.4 Conclusion
Implemented in a 0.13 µm SiGe BiCMOS technology with fT/fmax of 200/250 GHz, this
chapter presented a 9-bit 2.9 GHz SiGe DDS RFIC design with direct digital 9-bit frequency
and 9-bit phase modulations. With Nyquist output, the DDS achieves a maximum clock
frequency of 2.9 GHz and a narrow band SFDR of 35 dBc. It has low power consumption
as well. The power consumption is approximately 2.0 W under a single 3.3 V power supply,
even with the added modulation blocks. This DDS RFIC is the first reported GHz range
output DDS with direct digital frequency and phase modulation capabilities.
Table 4.2: Selected Ultrahigh Speed DDS RFIC Performance Comparison
                   [18]       [19]        [20]        [22]       [8]        [9B DDS]
Technology         InP        InP         InP         SiGe       SiGe       SiGe
fT/fmax [GHz]      137/267    300/300     300/300     100/120    200/250    200/250
Phase [bit]        8          8           8           9          11         9
Amplitude [bit]    7          7           5           8          10         7
FM [bit]           None       None        None        None       None       9
PM [bit]           None       None        None        None       None       9
Max clock [GHz]    9.2        13          32          9.6        8.6        2.9
SFDR [dBc]         30         26.67       21.56       30         40         35
Power [W]          15         5.42        9.45        1.9        4.8        2.0
Area [mm²]         8.0 × 5.0  2.7 × 1.45  2.7 × 1.45  3.0 × 3.0  4.0 × 3.5  2.5 × 3.0
Figure 4.10: Die photo of the 9-bit DDS with direct digital modulations.
Chapter 5
A 24-bit 5.0 GHz DDS RFIC with Direct Digital Modulations
5.1 Introduction
This chapter presents a 24-bit 5.0 GHz DDS with over-GHz output frequency and direct
digital modulation capabilities. This work represents one of the first DDS RFICs with
over-GHz range output as well as direct digital FM and PM capabilities. The 24B DDS is
implemented with direct digital FM and PM capabilities using RCA adders. The block
diagram of the 24-bit 5.0 GHz ROM-less DDS with RCA accumulator and modulator is
shown in Fig. 5.1 [2, 29]. The major parts of the ROM-less DDS are a 24-bit RCA phase
accumulator, a 12-bit RCA modulator, and a 10-bit sine-weighted DAC. The 24-bit RCA
phase accumulator output is truncated to 12 bits and modulated with a 12-bit PCW. After
PM, the output is truncated again, and the highest 11 bits are fed into the sine-weighted
DAC. The sine-weighted DAC maps the 11-bit linear phase word to the digital amplitude
and generates the analog waveform. The ultrahigh speed RCA accumulator/adder and
sine-weighted DAC will be described in the following two sections, respectively.
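[Figure: the 24-bit FCW feeds an RCA with a register (the accumulator), whose 12-bit output
feeds a second RCA with a register together with the PCW; the 11-bit result drives the
10-bit sine-weighted DAC that produces f_out; all blocks are clocked by f_CLK.]
Figure 5.1: Block diagram of the 24-bit 5.0 GHz DDS RFIC.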
5.2 Ultrahigh Speed Adder Design
5.2.1 Wire Delay in the 0.13 µm SiGe BiCMOS Technology
With the introduction of deep submicron semiconductor technology, the parasitic effects
introduced by the wire delay begin to dominate the performance of high speed digital
integrated circuits. The typical buffer delay in the 0.13 µm SiGe BiCMOS technology is less
than 4 ps, while the wire delay of a 2 µm wide and 100 µm long wire can be as high as 10
ps. From [30], the transmission line effects should be considered when the rise or fall time
of the input signal is smaller than the time of flight of the transmission line. The
following equation is used to determine when transmission line effects should be considered:
\[ t_{rf} < 2.5\, t_{flight} = 2.5\, \frac{L}{v} \tag{5.1} \]
In Eq. (5.1), t_rf is the rise or fall time of the signal transmitted through the wire,
t_flight is the flight time, which is the time it takes for the wave to propagate from one
end of the wire to the other, L is the wire length, and v is the propagation velocity, which
is 15 cm/ns in silicon oxide (SiO2). So the minimum length that must be considered as a
transmission line for a signal is
\[ L_{min} = 0.4\, t_{rf}\, v. \tag{5.2} \]
For a 5.0 GHz signal the rise and fall time should not be longer than 67 ps. If the wire
length is less than 4 mm, a lumped RC model can be used to evaluate the propagation delay
through the wire. Fig. 5.2 shows the equivalent circuit of a wire with length L. From the
Elmore delay rule, the dominant time constant is
\[ \tau_D = R_s c L + 0.5\, r c L^{2}, \tag{5.3} \]
where R_s is the internal resistance of the driver, and r and c are the unit length
parasitic resistance and capacitance of the wire. The delay introduced by the wire
resistance becomes dominant when the second term is bigger than the first, i.e. when
L > 2R_s/r. In the 0.13 µm SiGe BiCMOS technology, the first term in Eq. (5.3) will
dominate the propagation delay as long as L < 2 mm, and as a result the propagation delay
of the wire is approximately proportional to the length.
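Eqs. (5.2) and (5.3) can be evaluated directly. In the sketch below, the 67 ps rise time and the 15 cm/ns velocity come from the text, while the driver resistance and the per-unit-length wire parameters are hypothetical placeholders chosen only to illustrate the crossover length L = 2R_s/r; they are not process data.

```python
V_SIO2_M_PER_S = 1.5e8   # 15 cm/ns propagation velocity in SiO2


def min_transmission_line_length_m(t_rf_s, v=V_SIO2_M_PER_S):
    """Shortest wire that must be treated as a transmission line, Eq. (5.2)."""
    return 0.4 * t_rf_s * v


def elmore_terms_s(length_um, r_s_ohm, r_ohm_per_um, c_f_per_um):
    """Return the two terms of Eq. (5.3): (Rs*c*L, 0.5*r*c*L^2)."""
    driver_term = r_s_ohm * c_f_per_um * length_um
    wire_term = 0.5 * r_ohm_per_um * c_f_per_um * length_um ** 2
    return driver_term, wire_term


def crossover_length_um(r_s_ohm, r_ohm_per_um):
    """Length above which the wire-resistance term dominates."""
    return 2.0 * r_s_ohm / r_ohm_per_um


l_min_m = min_transmission_line_length_m(67e-12)

# HYPOTHETICAL placeholder values, for illustration only.
R_S_OHM = 200.0
R_OHM_PER_UM = 0.2
C_F_PER_UM = 0.2e-15

l_cross_um = crossover_length_um(R_S_OHM, R_OHM_PER_UM)
driver_term, wire_term = elmore_terms_s(l_cross_um, R_S_OHM, R_OHM_PER_UM, C_F_PER_UM)
short_driver, short_wire = elmore_terms_s(100.0, R_S_OHM, R_OHM_PER_UM, C_F_PER_UM)

print(f"L_min for a 67 ps edge: {l_min_m * 1e3:.2f} mm")
print(f"crossover length for the placeholder values: {l_cross_um:.0f} um")
print(f"terms at 100 um: {short_driver * 1e12:.2f} ps and {short_wire * 1e12:.2f} ps")
```

For wires much shorter than the crossover length the first term is far larger than the second, which is why the delay grows roughly linearly with length in that range.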
[Figure: a driver V_in with source resistance R_s feeding a ladder of series resistances
r·L and shunt capacitances c·L, with output V_out.]
Figure 5.2: Lumped RC model for a wire with length of L.
To evaluate wire delay effects in high speed digital logic design, several simulations have
been performed in a 0.13 µm SiGe process for a current-mode-logic (CML) cell implemented
using a differential pair without an emitter follower as the output buffer and its connection
wires. Fig. 5.3 shows the test bench used to simulate the wire delay effects. Fig. 5.3(A) is
the schematic view. It is used to find the intrinsic propagation delay of the CML buffer
that is 2 µm wide and 100 µm long. Differential wires with the third metal layer are
inserted between the two buffers in Fig. 5.3(B). The space between the two differential
wires is typically maintained at 2 µm in the layout. Fig. 5.3(C) uses the same test bench
as that employed in Fig. 5.3(B), with the exception of an additional piece of metal under
the differential wires. Fig. 5.3(D) uses the same test bench as that in Fig. 5.3(B), except
that in this case two pieces of metal are used to sandwich the differential wires. Clearly,
cases Fig. 5.3(C) and (D) result in a larger parasitic capacitance than Fig. 5.3(B).
However, it is not always possible to place the wire without any overlap with the metals
that are under and above the wires, especially for modern processes with more than 5 metal
layers. The simulated results are plotted in Fig. 5.4. In Fig. 5.4, plot (A) represents the
propagation delay of Fig. 5.3(A), and illustrates the propagation delay of only the input
and output buffers. It does not include the wire delay, so it is constant along the wire
length. Plots (B), (C) and (D) show the propagation delay of test benches (B), (C) and (D)
but do not include the buffer delays in the test benches. These three plots reflect the
third metal wire propagation delay in a 0.13 µm SiGe process. It is proportional to the wire
length as long as the length is less than 2 mm. Comparing (B), (C) and (D), the wire delay
with shielded metal is two (for (C)) or three (for (D)) times larger than that of an
unshielded metal wire. This conclusion, as well as the linear relationship between the wire
delay and the wire length, indicates that the wire delay is dominated by the time constant
of the product of the parasitic capacitance and the input buffer output impedance, as
described by the first term of Eq. (5.3). Note that test benches (C) and (D) are more
practical cases than (B), because in a real layout environment all the wires overlap each
other and produce several times more parasitic capacitance than the coupling capacitance
with the substrate. From the wire delay plot of Fig. 5.4, the wire delay coefficient (delay
for 1 µm of wire) for case (C) is about 0.10 ps/µm. This number will be used to estimate
the adder delay in the following sections.
[Figure 5.3: four test benches (A) to (D), each driven by a 200 mV differential source, with outputs v1 to v4; layer labels Metal 2, Metal 3, Metal 4.]
Figure 5.3: Test bench to simulate the wire propagation delay.
[Figure 5.4: two plots of PROPAGATION DELAY (ps) versus WIRE LENGTH (μm), one spanning 0 to 2000 μm and 0 to 300 ps, the other 0 to 600 μm and 0 to 50 ps, each with curves (A) to (D).]
Figure 5.4: Simulated wire propagation delay versus length.
5.2.2 Propagation Delay Comparison Between the CLA and RCA Accumulator/Adder
A pipeline accumulator can only handle a fixed input FCW. Direct modulations such
as LFM require a varying input FCW. Thus, either RCA or CLA adders must be used to
implement direct modulations. Chapter 4 calculated the critical-path delay of a 9-bit CLA
adder implemented with 3-input CML gates. However, that calculation counted neither the
level shifters used to shift between different input voltage levels nor the wire delays. In
general, level shifters run much faster than CML gates, and thus can be ignored when counting
the gate delays. Furthermore, the XOR gate delay is treated the same as the AND gate delay,
since CML gate delays are essentially identical for different gates. For example, both the carry
and sum logic in a full adder can be implemented by a CML gate with only one tail current.
Given this information, the 9-bit CLA adder has a total of 8 CML gate delays. One CML gate
delay is about 9 ps in the 0.13 μm SiGe BiCMOS technology, so the total logic delay of the
9-bit adder is approximately 72 ps without the wire delay. From Fig. 4.2 in Chapter 4 and the
actual layout, the wire delay in the critical path of the 9-bit CLA adder is calculated as
follows: (A) a 200 μm wire is added to calculate the delay of the level I generate gi and
propagate pi; (B) a 200 μm wire is added to calculate the delay of the level II generate Gi and
propagate Pi; (C) a 300 μm wire is added to calculate the delay of the level II carry; (D) a
200 μm wire is added to calculate the delay of the level I carry; (E) a 200 μm wire is added to
calculate the delay of the sum and carry-out. Therefore, the total wire length on the critical
path is 1100 μm. If the adder is wider than 9 bits and up to 27 bits, a level III CLA block
is needed to calculate the carry-out. To generate the third-level CLA logic and the carry-in
signal for the second-level CLA block, the total gate delay increases to 12 CML gate delays
and the total wire length increases to 1800 μm. The worst-case delay is the same for 10-bit
through 27-bit CLA adders, since these adders have an identical critical path.
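The CLA delay budget described above can be tallied with a short script. The following is only a rough sketch under the figures quoted in the text (about 9 ps per CML gate and about 0.10 ps/μm of wire for case (C)); the function and constant names are illustrative and are not part of the original design flow.

```python
# Rough CLA critical-path delay estimate using the figures quoted in the text.
# All names are illustrative; the numbers come from the surrounding section.

CML_GATE_DELAY_PS = 9.0       # approximate delay of one CML gate (0.13 um SiGe BiCMOS)
WIRE_DELAY_PS_PER_UM = 0.10   # wire delay coefficient for the shielded case (C)

# Wire segments (um) on the 9-bit CLA critical path, steps (A) through (E).
WIRE_SEGMENTS_9BIT_UM = [200, 200, 300, 200, 200]


def cla_delay_ps(num_gates, wire_length_um):
    """Return (logic, wire, total) delay in ps for a CLA critical path."""
    logic = num_gates * CML_GATE_DELAY_PS
    wire = wire_length_um * WIRE_DELAY_PS_PER_UM
    return logic, wire, logic + wire


# 9-bit CLA: 8 gate delays and 1100 um of wire.
logic9, wire9, total9 = cla_delay_ps(8, sum(WIRE_SEGMENTS_9BIT_UM))
# 10-bit to 27-bit CLA: 12 gate delays and 1800 um of wire.
logic27, wire27, total27 = cla_delay_ps(12, 1800)

print(f"9-bit CLA:        logic {logic9:.0f} ps + wire {wire9:.0f} ps = {total9:.0f} ps")
print(f"10- to 27-bit CLA: logic {logic27:.0f} ps + wire {wire27:.0f} ps = {total27:.0f} ps")
```

Under these assumptions the wire contribution exceeds the logic contribution in both cases, which is the effect the comparison with the RCA relies on.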
The propagation delay of the RCA is much easier to calculate than that of the CLA adder.
Fig. 5.5 shows the architecture of an N-bit RCA. It represents the layout floor plan and the
component placement as well as the schematic wire connections. The sum and carry-out
of each bit are calculated as

\[
\begin{cases}
\text{Carry out:} & c_i = A_i B_i + B_i c_{i-1} + c_{i-1} A_i \\
\text{Sum:} & s_i = A_i \oplus B_i \oplus c_{i-1},
\end{cases}
\tag{5.4}
\]

where $A_i$ and $B_i$ are the inputs of the N-bit full adder, $i = 0, 1, \ldots, N-1$; $c_i$ is the
carry-out of the $i$th-bit full adder; $c_{-1} = C_{in} = 0$ is the initial carry-in of the N-bit full
adder; and $C_{out} = c_{N-1}$ is the carry-out of the last bit. Therefore, the worst-case
propagation delay of the N-bit full adder is the delay of N-1 carry logic gates plus one sum
logic gate. There is almost no wire delay, since the carry logic of adjacent bits can be placed
as close together as possible to minimize the wiring between them. The level shifter delay can
be eliminated as well, because the input level shifting can be intentionally kept out of the
critical path.
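Eq. (5.4) translates directly into a bit-level model. The sketch below is a behavioral illustration only, not the CML circuit; the function names are hypothetical, and the 9 ps gate delay is the approximate figure quoted earlier for the 0.13 μm SiGe BiCMOS process.

```python
# Bit-level model of an N-bit ripple-carry adder (RCA) built from Eq. (5.4).
# Behavioral sketch only; names are illustrative.

CML_GATE_DELAY_PS = 9.0  # approximate CML gate delay quoted in the text


def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    carry = (a & b) | (b & c_in) | (c_in & a)
    s = a ^ b ^ c_in
    return s, carry


def ripple_carry_add(a, b, n_bits, c_in=0):
    """Add two n_bits-wide unsigned integers; returns (sum mod 2**n_bits, carry_out)."""
    carry = c_in
    result = 0
    for i in range(n_bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def rca_worst_case_delay_ps(n_bits):
    """Delay of (N-1) carry gates plus one sum gate, ignoring wire delay."""
    return ((n_bits - 1) + 1) * CML_GATE_DELAY_PS
```

When such an adder is used as a phase accumulator, discarding the final carry-out gives the modulo-2^N wraparound of the phase.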
[Figure 5.5: chain of full adders (FA) with inputs A_0, B_0 through A_N, B_N, sums s_0 through s_N, carries c_0 through c_{N-1}, C_in = 0 and C_out.]
Figure 5.5: Diagram of N-bit RCA.
Fig. 5.6 compares the estimated propagation delays of the CLA adder and the RCA. Without
the wire delay, the speed of the CLA adder is close to that of the RCA for small numbers of
bits, and at high numbers of bits the CLA adder runs much faster than the RCA. With the wire
delay added, the RCA delay changes little because the RCA is very compact and can be laid
out very tightly, so it has almost no wire delay. The layout of the CLA, however, is very
complex and introduces significant wire delay. As a result, the CLA adder runs much slower
than the RCA, especially for 10-bit to 25-bit adders. In addition to its internal wire delays,
the CLA adder occupies a layout area several times larger than that of the RCA, which results
in more wire delay for global wiring.
[Figure 5.6 plot, titled ESTIMATED PROPAGATION DELAY OF CLA AND RCA ADDERS: ESTIMATED PROPAGATION DELAY (ps), 0 to 300, versus ADDER NUMBER OF BIT, 5 to 25; curves: CLA W/ WIRE DELAY, CLA W/O WIRE DELAY, RCA W/ WIRE DELAY, RCA W/O WIRE DELAY.]
Figure 5.6: Estimated adder propagation delays with number of bits.
In conclusion, at low or medium speed with an older and slower fabrication technology,
the CLA speeds up the adder operation by using additional logic for carry calculations.
However, for high-speed implementation with fast technology (e.g., <0.13 μm), the adder
delay is dominated by the wire delays. Compared with a CLA adder, the RCA has a simple
ripple architecture that can be laid out in a cascaded format, one bit after another, leading
to a very compact layout with short wire interconnections between stages.
5.2.3 Circuit Implementation of the 24-bit 5.0 GHz RCA
In this DDS design, a 24-bit RCA is used to implement the 24-bit accumulator. The
24-bit RCA is composed of 24 1-bit full adders carefully designed in a compact manner. The
output of the carry-out logic remains at the top CML level, so no level shifter is needed to
convert the signal level on the critical path. Therefore, the longest delay from input to output
of the 24-bit RCA is 23 carry-out CML gate delays plus 1 sum CML gate delay. The wire delay
in the RCA can be minimized since the carry-in can be directly connected to the carry-out of
the previous bit, leading to a compact layout in a cascaded format. Another 12-bit RCA was
implemented for the 12-bit phase modulator. By comparison, a 24-bit CLA adder runs slower
than the RCA, as shown in Fig. 5.6. In the 0.13 μm SiGe BiCMOS technology, long wires
contribute much more delay than the logic gates. A 24-bit CLA adder therefore cannot run as
fast as the RCA, not only because of the number of cascaded CLA logic levels imposed by the
limited CML fan-in, but also because of the much longer wires that the CLA logic requires.
5.3 10-Bit Segmented Sine-weighted DAC
5.3.1 Architecture of the 10-bit Sine-weighted DAC
The structure of the 10-bit sine-weighted DAC is shown in Fig. 5.7. The total phase
word input to the sine-weighted DAC is 11 bits. The two MSBs determine the quadrant of the
sine wave. The MSB of the phase word provides the proper mirroring of the sine waveform
about the π phase point. The 2nd MSB is used to invert the remaining 9 bits for the 2nd and
4th quadrants of the sine wave by means of a 1's complementor, and the outputs of the
complementor are applied to a 9-bit sine-weighted DAC core to form a quarter of the sine
waveform. Because of this π phase point mirroring, the total amplitude resolution of the
sine-weighted DAC is 10 bits.
[Figure 5.7 block labels: IN<1:11>, MSB, 2nd MSB, 1's Complementor (<1:9>), 3-8 Binary Decoder, 3-7 Thermometer Decoder, 64-bit Thermometer Decoder, 3-7 Row Decoder, 3-7 Column Decoder, Fine DACs Select Decoder, Latches and Current Switch Matrix (two blocks), Fine DACs, Coarse DAC, VCC, OUTp, OUTm.]
[Figure 5.7 inset, labeled "Fine (above) & Coarse (below) DACs Current Source Matrix": two 8 × 8 arrays of unit-current counts, listed in the order extracted.
 1 12 11 11  9  7  5  2
12 13 12 10  9  7  5  3
13 12 11 10  8  7  4  2
12 12 11 10  8  6  4  1
13 12 12 10  9  6  4  2
12 12 11  9  7  6  4  1
13 12 10 10  8  5  3  0
12 12 11  9  7  5  3  1

 2  2  2  2  1  0  0  0
 1  1  1  1  1  1  1  0
 2  2  1  1  1  1  0  0
 1  1  2  1  1  1  1  0
 2  2  1  2  1  0  0  1
 1  1  2  1  1  1  1  0
 2  2  1  1  1  1  0  0
 0  0  0  0  0  0  0  0]
Figure 5.7: Block diagram of the 10-bit segmented sine-weighted DAC.
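The quadrant handling described for the 11-bit phase word can be illustrated with a small behavioral model. This is a sketch under stated assumptions: the quarter-wave core is modeled as an ideal sine sampled at mid-step and scaled to 511, neither of which is taken from the chip, and the function names are hypothetical.

```python
import math

PHASE_BITS = 11                      # total phase word width described in the text
CORE_BITS = 9                        # bits fed to the quarter-wave DAC core
CORE_MASK = (1 << CORE_BITS) - 1


def quarter_sine(code):
    """Idealized 9-bit quarter-wave amplitude (mid-step sampling is an assumption)."""
    angle = (math.pi / 2) * (code + 0.5) / (1 << CORE_BITS)
    return round(CORE_MASK * math.sin(angle))


def sine_dac(phase):
    """Map an 11-bit phase word to a signed amplitude using the two MSBs."""
    phase &= (1 << PHASE_BITS) - 1
    msb = (phase >> (PHASE_BITS - 1)) & 1          # selects the half period (sign)
    second_msb = (phase >> (PHASE_BITS - 2)) & 1   # marks the 2nd and 4th quadrants
    code = phase & CORE_MASK
    if second_msb:
        code ^= CORE_MASK                          # 1's complement of the 9 bits
    amplitude = quarter_sine(code)
    return -amplitude if msb else amplitude
```

The 9-bit magnitude plus the sign supplied by the MSB mirroring is what yields the 10-bit amplitude resolution stated above.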
To reduce the complexity of the sine-weighted DAC, segmentation has been employed
[8,9]. The 9-bit sine-weighted DAC core is divided into a 6-bit thermometer-decoded coarse
sine-weighted DAC and eight 3-bit thermometer-decoded fine sine-weighted DACs. The first
6 bits of the complementor's output control the coarse DAC, and the highest 3 bits also
address the selection of the fine DACs. The lowest 3 bits of the complementor's output
determine the output value of each fine DAC. The bit division between the 6-bit coarse
DAC and the 3-bit fine DACs is based on the trade-off among static and dynamic accuracy,
chip area and power consumption. As shown in Fig. 5.7, the bottom current source array
implements the coarse DAC and provides 512 unit current sources. The top current source
array implements the fine DACs; each column of this array forms the current sources of one
fine DAC. Every fine DAC has about 8 unit current sources, which are used to interpolate the
coarse DAC. The unit current of both the coarse and fine DACs is set at 26 μA. The largest
current in the current source matrix is 338 μA, which is composed of 13 unit current sources.
In the layout, to improve current source matching, each current source is split into four
identical current sources, each carrying a quarter of the total current. To further improve
matching, all current source transistors, including those used in both the coarse and fine
DACs, are randomly distributed across the whole current source matrix.
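The bit allocation of the segmented core can be written out explicitly. The sketch below only shows how the 9-bit complementor output splits into control fields as described above, together with the unit-current arithmetic; the function names are illustrative and the decoders themselves are not modeled.

```python
UNIT_CURRENT_UA = 26  # unit current source value quoted in the text (uA)


def split_core_word(word):
    """Split the 9-bit complementor output into DAC control fields."""
    if not 0 <= word < 512:
        raise ValueError("expected a 9-bit word")
    coarse_code = word >> 3     # upper 6 bits drive the coarse DAC
    fine_select = word >> 6     # upper 3 bits choose one of the eight fine DACs
    fine_code = word & 0b111    # lower 3 bits set the output of that fine DAC
    return coarse_code, fine_select, fine_code


def source_current_ua(unit_count):
    """Current of a source built from unit_count unit cells, in uA."""
    return unit_count * UNIT_CURRENT_UA
```

For example, `source_current_ua(13)` reproduces the 338 μA quoted for the largest current source in the matrix.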
The current switch contains two differential pairs. It improves the bandwidth of the
switching operation by using a minimum number of logic transistors, and it uses cascode
current sources that provide better isolation and current mirror accuracy. The current outputs
are converted to differential voltages by a pair of off-chip 15 Ω pull-up resistors. Fig. 5.8
shows that the currents from the cascode current sources are fed to the outputs OUTp and
OUTm by pairs of switches (Msw). The MSB controls the selection between the two half periods.
[Figure 5.8 labels: MSBp, MSBm, OUTp, OUTm, Msw, Qp, Qm, Dp, Dm, Sp, Sm, CLKp, CLKm, Cp, Cm, Vcas, Vcs, pull-up resistor, DAC current cell.]
Figure 5.8: Diagram of the DAC switch and current source matrix cell.
5.3.2 Bandwidth Limitation of the DAC Switch Output Impedance
The dynamic performance of a current-steering DAC is closely related to the output
impedance of the current switch and to the frequency response of that impedance. For a
fully thermometer-decoded DAC, the SFDR can be written as a function of the output
impedance $Z_{out}$ and the DAC resolution of $N$ bits [23]:

\[
\mathrm{SFDR} = 20\log\left(\frac{Z_{out}}{R_L}\right) - 6.02\,(N-2), \tag{5.5}
\]

where $R_L$ is the load resistance of the current switch. The core switch cell of the
sine-weighted DAC is shown in Fig. 5.9(A). $C_0$ and $C_1$ denote the parasitic capacitances at
the drain of the current source transistor and the collector of the cascode transistor,
respectively, including the device parasitic capacitance and the wire capacitance. The
small-signal equivalent circuit is drawn in Fig. 5.9(B). If $r_\pi$ is neglected, the total output
impedance looking into the switch output is given by

\[
Z_{out} = g_{msw} r_{osw}\left[\frac{1}{sC_1} \parallel \left(g_{mcas} r_{ocas}\left(\frac{1}{sC_0} \parallel r_{ocs}\right)\right)\right]. \tag{5.6}
\]

In Eq. (5.6), $g_{msw}$ and $g_{mcas}$ denote the transconductances of the switch and the cascode
transistor, respectively, and $r_{osw}$, $r_{ocas}$ and $r_{ocs}$ denote the output resistances of the
switch, the cascode and the current source transistor, respectively. At low frequency, the
output impedance simplifies to

\[
Z_{out} = g_{msw} r_{osw}\,(g_{mcas} r_{ocas})\, r_{ocs}. \tag{5.7}
\]

$g_{mcas} r_{ocas}$ is the maximum gain of the cascode transistor. A bipolar transistor is chosen
for the cascode since it has a higher maximum gain than its MOS counterpart, while an MOS
transistor is chosen for the current source because it has a higher output resistance than the
bipolar transistor as well as a lower overdrive voltage.
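Eqs. (5.5) and (5.7) can be evaluated numerically to see how much output impedance a given SFDR target demands. The following is a minimal sketch; the helper names are hypothetical, and any device values passed in would be assumptions rather than data from this design.

```python
import math


def low_freq_zout(gm_sw, ro_sw, gm_cas, ro_cas, ro_cs):
    """Low-frequency output impedance from Eq. (5.7), in ohms."""
    return (gm_sw * ro_sw) * (gm_cas * ro_cas) * ro_cs


def sfdr_db(z_out, r_load, n_bits):
    """SFDR from Eq. (5.5) for a fully thermometer-decoded DAC, in dB."""
    return 20 * math.log10(abs(z_out) / r_load) - 6.02 * (n_bits - 2)


def required_zout(sfdr_target_db, r_load, n_bits):
    """Invert Eq. (5.5): impedance magnitude needed to reach a target SFDR."""
    return r_load * 10 ** ((sfdr_target_db + 6.02 * (n_bits - 2)) / 20)
```

Calling `required_zout(60, 15, 10)`, for instance, gives the impedance magnitude that Eq. (5.5) would require for a 60 dB target with the 15 Ω load and 10-bit resolution; the 60 dB figure is only an example value.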
[Figure 5.9 labels: (A) switch core with Msw, Mcas, Mcs, C0, C1, load resistors RL to VCC, OUT and Zout; (B) small-signal model with gmsw·vbesw, gmcas·vbecas, gmcs·vgscs, rosw, rocas, rocs, rπ, C0, C1 and Zout.]
Figure 5.9: DAC switch core circuit and its small signal equivalent circuit.
From Eq. (5.6), the dominant pole of the output impedance is

\[
\omega_p = \frac{1}{r_{ocs}\left(C_0 + r_{ocas}\, g_{mcas}\, C_1\right)}. \tag{5.8}
\]

To increase the bandwidth of the output impedance, the parasitic capacitances $C_0$ and
$C_1$ must be kept as small as possible. Because the maximum gain of the cascode transistor,
$g_{mcas} r_{ocas}$, is much larger than 1, the parasitic capacitance $C_1$ affects the bandwidth of
the output impedance much more significantly than $C_0$. The long wire connection between
the current source and the current switch is therefore placed at the drain of the current
source transistor, which results in a much larger capacitance $C_0$ than $C_1$.
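The asymmetry between $C_0$ and $C_1$ in Eq. (5.8) can be checked with a few lines of code. The element values below are hypothetical and chosen only to illustrate the trend; they are not measured or simulated values from this design.

```python
import math


def dominant_pole_hz(ro_cs, gm_cas, ro_cas, c0, c1):
    """Dominant pole of Zout from Eq. (5.8), returned in Hz."""
    omega_p = 1.0 / (ro_cs * (c0 + gm_cas * ro_cas * c1))
    return omega_p / (2 * math.pi)


# Hypothetical element values, for illustration only.
RO_CS = 200e3                  # current source output resistance (ohm)
CASCODE_GAIN = 500             # gm_cas * ro_cas
GM_CAS = 0.02                  # cascode transconductance (S)
RO_CAS = CASCODE_GAIN / GM_CAS

# The same 50 fF of wiring capacitance placed at either node.
wire_on_c0 = dominant_pole_hz(RO_CS, GM_CAS, RO_CAS, c0=60e-15, c1=10e-15)
wire_on_c1 = dominant_pole_hz(RO_CS, GM_CAS, RO_CAS, c0=10e-15, c1=60e-15)
print(f"wire on C0: {wire_on_c0:.3e} Hz, wire on C1: {wire_on_c1:.3e} Hz")
```

With these assumed numbers, moving the wiring capacitance from the cascode collector to the current source drain raises the pole frequency several times, which is the motivation for the routing choice described above.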
5.4 Experimental Results
Figs. 5.10 and 5.11 illustrate the measured DDS output spectra and waveforms for
different output and clock frequencies. All measurements were taken from a single-ended
output using CLCC-68 packaged parts, without calibrating out the losses of the cables and
PCB tracks. Fig. 5.10 shows a 469.360351 MHz DDS output with a 5.0 GHz clock input and
FCW = 0x180800 (hexadecimal). Because of the MSB mirroring shown in Fig. 5.8, the
single-ended output peak-to-peak voltage is 400 mV. The theoretical output power can
therefore be calculated as

\[
10\log\left[\frac{\left(400\ \mathrm{mV}/(2\sqrt{2})\right)^2}{15\ \Omega \cdot 1\ \mathrm{mW}}\right] \approx 1.25\ \mathrm{dBm}. \tag{5.9}
\]

Taking into account the loss due to the parasitic capacitances, PCB tracks and RF cables, the
measured output power is approximately -2.12 dBm. The measured SFDR is approximately
38 dBc in the Nyquist bandwidth. Since FCW = 0x180800, the output frequency is given by

\[
\frac{\mathrm{FCW}}{2^N}\, f_{clk} = \frac{\mathtt{0x180800}}{2^{24}} \times 5.0\ \mathrm{GHz} = 469.360351\ \mathrm{MHz}. \tag{5.10}
\]
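The arithmetic in Eqs. (5.9) and (5.10) is easy to reproduce. The sketch below uses hypothetical helper names; the inputs (24-bit accumulator, 5.0 GHz clock, 400 mV peak-to-peak into 15 Ω) are the values stated in the text.

```python
import math

ACC_BITS = 24  # accumulator width of this DDS


def dds_output_hz(fcw, f_clk_hz, acc_bits=ACC_BITS):
    """Output frequency from Eq. (5.10): FCW / 2**N * f_clk."""
    return fcw * f_clk_hz / (1 << acc_bits)


def sine_power_dbm(v_peak_to_peak, r_load_ohm):
    """Power of a sine wave with the given peak-to-peak voltage, as in Eq. (5.9)."""
    v_rms = v_peak_to_peak / (2 * math.sqrt(2))
    return 10 * math.log10(v_rms ** 2 / r_load_ohm / 1e-3)


f_out = dds_output_hz(0x180800, 5.0e9)
p_out = sine_power_dbm(0.400, 15.0)
print(f"f_out = {f_out / 1e6:.6f} MHz, P_out = {p_out:.2f} dBm")
```

The same function gives the frequency resolution of the synthesizer as `dds_output_hz(1, 5.0e9)`, that is, the clock frequency divided by 2^24.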
In a stretch processing radar, which is essentially a narrow band system, the narrow
band SFDR of the DDS is often more important than the wideband SFDR, since wideband
spurs can be removed relatively easily. For many applications, only a specific narrow band
near the output, usually less than 1% of the update frequency, is of interest [9]. Fig. 5.11
provides an example of a 1.246258914 GHz output (FCW = 0x3FCFE7) with a 5.0 GHz clock,
measured in a 50 MHz band around the output. The measured narrow band SFDR is about
82 dBc.
Fig. 5.12 demonstrates the operation of the DDS with an LFM output. With a 300 MHz
clock input, a 24-bit 300 MHz ramp from 0x000001 to 0x00AD9C is fed into the FCW input.
The output sweeps from 18 Hz to 397.367947 kHz. In this DDS, CMOS logic is used to
provide the modulation data inputs. The speed is limited by the data source, an Agilent
pattern generator that drives the chip through the PCB. A maximum of 2.5 GHz LFM can be
reached if the modulation ramp is generated inside the DDS chip.
Fig. 5.13 shows the measured DDS output with FCW = 7, phase modulated by a step
of PCW = 0x800 that results in a 180° phase shift. The output frequency is 1.251 kHz with a
3.0 GHz clock. Both the LFM and the PM waveforms can be used in radar transceivers as
the source of the transmitted chirp signal and the reference chirp signal for stretch processing,
as described in Section II. Based on the discussion in Section II, a chirp-modulated waveform
improves the radar range resolution, while stretch processing using LFM reduces the
bandwidth requirement for the ADC in the receiver path.
[Figure 5.10 annotation: SFDR = 38 dBc]
Figure 5.10: Measured DDS output with a 469.360351 MHz output and the maximum 5.0
GHz clock (FCW = 0x180800), showing a 38 dBc Nyquist band SFDR.
Fig. 5.14 plots the measured SFDR versus the output frequency for the 24-bit 5.0 GHz
DDS within a 50 MHz bandwidth, and demonstrates a worst-case narrow band SFDR of about
45 dBc. In addition, the DDS has several sweet spots at which its output spectral purity and
dynamic performance are much better than in other frequency bands. Fig. 5.11 gave an
example of an 82 dBc SFDR in a 50 MHz narrow band.
The die photo of the DDS is shown in Fig. 5.15. This DDS design is quite compact, with
an active area of 3.0 × 2.5 mm² and a total die area of 3.7 × 3.0 mm². Table 5.1 compares
the performance of all reported ultrahigh speed DDS RFICs with over-GHz output frequency,
including the DDSs presented in the previous chapters. Although the DDS reported here
has a relatively low frequency compared with the others, it is the first over-GHz output
frequency implementation with direct digital FM and PM capabilities. Some commercial
parts have direct digital FM and PM capabilities, but they only work up to 1.0 GHz with an
output of less than 500 MHz [31].
[Figure 5.11 annotation: SFDR = 82 dBc]
Figure 5.11: Measured DDS output with a 1.246258914 GHz output and the maximum 5.0
GHz clock (FCW = 0x3FCFE7), showing an 82 dBc narrow band SFDR.
Figure 5.12: Measured DDS LFM output with the FCW swept from 1 to 0x005AD9C using
a 300 MHz clock.
Figure 5.13: Measured DDS output with FCW = 7, phase modulated by a phase step of
PCW = 0x800 causing a 180° phase shift. The output frequency is 1.251 kHz with a 3.0
GHz clock.
[Figure 5.14 plot, titled MEASURED SFDR VS. OUTPUT FREQUENCY (F_CLK = 5.0 GHz): MEASURED SFDR (dBc), 0 to 60, versus OUTPUT FREQUENCY (GHz), 0 to 2.5.]
Figure 5.14: Measured DDS narrow band SFDR versus output frequency within a 50 MHz
bandwidth.
[Figure 5.15 die photo labels: 24-bit Ripple Accumulator; 12-bit Ripple FA; 10-bit Segmented Sine-weighted DAC; PRBS.]
Figure 5.15: Die photo of the 24-bit DDS RFIC.
5.5 Conclusion
This chapter presented a 24-bit 5.0 GHz DDS RFIC with direct digital modulation
capabilities, developed in a 0.13 μm SiGe BiCMOS technology for pulse compression radar
applications. A 24-bit RCA accumulator and a 12-bit RCA are implemented for the
modulator designs with over-GHz frequency output. For high-speed DDS implementation,
the adder delay is dominated by the wire delays. A comparison between the RCA and the
CLA adder has been performed in this chapter. Compared with a CLA adder, the RCA has a
simple ripple architecture that can be laid out in a cascaded format, one bit after another,
resulting in a very compact layout with short wire interconnections between stages. Thus,
the RCA actually achieves a higher operating frequency than the CLA adder.
This 24-bit DDS has more than 20,000 transistors and achieves a maximum clock
frequency of 5.0 GHz. The measured worst-case SFDR is 45 dBc under a 5.0 GHz clock
frequency and within a 50 MHz bandwidth. The best Nyquist band SFDR is 38 dBc with
a 469.360351 MHz output using a 5.0 GHz clock frequency. This DDS represents the first
implemented RFIC with direct digital modulation at over-GHz output frequency.
Table 5.1: Ultrahigh Speed DDS RFIC Performance Comparison

Ref.       Technology  fT/fMAX  Phase  Amplitude  Max Clock  FM     PM     SFDR   Power  Area        FOM
                       [GHz]    [bit]  [bit]      [GHz]      [bit]  [bit]  [dBc]  [W]    [mm^2]      [GHz·2^(SFDR/6)/W]
[18]       InP         137/267  8      7          9.2        None   None   <30    15     8 × 5       <16.0
[19]       InP         300/300  8      7          13         None   None   26.67  5.42   2.7 × 1.45  42.6
[20]       InP         300/300  8      5          32         None   None   21.56  9.45   2.7 × 1.45  34.8
[27]       InP         370/370  12     7.5        24         None   None   30.7   19.8   5.0 × 3.3   42.1
[21]       SiGe        100/120  9      8          12.3       None   None   20     1.9    3.0 × 3.0   65.3
[8, 9]     SiGe        200/250  11     10         8.6        None   None   33     4.8    4.0 × 3.5   81.1
[28]       SiGe:C      fT = 70  9      8          6.0        9      None   17     0.308  1           138.8
[9B DDS]   SiGe        200/250  9      7          2.9        9      9      35     2.0    2.5 × 3.0   82.7
[24B DDS]  SiGe        200/250  24     10         5.0        24     12     45     4.7    3.7 × 3.0   192.6
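The figure of merit in the last column of Table 5.1 can be recomputed from the other columns. The sketch below assumes the FOM definition read from the table header, clock frequency in GHz times 2^(SFDR/6) divided by power in W; the function name is illustrative, and only a few rows are reproduced.

```python
def dds_fom(max_clock_ghz, sfdr_dbc, power_w):
    """FOM = f_clk [GHz] * 2**(SFDR/6) / P [W], the unit given in the Table 5.1 header."""
    return max_clock_ghz * 2 ** (sfdr_dbc / 6) / power_w


# (label, max clock in GHz, SFDR in dBc, power in W) copied from Table 5.1.
rows = [
    ("[21]", 12.3, 20, 1.9),
    ("[8, 9]", 8.6, 33, 4.8),
    ("[28]", 6.0, 17, 0.308),
    ("[9B DDS]", 2.9, 35, 2.0),
    ("[24B DDS]", 5.0, 45, 4.7),
]
for label, clk, sfdr, power in rows:
    print(f"{label:>10}: FOM = {dds_fom(clk, sfdr, power):.1f}")
```

Dividing SFDR by 6 converts it to an approximate effective number of bits, so the metric rewards clock speed and spectral purity per watt.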
Chapter 6
An 8.7-13.8 GHz Transformer-coupled Varactor-less QCCO RFIC
6.1 Introduction
Quadrature signals are widely used in wireless transceivers as local oscillator (LO)
signals for up- and down-conversion with image-reject mixing. There are several ways to
generate quadrature signals. A frequency divider can be used to divide a higher-frequency
voltage-controlled oscillator (VCO) output into quadrature phase outputs. Divide-by-four
is usually used because the divide-by-two method requires a 50% duty cycle at the VCO
output. However, the divide-by-four method requires the VCO to run at four times the LO
frequency, which results in higher power consumption and poor phase noise. A VCO
followed by a passive poly-phase complex filter can be used to generate the quadrature
outputs as well. However, the output has poor phase accuracy for wideband inputs. In
addition, the large loss of the poly-phase network requires power-hungry buffers to boost
the LO magnitude. At higher frequencies, a poly-phase filter is very difficult to implement
because the reduced component values are more sensitive to process variations and parasitic
influences. Architectures that cross-couple two single-phase LC-VCOs are widely used to
generate quadrature outputs at high frequency. This technique provides wideband
quadrature accuracy and superior phase noise performance at the cost of increased power
consumption.
There are various ways to couple the two VCOs and lock their oscillation frequencies.
The most common quadrature VCO (QVCO) topology, shown in Fig. 6.1, utilizes the parallel
coupling proposed by Rofougaran et al. [32]. The parallel QVCO (P-QVCO) delivers
quadrature signals with low phase and amplitude errors, yet has a narrow tuning range set by
the tuning limit of the varactor. Series QVCOs (S-QVCO) have been proposed in CMOS and
BiCMOS technologies by connecting the coupling transistors in series [33, 34, 35]. This
topology reduces the noise by using cascode devices and provides better isolation between the
VCO output and its current sources. However, the S-QVCO also suffers from a narrow
frequency tuning range because of the varactor's small tuning capability. A magnetically
tuned quadrature oscillator has been reported by Cusmai et al., with an output frequency that
can be tuned from 3.2 GHz to 7.3 GHz [36]. Modern communication and radar systems
require quadrature signal generation at X- and Ku-bands with a wide tuning range for the
frequency sources used in phase-locked loops (PLL) or direct digital frequency synthesizers
(DDS) [8,9,21]. An 8.7-13.8 GHz transformer-coupled varactor-less quadrature
current-controlled oscillator (QCCO) is presented in this chapter [37,38]. It employs the same
mechanism as that presented in [36] but has a higher output frequency.
[Figure 6.1 labels: two cross-coupled LC-VCO cores with outputs I+, I-, Q+, Q-, supply VCC and tuning voltage Vtune.]
Figure 6.1: Quadrature VCO circuits with parallel coupling.
This chapter presents the operating principle and implementation of the oscillator, as
well as the phase accuracy and phase noise analysis. The implementation and modeling of
the adopted stacked octagonal transformers are then discussed. Finally, the experimental
results are given and conclusions are drawn.
[Figure 6.2 labels: HBTs T1 to T8, transistors M1 to M4, outputs I+, I-, Q+, Q-, supply VCC, currents Icore and Itune.]
Figure 6.2: Schematic of transformer-coupled varactor-less QCCO.
6.2 Analysis and Design of Transformer Coupled Quadrature Oscillator
6.2.1 Oscillation Analysis and Design
The varactor-less QCCO presented here is a transformer-coupled current-controlled LC
oscillator that utilizes SiGe heterojunction bipolar transistors (HBT) for oscillation and
current tuning. The NPN HBTs achieve very high oscillation frequency and low phase
noise. The proposed QCCO circuit is illustrated in Fig. 6.2, in which two pairs of
cross-coupled NPN HBTs, T1, T2 and T3, T4, generate the negative resistance for the
in-phase CCO (I-CCO) and quadrature-phase CCO (Q-CCO) outputs, respectively. Another
two pairs of NPN HBTs, T5, T6 and T7, T8, provide the tuning currents for the
transformers. All the HBTs operate near the peak-fT bias current in order to maximize the
switching speed. Fig. 6.3 shows an AC equivalent circuit of either the I-CCO or the Q-CCO.
The discussion below takes the I-CCO as an example, since the Q-CCO has the same
structure. The primary winding of the transformer has the same function as the LC tank
in a conventional LC-coupled VCO. The negative resistance $-1/g_m$ is generated by the
cross-coupled transistor pair T1 and T2. $r_p$ and $c_p$ are the total parasitic resistance and
capacitance between the two terminals of the primary transformer winding in the oscillator
circuit. The capacitance $c_p$ includes all the transformer capacitances as well as the
transistor parasitic capacitances. The parasitic elements of the secondary winding are not
shown since they have little effect on the CCO output. To achieve a high frequency, no extra
capacitor or varactor is used. From an intuitive analysis based on Fig. 6.3, the output voltage
$V_o$ of the I-CCO equivalent circuit can be expressed as

\[
\begin{aligned}
V_o &= j\omega L_p I_{core} + j\omega M I_{tune} \\
    &= j\omega I_{core}\,(L_p + \alpha M),
\end{aligned}
\tag{6.1}
\]

where $\alpha = I_{tune}/I_{core}$ and $M$ is the mutual inductance between the primary and secondary
windings. The mutual inductance can be calculated using

\[
M = k\sqrt{L_p L_s}, \tag{6.2}
\]

where $k$ is the coupling factor of the transformer. Thus, the effective inductance of the
oscillation tank is given by

\[
L_{eff} = L_p + \alpha M. \tag{6.3}
\]

For either the I-CCO or the Q-CCO, the oscillation frequency is

\[
f_{osc} = \frac{1}{2\pi\sqrt{(L_p + \alpha M)\,c_p}}. \tag{6.4}
\]

Changing the tuning current $I_{tune}$ changes $\alpha$, and therefore the oscillation frequency
of the QCCO output. Because $\alpha$ can be set arbitrarily through the QCCO core current and
tuning current, and can be negative or positive, the ideal oscillation frequency can be tuned
from a small value to infinity as $\alpha$ is tuned from positive infinity to $-L_p/M$. The QCCO
output frequency can therefore be tuned over a very wide range with carefully selected
devices and current ratios.
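The tuning law of Eqs. (6.2) to (6.4) can be explored numerically. The sketch below uses hypothetical component values chosen only for illustration; they are not the values of the fabricated transformer, and the function names are assumptions.

```python
import math


def mutual_inductance(k, l_p, l_s):
    """Eq. (6.2): M = k * sqrt(Lp * Ls)."""
    return k * math.sqrt(l_p * l_s)


def oscillation_frequency_hz(alpha, l_p, m, c_p):
    """Eq. (6.4): f_osc = 1 / (2*pi*sqrt((Lp + alpha*M) * cp))."""
    l_eff = l_p + alpha * m              # effective inductance, Eq. (6.3)
    if l_eff <= 0:
        raise ValueError("alpha must be greater than -Lp/M")
    return 1.0 / (2 * math.pi * math.sqrt(l_eff * c_p))


# Hypothetical component values, for illustration only.
L_P = L_S = 0.5e-9     # winding inductances (H)
K = 0.8                # coupling factor
C_P = 300e-15          # total parasitic capacitance (F)
M = mutual_inductance(K, L_P, L_S)

for alpha in (-1.0, -0.5, 0.0, 0.5, 1.0):
    f = oscillation_frequency_hz(alpha, L_P, M, C_P)
    print(f"alpha = {alpha:+.1f}: f_osc = {f / 1e9:.2f} GHz")
```

Negative current ratios raise the frequency and positive ratios lower it, and the function rejects ratios at or below $-L_p/M$, where the ideal model predicts an unbounded frequency.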
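[Figure 6.3 labels: primary winding Lp and secondary winding Ls coupled by M, capacitors 2cp, resistors rp/2, negative resistances -1/gm, currents Icore and Itune, output voltage Vo.]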
Figure 6.3: AC equivalent circuit of the transformer tank.
To determine the output voltage and oscillation frequency more accurately, circuit
analysis of the AC equivalent circuit shown in Fig. 6.3 gives the output voltage of the
oscillator as
Vo = Icore
 1
j!cp==(rp +j!Lp)
 
 j!M: (6.5)
Separating Eq. (6.5) into real and imaginary parts leads to the following expressions for the
oscillation amplitude $V_o$ and frequency $\omega_{osc}$:

\[
V_o = I_{core}\,\frac{L_p}{r_p c_p} = I_{core}\,\omega_0 Q L_p, \tag{6.6}
\]
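\[
\begin{aligned}
\omega_{osc}^2 &= \omega_0^2\left[\left(1 - \frac{1}{2}m - \frac{1}{2Q^2}\right) - \frac{1}{2}m\sqrt{\left(\frac{m\,\omega_0^2}{\omega_c^2} + (3 - 2m)\right)^2 - 4\left[(m-1)^2 + 1\right]}\,\right] \\
&\approx \omega_0^2\left[\left(1 - \frac{1}{2}m - \frac{1}{2Q^2}\right) - \frac{1}{2}m\sqrt{1 - 4m}\right],
\end{aligned}
\tag{6.7}
\]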
where $Q = \omega_0 L_p / r_p$, $\omega_0 = 1/\sqrt{L_p c_p}$ and $\omega_c = 1/(r_p c_p)$ are the quality factor,
self-resonance frequency and corner frequency of the transformer primary winding,
respectively, and $m = \frac{M}{L_p}\cdot\frac{I_{tune}}{I_{core}}$ can be considered the coupling strength of the
transformer [36,39,40]. The oscillation amplitude is independent of the tuning current and is
determined only by the core current and the transformer parameters. The approximation of
$\omega_{osc}$ is acceptable when $\omega_c \gg \omega_0$, which is true for the transformer windings. The
oscillation frequency is determined by the quality factor as well as by $m$, which is a function
of the self-inductance and mutual inductance of the transformer and of the ratio of the tuning
current to the core current of the oscillator.
To increase the tuning capability with a small tuning current, the transformer needs to be
carefully designed to maximize its mutual inductance. A stacked octagonal transformer,
shown in Fig. 6.4, which in theory has the maximum mutual inductance, is designed to
reduce the magnetic flux leakage [41]. The transformer design is discussed in the
following section.
6.2.2 Quadrature Coupling Phase Accuracy and Phase Noise
Fig. 6.5 shows the full AC equivalent circuit of the varactor-less QCCO. If T5, T6 and
T7, T8 are treated as $-G_m$ amplifiers, Fig. 6.5 can be further simplified to Fig. 6.6.
Fig. 6.6(a) shows the transformer in-phase case of the QCCO, while Fig. 6.6(b) shows the
transformer anti-phase case. Suppose the phase delay of each $-G_m$ amplifier is $\varphi$. From the
Barkhausen criterion, the phase delays for the in-phase and anti-phase oscillators satisfy
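\[
\varphi + \varphi + \pi = 2n\pi, \quad n = 1, 2, \ldots \tag{6.8}
\]
and
\[
\varphi + \pi + \varphi + \pi + \pi = 2n\pi, \quad n = 1, 2, \ldots \tag{6.9}
\]
Thus, the phase delay of the $-G_m$ amplifier is given by
\[
\varphi = \pm\frac{\pi}{2}. \tag{6.10}
\]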
[Figure 6.4 labels: (A) layout on Top Metal and 2nd Top Metal with terminals Pp, Pm, Sp, Sm; (B) transformer symbol with Lp, Ls, M and terminals Pp, Pm, Sp, Sm.]
Figure 6.4: Stacked octagonal transformer.
Therefore, regardless of whether the transformers are in phase or anti-phase, the phase
delay between I and Q is π/2 or -π/2. In other words, quadrature outputs are generated. In
practice, there are mismatches between the devices used in the oscillator circuit, which
result in slightly different phase delays between the two $-G_m$ amplifiers as well as between
the two transformers. The device and transformer mismatches are the major contributors to
the phase error between the quadrature outputs. Another source of phase error is the coupling
between the two transformers used in the I oscillator and the Q oscillator. From [40], the
total phase error $\varphi_e$ can be determined by
\[
\varphi_e = \frac{1}{2} Q m^2 \varepsilon + k_0 Q m, \tag{6.11}
\]
where ε represents the mismatch between the devices and transformers, k0 is the coupling
factor between the primary windings of the two transformers, and m is the coupling strength
of the transformer, defined as in Eq. (6.7). The phase error therefore increases with a
better quality factor and a larger coupling strength of the transformer.
[Figure 6.5 labels: outputs I+, I-, Q+, Q-, four negative resistances -1/gm, coupling transistors T5, T6, T7, T8.]
Figure 6.5: AC equivalent circuit of the varactor-less QCCO.
[Figure 6.6 panels: (a) Transformer in phase; (b) Transformer anti phase. Each panel shows two -Gm amplifiers linking I+, I- and Q+, Q-, with node phases marked 0°, 90°, 180° and 270°.]
Figure 6.6: Equivalent circuits of the varactor-less QCCO.
The phase noise of oscillators has been investigated intensively [41,42,43], and
analyses of the conventional quadrature oscillator have been presented in [33,39,44]. The
phase noise of the transformer-coupled oscillator is similar to that of the conventional
quadrature oscillators defined in [39], namely,

\[
\mathcal{L}(\Delta\omega) = \frac{kT}{C}\cdot\frac{\omega_{osc}}{Q}\cdot\frac{1+m^2}{2}\cdot\frac{1}{\Delta\omega^2}\cdot\frac{1+(1+m)F}{A_0^2}, \tag{6.12}
\]

where $A_0$ is the oscillation amplitude across one of the tanks and $F$ is the noise factor of the
conventional single-phase oscillator. From Eq. (6.12), the phase noise of the quadrature
oscillator increases rapidly with increasing $m$ and is reduced by a better quality factor
of the coupling inductors or transformers. The same conclusion can be drawn from the phase
noise analysis in [36]. A trade-off between phase accuracy and phase noise with respect to
$m$ and $Q$ therefore needs to be considered in quadrature oscillator designs.
6.3 Transformer Implementation
6.3.1 Geometry Design of Transformers
The transformer-coupled QCCO has been designed in a 0.18 μm SiGe BiCMOS technology.
The transformer design has been optimized in simulation with the full-wave electromagnetic
solver Agilent Momentum, in order to maximize the magnetic coupling k (or the M/L ratio)
and the quality factor Q of the primary winding. For different transformer structures, the
self-inductance L, the mutual inductance M or the coupling coefficient k, the turn ratio n,
the quality factor Q and the self-resonance frequency ω0 may vary significantly. Depending
on the transformer structure and the magnetic coupling method (lateral or vertical),
different approaches to transformer layout have been proposed [45,46]. Usually,
transformers are formed by magnetically coupling two or more inductors. There are four
commonly used inductor shapes: square, hexagonal, octagonal and circular. Based on these
inductors, inter-wound or stacked transformers can be built with different geometries.
In terms of transformer performance (usually inductance and quality factor), the circular
shape is the best choice, followed by octagonal and hexagonal, and the square is the worst.
However, the circular layout is not compatible with most design rules, so octagonal is the
most commonly used shape for inductors and transformers. Differential circuits, such as the
proposed quadrature oscillator, require a symmetrical shape. Based on the above discussion,
Fig. 6.7 illustrates three transformers: concentric, inter-wound and stacked. The concentric
transformer has a worse coupling factor than the inter-wound and stacked structures. A stacked
transformer is used in this design because it occupies a smaller layout area than the
inter-wound one and provides better magnetic coupling than the concentric one. Fig. 6.4 shows
the layout drawing of the adopted stacked transformer, with terminal names labeled with
respect to the transformer symbol.
In this SiGe technology, the top metal layer is much thicker than any other metal layer and
lies farthest from the substrate. It is thus used for the primary winding in order to achieve
a better quality factor. The second top metal layer is used for the secondary winding.
While the top metal is thicker than the second top metal, both are much thicker than the
other metal layers in the chosen 0.18 μm SiGe technology, and both are optimized for low
sheet resistance for analog routing. Both the primary and secondary windings are 10 μm wide
and have two turns with a diameter of 200 μm. The two windings overlap exactly, and the
spacing between the two turns is 5 μm, the minimum allowed by the design rules, to maximize
the fill ratio (defined later). At low frequencies, the Q of the inductor is limited primarily
by the resistance of the metal layer. At high frequencies, Q degradation is dominated by the
loss mechanisms of the substrate [32]. To minimize the dependence of Q on the substrate
resistivity, the transformer is placed on top of a patterned ground shield (PGS) to minimize
the current injected into the substrate. The PGS is a patterned conductive layer, formed in
this design by a lattice of highly resistive deep trench (DT) isolation. Fig. 6.8 shows the
three-dimensional substrate and the two-dimensional DT lattice used in this design. The PGS
substrate reduces the parasitic capacitance of the transformer to the substrate and increases
the parasitic resistance to the substrate.
Figure 6.7: Octagonal symmetrical transformer: (a) concentric, (b) inter-wound, and (c)
stacked.
Figure 6.8: Diagram of the (a) three-dimensional PGS substrate and (b) two-dimensional deep
trench lattice.
6.3.2 Transformer Equivalent Circuit and Parameters
Usually the frequency domain model of the transformer is more important, since most
oscillator performance metrics are analyzed in the frequency domain. However, the time domain
equivalent circuit is more intuitive. Fig. 6.9 shows the 2-π equivalent circuit of the stacked
transformer. Lp and Ls are the self-inductances of the primary and secondary windings. Rp
and Rs are the series resistances of the primary and secondary windings. Cpp and Css are
the inter-winding capacitances between the two turns of the primary and secondary windings,
respectively. Cps is the capacitance between the stacked primary and secondary windings.
Cbp and Cbs are the parasitic capacitances of the primary and secondary windings to the PGS.
Rb is the parasitic resistance to the PGS substrate.
Figure 6.9: Transformer time domain equivalent circuit model.
The self-inductance of the octagonal inductor can be estimated using the expression
developed by Mohan [46,47], namely,
$$ L = \frac{2.25\,\mu_0\, n^{2}\, d_{avg}}{1 + 3.55\,\rho} \tag{6.13} $$
where n is the number of turns, davg = 0.5(dout + din) is the average of the outer diameter
dout and the inner diameter din of the octagonal inductor, and ρ is the fill ratio defined
as ρ = (dout − din)/(dout + din). The mutual inductance M is defined in Eq. (6.2). The
coupling factor k of the stacked transformer is over 0.8.
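Eq. (6.13), read as L = 2.25 μ0 n² davg / (1 + 3.55 ρ), is easy to evaluate numerically. The
minimal Python sketch below applies it to the winding geometry quoted in Section 6.3.1 (two
turns, 10 μm width, 5 μm spacing). It assumes that the 200 μm diameter refers to the outer
diameter and derives the inner diameter from the turn count, width and spacing; both steps
are assumptions made for illustration, and the function names are hypothetical.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space, H/m


def fill_ratio(d_out, d_in):
    """Fill ratio rho = (d_out - d_in) / (d_out + d_in)."""
    return (d_out - d_in) / (d_out + d_in)


def octagonal_inductance(n_turns, d_out, d_in):
    """Self-inductance estimate of Eq. (6.13) for an octagonal spiral, in henries."""
    d_avg = 0.5 * (d_out + d_in)
    rho = fill_ratio(d_out, d_in)
    return 2.25 * MU_0 * n_turns**2 * d_avg / (1 + 3.55 * rho)


# Geometry from Section 6.3.1; treating 200 um as the OUTER diameter is an assumption.
n_turns = 2
width = 10e-6     # metal width, m
spacing = 5e-6    # turn-to-turn spacing, m
d_out = 200e-6    # assumed outer diameter, m
d_in = d_out - 2 * (n_turns * width + (n_turns - 1) * spacing)

L = octagonal_inductance(n_turns, d_out, d_in)
print(f"d_in = {d_in * 1e6:.0f} um, rho = {fill_ratio(d_out, d_in):.3f}, "
      f"L = {L * 1e9:.2f} nH")
```

Under these assumptions the closed-form estimate comes out at roughly 1.3 nH per winding.
Electromagnetic simulation remains the reference for the actual values.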
More accurate values of the self-inductance, mutual inductance and coupling factor can be
obtained by electromagnetic simulation and vector network analyzer (VNA) measurement.
Fig. 6.10 shows the simulated parameters of the octagonal stacked transformer. Fig. 6.10(a)
plots the primary and secondary self-inductances. They are almost identical, since the metal
material and thickness do not affect the inductance greatly [45]. Fig. 6.10(b) shows the
coupling factor of the transformer. It is around 0.8 and approaches 1 at high frequency.
Fig. 6.10(c) plots the quality factors of the primary and secondary windings. The peak Q
of the primary winding is about twice that of the secondary winding, because the primary
winding metal is thicker and has a lower sheet resistance than the secondary one.
In this oscillator design, the parallel capacitance between the terminals of the transformer
primary winding is used as the oscillation tank capacitance. The total parallel capacitance
is given by
$$ C_p = C_{pp} + \frac{C_{bp}}{8} + \frac{C_{ps}}{4} \parallel \left(C_{ss} + \frac{C_{bs}}{8}\right). \tag{6.14} $$
Given the geometry and electrical parameters of the transformer, the capacitances Cpp,
Cbp, Cps, Css and Cbs represent the total capacitance at the respective nodes and can be
calculated using a simple parallel-plate capacitor model. The PGS is far away from the
windings, so Cbp is much smaller than the other capacitances. The accurate frequency-dependent
capacitance in parallel with the primary winding can also be obtained from electromagnetic
simulation. Fig. 6.11 plots the total capacitance, with a simulated value of 0.6 pF at
10 GHz.
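The lumped estimate of Eq. (6.14) can be sketched in a few lines of Python. The sketch reads
the operator between Cps/4 and (Css + Cbs/8) as the product-over-sum combination, which is
what results when Cps/4 sits in series with the secondary-side capacitance, as the topology
of Fig. 6.9 suggests; this reading is an assumption. The element values are hypothetical
placeholders, not values extracted from this design.

```python
def series(c1, c2):
    """Product-over-sum combination of two capacitances (series connection)."""
    return c1 * c2 / (c1 + c2)


def tank_capacitance(c_pp, c_bp, c_ps, c_ss, c_bs):
    """Total capacitance across the primary winding, following Eq. (6.14)."""
    return c_pp + c_bp / 8 + series(c_ps / 4, c_ss + c_bs / 8)


# Hypothetical element values in farads, for illustration only.
c_p = tank_capacitance(c_pp=50e-15, c_bp=40e-15, c_ps=800e-15,
                       c_ss=50e-15, c_bs=80e-15)
print(f"Cp = {c_p * 1e15:.1f} fF")
```

Because the inter-winding term enters through a series combination, it can never contribute
more than the smaller of Cps/4 and Css + Cbs/8.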
In practice, it is not easy to determine the exact inductance and capacitance associated
with the transformer. With the help of S-parameters, all the simulations can be performed
with hybrid simulation tools. In this design, the Agilent Dynamic Link tool is used to invoke
the SPICE simulator and the Momentum electromagnetic simulator for all time domain and
frequency domain simulations. The oscillator was therefore designed by directly specifying
the geometric parameters of the transformers, instead of supplying L and C parameters as in
traditional design flows. This more flexible electromagnetic and circuit co-simulation
approach speeds up the design considerably, since the modeling step is removed.
Figure 6.10: Simulated parameters of the transformer windings: (a) self-inductance L, (b)
coupling factor k, and (c) quality factor Q.
[Plots: (a) inductance [nH], (b) coupling factor (k) and (c) quality factor (Q) versus
frequency [GHz] from 0 to 35 GHz; (a) and (c) show primary and secondary curves.]
Figure 6.11: Simulated capacitance parallel with the transformer primary winding.
[Plot: capacitance [pF], logarithmic axis, versus frequency [GHz] from 0 to 35 GHz.]
6.4 Experimental Results
The transformer-coupled varactor-less QCCO was implemented and fabricated in a
0.18 μm SiGe BiCMOS process, with the chip die photo shown in Fig. 6.12. The QCCO
core area is 0.5 × 0.4 mm². As shown in the die photo, the I-CCO and Q-CCO are placed
symmetrically. The layout is also optimized to reduce the effect of layout parasitics on the
QCCO performance, including harmonic distortion and phase noise. The QCCO is tested in a
CLCC-28 package. An on-chip buffer is included to drive the 50 Ω load presented by the
input of a spectrum analyzer or a digital oscilloscope. Due to limitations of the test setup,
all results were measured on the single-ended output, although the QCCO has fully
differential output capability. Single-ended testing degrades the measured phase noise and
I-Q mismatch.
A wide tuning range of 45.3% is achieved with the tuning current Itune varied from 0.4
to 2.9 mA and the QCCO core current Icore varied from 1.2 to 5.5 mA. The measured QCCO
tuning range is given in Fig. 6.13. It shows a continuous tuning range from 8.7 to 13.8 GHz.
Fig. 6.14 shows the measured quadrature outputs with 11.5 GHz frequency.
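The quoted 45.3% is consistent with the common definition of relative tuning range as the
frequency span divided by the center frequency. That definition is assumed in the short
Python check below, since the text does not state it explicitly; the function name is
hypothetical.

```python
def relative_tuning_range(f_min_hz, f_max_hz):
    """Frequency span divided by the center frequency (assumed definition)."""
    return 2 * (f_max_hz - f_min_hz) / (f_max_hz + f_min_hz)


tuning_range = relative_tuning_range(8.7e9, 13.8e9)
print(f"Tuning range = {tuning_range * 100:.1f}%")  # prints "Tuning range = 45.3%"
```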
Figure 6.12: Fabricated QCCO RFIC die photo.
Figure 6.13: Measured QCCO tuning range.
[Plot: output frequency (GHz) versus tuning current (mA) for core currents Ic = 1.2, 2.0,
2.9, 3.8, 4.6 and 5.5 mA.]
Figure 6.14: Measured QCCO outputs at 10.5 GHz with tuning current of 1.5 mA and core
current of 2 mA.
The measured phase noise at an output frequency of 11.02 GHz is shown in Fig. 6.15.
With the single-ended test, the transformer-coupled varactor-less QCCO achieves a phase
noise of -86.8 dBc/Hz at 1 MHz offset and -110 dBc/Hz at 10 MHz offset. A widely
accepted figure-of-merit (FOM) for VCO designs is proposed in [48]. The FOM takes into
account the output frequency, phase noise and power consumption, and is expressed by
$$ FOM = L(\Delta f) - 20\log\left(\frac{f_0}{\Delta f}\right) + 10\log\left(\frac{P_{diss}}{1\,\mathrm{mW}}\right) \tag{6.15} $$
where L(Δf) is the phase noise at the offset frequency Δf from the carrier frequency f0, and
Pdiss is the total core power dissipation of the I-CCO and Q-CCO. The FOM of this
transformer-coupled varactor-less QCCO is calculated as -154 dBc/Hz.
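As a rough cross-check of Eq. (6.15), read as FOM = L(Δf) − 20 log(f0/Δf) + 10 log(Pdiss/1 mW),
the Python sketch below evaluates the FOM from the measured phase noise of -86.8 dBc/Hz at
1 MHz offset and the 11.02 GHz carrier. The core power at this operating point is not stated
in the text, so the two ends of the reported core current range (8 and 18 mA from a 1.8 V
supply) are used as bounding assumptions. The function name is hypothetical.

```python
import math


def vco_fom(phase_noise_dbc_hz, f0_hz, f_offset_hz, p_diss_w):
    """Figure of merit of Eq. (6.15); power is referred to 1 mW."""
    return (phase_noise_dbc_hz
            - 20 * math.log10(f0_hz / f_offset_hz)
            + 10 * math.log10(p_diss_w / 1e-3))


SUPPLY_V = 1.8
# Bounding assumptions: core current at the low and high ends of the 8-18 mA range.
fom_low_power = vco_fom(-86.8, 11.02e9, 1e6, SUPPLY_V * 8e-3)
fom_high_power = vco_fom(-86.8, 11.02e9, 1e6, SUPPLY_V * 18e-3)
print(f"FOM between {fom_low_power:.1f} and {fom_high_power:.1f} dBc/Hz")
```

The reported value of -154 dBc/Hz falls between these two bounds, which evaluate to about
-156.1 and -152.5 dBc/Hz.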
Table 6.1 summarizes the measured results of the transformer-coupled varactor-less
QCCO. It achieves a wide tuning range of 45.3%, and the core circuit occupies a chip area of
0.4 × 0.5 mm² in a 0.18 μm SiGe BiCMOS process. It draws 8-18 mA over the tuning
range from a 1.8 V power supply. Table 6.2 compares the frequency, tuning range, power
consumption and phase noise of several variable-frequency oscillators. Compared to [32,33,
35], the proposed oscillator has a much higher output frequency and a wider tuning range.
Compared to [36], it has a higher frequency but worse phase noise.
Figure 6.15: Measured QCCO phase noise with output frequency of 11.02 GHz.
Table 6.1: QCCO Performance Summary
  Technology               0.18 μm SiGe BiCMOS
  Supply voltage           1.8 V
  Oscillation frequency    8.7-13.8 GHz
  Tuning range             45.3%
  Core current             8-18 mA
  Buffer current           8 mA
  Phase noise @ 1 MHz      -86.8 dBc/Hz
  Phase noise @ 10 MHz     -110 dBc/Hz
  QCCO area                500 μm × 400 μm
  FOM                      -154 dBc/Hz
Table 6.2: Performance Comparison of Variable-frequency Oscillators
  Ref             Technology       Frequency [GHz]  Tuning Range  Power [mW]  Phase Noise [dBc/Hz]
  [32]            1 μm CMOS        0.9              17%           30          -110 @ 1 MHz
  [33]            0.35 μm CMOS     1.8              18%           50          -140 @ 3 MHz
  [35]            0.5 μm BiCMOS    4.3-5            14.6%         19.8        -115 @ 2 MHz
  [36]            65 nm CMOS       3.2-7.3          67%           7.2-24      -120 to -135 @ 1 MHz
  [37]/This Work  0.18 μm BiCMOS   8.7-13.8         45.3%         14.4-32.4   -86.8 @ 1 MHz, -110 @ 10 MHz
6.5 Conclusion
A transformer-coupled varactor-less wide-tuning QCCO is presented in this chapter.
It achieves a wide tuning range of 45.3% by tuning the oscillator currents flowing through
the primary and secondary windings of the stacked octagonal transformers. The prototype QCCO
is fabricated in a 0.18 μm SiGe BiCMOS technology and the core circuit occupies a chip area
of 0.4 × 0.5 mm². It draws 8-18 mA from a 1.8 V power supply. The measured phase noise
of the single-ended output is about -86.8 dBc/Hz at 1 MHz offset and -110 dBc/Hz at 10 MHz
offset with an 11.02 GHz quadrature output. The phase noise FOM of this transformer-coupled
QCCO is -154 dBc/Hz.
Chapter 7
Summary and Future Work
7.1 Summary of Original Work
This dissertation presents the detailed design procedure of ultrahigh speed DDSs. The main
target is to achieve microwave-range speed and high resolution while keeping power
consumption moderate. A transformer-coupled variable-frequency oscillator is designed to
provide the clock of the DDS system.
Three DDSs with different architectures have been implemented in a 0.13 μm SiGe BiCMOS
technology. The sine-weighted DAC introduces additional spurs into the final DDS output
spectrum. The major problem is that the non-ideal behavior of a sine-weighted DAC is more
noticeable than that of a linear DAC. The detailed analysis shows that the DAC-related spurs
come from two major sources. One is the static performance of the DAC, such as DNL or INL.
The other is the dynamic performance of the DAC, which depends on the input code and the
clock frequency. To complicate matters further, noise from the digital blocks is injected
into the substrate and power supply. To suppress crosstalk and power and ground bounce, more
unknown variables need to be taken into account.
7.2 Possible Future Directions
The main difficulty lies in finding a good model that accurately reflects the real behavior
of a working DDS. Although some theoretical work has appeared in this field, it still leaves
something to be desired. In particular, hardly anyone has so far given satisfactory results
in predicting the dynamic performance of the DAC and the DDS with good accuracy. Future
work is intended to further study the modeling of the DDS, provide new insight into this
problem and, hopefully, find alternative solutions to the questions that arise when
designing a DDS.
Bibliography
[1] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 9-bit 2.9 GHz direct digital
synthesizer MMIC with direct digital frequency and phase modulations," in IEEE MTT-S
International Microwave Symposium Digest, June 2009, pp. 1125-1128.
[2] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 24-bit 5.0 GHz direct digital
synthesizer MMIC with direct digital modulations and spur randomization," in Radio Frequency
Integrated Circuits (RFIC) Symposium, 2009 IEEE, June 2009, pp. 419-422.
[3] J. M. P. Langlois and D. Al-Khalili, "Phase to sinusoid amplitude conversion techniques
for direct digital frequency synthesis," in IEE Proc. Circuits Devices Syst., Dec. 2004,
pp. 519-528.
[4] D. A. Sunderland, R. A. Strauch, S. S. Wharfield, H. T. Peterson, and C. R. Cole,
"CMOS/SOS frequency synthesizer LSI circuit for spread spectrum communications,"
IEEE J. of Solid-State Circuits, vol. SC-19, no. 4, pp. 497-506, Aug. 1984.
[5] H. T. Nicholas, III, H. Samueli, and B. Kim, "The optimization of direct digital
frequency synthesizer performance in the presence of finite word length effects," in Proc.
of the 42nd Annual Frequency Control Symposium, 1988, pp. 257-263.
[6] C. C. Wang, Y. L. Tseng, H. C. She, C. C. Li, and R. Hu, "A 13-bit resolution ROM-less
direct digital frequency synthesizer based on a trigonometric quadruple angle formula,"
IEEE Trans. on Very Large Scale Integration Systems, vol. 12, no. 9, pp. 895-900, Sept.
2004.
[7] D. De Caro, N. Petra, and A. G. M. Strollo, "Reducing lookup-table size in direct
digital frequency synthesizers using optimized multipartite table method," IEEE Trans.
on Circuits and Systems II, vol. 55, no. 7, pp. 2116-2127, Aug. 2008.
[8] Xueyang Geng, X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "An 11-bit 8.6 GHz direct
digital synthesizer MMIC with 10-bit segmented nonlinear DAC," in 34th European Solid-State
Circuits Conference (ESSCIRC), Sept. 2008, pp. 362-365.
[9] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "An 11-bit 8.6 GHz direct digital
synthesizer MMIC with 10-bit segmented sine-weighted DAC," IEEE J. of Solid-State Circuits,
vol. 45, no. 2, pp. 300-313, Feb. 2010.
[10] P. Lacomme, J.-P. Hardange, J.-C. Marchais, and E. Normant, Air and Spaceborne
Radar Systems: An Introduction, ISBN 1-891121-13-8. William Andrew Publishing,
LLC, 2001.
[11] J. R. Klauder, A. C. Price, S. Darlington, and W. J. Albersheim, "The theory and
design of chirp radars," The Bell System Technical Journal, vol. 39, no. 4, pp. 745-808,
1960.
[12] M. Skolnik, Radar Handbook, 3rd ed., ISBN-10: 0071485473, ISBN-13: 978-0071485470,
McGraw-Hill Professional, 2008.
[13] F. F. Dai, W. Ni, Y. Shi, and R. C. Jaeger, "A direct digital frequency synthesizer with
fourth-order phase domain ΔΣ noise shaper and 12-bit current steering DAC," IEEE
J. Solid-State Circuits, vol. 41, no. 4, pp. 839-850, April 2006.
[14] A. Van den Bosch, M. Steyaert, and W. Sansen, Static and Dynamic Performance
Limitations for High Speed D/A Converters, ISBN 1402077610, Kluwer Academic
Publishers, chapter 5.
[15] J. J. Wikner and N. Tan, "Modeling of CMOS digital-to-analog converters for
telecommunication," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal
Processing, vol. 46, no. 5, pp. 489-499, May 1999.
[16] W. Ni, Xueyang Geng, Y. Shi, and F. Dai, "A 12-bit 300 MHz CMOS DAC for high-speed
system applications," in IEEE International Symposium on Circuits and Systems (ISCAS),
Kos, Greece, May 2006, pp. 1402-1405.
[17] J. Jiang and E. K. F. Lee, "A low-power segmented nonlinear DAC-based direct digital
frequency synthesizer," IEEE J. Solid-State Circuits, vol. 37, no. 10, pp. 1326-1330,
Oct. 2002.
[18] A. Gutierrez-Aitken, J. Matsui, E. Kaneshiro, B. Oyama, A. Oki, and D. Streit, "Ultra
high speed direct digital synthesizer using InP DHBT technology," IEEE J. of Solid-State
Circuits, vol. 37, no. 9, pp. 1115-1121, Sept. 2002.
[19] S. E. Turner and D. E. Kotecki, "Direct digital synthesizer with ROM-less architecture
at 13-GHz clock frequency in InP DHBT technology," IEEE Microwave and Wireless
Components Letters, vol. 16, no. 5, pp. 296-298, May 2006.
[20] S. E. Turner and D. E. Kotecki, "Direct digital synthesizer with sine-weighted DAC
at 32-GHz clock frequency in InP DHBT technology," IEEE J. of Solid-State Circuits,
vol. 41, no. 10, pp. 2284-2290, Oct. 2006.
[21] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 12 GHz 1.9 W direct digital
synthesizer RFIC implemented in 0.18 μm SiGe BiCMOS technology," IEEE J. of Solid-State
Circuits, vol. 43, no. 6, pp. 1384-1393, June 2008.
[22] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 9-bit quadrature direct digital
synthesizer implemented in 0.18-μm SiGe BiCMOS technology," IEEE Trans. on Microwave
Theory and Techniques, vol. 56, no. 5, pp. 1257-1266, May 2008.
[23] A. Van den Bosch, M. A. F. Borremans, M. S. J. Steyaert, and W. Sansen, "A 10-bit
1-Gsample/s Nyquist current-steering CMOS D/A converter," IEEE J. Solid-State
Circuits, vol. 36, no. 3, pp. 315-324, March 2001.
[24] K. R. Elliott, "Direct digital synthesis for enabling next generation RF systems," in
IEEE Compound Semiconductor Integrated Circuit Symposium (CSIC), Nov. 2005, pp.
125-128.
[25] P. G. A. Jespers, Integrated Converters: D to A and A to D Architectures, Analysis and
Simulation, ISBN 0198564465, Oxford University Press.
[26] R. H. Walden, "Analog-to-digital converter survey and analysis," IEEE J. on Selected
Areas in Communications, vol. 17, no. 4, pp. 539-550, April 1999.
[27] S. E. Turner, R. T. Chan, and J. T. Feng, "ROM-based direct digital synthesizer at
24 GHz clock frequency in InP DHBT technology," IEEE Microwave and Wireless
Components Letters, vol. 18, no. 8, pp. 566-568, Aug. 2008.
[28] S. Thuries, E. Tournier, A. Cathelin, S. Godet, and J. Graffeuil, "A 6-GHz low-power
BiCMOS SiGe:C 0.25 μm direct digital synthesizer," IEEE Microwave and Wireless
Components Letters, vol. 16, no. 1, pp. 46-48, Jan. 2006.
[29] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 24-bit 5.0 GHz direct digital
synthesizer RFIC with direct digital modulations in 0.13 μm SiGe BiCMOS technology,"
IEEE J. of Solid-State Circuits, vol. 45, no. 5, pp. 944-954, May 2010.
[30] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design
Perspective, 2nd ed., ISBN 0-13-597444-5, Pearson, Inc, 2003.
[31] AD9912 datasheet, Analog Devices.
[32] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, "A 900 MHz CMOS LC-oscillator
with quadrature outputs," in IEEE Int. Solid-State Circuits Conf. (ISSCC),
1996, pp. 392-393.
[33] P. Andreani, A. Bonfanti, L. Romano, and C. Samori, "Analysis and design of a 1.8-GHz
CMOS LC quadrature VCO," IEEE J. of Solid-State Circuits, vol. 37, no. 12, pp.
1737-1747, Dec. 2002.
[34] P. Andreani and X. Wang, "On the phase-noise and phase error performance of multiphase
LC CMOS VCOs," IEEE J. of Solid-State Circuits (JSSC), vol. 39, no. 11, pp.
1883-1893, Nov. 2004.
[35] V. Kakani, F. Dai, and R. C. Jaeger, "A 5GHz BiCMOS quadrature LC VCO with wide
tuning range," in IEEE Bipolar/BiCMOS Circuits and Technology Meeting (BCTM),
2006, pp. 138-141.
[36] G. Cusmai, M. Repossi, G. Albasini, A. Mazzanti, and F. Svelto, "A magnetically
tuned quadrature oscillator," IEEE J. of Solid-State Circuits (JSSC), vol. 42, no. 12,
pp. 2870-2877, Dec. 2007.
[37] Xueyang Geng and F. F. Dai, "An 8.7-13.8 GHz transformer-coupled varactor-less
quadrature current-controlled oscillator," in the 2009 Bipolar/BiCMOS Circuits and
Technology Meeting (BCTM), IEEE, 2009, pp. 63-66.
[38] Xueyang Geng and F. F. Dai, "An X-band transformer-coupled varactor-less quadrature
current-controlled oscillator in 0.18 μm SiGe BiCMOS technology," Invited to submit to
IEEE J. of Solid-State Circuits (JSSC), BCTM Special Issue, vol. 45, no. 9, Sep. 2010.
[39] L. Romano, S. Levantino, A. Bonfanti, C. Samori, and A. L. Lacaita, "Phase noise and
accuracy in quadrature oscillators," in IEEE International Symposium on Circuits
and Systems (ISCAS), 2006, pp. 161-164.
[40] A. Mazzanti, F. Svelto, and P. Andreani, "On the amplitude and phase errors of the
quadrature LC-tank CMOS oscillators," IEEE J. of Solid-State Circuits (JSSC), vol. 41,
no. 6, pp. 1305-1313, Jun. 2006.
[41] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," in Proc. IEEE,
1966, pp. 329-330.
[42] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators,"
IEEE J. of Solid-State Circuits (JSSC), vol. 33, no. 2, pp. 179-194, Feb. 1998.
[43] T. H. Lee and A. Hajimiri, "Oscillator phase noise: A tutorial," IEEE J. of Solid-State
Circuits (JSSC), vol. 35, no. 3, pp. 326-336, Mar. 2000.
[44] P. Andreani, "A time-variant analysis of the 1/f² phase noise in CMOS parallel LC-tank
quadrature oscillators," IEEE Trans. Circuits Syst. I, vol. 53, no. 8, pp. 1749-1770,
Aug. 2006.
[45] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, 2nd ed.,
Cambridge University Press.
[46] S. S. Mohan, M. D. Hershenson, S. P. Boyd, and T. H. Lee, "Simple accurate expressions
for planar spiral inductances," IEEE J. of Solid-State Circuits (JSSC), vol. 34,
pp. 1419-1424, 1999.
[47] J. Rogers and C. Plett, Radio Frequency Integrated Circuit Design, Artech House.
[48] P. Kinget, Integrated GHz Voltage Controlled Oscillators, New York: Kluwer Academic
Publisher, 1999.