High Speed ROM-Less Direct Digital Frequency Synthesizer
Except where reference is made to the work of others, the work described in this
dissertation is my own or was done in collaboration with my advisory committee. This
dissertation does not include proprietary or classified information.
Xuefeng Yu
Certificate of Approval:
Richard C. Jaeger
Distinguished University Professor Emeritus
Electrical and Computer Engineering
Fa Foster Dai, Chair
Professor
Electrical and Computer Engineering
Guofu Niu
Alumni Professor
Electrical and Computer Engineering
Stuart M. Wentworth
Associate Professor
Electrical and Computer Engineering
George Flowers
Dean
Graduate School
High Speed ROM-Less Direct Digital Frequency Synthesizer
Xuefeng Yu
A Dissertation
Submitted to
the Graduate Faculty of
Auburn University
in Partial Fulfillment of the
Requirements for the
Degree of
Doctor of Philosophy
Auburn, Alabama
August 10, 2009
High Speed ROM-Less Direct Digital Frequency Synthesizer
Xuefeng Yu
Permission is granted to Auburn University to make copies of this dissertation at its
discretion, upon the request of individuals or institutions and at
their expense. The author reserves all publication rights.
Signature of Author
Date of Graduation
Dissertation Abstract
High Speed ROM-Less Direct Digital Frequency Synthesizer
Xuefeng Yu
Doctor of Philosophy, August 10, 2009
(M.A., Inst. of Semiconductors, CAS, 2003)
(B.S., Tsinghua University, 2000)
98 Typed Pages
Directed by Fa Foster Dai
This dissertation presents a complete flow for the design and evaluation of high speed direct
digital frequency synthesizers (DDS). Although some ultra high speed DDSs have already
been reported in the literature, balancing power consumption against high performance
remains challenging for the analog designer and is worth exploring from different
perspectives.
As a digital method of directly generating sine or cosine waveforms at a specific
frequency, the DDS has many merits: a fine frequency tuning step, fast frequency
switching, and precisely controlled output phase. Since there is no feedback loop in a
DDS structure, the DDS does not suffer from the internal loop delay of a phase-locked
loop (PLL) synthesizer. One major benefit that makes the DDS stand out is that it can
be modulated directly in the digital domain. It can be combined with various
modulation schemes to generate modulated signals. In this way, the DDS can serve as an
important component for building flexible and reconfigurable transmitters in communication
systems. A DDS can also generate quadrature and multiple phases with ease. Besides
sine and cosine waveforms, a DDS can be used to synthesize arbitrary waveforms.
Taking advantage of a well developed silicon-germanium (SiGe) process, it is possible
to push the envelope of DDS speed performance while keeping a moderate power
budget. Standard CMOS technology has also been investigated.
Several DDSs have been implemented with non-linear digital-to-analog converters
(DAC). The non-linear DAC can directly map the linear phase word into a sine or cosine
analog output without the assistance of a ROM. By eliminating the ROM, the speed of
the DDS can be dramatically improved. Because of the code dependent and frequency dependent
non-ideal effects of the non-linear DAC, the unwanted harmonics and spurs of the DDS
output have a more significant impact on the whole system. In this dissertation, the spurs
and harmonics from different sources, such as truncation errors, limited DAC amplitude
resolution and non-ideal effects of the DAC, are discussed.
In the design, several issues such as clock feedthrough, clock skew and device
matching are addressed. For the layout, a method that automatically synthesizes
the layout of the current source matrix block has been developed, which
alleviates the transistor matching problem arising from fabrication.
The unique structure of a compact periodical waveform generator has also been
investigated. In the waveform generator, a ring oscillator is combined with a weighted
non-linear DAC, so the external clock and the internal clock distribution circuits are no longer
needed. This provides benefits for certain on-chip test applications.
Acknowledgments
Many people, throughout my life, have supported me in many different
ways. Without them, I could not have made it this far. First of all, I would like to
express my sincere gratitude to my advisor, Dr. Fa Foster Dai. He has been like an important
friend to me in both my academic and my personal life. His invaluable knowledge and
wisdom more than once helped and guided me to overcome the troubles and difficulties
during my study and research.
I would like to take this opportunity to thank my committee members, Dr. Richard
Jaeger, Dr. Stuart Wentworth and Dr. Guofu Niu, for taking their valuable time to serve
on the committee. I would also like to thank all the professors and staff in our ECE department.
Whenever I had a need, they were always ready to help.
I would like to express my appreciation to those who helped and contributed to my
research. Among them, special thanks go to Dayu Yang, Vasanth Kakani, Yuan Yao,
Xueyang Geng and Desheng Ma for their cooperation and assistance.
Finally, I want to thank my parents, my wife and my other family members for their
unconditional love. They are always on my side, share my joys and sorrows, give me courage
and hope, and let me pursue my career and have a meaningful life.
Style manual or journal used Journal of Approximation Theory (together with the style
known as "auphd"). Bibliography follows van Leunen's A Handbook for Scholars.
Computer software used The document preparation package TeX (specifically LaTeX)
together with the departmental style-file aums.sty.
Table of Contents
List of Figures
1 INTRODUCTION
1.1 Background of DDS
1.2 Applications of DDS
1.3 Performance specifications of DDS
1.4 Outline
2 SINGLE PHASE SIGE DDS
2.1 Introduction
2.2 Ultra-high speed DDS architecture
2.3 DDS spectra purity
2.4 DDS circuit design
2.4.1 Pipelined accumulator
2.4.2 SiGe CML logic
2.4.3 Clock and MSB trees
2.4.4 DAC current source and switch
2.5 Layout
2.6 Experiment results
2.7 Conclusion
3 QUADRATURE PHASES SIGE DDS
3.1 Introduction
3.2 Direct Modulations in DDS
3.3 DDS circuit design
3.3.1 Quadrature DDS architecture
3.3.2 Pipelined accumulator
3.3.3 DAC current source and switch circuits
3.3.4 Clock tree and MSB tree designs
3.3.5 Layout considerations
3.4 Measured results
3.5 Conclusion
4 QUADRATURE PHASES SIGE DDS WITH UP-CONVERSION
4.1 Introduction
4.2 Architecture and circuit design
4.3 Measured results
4.4 Conclusion
5 RING OSCILLATOR BASED PERIODICAL WAVEFORM GENERATOR
5.1 Introduction
5.2 Waveform generator architectures
5.3 Circuits of the waveform generator
5.4 Experiment results
5.5 Conclusion
6 SUMMARY AND FUTURE WORKS
6.1 Summary of the works
6.2 Future works
Bibliography
List of Figures
2.1 DDS architectures
2.2 Digital modulation capability in different DDSs
2.3 Conceptual diagram of the ROM-less DDS
2.4 Block diagram of the implemented DDS MMIC
2.5 NxM generic architecture of a pipelined accumulator
2.6 CML full adder circuit
2.7 Current switch circuit of the nonlinear DAC
2.8 Synchronous switch control circuit of the nonlinear DAC
2.9 Die photo of DDS chip
2.10 DDS MMIC test setup
2.11 Measured DDS output waveform (a) and spectrum (b) with a 23.5MHz output (FCW=1) and a clock at 12.021GHz
2.12 Measured DDS output spectra at Nyquist rate (FCW=511). (a) The output frequency at 5.930GHz and the image tone at 5.98GHz with a clock at 11.913GHz; (b) The output frequency at 5.04GHz and the image tone at 5.08GHz with a clock at 10.110GHz
2.13 Measured DDS output spectrum with a 1.7898GHz output and a 9.59GHz clock
2.14 Measured DDS output waveform with 1.125GHz output and 9GHz clock
2.15 Measured DDS output SFDR versus frequency control word at -20C ambient temperature
3.1 Direct modulation through a DDS
3.2 Extend the output frequency range using a quadrature DDS and SSB mixers
3.3 Conceptual drawing of the quadrature DDS RFIC
3.4 Detailed block diagram of the quadrature DDS
3.5 Output sine and cosine waveform depending on the symmetry property of the sine waveform
3.6 DAC current switch circuit
3.7 Die photo of the quadrature DDS MMIC
3.8 Test setup of the quadrature DDS RFIC
3.9 Measured single phase DDS output spectrum with clock at 9.07 GHz and output at 2.227GHz
3.10 Measured quadrature phase DDS output spectrum with clock at 5.44 GHz and output at 0.397GHz
3.11 Measured quadrature phase DDS output spectrum with clock at 6.815 GHz and output at 3.394GHz
3.12 Measured DDS output waveforms without deglitch filter at 0.389GHz with clock at 6.2GHz
3.13 Measured quadrature DDS output waveforms at 1.58GHz with clock at 6.3GHz
4.1 Concept diagram of the frequency synthesizer
4.2 Block diagram with the circuit of quadrature VCO
4.3 Circuits of up-convert and down-convert mixers
4.4 Frequency synthesizer die photo
4.5 Measured 37MHz output waveforms with a 6.4GHz QDDS
4.6 Measured output spectra of 4.6GHz QDDS clock input and 11.7GHz LO output
4.7 Measured output spectra of 4.6GHz QDDS clock input and 2.3GHz QDDS output
4.8 Measured output down-converted 9.4GHz output
5.1 One cycle of the waveform with constant sampling step
5.2 Block diagram of the ring oscillator waveform generator
5.3 Simplified circuit of current switch with 3-bit+sign programmable current source
5.4 Die photo of the waveform generator
5.5 Simulated output waveform during data loading
5.6 Simulated output sine waveform
5.7 Simulated output waveform during transition
5.8 Measured output waveform during data loading
5.9 Measured synthesized arbitrary waveform
5.10 Measured synthesized arbitrary waveform
Chapter 1
INTRODUCTION
1.1 Background of DDS
A high speed frequency synthesizer with a fine tuning step and a large tuning range is a
crucial part of modern wireless communication systems. However, the conventional phase-locked
loop (PLL) based frequency synthesizer cannot meet these requirements because of its internal
loop delay, low resolution and the limited tuning range of the voltage controlled oscillator
(VCO). The direct digital frequency synthesizer (DDFS) provides many advantages, including
fast frequency switching, fine frequency tuning resolution, continuous-phase switching, and
direct phase and frequency modulation in the digital domain.
The traditional DDFS contains a ROM to store the sine waveform data, and the ROM
size grows exponentially with the desired phase resolution. The ROM for the sine look-up
table occupies the majority of the DDFS area and also limits its maximum operating
frequency because of the delay through the multi-layer decoders. The simplest way to
reduce the ROM size is to exploit the quarter-wave symmetry of the sine function, which cuts
the ROM size by a factor of 4. Though many other ROM compression methods have
been proposed, such as trigonometric approximation and parabolic approximation [18], the
problems indicated above still exist.
A novel approach is to replace the conventional linear DAC, which converts digital
amplitude words to an analog amplitude waveform, with a nonlinear one that converts the digital
phase word into an analog sine waveform directly [13]. The ROM is thus completely
removed and the performance of the DDFS is improved significantly. For the design of a high
speed nonlinear DAC, the current-steering DAC [15] is the ideal candidate, since it can
generate a Nyquist output signal with high accuracy at a high update rate [26].
1.2 Applications of DDS
Integrating a millimeter-wave (mm-wave) frequency synthesizer into a wireless transceiver
that can accommodate multiple coexisting standards has been a challenging task and has
attracted great interest in recent years. One conventional approach
to covering the frequency bands of different standards is to use a phase-locked loop (PLL)
based frequency synthesizer. However, multi-band PLL synthesizers consume large die
area and power. Digital synthesis of highly complex wideband waveforms at the highest
possible frequency would considerably reduce the size, weight, power and cost of modern
communication systems. Recent developments in communication and radar systems are
placing increasing demands on low power consumption, high output frequency, fine
frequency resolution, fast channel switching and versatile modulation capability for frequency
synthesis. These requirements are surpassing the performance capabilities of conventional
analog PLL synthesizers. It is difficult for the PLL-based frequency synthesizer to meet
these requirements due to internal loop delay, low resolution, modulation problems and the
limited tuning range of the voltage-controlled oscillator (VCO). In contrast, a direct digital
synthesizer (DDS) generates a digitized waveform at a desired frequency by accumulating
the phase word at a higher clock frequency. DDS is a digital technique for frequency
synthesis, waveform generation, sensor excitation, and digital modulation/demodulation. Since
there is no feedback in a DDS structure, the DDS is capable of extremely fast frequency
switching or hopping at the speed of the clock frequency. A DDS provides various
modulation capabilities and many other advantages, including fine frequency tuning resolution,
continuous-phase switching and the ability to provide quadrature signals with accurate I/Q
matching. Furthermore, a DDS can generate arbitrary waveforms in the digital domain.
The increasing availability of ultra high-speed DACs allows a DDS to operate at mm-wave
frequency, providing an attractive alternative to conventional analog PLL synthesizers.
Radar systems demand highly accurate control over the output frequencies and phases
of the frequency synthesizers in the radar transceiver for coherent detection. It is not uncommon
for modern radar systems to require frequency synthesizers with low power consumption,
high output frequency, fine frequency resolution, fast channel switching and versatile
modulation capability. These requirements are surpassing the performance capabilities of
conventional analog phase-locked loops (PLL). It is difficult for the conventional PLL-based
frequency synthesizer to meet these requirements due to internal loop delay, low resolution,
modulation problems and the limited tuning range of the voltage-controlled oscillator
(VCO). In contrast, a direct digital synthesizer (DDS) is capable of fast frequency hopping,
fine frequency tuning, continuous-phase switching, direct modulation, and arbitrary waveform
and quadrature signal generation. Advances in technology raise the device operating
frequency, increase the circuit density and cut the manufacturing
cost. With these improvements, it becomes feasible to implement a single-chip
DDS operating at mm-wave frequency at a reasonable cost, replacing the conventional
analog PLL synthesizers in radar systems.
1.3 Performance specifications of DDS
One of the most important metrics for a DDS is the spurious-free dynamic range (SFDR).
SFDR is defined as the ratio of the RMS amplitude of the carrier (the maximum
signal component) to the RMS amplitude of the next largest distortion component. SFDR is
usually expressed in dBc:
\[ \mathrm{SFDR} = 20 \log_{10} \frac{V_{carrier}}{\max(V_{spur})} \tag{1.1} \]
Another useful metric for a DDS is the signal-to-noise ratio (SNR). SNR is the ratio of
the power of the desired signal to the total power of the noise signals, and is usually expressed
in dB:
\[ \mathrm{SNR} = 10 \log_{10} \frac{P_{carrier}}{\sum P_{spur}} \tag{1.2} \]
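As a quick numerical illustration of Eqs. (1.1) and (1.2), the following minimal sketch evaluates both metrics for a made-up spectrum. The function names and the spur amplitudes are assumptions chosen only for illustration; they are not measured data from this work.

```python
import math

def sfdr_dbc(v_carrier, v_spurs):
    """Eq. (1.1): carrier RMS amplitude over the largest spur RMS amplitude, in dBc."""
    return 20.0 * math.log10(v_carrier / max(v_spurs))

def snr_db(p_carrier, p_unwanted):
    """Eq. (1.2): carrier power over the summed power of the unwanted components, in dB."""
    return 10.0 * math.log10(p_carrier / sum(p_unwanted))

# Hypothetical spectrum: a 1.0 V RMS carrier and three spurs (illustrative values only).
spur_amplitudes = [0.01, 0.003, 0.001]
spur_powers = [v * v for v in spur_amplitudes]   # power into a normalized 1-ohm load

print("SFDR (dBc):", sfdr_dbc(1.0, spur_amplitudes))
print("SNR  (dB): ", snr_db(1.0, spur_powers))
```

Because SFDR looks only at the single largest spur while SNR sums all unwanted power, the SNR value can never exceed the SFDR value for the same spectrum.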
1.4 Outline
This dissertation is organized as follows. The first chapter gives a basic introduction
to the DDS. The second chapter introduces the ultra-high speed single phase DDS design.
In the third chapter, the single phase DDS is extended to a quadrature DDS. In chapter
four, an internal mixer is added so that the output frequency of a multi-GHz DDS can be
moved to a higher frequency. In chapter five, a ring oscillator based periodical waveform
generator is presented. The last chapter summarizes the whole work and gives
some thoughts on future work.
Chapter 2
SINGLE PHASE SIGE DDS
2.1 Introduction
Though CMOS technology can be used to achieve better integration and reduce total
cost, heterojunction bipolar transistor (HBT) technology is more favorable in microwave
analog circuit design because of its high current gain and low device noise in this frequency range.
Two major candidates for high speed DDS design are indium phosphide (InP) HBT and
silicon germanium (SiGe) HBT technologies. The carrier mobility in InP devices
is high and the cutoff frequency of the device can be well over 300GHz, but the yield
of complicated InP designs is still lower than that of other mature technologies. Taking
device performance, manufacturing cost and integration density into consideration, the SiGe
process appears to be a better choice for DDS circuit design.
Radar systems demand highly accurate control over the output frequencies and phases
of the frequency synthesizers in the radar transceiver for coherent detection. It is not uncommon
for modern radar systems to require frequency synthesizers with low power consumption,
high output frequency, fine frequency resolution, fast channel switching and versatile
modulation capability. These requirements are surpassing the performance capabilities of
conventional analog phase-locked loops (PLL). It is difficult for the conventional PLL-based
frequency synthesizer to meet these requirements due to internal loop delay, low resolution,
modulation problems and the limited tuning range of the voltage-controlled oscillator
(VCO). In contrast, a direct digital synthesizer (DDS) is capable of fast frequency hopping,
fine frequency tuning, continuous-phase switching, direct modulation, and arbitrary waveform
and quadrature signal generation. Advances in technology raise the device operating
frequency, increase the circuit density and cut the manufacturing
cost. With these improvements, it becomes feasible to implement a single-chip
DDS operating at mm-wave frequency at a reasonable cost, replacing the conventional
analog PLL synthesizers in radar systems.
2.2 Ultra-high speed DDS architecture
A conventional DDS consists of three primary building blocks: a phase accumulator,
a sine/cosine mapping block, and a digital-to-analog converter (DAC), which performs the
digital amplitude to analog amplitude conversion [20]. A deglitch filter is normally added
off-chip to smooth the waveform by removing the unwanted spectral components. The
frequency control word (FCW) at the input of the phase accumulator determines the output
frequency of the DDS. The sine/cosine block maps the accumulated phase to the sine or
cosine amplitude.
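The three building blocks can be mimicked by a short behavioral model. This is only a sketch under stated assumptions: the function names are hypothetical, the sine mapping is computed with floating-point math rather than a ROM or a nonlinear DAC, and the DAC is modeled as ideal rounding to a signed code. The output frequency uses the standard DDS relation f_out = FCW * f_clk / 2^N.

```python
import math

def dds_samples(fcw, n_bits, amp_bits, count):
    """Behavioral model: N-bit phase accumulator -> sine mapping -> quantized amplitude."""
    modulus = 1 << n_bits
    full_scale = (1 << (amp_bits - 1)) - 1   # symmetric signed code range (assumption)
    phase = 0
    out = []
    for _ in range(count):
        # Sine mapping followed by an ideal DAC quantizer.
        out.append(round(full_scale * math.sin(2.0 * math.pi * phase / modulus)))
        # The accumulator adds the FCW every clock and wraps on overflow.
        phase = (phase + fcw) % modulus
    return out

def output_frequency(fcw, n_bits, f_clk):
    """Standard DDS tuning relation: f_out = FCW * f_clk / 2**N."""
    return fcw * f_clk / (1 << n_bits)

# A 9-bit accumulator stepped by FCW = 64 completes one sine cycle every 8 clocks.
print(dds_samples(64, 9, 8, 8))
print(output_frequency(1, 9, 12.021e9))   # frequency step of a 9-bit DDS at 12.021 GHz
```

With a 9-bit accumulator and a 12.021GHz clock, the smallest frequency step is about 23.5MHz, which matches the FCW=1 output listed for Fig. 2.11.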
Depending on the transfer characteristic of the DAC, a DDS can be classified into
three types, as shown in Fig. 2.1. The first type represents the conventional DDS that has
a linear DAC, and the phase-to-amplitude conversion is done in the digital domain using
a sine look-up table [20]. The second one also contains a linear DAC, but the sine/cosine
conversion is performed in the analog domain by converting an analog triangle waveform
to an analog sine waveform [12]. The third type is a ROM-less DDS that combines both
the sine/cosine mapping and digital-to-analog conversion in a nonlinear DAC whose current
sources are weighted with sine amplitude information [2][13].
Figure 2.1: DDS architectures. Three signal chains are shown: phase accumulator, digital sine/cosine mapping, linear DAC; phase accumulator, linear DAC, analog sine/cosine mapping; phase accumulator, nonlinear DAC with sine/cosine mapping.
The first type of DDS has several variations, depending on the mapping
method employed in the phase-to-amplitude look-up table. In the traditional DDS, a
sine look-up table is built using a ROM which stores the sine/cosine mapping information.
However, the ROM size expands exponentially with phase resolution. The sine ROM look-up
table occupies the majority of the DDS area, and also limits its maximum operating
frequency because of the delay through the phase decoders.
The simplest way to reduce the ROM size is to exploit the quarter-wave symmetry
of the sine function, which reduces the ROM size by a factor of 4. Numerous ROM compression
techniques have been proposed, including trigonometric approximation [19][16], parabolic
approximation [18], and interpolation [14]. Even though these compression methods
partially alleviate the problem, the internal delay caused by retrieving ROM data still restricts
the speed of the DDS. Another approach employs series expansions, such as a Taylor
expansion or polynomial expansion, to approximate the ideal curve. The coordinate rotation
digital computer (CORDIC) method calculates the amplitude directly, based on the
projection of a rotating vector in a polar coordinate system [25]. Both the series expansion and
CORDIC approaches require a considerable amount of hardware, and the complexity limits
the final speed, so these structures normally appear in DDS implementations below the
multi-GHz range. Using an improved differential CORDIC [10], the theoretical output of
the ROM-based DDS can reach the GHz range, but its performance still needs to be proven
in silicon. Implementing a GHz ROM simply consumes too much power and area. Ref. [10]
implemented a linear-DAC DDS in 0.25 µm CMOS technology with a 1.2GHz clock speed.
However, it is very difficult, if not presently impossible, to implement a ROM-based DDS
with a clock frequency beyond 10GHz and an amplitude resolution larger than 8 bits.
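The quarter-wave symmetry mentioned at the start of this discussion can be sketched in a few lines. The code below is an illustrative assumption, not the ROM organization of any cited design: it stores only the first quadrant with a half-LSB phase offset (so that mirroring the address is a simple index reversal) and rebuilds the other three quadrants from the two phase MSBs.

```python
import math

def build_quarter_rom(phase_bits, amp_bits):
    """Store only the first quadrant: 2**(phase_bits - 2) entries instead of 2**phase_bits."""
    quarter = 1 << (phase_bits - 2)
    full_scale = (1 << (amp_bits - 1)) - 1
    # The half-LSB phase offset keeps the table symmetric when the address is mirrored.
    return [round(full_scale * math.sin(2.0 * math.pi * (i + 0.5) / (4 * quarter)))
            for i in range(quarter)]

def sine_from_quarter_rom(phase, phase_bits, rom):
    """Rebuild a full sine period from the quarter table using the two phase MSBs."""
    quarter = len(rom)
    quadrant = phase >> (phase_bits - 2)      # two MSBs select the quadrant
    index = phase & (quarter - 1)             # remaining bits address the ROM
    if quadrant & 1:                          # 2nd and 4th quadrants: mirror the address
        index = quarter - 1 - index
    value = rom[index]
    return -value if quadrant & 2 else value  # 3rd and 4th quadrants: negate

rom = build_quarter_rom(8, 8)
print(len(rom), "entries stored for a 256-point sine")
print([sine_from_quarter_rom(p, 8, rom) for p in (0, 63, 64, 127, 128, 255)])
```

The table shrinks by a factor of 4, but a ROM read still sits in the signal path, which is the delay that the ROM-less approach removes.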
The second type converts a linear analog triangle waveform to an analog sine waveform.
This technique utilizes bipolar differential pairs to perform the conversion by choosing
degeneration resistor values and bias currents to fit the first two terms of a Taylor
expansion. Theoretically, 0.1% total harmonic distortion can be achieved, which corresponds to a
30dB signal-to-noise ratio, or 5 effective bits. Stringent current requirements in the
differential pairs limit the usage of this method, particularly when the capacitive load that must
be driven varies from one application to another. The third type of DDS is a ROM-less
DDS with a nonlinear DAC. The first two structures require a linear DAC, while in a
ROM-less DDS, the ROM is removed, and a nonlinear DAC serves as the phase-to-amplitude and
digital-to-analog converter. The sine weighted DAC eliminates the sine look-up table, which
is the speed and area bottleneck for high-speed DDS implementations.
Our design employs the ROM-less structure with a nonlinear current steering DAC.
This structure combines the sine/cosine mapping block and the digital amplitude to
analog amplitude conversion block, thus significantly improving the speed of the DDS. In the
ROM-less DDS design, the current steering DAC structure is an ideal candidate capable of
generating a Nyquist output signal with excellent accuracy and a high update rate. Digital
domain modulation can be easily implemented in a DDS, as illustrated in Fig. 2.2.
Frequency modulation (FM), chirp, and phase modulation (PM) can be easily implemented in
all three types of DDS. However, the first type can implement amplitude modulation (AM)
in the digital domain prior to the DAC, while the other two can implement AM only in the
analog domain. Delta-Sigma modulation can also be added in the DDS to improve the
output spectral purity and to reduce the effective number of phase bits [5].
Figure 2.2: Digital modulation capability in different DDSs. The DDS with digital sine/cosine mapping and a linear DAC accepts FM, PM and AM control; the DDS with a nonlinear DAC with sine/cosine mapping and the DDS with a linear DAC and analog sine/cosine mapping accept FM and PM control. Each is driven by the FCW.
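To make the FM and PM paths concrete, the sketch below models where each modulation enters the digital signal chain. It is an assumption-laden illustration: the function name is hypothetical, FM is applied by changing the FCW on each clock, PM by adding an offset after the accumulator, and the sine mapping is ideal floating-point math.

```python
import math

def modulated_dds(fcw_seq, phase_offset_seq, n_bits):
    """Per-clock FM (through the FCW) and PM (through a phase offset) in a DDS model."""
    modulus = 1 << n_bits
    acc = 0
    out = []
    for fcw, offset in zip(fcw_seq, phase_offset_seq):
        phase = (acc + offset) % modulus      # PM: offset added after the accumulator
        out.append(math.sin(2.0 * math.pi * phase / modulus))
        acc = (acc + fcw) % modulus           # FM: the FCW may change on every clock
    return out

n = 9
plain = modulated_dds([64] * 8, [0] * 8, n)
# A phase offset of half the modulus (pi radians) acts as BPSK-style inversion.
flipped = modulated_dds([64] * 8, [1 << (n - 1)] * 8, n)
# A linearly increasing FCW produces a chirp.
chirp = modulated_dds(list(range(1, 33)), [0] * 32, n)
print(plain[:4], flipped[:4], len(chirp))
```

Because the phase is never reset when the FCW changes, frequency switching in this model is phase-continuous.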
Quadrature rotation can also be implemented in a DDS with quadrature outputs. A
quadrature DDS consists of a shared phase accumulator and two DACs with sine and
cosine outputs. If linear DACs are used, quadrature rotation can be implemented in digital
domain, since digital quadrature waveforms are available at the inputs of the DACs. For
a ROM-less DDS, quadrature rotation can only be implemented in analog domain using
mixers. A mm-wave quadrature DDS has been implemented in SiGe technology with clock
frequency beyond 6GHz [34].
Figure 2.3: Conceptual diagram of the ROM-less DDS. Blocks and signals: N-bit FCW, phase accumulator, phase truncator (P bits), MSB, 2nd MSB, complementor (P-2 bits), nonlinear DAC, sine output.
2.3 DDS spectra purity
The conceptual block diagram of the ROM-less DDS, employing a nonlinear DAC, is
shown in Fig. 2.3. In order to save die area and power, the phase accumulator output is
normally truncated. For instance, the output of the phase accumulator is truncated to
P bits, according to the signal-to-noise ratio (SNR) requirement of the DDS output. The
two MSBs are used to determine the quadrant of the phase accumulator output, according
to the quadrant symmetry of the sine wave. The lowest P-2 bits are fed through the
complementor and converted to a sine waveform by the nonlinear DAC. The sinusoidal
waveform data are programmed into the current source matrix of the DAC, and the output
currents are summed at the DAC output. In the process of discrete phase accumulation
and phase word truncation, spurs and quantization noise are introduced into the DDS
output spectrum, as discussed below.
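The signal path just described (truncation, quadrant selection by the two MSBs, complementor, sine-weighted current sources) can be modeled behaviorally as follows. This is a sketch under assumptions, not the implemented circuit: the cell weights are derived from an ideal rounded sine with a half-LSB phase offset, the cells are switched on in thermometer fashion, and all function names are hypothetical.

```python
import math

def sine_weighted_cells(p_bits, full_scale):
    """Unit-cell currents whose running sum traces the first quadrant of a sine."""
    quarter = 1 << (p_bits - 2)
    levels = [round(full_scale * math.sin(2.0 * math.pi * (k + 0.5) / (4 * quarter)))
              for k in range(quarter)]
    # Each cell carries the increment between adjacent sine levels.
    return [levels[0]] + [levels[k] - levels[k - 1] for k in range(1, quarter)]

def romless_output(acc, n_bits, p_bits, cells):
    """Map an N-bit accumulator value to a signed output without any look-up ROM."""
    phase = acc >> (n_bits - p_bits)              # phase truncation to P bits
    msb = (phase >> (p_bits - 1)) & 1             # selects the half period (sign)
    msb2 = (phase >> (p_bits - 2)) & 1            # marks the mirrored quadrants
    code = phase & ((1 << (p_bits - 2)) - 1)      # lowest P-2 bits
    if msb2:
        code ^= (1 << (p_bits - 2)) - 1           # complementor: bitwise inversion
    current = sum(cells[: code + 1])              # thermometer code enables cells 0..code
    return -current if msb else current

cells = sine_weighted_cells(8, 127)
wave = [romless_output(a, 9, 8, cells) for a in range(512)]
print(len(cells), "cells, peak output", max(wave), "minimum output", min(wave))
```

The sine shape lives entirely in the cell weights, so the only digital operations in the path are a shift, an XOR-style inversion and the switch decoding.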
The N-bit FCW feeds a phase accumulator that controls the output frequency of the
synthesized sine waveform as
\[ f_{out} = \frac{FCW}{2^{N}} f_{clk} \tag{2.1} \]
where $f_{clk}$ is the DDS clock frequency. Thus, the desired output period is given by
$T_0 = \frac{2^{N}}{FCW} T_{clk}$. For an N-bit discrete phase accumulator, there is another periodicity, i.e.,
$T_{spur} = \frac{2^{N}}{GCD(FCW, 2^{N})} T_{clk}$, where $GCD(a, b)$ denotes the greatest common divisor of a and
b. The accumulator repeats its value at intervals of $T_{spur}$, which generates equally
spaced spurious tones located at multiples of the frequency
\[ f_{spur} = \frac{GCD(FCW, 2^{N})}{2^{N}} f_{clk} \tag{2.2} \]
When the input frequency word is a power of two, i.e., $FCW = 2^{i}$, there will be no
spurs due to discrete phase accumulation. In this case, $GCD(FCW, 2^{N}) = FCW$, namely,
the accumulator output repeats at the same value after every overflow.
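The accumulator periodicity can be checked by brute force. The sketch below, with hypothetical function names, counts the clock cycles until an N-bit accumulator returns to its starting value and compares the count with 2^N / GCD(FCW, 2^N); it also shows that the repetition period equals the output period 2^N / FCW when the FCW is a power of two.

```python
from math import gcd

def accumulator_period(fcw, n_bits):
    """Count clock cycles until the N-bit accumulator returns to its start value."""
    modulus = 1 << n_bits
    phase, cycles = fcw % modulus, 1
    while phase != 0:
        phase = (phase + fcw) % modulus
        cycles += 1
    return cycles

def predicted_period(fcw, n_bits):
    """T_spur / T_clk = 2**N / GCD(FCW, 2**N)."""
    return (1 << n_bits) // gcd(fcw, 1 << n_bits)

n = 9
for fcw in (3, 12, 64):
    print(fcw, accumulator_period(fcw, n), predicted_period(fcw, n), (1 << n) / fcw)
```

For FCW = 64 the repetition period and the output period coincide (8 clocks), while for FCW = 3 the accumulator needs all 512 clocks to repeat even though one output cycle lasts only about 170.7 clocks.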
The phase truncation process also introduces spurs and quantization noise, which can
be modeled as a linear additive noise on the phase of the sinusoidal wave. Phase truncation
error is periodic [16]. If the P most significant bits (MSB) of an N-bit phase word are used
to address the DAC or look-up table, the resulting truncation spurs are mixed with the DDS
output frequency, generating spurs at multiples of the frequency
\[ f_{spur} = \frac{GCD(FCW, 2^{N-P})}{2^{N-P}} f_{clk} \tag{2.3} \]
Note that phase truncation causes errors only when the greatest common divisor satisfies
$GCD(FCW, 2^{N}) < 2^{N-P}$. Otherwise, the N-P least-significant bits (LSB) of the phase
word vanish and the phase truncation does not cause any error.
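The condition for truncation error can be verified exhaustively for a small accumulator. In this sketch (hypothetical function name; N = 9 and P = 6 are arbitrary illustration values, not the implemented design), the discarded N-P LSBs are collected over one accumulator period, and the result is compared against the test GCD(FCW, 2^N) < 2^(N-P).

```python
from math import gcd

def truncation_errors(fcw, n_bits, p_bits):
    """Discarded N-P LSBs of the accumulator over one full accumulator period."""
    modulus = 1 << n_bits
    mask = (1 << (n_bits - p_bits)) - 1
    period = modulus // gcd(fcw, modulus)
    return [((k * fcw) % modulus) & mask for k in range(period)]

n, p = 9, 6
for fcw in (8, 24, 5, 12):
    has_error = any(e != 0 for e in truncation_errors(fcw, n, p))
    print(fcw, "truncation error:", has_error,
          "| GCD < 2**(N-P):", gcd(fcw, 1 << n) < (1 << (n - p)))
```

An FCW that is a multiple of 2^(N-P) never excites the discarded bits, so its output is free of phase truncation spurs.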
In addition to the spurious components, the DDS output waveform will su er AM
distortion due to the  nite number of levels that cannot accurately represent the output
11
waveform. The envelope of the DDS output waveform is modulated by a sine wave with
the frequency of
$$f_{envelope} = \frac{2^{N-1} \bmod FCW}{2^{N-1}} f_{clk} \qquad (2.4)$$
where A mod B represents the integer residue of A modulo B. If $2^N \bmod FCW = 0$,
no amplitude modulation will be observed. For a Nyquist output, the frequency of the
amplitude modulation becomes
$$f_{envelope} = \frac{2^{N-1} \bmod (2^{N-1} - 1)}{2^{N-1}} f_{clk} = \frac{1}{2^{N-1}} f_{clk} \qquad (2.5)$$
Therefore, the envelope of the DDS output waveform is modulated by a low frequency
signal except when the FCW is an integer power of 2 such that $2^N \bmod FCW = 0$.
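The envelope frequency of Eqs. (2.4) and (2.5) can be checked numerically. The sketch below is a direct transcription of Eq. (2.4) under a hypothetical function name. The 9-bit accumulator width and the 11.913 GHz clock are the values used for the Nyquist measurement later in this chapter.

```python
def envelope_frequency(fcw: int, n_bits: int, f_clk: float) -> float:
    """Eq. (2.4): f_envelope = (2^(N-1) mod FCW) / 2^(N-1) * f_clk."""
    half_range = 2 ** (n_bits - 1)
    return (half_range % fcw) / half_range * f_clk


if __name__ == "__main__":
    n_bits, f_clk = 9, 11.913e9
    # Nyquist case of Eq. (2.5): FCW = 2^(N-1) - 1 gives f_clk / 2^(N-1).
    print(envelope_frequency(2 ** (n_bits - 1) - 1, n_bits, f_clk))
    # A power-of-two FCW leaves no residue, so no amplitude modulation.
    print(envelope_frequency(64, n_bits, f_clk))
```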
2.4 DDS circuit design
The implemented ultra-high speed DDS MMIC is comprised of a 9-bit pipeline accu-
mulator, and an 8-bit sine-weighted current steering DAC, as shown in Fig. 2.4. Since
the output frequency cannot exceed the Nyquist rate, an 8-bit FCW is fed into a 9-bit
pipeline accumulator with the MSB of the accumulator input tied to zero. The output of
the pipeline accumulator is a 9-bit phase word, whose LSB will be truncated before driving
the 8-bit DAC. One bit truncation reduces the size and power consumption of the DAC
with minimum spurious penalty. The MSB output of the phase accumulator is used to
provide the proper mirroring of the sine waveform about the π phase point. The second
MSB is used to invert the remaining 6-bits for the second and fourth quadrants of the sine
wave prior to the decoding logic. Each column-row decoder has a linear 3:8 operation. The
Figure 2.4: Block diagram of the implemented DDS MMIC (8-bit FCW input, pipeline accumulator, XOR complementor, two 3-bit decoders, buffers, DAC current switches, sine-weighted current matrix, and VCO buffer with clock output)
outputs of the column-row decoders go to the switch matrix to control the switches in each
cell [15]. The latch and switch matrices contain 64 cells, and each of the cells is comprised
of a local decoder, latches, and switch pairs. The current switch outputs are summed at
open-collector output nodes. Next, the circuit design of the DDS building blocks will be
discussed.
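The data path described above can be summarized with a small behavioral model. This is only a sketch under stated assumptions: the function name is hypothetical, and each quarter-wave amplitude is taken as an ideal sine evaluated at the middle of the phase step, whereas the actual chip uses integer sine-weighted current cells whose weights are not reproduced here.

```python
import math

ACC_BITS = 9    # accumulator width stated in the text
PHASE_BITS = 8  # phase bits kept after discarding the accumulator LSB


def dds_samples(fcw: int, n_samples: int) -> list:
    """Behavioral model: accumulate, truncate, fold quadrants, map to sine."""
    acc = 0
    samples = []
    for _ in range(n_samples):
        acc = (acc + fcw) % (1 << ACC_BITS)
        phase = acc >> (ACC_BITS - PHASE_BITS)   # 1-bit phase truncation
        msb = (phase >> 7) & 1                   # selects the half period
        second_msb = (phase >> 6) & 1            # selects quadrant 2 or 4
        low6 = phase & 0x3F
        if second_msb:
            low6 ^= 0x3F                         # XOR complementor on 6 bits
        # Assumed ideal quarter-wave amplitude (mid-step sampling).
        quarter = math.sin(math.pi / 2 * (low6 + 0.5) / 64)
        samples.append(-quarter if msb else quarter)
    return samples


if __name__ == "__main__":
    for value in dds_samples(fcw=32, n_samples=16):
        print(f"{value:+.4f}")
```

With this folding, only one quadrant of the sine needs to be stored in the current matrix, which is why 64 cells suffice for an 8-bit phase word.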
2.4.1 Pipelined accumulator
The speed of the DDS is often limited by the speed of the phase accumulator. The speed
of the accumulator depends upon the N-bit adder design. The simplest way to construct an
N-bit adder is to place N 1-bit adders in a chain starting with a 1-bit half adder followed
by (N-1) 1-bit full adders with the carry-in of the full adder connected to the carry-out of
the previous bit. This ripple adder topology uses the least hardware, but operates at the
slowest speed. The delay of a ripple adder is due to the propagation of the carry bit from
the LSB to the MSB. The sum and carry-out of a full adder can be expressed as:
$$SUM = A \oplus B \oplus C_{in}$$
$$C_{out} = A \cdot B + B \cdot C_{in} + C_{in} \cdot A \qquad (2.6)$$
where A and B are the input bits and Cin is the carry in of the adder. The delay of
an N-bit ripple adder is given by
$$Delay_{ripple} = (N - 1) T_{carry} + T_{sum} \qquad (2.7)$$
where Tcarry is the time for carry generation and is equal to twice the delay of an AND
gate. Similarly, Tsum is the time for sum generation in a 1-bit adder and is about twice the
delay of an XOR gate.
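Eqs. (2.6) and (2.7) translate directly into a bit-level sketch. The function names below are hypothetical. The 25 ps carry delay and 30 ps sum delay used in the example are the estimates quoted for the CML adder in Section 2.4.2, and applying them to a 9-bit ripple adder is an illustrative assumption.

```python
def full_adder(a: int, b: int, c_in: int):
    """Eq. (2.6): sum and carry-out of a 1-bit full adder."""
    total = a ^ b ^ c_in
    c_out = (a & b) | (b & c_in) | (c_in & a)
    return total, c_out


def ripple_add(x: int, y: int, n_bits: int):
    """N-bit ripple adder: the carry propagates from the LSB to the MSB."""
    carry = 0  # a half adder at the LSB is a full adder with carry-in 0
    result = 0
    for i in range(n_bits):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


def ripple_delay(n_bits: int, t_carry: float, t_sum: float) -> float:
    """Eq. (2.7): critical-path delay of an N-bit ripple adder."""
    return (n_bits - 1) * t_carry + t_sum


if __name__ == "__main__":
    print(ripple_add(0b101101101, 0b001100000, 9))
    print(ripple_delay(9, t_carry=25e-12, t_sum=30e-12))
```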
If the accumulator input is time-invariant, each bit of the input word and the adder
output bits can be properly delayed so that an N-bit accumulator can operate at the speed
of a 1-bit adder. This type of accumulator, called a pipelined accumulator [7, 11], uses
the most hardware, but achieves the fastest speed. Ref. [11] employed the pipeline adder
architecture to implement the phase accumulator for a numerically controlled oscillator
(NCO).
Fig. 2.5 illustrates a generic architecture for an NxM pipelined accumulator with a
total of M pipelined rows. Each row has a total of M delay stages placed at the input and
output of an N-bit adder. Obviously, an NxM pipelined accumulator has a latency period
equal to the propagation delay of M-1 clock cycles. Note that an accumulator needs at least
one delay stage even without any pipelined stages. The pipeline accumulator shown allows
Figure 2.5: NxM generic architecture of a pipelined accumulator
the NxM bit accumulator to operate at the speed of an N-bit accumulator, i.e., a speed-up
of M times. When the number of adder bits is set to one (N = 1), the 1xM bit accumulator
can operate at the same speed as a 1-bit adder. To realize a 9-bit accumulator, we can set
N = 1 and M = 9. Then, a 9-bit accumulator will run at the speed of a 1-bit accumulator
consisting of a full-adder and a flip-flop.
The pipelined accumulator is used for constant input words and can achieve the max-
imum operating frequency, whereas an accumulator with a carry look-ahead (CLA) adder
can be employed for variable inputs with medium operation frequencies. To achieve the
maximum operating speed with a fixed FCW, a pipelined accumulator is used in this design.
The total delay of the accumulator is one full adder propagation delay plus one
D flip-flop propagation delay. The MSB of the accumulator input is tied to zero, since the
output frequency will not exceed half of the clock frequency. The LSB of the pipeline accumulator
output is discarded and only its 8 MSBs are fed to the nonlinear DAC. The flip-flops in
the accumulator were designed with a reset signal that can be used to reset the accumulator
to zero.
In general, the ripple carry adder has complexity in the order of O(N) and delay
proportional to N, where N is the FCW length of the accumulator. The hardware cost
of the pipeline accumulator is of the order $O(N^2)$. In order to properly trade the area
for power, k-bit adders can be used for each pipeline stage as illustrated in Fig. 2.5. We
implemented a 1x8 pipelined accumulator in order to achieve the maximum speed. If 2-bit
adders are used in each pipeline stage, the critical path delay will not double based on Eq.
(2.7). Thus, the accumulator speed will be greater than half of that of the accumulator using
1-bit pipelined adders.
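The behavior of a 1xM pipelined accumulator can be illustrated with a cycle-by-cycle sketch. It is a simplified model with hypothetical names, not the implemented circuit. It assumes a constant FCW, a reset state of all zeros, input bit i skewed by i flip-flops, and output bit i de-skewed by M-1-i flip-flops, so that each row only ever performs a 1-bit addition per clock.

```python
def direct_accumulate(fcw: int, m_bits: int, n_cycles: int) -> list:
    """Reference: a non-pipelined accumulator updated once per clock."""
    return [((t + 1) * fcw) % (1 << m_bits) for t in range(n_cycles)]


def pipelined_accumulate(fcw: int, m_bits: int, n_cycles: int) -> list:
    """1xM pipelined accumulator: one 1-bit full adder per pipeline row."""
    sum_reg = [0] * m_bits     # sum flip-flop of each row
    carry_reg = [0] * m_bits   # carry flip-flop passed to the next row
    # Output de-skew chain: row i is followed by (M - 1 - i) flip-flops.
    out_delay = [[0] * (m_bits - 1 - i) for i in range(m_bits)]
    outputs = []
    for t in range(n_cycles):
        new_sum, new_carry = sum_reg[:], carry_reg[:]
        for i in range(m_bits):
            # Input skew: FCW bit i reaches row i after i clock cycles.
            b = (fcw >> i) & 1 if t >= i else 0
            a = sum_reg[i]
            c_in = carry_reg[i - 1] if i > 0 else 0
            new_sum[i] = a ^ b ^ c_in
            new_carry[i] = (a & b) | (b & c_in) | (c_in & a)
        sum_reg, carry_reg = new_sum, new_carry
        word = 0
        for i in range(m_bits):
            out_delay[i].append(sum_reg[i])
            word |= out_delay[i].pop(0) << i
        outputs.append(word)
    return outputs


if __name__ == "__main__":
    m = 9
    piped = pipelined_accumulate(96, m, 24)
    direct = direct_accumulate(96, m, 24)
    # The pipelined output matches the direct one after M - 1 cycles.
    print(piped[m - 1:] == direct[: len(piped) - (m - 1)])
```

In this model the critical path of every row is one full adder plus one flip-flop, at the cost of a latency of M-1 clock cycles and a number of flip-flops that grows quadratically with M.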
2.4.2 SiGe CML logic
Previous ultra high speed DDS designs used InP technology in order to take advantage
of the high speed InP transistors. However, these InP DDS designs suffer from high power
consumption and low yield. This DDS design utilizes a commercial 0.18 µm SiGe BiCMOS
technology with the HBT peak fT/fmax of 120/100 GHz. The digital logic is implemented
using current mode logic (CML) cells with differential output swings of 400 mV. For a 3-level
CML circuit, a 3.3 V power supply is sufficient to keep all the BJT transistors from
saturation. If an NPN transistor operates in saturation mode, its speed is greatly degraded
and its parasitic PNP transistor is turned on, causing increased noise coupling through the
substrate.
In order to achieve a good balance between speed and power consumption, the bias
current is set to 70% of the peak fT current. Further increasing the biasing current does not
speed up the CML circuit significantly. It is not proper to bias the CML circuits at the peak fT
current, since any variation of the biasing current may drive the circuit beyond the peak fT current,
slowing down the transistors significantly with unnecessarily large power consumption. Although
the peak fT current is not a parameter that guarantees the operational speed for
different CML circuits with different loads, it is a good indicator for the average speed of
the CML logic circuits. The current in a typical three-input CML gate is 0.55 mA, which is
less than 1/5 of that used in InP DDS designs [21]. This bias current is sufficient to keep
the delay of the three level gates below 25 ps.
To provide more headroom for bipolar transistor operation, the current source of the
CML logic uses an NMOS transistor with degeneration resistor. In this case, the overdrive
voltage of the current source MOSFET is around 0.4 to 0.5 V, which is smaller than the headroom
required by a bipolar transistor. In order to ensure that all the critical paths have
the same delays, the signal paths are designed using symmetrical patterns. Pipelined adder
stages are used to achieve the speed that is equivalent to a 1-bit adder. To reduce the logic
requirement of the adder, the structures described in [3] are adopted, as shown in Fig. 2.6.
The sum and carry-out are implemented using one current tail for low power application.
This adder circuit reduces the total number of bipolar transistors in the sum circuit from
14 to 10, and provides a speed improvement of around 15%. The delay of the sum block is
estimated to be 30ps, and the carry block is 25ps, with optimized biasing.
The breakdown voltage BVCEO of the NPN transistors in the 0.18 µm SiGe BiCMOS
technology is approximately 1.8 V. With 4 stacked NPNs under a 3.3 V supply, each transistor
will experience less than 1 V across the C-E junction. In addition, all the circuits
Figure 2.6: CML full adder circuit
are self-biased with no base open, which guarantees safe operation of the transistors without
breakdown.
2.4.3 Clock and MSB trees
The most challenging parts of the design are the clock tree and MSB buffer tree designs.
To eliminate glitches due to code errors induced by clock skews, clock trees are carefully balanced
to ensure synchronization and drive capability. Because the differential clock signals
drive every flip-flop cell, and the total number of flip-flops is above 200, synchronization
of the clock signals is not a trivial task. With a clock input frequency around 10 GHz, the
current gain of the transistor degrades to about 10. Thus, the fan-out ratio of a clock buffer
is only 3 to 4, and the depth of the clock buffer chain is at least 6 levels for this design. To
fully turn on or turn off the differential pairs, the input differential peak-to-peak voltage
swing should be more than $6V_T + I_E R_E$, in which $V_T$ is the thermal voltage and $I_E$ and
$R_E$ are the emitter current and emitter resistance of the bipolar transistor, respectively.
The voltage swing also depends on junction temperature, which can reach above 100 °C for
normal operating conditions. The clock signals at the flip-flop cells should swing no less
than 150 mV, which is equal to $6V_T$ at room temperature. Since every switching cell in the
DAC has an MSB signal, the total number of gates that the MSB must drive exceeds 120.
This MSB signal must also be synchronized with other decoded digital bits. The depth of
the buffer chain for the MSB signal is 5 levels. To accomplish all of this, the clock and
MSB buffers require careful design, with layout symmetry and balance, in order to ensure
synchronization along the clock and MSB distributions.
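The temperature dependence of the minimum swing can be made concrete with a short calculation of 6V_T + I_E R_E. The sketch below uses hypothetical function names. The 0.55 mA emitter current is the typical gate current quoted in Section 2.4.2, and the 20 Ω emitter resistance is purely an illustrative assumption, not a value from this design.

```python
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_c: float) -> float:
    """Thermal voltage V_T = kT/q at a junction temperature in Celsius."""
    return BOLTZMANN_J_PER_K * (temp_c + 273.15) / ELEMENTARY_CHARGE_C


def min_differential_swing(temp_c: float, i_e: float = 0.0, r_e: float = 0.0) -> float:
    """Minimum peak-to-peak differential swing, 6*V_T + I_E*R_E, in volts."""
    return 6.0 * thermal_voltage(temp_c) + i_e * r_e


if __name__ == "__main__":
    for temp_c in (25.0, 100.0):
        ideal = min_differential_swing(temp_c)
        # r_e = 20 ohm is an assumed value for illustration only.
        with_re = min_differential_swing(temp_c, i_e=0.55e-3, r_e=20.0)
        print(f"{temp_c:5.1f} C: {ideal * 1e3:.1f} mV, {with_re * 1e3:.1f} mV")
```

The 6V_T term alone is close to the 150 mV quoted above at room temperature, and it rises by roughly a quarter when the junction reaches 100 °C, which is why the swing budget has to be set for the hot case.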
2.4.4 DAC current source and switch
The essential building block of the nonlinear DAC is the sine weighted current source
matrix. The unit current of each current source is 0.1mA, which should provide the current
switches with enough switching speed when toggling. The largest source current is 0.7mA,
which is composed of 7 identical current sources. The table in Fig. 2.7 indicates the number
of unit current sources in each sine-weighted current source. The sum of each row is the
same, which assures the regularity of the current source array, as well as its compactness.
The current source matrix provides 64 pairs of sine-weighted currents that are summed
at the differential current outputs, OUTP and OUTM. The current outputs are converted
to differential voltages by a pair of off-chip 25 Ω pull-up resistors. Fig. 2.7 shows that the
currents from the cascode current sources are fed to outputs, OUTP and OUTM, by pairs
of switches (Mswitch). The MSB controls the selection between different half periods. The
current switch contains two differential pairs, with minimum size transistors, and a cascode
Figure 2.7: Current switch circuit of the nonlinear DAC
transistor to isolate the current sources from the switches, and improve the bandwidth of the
entire group of switching circuits. The size of the switching transistor pairs is chosen to be
minimal in order to achieve the fastest switching speed with minimum power consumption,
and to reduce the effect of clock feed-through.
For the current steering DAC, the impedance Zimp seen at the collectors of the switch
transistors of each current cell must be large enough so its impact on the integral nonlinearity
(INL) specification of the DAC can be tolerated [23]. However, Zimp is frequency
Figure 2.8: Synchronous switch control circuit of the nonlinear DAC
dependent. The impedance that is required to obtain a specified resolution is approximately
$$Z_{imp} = \frac{N R_L}{4Q} \qquad (2.8)$$
where $R_L$ is the load resistance, N represents the total number of unit current sources,
and Q is the ratio of the signal to the second harmonic. To obtain 8-bit output resolution,
Zimp should be approximately 500 kΩ. When the frequency increases above 100 MHz, a
cascode current source is needed to meet the requirement for Zimp.
Uncertainty of the switching time of current switches is one of the major causes of
glitches at the DAC outputs. To synchronize the switches of the DAC, a D flip-flop with
NAND function is inserted between the MSB control bit and the switch pairs in the DAC.
The Sp and Sm are controlled by MSB signal and select either 0 or 180 degree phase, as
shown in Fig. 2.8.
Device matching is one of the important factors that affect the static and dynamic
performance of the DAC. The matching properties of SiGe HBT bipolar transistors are
normally one order of magnitude better than those of MOSFETs with similar feature sizes.
To reduce IR drops and matching errors, one must carefully choose the current source
transistor sizes and layout placements, and use wide interconnections. For long interconnections
carrying global signals, such as the clock and the MSB phase word, transmission
line effects are taken into consideration during the layout. In order to minimize parasitic
capacitances and inductances, thick analog metal layers are used for global signal routing.
2.5 Layout
When running at a 10GHz clock rate, layout plays an important role in assuring that
the final design meets the expected speed requirement. The current source matrix and the
switching matrix are separately laid out and isolated from each other using a deep oxide
trench to reduce noise coupling from the digital circuitry to the current sources through the
substrate. The output of the DAC is placed close to the output pins to reduce interference
from the rest of the circuits. Differential pairs are placed in a symmetrical manner so that
the differential signals travel the same distance. In order to make the layout compact and
easy to cascade, the CML building blocks were designed to have the same height. Power
supply distribution stacks several metal layers to reduce resistance. Cadence SKILL language
was utilized to generate the connections that form the unit current sources into the sine-weighted
current sources, in accordance with the given switching sequence. Hence, the INL
of the nonlinear DAC, due to symmetrical and gradient errors, is minimized. Two dummy
rows and columns have been added around the current source array to avoid edge effects.
To minimize the systematic error introduced by the voltage drop in the ground lines of
the current-source transistors, sufficiently wide wires have been used. The clock inputs are
differential CML compatible signals, and multiple clock inputs are provided to reduce the
parasitic inductance resulting from the pins. The maximum delay of the metal wire is
about 40 ps, and the clock tree is carefully built to ensure an acceptable clock skew.
2.6 Experiment results
The die photo of the DDS MMIC is shown in Fig. 2.9. This DDS design is quite
compact with an active area of 2.3 x 0.7 mm² and a total chip die area of 3 x 3 mm²
including the ESD pads, the layer density filling elements and an 8.2 GHz on-chip VCO
that could be used to clock the DDS. The DDS prototypes were packaged using 48 pin
ceramic leadless packages. For a frequency range over 10 GHz, the PCB test board was
developed using a Rogers RO4003 laminate board, which has a loss tangent of less than
0.003 and good temperature stability. To convert the single-ended signal to differential clock
inputs, a 180 degree 3 dB hybrid coupler is employed at the clock input. For the differential
outputs, a second hybrid coupler is inserted into the output path. The test setup diagram
is illustrated in Fig. 2.10.
Power consumption of the DDS with the DAC is approximately 1.9W, and the max-
imum clock frequency as measured is 12.3GHz. With Nyquist output, the DDS achieves
a maximum clock frequency of 11.9GHz. The digital and analog parts of a sine-weighted
DAC consume 300mA from a 3.3V supply and 35mA from a 4V supply, respectively. The
accumulator consumes 250mA of current with a 3.3V supply.
Figure 2.9: Die photo of DDS chip (labeled blocks: VCO, pipeline accumulator, sine-weighted DAC)
Figure 2.10: DDS MMIC test setup (signal generator, 3 dB hybrid coupler, DDS test board, DC blocks, 3 dB hybrid coupler with 50 Ω terminators, spectrum analyzer, oscilloscope)
Although the power consumption of the SiGe DDS is small compared to that of the InP
DDSs, its power density is high due to its small die size. For 1.9 W of power concentrated on a
small die area of 9 mm², the power density of the DDS MMIC would exceed 21 W/cm², which
is a number that normally appears only for high performance processors. The relatively
high power density of the DDS MMIC makes it difficult to dissipate the heat when it is
packaged. The junction-to-ambient thermal resistance θJA of the 48-pin ceramic package
is about 40 °C/W with zero airflow. Therefore, the device junction temperature of the
DDS MMIC could reach above 100 °C at the room ambient temperature of 25 °C with 1.9 W
power consumption. For this reason, an external fan is used to cool the device during
measurements. To further reduce the thermal resistance and maximize heat dissipation,
packages with a heat sink can be used. To our knowledge, other InP MMICs [21, 22, 8]
were tested on wafer, while this SiGe DDS MMIC was tested as a packaged part. To
test the maximum speed, the packaged DDS chips were cooled down to -50 °C to -80 °C such
that the junction temperature is around room temperature. This test condition provides a
fair comparison between the packaged SiGe DDS MMIC and the wafer-probed InP DDSs.
Lowering the junction temperature improves the transistor speed due to increased carrier
mobility at lower temperature. Without cooling, the maximum clock speed of the packaged
DDS MMIC is measured as 9.6 GHz with Nyquist output and 11 GHz with FCW = 1 at
room ambient temperature. For the 20 tested prototypes, the chip performances are quite
consistent. SiGe technology offers advantages of high yield and high performance over the
InP technology.
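The thermal figures quoted above follow from two one-line relations, sketched here with hypothetical function names. The inputs (1.9 W, 9 mm² die, 40 °C/W, 25 °C ambient) are the values given in the text. The steady-state model T_J = T_A + P·θ_JA is the usual first-order approximation and is assumed here.

```python
def power_density_w_per_cm2(power_w: float, die_area_mm2: float) -> float:
    """Power density in W/cm^2 for a die area given in mm^2."""
    return power_w / (die_area_mm2 / 100.0)


def junction_temperature_c(ambient_c: float, power_w: float, theta_ja: float) -> float:
    """First-order steady-state estimate: T_J = T_A + P * theta_JA."""
    return ambient_c + power_w * theta_ja


if __name__ == "__main__":
    print(power_density_w_per_cm2(1.9, 9.0))
    print(junction_temperature_c(25.0, 1.9, 40.0))
```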
Figs. 2.11, 2.12, 2.13, and 2.14 illustrate the measured DDS output spectra and waveforms
for different outputs and clock frequencies. The measured spectra were obtained by cooling
down the packaged chips so that the device junction temperature approaches room
temperature. Fig. 2.11 presents the 23.5 MHz DDS output waveform and spectrum with
a 12.021 GHz clock input. The time-domain waveform measurements were limited by the
digital sampling scope's 500 MHz bandwidth. The measured DDS output power is approximately
-6.67 dBm. All measurements were done without calibrating the losses of the
cables, the coupler and the PCB tracks. Fig. 2.12 gives the measured DDS output spectra
at Nyquist rate, namely, (a) output frequency at 5.930 GHz with clock at 11.913 GHz;
and (b) output frequency at 5.04 GHz with clock at 10.110 GHz. Fig. 2.12(a) demonstrates
the maximum DDS operation frequency of 11.9 GHz at Nyquist output with the SFDR of
22 dBc. The measured SFDR of the device, at 5.08 GHz output frequency with a 10.11 GHz
clock, is approximately 30 dBc in narrow-band as shown in Fig. 2.12(b). For Fig. 2.12(a),
the FCW is chosen as $2^8 - 1$, which is the maximum allowed by an 8-bit FCW input. Thus,
the output frequency is set at $f_{out} = \frac{2^8 - 1}{2^9} f_{clk}$. The first order image tone mixed by the clock frequency
and the DDS output frequency occurs at 11.913 GHz - 5.93 GHz = 5.98 GHz, which is 50 MHz
apart from the output frequency, as shown in Fig. 2.12(a). Operating the DDS at close to
Nyquist rate makes it very hard to filter out the image tones. Practically, the DDS output
frequency is restricted to be less than 3/8 of the clock frequency. An image tone at 5.08 GHz
is also observed in Fig. 2.12(b) with a clock at 10.110 GHz.
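The position of the first image tone near Nyquist can be reproduced from Eq. (2.1). The sketch below uses a hypothetical function name and assumes the 9-bit accumulator with the maximum 8-bit FCW of 2^8 - 1 = 255 described in the text.

```python
def nyquist_image(fcw: int, n_bits: int, f_clk: float):
    """Return (f_out, f_image, spacing) for the first-order image tone."""
    f_out = fcw / 2**n_bits * f_clk   # Eq. (2.1)
    f_image = f_clk - f_out           # clock mixed with the output tone
    return f_out, f_image, f_image - f_out


if __name__ == "__main__":
    f_out, f_image, spacing = nyquist_image(255, 9, 11.913e9)
    print(f"{f_out / 1e9:.3f} GHz, {f_image / 1e9:.3f} GHz, {spacing / 1e6:.1f} MHz")
```

With FCW = 2^(N-1) - 1 the spacing reduces to f_clk / 2^(N-1), a few tens of MHz at these clock rates, which illustrates why the image cannot be filtered easily.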
Fig. 2.13 presents the measured DDS output spectrum with a 1.7898GHz output and
a 9.59GHz clock. The measured output power of the DDS is -9 dBm, which corresponds
to greater than -5dBm power when cable and coupler loss are considered. The input FCW
Figure 2.11: Measured DDS output waveform (a) and spectrum (b) with a 23.5MHz output
(FCW=1) and a clock at 12.021GHz
Figure 2.12: Measured DDS output spectra at Nyquist rate (FCW=511). (a) The output
frequency at 5.930GHz and the image tone at 5.98GHz with a clock at 11.913GHz; (b) The
output frequency at 5.04GHz and the image tone at 5.08GHz with a clock at 10.110GHz
Figure 2.13: Measured DDS output spectrum with a 1.7898GHz output and a 9.59GHz
clock
equals 96, so that $GCD(96, 2^9) = 32$, which leads to spurs equally spaced with 600
MHz spacing around the fundamental tone when the clock is 9.59 GHz.
Fig. 2.14 gives the DDS output waveforms at 1.125GHz with a 9GHz clock. At
high temperature, the transistors are slowed down and the DAC current switches are no
longer perfectly synchronized due to increased internal delays. Fig. 2.14 demonstrates
a clean sinusoidal output waveform with the package measurements at the 9.6GHz clock
frequency. Fig. 2.15 illustrates the measured DDS output spurious-free-dynamic-range
(SFDR) versus frequency control word with a 4.6GHz clock at ambient temperature of
-20 °C. The measured SFDR ranges from 20 to 30 dBc. Compared to the theoretical
Figure 2.14: Measured DDS output waveform with 1.125GHz output and 9GHz clock
analysis, the degradation of the measured SFDR is due to a combination of effects including
the wideband matching of the clock and output signals, nonlinearity associated with the
nonlinear DAC, and noise coupling from the reference line, the substrate and the power
supply.
When compared with the InP DDS in [9], which operates at a 9.2GHz clock frequency,
this design achieves similar SFDR performance, yet with much lower power consumption.
Most of the InP DDS MMICs were measured using probe stations [21, 22], while this DDS
RFIC was tested with packaged parts. Table 2.1 compares the recently published mm-wave
DDS MMIC performances. The designs reported in [21] and [22] used InP technologies
with an fT/fmax above 300/300 GHz, which is almost triple those reported here. The
InP DDS[9] employs an 8-bit accumulator and an 8-bit DAC and operates at a maximum
Figure 2.15: Measured DDS output SFDR versus frequency control word at -20 °C ambient temperature (x-axis: frequency control word, 0 to 250; y-axis: SFDR in dBc, 0 to 35)
clock frequency of 9.2 GHz with a power consumption of 15 W. On the other hand, this
SiGe 9-bit DDS consumes 1.9 W with 3.3 V and 4 V power supplies for digital and analog circuits,
respectively. The 4 V power supply was tied to a pair of pull-up resistors, providing more
voltage headroom and output swing for the DAC output stage. The VCO and the DDS are
separately powered, and the 1.9W power consumption does not include the power of the
VCO.
As shown in Table 2.1, the minimum transistor size in the InP technology is much
larger than in the SiGe technology. Although the current densities required to achieve peak fT
frequency in InP and SiGe technologies are similar, the current required to operate the
minimum transistor close to a peak fT frequency differs quite a bit, which contributes to
the superior power efficiency performance of this SiGe DDS. When compared with the
Reference                                     | [9]     | [21]     | [22]     | [8]     | [8]       | This work
Technology                                    | InP     | InP      | InP      | InP     | TFAST InP | SiGe
fT/fmax [GHz]                                 | 137/267 | 300/300  | 300/300  | 180/266 | 406/423   | 120/100
Emitter area of minimal size transistor [µm²] | 1.5x4   | 0.4x2    | 0.4x2    | 0.5x2   | 0.25x1    | 0.2x0.64
Emitter current density at peak fT [mA/µm²]   | 1.2     | 5        | 5        | -       | -         | 6
Peak fT current of min size transistor [mA]   | 7.2     | 4        | 4        | -       | -         | 0.77
Breakdown voltage BVceo [V]                   | 8       | 4        | 4        | -       | 5         | 1.8
Accumulator size [bit]                        | 8       | 8        | 8        | 9       | 9         | 9
DAC resolution [bit]                          | 7       | 7        | 5        | -       | -         | 8
Max clock frequency [GHz]                     | 9.2     | 13       | 32       | 8       | 12        | 12.3 (cooled), 9.6 (room)
SFDR with Nyquist output [dBc]                | 30      | 26.67    | 21.56    | 38      | 30        | 22 @ 12 GHz, 27 @ 10 GHz
Power consumption [W]                         | 15      | 5.42     | 9.45     | 7       | 8         | 1.9
Number of transistors                         | 3000    | 1646     | 1891     | 8695    | 8800      | 9600
Die size [mm²]                                | 8x5     | 2.7x1.45 | 2.7x1.45 | 4x2     | -         | 3x3 (chip), 2.3x0.7 (active)
FOM [GHz/W]                                   | 0.5     | 2.4      | 3.386    | 1.1     | 1.5       | 6.3 (cooled), 5.05 (room)
Table 2.1: Ultra-high speed DDS performance comparison
published DDS MMICs, this SiGe DDS achieves the best reported power efficiency FOM of
6.3 GHz/W with a much smaller die size of 2.5 x 0.7 mm².
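The power efficiency figure of merit can be recomputed from the clock and power entries. The sketch below assumes that the FOM is the maximum clock frequency divided by the power consumption, which is inferred from the GHz/W unit rather than stated as a formula. The function name is hypothetical, and the 12 GHz clock for the cooled case is the condition at which the 1.9 W power was measured.

```python
def power_efficiency_fom(max_clock_ghz: float, power_w: float) -> float:
    """Assumed FOM definition: maximum clock frequency per watt (GHz/W)."""
    return max_clock_ghz / power_w


if __name__ == "__main__":
    designs = {
        "[21] InP": (13.0, 5.42),
        "[22] InP": (32.0, 9.45),
        "This work, cooled (12 GHz clock)": (12.0, 1.9),
        "This work, room temperature": (9.6, 1.9),
    }
    for name, (clock_ghz, power_w) in designs.items():
        print(f"{name}: {power_efficiency_fom(clock_ghz, power_w):.2f} GHz/W")
```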
2.7 Conclusion
In this chapter, a 12 GHz direct digital synthesizer (DDS) MMIC with 9-bit phase and
8-bit amplitude resolution has been implemented in a 0.18 µm SiGe BiCMOS technology.
Composed of a 9-bit pipeline accumulator and an 8-bit sine-weighted current steering DAC,
the DDS is capable of synthesizing sinusoidal waveforms up to 5.93 GHz. The maximum
clock frequency of the DDS MMIC is measured as 11.9 GHz at the Nyquist output and 12.3
GHz at 2.31 GHz output. The spurious free dynamic range (SFDR) of the DDS, measured
at Nyquist output with an 11.9 GHz clock, is 22 dBc. The power consumption of the DDS
MMIC measured at a 12 GHz clock input is 1.9 W with dual power supplies of 3.3 V/4 V.
The DDS thus achieves a record-high power efficiency figure of merit (FOM) of 6.3 GHz/W.
With more than 9600 transistors, the active area of the MMIC is only 2.5 x 0.7 mm². The
chip was measured in packaged prototypes using 48-pin ceramic LCC packages.
Chapter 3
QUADRATURE PHASES SIGE DDS
3.1 Introduction
In wireless transceivers, quadrature clock signals are always required for the modulator
and the demodulator. There are several ways to generate quadrature waveforms that are
widely adopted in circuit design. The first is to use a divider to divide down the output
signal from a local oscillator or external source, so that the sine and cosine signals come
out natively. This requires the local oscillator to run at twice the carrier frequency.
The advantage is that the pulling effect or DC offset coming from the local oscillator can
be minimized. Another way is to implement a quadrature VCO, which takes more area
and consumes more power than a single phase VCO. The third approach is to use a
polyphase filter to convert a single phase signal to quadrature phase outputs. To reduce
the phase and amplitude imbalances, a multiple-stage polyphase filter may be required. This
will introduce insertion loss and thermal noise. For DDS design, generating well balanced
quadrature waveforms natively over a large frequency range avoids the problems of the
divider method and polyphase method.
3.2 Direct Modulations in DDS
The conceptual diagram in Fig. 3.1 shows the method to implement different types of
modulation configurations in a ROM-less DDS employing a nonlinear DAC. The principle
of a DDS can be briefly described as first integrating the frequency control word into a phase
control word, then mapping the phase control word to an amplitude word, and finally converting
the amplitude word into an analog signal output. All the frequency, phase and amplitude
information are readily available in the DDS data path and can be directly addressed and
manipulated, thus the digital modulation can be done without too much extra hardware
cost. By directly using digital control words to change the values of registers in the data
path of a DDS, the frequency, phase and amplitude of the output waveforms can be precisely
controlled. Since all the modulations are done in the digital domain, many disadvantages
associated with normal analog modulations can be precluded. The values of the registers in
a DDS are updated with a data rate that equals the input clock frequency, which means
that high speed modulated waveforms can be generated. Waveform generation for various
modulation schemes is desired for novel radio transmitter architectures. As an example,
modern radar systems place ever-increasing demand for affordable low noise signals and
high speed waveform generation. With the availability of single chip DDSs working at
microwave frequency, digitally generating highly complex wide bandwidth waveforms at
the highest possible frequency instead of down near baseband would considerably reduce
the transmitter architecture in terms of size, weight and power requirements as well as
cost. These waveforms are used in high range resolution radars for sorting targets from
clutter and in low-probability-of-intercept communication applications. The modulated
waveform generation is a unique feature of the DDS approach. The DDS synthesizer can
implement modulations and waveforms such as chirp, step frequency, frequency modulation
(FM), frequency shift keying (FSK), minimum shift keying (MSK), phase modulation (PM),
amplitude modulation (AM), quadrature amplitude modulation (QAM) and other hybrid
modulations, as illustrated in Fig. 3.1.
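The three modulation entry points in the DDS data path can be sketched behaviorally. This is an illustrative model only: the function and parameter names are hypothetical, the nonlinear DAC is replaced by an ideal sine mapping, and the 9-bit phase width is borrowed from the design in Chapter 2.

```python
import math


def modulated_dds(fcw_carrier, n_bits, fm_words, pm_words, am_gains):
    """Behavioral DDS with per-sample frequency, phase, and amplitude control."""
    modulus = 1 << n_bits
    acc = 0
    samples = []
    for fm, pm, am in zip(fm_words, pm_words, am_gains):
        # Frequency domain: FM, FSK, MSK, or chirp data adds to the carrier FCW.
        acc = (acc + fcw_carrier + fm) % modulus
        # Phase domain: PM data adds to the accumulated phase word.
        phase = (acc + pm) % modulus
        # Amplitude domain: AM data scales the (idealized) DAC output.
        samples.append(am * math.sin(2 * math.pi * phase / modulus))
    return samples


if __name__ == "__main__":
    n = 16
    # Binary FSK: the FCW toggles between carrier - 4 and carrier + 4.
    fsk = [(-4 if i < n // 2 else 4) for i in range(n)]
    for value in modulated_dds(32, 9, fsk, [0] * n, [1.0] * n):
        print(f"{value:+.4f}")
```

Because every control word is applied once per clock cycle, the modulation rate in such a model is bounded only by the clock frequency, which mirrors the argument made above.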
Figure 3.1: Direct modulation through a DDS (digital frequency domain: carrier frequency word Fc, chirp data, FM data, and MSK data selecting Fc + Fb/4 or Fc - Fb/4 through a MUX into the frequency register; digital phase domain: phase register with PM data; analog amplitude domain: nonlinear DAC with AM data)
The typical choice of converting the baseband signal from polar magnitude and phase
data to Cartesian I and Q data during modulation is based on practical
considerations. Direct manipulation of magnitude and phase in a polar system is expensive
and difficult to design and build. Using a DDS in the transceiver system to
perform the polar modulation task in addition to the normal frequency synthesis is one
way to solve this problem that is worth further exploring. Since the major parts of a DDS
are digital circuits, it is easier to integrate the DDS with the baseband circuit, which provides a
compact solution to the transmitter design.
Sometimes the DDS output is expected to cover a wider frequency range, while
the typical DDS output frequency ranges from DC to one third of the input clock frequency.
When the output frequency approaches the Nyquist frequency, the frequency of the alias image
comes closer to the output frequency, which makes it almost impossible to remove
with an analog low pass filter. To build a low pass filter with a steep roll-off characteristic at several
Figure 3.2: Extend the output frequency range using a quadrature DDS and SSB mixers
GHz will require tremendous effort. A practical solution to extend the output frequency of
a DDS to a wider range without incurring the problems of alias images is to use a single side
band (SSB) mixer, as shown in Fig. 3.2.
The local oscillator generates quadrature outputs with a relatively fixed output frequency
ω0, which are mixed with the outputs of a quadrature DDS at frequency ω. The mixer outputs
are then summed and subtracted, so that up-converted cosine waveforms with a frequency of
ω0+ω or ω0-ω are derived. Theoretically the final output should be free of alias images.
However, in practice the DDS output contains harmonics and spurs that significantly
deteriorate the purity of the desired output waveforms. The imperfections of the mixers
due to leakage and second order effects introduce further spurs that degrade the output
signals. Even so, the power of the alias image tone is small compared to the fundamental
tone, which greatly eases the filter design. Assuming the local oscillator frequency is
higher than the output frequency of the quadrature DDS, the above mixing scheme can be used
to up convert the DDS output frequency to a higher frequency band.
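The sum and difference operation described above can be checked numerically. The following
is a minimal sketch, assuming ideal multipliers and ideal quadrature signals; the function
name ssb_mix and the example frequencies are illustrative assumptions and are not taken
from the design.

```python
import math

def ssb_mix(w0, w, t):
    """Return the (upper, lower) sideband samples of an ideal SSB mixer pair."""
    i_path = math.cos(w0 * t) * math.cos(w * t)   # LO cosine times DDS cosine
    q_path = math.sin(w0 * t) * math.sin(w * t)   # LO sine times DDS sine
    upper = i_path - q_path   # equals cos((w0 + w) * t)
    lower = i_path + q_path   # equals cos((w0 - w) * t)
    return upper, lower

# Hypothetical example values: 10 GHz LO and 1 GHz DDS output.
w0 = 2 * math.pi * 10e9
w = 2 * math.pi * 1e9
for k in range(5):
    t = k * 13e-12
    upper, lower = ssb_mix(w0, w, t)
    assert abs(upper - math.cos((w0 + w) * t)) < 1e-9
    assert abs(lower - math.cos((w0 - w) * t)) < 1e-9
```

In a real mixer, gain and phase mismatch between the two paths leaves a residual image,
which this idealized sketch does not model.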
Figure 3.3: Conceptual drawing of the quadrature DDS RFIC
3.3 DDS circuit design
3.3.1 Quadrature DDS architecture
The simplified block diagram of the ROM-less quadrature DDS, employing one 9-bit pipeline
accumulator and two nonlinear DACs, is shown in Fig. 3.3. Intuitively, a quadrature DDS can
be realized by paralleling two single phase DDSs, one with a sine output and the other with
a cosine output, and then merging them together. When merging two single phase DDSs, the
goal is to share as many of the common circuits as possible. In this design, only the phase
accumulator is shared between the two DDSs, because the limited fan-out of digital logic
gates at multi-GHz frequencies leaves only a marginal gain from sharing the decoders and
other digital blocks inside the DACs.
The main components of a single phase ROM-less DDS are the phase accumulator and the
sine-weighted nonlinear DAC. For a DDS with an L-bit frequency control word (FCW) and an
M-bit phase resolution DAC, the output frequency of the synthesized sine waveform is
modulated by the truncation error of the accumulator. The output of the phase accumulator
is truncated to M bits to fit the inputs of the nonlinear DAC. Usually the phase resolution
of the DAC is much lower than the resolution of the phase accumulator, so L-M bits are
discarded, which introduces FCW-dependent spurs. Phase truncation spurs are considered the
dominant contribution to the total spurs and noise of the DDS output only under the
assumption that the DAC is ideal or close to ideal. However, even for a linear DAC, this
assumption no longer holds when the sampling rate exceeds multiple GHz and the magnitude
transitions are large compared to the full scale output. For a nonlinear DAC, the situation
is more complicated. It is not an easy task to reach high amplitude resolution with a
nonlinear DAC at a multi-GHz clock speed. In fact, the ultra high speed nonlinear DACs in
published works have at most 8-bit amplitude resolution. The nonlinear DAC approach is
still attractive for microwave DDS design because it provides a drastic speed improvement
over ROM based or algorithm based DDS designs. The spurs introduced by amplitude errors
therefore need to be taken into account during the design of an ultra high speed DDS. The
spurs introduced by phase truncation are already minimized in this design because only one
bit of the phase accumulator output is truncated.
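The accumulate-and-truncate behaviour described above can be sketched as follows. This is a
behavioural model only, assuming the 9-bit accumulator and 8-bit DAC phase width of this
design as defaults; the function names are hypothetical, and the tuning relation
f_out = FCW * f_clk / 2^L is the standard DDS relation rather than a formula quoted from
this chapter.

```python
def dds_phase_words(fcw, acc_bits=9, dac_bits=8, n_samples=16):
    """Accumulate FCW modulo 2**acc_bits and truncate to the DAC phase width."""
    mask = (1 << acc_bits) - 1
    drop = acc_bits - dac_bits          # number of discarded LSBs (L - M)
    acc = 0
    words = []
    for _ in range(n_samples):
        acc = (acc + fcw) & mask        # phase accumulator wraps at 2**L
        words.append(acc >> drop)       # truncated phase word sent to the DAC
    return words

def dds_output_frequency(fcw, f_clk, acc_bits=9):
    """Standard DDS tuning relation: f_out = FCW * f_clk / 2**L."""
    return fcw * f_clk / (1 << acc_bits)

print(dds_phase_words(3, n_samples=4))
print(dds_output_frequency(128, 5.12e9))
```

With only one discarded bit, the truncated word differs from the full phase word by at most
one accumulator LSB, which is why phase truncation is a minor spur source here.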
As illustrated in Fig. 3.3, the quadrature DDS RFIC utilizes one 9-bit pipeline accumulator
and two nonlinear 8-bit sine-weighted current-steering DACs to simultaneously generate the
sine and cosine waveforms. Since the output frequency cannot exceed the Nyquist rate, an
8-bit FCW is fed into the 9-bit pipeline accumulator with the most significant bit (MSB) of
the accumulator input tied to zero internally. Thus, the DDS requires only 8-bit FCW inputs.
The output of the pipeline accumulator is a 9-bit phase word. The least significant bit
(LSB) of the 9-bit phase word is truncated before driving the 8-bit DAC input.
To produce the 90 degree phase word, the binary number "01" needs to be added to the two
most significant bits of the DAC input. Translating the add function to gate level, the
output MSB is the Exclusive-OR (XOR) of the two MSB inputs, and the output 2nd MSB is the
inversion of the 2nd MSB input. Because all the digital logic has differential outputs, only
one XOR gate needs to be inserted at the inputs of the sine-weighted DAC to convert it to a
DAC with a 90 degree output phase difference. The current steering DAC structure is chosen
for its high speed and good matching between unit cells. The differential current outputs
of the nonlinear current steering DACs are converted to differential voltage outputs with
two pairs of external 15 ohm pull-up resistors. The detailed block diagram of the quadrature
DDS is shown in Fig. 3.4. On the right side there are two back-to-back sine-weighted current
steering DACs, and on the left side is the phase accumulator shared by both DACs. The DDS
also includes a standard LC-tuned VCO, which can be connected to the input of the clock
buffer on the upper side to drive the whole DDS. Since the two nonlinear DACs are identical,
it appears beneficial to share all the decoders and buffers in the DACs and leave only the
current switches and current sources separate. This is a plausible suggestion and is worth
investigating further. Before evaluating this alternative, the mechanism of sine waveform
generation in the nonlinear sine-weighted DAC is explained.
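The gate-level form of the 90 degree phase shift described earlier (adding binary 01 to the
two MSBs) can be verified exhaustively. The sketch below is an illustration only; the
function name add_quarter_turn is hypothetical, and single-ended bits stand in for the
differential signals of the actual circuit.

```python
def add_quarter_turn(msb, msb2):
    """Gate-level form of adding binary 01 to the two MSBs (modulo 4)."""
    new_msb = msb ^ msb2      # the carry out of the 2nd MSB flips the MSB
    new_msb2 = msb2 ^ 1       # adding 1 always inverts the 2nd MSB
    return new_msb, new_msb2

# Compare against plain modulo-4 addition for all four quadrant codes.
for value in range(4):
    msb, msb2 = value >> 1, value & 1
    shifted = (value + 1) % 4
    assert add_quarter_turn(msb, msb2) == (shifted >> 1, shifted & 1)
```

The inversion of the 2nd MSB costs no gate when both signal polarities are available, which
is consistent with the text's statement that a single XOR gate suffices.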
Figure 3.4: Detailed block diagram of the quadrature DDS
The pipeline accumulator integrates the input FCW into phase information. Due to the
symmetry of a sine waveform, only one quarter of the sine waveform data is stored in the
sine-weighted DAC. The two MSBs are used to determine in which sine wave quadrant the phase
accumulator output resides, according to the quadrant symmetry of the sine wave.
The MSB output of the phase accumulator is used to provide the proper mirroring of the sine
waveform at the π phase point. The 2nd MSB is used to invert the remaining 6 bits for the
second and fourth quadrants of the sine wave prior to the decoding logic. The 6-bit outputs
are split into two 3-bit groups and fed into two column-row decoders, which drive the column
lines and row lines at the inputs of the current switch cells. Each column-row decoder in
this circuit is a linear 3:8 decoder. The outputs of the column-row decoders go to the
switch matrix to control the switches in each cell. The latch and switch matrices contain 64
cells, each comprising a local decoder, latches, and switch pairs. The current weights of
the current sources inside the current source matrix are preset to the sinusoidal waveform
data. The current switch outputs are summed at the open-collector output nodes.
Sharing all the digital blocks before the current switch cells in the DACs in order to
reduce the circuit size or power consumption might seem to have little impact on the
performance of the DDS. However, the symmetry properties of sine and cosine waveforms are
different: sine is an odd function and cosine is an even function. The turn-on sequence of
the switch cells guarantees complete sine waveform generation, but a cosine waveform derived
using the symmetry property of the sine waveform is not continuous, as shown in Fig. 3.5.
Directly sharing all the logic before the current switch cells to simultaneously produce
sine and cosine waveforms therefore encounters problems. One way to overcome the difficulty
is to implement the XOR function, which generates the two MSB inputs for the cosine DAC, at
the input of the current switch cells. This method needs 64 or more additional logic gates
in the switching cells. Considering the fan-out of the logic cells at several GHz, the total
loads on the signal paths are nearly the same. Thus the actual number of gates is not
reduced, and the area and power consumption remain the same. Sharing the logic blocks
before the current switch cells therefore achieves only a marginal gain. In this design,
only the phase accumulator is shared between the sine and cosine DACs.
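The quadrant decoding described above (MSB for the half-period sign, 2nd MSB for inverting
the remaining 6 bits) can be modelled behaviourally. The following sketch assumes ideal
amplitudes and mid-step phase sampling, both of which are assumptions made for illustration;
the actual DAC uses thermometer-style current cells rather than a computed sine.

```python
import math

QUARTER = 64  # 6-bit address within one quadrant

def quarter_sine(index):
    """Ideal first-quadrant amplitude for a 6-bit index (mid-step sampling assumed)."""
    return math.sin(math.pi / 2 * (index + 0.5) / QUARTER)

def sine_from_phase(phase_word):
    """Map an 8-bit phase word to a sine sample using quadrant symmetry."""
    msb = (phase_word >> 7) & 1       # selects the half period (sign)
    msb2 = (phase_word >> 6) & 1      # marks the mirrored quadrants (2nd and 4th)
    index = phase_word & 0x3F         # remaining 6 bits
    if msb2:
        index ^= 0x3F                 # invert the 6 bits in quadrants 2 and 4
    amplitude = quarter_sine(index)
    return -amplitude if msb else amplitude

# The folded lookup reproduces a full sine period from one stored quadrant.
for p in range(256):
    assert abs(sine_from_phase(p) - math.sin(2 * math.pi * (p + 0.5) / 256)) < 1e-12
```

Feeding the same folded index into a cosine path would require a different mirroring rule,
which illustrates why the decoders cannot simply be shared between the two DACs.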
3.3.2 Pipelined accumulator
The adder in the phase accumulator of the DDS can be chosen from several types, such as
pipeline, ripple-carry and carry-look-ahead adders. For modulation purposes, it would be
beneficial to use a carry-look-ahead adder or ripple-carry adder, but their speeds are
restricted by the additional delay stages they inevitably introduce in the critical paths.
Instead, to achieve maximum operating speed, a pipelined accumulator is adopted in this
design. The delays of the accumulators are determined by the propagation delays of the full
adder (FA), the D-flip-flop (DFF) and the level shifter (LS), and for an N-bit accumulator
can be expressed as
T_{total}(\text{pipeline}) = T_{dq}(\text{DFF}) + T_{dq}(\text{FA}) + T_{dq}(\text{LS})
T_{total}(\text{ripple carry}) = T_{dq}(\text{DFF}) + \left(T_{dq}(\text{FA}) + T_{dq}(\text{LS})\right) \times N    (3.1)
The delay of a carry-look-ahead accumulator lies between these two. From the above
equations it is clear that the pipeline accumulator can run at a clock speed at least twice
as high and is straightforward to build. The ripple adder topology uses the least hardware,
but operates at the slowest speed. The delay of the carry-look-ahead accumulator is
estimated with a maximum fan-in of 3 and a group size of 3. For the D-flip-flops in the
accumulator, a reset pin is added to reset the accumulator to its initial state.
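Equation (3.1) can be evaluated directly to compare the two critical paths. The delay values
below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not
measured or simulated values from this design.

```python
def pipeline_delay(t_dff, t_fa, t_ls):
    """Critical path of the pipelined accumulator, per Eq. (3.1)."""
    return t_dff + t_fa + t_ls

def ripple_carry_delay(t_dff, t_fa, t_ls, n_bits):
    """Critical path of the ripple-carry accumulator, per Eq. (3.1)."""
    return t_dff + (t_fa + t_ls) * n_bits

# Hypothetical delays in picoseconds (assumptions for illustration only).
T_DFF, T_FA, T_LS = 20.0, 30.0, 10.0
N_BITS = 9

t_pipe = pipeline_delay(T_DFF, T_FA, T_LS)
t_ripple = ripple_carry_delay(T_DFF, T_FA, T_LS, N_BITS)
print(f"pipeline:     {t_pipe:.0f} ps, max clock about {1e3 / t_pipe:.1f} GHz")
print(f"ripple carry: {t_ripple:.0f} ps, max clock about {1e3 / t_ripple:.1f} GHz")
```

The pipeline delay is independent of the word length, whereas the ripple-carry delay grows
linearly with it, so the speed advantage increases with accumulator size.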
Figure 3.5: Output sine and cosine waveforms depending on the symmetry property of the sine
waveform
In this DDS, current mode logic (CML) cells have been chosen to implement the digital logic
blocks. The breakdown voltage VCEO is 1.8V and VBE is approximately 0.9V under typical bias
conditions. For a 3-level CML logic cell, the minimum power supply voltage that keeps all
the bipolar transistors in the active region is VSWING + VBE + 2 × VCESAT + VDSSAT, in
which VSWING is the output amplitude of the CML, VCESAT is the collector-emitter saturation
voltage of the SiGe bipolar transistor and VDSSAT is the saturation overdrive voltage of the
NMOS transistor in the current source. A rough estimate indicates that the CML logic can
work with a 2.7V power supply at the cost of speed. 3.3V is a more comfortable choice to
ensure that the base-collector junctions of all the bipolar transistors are reverse biased.
The current source of the CML logic uses an NMOS transistor with a degeneration resistor to
provide more headroom for the bipolar transistors. The overdrive voltage of an NMOS FET is
around 0.4 to 0.5V, which is significantly smaller than the voltage required by a normal
bipolar transistor. The optimum bias current of a CML gate depends on several factors: the
structure of the stage, the total load of the next stage and the drive strength of the
previous stage, which means the bias current of every logic gate would have to be tuned
separately. A more practical approach is to choose the same bias current for CML gates of
the same type with equal size devices. Here the bias current is set to 70% of the peak fT
current to achieve a good trade-off between speed and power consumption. Although it would
be more meaningful to use the peak fMAX current to specify the operational speed, the peak
fMAX is related to the load of the previous stage, a variable factor determined by the
circuit itself. The peak fT current can still serve as a reasonable indicator of the average
speed of the CML logic circuits. As a result, the propagation delay of the sum logic is 30ps
and that of the carry block is 25ps when the fan-out is set to 1 or 2.
3.3.3 DAC current source and switch circuits
The essential building block of the nonlinear DAC is the sine-weighted current source
matrix. The smallest unit current of each current source is 0.1mA, which should provide the
current switches with enough switching speed when toggling. The largest current in the
current source matrix is 0.7mA, which is composed of 7 identical unit current sources. The
current switch contains two differential pairs with minimum sized transistors and a cascode
transistor, which isolates the current sources from the switches and improves the bandwidth
of the entire group of switching circuits. The current source matrix provides 128
sine-weighted currents that are summed at the differential current outputs, OUTP and OUTM.
The current outputs are converted to differential voltages by a pair of off-chip 25 Ω
pull-up resistors. Fig. 3.6 shows that the currents from the cascode current sources are fed
to the outputs, OUTP and OUTM, by pairs of switches. The MSB controls the selection between
Part A and Part B during different half periods. The size of the switching transistor pairs
is chosen to be minimal in order to achieve the fastest switching speed with minimum power
consumption, and to reduce the effect of clock feed-through.
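One common way to derive sine-weighted current cells is to take the differences between
successive quantized quarter-sine levels. The sketch below follows that approach under
stated assumptions: 64 cells per quadrant, a full scale of 256 unit currents, and ordinary
rounding. These parameters and the function name sine_weights are assumptions for
illustration; the weights actually used on the chip and their switching order are not
derived here.

```python
import math

def sine_weights(n_cells=64, full_scale=256):
    """Unit-current count of each cell so that cumulative sums trace a quarter sine."""
    levels = [round(full_scale * math.sin(math.pi / 2 * k / n_cells))
              for k in range(n_cells + 1)]
    # Each cell contributes the step between two adjacent quantized levels.
    return [levels[k + 1] - levels[k] for k in range(n_cells)]

weights = sine_weights()
print(weights)
print("total unit currents:", sum(weights))
```

Under these assumptions no cell needs more than 7 unit currents, which agrees with the
largest cell current of 0.7mA at a 0.1mA unit current quoted above.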
Figure 3.6: DAC current switch circuit
In the current steering DAC, the impedance Zimp seen at the drain of the switch transistors
of each current cell must be large enough that its impact on the integral non-linearity
(INL) specification of the DAC can be tolerated [23]. However, Zimp is frequency dependent.
To obtain 8-bit output resolution, Zimp should be about 500 kΩ. When the frequency increases
above 100MHz, a cascode current source is needed to meet the requirement for Zimp. Device
matching is one of the important factors that affect the static and dynamic performance of
the DAC. The matching properties of SiGe HBT bipolar transistors are normally one order of
magnitude better than those of MOSFETs with similar feature sizes. Carefully choosing
current source transistor sizes and positions, and increasing the widths of the
interconnections to reduce IR drops, helps to reduce matching errors. For long
interconnections carrying global signals such as the clock and the MSB phase word,
transmission line effects are taken into consideration during the layout. In order to
minimize parasitic capacitances and inductances, top metal layers are used for global signal
routing.
Power consumption of a DDS in the GHz range is always a severe problem due to the scale of
the circuit and the high current density that the transistors require at these frequencies.
To increase the operational speed, the current flow in the transistors should be increased
proportionally to overcome both the transistor parasitics and the interconnection loads.
With a scaled-down device feature size, the latter plays a significant part in the total
delay at relatively high speeds. The advantage of using bipolar transistors over their CMOS
counterparts is that a bipolar transistor provides higher current gain while maintaining a
reasonable size. A CMOS transistor, on the other hand, needs a larger area to generate the
same amount of current. The more meaningful specification of the bipolar transistor is
fMAX, which takes into account the base resistance and provides insight into the CML circuit
speed under normal operation. The relatively high power density of DDS chips makes it
extremely difficult to quickly remove heat without an external air flow. According to the
ceramic package θJA specification, it is quite normal to get a 20 to 30 degree Celsius
temperature increase per watt dissipated on the chip. If only the bare die area is
considered, the situation is even worse, because a total power of approximately 2W would be
concentrated on a small die area of 10mm2. Thus the indicated power density would exceed
100W/inch2, which is an alarming number that normally appears only in high performance
processors. Without a heat sink, the chip temperature could reach 85 degrees Celsius at an
ambient temperature of 27 degrees. For this reason, an external fan is used to cool down the
device when making measurements.
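The thermal figures quoted above follow from simple arithmetic, sketched below. The function
names are hypothetical, and the θJA value of 29 degrees Celsius per watt is an assumption
picked from within the 20 to 30 range stated in the text.

```python
MM2_PER_INCH2 = 25.4 ** 2   # square millimetres in one square inch

def power_density_w_per_inch2(power_w, area_mm2):
    """Convert a power dissipated over an area in mm2 to W per square inch."""
    return power_w / area_mm2 * MM2_PER_INCH2

def junction_temperature_c(ambient_c, power_w, theta_ja_c_per_w):
    """Steady-state junction temperature for a given package thermal resistance."""
    return ambient_c + power_w * theta_ja_c_per_w

# 2 W on a 10 mm2 die, 27 degree Celsius ambient (values from the text).
print(f"{power_density_w_per_inch2(2.0, 10.0):.0f} W/inch2")
print(f"{junction_temperature_c(27.0, 2.0, 29.0):.0f} degrees Celsius")
```

With 2W on 10mm2 the density is roughly 129 W/inch2, consistent with the statement that it
exceeds 100W/inch2.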
3.3.4 Clock tree and MSB tree designs
The most challenging parts of the design are the clock tree and MSB buffer tree designs. To
eliminate glitches due to code errors induced by clock skews, the clock trees are carefully
balanced to ensure synchronization and drive capability. Because the differential clock
signals drive every flip-flop cell, and the total number of flip-flops is above 200,
synchronization of the clock signals is not a trivial task. With a clock input frequency
around 10GHz, the fan-out ratio is only around 3 to 4, and so the depth of the clock buffer
chain is at least 6. To fully turn on or turn off the differential bipolar pairs, the input
differential peak-to-peak voltage swing should be more than 6VT+IE*RE, in which VT is the
thermal voltage, 26mV at room temperature, and IE and RE are the emitter current and emitter
resistance of the bipolar transistor, respectively. The voltage swing also depends on the
junction temperature, which can exceed 80 degrees Celsius under normal operating conditions.
The clock signals at the flip-flop cells should swing no less than 150mV, which is
approximately 6VT at room temperature. Since every switching cell in the DAC has an MSB
signal, the total number of gates to be driven exceeds 120. This MSB signal must also be
synchronized with the other decoded digital bits. The depth of the buffer chain for the MSB
signal is 5. To accomplish all of this, the clock and MSB buffers require careful
consideration in order to ensure that the loads at each intermediate and end point are well
balanced.
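The temperature dependence of the 6VT+IE*RE swing requirement can be illustrated with the
thermal voltage VT = kT/q. This is a minimal sketch; the function names are hypothetical and
the IE*RE term is left at zero because the emitter current and resistance are not given in
the text.

```python
BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C

def thermal_voltage(temp_c):
    """Thermal voltage VT = kT/q in volts for a temperature in degrees Celsius."""
    return BOLTZMANN * (temp_c + 273.15) / ELECTRON_CHARGE

def min_diff_swing(temp_c, i_e=0.0, r_e=0.0):
    """Minimum differential swing 6*VT + IE*RE in volts."""
    return 6 * thermal_voltage(temp_c) + i_e * r_e

for temp_c in (27.0, 80.0):
    print(f"{temp_c:.0f} C: 6VT = {min_diff_swing(temp_c) * 1e3:.0f} mV")
```

The requirement rises from roughly 155mV at 27 degrees Celsius to roughly 183mV at 80
degrees, so a swing sized for room temperature leaves little margin on a hot die.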
3.3.5 Layout considerations
When running at a 10GHz clock rate, layout plays an important role in assuring that the
final design meets the expected speed requirement. The current source matrix and the
switching matrix are laid out separately and isolated from one another using a deep oxide
trench to reduce the noise coupling from the digital part to the current sources through
the substrate. The output of the DAC is placed close to the output pins to reduce the
interference from the other parts of the circuit. Differential pair signals are routed in a
symmetrical manner such that the signal travel lengths are almost the same. In order to make
the layout compact and easy to assemble, the CML building blocks have the same height, and
the power supply distribution stacks several metal layers to reduce the width of the metal.
Table 3.1 indicates the number of unit current sources in each sine-weighted current
source. The sum of each row is the same, which assures the regularity of the current source
array, as well as its compactness. The Cadence SKILL language has been utilized to generate
the connections that combine the unit current sources into the sine-weighted current
sources, in accordance with the given switching sequence. Hence, the INL of the nonlinear
DAC due to symmetrical and gradient errors is optimized.
2 4 6 6 6 5 3 0
0 3 5 6 6 6 4 2
2 4 6 6 6 5 2 1
1 3 5 6 6 6 3 2
3 4 6 3 6 5 3 2
1 3 5 6 7 5 4 1
2 4 5 6 6 5 3 1
0 4 4 6 7 5 5 1
Table 3.1: Number of unit current sources in each sine-weighted current source
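The equal row sums of Table 3.1 can be confirmed with a few lines of code. The matrix below
is copied from the table; the variable names are arbitrary.

```python
# Entries of Table 3.1: unit current sources per sine-weighted cell, row by row.
TABLE_3_1 = [
    [2, 4, 6, 6, 6, 5, 3, 0],
    [0, 3, 5, 6, 6, 6, 4, 2],
    [2, 4, 6, 6, 6, 5, 2, 1],
    [1, 3, 5, 6, 6, 6, 3, 2],
    [3, 4, 6, 3, 6, 5, 3, 2],
    [1, 3, 5, 6, 7, 5, 4, 1],
    [2, 4, 5, 6, 6, 5, 3, 1],
    [0, 4, 4, 6, 7, 5, 5, 1],
]

row_sums = [sum(row) for row in TABLE_3_1]
total_units = sum(row_sums)
print("row sums:", row_sums)
print("total unit current sources:", total_units)
```

Every row holds 32 unit sources, for 256 in total across the 64 cells.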
In the layout, two dummy rows and columns have been added around the current source array to
avoid edge effects. To minimize the systematic error introduced by the voltage drop in the
ground lines of the current-source transistors, sufficiently wide lines have been used. The
clock inputs are differential CML compatible signals, and multiple clock inputs are provided
to reduce the parasitic inductance resulting from the pins. The maximum delay of the metal
wires on the chip is about 40ps, and the clock tree is carefully built to ensure an
acceptable clock skew.
3.4 Measured results
The die photo of the quadrature DDS RFIC is shown in Fig. 3.7. This DDS design is quite
compact, with an active area of 2.3 x 2.5 mm2 and a total die area of 3 x 3 mm2. The DDS
MMIC was packaged in a 48-pin ceramic leadless package. The test board was built on Rogers
RO4003 laminate, which has a loss tangent of less than 0.003 and good temperature stability.
To convert the single-ended signal to differential clock inputs, a 180 degree 3dB hybrid
coupler is employed at the clock input. For the differential outputs, a second hybrid
coupler is inserted into the output path to convert them into a single-ended signal for
testing. Fig. 3.8 illustrates the test setup.
We first tested the output of a single-phase DDS RFIC. In a separate design, we implemented
a single-phase DDS RFIC that was tested at a maximum clock frequency of 11GHz with a power
consumption of 1.9W. At Nyquist rate, the single phase DDS can operate at a maximum clock
frequency of 9.6GHz, which corresponds to a record high power efficiency FOM of 5.1GHz/W
[30]. Fig. 3.9 illustrates the measured single-phase DDS output spectrum with a 2.227GHz
output and a 9.07GHz clock. The measured output power is approximately -16dBm. All
measurements were done without calibrating out the losses of the cables and PCB tracks.
Figs. 3.10 to 3.13 illustrate the measured quadrature DDS output spectra and waveforms for
different output and clock frequencies. Fig. 3.10 presents the 0.397GHz quadrature DDS
output spectrum with a 5.44GHz clock input. The measured output power is approximately
-4.67dBm. Fig. 3.11 demonstrates the highest operational clock frequency of the quadrature
DDS, 6.815GHz, with a close to Nyquist output of 3.394GHz. The measured SFDR of the device,
at a 3.394GHz output frequency with a 6.815GHz clock, is around 30dBc. For Fig. 3.11, the
FCW is chosen as 2^8-1, which is the maximum allowed by an 8-bit FCW input. Thus, the output
frequency is set at (2^8-1)/2^9 of the clock frequency, or about 3.394GHz. The first order
image tone produced by mixing the clock frequency and the DDS output frequency occurs at
6.815GHz - 3.394GHz = 3.421GHz, which is 27MHz apart from the output frequency, as shown in
Fig. 3.11. Operating the DDS close to the Nyquist rate makes it very hard to filter out the
image tones. Practically, the DDS output frequency is restricted to less than 3/8 of the
clock frequency.
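The near-Nyquist operating point of Fig. 3.11 can be reproduced from the standard DDS tuning
relation f_out = FCW * f_clk / 2^L with the 9-bit accumulator. The sketch below assumes that
relation and an ideal sampled output whose first image lies at f_clk - f_out; the function
name is hypothetical.

```python
def dds_output_hz(fcw, f_clk_hz, acc_bits=9):
    """Standard DDS tuning relation f_out = FCW * f_clk / 2**L."""
    return fcw * f_clk_hz / 2 ** acc_bits

f_clk = 6.815e9          # clock frequency of Fig. 3.11
fcw = 2 ** 8 - 1         # maximum 8-bit frequency control word

f_out = dds_output_hz(fcw, f_clk)
f_image = f_clk - f_out  # first-order image tone
print(f"f_out   = {f_out / 1e9:.3f} GHz")
print(f"f_image = {f_image / 1e9:.3f} GHz")
print(f"spacing = {(f_image - f_out) / 1e6:.1f} MHz")
```

The spacing equals f_clk / 2^8, about 27MHz here, which shows how little room a
reconstruction filter has at this setting.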
Figure 3.7: Die photo of the quadrature DDS MMIC
Figure 3.8: Test setup of the quadrature DDS RFIC
Figure 3.9: Measured single phase DDS output spectrum with clock at 9.07 GHz and output at
2.227GHz
Figure 3.10: Measured quadrature phase DDS output spectrum with clock at 5.44 GHz and
output at 0.397GHz
Fig. 3.12 illustrates the measured output waveforms of the quadrature DDS at 389 MHz with a
6.2GHz clock. The time-domain waveform measurements were limited by the digital sampling
scope's 500 MHz bandwidth. Using a 6GS/s digital sampling scope, Fig. 3.13 provides the
output waveforms of the quadrature DDS RFIC with a 6.3GHz clock input frequency and a
1.58GHz output. The measured I/Q waveforms demonstrate the 90 degree phase difference
between the quadrature DDS RFIC outputs.
Table 3.2 provides a performance comparison of recently published microwave band DDS
designs. The designs reported in [21] and [22] used InP technologies with an fT/fmax above
300/300 GHz, which is almost triple those reported here. The InP DDS [9] employs an 8-bit
accumulator and an 8-bit DAC and operates at a maximum clock frequency of 9.2 GHz with a
power consumption of 15W. On the other hand, the single phase SiGe DDS [30], which has an
8-bit DAC and a 9-bit accumulator, consumes only 1.9W with 3.3V and 4V power supplies for
the digital and analog circuits, respectively. For this design, the digital portion of the
DAC consumes 300mA and the accumulator consumes 250mA from the 3.3V supply. The analog
portion of the DAC consumes 35mA from a 4.0V supply.
Figure 3.11: Measured quadrature phase DDS output spectrum with clock at 6.815 GHz and
output at 3.394GHz
This is the first mm-wave quadrature DDS design reported so far [34]. The quadrature DDS
RFIC contains more than 13500 active devices in a quite compact die. The active area of the
quadrature DDS RFIC is about 2.3 x 2.5 mm2 and its total die area is 3 x 3 mm2. When
compared with other single-phase mm-wave DDSs [9, 21], this design achieves similar SFDR
performance. It is more complex, yet more compact and consumes less power, as shown in
Table 3.2. The minimum size of the InP transistor is much larger than that of the SiGe
transistor. Although the current densities needed to achieve the peak fT in InP and SiGe
technologies are similar, the current required to operate the minimum size SiGe transistor
is only one third of that of the InP transistor. It is for this reason that the SiGe DDS
achieves superior power efficiency.
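The efficiency rows of Table 3.2 appear to follow from its other rows. The sketch below
assumes that power efficiency is the number of output phases times the maximum clock
frequency divided by power, and that area efficiency is the total number of DAC bits divided
by die area. These definitions are inferred, not stated in the text. They reproduce the
tabulated entries for [21], [22], [30] and [34]; for [9] the power efficiency comes out at
about 0.61 against the listed 0.5, so the definition used for that entry may differ.

```python
# Rows copied from Table 3.2:
# (max clock [GHz], power [W], output phases, DAC bits, die area [mm2])
DESIGNS = {
    "[9]":  (9.2, 15.0, 1, 7, 8 * 5),
    "[21]": (13.0, 5.42, 1, 7, 2.7 * 1.45),
    "[22]": (32.0, 9.45, 1, 5, 2.7 * 1.45),
    "[30]": (12.0, 1.9, 1, 8, 2.3 * 0.7),
    "[34]": (6.8, 2.5, 2, 8, 2.3 * 2.5),
}

def power_efficiency(ref):
    """Assumed definition: phases * max clock / power, in Phase-GHz/W."""
    f_clk, power, phases, _, _ = DESIGNS[ref]
    return phases * f_clk / power

def area_efficiency(ref):
    """Assumed definition: phases * DAC bits / die area, in DAC-bit/mm2."""
    _, _, phases, dac_bits, area = DESIGNS[ref]
    return phases * dac_bits / area

for ref in DESIGNS:
    print(ref, round(power_efficiency(ref), 3), round(area_efficiency(ref), 3))
```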
Figure 3.12: Measured DDS output waveforms without deglitch filter at 0.389GHz with clock
at 6.2GHz
Figure 3.13: Measured quadrature DDS output waveforms at 1.58GHz with clock at 6.3GHz
Reference                              [9]       [21]       [22]       [30]          [34]
Technology                             InP       InP        InP        SiGe          SiGe
fT/fmax [GHz]                          137/267   300/300    300/300    120/100       120/100
Emitter area of min size npn [µm2]     1.5x4     0.4x2      0.4x2      0.2x0.64      0.2x0.64
Peak fT current of min size npn [mA]   7.2       4          4          0.77          0.77
Quadrature phase outputs               No        No         No         No            Yes
Accumulator size [bit]                 8         8          8          9             9
DAC resolution [bit]                   7         7          5          8             8
Max clock frequency [GHz]              9.2       13         32         12            6.8
SFDR [dBc]                             30        26.67      21.56      30            26
Power consumption [W]                  15        5.42       9.45       1.9           2.5
Number of transistors                  3000      1646       1891       9600          13500
Die size [mm2]                         8x5       2.7x1.45   2.7x1.45   2.3x0.7       2.3x2.5
Power efficiency [Phase-GHz/W]         0.5       2.4        3.386      6.3           5.44
Area efficiency [DAC-bit/mm2]          0.175     1.79       1.28       4.97 w. VCO   2.78 w. VCO
Tested prototypes                      wafer     wafer      wafer      packaged      packaged
Table 3.2: ULTRA-HIGH SPEED DDS PERFORMANCE COMPARISON. THIS WORK IS QUOTED FOR
SINGLE-PHASE / QUADRATURE-PHASE DDS DESIGNS
3.5 Conclusion
In this chapter, a 9-bit, 6.2GHz low power quadrature DDS has been implemented in a 0.18 µm
SiGe BiCMOS technology. With a 9-bit pipeline accumulator and two 8-bit sine-weighted
current steering DACs, this DDS is capable of generating quadrature sinusoidal waveforms up
to 3.15GHz with a maximum clock frequency of 6.2GHz. Packing more than 13500 transistors,
the quadrature DDS occupies an active area of 2.3 x 2.5 mm2 and a total die area of
3.0 x 3.0 mm2. The measured SFDR is about 26dBc at a clock frequency of 6.2GHz. At the
maximum clock frequency, the power consumption of the DDS is 2.5W with 3.3V and 4.0V power
supplies for the digital and analog parts, respectively. The DDS thus achieves a power
efficiency figure of merit (FOM) of 5.04GHz/W/Phase. The DDS chips were packaged in 48-pin
ceramic LCC carriers, and air cooling was used during the measurements.
Chapter 4
QUADRATURE PHASES SIGE DDS WITH UP-CONVERSION
4.1 Introduction
In next generation radar systems, there is an emerging trend toward digitization in radar
receiver designs by applying direct intermediate frequency-to-digital conversion (IF
sampling) and direct digital synthesis (DDS). Digital radar receivers can achieve much
higher precision, lower noise, lower power and better stability than their analog
counterparts. Moreover, they retain the flexibility of digital techniques such as direct
digital modulation and waveform generation. A DDS generates a digitized waveform of a given
frequency by accumulating phase changes at a higher clock frequency. Microwave range DDSs
have been developed in both InP and SiGe technologies [9, 21, 22] with output frequencies up
to 10GHz. It is highly desirable to develop frequency synthesis means for X/Ku-band
applications. By mixing the outputs of a quadrature DDS (QDDS) and a quadrature VCO,
X/Ku-band waveform generation can be achieved.
4.2 Architecture and circuit design
The conceptual diagram of the frequency synthesizer is shown in Fig. 4.1. The local
oscillator generates quadrature outputs with a relatively fixed output frequency ω0, which
are mixed with the outputs of a quadrature DDS at frequency ω. The mixer outputs are then
summed and subtracted, so that the up-converted and down-converted waveforms with
frequencies of ω0+ω and ω0-ω are derived [6]. Assuming the local oscillator frequency is
higher than the output frequency of the quadrature DDS, the above mixing scheme can be used
to up convert the DDS output frequency to a higher frequency band.
Figure 4.1: Concept diagram of the frequency synthesizer
Theoretically the output should be free of alias images. However, in practice the DDS
output contains harmonics and spurs that significantly deteriorate the purity of the
desired output waveforms. The imperfections of the mixers due to leakage and second order
effects introduce spurs and harmonics into the output signals. In a multi-GHz DDS design,
this is further complicated by the impact of the large scale circuit and the high power
dissipation.
The frequency synthesizer contains three major parts: the quadrature DDS, the quadrature
VCO and the mixers. The DDS can provide quadrature signals with accurate I/Q matching, as
shown in Fig. 4.2. The quadrature DDS is formed by merging two sine-weighted current
steering DACs and a 9-bit pipeline accumulator. The nonlinear DAC approach is still
attractive for microwave DDS design because it provides a drastic speed improvement over
ROM based or algorithm based DDS designs. The spurs introduced by amplitude errors
therefore need to be taken into account during the design of an ultra high speed DDS. The
spurs introduced by phase truncation are already minimized because only one bit of the
phase accumulator output is truncated.
Figure 4.2: Block diagram with the circuit of the quadrature VCO
The input frequency control word (FCW) specifies the output frequency of the quadrature
DDS. The output of the quadrature VCO is tuned to 11.7 GHz and is also divided by 2 to
generate 5.85 GHz for potential use as the DDS clock. The quadrature VCO adopts a standard
cross-coupled LC-VCO topology. The center-tapped inductor in the LC tank has been replaced
by four transmission line inductors to facilitate a symmetrical and compact layout.
However, their Q factor is lower than that of typical spiral inductors, which needs to be
accounted for in the design. To reduce the losses of the inductors, thick analog metals
are used for the connections between the transmission line inductors.
To produce the 90 degree phase word, the binary number "01" needs to be added to the two
most significant bits of the DAC input. Translated to gate level, the MSB output is the
Exclusive-OR (XOR) of the two MSB inputs, and the 2nd MSB output is the inversion of the
2nd MSB input. Because all the digital logic has differential outputs, the inversion is
obtained by swapping the differential lines, so only one XOR gate needs to be inserted at
the inputs of the sine-weighted DAC to convert it into a DAC with a 90 degree output phase
difference.
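The gate-level equivalence described above can be checked exhaustively. The following is a minimal sketch in Python and not the dissertation's circuit; the function names `add_quarter_turn` and `gate_level` and the 8-bit default width are assumptions made for illustration.

```python
def add_quarter_turn(phase: int, bits: int = 8) -> int:
    """Add binary '01' to the two MSBs, i.e. advance the phase by 90 degrees."""
    return (phase + (1 << (bits - 2))) % (1 << bits)


def gate_level(phase: int, bits: int = 8) -> int:
    """Same operation using one XOR and one inversion on the two MSBs."""
    msb = (phase >> (bits - 1)) & 1
    msb2 = (phase >> (bits - 2)) & 1
    lower = phase & ((1 << (bits - 2)) - 1)   # bits below the two MSBs pass through
    new_msb = msb ^ msb2    # the carry out of the 2nd MSB equals the 2nd MSB itself
    new_msb2 = msb2 ^ 1     # adding 1 to a single bit inverts it
    return (new_msb << (bits - 1)) | (new_msb2 << (bits - 2)) | lower


# Exhaustive check over every 8-bit phase word.
assert all(add_quarter_turn(p) == gate_level(p) for p in range(256))
```

The check passes for every code because the carry into the MSB is generated only when the 2nd MSB is already set, which is exactly the XOR condition.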
The essential building block of the nonlinear DAC is the sine-weighted current source
matrix. The smallest unit current of each current source is 0.1 mA, which should give the
current switches enough switching speed when toggling. The largest current in the current
source is 0.7 mA, which is composed of 7 identical current sources. The current switch
contains two differential pairs with minimum-sized transistors and a cascode transistor
that isolates the current sources from the switches and improves the bandwidth of the
entire group of switching circuits. In this ultra-high speed DDS design, a ROM-less
structure with two nonlinear current-steering DACs is employed, and the sine/cosine
mapping function is performed by a sine-weighted DAC instead of the traditional ROM-based
sine look-up table. Eliminating the ROM improves the speed of the DDS and reduces the
power consumption. This quadrature DDS comprises a 9-bit pipeline accumulator and two
8-bit sine-weighted current-steering DACs. To produce the 90 degree phase, an XOR gate is
inserted at the inputs of one sine-weighted DAC. Since the output frequency cannot exceed
the Nyquist frequency, an 8-bit frequency control word (FCW) is fed into the 9-bit
pipeline accumulator with the MSB of the accumulator input tied to zero.
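The effect of tying the accumulator input MSB to zero can be illustrated with the standard DDS tuning equation, f_out = FCW * f_clk / 2^N. The sketch below is illustrative only: the helper names are hypothetical, and the 4.6 GHz clock is simply the value used in the measurements of Section 4.3.

```python
ACC_BITS = 9   # accumulator width stated in the text
FCW_BITS = 8   # FCW width; the accumulator input MSB is tied to zero


def output_frequency(fcw: int, f_clk_hz: float) -> float:
    """Standard DDS tuning equation: f_out = FCW * f_clk / 2**N."""
    if not 0 <= fcw < (1 << FCW_BITS):
        raise ValueError("FCW must fit in 8 bits")
    return fcw * f_clk_hz / (1 << ACC_BITS)


def accumulate(fcw: int, steps: int) -> list[int]:
    """Successive phase accumulator states, wrapping modulo 2**N."""
    phase, states = 0, []
    for _ in range(steps):
        states.append(phase)
        phase = (phase + fcw) % (1 << ACC_BITS)
    return states


f_clk = 4.6e9
print(output_frequency(1, f_clk))     # frequency step, f_clk / 512
print(output_frequency(255, f_clk))   # largest FCW, still below f_clk / 2
```

Because the largest 8-bit FCW is 255, the phase never advances by more than 255/512 of a cycle per clock, which keeps the output below half the clock frequency.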
Figure 4.3: Circuits of up-convert and down-convert mixers
The LSB of the 9-bit phase word is truncated, and its MSB is used to provide the proper
mirroring of the sine waveform about the π phase point. Its 2nd MSB is used to invert the
remaining 6 bits for the 2nd and 4th quadrants of the sine wave prior to the decoding
logic. The outputs of the 3:8 column and row decoders go to the switch matrix to control
the switches in each DAC cell. The latch and switch matrices contain 64 cells.
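The quadrant handling described above can be sketched in a few lines. This is a behavioral illustration under stated assumptions, not the implemented logic: `math.sin` stands in for the sine-weighted current source matrix, a half-LSB phase offset is assumed so that bit inversion mirrors the quarter wave exactly, and the assignment of the upper 3 bits to the row decoder and the lower 3 bits to the column decoder is hypothetical.

```python
import math


def decode_phase(phase8: int) -> tuple[int, int, int, int]:
    """Split an 8-bit phase word into (sign, flip, row, column)."""
    sign = (phase8 >> 7) & 1     # MSB: second half of the period
    flip = (phase8 >> 6) & 1     # 2nd MSB: 2nd and 4th quadrants
    low6 = phase8 & 0x3F
    if flip:
        low6 ^= 0x3F             # invert the remaining 6 bits
    return sign, flip, low6 >> 3, low6 & 0x7   # two 3-bit fields for the 3:8 decoders


def sine_sample(phase8: int) -> float:
    """Reconstruct a full-wave sine value from quarter-wave data only."""
    sign, _, row, col = decode_phase(phase8)
    index = (row << 3) | col     # selects one of 64 quarter-wave cells
    # Assumption: half-LSB phase offset, so the inversion mirrors exactly.
    magnitude = math.sin((index + 0.5) * math.pi / 128)
    return -magnitude if sign else magnitude
```

With these assumptions, `sine_sample(p)` matches `math.sin(2 * math.pi * (p + 0.5) / 256)` for every 8-bit phase `p`, so only 64 magnitude cells are needed for the whole period.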
The implemented ultra-high speed DDS presents the first mm-wave quadrature DDS design
reported so far. Compared with other single-phase mm-wave DDSs [9, 21, 22], it is more
complex, yet more compact and lower in power, as shown in Table 4.1. The minimum size of
the InP transistor is much larger than that of the SiGe transistor. Although the current
density needed to achieve peak fT in InP and SiGe technologies is similar, the current
required to operate the minimum-size SiGe transistor is much lower. For this reason, the
SiGe DDS achieves superior power efficiency.
Reference                             [9]       [21]       [30, 34] (this work)
Technology                            InP       InP        SiGe
fT/fmax [GHz]                         137/267   300/300    120/100
Emitter area of min npn [μm²]         1.5x4     0.4x2      0.2x0.64
Current density at peak fT [mA/μm²]   1.2       5          6
Peak fT current of min npn [mA]       7.2       4          0.77
Breakdown voltage BVceo [V]           8         4          1.8
Accumulator size [bit]                8         8          9
DAC resolution [bit]                  7         7          8
Max clock frequency [GHz]             9.2       13         9.6/6.3
SFDR [dBc]                            30        26.67      30/26
Power consumption [W]                 15        5.42       1.9/2.5
Number of transistors                 3000      1646       9600/13500
Die size [mm²]                        8x5       2.7x1.45   2.3x0.7/2.3x2.5
FOM [GHz/W/phase]                     0.5       2.4        5.1/5.04

Table 4.1: ULTRA-HIGH SPEED DDS PERFORMANCE COMPARISON. THIS WORK
IS QUOTED FOR SINGLE-PHASE / QUADRATURE-PHASE.
To shorten the connections to the mixers and make the layout as symmetrical as possible,
the mixers are placed in the center of the chip, and the two VCOs and two sine-weighted
DACs are placed on opposite sides of the mixers. The die photo of the frequency
synthesizer is shown in Fig. 4.4. The active area is approximately 2.5x2.5 mm².
4.3 Measured results
The tests are performed on chips in ceramic leadless free packages. The test board was
built on Rogers RO4003 laminate, which has a loss tangent of less than 0.003 and good
temperature stability. To convert the single-ended signal to differential clock inputs, a
180 degree 3 dB hybrid coupler is employed at the clock input. For the differential
outputs, a second hybrid coupler is inserted into the output path to convert them to a
single-ended signal for testing. To keep the chips operating in the safe range, external
air cooling is used. The I/Q waveforms measured with a digital oscilloscope confirm the
90 degree phase difference between the outputs of the quadrature DDS, as shown in
Fig. 4.5. The measured amplitude imbalance is 5% and the phase imbalance is 2 degrees.

Figure 4.4: Frequency synthesizer die photo
Figure 4.5: Measured 37 MHz output waveforms with a 6.4 GHz QDDS
The spectra of the frequency synthesizer outputs shown in Figs. 4.6, 4.7, and 4.8 are
taken at the down-convert output without calibrating out the attenuation. Fig. 4.6 shows
the output spectrum with the 11.7 GHz VCO output and the 4.6 GHz DDS clock input when the
DDS is turned off. The clock power leaked to the mixer output is -50 dBm, and the leaked
local oscillator power is -41 dBm. The spur at 5.85 GHz is purely due to leakage of the
divide-by-2 output of the local oscillator, which contains a built-in divider for test
purposes. The divided output of the local oscillator is attenuated by 27 dB.
Figure 4.6: Measured output spectra of 4.6 GHz QDDS clock input and 11.7 GHz LO output
Figure 4.7: Measured output spectra of 4.6 GHz QDDS clock input and 2.3 GHz QDDS output
Fig. 4.7 shows the Nyquist output spectrum of the DDS with a 4.6 GHz clock input when the
local oscillator is turned off. The DDS output is located close to 2.31 GHz. Since the
measurement is taken at the mixer output, the DDS output power is significantly reduced.
The single-output power of the quadrature DDS is approximately -53.75 dBm.
In Fig. 4.8, the local oscillator and the DDS are both switched on, and the spectrum of
the mixed output is shown. The down-converted signal is at 9.4 GHz with a power of
-35.24 dBm, and the up-converted 14.0 GHz signal is also visible with a power of -46 dBm.
During the measurement, one of the outputs of the quadrature VCO shows strong distortion.
One possible explanation is that the interconnection wires connecting the transmission
line inductors in the LC tanks induce inductive peaking and drive some of the transistors
into saturation. The single-sideband suppression is also affected by the imbalance of the
integrated quadrature VCO, which will be improved in the next version.

Figure 4.8: Measured down-converted 9.4 GHz output
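The sideband frequencies quoted in the measurements follow from simple mixer arithmetic. As a minimal illustration (the function name `sidebands` is hypothetical; the 11.7 GHz LO and the roughly 2.3 GHz DDS output are the values reported in this section):

```python
def sidebands(f_lo_ghz: float, f_dds_ghz: float) -> tuple[float, float]:
    """Return the (down-converted, up-converted) frequencies in GHz."""
    return f_lo_ghz - f_dds_ghz, f_lo_ghz + f_dds_ghz


down, up = sidebands(11.7, 2.3)
print(f"{down:.1f} GHz, {up:.1f} GHz")   # prints "9.4 GHz, 14.0 GHz"
```

Sweeping the DDS output from near DC up to about 2.3 GHz moves the lower sideband between 11.7 GHz and 9.4 GHz and the upper sideband between 11.7 GHz and 14.0 GHz.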
4.4 Conclusion
In this chapter, an X/Ku-band fine-tuning frequency synthesizer using a quadrature DDS
has been implemented in a 0.18 μm SiGe BiCMOS technology. The frequency synthesizer
comprises a 9-bit quadrature DDS, an 11.7 GHz quadrature VCO, and image rejection mixers.
The outputs of the quadrature DDS are down-converted to 9.4–11.7 GHz and up-converted to
11.7–14.0 GHz, respectively. The die area of the synthesizer is 3.0x3.0 mm² and the power
consumption is 2.6 W under a 3.3 V supply. The chip is measured in a 48-pin leadless free
ceramic package with external cooling.
Chapter 5
RING OSCILLATOR BASED PERIODICAL WAVEFORM GENERATOR
5.1 Introduction
Arbitrary periodic waveform generation is highly desirable for certain wireless and radar
applications. Generating arbitrary waveforms at several GHz usually requires a
power-hungry direct digital synthesizer (DDS) [4, 22]. A DDS for waveform generation
normally comprises a digital-to-analog converter (DAC), which converts the digital
amplitude inputs to analog amplitude outputs, and digital control circuits, which retrieve
data stored in a RAM and pass it to the DAC inputs. The amplitude data can also be
calculated directly by a signal processor. However, running the signal processor or RAM
controller at the same refresh rate as the input clock becomes extremely difficult because
of restrictions on power consumption and implementation complexity. To overcome the
difficulty of feeding digital amplitude signals to the DAC at multi-GHz rates, designers
are forced to adopt some form of simplification. In [17], an accumulator is applied to the
input of a 6-bit DAC to generate a ramp waveform for test purposes. With an on-chip
programmable ROM, [1] demonstrates the capability of synthesizing a specific waveform for
ultra-wideband (UWB) transmission. However, the distributed nature of millimeter-wave
signals has been overlooked in [17] and [1]. In [35], a structure similar to an equalizer
is used to combine multiple narrow pulses triggered by a delayed clock signal to form the
envelope of the target waveform. The works mentioned above provide some insight into the
problem of high speed waveform generation and motivated us to seek alternative solutions.
One goal is to simplify the structure so that the power and area requirements can be
minimized. The fact that a ring oscillator automatically generates multiple clock phases
can be exploited further. By combining a programmable DAC with a ring oscillator topology,
a more compact waveform generator is formed [32].

Figure 5.1: One cycle of the waveform with constant sampling step
5.2 Waveform generator architectures
The principle of periodic waveform generation can be explained with basic sampling
theory. As shown in Fig. 5.1, one cycle of a periodic waveform can be expressed as

$$f(t) = \sum_{k=0}^{n} a_k x_k \, u(t - kT), \qquad nT \le t < (n+1)T \qquad (5.1)$$

where $T$ is the period of the sampling function, $NT$ is the period of the waveform,
$u(t)$ is the unit step function, and $a_k x_k$ is the incremental value of each step;
$a_k$ is the weight coefficient and $x_k$ is the binary value. Assume that $a_k x_k$ does
not depend explicitly on time, so the sampled waveform value can be written as
$y_n = \sum_{k=0}^{n} a_k x_k$. By accumulating $a_k x_k$ with a period of $T$, an
arbitrary waveform can be synthesized by substituting the values of the set
$\{a_k x_k\}$. The waveform generator is conventionally implemented using a global
sampling clock and a phase accumulator. Given the multiphase synthesizing capability of a
ring oscillator, it is possible to localize the global sampling clock. The outputs of the
ring oscillator, with their different phase delays, provide proper time stamps for
sampling.

Figure 5.2: Block diagram of the ring oscillator waveform generator
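The accumulation of step increments described in this section can be sketched as a running sum. The code below is an illustration under stated assumptions rather than a model of the chip: the function names are hypothetical, the increments are arbitrary example values, and only one cycle (0 <= t < N*T) is evaluated.

```python
from itertools import accumulate


def sampled_levels(increments: list[float]) -> list[float]:
    """y_n = sum of a_k * x_k for k = 0..n, for each sample index n."""
    return list(accumulate(increments))


def waveform(t: float, increments: list[float], T: float) -> float:
    """Evaluate the staircase f(t) within one cycle, 0 <= t < N*T."""
    n = int(t // T)
    if not 0 <= n < len(increments):
        raise ValueError("t lies outside one cycle of the waveform")
    return sampled_levels(increments)[n]


# Hypothetical 8-step example: each entry is one signed increment a_k * x_k.
steps = [1, 2, 1, 0, -1, -2, -1, 0]
print(sampled_levels(steps))   # [1, 3, 4, 4, 3, 1, 0, 0]
```

Choosing a different set of increments changes the synthesized shape without changing the sampling instants, which is the property the ring oscillator stages provide in hardware.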
In this work, a waveform generator structure that eliminates the external clock and the
phase accumulator has been implemented. As shown in Fig. 5.2, the waveform generator
contains two major parts: a 16-stage ring oscillator with 16 current switch cells, and 16
programmable current sources. A 64-bit shift register chain stores the current weight data
that sets the current for each of the current switch cells. The clock distribution network
of a conventional DDS is replaced by the 16-stage ring oscillator, which naturally
provides multiphase clocks with the exact sample time stamps. The clock signals propagate
along the stages and can be directly employed to switch the weighted currents one by one
at the desired sample time points. The currents that flow through the current switches are
weighted according to the data stored in the shift register. The current outputs are then
summed by connecting all the output nodes of the ring oscillator stages together.
In the ring oscillator, the rising and falling edges of the clock propagate along the
chain of delay buffers. The half period of the ring oscillator is the total number of
delay cells multiplied by the propagation delay of a single cell. Referring to Fig. 5.2,
the 16-stage delay-cell-based ring oscillator is located in the center of the chip, and
the outputs of the ring oscillator drive 16 identical current switch cells. Surrounding it
is a 64-stage shift register chain with an external clock and a serial data input. The
data carrying the amplitude weight information is shifted in and stored in the register.
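Since the half period equals the number of delay cells times the single-cell delay, the oscillation frequency is f = 1 / (2 * N * t_d). The short sketch below applies this relation; the function names are hypothetical, and the 16 stages and 3 GHz output are the figures quoted in this chapter.

```python
def ring_frequency_hz(stages: int, cell_delay_s: float) -> float:
    """f = 1 / (2 * N * t_d), since half a period is N cell delays."""
    return 1.0 / (2 * stages * cell_delay_s)


def required_cell_delay_s(stages: int, frequency_hz: float) -> float:
    """Per-cell delay needed for a target oscillation frequency."""
    return 1.0 / (2 * stages * frequency_hz)


t_d = required_cell_delay_s(16, 3.0e9)   # 16 stages at a 3 GHz output
print(f"{t_d * 1e12:.1f} ps per cell")    # prints "10.4 ps per cell"
```

This per-cell delay is also the spacing between adjacent sampling instants, so the bias current that tunes the delay tunes both the output frequency and the sample step together.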
5.3 Circuits of the waveform generator
One issue associated with the typical DDS is distributing the clock signal uniformly over
the whole chip at multi-GHz frequencies [22], which places stringent requirements on the
driving capability of the clock buffers and on the time delay between the different
branches of the clock tree. The clock distribution system accounts for more than 20% of
the power in a circuit with many sequential logic gates. Eliminating the high speed clock
distribution circuit in a periodic waveform generator can greatly reduce the power and
save area.
Figure 5.3: Simplified circuit of current switch with 3-bit+sign programmable current source
The ring oscillator is composed of cascaded stages of CML delay buffers, whose delay can
be controlled by the bias current. The outputs of each delay buffer stage are fed into the
current switches, which turn on or off according to the propagation of signals inside the
ring oscillator. The programmable weighted currents are switched to the positive or
negative current output depending on the state of the current switches. The current
outputs are summed and converted to voltage outputs by a pair of pull-up resistors. Since
only one current source is switched at a time, the output waveform slope is limited to the
ratio of the current of a single current source to the output load capacitance.
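The slope limit stated above is the familiar relation dV/dt = I / C. A minimal numeric sketch follows; the 1 mA current and 100 fF load are hypothetical values chosen only to show the arithmetic and are not taken from the design.

```python
def max_slope_v_per_s(i_source_a: float, c_load_f: float) -> float:
    """Slope limit dV/dt = I / C for one current source driving the load."""
    return i_source_a / c_load_f


# Hypothetical values, not taken from the design: 1 mA into 100 fF.
slope = max_slope_v_per_s(1e-3, 100e-15)
print(f"{slope / 1e9:.1f} V/ns")
```

A steeper waveform edge therefore needs either a larger unit current or a smaller load capacitance at the summing node.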
Fig. 5.3 gives the simplified circuit of the current switch cell, which is composed of
the switch driver and the weighted current source. The lower 3 digital bits, D3 to D1,
control the magnitude of the current source, and the highest digital bit, D4, selects the
polarity of the current source through an XOR gate. Thus, positive and negative values can
be chosen by switching the polarity of the current source. The current source cell is
preset by the input digital bits. The 3 programmable bits switch the gate driving voltages
of the NMOS current mirrors, so a 3-bit input gives 3-bit resolution of the current
source.
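The 3-bit plus sign encoding can be illustrated by decoding a 64-bit register image into 16 signed weights. This is a sketch under explicit assumptions: the function names are hypothetical, D4 = 1 is taken to mean negative polarity, and cell 0 is assumed to occupy the four least significant bits; the actual bit ordering of the shift register is not given in the text.

```python
def decode_cell(nibble: int) -> int:
    """Sign-magnitude decode of one 4-bit cell word D4 D3 D2 D1."""
    magnitude = nibble & 0b0111      # D3..D1 weighted x4, x2, x1
    negative = (nibble >> 3) & 1     # assumption: D4 = 1 selects negative polarity
    return -magnitude if negative else magnitude


def decode_register(word64: int) -> list[int]:
    """Split a 64-bit register image into 16 signed current weights.

    Assumption: cell 0 occupies the four least significant bits.
    """
    return [decode_cell((word64 >> (4 * i)) & 0xF) for i in range(16)]


weights = decode_register(0x0000_0000_0000_F971)
print(weights[:4])   # [1, 7, -1, -7]
```

Each cell can therefore contribute an increment from -7 to +7 unit currents, and the 16 cells together consume the full 64 bits of the shift register chain.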
The layout of the waveform generator adopts a floor plan similar to the conceptual
diagram shown in Fig. 5.2. This makes the layout much more symmetrical and keeps the delay
in the critical path well controlled. To reduce bonding wire inductance effects, multiple
pads are used for the output pins. Without the wire-bonding pads, the active core area of
the waveform generator chip is approximately 0.6x0.6 mm², of which approximately 40% is
occupied by decoupling capacitors. Including the pads, the total chip area is
1.0x1.0 mm², as shown in Fig. 5.4.

Figure 5.4: Die photo of the waveform generator
5.4 Experiment results
To verify the functionality of the waveform generator, simulated results are provided as
well as measured results. The simulated output waveforms are shown in Fig. 5.5 to
Fig. 5.7. Fig. 5.5 shows the output while the data is serially loaded into the shift
register chain; the outputs stabilize after all 64 bits have been loaded. Fig. 5.6 gives
the simulated results for a synthesized sine waveform. Fig. 5.7 shows the output waveform
as it changes from one type to another when the data stored in the shift registers is
changed.

Figure 5.5: Simulated output waveform during data loading
Figure 5.6: Simulated output sine waveform
Figure 5.7: Simulated output waveform during transition

The outputs of the periodic waveform generator have been directly captured with a 6 GHz
bandwidth digital oscilloscope. Fig. 5.8 to Fig. 5.10 present the measured waveforms
created by the circuit. Fig. 5.8 shows the waveform during data loading into the shift
register chain. The output of the waveform generator experiences transients as the current
weight data changes. After 64 ns, data loading is finished and the waveform generator's
output becomes stable. The output swing is 300 mV peak to peak on a 15 Ω load and the
output frequency is 3.0 GHz. Fig. 5.9 shows the measured synthesized waveform with a
frequency of 2.887 GHz. Fig. 5.10 shows a generated arbitrary waveform with a frequency of
2.867 GHz. The slightly different output frequencies of the synthesized waveforms reflect
the load change with different weight current sources. Because of the input bandwidth
limitation of the sampling oscilloscope, the spectral components outside the 6 GHz input
bandwidth are filtered out.

Figure 5.8: Measured output waveform during data loading
Figure 5.9: Measured synthesized arbitrary waveform
Figure 5.10: Measured synthesized arbitrary waveform
5.5 Conclusion
In this chapter, a periodic arbitrary waveform generator based on a ring oscillator
structure has been implemented in a 0.13 μm SiGe BiCMOS technology. Using 16 delay stages
that control programmable weighted currents, the proposed waveform generator can output
3 GHz periodic waveforms. The total power consumption is less than 200 mW with a 2.2 V
power supply. The total area of the SiGe chip is 1.0 mm².
Chapter 6
SUMMARY AND FUTURE WORKS
6.1 Summary of the works
This dissertation presents a detailed design procedure for high speed direct digital
frequency synthesizers. The main target is to achieve microwave-range speed performance
while keeping the power consumption moderate.
With advanced SiGe technology, the output frequency of a DDS can reach multiple GHz.
Choosing the right process and architecture for a DDS among the different candidates
requires the target application to be specified. The ROM-based DDS can be optimized in
many ways; by applying ROM compression techniques, the size of the ROM can be reduced
significantly. The ROM-less approach, which eliminates the ROM, a speed bottleneck in DDS
design, is suitable for applications that require extreme performance. Since there are
many ways to implement the mapping from phase to sine amplitude, selecting the appropriate
mapping block is worth careful consideration. Using a nonlinear sine-weighted DAC is a
straightforward solution for the mapping function. However, its impact on the overall DDS
performance needs to be carefully evaluated.
The nonlinear DAC introduces additional spurs into the final DDS output spectrum. The
major problem is that the non-ideal behavior of a nonlinear DAC is more noticeable than
that of a linear DAC. The detailed analysis shows that the DAC-associated spurs come from
two major sources. One is the static performance of the DAC, such as DNL or INL. The other
is the dynamic performance of the DAC, which depends on the input code and the clock
frequency. To make the whole situation more complicated, noise from the digital blocks is
also injected into the substrate and the power supply. To suppress the crosstalk and the
power and ground bounce, more unknown variables need to be taken into account.
Some of the above-mentioned issues are investigated in this dissertation, and the design
flow also addresses some practical problems encountered.
The single-phase high speed DDS was designed first, and the quadrature version was
developed next. Later, the DDS with I/Q outputs and internal mixers was demonstrated.
Generally speaking, the output signals can be up-converted to an even higher frequency
band. Another approach is to make the waveform generator more compact so that it suits
on-chip test. Thus, a waveform generator based on a ring oscillator structure has been
built. Because it needs no external clock, this waveform generator can be useful for
certain applications.
6.2 Future works
The main difficulty lies in obtaining a good model that accurately reflects the real
behavior of a working DDS. Although some theoretical work has appeared in this field,
there is still something to be desired. In particular, hardly anyone has so far been able
to predict the dynamic performance of the DAC and the DDS with good accuracy. Future work
is intended to further study the modeling of the DDS, offer some new thoughts on this
problem, and hopefully find alternative solutions to the questions that arise when
designing a DDS.
Bibliography
[1] D. Baranauskas and D. Zelenin, "A 0.36w 6b up to 20gs/s dac for uwb wave formation,"
in Proc. Digest of Technical Papers. IEEE International Solid-State Circuits
Conference ISSCC 2006, 6–9 Feb. 2006, pp. 2380–2389.
[2] B. Bjerede, "Suppression of spurious frequency components in direct digital frequency
synthesizer," Patent, Dec. 17, 1991, US Patent 5,073,869.
[3] K. Chu and D. Pulfrey, "Design procedures for differential cascode voltage switch
circuits," Solid-State Circuits, IEEE Journal of, vol. 21, no. 6, pp. 1082–1087, 1986.
[4] L. Cordesses, "Direct digital synthesis: a tool for periodic wave generation (part 1),"
Signal Processing Magazine, IEEE, vol. 21, no. 4, pp. 50–54, July 2004.
[5] A. Corry and R. Sutherland, "Direct digital frequency synthesizer using sigma-delta
techniques," Patent, Oct. 8, 1996, US Patent 5,563,535.
[6] R. Cushing, "Single-sideband upconversion of quadrature dds signals to the
800-to-2500-mhz band," Analog Dialogue, pp. 34–3, 2000.
[7] L. Dadda and V. Piuri, "Pipelined adders," Computers, IEEE Transactions on, vol. 45,
no. 3, pp. 348–356, 1996.
[8] K. R. Elliott, "Direct digital synthesis for enabling next generation rf systems," in
Proc. IEEE Compound Semiconductor Integrated Circuit Symposium CSIC '05, 30
Oct.–2 Nov. 2005, p. 4pp.
[9] A. Gutierrez-Aitken, J. Matsui, E. N. Kaneshiro, B. K. Oyama, D. Sawdai, A. K. Oki,
and D. C. Streit, "Ultrahigh-speed direct digital synthesizer using inp dhbt technology,"
Solid-State Circuits, IEEE Journal of, vol. 37, no. 9, pp. 1115–1119, Sep 2002.
[10] C. Kang and E. Swartzlander Jr, "Digit-pipelined direct digital frequency synthesis
based on differential cordic," Circuits and Systems I: Regular Papers, IEEE Transactions
on [see also Circuits and Systems I: Fundamental Theory and Applications, IEEE
Transactions on], vol. 53, no. 5, pp. 1035–1044, 2006.
[11] F. Lu, H. Samueli, J. Yuan, and C. Svensson, "A 700mhz 24-b pipelined accumulator
in 1.2-μm cmos for application as a numerically controlled oscillator," IEEE Journal
of Solid-State Circuits, vol. 28, no. 8, pp. 878–886, 1993.
[12] R. Meyer, W. Sansen, and S. Peeters, "The differential pair as a triangle-sine wave
converter," Solid-State Circuits, IEEE Journal of, vol. 11, no. 3, pp. 418–420, 1976.
[13] S. Mortezapour and E. K. F. Lee, "Design of low-power rom-less direct digital frequency
synthesizer using nonlinear digital-to-analog converter," Solid-State Circuits, IEEE
Journal of, vol. 34, no. 10, pp. 1350–1359, Oct. 1999.
[14] T. Nakagawa and H. Nosaka, "A direct digital synthesizer with interpolation circuits,"
Solid-State Circuits, IEEE Journal of, vol. 32, no. 5, pp. 766–770, May 1997.
[15] Y. Nakamura, T. Miki, A. Maeda, H. Kondoh, and N. Yazawa, "A 10-b 70-ms/s cmos
d/a converter," Solid-State Circuits, IEEE Journal of, vol. 26, no. 4, pp. 637–642, April
1991.
[16] H. T. Nicholas III and H. Samueli, "A 150-mhz direct digital frequency synthesizer in
1.25-μm cmos with -90-dbc spurious performance," Solid-State Circuits, IEEE Journal
of, vol. 26, no. 12, pp. 1959–1969, Dec. 1991.
[17] P. Schvan, D. Pollex, and T. Bellingrath, "A 22gs/s 6b dac with integrated digital
ramp generator," in Proc. Digest of Technical Papers Solid-State Circuits Conference
ISSCC. 2005 IEEE International, 10–10 Feb. 2005, pp. 122–588.
[18] A. M. Sodagar and G. Roientan Lahiji, "Mapping from phase to sine-amplitude in
direct digital frequency synthesizers using parabolic approximation," IEEE Transactions
on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 12, pp.
1452–1457, Dec. 2000.
[19] D. Sunderland, R. Strauch, S. Wharfield, H. Peterson, and C. Cole, "Cmos/sos
frequency synthesizer lsi circuit for spread spectrum communications," Solid-State
Circuits, IEEE Journal of, vol. 19, no. 4, pp. 497–506, 1984.
[20] J. Tierney, C. Rader, and B. Gold, "A digital frequency synthesizer," Audio and
Electroacoustics, IEEE Transactions on, vol. 19, no. 1, pp. 48–57, 1971.
[21] S. E. Turner and D. E. Kotecki, "Direct digital synthesizer with rom-less architecture at
13-ghz clock frequency in inp dhbt technology," Microwave and Wireless Components
Letters, IEEE, vol. 16, no. 5, pp. 296–298, May 2006.
[22] S. Turner and D. Kotecki, "Direct digital synthesizer with sine-weighted dac at 32-ghz
clock frequency in inp dhbt technology," Solid-State Circuits, IEEE Journal of, vol. 41,
no. 10, p. 2284, 2006.
[23] A. Van den Bosch, M. Steyaert, and W. Sansen, "Sfdr-bandwidth limitations for high
speed high resolution current steering cmos d/a converters," Electronics, Circuits and
Systems, 1999. Proceedings of ICECS '99. The 6th IEEE International Conference on,
vol. 3, pp. 1193–1196 vol.3, 1999.
[24] J. Vandenbussche, G. Van der Plas, A. Van den Bosch, W. Daems, G. Gielen,
M. Steyaert, and W. Sansen, "A 14 b 150 msample/s update rate q2 random walk
cmos dac," Solid-State Circuits Conference, 1999. Digest of Technical Papers. ISSCC.
1999 IEEE International, pp. 146–147, 1999.
[25] J. Volder, "The cordic trigonometric computing technique," IRE Trans. Electron.
Comput, vol. 8, no. 3, pp. 330–334, 1959.
[26] P. Vorenkamp, J. Verdaasdonk, R. van de Plassche, and D. Scheffer, "A 1 gs/s, 10b
digital-to-analog converter," Feb 1994, pp. 52–53.
[27] M. Yamashina and H. Yamada, "An mos current mode logic (mcml) circuit
for low-power sub-ghz processors," IEICE Transactions on Electronics, vol. 75,
no. 10, pp. 1181–1187, 1992.
[28] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 12 ghz 1.9 w direct digital
synthesizer mmic implemented in 0.18 μm sige bicmos technology," Solid-State Circuits,
IEEE Journal of, vol. 43, no. 6, pp. 1384–1393, June 2008.
[29] X. Yu, F. F. Dai, Y. Shi, and R. Zhu, "2 ghz 8-bit cmos rom-less direct digital frequency
synthesizer," in Proc. IEEE International Symposium on Circuits and Systems ISCAS
2005, 23–26 May 2005, pp. 4397–4400.
[30] X. Yu, F. F. Dai, D. Yang, V. Kakani, J. D. Irwin, and R. C. Jaeger, "A 9-bit 9.6ghz
1.9w direct digital synthesizer rfic implemented in 0.18 μm sige bicmos technology," in
Proc. IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, 3–5 June 2007,
pp. 241–244.
[31] X. Yu, F. F. Dai, J. David Irwin, and R. C. Jaeger, "A 9-bit quadrature direct digital
synthesizer implemented in 0.18 μm sige bicmos technology," Microwave Theory and
Techniques, IEEE Transactions on, vol. 56, no. 5, pp. 1257–1266, May 2008.
[32] X. Yu, F. F. Dai, D. J. Irwin, and R. C. Jaeger, "A 2.2v 200mw 3ghz ring oscillator
based waveform," Silicon Monolithic Integrated Circuits in RF Systems, 2009. SiRF
'09. IEEE Topical Meeting on, pp. 1–4, Jan. 2009.
[33] X. Yu, F. F. Dai, D. Yang, J. D. Irwin, and R. C. Jaeger, "An x/ku-band frequency
synthesizer using a 9-bit quadrature dds," in Proc. IEEE Custom Integrated Circuits
Conference CICC 2008, 21–24 Sept. 2008, pp. 491–494.
[34] X. Yu, F. F. Dai, D. Yang, V. Kakani, J. D. Irwin, and R. C. Jaeger, "A 9-bit 6.3ghz
2.5w quadrature direct digital synthesizer mmic," in Proc. IEEE Symposium on VLSI
Circuits, 14–16 June 2007, pp. 52–53.
[35] Y. Zhu, J. Zuegel, J. Marciante, and H. Wu, "A 10 gs/s distributed waveform generator
for sub-nanosecond pulse generation and modulation in 0.18 μm standard digital cmos,"
in Radio Frequency Integrated Circuits (RFIC) Symposium, 2007 IEEE, 2007, pp. 35–38.