High Speed ROM-Less Direct Digital Frequency Synthesizer

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Xuefeng Yu

Certificate of Approval:

Richard C. Jaeger, Distinguished University Professor Emeritus, Electrical and Computer Engineering
Fa Foster Dai, Chair, Professor, Electrical and Computer Engineering
Guofu Niu, Alumni Professor, Electrical and Computer Engineering
Stuart M. Wentworth, Associate Professor, Electrical and Computer Engineering
George Flowers, Dean, Graduate School

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 10, 2009

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
Date of Graduation

Dissertation Abstract

High Speed ROM-Less Direct Digital Frequency Synthesizer
Xuefeng Yu
Doctor of Philosophy, August 10, 2009
(M.A., Inst. of Semiconductors, CAS, 2003)
(B.S., Tsinghua University, 2000)
98 Typed Pages
Directed by Fa Foster Dai

This dissertation presents a complete flow for the design and evaluation of high speed direct digital frequency synthesizers (DDS). Although some ultra high speed DDSs have already been reported in the literature, balancing power consumption against high performance remains challenging for the analog designer and is worth exploring from different perspectives.
As a digital method for directly generating sine or cosine waveforms at a specific frequency, the DDS has many merits: a fine frequency tuning step, fast frequency switching, and precisely controlled output phase. Since there is no feedback loop in a DDS structure, the DDS does not suffer from the internal loop delay found in a phase-locked loop (PLL) synthesizer. One major benefit that makes the DDS stand out is that it can be modulated directly in the digital domain. It can be combined with various modulation schemes to generate modulated signals. In this way, the DDS can serve as an important component for building flexible and reconfigurable transmitters in communication systems. A DDS can also generate quadrature phases and multiple phases with ease. Beyond sine and cosine waveforms, a DDS can be used to synthesize arbitrary waveforms.

Taking advantage of a well developed silicon-germanium (SiGe) process, it is possible to push the envelope of DDS speed while keeping a moderate power budget. Standard CMOS technology has also been investigated. Several DDSs have been implemented with nonlinear digital-to-analog converters (DAC). The nonlinear DAC maps the linear phase word directly into a sine or cosine analog output without the assistance of a ROM. By eliminating the ROM, the speed of the DDS can be dramatically improved. Because of the code-dependent and frequency-dependent non-ideal effects of the nonlinear DAC, unwanted harmonics and spurs at the DDS output have a more significant impact on the whole system. In this dissertation, spurs and harmonics from different sources, such as truncation errors, limited DAC amplitude resolution and non-ideal effects of the DAC, are discussed. Design issues such as clock feedthrough, clock skew and device matching are also addressed.
During layout, a method was developed that automatically synthesizes the layout of the current source matrix block, which alleviates the transistor matching problem arising from fabrication.

The unique structure of a compact periodical waveform generator has also been investigated. In this waveform generator, a ring oscillator is combined with a weighted nonlinear DAC, so the external clock and the internal clock distribution circuits are no longer needed. This provides benefits for certain on-chip test applications.

Acknowledgments

Many people throughout my life have supported me in many different ways. Without them, I could not have made it this far by myself. First of all, I would like to express my sincere gratitude to my advisor, Dr. Fa Foster Dai. He has been like an important friend to me in both my academic and my personal life. His invaluable knowledge and wisdom more than once helped and guided me to overcome the troubles and difficulties during my study and research. I would like to take this opportunity to thank my committee members, Dr. Richard Jaeger, Dr. Stuart Wentworth and Dr. Guofu Niu, for taking their valuable time to serve on the committee. I would also like to thank all the professors and staff in our ECE department. Whenever I had a need, they were always ready to help. I would like to express my appreciation to those who helped and contributed to my research. Among them, special thanks go to Dayu Yang, Vasanth Kakani, Yuan Yao, Xueyang Geng and Desheng Ma for their cooperation and assistance. Finally, I want to thank my parents, my wife and my other family members for their unconditional love. They are always on my side, share my joys and sorrows, give me courage and hope, and let me pursue my career and have a meaningful life.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "auphd"). Bibliography follows van Leunen's A Handbook for Scholars.
Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty.

Table of Contents

List of Figures
1 INTRODUCTION
  1.1 Background of DDS
  1.2 Applications of DDS
  1.3 Performance specifications of DDS
  1.4 Outline
2 SINGLE PHASE SIGE DDS
  2.1 Introduction
  2.2 Ultra-high speed DDS architecture
  2.3 DDS spectra purity
  2.4 DDS circuit design
    2.4.1 Pipelined accumulator
    2.4.2 SiGe CML logic
    2.4.3 Clock and MSB trees
    2.4.4 DAC current source and switch
  2.5 Layout
  2.6 Experiment results
  2.7 Conclusion
3 QUADRATURE PHASES SIGE DDS
  3.1 Introduction
  3.2 Direct Modulations in DDS
  3.3 DDS circuit design
    3.3.1 Quadrature DDS architecture
    3.3.2 Pipelined accumulator
    3.3.3 DAC current source and switch circuits
    3.3.4 Clock tree and MSB tree designs
    3.3.5 Layout considerations
  3.4 Measured results
  3.5 Conclusion
4 QUADRATURE PHASES SIGE DDS WITH UP-CONVERSION
  4.1 Introduction
  4.2 Architecture and circuit design
  4.3 Measured results
  4.4 Conclusion
5 RING OSCILLATOR BASED PERIODICAL WAVEFORM GENERATOR
  5.1 Introduction
  5.2 Waveform generator architectures
  5.3 Circuits of the waveform generator
  5.4 Experiment results
  5.5 Conclusion
6 SUMMARY AND FUTURE WORKS
  6.1 Summary of the works
  6.2 Future works
Bibliography

List of Figures

2.1 DDS architectures
2.2 Digital modulation capability in different DDSs
2.3 Conceptual diagram of the ROM-less DDS
2.4 Block diagram of the implemented DDS MMIC
2.5 NxM generic architecture of a pipelined accumulator
2.6 CML full adder circuit
2.7 Current switch circuit of the nonlinear DAC
2.8 Synchronous switch control circuit of the nonlinear DAC
2.9 Die photo of DDS chip
2.10 DDS MMIC test setup
2.11 Measured DDS output waveform (a) and spectrum (b) with a 23.5MHz output (FCW=1) and a clock at 12.021GHz
2.12 Measured DDS output spectra at Nyquist rate (FCW=511). (a) The output frequency at 5.930GHz and the image tone at 5.98GHz with a clock at 11.913GHz; (b) The output frequency at 5.04GHz and the image tone at 5.08GHz with a clock at 10.110GHz
2.13 Measured DDS output spectrum with a 1.7898GHz output and a 9.59GHz clock
2.14 Measured DDS output waveform with 1.125GHz output and 9GHz clock
2.15 Measured DDS output SFDR versus frequency control word at -20C ambient temperature
3.1 Direct modulation through a DDS
3.2 Extend the output frequency range using a quadrature DDS and SSB mixers
3.3 Conceptual drawing of the quadrature DDS RFIC
3.4 Detailed block diagram of the quadrature DDS
3.5 Output sine and cosine waveform depending on the symmetry property of the sine waveform
3.6 DAC current switch circuit
3.7 Die photo of the quadrature DDS MMIC
3.8 Test setup of the quadrature DDS RFIC
3.9 Measured single phase DDS output spectrum with clock at 9.07GHz and output at 2.227GHz
3.10 Measured quadrature phase DDS output spectrum with clock at 5.44GHz and output at 0.397GHz
3.11 Measured quadrature phase DDS output spectrum with clock at 6.815GHz and output at 3.394GHz
3.12 Measured DDS output waveforms without deglitch filter at 0.389GHz with clock at 6.2GHz
3.13 Measured quadrature DDS output waveforms at 1.58GHz with clock at 6.3GHz
4.1 Concept diagram of the frequency synthesizer
4.2 Block diagram with the circuit of quadrature VCO
4.3 Circuits of up-convert and down-convert mixers
4.4 Frequency synthesizer die photo
4.5 Measured 37MHz output waveforms with a 6.4GHz QDDS
4.6 Measured output spectra of 4.6GHz QDDS clock input and 11.7GHz LO output
4.7 Measured output spectra of 4.6GHz QDDS clock input and 2.3GHz QDDS output
4.8 Measured output down-converted 9.4GHz output
5.1 One cycle of the waveform with constant sampling step
5.2 Block diagram of the ring oscillator waveform generator
5.3 Simplified circuit of current switch with 3-bit+sign programmable current source
5.4 Die photo of the waveform generator
5.5 Simulated output waveform during data loading
5.6 Simulated output sine waveform
5.7 Simulated output waveform during transition
5.8 Measured output waveform during data loading
5.9 Measured synthesized arbitrary waveform
5.10 Measured synthesized arbitrary waveform

Chapter 1
INTRODUCTION

1.1 Background of DDS

A high speed frequency synthesizer with a fine tuning step and a large tuning range is a crucial part of modern wireless communication systems. However, the conventional phase-locked loop (PLL) based frequency synthesizer cannot meet these requirements because of its internal loop delay, low resolution and the limited tuning range of the voltage-controlled oscillator (VCO). A direct digital frequency synthesizer (DDFS) provides many advantages, including fast frequency switching, fine frequency tuning resolution, continuous-phase switching, and direct phase and frequency modulation in the digital domain.

The traditional DDFS contains a ROM that stores sine waveform data, and the ROM size grows exponentially with the desired phase resolution. The sine look-up table ROM occupies the majority of the DDFS area and also limits the maximum operating frequency because of the delay through the multi-layer decoders. The simplest way to reduce the ROM size is to exploit the quarter-wave symmetry of the sine function, which cuts the ROM size by a factor of 4. Although many other ROM compression methods have been proposed, such as trigonometric approximation and parabolic approximation [18], the problems indicated above still exist.

A novel approach is to replace the conventional linear DAC, which converts digital amplitude words to an analog amplitude waveform, with a nonlinear one that converts the digital phase word into an analog sine waveform directly [13].
Thus the ROM is completely removed and the performance of the DDFS is improved significantly. For the design of a high speed nonlinear DAC, the current-steering DAC [15] is the ideal candidate, since it can generate a Nyquist output signal with high accuracy at a high update rate [26].

1.2 Applications of DDS

Integrating a millimeter-wave (mm-wave) frequency synthesizer into a wireless transceiver that can accommodate multiple coexisting communication standards has been a challenging task and has attracted great interest in recent years. One conventional approach to covering the frequency bands of different standards is to use a phase-locked loop (PLL) based frequency synthesizer. However, multi-band PLL synthesizers consume large die area and power. Digital synthesis of highly complex wideband waveforms at the highest possible frequency would considerably reduce the size, weight, power and cost of modern communication systems. Recent developments in communication and radar systems are placing increasing demands on low power consumption, high output frequency, fine frequency resolution, fast channel switching and versatile modulation capability for frequency synthesis. These requirements are surpassing the performance capabilities of conventional analog PLL synthesizers. It is difficult for the PLL-based frequency synthesizer to meet these requirements because of internal loop delay, low resolution, modulation problems and the limited tuning range of the voltage-controlled oscillator (VCO). In contrast, a direct digital synthesizer (DDS) generates a digitized waveform at a desired frequency by accumulating the phase word at a higher clock frequency. DDS is a digital technique for frequency synthesis, waveform generation, sensor excitation, and digital modulation/demodulation. Since there is no feedback in a DDS structure, the DDS is capable of extremely fast frequency switching or hopping at the speed of the clock frequency.
A DDS provides versatile modulation capability and many other advantages, including fine frequency tuning resolution, continuous-phase switching and the ability to provide quadrature signals with accurate I/Q matching. Furthermore, a DDS can generate arbitrary waveforms in the digital domain. The increasing availability of ultra high-speed DACs allows a DDS to operate at mm-wave frequencies, providing an attractive alternative to conventional analog PLL synthesizers.

Radar systems demand highly accurate control over the output frequencies and phases of the frequency synthesizers in the radar transceiver for coherent detection. Modern radar systems commonly require frequency synthesizers with low power consumption, high output frequency, fine frequency resolution, fast channel switching and versatile modulation capability. These requirements are surpassing the performance capabilities of conventional analog phase-locked loops (PLL). It is difficult for the conventional PLL-based frequency synthesizer to meet these requirements because of internal loop delay, low resolution, modulation problems and the limited tuning range of the voltage-controlled oscillator (VCO). In contrast, a direct digital synthesizer (DDS) is capable of fast frequency hopping, fine frequency tuning, continuous-phase switching, direct modulation, and arbitrary waveform and quadrature signal generation. Advances in technology raise the device operating frequency, increase the circuit density and cut the manufacturing cost. With these improvements, it becomes feasible to implement a single-chip DDS operating at mm-wave frequencies at a reasonable cost, replacing the conventional analog PLL synthesizers in radar systems.

1.3 Performance specifications of DDS

One of the most important metrics for a DDS is the spurious-free dynamic range (SFDR).
SFDR is defined as the ratio of the RMS amplitude of the carrier (the maximum signal component) to the RMS amplitude of the next largest distortion component. SFDR is usually measured in dBc and can be expressed as

$$\mathrm{SFDR} = 20\log_{10}\frac{V_{\mathrm{carrier}}}{\max(V_{\mathrm{spur}})} \tag{1.1}$$

Another useful metric for a DDS is the signal-to-noise ratio (SNR). SNR is the ratio of the power of the desired signal to the total power of the noise signals and is usually expressed in dB:

$$\mathrm{SNR} = 10\log_{10}\frac{P_{\mathrm{carrier}}}{\sum P_{\mathrm{spur}}} \tag{1.2}$$

1.4 Outline

This dissertation is organized as follows. The first chapter gives a basic introduction to the DDS. The second chapter introduces the ultra-high speed single phase DDS design. In the third chapter, the single phase DDS is extended to a quadrature DDS. In chapter four, an internal mixer is added so that the output frequency of a multi-GHz DDS can be moved to a higher frequency. In chapter five, a ring oscillator based periodical waveform generator is presented. The last chapter summarizes the whole work and gives some thoughts on future work.

Chapter 2
SINGLE PHASE SIGE DDS

2.1 Introduction

Although CMOS technology can be used to achieve better integration and reduce total cost, heterojunction bipolar transistor (HBT) technology is more favorable in microwave analog circuit design because of its high current gain and low device noise in this frequency range. The two major candidates for high speed DDS design are indium phosphide (InP) HBT and silicon germanium (SiGe) HBT technologies. The carrier mobility in InP devices is high and the cutoff frequency of the device can be well over 300GHz, but the yield of complicated InP designs is still lower than that of other mature technologies. Taking device performance, manufacturing cost and integration density into consideration, the SiGe process appears to be a better choice for DDS circuit design.
Radar systems demand highly accurate control over the output frequencies and phases of the frequency synthesizers in the radar transceiver for coherent detection. Modern radar systems commonly require frequency synthesizers with low power consumption, high output frequency, fine frequency resolution, fast channel switching and versatile modulation capability. These requirements are surpassing the performance capabilities of conventional analog phase-locked loops (PLL). It is difficult for the conventional PLL-based frequency synthesizer to meet these requirements because of internal loop delay, low resolution, modulation problems and the limited tuning range of the voltage-controlled oscillator (VCO). In contrast, a direct digital synthesizer (DDS) is capable of fast frequency hopping, fine frequency tuning, continuous-phase switching, direct modulation, and arbitrary waveform and quadrature signal generation. Advances in technology raise the device operating frequency, increase the circuit density and cut the manufacturing cost. With these improvements, it becomes feasible to implement a single-chip DDS operating at mm-wave frequencies at a reasonable cost, replacing the conventional analog PLL synthesizers in radar systems.

2.2 Ultra-high speed DDS architecture

A conventional DDS consists of three primary building blocks: a phase accumulator, a sine/cosine mapping block, and a digital-to-analog converter (DAC), which performs the digital amplitude to analog amplitude conversion [20]. A deglitch filter is normally added off-chip to smooth the waveform by removing the unwanted spectral components. The frequency control word (FCW) at the input of the phase accumulator determines the output frequency of the DDS. The sine/cosine block maps the accumulated phase to the sine or cosine amplitude. Depending on the transfer characteristic of the DAC, a DDS can be classified into three types, as shown in Fig. 2.1.
The first type represents the conventional DDS that has a linear DAC, with the phase-to-amplitude conversion done in the digital domain using a sine look-up table [20]. The second type also contains a linear DAC, but the sine/cosine conversion is performed in the analog domain by converting an analog triangle waveform to an analog sine waveform [12]. The third type is a ROM-less DDS that combines both the sine/cosine mapping and the digital-to-analog conversion in a nonlinear DAC whose current sources are weighted with sine amplitude information [2][13].

[Figure 2.1: DDS architectures. Type 1: phase accumulator, digital sine/cosine mapping, linear DAC. Type 2: phase accumulator, linear DAC, analog sine/cosine mapping. Type 3: phase accumulator, nonlinear DAC with sine/cosine mapping.]

The first type of DDS has several variations, depending on the mapping method employed in the phase-to-amplitude look-up table. In the traditional DDS, the sine look-up table is built using a ROM that stores the sine/cosine mapping information. However, the ROM size grows exponentially with phase resolution. The sine ROM look-up table occupies the majority of the DDS area and also limits the maximum operating frequency because of the delay through the phase decoders. The simplest way to reduce the ROM size is to exploit the quarter-wave symmetry of the sine function, reducing the ROM size by a factor of 4. Numerous ROM compression techniques have been proposed, including trigonometric approximation [19][16], parabolic approximation [18], and interpolation [14]. Even though these compression methods partially alleviate the problem, the internal delay caused by retrieving ROM data still restricts the speed of the DDS. Another approach employs series expansions, such as a Taylor expansion or polynomial expansion, to approximate the ideal curve.
The coordinate rotation digital computer (CORDIC) method calculates the amplitude directly, based on the projection of a rotating vector in a polar coordinate system [25]. Both the series expansion and CORDIC approaches require a considerable amount of hardware, and the complexity limits the final speed, so these structures normally appear in DDS implementations below the multi-GHz range. Using improved differential CORDIC [10], the theoretical output of the ROM-based DDS can reach the GHz range, but its performance still needs proof in silicon. Implementing a GHz ROM simply consumes too much power and area. Ref. [10] implemented a linear-DAC DDS in 0.25 µm CMOS technology with a 1.2GHz clock speed. However, it is very difficult, if not presently impossible, to implement a ROM-based DDS with a clock frequency beyond 10GHz and an amplitude resolution larger than 8 bits.

The second type converts a linear analog triangle waveform to an analog sine waveform. This technique uses bipolar differential pairs to perform the conversion, choosing the degeneration resistor values and bias currents to fit the first two terms of a Taylor expansion. Theoretically, 0.1% total harmonic distortion can be achieved, which corresponds to a 30dB signal-to-noise ratio, or 5 effective bits. Stringent current requirements in the differential pairs limit the use of this method, particularly when the capacitive load that must be driven varies between applications.

The third type of DDS is a ROM-less DDS with a nonlinear DAC. The first two structures require a linear DAC, while in a ROM-less DDS the ROM is removed and a nonlinear DAC serves as both the phase-to-amplitude and the digital-to-analog converter. The sine-weighted DAC eliminates the sine look-up table, which is the speed and area bottleneck for high-speed DDS implementations. Our design employs the ROM-less structure with a nonlinear current steering DAC.
This structure combines the sine/cosine mapping block and the digital amplitude to analog amplitude conversion block, thus significantly improving the speed of the DDS. In the ROM-less DDS design, the current steering DAC structure is an ideal candidate, capable of generating a Nyquist output signal with excellent accuracy and a high update rate.

[Figure 2.2: Digital modulation capability in different DDSs. All three DDS types accept the FCW with FM and PM control; AM control appears only in the type with digital sine/cosine mapping followed by a linear DAC.]

Digital domain modulation can be easily implemented in a DDS, as illustrated in Fig. 2.2. Frequency modulation (FM), chirp, and phase modulation (PM) can be easily implemented in all three types of DDS. However, only the first type can implement amplitude modulation (AM) in the digital domain prior to the DAC, while the other two can implement AM only in the analog domain. Delta-sigma modulation can also be added to the DDS to improve the output spectral purity and to reduce the effective number of phase bits [5]. Quadrature rotation can also be implemented in a DDS with quadrature outputs. A quadrature DDS consists of a shared phase accumulator and two DACs with sine and cosine outputs. If linear DACs are used, quadrature rotation can be implemented in the digital domain, since digital quadrature waveforms are available at the inputs of the DACs. For a ROM-less DDS, quadrature rotation can only be implemented in the analog domain using mixers. A mm-wave quadrature DDS has been implemented in SiGe technology with a clock frequency beyond 6GHz [34].
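As a rough numerical illustration of the sine-weighted DAC idea discussed in this section, the following Python sketch sums a set of weighted unit sources in thermometer fashion. It is a minimal sketch under stated assumptions, not a model of the implemented circuit: the function names, the choice of 64 levels per quarter wave, and the definition of each weight as the sine increment between adjacent phase steps are all illustrative assumptions.

```python
import math


def sine_weights(levels):
    """Assumed weighting: source k carries the sine increment between
    phase steps k and k + 1 of a quarter wave split into `levels` steps."""
    step = (math.pi / 2) / levels
    return [math.sin((k + 1) * step) - math.sin(k * step) for k in range(levels)]


def nonlinear_dac(code, weights):
    """Thermometer-style sum: switch on the first `code` weighted sources."""
    return sum(weights[:code])


weights = sine_weights(64)
# Quarter-wave transfer curve: phase code in, normalized sine amplitude out.
quarter_wave = [nonlinear_dac(code, weights) for code in range(65)]
```

Because the assumed weights telescope, switching on the first k sources yields sin(k·π/128) directly, so no look-up table sits in the signal path.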
[Figure 2.3: Conceptual diagram of the ROM-less DDS. The N-bit FCW drives the phase accumulator; a phase truncator keeps P bits; the MSB and 2nd MSB select the quadrant; the remaining P-2 bits pass through the complementor to the nonlinear DAC, which produces the sine output.]

2.3 DDS spectra purity

The conceptual block diagram of the ROM-less DDS, employing a nonlinear DAC, is shown in Fig. 2.3. In order to save die area and power, the phase accumulator output is normally truncated. For instance, the output of the phase accumulator is truncated to P bits, according to the signal-to-noise ratio (SNR) requirement of the DDS output. The two MSBs are used to determine the quadrant of the phase accumulator output, according to the quadrant symmetry of the sine wave. The lowest P-2 bits are fed through the complementor and converted to a sine waveform by the nonlinear DAC. The sinusoidal waveform data are programmed into the current source matrix of the DAC, and the output currents are summed at the DAC output. In the process of discrete phase accumulation and phase word truncation, spurs and quantization noise are introduced into the DDS output spectrum, as discussed below.

The N-bit FCW feeds a phase accumulator that controls the output frequency of the synthesized sine waveform as

$$f_{out} = \frac{FCW}{2^{N}}\, f_{clk} \tag{2.1}$$

where $f_{clk}$ is the DDS clock frequency. Thus, the desired output period is $T_0 = \frac{2^N}{FCW} T_{clk}$. For an N-bit discrete phase accumulator, there is another periodicity, $T_{spur} = \frac{2^N}{\mathrm{GCD}(FCW, 2^N)} T_{clk}$, where $\mathrm{GCD}(a, b)$ denotes the greatest common divisor of a and b. The accumulator repeats its value at intervals of $T_{spur}$, which generates equally spaced spurious tones located at multiples of the frequency

$$f_{spur} = \frac{\mathrm{GCD}(FCW, 2^{N})}{2^{N}}\, f_{clk} \tag{2.2}$$

When the input frequency word is a power of two, i.e., $FCW = 2^i$, there are no spurs due to discrete phase accumulation. In this case $\mathrm{GCD}(FCW, 2^N) = FCW$, so $f_{spur}$ coincides with $f_{out}$; namely, the accumulator output repeats the same values after every overflow.
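The periodicity argument above can be checked with a short Python sketch that compares the closed-form repetition period with a brute-force simulation of the accumulator. The function names and example values are assumptions chosen for illustration; the formulas are those of Eqs. (2.1) and (2.2) for an N-bit accumulator.

```python
from math import gcd


def output_frequency(fcw, n_bits, f_clk):
    """Eq. (2.1): f_out = FCW / 2^N * f_clk."""
    return fcw * f_clk / 2**n_bits


def spur_frequency(fcw, n_bits, f_clk):
    """Eq. (2.2): f_spur = GCD(FCW, 2^N) / 2^N * f_clk."""
    return gcd(fcw, 2**n_bits) * f_clk / 2**n_bits


def accumulator_period(fcw, n_bits):
    """Closed form: clock cycles after which the accumulator state repeats."""
    return 2**n_bits // gcd(fcw, 2**n_bits)


def simulate_period(fcw, n_bits):
    """Brute force: count clock cycles until the accumulator returns to zero."""
    modulus = 2**n_bits
    acc = fcw % modulus
    cycles = 1
    while acc != 0:
        acc = (acc + fcw) % modulus
        cycles += 1
    return cycles
```

For a 9-bit accumulator, for example, an odd FCW such as 3 takes the full 512 clock cycles to repeat, whereas FCW = 64 repeats after only 8 cycles, and its spur frequency equals its output frequency.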
The phase truncation process also introduces spurs and quantization noise, which can be modeled as linear additive noise on the phase of the sinusoidal wave. The phase truncation error is periodic [16]. If the P most significant bits (MSB) of an N-bit phase word are used to address the DAC or look-up table, the spurs resulting from truncation mix with the DDS output frequency, generating spurs at multiples of the frequency

$$f_{spur} = \frac{\mathrm{GCD}(FCW, 2^{N-P})}{2^{N-P}}\, f_{clk} \tag{2.3}$$

Note that phase truncation causes errors only when $\mathrm{GCD}(FCW, 2^{N}) < 2^{N-P}$. Otherwise, the N-P least significant bits (LSB) of the phase word are always zero and the phase truncation does not cause any error.

In addition to the spurious components, the DDS output waveform suffers AM distortion because the finite number of levels cannot accurately represent the output waveform. The envelope of the DDS output waveform is modulated by a sine wave with the frequency

$$f_{envelope} = \frac{2^{N-1} \bmod FCW}{2^{N-1}}\, f_{clk} \tag{2.4}$$

where $A \bmod B$ represents the integer residue of A modulo B. If $2^{N} \bmod FCW = 0$, no amplitude modulation will be observed. For a Nyquist output, the frequency of the amplitude modulation becomes

$$f_{envelope} = \frac{2^{N-1} \bmod (2^{N-1} - 1)}{2^{N-1}}\, f_{clk} = \frac{1}{2^{N-1}}\, f_{clk} \tag{2.5}$$

Therefore, the envelope of the DDS output waveform is modulated by a low frequency signal except when the FCW is an integer power of 2 such that $2^{N} \bmod FCW = 0$.

2.4 DDS circuit design

The implemented ultra-high speed DDS MMIC comprises a 9-bit pipelined accumulator and an 8-bit sine-weighted current steering DAC, as shown in Fig. 2.4. Since the output frequency cannot exceed the Nyquist rate, an 8-bit FCW is fed into the 9-bit pipelined accumulator with the MSB of the accumulator input tied to zero. The output of the pipelined accumulator is a 9-bit phase word, whose LSB is truncated before driving the 8-bit DAC. One-bit truncation reduces the size and power consumption of the DAC with minimum spurious penalty.
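The bit widths just described (8-bit FCW, 9-bit accumulator, 8-bit phase word after dropping the LSB) can be explored with a short Python sketch. This is a behavioral illustration only: the constant and function names are assumptions, and the truncation test simply applies the GCD condition stated after Eq. (2.3).

```python
from math import gcd

N_ACC = 9  # accumulator width, as stated in the text
P_DAC = 8  # phase bits kept for the DAC, as stated in the text


def output_frequency(fcw, f_clk):
    """Eq. (2.1) with N = 9."""
    return fcw * f_clk / 2**N_ACC


def truncated_phase_sequence(fcw, n_steps):
    """Accumulate the FCW modulo 2^9 and drop the LSB of each phase word."""
    acc = 0
    phases = []
    for _ in range(n_steps):
        acc = (acc + fcw) % 2**N_ACC
        phases.append(acc >> (N_ACC - P_DAC))
    return phases


def truncation_causes_error(fcw):
    """True when GCD(FCW, 2^N) < 2^(N-P), i.e. the dropped bits are not always zero."""
    return gcd(fcw, 2**N_ACC) < 2**(N_ACC - P_DAC)
```

With these widths, FCW = 1 and a 12.021GHz clock give an output near 23.5MHz, the largest 8-bit FCW (255) stays just below half the clock frequency, and only odd FCW values exercise the truncated bit.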
The MSB output of the phase accumulator provides the proper mirroring of the sine waveform about the $\pi$ phase point. The second MSB is used to invert the remaining 6 bits for the second and fourth quadrants of the sine wave prior to the decoding logic. Each column-row decoder has a linear 3:8 operation. The outputs of the column-row decoders go to the switch matrix to control the switches in each cell [15]. The latch and switch matrices contain 64 cells, each comprising a local decoder, latches, and switch pairs. The current switch outputs are summed at open-collector output nodes. Next, the circuit design of the DDS building blocks is discussed.

Figure 2.4: Block diagram of the implemented DDS MMIC

2.4.1 Pipelined accumulator

The speed of the DDS is often limited by the speed of the phase accumulator, which in turn depends on the N-bit adder design. The simplest way to construct an N-bit adder is to chain N 1-bit adders, starting with a 1-bit half adder followed by (N-1) 1-bit full adders, with the carry-in of each full adder connected to the carry-out of the previous bit. This ripple adder topology uses the least hardware but operates at the slowest speed. The delay of a ripple adder is due to the propagation of the carry bit from the LSB to the MSB. The sum and carry-out of a full adder can be expressed as

$$SUM = A \oplus B \oplus C_{in}, \qquad C_{out} = A \cdot B + B \cdot C_{in} + C_{in} \cdot A \tag{2.6}$$

where A and B are the input bits and $C_{in}$ is the carry-in of the adder. The delay of an N-bit ripple adder is given by

$$Delay_{ripple} = (N-1)\,T_{carry} + T_{sum} \tag{2.7}$$

where $T_{carry}$ is the time for carry generation and is equal to twice the delay of an AND gate.
Similarly, $T_{sum}$ is the time for sum generation in a 1-bit adder and is about twice the delay of an XOR gate.

If the accumulator input is time-invariant, each bit of the input word and the adder output bits can be properly delayed so that an N-bit accumulator can operate at the speed of a 1-bit adder. This type of accumulator, called a pipelined accumulator [7, 11], uses the most hardware but achieves the fastest speed. Ref. [11] employed the pipeline adder architecture to implement the phase accumulator for a numerically controlled oscillator (NCO). Fig. 2.5 illustrates a generic architecture for an NxM pipelined accumulator with a total of M pipelined rows. Each row has a total of M delay stages placed at the input and output of an N-bit adder. An NxM pipelined accumulator therefore has a latency of M-1 clock cycles. Note that an accumulator needs at least one delay stage even without any pipelined stages.

Figure 2.5: NxM generic architecture of a pipelined accumulator

The pipeline accumulator shown allows the NxM-bit accumulator to operate at the speed of an N-bit accumulator, i.e., a speed-up of M times. When the number of adder bits is set to one (N = 1), the 1xM-bit accumulator can operate at the same speed as a 1-bit adder. To realize a 9-bit accumulator, we can set N = 1 and M = 9. Then, the 9-bit accumulator runs at the speed of a 1-bit accumulator consisting of a full adder and a flip-flop.
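The bit-sliced operation described above can be checked with a small cycle-by-cycle simulation. The Python sketch below is a behavioral model, not the implemented circuit: it assumes one 1-bit full adder per stage (N = 1), a constant FCW, all flip-flops reset to zero, and input and output skew registers modeled by time offsets. The function name is hypothetical.

```python
def pipelined_accumulator(fcw, width, cycles):
    """Simulate a 1 x `width` pipelined accumulator for `cycles` clock edges.

    Returns the de-skewed output words, which equal an ordinary accumulator
    sequence delayed by width - 1 clock cycles.
    """
    fcw_bits = [(fcw >> k) & 1 for k in range(width)]
    s = [0] * width            # sum register of each 1-bit stage
    c = [0] * (width + 1)      # registered carry into each stage (c[0] stays 0)
    history = [list(s)]        # history[t][k]: sum register k after t clock edges
    for t in range(cycles):
        s_next = [0] * width
        c_next = [0] * (width + 1)
        for k in range(width):
            # Input skew: FCW bit k reaches stage k through k flip-flops.
            x = fcw_bits[k] if t >= k else 0
            total = s[k] + x + c[k]          # 1-bit full adder
            s_next[k] = total & 1
            c_next[k + 1] = total >> 1       # carry is registered for the next stage
        s, c = s_next, c_next
        history.append(list(s))
    latency = width - 1
    words = []
    for t in range(latency, cycles + 1):
        word = 0
        for k in range(width):
            # Output skew: bit k is delayed by (latency - k) flip-flops.
            word |= history[t - (latency - k)][k] << k
        words.append(word)
    return words

words = pipelined_accumulator(96, 9, 60)
```

In each clock period a stage only evaluates a single full adder, yet the de-skewed words follow the sequence 0, FCW, 2·FCW, ... modulo 2^9, at the cost of the M-1 cycle latency.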
The pipelined accumulator is used for constant input words and can achieve the maximum operating frequency, whereas an accumulator with a carry look-ahead (CLA) adder can be employed for variable inputs at medium operating frequencies. To achieve the maximum operating speed with a fixed FCW, a pipelined accumulator is used in this design. The total delay of the accumulator is one full-adder propagation delay plus one D flip-flop propagation delay. The MSB of the accumulator input is tied to zero, since the output frequency set by the FCW will not exceed half of the clock frequency. The LSB of the pipeline accumulator output is discarded and only its 8 MSBs are fed to the nonlinear DAC. The flip-flops in the accumulator were designed with a reset signal that can be used to reset the accumulator to zero.

In general, the ripple carry adder has a hardware complexity on the order of O(N) and a delay proportional to N, where N is the FCW length of the accumulator. The hardware cost of the pipeline accumulator is on the order of O(N^2). To properly trade area for power, k-bit adders can be used for each pipeline stage, as illustrated in Fig. 2.5. We implemented a 1x8 pipelined accumulator in order to achieve the maximum speed. If 2-bit adders are used in each pipeline stage, the critical path delay will not double according to Eq. (2.7). Thus, the accumulator speed will be greater than half of that of the accumulator using 1-bit pipelined adders.

2.4.2 SiGe CML logic

Previous ultra-high-speed DDS designs used InP technology in order to take advantage of the high-speed InP transistors. However, these InP DDS designs suffer from high power consumption and low yield. This DDS design utilizes a commercial 0.18 µm SiGe BiCMOS technology with an HBT peak fT/fmax of 120/100 GHz. The digital logic is implemented using current mode logic (CML) cells with differential output swings of 400 mV. For a 3-level CML circuit, a 3.3 V power supply is sufficient to keep all the BJT transistors out of saturation.
If an NPN transistor operates in saturation mode, its speed is greatly degraded and its parasitic PNP transistor is turned on, causing increased noise coupling through the substrate. In order to achieve a good balance between speed and power consumption, the bias current is set to 70% of the peak fT current. Further increasing the bias current does not speed up the CML circuit significantly. It is not advisable to bias the CML circuits at the peak fT current, since any variation of the bias current may drive the circuit beyond the peak fT current, slowing down the transistors significantly while consuming unnecessarily large power. Although the peak fT current is not a parameter that guarantees the operational speed for different CML circuits with different loads, it is a good indicator of the average speed of the CML logic circuits. The current in a typical three-input CML gate is 0.55 mA, which is less than 1/5 of that used in InP DDS designs [21]. This bias current is sufficient to keep the delay of the three-level gates below 25 ps.

To provide more headroom for bipolar transistor operation, the current source of the CML logic uses an NMOS transistor with a degeneration resistor. In this case, the overdrive voltage of the current source MOSFET is around 0.4 to 0.5 V, which is smaller than the headroom required by a bipolar transistor. To ensure that all the critical paths have the same delays, the signal paths are designed using symmetrical patterns. Pipelined adder stages are used to achieve a speed equivalent to that of a 1-bit adder. To reduce the logic requirement of the adder, the structures described in [3] are adopted, as shown in Fig. 2.6. The sum and carry-out are implemented using one current tail for low-power application. This adder circuit reduces the total number of bipolar transistors in the sum circuit from 14 to 10 and provides a speed improvement of around 15%.
The delay of the sum block is estimated to be 30 ps, and that of the carry block 25 ps, with optimized biasing. The breakdown voltage BVCEO of the NPN transistors in the 0.18 µm SiGe BiCMOS technology is approximately 1.8 V. With four stacked NPNs under a 3.3 V supply, each transistor experiences less than 1 V across its collector-emitter junction. In addition, all the circuits are self-biased with no open base, which guarantees safe operation of the transistors without breakdown.

Figure 2.6: CML full adder circuit

2.4.3 Clock and MSB trees

The most challenging parts of the design are the clock tree and the MSB buffer tree. To eliminate glitches due to code errors induced by clock skew, the clock trees are carefully balanced to ensure synchronization and drive capability. Because the differential clock signals drive every flip-flop cell, and the total number of flip-flops is above 200, synchronization of the clock signals is not a trivial task. With a clock input frequency around 10 GHz, the current gain of the transistor degrades to about 10. Thus, the fan-out ratio of a clock buffer is only 3 to 4, and the depth of the clock buffer chain is at least 6 levels for this design. To fully turn the differential pairs on or off, the input differential peak-to-peak voltage swing should be more than $6V_T + I_E R_E$, in which $V_T$ is the thermal voltage and $I_E$ and $R_E$ are the emitter current and emitter resistance of the bipolar transistor, respectively. The required voltage swing also depends on the junction temperature, which can reach above 100 °C under normal operating conditions. The clock signals at the flip-flop cells should swing no less than 150 mV, which is equal to $6V_T$ at room temperature. Since every switching cell in the DAC has an MSB signal, the total number of gates that the MSB must drive exceeds 120. This MSB signal must also be synchronized with the other decoded digital bits. The depth of the buffer chain for the MSB signal is 5 levels.
To accomplish all of this, the clock and MSB buffers require careful design, with layout symmetry and balance, in order to ensure synchronization along the clock and MSB distributions.

2.4.4 DAC current source and switch

The essential building block of the nonlinear DAC is the sine-weighted current source matrix. The unit current of each current source is 0.1 mA, which should provide the current switches with enough switching speed when toggling. The largest source current is 0.7 mA, composed of 7 identical unit current sources. The table in Fig. 2.7 indicates the number of unit current sources in each sine-weighted current source. The sum of each row is the same, which assures the regularity of the current source array as well as its compactness. The current source matrix provides 64 pairs of sine-weighted currents that are summed at the differential current outputs, OUTP and OUTM. The current outputs are converted to differential voltages by a pair of off-chip 25 Ω pull-up resistors. Fig. 2.7 shows that the currents from the cascode current sources are fed to the outputs, OUTP and OUTM, by pairs of switches (Mswitch). The MSB controls the selection between the two half periods. The current switch contains two differential pairs with minimum-size transistors and a cascode transistor that isolates the current sources from the switches and improves the bandwidth of the entire group of switching circuits. The switching transistor pairs are chosen to be minimum size in order to achieve the fastest switching speed with minimum power consumption and to reduce the effect of clock feed-through.

Figure 2.7: Current switch circuit of the nonlinear DAC
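One common way to derive sine-weighted cell currents is to quantize a quarter sine wave and take the differences between consecutive levels, so that turning on the first m cells (thermometer style) produces the m-th sine level. The Python sketch below illustrates that idea only. The full-scale value of 255 unit currents is an assumption made for this example, the function name is hypothetical, and the result is not claimed to reproduce the table in Fig. 2.7 or its equal-row-sum arrangement.

```python
import math

UNIT_CURRENT_MA = 0.1   # unit current source value quoted in the text

def sine_weighted_cells(num_cells=64, amplitude_units=255):
    """Unit-current count per cell for one quadrant of a sine wave.

    Turning on the first m cells yields round(amplitude_units * sin(pi/2 * m / num_cells)).
    amplitude_units is an assumed full-scale value, not taken from the design.
    """
    levels = [round(amplitude_units * math.sin(math.pi / 2 * m / num_cells))
              for m in range(num_cells + 1)]
    return [levels[m + 1] - levels[m] for m in range(num_cells)]

weights = sine_weighted_cells()
cell_currents_ma = [w * UNIT_CURRENT_MA for w in weights]
```

Because the differences telescope, the cell weights always sum to the assumed full-scale value, and the largest cells sit near the zero crossing where the sine slope is steepest.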
For the current-steering DAC, the impedance $Z_{imp}$ seen at the collectors of the switch transistors of each current cell must be large enough that its impact on the integral nonlinearity (INL) specification of the DAC can be tolerated [23]. However, $Z_{imp}$ is frequency dependent. The impedance required to obtain a specified resolution is approximately

$$Z_{imp} = \frac{N R_L}{4} Q \tag{2.8}$$

where $R_L$ is the load resistance, N represents the total number of unit current sources, and Q is the ratio of the signal to the second harmonic. To obtain 8-bit output resolution, $Z_{imp}$ should be approximately 500 kΩ. When the frequency increases above 100 MHz, a cascode current source is needed to meet the requirement for $Z_{imp}$.

Figure 2.8: Synchronous switch control circuit of the nonlinear DAC

Uncertainty in the switching time of the current switches is one of the major causes of glitches at the DAC outputs. To synchronize the switches of the DAC, a D flip-flop with a NAND function is inserted between the MSB control bit and the switch pairs in the DAC. The signals Sp and Sm are controlled by the MSB signal and select either the 0 or the 180 degree phase, as shown in Fig. 2.8.

Device matching is one of the important factors that affect the static and dynamic performance of the DAC. The matching properties of SiGe HBT bipolar transistors are normally one order of magnitude better than those of MOSFETs with similar feature sizes. To reduce IR drops and matching errors, one must carefully choose the current source transistor sizes and layout placements, and use wide interconnections. For long interconnections carrying global signals, such as the clock and the MSB phase word, transmission line effects are taken into consideration during the layout. In order to minimize parasitic capacitances and inductances, thick analog metal layers are used for global signal routing.
2.5 Layout

When running at a 10 GHz clock rate, layout plays an important role in assuring that the final design meets the expected speed requirement. The current source matrix and the switching matrix are laid out separately and isolated from each other using a deep oxide trench to reduce noise coupling from the digital circuitry to the current sources through the substrate. The output of the DAC is placed close to the output pins to reduce interference from the rest of the circuits. Differential pairs are placed in a symmetrical manner so that the differential signals travel the same distance. In order to make the layout compact and easy to cascade, the CML building blocks were designed to have the same height. The power supply distribution stacks several metal layers to reduce resistance. The Cadence SKILL language was utilized to generate the connections that form the unit current sources into the sine-weighted current sources, in accordance with the given switching sequence. Hence, the INL of the nonlinear DAC due to symmetrical and gradient errors is minimized. Two dummy rows and columns have been added around the current source array to avoid edge effects. To minimize the systematic error introduced by the voltage drop in the ground lines of the current-source transistors, sufficiently wide wires have been used. The clock inputs are differential CML-compatible signals, and multiple clock inputs are provided to reduce the parasitic inductance resulting from the pins. The maximum delay of the metal wire is about 40 ps, and the clock tree is carefully built to ensure an acceptable clock skew.

2.6 Experimental results

The die photo of the DDS MMIC is shown in Fig. 2.9. This DDS design is quite compact, with an active area of 2.3 x 0.7 mm² and a total die area of 3 x 3 mm² including the ESD pads, the layer density filling elements and an 8.2 GHz on-chip VCO that could be used to clock the DDS. The DDS prototypes were packaged using 48-pin ceramic leadless packages.
For a frequency range over 10 GHz, the PCB test board was developed using a Rogers RO4003 laminate, which has a loss tangent of less than 0.003 and good temperature stability. To convert the single-ended signal to differential clock inputs, a 180 degree 3 dB hybrid coupler is employed at the clock input. For the differential outputs, a second hybrid coupler is inserted into the output path. The test setup diagram is illustrated in Fig. 2.10. The power consumption of the DDS with the DAC is approximately 1.9 W, and the maximum clock frequency as measured is 12.3 GHz. With Nyquist output, the DDS achieves a maximum clock frequency of 11.9 GHz. The digital and analog parts of the sine-weighted DAC consume 300 mA from a 3.3 V supply and 35 mA from a 4 V supply, respectively. The accumulator consumes 250 mA from a 3.3 V supply.

Figure 2.9: Die photo of DDS chip

Figure 2.10: DDS MMIC test setup

Although the power consumption of the SiGe DDS is small compared to that of the InP DDSs, its power density is high due to its small die size. With 1.9 W concentrated on a small die area of 9 mm², the power density of the DDS MMIC exceeds 21 W/cm², a number that normally appears only for high-performance processors. The relatively high power density of the DDS MMIC makes it difficult to dissipate the heat once it is packaged. The junction-to-ambient thermal resistance $\theta_{JA}$ of the 48-pin ceramic package is about 40 °C/W with zero air flow. Therefore, the device junction temperature of the DDS MMIC could reach above 100 °C at a room ambient temperature of 25 °C with 1.9 W power consumption.
For this reason, an external fan is used to cool the device during measurements. To further reduce the thermal resistance and maximize heat dissipation, packages with a heat sink can be used. To our knowledge, other InP MMICs [21, 22, 8] were tested on wafer, while this SiGe DDS MMIC was tested as a packaged part. To test the maximum speed, the packaged DDS chips were cooled down to between -50 °C and -80 °C such that the junction temperature is around room temperature. This test condition provides a fair comparison between the packaged SiGe DDS MMIC and the wafer-probed InP DDSs. Lowering the junction temperature improves the transistor speed due to increased carrier mobility at lower temperature. Without cooling, the maximum clock speed of the packaged DDS MMIC is measured as 9.6 GHz with Nyquist output and 11 GHz with FCW = 1 at room ambient temperature. For the 20 tested prototypes, the chip performance is quite consistent. SiGe technology thus offers advantages of high yield and high performance over InP technology.

Figs. 2.11, 2.12, 2.13 and 2.14 illustrate the measured DDS output spectra and waveforms for different output and clock frequencies. The measured spectra were obtained by cooling down the packaged chips so that the device junction temperature approaches room temperature. Fig. 2.11 presents the 23.5 MHz DDS output waveform and spectrum with a 12.021 GHz clock input. The time-domain waveform measurements were limited by the 500 MHz bandwidth of the digital sampling scope. The measured DDS output power is approximately -6.67 dBm. All measurements were done without calibrating out the losses of the cables, the coupler and the PCB tracks. Fig. 2.12 gives the measured DDS output spectra at the Nyquist rate, namely, (a) an output frequency of 5.930 GHz with a clock at 11.913 GHz; and (b) an output frequency of 5.04 GHz with a clock at 10.110 GHz. Fig. 2.12(a) demonstrates the maximum DDS operation frequency of 11.9 GHz at Nyquist output with an SFDR of 22 dBc.
The measured SFDR of the device, at a 5.04 GHz output frequency with a 10.11 GHz clock, is approximately 30 dBc in narrow band, as shown in Fig. 2.12(b). For Fig. 2.12(a), the FCW is chosen as $2^8-1$, which is the maximum allowed by an 8-bit FCW input. Thus, the output frequency is set at $f_{out} = \frac{2^8-1}{2^9} f_{clk}$. The first-order image tone, produced by mixing the clock frequency with the DDS output frequency, occurs at 11.913 GHz - 5.93 GHz = 5.98 GHz, which is 50 MHz away from the output frequency, as shown in Fig. 2.12(a). Operating the DDS close to the Nyquist rate makes it very hard to filter out the image tones. Practically, the DDS output frequency is restricted to less than 3/8 of the clock frequency. An image tone at 5.08 GHz is also observed in Fig. 2.12(b) with a clock at 10.110 GHz.

Figure 2.11: Measured DDS output waveform (a) and spectrum (b) with a 23.5MHz output (FCW=1) and a clock at 12.021GHz

Figure 2.12: Measured DDS output spectra at Nyquist rate (FCW=511). (a) The output frequency at 5.930GHz and the image tone at 5.98GHz with a clock at 11.913GHz; (b) The output frequency at 5.04GHz and the image tone at 5.08GHz with a clock at 10.110GHz

Fig. 2.13 presents the measured DDS output spectrum with a 1.7898 GHz output and a 9.59 GHz clock. The measured output power of the DDS is -9 dBm, which corresponds to greater than -5 dBm when cable and coupler losses are considered. The input FCW equals 96, so that $GCD(96, 2^9) = 32$, which leads to spurs equally spaced by 600 MHz around the fundamental tone when the clock is 9.59 GHz.

Figure 2.13: Measured DDS output spectrum with a 1.7898GHz output and a 9.59GHz clock

Fig. 2.14 gives the DDS output waveforms at 1.125 GHz with a 9 GHz clock. At high temperature, the transistors are slowed down and the DAC current switches are no longer perfectly synchronized due to increased internal delays. Fig. 2.14 demonstrates a clean sinusoidal output waveform with the package measurements at the 9.6 GHz clock frequency. Fig.
2.15 illustrates the measured DDS output spurious-free dynamic range (SFDR) versus frequency control word with a 4.6 GHz clock at an ambient temperature of -20 °C. The measured SFDR ranges from 20 to 30 dBc. Compared to the theoretical analysis, the degradation of the measured SFDR is due to a combination of effects, including the wideband matching of the clock and output signals, the nonlinearity associated with the nonlinear DAC, and noise coupling from the reference line, the substrate and the power supply. When compared with the InP DDS in [9], which operates at a 9.2 GHz clock frequency, this design achieves similar SFDR performance, yet with much lower power consumption. Most of the InP DDS MMICs were measured using probe stations [21, 22], while this DDS RFIC was tested with packaged parts.

Figure 2.14: Measured DDS output waveform with 1.125GHz output and 9GHz clock

Figure 2.15: Measured DDS output SFDR versus frequency control word at -20C ambient temperature (horizontal axis: frequency control word, 0 to 250; vertical axis: SFDR in dBc, 0 to 35)

Table 2.1 compares the recently published mm-wave DDS MMIC performances. The designs reported in [21] and [22] used InP technologies with an fT/fmax above 300/300 GHz, which is almost triple those reported here. The InP DDS [9] employs an 8-bit accumulator and an 8-bit DAC and operates at a maximum clock frequency of 9.2 GHz with a power consumption of 15 W. On the other hand, this SiGe 9-bit DDS consumes 1.9 W with 3.3 V and 4 V power supplies for the digital and analog circuits, respectively. The 4 V power supply was tied to a pair of pull-up resistors, providing more voltage headroom and output swing for the DAC output stage. The VCO and the DDS are separately powered, and the 1.9 W power consumption does not include the power of the VCO. As shown in Table 2.1, the minimum transistor size in the InP technology is much larger than that in the SiGe technology.
Although the current densities required to achieve the peak fT in InP and SiGe technologies are similar, the current required to operate the minimum-size transistor close to its peak fT differs considerably, which contributes to the superior power efficiency of this SiGe DDS. When compared with the published DDS MMICs, this SiGe DDS achieves the best reported power efficiency FOM of 6.3 GHz/W with a much smaller die size of 2.5 x 0.7 mm².

Table 2.1: Ultra-high speed DDS performance comparison

Parameter | [9] | [21] | [22] | [8] | [8] | This work
Technology | InP | InP | InP | InP | TFAST InP | SiGe
fT/fmax [GHz] | 137/267 | 300/300 | 300/300 | 180/266 | 406/423 | 120/100
Emitter area of minimum-size transistor [µm²] | 1.5x4 | 0.4x2 | 0.4x2 | 0.5x2 | 0.25x1 | 0.2x0.64
Emitter current density at peak fT [mA/µm²] | 1.2 | 5 | 5 | - | - | 6
Peak fT current of minimum-size transistor [mA] | 7.2 | 4 | 4 | - | - | 0.77
Breakdown voltage BVCEO [V] | 8 | 4 | 4 | - | 5 | 1.8
Accumulator size [bit] | 8 | 8 | 8 | 9 | 9 | 9
DAC resolution [bit] | 7 | 7 | 5 | - | - | 8
Max clock frequency [GHz] | 9.2 | 13 | 32 | 8 | 12 | 12.3 (cooled), 9.6 (room)
SFDR with Nyquist output [dBc] | 30 | 26.67 | 21.56 | 38 | 30 | 22 @ 12 GHz, 27 @ 10 GHz
Power consumption [W] | 15 | 5.42 | 9.45 | 7 | 8 | 1.9
Number of transistors | 3000 | 1646 | 1891 | 8695 | 8800 | 9600
Die size [mm²] | 8x5 | 2.7x1.45 | 2.7x1.45 | 4x2 | - | 3x3 (chip), 2.3x0.7 (active)
FOM [GHz/W] | 0.5 | 2.4 | 3.386 | 1.1 | 1.5 | 6.3 (cooled), 5.05 (room)

2.7 Conclusion

In this chapter, a 12 GHz direct digital synthesizer (DDS) MMIC with 9-bit phase and 8-bit amplitude resolution has been implemented in a 0.18 µm SiGe BiCMOS technology. Composed of a 9-bit pipeline accumulator and an 8-bit sine-weighted current-steering DAC, the DDS is capable of synthesizing sinusoidal waveforms up to 5.93 GHz. The maximum clock frequency of the DDS MMIC is measured as 11.9 GHz at the Nyquist output and 12.3 GHz at a 2.31 GHz output. The spurious-free dynamic range (SFDR) of the DDS, measured at Nyquist output with an 11.9 GHz clock, is 22 dBc. The power consumption of the DDS MMIC measured at a 12 GHz clock input is 1.9 W with dual power supplies of 3.3 V/4 V.
The DDS thus achieves a record-high power efficiency figure of merit (FOM) of 6.3 GHz/W. With more than 9600 transistors, the active area of the MMIC is only 2.5 x 0.7 mm². The chip was measured in packaged prototypes using 48-pin ceramic LCC packages.

Chapter 3
QUADRATURE PHASES SIGE DDS

3.1 Introduction

In wireless transceivers, quadrature clock signals are always required for the modulator and the demodulator. Several ways of generating quadrature waveforms are widely adopted in circuit design. The first uses a divider to divide down the output signal of a local oscillator or an external source, so that the sine and cosine signals are produced natively. This requires the output frequency of the local oscillator to be twice the carrier frequency. The advantage is that the pulling effect or DC offset coming from the local oscillator can be minimized. Another way is to implement a quadrature VCO, which takes more area and consumes more power than a single-phase VCO. The third approach uses a polyphase filter to convert a single-phase signal to quadrature outputs. To reduce the phase and amplitude imbalances, a multi-stage polyphase filter may be required, which introduces insertion loss and thermal noise. A DDS that natively generates well-balanced quadrature waveforms over a large frequency range avoids the problems of the divider and polyphase methods.

3.2 Direct Modulations in DDS

The conceptual diagram in Fig. 3.1 shows how different types of modulation can be implemented in a ROM-less DDS employing a nonlinear DAC. The principle of a DDS can be briefly described as first integrating the frequency control word into a phase control word, then mapping the phase control word to an amplitude word, and finally converting the amplitude word into an analog output signal.
All the frequency, phase and amplitude information is readily available in the DDS data path and can be directly addressed and manipulated, so digital modulation can be done without much extra hardware cost. By directly using digital control words to change the values of registers in the data path of a DDS, the frequency, phase and amplitude of the output waveforms can be precisely controlled. Since all the modulations are done in the digital domain, many disadvantages associated with conventional analog modulation are avoided. The values of the registers in a DDS are updated at a data rate equal to the input clock frequency, which means that high-speed modulated waveforms can be generated.

Waveform generation for various modulation schemes is desired for novel radio transmitter architectures. As an example, modern radar systems place ever-increasing demands on affordable low-noise signals and high-speed waveform generation. With the availability of single-chip DDSs working at microwave frequencies, digitally generating highly complex wide-bandwidth waveforms at the highest possible frequency, instead of down near baseband, would considerably simplify the transmitter architecture in terms of size, weight and power requirements as well as cost. These waveforms are used in high-range-resolution radars for sorting targets from clutter and in low-probability-of-intercept communication applications. Modulated waveform generation is a unique feature of the DDS approach. The DDS synthesizer can implement modulations and waveforms such as chirp, step frequency, frequency modulation (FM), frequency shift keying (FSK), minimum shift keying (MSK), phase modulation (PM), amplitude modulation (AM), quadrature amplitude modulation (QAM) and other hybrid modulations, as illustrated in Fig. 3.1.
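The three modulation entry points described above (frequency word, phase word, amplitude) can be sketched with a simple behavioral model. The Python code below is an illustrative sketch, not the implemented hardware: it assumes an ideal sine mapping, applies amplitude scaling as a plain multiplication, and uses hypothetical names and a 9-bit accumulator width chosen for the example.

```python
import math

def dds_modulator(fcw_seq, phase_seq, amp_seq, acc_bits=9):
    """Idealized DDS with per-sample frequency, phase and amplitude control."""
    modulus = 1 << acc_bits
    acc = 0
    out = []
    for fcw, phase_offset, amp in zip(fcw_seq, phase_seq, amp_seq):
        phase = (acc + phase_offset) % modulus                     # PM: offset the phase word
        out.append(amp * math.sin(2 * math.pi * phase / modulus))  # AM: scale the amplitude
        acc = (acc + fcw) % modulus                                # FM, FSK, chirp: vary the FCW
    return out

n = 16
carrier = dds_modulator([64] * n, [0] * n, [1.0] * n)       # unmodulated carrier
bpsk_flip = dds_modulator([64] * n, [256] * n, [1.0] * n)   # 180 degree phase offset
fsk = dds_modulator([32] * 8 + [96] * 8, [0] * n, [1.0] * n)  # two-tone FSK
chirp = dds_modulator(list(range(1, n + 1)), [0] * n, [1.0] * n)  # linearly rising FCW
```

Each control sequence is consumed at one value per clock, mirroring the statement that the DDS registers can be updated at the clock rate.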
Figure 3.1: Direct modulation through a DDS

The typical choice of converting the baseband signal from polar magnitude and phase data to Cartesian I and Q data during modulation is based on practical considerations. Direct manipulation of magnitude and phase in a polar system is expensive and difficult to design and build. Bringing the DDS into the transceiver system to perform polar modulation in addition to normal frequency synthesis is one way to solve this problem, and it is worth exploring further. Since the major parts of a DDS are digital circuits, it is easier to integrate the DDS with the baseband circuit, which provides a compact solution for the transmitter design.

Sometimes the DDS output is expected to cover a wider frequency range, whereas the typical DDS output frequency ranges from DC to one third of the input clock frequency. When the output frequency approaches the Nyquist output, the frequency of the alias image comes closer to the output frequency, which makes it almost impossible to remove with an analog low-pass filter. Building a low-pass filter with a steep roll-off characteristic at several GHz requires tremendous effort. A practical solution for extending the output frequency of a DDS to a wider range without incurring the problem of alias images is to use single-sideband (SSB) mixers, as shown in Fig. 3.2.

Figure 3.2: Extend the output frequency range using a quadrature DDS and SSB mixers
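The SSB arrangement of Fig. 3.2 rests on the product-to-sum identities cos(a)cos(b) - sin(a)sin(b) = cos(a + b) and cos(a)cos(b) + sin(a)sin(b) = cos(a - b). The Python sketch below checks them numerically for ideal mixers. The symbols `w0` (local oscillator angular frequency) and `w` (DDS angular frequency), the function name, and the 10 GHz and 1 GHz example values are all assumptions made for illustration.

```python
import math

def ssb_outputs(w0, w, t):
    """Ideal SSB mixing of a quadrature LO with a quadrature DDS output."""
    lo_i, lo_q = math.cos(w0 * t), math.sin(w0 * t)   # quadrature local oscillator
    dds_i, dds_q = math.cos(w * t), math.sin(w * t)   # quadrature DDS outputs
    upper = lo_i * dds_i - lo_q * dds_q   # difference of the two mixer outputs
    lower = lo_i * dds_i + lo_q * dds_q   # sum of the two mixer outputs
    return upper, lower

W0 = 2 * math.pi * 10e9   # assumed 10 GHz local oscillator
W = 2 * math.pi * 1e9     # assumed 1 GHz DDS output
times = [k * 7e-12 for k in range(200)]
results = [ssb_outputs(W0, W, t) for t in times]
```

In this ideal model the subtracted output contains only the upper sideband at `w0 + w` and the summed output only the lower sideband at `w0 - w`; gain or phase imbalance between the two paths would leave a residual image.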
The local oscillator generates quadrature outputs with a relatively fixed output frequency $\omega_0$, which are mixed with the outputs of a quadrature DDS at frequency $\omega$. The mixer outputs are then summed with and subtracted from each other, so that up-converted cosine waveforms with a frequency of $\omega_0+\omega$ or $\omega_0-\omega$ are derived. Theoretically, the final output should be free of alias images. In practice, however, the DDS output contains harmonics and spurs that significantly deteriorate the purity of the desired output waveforms. The imperfections of the mixers due to leakage and second-order effects introduce further spurs that have a negative impact on the output signals. Even so, the power of the alias image tone is small compared to the fundamental tone, which greatly eases the filter design. Assuming the local oscillator frequency is higher than the output frequency of the quadrature DDS, the above mixing scheme can be used to up-convert the DDS output frequency to a higher frequency band.

Figure 3.3: Conceptual drawing of the quadrature DDS RFIC

3.3 DDS circuit design

3.3.1 Quadrature DDS architecture

The simplified block diagram of the ROM-less quadrature DDS, employing one 9-bit pipeline accumulator and two nonlinear DACs, is shown in Fig. 3.3. Intuitively, a quadrature DDS can be realized by paralleling two single-phase DDSs, one with sine output and the other with cosine output, and then merging them. When merging the two single-phase DDSs, the goal is to share the circuits common to both DDSs as much as possible. In this design, only the phase accumulator is shared by the two DDSs, because the limited fan-out of the digital logic gates at multi-GHz frequencies leaves very marginal gain from sharing the decoders and other digital blocks inside the DACs. The main components in a single-phase ROM-less DDS are the phase accumulator and the sine-weighted nonlinear DAC.
For a DDS with an L-bit frequency control word (FCW) and an M-bit phase resolution DAC, the output frequency of the synthesized sine waveform is modulated by the truncation error of the accumulator. The output of the phase accumulator is truncated to M bits to fit the inputs of the nonlinear DAC. Usually the phase resolution of the DAC is much lower than the resolution of the phase accumulator, so L-M bits are discarded, which introduces FCW-dependent spurs. Phase truncation spurs are considered the dominant contribution to the total spurs and noise of the DDS output only if the DAC is ideal or close to ideal. However, even for a linear DAC, this assumption no longer holds when the sampling rate exceeds multiple GHz and the magnitude transitions are large compared to the full scale output. For a nonlinear DAC, the situation is more complicated. It is not an easy task to reach high amplitude resolution with a nonlinear DAC at a multi-GHz clock speed. In fact, the ultra high speed nonlinear DACs in published works have at most 8-bit amplitude resolution. The nonlinear DAC approach is still attractive for microwave DDS design because it provides a drastic speed improvement over ROM based or algorithm based DDS designs. Reducing the spurs introduced by amplitude errors therefore needs to be taken into account when designing an ultra high speed DDS. In this design, the spurs introduced by phase truncation are already minimized, because only one bit of the phase accumulator output is truncated.

As illustrated in Fig. 3.3, the quadrature DDS RFIC uses one 9-bit pipeline accumulator and two nonlinear 8-bit sine-weighted current-steering DACs to generate the sine and cosine waveforms simultaneously.
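The accumulate-then-truncate behavior described above can be illustrated with a short behavioral model. This is only a sketch under stated assumptions: the function names are hypothetical, and the tuning relation f_out = FCW · f_clk / 2^L is the standard DDS relation rather than a formula quoted from this chapter.

```python
def dds_output_frequency(fcw, acc_bits, f_clk):
    """Standard DDS tuning relation (assumed here): f_out = FCW * f_clk / 2**L."""
    return fcw * f_clk / (1 << acc_bits)


def accumulate(fcw, acc_bits, dac_phase_bits, n_samples):
    """Return (full_phase, truncated_phase) pairs for an L-bit accumulator
    whose output is cut to M bits by discarding the L-M least significant bits."""
    mask = (1 << acc_bits) - 1
    dropped = acc_bits - dac_phase_bits
    phase = 0
    samples = []
    for _ in range(n_samples):
        samples.append((phase, phase >> dropped))
        phase = (phase + fcw) & mask  # the accumulator wraps modulo 2**L
    return samples


# Word lengths from the text: 9-bit accumulator, 8-bit DAC phase input.
for full, truncated in accumulate(fcw=3, acc_bits=9, dac_phase_bits=8, n_samples=4):
    print(full, truncated)
```

With an odd FCW the discarded LSB toggles from sample to sample, which is the source of the FCW-dependent truncation error; with only one discarded bit the error is at most half a DAC phase step.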
Since the output frequency cannot exceed the Nyquist rate, an 8-bit FCW is fed into the 9-bit pipeline accumulator with the most significant bit (MSB) of the accumulator input tied to zero internally. Thus, the DDS requires only 8-bit FCW inputs. The output of the pipeline accumulator is a 9-bit phase word. The least significant bit (LSB) of the 9-bit phase word is truncated before driving the 8-bit DAC input. To produce the 90 degree phase word, the binary number "01" needs to be added to the two most significant bits of the DAC input. At the gate level, the MSB output is the Exclusive-OR (XOR) of the first two MSB inputs, and the 2nd MSB output is the inversion of the 2nd MSB input. Because all the digital logic has differential outputs, the inversion is obtained by swapping the differential lines, and only one XOR gate needs to be inserted at the inputs of the sine-weighted DAC to convert it to a DAC with a 90 degree output phase difference. The current steering DAC structure is chosen for its high speed and good matching between unit cells. The differential current outputs of the nonlinear current steering DACs are converted to differential voltage outputs with two pairs of external 15 Ω pull-up resistors.

The detailed block diagram of the quadrature DDS is shown in Fig. 3.4. On the right side there are two back-to-back sine-weighted current steering DACs, and on the left side is the phase accumulator shared by both DACs. The DDS also includes a standard LC-tuned VCO, which can be connected to the input of the clock buffer on the upper side to drive the whole DDS. Since the two nonlinear DACs are identical, it naturally appears beneficial to share all the decoders and buffers in the DACs and leave only the current switches and current sources separate. This is a plausible suggestion and is worth investigating further. Before this alternative is evaluated, the mechanism of sine waveform generation in the nonlinear sine-weighted DAC is explained.
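The gate-level claim above (adding "01" to the two MSBs is equivalent to one XOR plus one inversion) can be checked exhaustively. The following is a minimal sketch; the function names are hypothetical and the 8-bit word width is taken from the DAC input described in the text.

```python
def add_quarter_turn(msb, msb2):
    """Gate-level form from the text:
    new MSB = MSB XOR 2nd MSB, new 2nd MSB = NOT 2nd MSB."""
    return msb ^ msb2, msb2 ^ 1


def cosine_phase_word(phase, bits=8):
    """Apply the two-gate transform to the top two bits of a phase word."""
    msb = (phase >> (bits - 1)) & 1
    msb2 = (phase >> (bits - 2)) & 1
    rest = phase & ((1 << (bits - 2)) - 1)  # the lower bits pass through unchanged
    new_msb, new_msb2 = add_quarter_turn(msb, msb2)
    return (new_msb << (bits - 1)) | (new_msb2 << (bits - 2)) | rest


# The transform must equal adding a quarter of full scale (64 of 256), modulo 256.
mismatches = [p for p in range(256) if cosine_phase_word(p) != (p + 64) % 256]
print(len(mismatches))
```

The XOR works because the carry out of the 2nd MSB position equals the 2nd MSB itself when a 1 is added there.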
The pipeline accumulator integrates the input FCW into phase information. Due to the symmetry of a sine waveform, only one quarter of the sine waveform data is stored in the sine-weighted DAC. The two MSBs determine the sine wave quadrant in which the phase accumulator output resides, according to the quadrant symmetry of the sine wave. The MSB output of the phase accumulator provides the proper mirroring of the sine waveform about the $\pi$ phase point. The 2nd MSB is used to invert the remaining 6 bits for the second and fourth quadrants of the sine wave prior to the decoding logic. The 6-bit output is split into two 3-bit groups and fed into two column-row decoders, which drive the column lines and row lines at the inputs of the current switch cells. Each column-row decoder in this circuit performs a linear 3:8 decoding. The outputs of the column-row decoders go to the switch matrix to control the switches in each cell. The latch and switch matrices contain 64 cells, each comprising a local decoder, latches, and switch pairs. The current weights of the current sources inside the current source matrix are preset to the sinusoidal waveform data. The current switch outputs are summed at the open-collector output nodes.

Figure 3.4: Detailed block diagram of the quadrature DDS

Sharing all the digital blocks ahead of the current switch cells in the DACs, in order to reduce circuit size or power consumption, may not have much impact on the performance of the DDS.
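The quadrant logic described above can be sketched behaviorally. This is an idealized model, not the chip's circuit: the cell weights below are unquantized real numbers (the actual DAC uses integer multiples of a unit current), the half-step phase offset is an assumption chosen so that inverting the 6-bit index mirrors the quarter wave exactly, and all names are hypothetical.

```python
import math

CELLS = 64                    # switch cells per quarter wave, as in the text
STEP = (math.pi / 2) / CELLS  # phase advance per cell

# Idealized cell weights: cell i adds the sine increment between two
# adjacent half-step phase points, so a thermometer sum of cells 0..k
# equals sin((k + 0.5) * STEP).
WEIGHTS = [math.sin(0.5 * STEP)] + [
    math.sin((i + 0.5) * STEP) - math.sin((i - 0.5) * STEP)
    for i in range(1, CELLS)
]


def sine_dac(phase_word):
    """Map an 8-bit phase word to a normalized sine amplitude."""
    msb = (phase_word >> 7) & 1   # selects the half period (sign)
    msb2 = (phase_word >> 6) & 1  # selects the mirrored quadrants
    index = phase_word & 0x3F     # remaining 6 bits
    if msb2:
        index ^= 0x3F             # invert the 6 bits in quadrants 2 and 4
    magnitude = sum(WEIGHTS[: index + 1])  # cells 0..index are switched on
    return -magnitude if msb else magnitude


worst = max(
    abs(sine_dac(p) - math.sin((p + 0.5) * 2 * math.pi / 256)) for p in range(256)
)
print(worst < 1e-9)
```

In this model the MSB only flips the sign, which stands in for the selection between the two halves of the differential output in the real current-steering DAC.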
The symmetry properties of the sine and cosine waveforms are different: sine is an odd function and cosine is an even function. The turn-on sequence of the switch cells guarantees complete sine waveform generation. If the symmetry property of the sine waveform is applied to the cosine, however, the derived cosine waveform is not continuous, as shown in Fig. 3.5. Directly sharing all the logic before the current switch cells to produce sine and cosine waveforms simultaneously will therefore encounter problems. One way to overcome this difficulty is to implement the XOR function, which generates the first two MSB inputs for the cosine DAC, at the input of the current switch cells. This method adds 64 or more logic gates to the switching cells. Considering the fan-out of the logic cells at several GHz, the total loads on the signal paths are nearly the same. Thus the actual number of gates is not reduced, and the area cost and power consumption remain the same. Sharing the logic blocks before the current switch cells can therefore achieve only marginal gain. In this design, only the phase accumulator is shared between the sine and cosine DACs.

3.3.2 Pipelined accumulator

The adder in the phase accumulator of the DDS can be chosen from various types, such as pipeline, ripple-carry and carry-look-ahead adders. For modulation purposes, it would be beneficial to use a carry-look-ahead adder or a ripple-carry adder, but their speeds are restricted by the additional delay stages they inevitably introduce in the critical paths. Instead, to achieve maximum operating speed, a pipelined accumulator is adopted in this design. The delays of the accumulators are determined by the propagation delays of the full adder (FA), the D-flip-flop (DFF) and the level shifter (LS), and can be expressed as

$$T_{total}(\text{Pipeline}) = T_{dq}(\text{DFF}) + T_{dq}(\text{FA}) + T_{dq}(\text{LS})$$
$$T_{total}(\text{Ripple carry}) = T_{dq}(\text{DFF}) + \left(T_{dq}(\text{FA}) + T_{dq}(\text{LS})\right) \cdot N \qquad (3.1)$$

where N is the number of adder bits. The total delay of the carry-look-ahead accumulator lies between those two.
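To make the comparison in Eq. (3.1) concrete, the two expressions can be evaluated for a 9-bit accumulator. The delay values below are hypothetical placeholders chosen for illustration, not measured values from this design, and N is interpreted as the number of adder bits.

```python
def pipeline_delay(t_dff, t_fa, t_ls):
    """Critical path of the pipelined accumulator, per Eq. (3.1)."""
    return t_dff + t_fa + t_ls


def ripple_carry_delay(t_dff, t_fa, t_ls, n_bits):
    """Critical path of the ripple-carry accumulator, per Eq. (3.1)."""
    return t_dff + (t_fa + t_ls) * n_bits


# Hypothetical cell delays in picoseconds (assumptions for illustration only).
T_DFF, T_FA, T_LS = 20, 30, 10

print(pipeline_delay(T_DFF, T_FA, T_LS))          # 60
print(ripple_carry_delay(T_DFF, T_FA, T_LS, 9))   # 380
```

The pipelined path is independent of the word length, whereas the ripple-carry path grows linearly with N, which is why the pipelined structure is preferred when clock speed is the priority.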
From the above equations it is clear that the pipeline accumulator can run at a clock speed at least twice as high and is straightforward to build. The ripple adder topology uses the least hardware, but operates at the slowest speed. The delay of the carry-look-ahead accumulator is estimated with a maximum fan-in of 3 and a group size of 3. A reset pin is added to the D-flip-flops in the accumulator to reset the accumulator to its initial state.

In this DDS, current mode logic (CML) cells have been chosen to implement the digital logic blocks. The breakdown voltage VCEO is 1.8V and VBE is approximately 0.9V under typical bias conditions. For a 3-level CML logic cell, the minimum power supply voltage that keeps all the bipolar transistors in the active region is $V_{SWING} + V_{BE} + 2V_{CESAT} + V_{DSSAT}$, in which $V_{SWING}$ is the output amplitude of the CML, $V_{CESAT}$ is the saturation collector-emitter voltage of the SiGe bipolar transistor and $V_{DSSAT}$ is the saturation overdrive voltage of the NMOS transistor in the current source. A rough estimate indicates that the CML logic can work with a 2.7V power supply, although speed will be sacrificed. 3.3V is a more comfortable choice to ensure that the base-collector junctions of all the bipolar transistors are reverse biased. The current source of the CML logic uses an NMOS transistor with a degeneration resistor to provide more headroom for the bipolar transistors. The overdrive voltage of an NMOS FET is around 0.4-0.5V, which is significantly smaller than that of a normal bipolar transistor. The optimal bias current of the CML logic depends on several factors: the structure of the stage, the total load of the next stage and the drive strength of the previous stage. This means that the bias current of every logic gate should be tuned separately.

Figure 3.5: Output sine and cosine waveform depending on the symmetry property of the sine waveform
A more practical approach is to choose the same bias current for CML logic of the same type with equally sized devices. Here the bias current is set to 70% of the peak fT current to achieve a good trade-off between speed and power consumption. Although it would be more meaningful to use the peak fMAX current to specify the operational speed, the peak fMAX is related to the load of the previous stage, a variable factor determined by the circuit itself. The peak fT current can nevertheless serve as a reasonable indicator of the average speed of the CML logic circuits. As a result, the propagation delay of the sum logic is 30ps and that of the carry block is 25ps when the fan-out factor is set to 1 or 2.

3.3.3 DAC current source and switch circuits

The essential building block of the nonlinear DAC is the sine-weighted current source matrix. The smallest unit current of each current source is 0.1mA, which should provide the current switches with enough switching speed when toggling. The largest current in the current source matrix is 0.7mA, which is composed of 7 identical current sources. The current switch contains two differential pairs with minimum sized transistors and a cascode transistor, which isolates the current sources from the switches and improves the bandwidth of the entire group of switching circuits. The current source matrix provides 128 sine-weighted currents that are summed at the differential current outputs, OUTP and OUTM. The current outputs are converted to differential voltages by a pair of off-chip 25 Ω pull-up resistors.

Fig. 3.6 shows that the currents from the cascode current sources are fed to the outputs, OUTP and OUTM, by pairs of switches. The MSB controls the selection between Part A and Part B during different half periods. The size of the switching transistor pairs is chosen to be minimal in order to achieve the fastest switching speed with minimum power consumption, and to reduce the effect of clock feed-through.
Figure 3.6: DAC current switch circuit

In the current steering DAC, the impedance Zimp seen at the drain of the switch transistors of each current cell must be large enough that its impact on the integral non-linearity (INL) specification of the DAC can be tolerated [23]. However, Zimp is frequency dependent. To obtain 8-bit output resolution, Zimp should be about 500 kΩ. When the frequency increases above 100MHz, a cascode current source is needed to meet the requirement for Zimp.

Device matching is one of the important factors that affect the static and dynamic performance of the DAC. The matching properties of SiGe HBT bipolar transistors are normally one order of magnitude better than those of MOSFETs with similar feature sizes. Carefully choosing current source transistor sizes and positions, and increasing the widths of the interconnections to reduce IR drops, helps to reduce matching errors. For long interconnections carrying global signals such as the clock and the MSB phase word, transmission line effects are taken into consideration during the layout. In order to minimize parasitic capacitances and inductances, top metal layers are used for global signal routing.

Power consumption of a DDS in the GHz range is always a severe problem due to the scale of the circuit and the high current density that the transistors require at these frequencies. To increase the operational speed, the current flow in the transistors should be increased proportionally to overcome both the transistor parasitics and the interconnection loads. With a scaled-down device feature size, the latter plays a significant part in the total delay at relatively high speeds.
The advantage of using bipolar transistors over their CMOS counterparts is that a bipolar transistor provides higher current gain while maintaining a reasonable size. A CMOS transistor, on the other hand, requires a larger area to generate the same amount of current. The more meaningful specification of the bipolar transistor is fmax, which takes into account the base resistance and provides insight into CML circuit speed under normal operation.

The relatively high power density of DDS chips makes it extremely difficult to remove heat quickly without external air flow. According to the θJA specification of the ceramic package, a temperature increase of 20-30 degrees Celsius per watt dissipated on the chip is quite normal. If only the bare die area is considered, the situation is even worse, because a total power of approximately 2W would be concentrated on a small die area of 10mm². The indicated power density would thus exceed 100W/inch², which is an alarming number that normally appears only in high performance processors. Without a heat sink, the chip temperature could reach 85 degrees Celsius at an ambient temperature of 27 degrees Celsius. For this reason, an external fan is used to cool the device during measurements.

3.3.4 Clock tree and MSB tree designs

The most challenging parts of the design are the clock tree and the MSB buffer tree. To eliminate glitches due to code errors induced by clock skew, the clock trees are carefully balanced to ensure synchronization and drive capability. Because the differential clock signals drive every flip-flop cell, and the total number of flip-flops is above 200, synchronization of the clock signals is not a trivial task. With a clock input frequency around 10GHz, the fan-out ratio is only around 3-4, so the depth of the clock buffer chain is at least 6.
To fully turn the differential bipolar pairs on or off, the input differential peak-to-peak voltage swing should be more than $6V_T + I_E R_E$, in which $V_T$ is the thermal voltage (26mV at room temperature) and $I_E$ and $R_E$ are the emitter current and emitter resistance of the bipolar transistor, respectively. The required voltage swing also depends on the junction temperature, which can exceed 80 degrees Celsius under normal operating conditions. The clock signals at the flip-flop cells should swing no less than 150mV, which is approximately $6V_T$ at room temperature.

Since every switching cell in the DAC receives an MSB signal, the total number of gates to be driven exceeds 120. This MSB signal must also be synchronized with the other decoded digital bits. The depth of the buffer chain for the MSB signal is 5. To accomplish all of this, the clock and MSB buffers require careful design to ensure that the loads at each intermediate and end point are well balanced.

3.3.5 Layout considerations

When running at a 10GHz clock rate, layout plays an important role in ensuring that the final design meets the expected speed requirement. The current source matrix and the switching matrix are laid out separately and isolated from one another using a deep oxide trench, to reduce noise coupling from the digital part to the current sources through the substrate. The output of the DAC is placed close to the output pins to reduce interference from other parts of the circuit. Differential pair signals are routed symmetrically so that their path lengths are almost the same. To make the layout compact and easy to assemble, the CML building blocks all have the same height, and the power supply distribution stacks several metal layers to reduce the width of the metal.

Table 3.1 indicates the number of unit current sources in each sine-weighted current source. The sum of each row is the same, which assures the regularity of the current source array, as well as its compactness.
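The row-sum property claimed for Table 3.1 is easy to verify. The sketch below transcribes the 64 entries of Table 3.1 in reading order; arranging them as 8 rows of 8 is an assumption, made because it is the arrangement in which every row sums to the same value.

```python
# Entries of Table 3.1 in reading order, grouped as 8 rows of 8 (assumed layout).
TABLE_3_1 = [
    [2, 4, 6, 6, 6, 5, 3, 0],
    [0, 3, 5, 6, 6, 6, 4, 2],
    [2, 4, 6, 6, 6, 5, 2, 1],
    [1, 3, 5, 6, 6, 6, 3, 2],
    [3, 4, 6, 3, 6, 5, 3, 2],
    [1, 3, 5, 6, 7, 5, 4, 1],
    [2, 4, 5, 6, 6, 5, 3, 1],
    [0, 4, 4, 6, 7, 5, 5, 1],
]

row_sums = [sum(row) for row in TABLE_3_1]
total_units = sum(row_sums)

print(row_sums)      # [32, 32, 32, 32, 32, 32, 32, 32]
print(total_units)   # 256
```

Each row therefore needs the same number of unit devices, and the largest entry of 7 units matches the 0.7mA maximum cell current built from 0.1mA units quoted in Section 3.3.3.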
The Cadence SKILL language has been used to generate the connections that combine the unit current sources into the sine-weighted current sources, in accordance with the given switching sequence. Hence, the INL of the nonlinear DAC due to symmetrical and gradient errors is optimized. In the layout, two dummy rows and columns have been added around the current source array to avoid edge effects. To minimize the systematic error introduced by the voltage drop in the ground lines of the current source transistors, sufficiently wide lines have been used. The clock inputs are differential CML compatible signals, and multiple clock inputs are provided to reduce the parasitic inductance of the pins. The maximum delay of a metal wire on the chip is about 40ps, and the clock tree is carefully built to ensure an acceptable clock skew.

2  4  6  6  6  5  3  0
0  3  5  6  6  6  4  2
2  4  6  6  6  5  2  1
1  3  5  6  6  6  3  2
3  4  6  3  6  5  3  2
1  3  5  6  7  5  4  1
2  4  5  6  6  5  3  1
0  4  4  6  7  5  5  1

Table 3.1: Number of unit current sources in sine-weighted current source

3.4 Measured results

The die photo of the quadrature DDS RFIC is shown in Fig. 3.7. This DDS design is quite compact, with an active area of 2.3x2.5mm² and a total die area of 3x3mm². The DDS MMIC was packaged in a 48-pin ceramic leadless package. The test board was built using Rogers RO4003 laminate, which has a loss tangent of less than 0.003 and good temperature stability. To convert the single-ended signal to differential clock inputs, a 180 degree 3dB hybrid coupler is employed at the clock input. For the differential outputs, a second hybrid coupler is inserted into the output path to convert them to single-ended signals for testing. Fig. 3.8 illustrates the test setup.

We first tested the output of a single-phase DDS RFIC. In a separate design, we implemented a single-phase DDS RFIC that was tested at a maximum clock frequency of 11GHz with a power consumption of 1.9W.
At Nyquist rate, the single-phase DDS can operate at a maximum clock frequency of 9.6GHz, which corresponds to the record high power efficiency FOM of 5.1GHz/W [30]. Fig. 3.9 illustrates the measured single-phase DDS output spectrum with a 2.227GHz output and a 9.07GHz clock. The measured output power is approximately -16dBm. All measurements were done without calibrating out the losses of the cables and PCB tracks.

Figs. 3.10-3.13 illustrate the measured quadrature DDS output spectra and waveforms for different output and clock frequencies. Fig. 3.10 presents the 0.397GHz quadrature DDS output spectrum with a 5.44GHz clock input. The measured output power is approximately -4.67dBm. Fig. 3.11 demonstrates the highest operational frequency of the quadrature DDS, a 6.815GHz clock with a close-to-Nyquist output of 3.394GHz. The measured SFDR of the device, at a 3.394GHz output frequency with a 6.815GHz clock, is around 30dBc. For Fig. 3.11, the FCW is chosen as $2^8-1$, which is the maximum allowed by an 8-bit FCW input. Thus, the output frequency is set at $f_{out} = \frac{2^8-1}{2^9} f_{clk} \approx 3.394$GHz. The first order image tone, produced by mixing of the clock frequency and the DDS output frequency, occurs at 6.815GHz - 3.394GHz = 3.421GHz, which is 27MHz away from the output frequency, as shown in Fig. 3.11. Operating the DDS close to the Nyquist rate makes it very hard to filter out the image tones. In practice, the DDS output frequency is restricted to less than 3/8 of the clock frequency.
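The near-Nyquist numbers quoted for Fig. 3.11 can be reproduced with a few lines of arithmetic. This sketch assumes the standard DDS relation f_out = FCW · f_clk / 2^L with the 9-bit accumulator described earlier; the variable names are illustrative.

```python
ACC_BITS = 9           # accumulator word length from the text
F_CLK = 6.815e9        # clock frequency used for Fig. 3.11, in Hz
FCW = 2**8 - 1         # maximum 8-bit frequency control word

f_out = FCW * F_CLK / 2**ACC_BITS   # fundamental output tone
f_image = F_CLK - f_out             # first order alias image
spacing = f_image - f_out           # gap a reconstruction filter must resolve

print(round(f_out / 1e9, 3), "GHz output")
print(round(f_image / 1e9, 3), "GHz image")
print(round(spacing / 1e6, 1), "MHz spacing")
```

For the maximum FCW the spacing reduces to f_clk / 2^8, roughly 27MHz here, which is why a filter cannot separate the image from the fundamental at this setting.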
Figure 3.7: Die photo of the quadrature DDS MMIC

Figure 3.8: Test setup of the quadrature DDS RFIC

Figure 3.9: Measured single phase DDS output spectrum with clock at 9.07 GHz and output at 2.227GHz

Figure 3.10: Measured quadrature phase DDS output spectrum with clock at 5.44 GHz and output at 0.397GHz

Fig. 3.12 illustrates the measured output waveforms of the quadrature DDS at 389MHz with a 6.2GHz clock. The time-domain waveform measurements were limited by the 500MHz bandwidth of the digital sampling scope. Using a 6GS/s sampling digital scope, Fig. 3.13 provides the output waveforms of the quadrature DDS RFIC with a 6.3GHz clock input frequency and a 1.58GHz output. The measured I/Q waveforms demonstrate the 90 degree phase difference between the quadrature DDS RFIC outputs.

Table 3.2 provides a performance comparison of recently published microwave band DDS designs. The designs reported in [21] and [22] used InP technologies with an fT/fmax above 300/300 GHz, which is almost triple that of the technology used here. The InP DDS [9] employs an 8-bit accumulator and an 8-bit DAC and operates at a maximum clock frequency of 9.2GHz with a power consumption of 15W. On the other hand, the single-phase SiGe DDS [30], which has an 8-bit DAC and a 9-bit accumulator, consumes only 1.9W with 3.3V and 4V power supplies for the digital and analog circuits, respectively. For this design, the digital portion of the DAC consumes 300mA and the accumulator consumes 250mA from the 3.3V supply. The analog portion of the DAC consumes 35mA from the 4.0V supply. This is the first mm-wave quadrature DDS design reported so far [34].
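As a quick consistency check, the supply currents quoted above can be summed into a total power figure. The block labels are descriptive names chosen here, and the reading that these currents belong to the single-phase design is an interpretation of the preceding paragraph.

```python
# (block, supply current in A, supply voltage in V), as quoted in the text
SUPPLY_BUDGET = [
    ("DAC digital portion", 0.300, 3.3),
    ("accumulator", 0.250, 3.3),
    ("DAC analog portion", 0.035, 4.0),
]


def total_power(blocks):
    """Sum of I * V over all listed blocks, in watts."""
    return sum(current * voltage for _, current, voltage in blocks)


print(round(total_power(SUPPLY_BUDGET), 3), "W")
```

The sum comes to about 1.96W, in line with the 1.9W quoted for the single-phase SiGe DDS, and it shows that the digital logic, not the analog DAC core, dominates the power budget.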
The quadrature DDS RFIC contains more than 13500 active devices in a quite compact die. The active area of the quadrature DDS RFIC is about 2.3x0.7mm² and its total die area is 3x3mm². When compared with other single-phase mm-wave DDSs [9, 21], this design achieves similar SFDR performance. It is more complex, yet more compact and consumes less power, as shown in Table 3.2. The minimum size of the InP transistor is much larger than that of the SiGe transistor. Although the current densities needed to achieve the peak fT frequency in the InP and SiGe technologies are similar, the current required to operate the minimum size SiGe transistor is only one third of that of the InP transistor. It is for this reason that the SiGe DDS achieves superior power efficiency.

Figure 3.11: Measured quadrature phase DDS output spectrum with clock at 6.815 GHz and output at 3.394GHz

3.5 Conclusion

In this chapter, a 9-bit, 6.2GHz low power quadrature DDS has been implemented in a 0.18 μm SiGe BiCMOS technology. With a 9-bit pipeline accumulator and two 8-bit sine-weighted current steering DACs, this DDS is capable of generating quadrature sinusoidal waveforms up to 3.15GHz with a maximum clock frequency of 6.2GHz.
Packing more than 13500 transistors, the quadrature DDS occupies an active area of 2.3x2.5mm² and a total die area of 3.0x3.0mm². The measured SFDR is about 26dBc at a clock frequency of 6.2GHz. At the maximum clock frequency, the power consumption of the DDS is 2.5W with 3.3V and 4.0V power supplies for the digital and analog parts, respectively. The DDS thus achieves a power efficiency figure of merit (FOM) of 5.04GHz/W/Phase. The DDS chips were packaged in 48-pin ceramic LCC carriers, and air cooling was used during the measurements.

Figure 3.12: Measured DDS output waveforms without deglitch filter at 0.389GHz with clock at 6.2GHz

Figure 3.13: Measured quadrature DDS output waveforms at 1.58GHz with clock at 6.3GHz

Reference                               [9]       [21]       [22]       [30]           [34]
Technology                              InP       InP        InP        SiGe           SiGe
fT/fmax [GHz]                           137/267   300/300    300/300    120/100        120/100
Emitter area of min size npn [μm²]      1.5x4     0.4x2      0.4x2      0.2x0.64       0.2x0.64
Peak fT current of min size npn [mA]    7.2       4          4          0.77           0.77
Quadrature phase outputs                No        No         No         No             Yes
Accumulator size [bit]                  8         8          8          9              9
DAC resolution [bit]                    7         7          5          8              8
Max clock frequency [GHz]               9.2       13         32         12             6.8
SFDR [dBc]                              30        26.67      21.56      30             26
Power consumption [W]                   15        5.42       9.45       1.9            2.5
Number of transistors                   3000      1646       1891       9600           13500
Die size [mm²]                          8x5       2.7x1.45   2.7x1.45   2.3x0.7        2.3x2.5
Power efficiency [Phase-GHz/W]          0.5       2.4        3.386      6.3            5.44
Area efficiency [DAC-Bit/mm²]           0.175     1.79       1.28       4.97 w. VCO    2.78 w. VCO
Tested prototypes                       wafer     wafer      wafer      packaged       packaged

Table 3.2: ULTRA-HIGH SPEED DDS PERFORMANCE COMPARISON. THIS WORK IS QUOTED FOR SINGLE-PHASE / QUADRATURE-PHASE DDS DESIGNS

Chapter 4
QUADRATURE PHASES SIGE DDS WITH UP-CONVERSION

4.1 Introduction

In next generation radar systems, there is an emerging trend toward digitization in radar receiver designs through direct intermediate frequency-to-digital conversion (IF sampling) and direct digital synthesis (DDS).
Digital radar receivers can achieve much higher precision, lower noise, lower power and better stability than their analog counterparts. Moreover, they retain the flexibility of digital techniques such as direct digital modulation and waveform generation. A DDS generates a digitized waveform of a given frequency by accumulating phase changes at a higher clock frequency. Microwave range DDSs have been developed in both InP and SiGe technologies [9, 21, 22] with output frequencies up to 10GHz. It is highly desirable to develop frequency synthesis means for X/Ku-band applications. By mixing the outputs of a quadrature DDS (QDDS) and a quadrature VCO, X/Ku-band waveform generation can be achieved.

4.2 Architecture and circuit design

The conceptual diagram of the frequency synthesizer is shown in Fig. 4.1. The local oscillator generates quadrature outputs at a relatively fixed frequency $\omega_0$, which are mixed with the outputs of a quadrature DDS running at frequency $\omega$. The mixer outputs are then summed and subtracted, so that the up-converted and down-converted waveforms with a frequency of $\omega_0+\omega$ or $\omega_0-\omega$ are derived [6]. Assuming the local oscillator frequency is higher than the output frequency of the quadrature DDS, this mixing scheme can be used to up-convert the DDS output frequency to a higher frequency band. Theoretically the output should be free of alias images. In practice, however, the DDS output contains harmonics and spurs that significantly deteriorate the purity of the desired output waveforms.

Figure 4.1: Concept diagram of the frequency synthesizer
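The sum-and-subtract mixing described above follows from the product-to-sum trigonometric identities, and can be checked numerically. This is an ideal-mixer sketch: the function name is hypothetical, the 11.7GHz and 2.3GHz values are example frequencies taken from the measurement setup later in this chapter, and leakage or I/Q imbalance is not modeled.

```python
import math


def ssb_mix(t, w0, w):
    """Ideal SSB mixing of a quadrature LO (w0) with a quadrature DDS (w).

    Returns (up, down), where
      up   = cos(w0 t) cos(w t) - sin(w0 t) sin(w t) = cos((w0 + w) t)
      down = cos(w0 t) cos(w t) + sin(w0 t) sin(w t) = cos((w0 - w) t)
    """
    i_lo, q_lo = math.cos(w0 * t), math.sin(w0 * t)
    i_dds, q_dds = math.cos(w * t), math.sin(w * t)
    up = i_lo * i_dds - q_lo * q_dds
    down = i_lo * i_dds + q_lo * q_dds
    return up, down


w0 = 2 * math.pi * 11.7e9   # example LO frequency
w = 2 * math.pi * 2.3e9     # example DDS output frequency
t = 1.23e-10                # arbitrary sample instant in seconds

up, down = ssb_mix(t, w0, w)
print(abs(up - math.cos((w0 + w) * t)) < 1e-9)
print(abs(down - math.cos((w0 - w) * t)) < 1e-9)
```

With these example values the two outputs sit at 14.0GHz and 9.4GHz. Any amplitude or phase mismatch between the I and Q paths breaks the exact cancellation and leaves a residual unwanted sideband.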
The imperfections of the mixers, due to leakage and second order effects, introduce spurs and harmonics into the output signals. In a multi-GHz DDS design, this is further complicated by the impact of the large circuit scale and the high power dissipation.

The frequency synthesizer contains three major parts: the quadrature DDS, the quadrature VCO and the mixers. The DDS can provide quadrature signals with accurate I/Q matching, as shown in Fig. 4.2. The quadrature DDS is formed by merging two sine-weighted current steering DACs and a 9-bit pipeline accumulator. The nonlinear DAC approach is still attractive for microwave DDS design because it provides a drastic speed improvement over ROM based or algorithm based DDS designs. Reducing the spurs introduced by amplitude errors needs to be taken into account when designing an ultra high speed DDS. The spurs introduced by phase truncation are already minimized, because only one bit of the phase accumulator output is truncated.

Figure 4.2: Block diagram with the circuit of quadrature VCO

The input frequency control word (FCW) specifies the output frequency of the quadrature DDS. The output of the quadrature VCO is tuned to 11.7GHz and is also divided by 2 to generate 5.85GHz for potential use as the DDS clock. The quadrature VCO design adopts a standard cross-coupled LC-VCO topology. The center tapped inductor in the LC tank has been replaced by 4 transmission line inductors to facilitate a symmetrical and compact layout. However, their Q factor is lower than that of typical spiral inductors, which needs to be accounted for in the design.
To reduce the losses of the inductors, thick analog metals are used for the connections between the transmission line inductors.

To produce the 90 degree phase word, the binary number "01" needs to be added to the two most significant bits of the DAC input. At the gate level, the MSB output is the Exclusive-OR (XOR) of the first two MSB inputs, and the 2nd MSB output is the inversion of the 2nd MSB input. Because all the digital logic has differential outputs, only one XOR gate needs to be inserted at the inputs of the sine-weighted DAC to convert it to a DAC with a 90 degree output phase difference.

The essential building block of the nonlinear DAC is the sine-weighted current source matrix. The smallest unit current of each current source is 0.1mA, which should provide the current switches with enough switching speed when toggling. The largest current in the current source matrix is 0.7mA, which is composed of 7 identical current sources. The current switch contains two differential pairs with minimum sized transistors and a cascode transistor, which isolates the current sources from the switches and improves the bandwidth of the entire group of switching circuits.

In this ultra-high speed DDS design, the ROM-less structure with two nonlinear current steering DACs is employed, and the sine/cosine mapping function is performed by a sine-weighted DAC instead of the traditional ROM-based sine look-up table. By eliminating the ROM, the speed of the DDS is improved and the power consumption is reduced. This quadrature DDS comprises a 9-bit pipeline accumulator and two 8-bit sine-weighted current-steering DACs. To produce the 90 degree phase, an XOR gate is inserted at the inputs of one sine-weighted DAC. Since the output frequency cannot exceed the Nyquist rate, an 8-bit frequency control word (FCW) is fed into the 9-bit pipeline accumulator with the MSB of the accumulator input tied to zero.
The LSB of the 9-bit phase word is truncated, and its MSB is used to provide the proper mirroring of the sine waveform about the $\pi$ phase point. Its 2nd MSB is used to invert the remaining 6 bits for the 2nd and 4th quadrants of the sine wave prior to the decoding logic. The outputs of the 3:8 column-row decoders go to the switch matrix to control the switches in each DAC cell. The latch and switch matrices contain 64 cells.

Figure 4.3: Circuits of up-convert and down-convert mixers

The implemented ultra-high speed DDS is the first mm-wave quadrature DDS design reported so far. When compared with other single-phase mm-wave DDSs [9, 21, 22], it is more complex, yet more compact and consumes less power, as shown in Table 4.1. The minimum size of the InP transistor is much larger than that of the SiGe transistor. Although the current densities needed to achieve the peak fT frequency in the InP and SiGe technologies are similar, the current required to operate the minimum size SiGe transistor is much lower. It is for this reason that the SiGe DDS achieves superior power efficiency.
Technology                             InP        InP        SiGe
Reference                              [9]        [21]       [30, 34]
fT/fmax [GHz]                          137/267    300/300    120/100
Emitter area of min npn [µm2]          1.5x4      0.4x2      0.2x0.64
Current density at peak fT [mA/µm2]    1.2        5          6
Peak fT current of min npn [mA]        7.2        4          0.77
Breakdown voltage BVceo [V]            8          4          1.8
Accumulator size [bit]                 8          8          9
DAC resolution [bit]                   7          7          8
Max clock frequency [GHz]              9.2        13         9.6/6.3
SFDR [dBc]                             30         26.67      30/26
Power consumption [W]                  15         5.42       1.9/2.5
Number of transistors                  3000       1646       9600/13500
Die size [mm2]                         8x5        2.7x1.45   2.3x0.7/2.3x2.5
FOM [GHz/W/Phase]                      0.5        2.4        5.1/5.04

Table 4.1: Ultra-high speed DDS performance comparison. This work is quoted for single-phase / quadrature-phase.

To shorten the connections to the mixers and make the layout as symmetrical as possible, the mixers are placed in the center of the chip, and the two VCOs and two sine-weighted DACs are placed on opposite sides of the mixers. The die photo of the frequency synthesizer is shown in Fig. 4.4. The active area is approximately 2.5x2.5mm2.

Figure 4.4: Frequency synthesizer die photo

4.3 Measured results

The tests are performed on chips in leadless ceramic packages. The test board was built using Rogers RO4003 laminate, which has a loss tangent of less than 0.003 and good temperature stability. To convert the single-ended signal to differential clock inputs, a 180-degree 3dB hybrid coupler is employed at the clock input. For the differential outputs, a second hybrid coupler is inserted into the output path to convert them to a single-ended signal for testing. To ensure that the chips work within a safe operating range, external air cooling is used.

Figure 4.5: Measured 37MHz output waveforms with a 6.4GHz QDDS

The I/Q waveforms measured with a digital oscilloscope confirm the 90-degree phase difference between the outputs of the quadrature DDS, as shown in Fig. 4.5. The measured amplitude imbalance is 5% and the phase imbalance is 2 degrees. The spectra of the frequency synthesizer outputs shown in Fig.
4.6, 4.7 and 4.8 are taken at the down-convert output without calibrating out the attenuation. Fig. 4.6 is the output spectrum with the 11.7GHz VCO output and the 4.6GHz DDS clock input when the DDS is turned off. The clock power leaked to the mixer output is -50dBm and the leaked local oscillator power is -41dBm. The spur at 5.85GHz is purely due to leakage of the divide-by-2 output of the local oscillator, which contains a built-in divider for test purposes. The divided output of the local oscillator is attenuated by 27dB.

Figure 4.6: Measured output spectra of 4.6GHz QDDS clock input and 11.7GHz LO output (marked peaks: LO, CLK, LO/2)

Figure 4.7: Measured output spectra of 4.6GHz QDDS clock input and 2.3GHz QDDS output (marked peaks: CLK, QDDS OUT)

Fig. 4.7 shows the Nyquist output spectrum of the DDS with a 4.6GHz clock input when the local oscillator is turned off. The output of the DDS is located close to 2.31GHz. Since the measurement is taken at the mixer output, the DDS output power is significantly reduced. The output power of the quadrature DDS with a single output is approximately -53.75dBm.

Figure 4.8: Measured output down-converted 9.4GHz output (marked peaks: LO, DOWN-CONVERT OUT, QDDS OUT, CLK)

In Fig. 4.8, the local oscillator and the DDS are both switched on and the spectrum of the mixed outputs is shown. The down-converted signal is at 9.4GHz with a power of -35.24dBm, and the up-converted 14.0GHz signal is also visible with a power of -46dBm. During the measurement, one of the quadrature outputs of the quadrature VCO showed strong distortion. One possible cause is that the interconnection wires connecting the transmission-line inductors used in the LC tanks induce inductive peaking and drive some of the transistors into saturation. The single-sideband suppression is also affected by the imbalance of the integrated quadrature VCO, which will be improved in the next version.
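The frequencies quoted in the measurements follow from the usual sum and difference products of a mixer. The sketch below only reproduces that arithmetic for an ideal mixer; frequencies are kept as integers in MHz to avoid rounding noise, and the function names are illustrative.

```python
def sideband_frequencies_mhz(lo_mhz, dds_mhz):
    """Return (down-converted, up-converted) frequencies of an ideal mixer."""
    return lo_mhz - dds_mhz, lo_mhz + dds_mhz


def tuning_ranges_mhz(lo_mhz, dds_max_mhz):
    """Bands covered when the DDS output sweeps from 0 to dds_max_mhz."""
    down_band = (lo_mhz - dds_max_mhz, lo_mhz)
    up_band = (lo_mhz, lo_mhz + dds_max_mhz)
    return down_band, up_band


# Values from the text: 11.7GHz LO and a DDS output near 2.3GHz,
# which is half of the 4.6GHz DDS clock.
down, up = sideband_frequencies_mhz(11700, 2300)
```

With these inputs, `down` is 9400 and `up` is 14000, matching the 9.4GHz and 14.0GHz tones observed in Fig. 4.8.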
4.4 Conclusion

In this chapter, an X/Ku-band fine-tuning frequency synthesizer using a quadrature DDS has been implemented in a 0.18 µm SiGe BiCMOS technology. The frequency synthesizer comprises a 9-bit quadrature DDS, an 11.7GHz quadrature VCO and image rejection mixers. The outputs of the quadrature DDS are down-converted to 9.4-11.7GHz and up-converted to 11.7-14.0GHz, respectively. The die area of the synthesizer is 3.0x3.0mm2 and the power consumption is 2.6W under a 3.3V supply. The chip is measured in a 48-pin leadless ceramic package with external cooling.

Chapter 5
RING OSCILLATOR BASED PERIODICAL WAVEFORM GENERATOR

5.1 Introduction

Arbitrary periodic waveform generation is highly desirable for certain wireless and radar applications. Generating arbitrary waveforms at several GHz usually requires a power-hungry direct digital synthesizer (DDS) [4, 22]. A DDS for waveform generation normally comprises a digital-to-analog converter (DAC), which converts the digital amplitude inputs to analog amplitude outputs, and digital control circuits, which retrieve data stored in a RAM and pass it to the DAC inputs. The amplitude data can also be calculated directly by a signal processor. However, running the signal processor or RAM controller at the same refresh rate as the input clock becomes extremely difficult due to restrictions on power consumption and implementation complexity. To overcome the difficulty of feeding digital amplitude signals to the DAC at multiple GHz, designers are forced to adopt some simplifications. In [17], an accumulator is applied to the input of a 6b DAC to generate a ramp waveform for test purposes. With an on-chip programmable ROM, [1] demonstrates the capability of synthesizing specific waveforms for ultra-wideband (UWB) transmission. However, the distributed nature of millimeter-wave signals is overlooked in [17] and [1].
In [35], a structure similar to an equalizer is used to combine multiple narrow pulses triggered by a delayed clock signal to form the envelope of the target waveforms. The works mentioned above provide insight into the problem of high speed waveform generation and motivated us to seek alternative solutions. One goal is to simplify the structure so that the power and area requirements can be minimized. The fact that a ring oscillator automatically generates multiple-phase clock outputs can be exploited further. By combining a programmable DAC and a ring oscillator topology, a more compact waveform generator is formed [32].

Figure 5.1: One cycle of the waveform with constant sampling step

5.2 Waveform generator architectures

The principle of periodic waveform generation can be explained with basic sampling theory. As shown in Fig. 5.1, one cycle of a periodic waveform can be expressed as

\[ f(t) = \sum_{k=0}^{n} a_k x_k \bigl(u(t) - u(kT)\bigr), \quad nT \le t \le (n+1)T \tag{5.1} \]

where $T$ is the period of the sampling function, $NT$ is the period of the waveform, $u(t)$ is the unit step function, and $a_k x_k$ is the incremental value of each step; $a_k$ is the weight coefficient and $x_k$ is the binary value. Assume that $a_k x_k$ does not depend explicitly on time, so the sampled waveform value can be written as $y_n = \sum_{k=0}^{n} a_k x_k$. By accumulating $a_k x_k$ with a period of $T$, an arbitrary waveform can be synthesized by substituting the values of the set $\{a_k x_k\}$. The waveform generator is conventionally implemented using a global sampling clock and a phase accumulator.
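The accumulation $y_n = \sum_{k=0}^{n} a_k x_k$ can be illustrated in a few lines. This is a minimal sketch with integer sample values chosen purely for illustration; it derives the per-step increments from a target waveform and then rebuilds the waveform by running summation.

```python
from itertools import accumulate


def increments_from_samples(samples):
    """Per-step increments a_k * x_k whose running sum reproduces `samples`."""
    previous = 0
    steps = []
    for value in samples:
        steps.append(value - previous)
        previous = value
    return steps


def synthesize(steps):
    """Running sum y_n = sum of the increments for k = 0..n."""
    return list(accumulate(steps))


# Hypothetical target: one coarse cycle of a rounded waveform.
target = [0, 3, 5, 6, 5, 3, 0, -3]
steps = increments_from_samples(target)
```

Here `steps` evaluates to `[0, 3, 2, 1, -1, -2, -3, -3]`, and `synthesize(steps)` returns the original `target`, which is the sense in which substituting a different set of increments yields a different waveform.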
Considering the multi-phase synthesis capability of a ring oscillator, it is possible to localize the global sampling clock. The outputs of the ring oscillator, with their different phase delays, provide the proper time stamps for sampling. In this work, a waveform generator structure that eliminates the external clock and the phase accumulator has been implemented.

Figure 5.2: Block diagram of the ring oscillator waveform generator (16 delay cells with current inputs I0 to I15 and summed output IOUT)

As shown in Fig. 5.2, the waveform generator contains two major parts: a 16-stage ring oscillator with 16 current switch cells, and 16 programmable current sources. A 64-bit shift register chain stores the current weight data that sets the current for each of the current switch cells. The clock distribution network of a conventional DDS is replaced by the 16-stage ring oscillator, which naturally provides multiphase clocks with the exact sample time stamps. The clock signals propagate along the stages and are directly employed to switch the weighted currents one by one at the desired sample time points. The currents that flow through the current switches have weights of $a_k x_k$, which are stored in the shift register. The current outputs are then summed by connecting all the output nodes of the ring oscillator stages together. In the ring oscillator, the rising and falling edges of the clock propagate along the chain of delay buffers. The half period of the ring oscillator is the total number of delay cells multiplied by the propagation delay of a single cell. Referring to Fig.
5.2, the 16-stage delay-cell-based ring oscillator is located in the center of the chip, and the outputs of the ring oscillator drive 16 identical current switch cells. Surrounding it is a 64-stage shift register chain with an external clock and a serial data input. The data carrying the amplitude weight information is shifted in and stored in the register.

5.3 Circuits of the waveform generator

One issue associated with the typical DDS is distributing the clock signal uniformly over the whole chip at multi-GHz frequencies [22], which puts stringent requirements on the driving capability of the clock buffers and on the delay matching between the different branches of the clock tree. The clock distribution system accounts for more than 20% of the power in a circuit with many sequential logic gates. Eliminating the high speed clock distribution circuit in a periodic waveform generator can greatly reduce the power and save area.

Figure 5.3: Simplified circuit of current switch with 3-bit+sign programmable current source

The ring oscillator is composed of cascaded stages of CML delay buffers, whose delay can be controlled by the bias current. The outputs of each delay buffer stage are fed into the current switches, which turn on or off according to the propagation of signals inside the ring oscillator. The programmable weighted currents are switched to the positive or negative current output depending on the state of the current switches. The current outputs are summed and converted to voltage outputs with a pair of pull-up resistors. Since only one current source is switched at a time, the output waveform slope is limited to the ratio of the current of a single current source to the output load capacitance. Fig. 5.3 shows the simplified circuit of the current switch cell, which is composed of the switch driver and the weighted current source.
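Two of the relations stated for the ring oscillator lend themselves to a quick numerical check: the half period equals the number of delay cells times the cell delay, and the output slope is bounded by the ratio of a single cell current to the load capacitance. The sketch below assumes that the output period equals the ring oscillator period; the function names and any current or capacitance values passed in are hypothetical.

```python
def stage_delay_s(num_stages, osc_freq_hz):
    """Per-cell delay implied by: half period = num_stages * cell delay."""
    return 1.0 / (2.0 * num_stages * osc_freq_hz)


def oscillation_freq_hz(num_stages, delay_s):
    """Oscillation frequency for a given per-cell propagation delay."""
    return 1.0 / (2.0 * num_stages * delay_s)


def max_slope_v_per_s(cell_current_a, load_capacitance_f):
    """Slope limit when only one current source switches at a time: I / C."""
    return cell_current_a / load_capacitance_f


# 16 stages at the 3.0GHz output frequency reported for this design.
delay = stage_delay_s(16, 3.0e9)
```

For 16 stages at 3.0GHz the per-cell delay comes out at roughly 10.4ps, which gives a feel for how fast each CML delay buffer has to be.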
The lower 3 digital bits, D3 to D1, control the current magnitude of the current source, and the highest digital bit, D4, selects the polarity of the current source through an XOR gate. Thus, positive and negative values can be chosen by switching the polarity of the current source. The current source cell is preset by the input digital bits. The 3 programmable digital bits turn on the gate driving voltages of the NMOS current mirrors (weighted x4, x2 and x1 in Fig. 5.3), so 3-bit resolution of the current source is attained.

The layout of the waveform generator adopts a floor plan similar to the conceptual diagram shown in Fig. 5.2. This makes the layout much more symmetrical and allows the delay in the critical path to be well controlled.
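The 3-bit+sign weight format and the 64-bit shift register (16 cells of 4 bits each) can be sketched as follows. This is an illustration only: the choice that D4 = 1 means negative polarity, and the bit and cell order in the serial stream, are assumptions not stated in the text.

```python
def encode_weight(value):
    """Encode an integer in [-7, 7] as [D4, D3, D2, D1] (sign + magnitude)."""
    if not -7 <= value <= 7:
        raise ValueError("weight must fit in 3 bits plus sign")
    sign = 1 if value < 0 else 0  # assumed convention: D4 = 1 is negative
    magnitude = abs(value)
    return [sign, (magnitude >> 2) & 1, (magnitude >> 1) & 1, magnitude & 1]


def decode_weight(bits):
    """Recover the signed weight from [D4, D3, D2, D1]."""
    sign, d3, d2, d1 = bits
    magnitude = 4 * d3 + 2 * d2 + d1  # x4, x2, x1 current mirror weights
    return -magnitude if sign else magnitude


def pack_shift_register(weights):
    """Concatenate 16 cell weights into a 64-bit serial stream (assumed order)."""
    if len(weights) != 16:
        raise ValueError("expected one weight per current switch cell (16)")
    stream = []
    for weight in weights:
        stream.extend(encode_weight(weight))
    return stream
```

Packing 16 weights always yields 64 bits, which matches the length of the shift register chain.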
To reduce bonding wire inductance effects, multiple pads are used for the output pins. Excluding the wire bonding pads, the active core area of the waveform generator chip is approximately 0.6x0.6mm2, of which approximately 40% is occupied by decoupling capacitors. Including the pads, the total chip area is 1.0x1.0 mm2, as shown in Fig. 5.4.

Figure 5.4: Die photo of the waveform generator

5.4 Experiment results

To verify the functionality of the waveform generator, simulated as well as measured results are provided. The simulated output waveforms are shown in Fig. 5.5 to Fig. 5.7. In Fig. 5.5, the data is serially loaded into the shift register chain, and the outputs stabilize after all 64 bits have been loaded. Fig. 5.6 gives the simulated result for a synthesized sine waveform. Fig. 5.7 shows the output waveform as the output changes from one type to another when the data stored in the shift registers is changed.

Figure 5.5: Simulated output waveform during data loading
Figure 5.6: Simulated output sine waveform
Figure 5.7: Simulated output waveform during transition
Figure 5.8: Measured output waveform during data loading
Figure 5.9: Measured synthesized arbitrary waveform
Figure 5.10: Measured synthesized arbitrary waveform

The outputs of the periodic waveform generator have been directly captured with a 6GHz bandwidth digital oscilloscope. Fig. 5.8 to Fig. 5.10 present the measured waveforms created by the circuit. Fig. 5.8 shows the waveform during data loading into the shift register chain. The output of the waveform generator experiences transients due to the changing current weight data. After 64ns, data loading is finished, and the waveform generator's output becomes stable. The output swing is 300mV peak to peak on a 15 Ω load and the output frequency is 3.0GHz. Fig. 5.9 shows the measured synthesized waveform with a frequency of 2.887GHz. Fig. 5.10 shows a generated arbitrary waveform with a frequency of 2.867GHz.
The slightly different output frequencies of the synthesized waveforms reflect the change in loading with different weighted current sources. Due to the input bandwidth limitation of the sampling oscilloscope, spectral components outside the 6GHz input bandwidth are filtered out.

5.5 Conclusion

In this chapter, a periodic arbitrary waveform generator based on a ring oscillator structure has been implemented in a 0.13 µm SiGe BiCMOS technology. Using 16 delay stages that control programmable weighted currents, the proposed waveform generator can output 3GHz periodic waveforms. The total power consumption is less than 200mW with a 2.2V power supply. The total area of the SiGe chip is 1.0mm2.

Chapter 6
SUMMARY AND FUTURE WORKS

6.1 Summary of the works

This dissertation presents a detailed design procedure for high speed direct digital frequency synthesizers. The main target is to achieve microwave-range speed while keeping the power consumption moderate. With advanced SiGe technology, the output frequency of a DDS can reach multiple GHz. Choosing the right process and architecture for a DDS among the different candidates requires the target application to be specified. The ROM-based DDS can be optimized in many ways; by applying ROM compression techniques, the size of the ROM can be reduced significantly. The ROM-less approach, which eliminates the ROM, a speed bottleneck in DDS design, is suitable for applications that require extreme performance. Since there are many ways to implement the mapping from phase to sine amplitude, selecting the appropriate mapping block deserves consideration. Using a non-linear sine-weighted DAC is a straightforward solution for the mapping function. However, its impact on the overall DDS performance needs to be carefully evaluated. The non-linear DAC introduces additional spurs into the final DDS output spectra. The major problem is that the non-ideal nature of a non-linear DAC is more noticeable than that of a linear DAC.
The detailed analysis shows that the DAC-associated spurs come from two major sources. One is the static performance of the DAC, such as DNL or INL. The other is the dynamic performance of the DAC, which is input code dependent and clock frequency dependent. To make the whole situation more complicated, the noise from the digital blocks is also injected into the substrate and the power supply. To suppress crosstalk and power and ground bounce, more unknown variables need to be taken into account. Some of the above-mentioned issues are investigated in this dissertation, and the design flow also addresses some of the practical problems encountered.

The single-phase high speed DDS was designed first, and the quadrature version was then developed. Later, the DDS with I/Q outputs and internal mixers was demonstrated. Generally speaking, the output signals can be up-converted to even higher frequency bands. Another approach is to make the waveform generator more compact so that it is suitable for on-chip test. Thus a waveform generator based on a ring oscillator structure has been built. Since it needs no external clock, this waveform generator can be useful for certain applications.

6.2 Future works

The main difficulty lies in obtaining a good model that accurately reflects the real behavior of a working DDS. Though some theoretical works have appeared in this field, there is still something to be desired. In particular, hardly anyone has so far been able to predict the dynamic performance of the DAC and the DDS with good accuracy. The future work is intended to further study the modeling of the DDS, provide new insight into this problem, and hopefully find alternative solutions to the questions that arise when designing a DDS.

Bibliography

[1] D. Baranauskas and D. Zelenin, "A 0.36W 6b up to 20GS/s DAC for UWB wave formation," in Proc. Digest of Technical Papers. IEEE International Solid-State Circuits Conference ISSCC 2006, 6-9 Feb. 2006, pp. 2380-2389.
[2] B.
Bjerede, "Suppression of spurious frequency components in direct digital frequency synthesizer," Patent, Dec. 17, 1991, US Patent 5,073,869.
[3] K. Chu and D. Pulfrey, "Design procedures for differential cascode voltage switch circuits," Solid-State Circuits, IEEE Journal of, vol. 21, no. 6, pp. 1082-1087, 1986.
[4] L. Cordesses, "Direct digital synthesis: a tool for periodic wave generation (part 1)," Signal Processing Magazine, IEEE, vol. 21, no. 4, pp. 50-54, July 2004.
[5] A. Corry and R. Sutherland, "Direct digital frequency synthesizer using sigma-delta techniques," Patent, Oct. 8, 1996, US Patent 5,563,535.
[6] R. Cushing, "Single-sideband upconversion of quadrature DDS signals to the 800-to-2500-MHz band," Analog Dialogue, pp. 34-3, 2000.
[7] L. Dadda and V. Piuri, "Pipelined adders," Computers, IEEE Transactions on, vol. 45, no. 3, pp. 348-356, 1996.
[8] K. R. Elliott, "Direct digital synthesis for enabling next generation RF systems," in Proc. IEEE Compound Semiconductor Integrated Circuit Symposium CSIC '05, 30 Oct.-2 Nov. 2005, p. 4pp.
[9] A. Gutierrez-Aitken, J. Matsui, E. N. Kaneshiro, B. K. Oyama, D. Sawdai, A. K. Oki, and D. C. Streit, "Ultrahigh-speed direct digital synthesizer using InP DHBT technology," Solid-State Circuits, IEEE Journal of, vol. 37, no. 9, pp. 1115-1119, Sep 2002.
[10] C. Kang and E. Swartzlander Jr, "Digit-pipelined direct digital frequency synthesis based on differential CORDIC," Circuits and Systems I: Regular Papers, IEEE Transactions on [see also Circuits and Systems I: Fundamental Theory and Applications, IEEE Transactions on], vol. 53, no. 5, pp. 1035-1044, 2006.
[11] F. Lu, H. Samueli, J. Yuan, and C. Svensson, "A 700MHz 24-b pipelined accumulator in 1.2-µm CMOS for application as a numerically controlled oscillator," IEEE Journal of Solid-State Circuits, vol. 28, no. 8, pp. 878-886, 1993.
[12] R. Meyer, W. Sansen, and S.
Peeters, "The differential pair as a triangle-sine wave converter," Solid-State Circuits, IEEE Journal of, vol. 11, no. 3, pp. 418-420, 1976.
[13] S. Mortezapour and E. K. F. Lee, "Design of low-power ROM-less direct digital frequency synthesizer using nonlinear digital-to-analog converter," Solid-State Circuits, IEEE Journal of, vol. 34, no. 10, pp. 1350-1359, Oct. 1999.
[14] T. Nakagawa and H. Nosaka, "A direct digital synthesizer with interpolation circuits," Solid-State Circuits, IEEE Journal of, vol. 32, no. 5, pp. 766-770, May 1997.
[15] Y. Nakamura, T. Miki, A. Maeda, H. Kondoh, and N. Yazawa, "A 10-b 70-MS/s CMOS D/A converter," Solid-State Circuits, IEEE Journal of, vol. 26, no. 4, pp. 637-642, April 1991.
[16] H. T. Nicholas III and H. Samueli, "A 150-MHz direct digital frequency synthesizer in 1.25-µm CMOS with -90-dBc spurious performance," Solid-State Circuits, IEEE Journal of, vol. 26, no. 12, pp. 1959-1969, Dec. 1991.
[17] P. Schvan, D. Pollex, and T. Bellingrath, "A 22GS/s 6b DAC with integrated digital ramp generator," in Proc. Digest of Technical Papers Solid-State Circuits Conference ISSCC. 2005 IEEE International, 10-10 Feb. 2005, pp. 122-588.
[18] A. M. Sodagar and G. Roientan Lahiji, "Mapping from phase to sine-amplitude in direct digital frequency synthesizers using parabolic approximation," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 12, pp. 1452-1457, Dec. 2000.
[19] D. Sunderland, R. Strauch, S. Wharfield, H. Peterson, and C. Cole, "CMOS/SOS frequency synthesizer LSI circuit for spread spectrum communications," Solid-State Circuits, IEEE Journal of, vol. 19, no. 4, pp. 497-506, 1984.
[20] J. Tierney, C. Rader, and B. Gold, "A digital frequency synthesizer," Audio and Electroacoustics, IEEE Transactions on, vol. 19, no. 1, pp. 48-57, 1971.
[21] S. E. Turner and D. E.
Kotecki, "Direct digital synthesizer with ROM-less architecture at 13-GHz clock frequency in InP DHBT technology," Microwave and Wireless Components Letters, IEEE, vol. 16, no. 5, pp. 296-298, May 2006.
[22] S. Turner and D. Kotecki, "Direct digital synthesizer with sine-weighted DAC at 32-GHz clock frequency in InP DHBT technology," Solid-State Circuits, IEEE Journal of, vol. 41, no. 10, p. 2284, 2006.
[23] A. Van den Bosch, M. Steyaert, and W. Sansen, "SFDR-bandwidth limitations for high speed high resolution current steering CMOS D/A converters," Electronics, Circuits and Systems, 1999. Proceedings of ICECS '99. The 6th IEEE International Conference on, vol. 3, pp. 1193-1196 vol.3, 1999.
[24] J. Vandenbussche, G. Van der Plas, A. Van den Bosch, W. Daems, G. Gielen, M. Steyaert, and W. Sansen, "A 14 b 150 Msample/s update rate Q2 random walk CMOS DAC," Solid-State Circuits Conference, 1999. Digest of Technical Papers. ISSCC. 1999 IEEE International, pp. 146-147, 1999.
[25] J. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron. Comput, vol. 8, no. 3, pp. 330-334, 1959.
[26] P. Vorenkamp, J. Verdaasdonk, R. van de Plassche, and D. Scheffer, "A 1 GS/s, 10b digital-to-analog converter," Feb 1994, pp. 52-53.
[27] M. Yamashina and H. Yamada, "An MOS current mode logic (MCML) circuit for low-power sub-GHz processors," IEICE Transactions on Electronics, vol. 75, no. 10, pp. 1181-1187, 1992.
[28] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 12 GHz 1.9 W direct digital synthesizer MMIC implemented in 0.18 µm SiGe BiCMOS technology," Solid-State Circuits, IEEE Journal of, vol. 43, no. 6, pp. 1384-1393, June 2008.
[29] X. Yu, F. F. Dai, Y. Shi, and R. Zhu, "2 GHz 8-bit CMOS ROM-less direct digital frequency synthesizer," in Proc. IEEE International Symposium on Circuits and Systems ISCAS 2005, 23-26 May 2005, pp. 4397-4400.
[30] X. Yu, F. F. Dai, D. Yang, V. Kakani, J. D. Irwin, and R. C.
Jaeger, "A 9-bit 9.6GHz 1.9W direct digital synthesizer RFIC implemented in 0.18 µm SiGe BiCMOS technology," in Proc. IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, 3-5 June 2007, pp. 241-244.
[31] X. Yu, F. F. Dai, J. David Irwin, and R. C. Jaeger, "A 9-bit quadrature direct digital synthesizer implemented in 0.18 µm SiGe BiCMOS technology," Microwave Theory and Techniques, IEEE Transactions on, vol. 56, no. 5, pp. 1257-1266, May 2008.
[32] X. Yu, F. F. Dai, D. J. Irwin, and R. C. Jaeger, "A 2.2V 200mW 3GHz ring oscillator based waveform," Silicon Monolithic Integrated Circuits in RF Systems, 2009. SiRF '09. IEEE Topical Meeting on, pp. 1-4, Jan. 2009.
[33] X. Yu, F. F. Dai, D. Yang, J. D. Irwin, and R. C. Jaeger, "An X/Ku-band frequency synthesizer using a 9-bit quadrature DDS," in Proc. IEEE Custom Integrated Circuits Conference CICC 2008, 21-24 Sept. 2008, pp. 491-494.
[34] X. Yu, F. F. Dai, D. Yang, V. Kakani, J. D. Irwin, and R. C. Jaeger, "A 9-bit 6.3GHz 2.5W quadrature direct digital synthesizer MMIC," in Proc. IEEE Symposium on VLSI Circuits, 14-16 June 2007, pp. 52-53.
[35] Y. Zhu, J. Zuegel, J. Marciante, and H. Wu, "A 10 GS/s distributed waveform generator for sub-nanosecond pulse generation and modulation in 0.18 µm standard digital CMOS," in Radio Frequency Integrated Circuits (RFIC) Symposium, 2007 IEEE, 2007, pp. 35-38.