Integrated Circuit Design for Ultrahigh Speed Frequency Synthesis: Direct Digital Synthesizer and Variable Frequency Oscillator

by

Xueyang Geng

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 14, 2010

Keywords: direct digital synthesizer (DDS), ROM-less DDS, pipeline accumulator, digital-to-analog converter (DAC), sine-weighted DAC, carry-look-ahead (CLA), ripple carry adder, frequency modulation (FM), phase modulation (PM), voltage-controlled oscillator (VCO), quadrature current-controlled oscillator (QCCO)

Copyright 2010 by Xueyang Geng

Approved by:

Fa Foster Dai, Chair, Professor of Electrical and Computer Engineering
Guofu Niu, Alumni Professor of Electrical and Computer Engineering
Richard C. Jaeger, Professor of Electrical and Computer Engineering
Bogdan M. Wilamowski, Professor of Electrical and Computer Engineering

Abstract

This dissertation presents the design and implementation of a high-speed direct digital frequency synthesizer (DDS) and a variable-frequency oscillator (VFO). DDS is a digital technique for frequency synthesis, waveform generation, sensor excitation, and digital modulation/demodulation in modern communication systems. The VFO can be used as the reference clock of the DDS system, either standalone or combined with other phase-locked-loop (PLL) components.

DDS provides many advantages, including fine frequency-tuning resolution, continuous-phase switching and accurately matched quadrature signals. A DDS can directly generate and modulate signals at microwave frequencies. A high-speed DDS can significantly simplify the transceiver architecture, so the cost of radio and radar systems can be reduced considerably. Ultrahigh-speed DDSs operating above 1 GHz are in demand for modern radar and communication systems.
This research proposes the design of ultrahigh-speed DDS chips with a sine-weighted digital-to-analog converter (DAC) in silicon germanium (SiGe) BiCMOS technology, using a VFO as the reference clock. A sine-weighted DAC is necessary for ultrahigh-speed DDS design to overcome the speed limitation of the ROM lookup table (LUT) in conventional DDS designs. The sine-weighted DAC replaces the ROM LUT and the linear DAC, performing the phase-to-amplitude conversion (PAC) as well as the digital-to-analog conversion. A segmented sine-weighted DAC is designed and implemented to achieve 10-bit amplitude resolution.

Due to the code-dependent and frequency-dependent non-ideal effects of the sine-weighted DAC, the unwanted harmonics and spurs of the DDS output have a more significant impact on the whole system. In this dissertation, the spurs and harmonics from different sources, such as truncation errors, limited DAC amplitude resolution and non-ideal effects of the DAC, are discussed.

Four chips fabricated in SiGe BiCMOS technology are discussed in the dissertation: three DDSs and one VFO. The first DDS is an 11-bit 8.6 GHz ROM-less DDS with a 10-bit segmented sine-weighted DAC. The second is a 9-bit 2.9 GHz ROM-less DDS with direct digital modulation capabilities. The last DDS is a 24-bit 5.0 GHz ROM-less DDS with direct digital modulation capabilities. In addition to the DDS designs, an 8.7-13.8 GHz VFO, implemented as a transformer-coupled, current-controlled, varactor-less oscillator with quadrature outputs, is presented. Circuit and layout designs of DDS building blocks such as current mode logic (CML), the pipeline accumulator, the carry-look-ahead adder/accumulator, the ripple-carry adder/accumulator, and segmented and non-segmented sine-weighted DACs are presented. The quadrature current-controlled oscillator (QCCO) is discussed, as well as the design and implementation of the on-chip transformer.
Acknowledgments

It has been a great pleasure working with the faculty, staff, and students of the Electrical and Computer Engineering Department, Auburn University, during my tenure as a doctoral student. Completing this work is definitely a high point in my academic career. I could not have come this far without the assistance of many individuals, and I want to express my deepest appreciation to them.

My first and most earnest thanks go to my advisor, Dr. Fa Foster Dai, who guided and encouraged me throughout my studies. His advice and research attitude have provided me with a model for my entire future career. I wish to thank my advisory committee members, Dr. Guofu Niu, Dr. Richard C. Jaeger and Dr. Bogdan M. Wilamowski, for their guidance and advice on this work. Many thanks to Dr. Richard O. Chapman, who served as my outside reader, for providing valuable comments that improved the contents of this dissertation. I also wish to thank Dr. J. David Irwin for his valuable comments on my paper publishing and his endless support of my Ph.D. study.

I would like to express my appreciation and sincere thanks to Dr. Yin Shi, the advisor of my M.S. degree at the Chinese Academy of Sciences. Without his encouragement, help and recommendation, I would not have pursued and completed my Ph.D. study.

Appreciation is also expressed to those who have made contributions to my research. I am especially indebted to Desheng Ma, Yuan Yao, Wenting Deng, Dayu Yang, Vasanth Kakani, Xuefeng Yu, Jianjun Yu, Yuehai Jin, William Souder, Mark Ray, Joseph Cali, Michael Pukish and Jie Qin for their cooperation and continued assistance throughout the course of this research.

My final, and most heartfelt, acknowledgment must go to my family members, especially to my parents Xiuliang Geng and Jinglan Zhu, my wife Xueqin Lu, and my daughter Michelle Q. Geng, for their continual encouragement and support throughout this work.

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables
List of Abbreviations
1 Introduction
    1.1 DDS Architectures
        1.1.1 Conventional DDS
        1.1.2 ROM-less DDS with Sine-weighted DAC
        1.1.3 ROM-less DDS with Direct Digital Modulations
    1.2 Direct Digital Synthesizer Used in Modern Radar Systems
    1.3 DDS Spectral Purity
    1.4 Outline and Contribution
2 Design and Analysis of Sine-weighted DAC
    2.1 Sine-weighted DAC
    2.2 Segmented Sine-Weighted DAC
        2.2.1 Quantization and Segmentation of the Sine Wave
        2.2.2 Approximation Error Analysis
        2.2.3 Optimizing the Segmentation
3 An 11-bit 8.6 GHz DDS RFIC with 10-bit Segmented Sine-weighted DAC
    3.1 Introduction
    3.2 Circuit Implementation of the 11-bit ROM-less DDS
        3.2.1 11-Bit Pipeline Accumulator
        3.2.2 10-Bit Segmented Sine-Weighted DAC
        3.2.3 Clock Distribution
    3.3 Experimental Results
    3.4 Conclusion
4 A 9-bit 2.9 GHz DDS RFIC with Direct Digital Modulations
    4.1 Introduction
    4.2 Circuit Implementation
        4.2.1 9-bit Carry Look Ahead Adder/Accumulator
        4.2.2 7-bit Sine-weighted DAC
    4.3 Experimental Results
    4.4 Conclusion
5 A 24-bit 5.0 GHz DDS RFIC with Direct Digital Modulations
    5.1 Introduction
    5.2 Ultrahigh Speed Adder Design
        5.2.1 Wire Delay in the 0.13 µm SiGe BiCMOS Technology
        5.2.2 Propagation Delay Comparison Between the CLA and RCA Accumulator/Adder
        5.2.3 Circuit Implementation of the 24-bit 5.0 GHz RCA
    5.3 10-Bit Segmented Sine-weighted DAC
        5.3.1 Architecture of the 10-bit Sine-weighted DAC
        5.3.2 Bandwidth Limitation of the DAC Switch Output Impedance
    5.4 Experimental Results
    5.5 Conclusion
6 An 8.7-13.8 GHz Transformer-coupled Varactor-less QCCO RFIC
    6.1 Introduction
    6.2 Analysis and Design of Transformer Coupled Quadrature Oscillator
        6.2.1 Oscillation Analysis and Design
        6.2.2 Quadrature Coupling Phase Accuracy and Phase Noise
    6.3 Transformer Implementation
        6.3.1 Geometry Design of Transformers
        6.3.2 Transformer Equivalent Circuit and Parameters
    6.4 Experimental Results
    6.5 Conclusion
7 Summary and Future Work
    7.1 Summary of Original Work
    7.2 Possible Future Directions
Bibliography

List of Figures

1.1 Block diagram of the conventional ROM-based DDS.
1.2 Block diagram of the ROM-less DDS.
1.3 Block diagram of the ROM-less DDS with segmented sine-weighted DAC.
1.4 DDS block diagram with direct digital modulations.
1.5 DDS direct digital modulations: (A) BFSK (FCW = 16, FCW = 32); (B) LFM (CCW = 2, or FCW sweeps from 2 to 32); (C) BPSK (FCW = 255, PCW = $2^{15}$).
1.6 Simplified pulse compression radar with stretch processing.
1.7 Typical switching structure of current-steering DAC.
2.1 Block diagram of (P-1)-bit sine-weighted DAC.
3.1 Block diagram of the ROM-less DDS with 10-bit segmented sine-weighted DAC.
3.2 The 11-bit pipeline phase accumulator.
3.3 10-bit segmented sine-weighted DAC.
3.4 Coarse DAC thermometer decoder.
3.5 Fine DACs thermometer decoders.
3.6 Illustration of interpolating the two adjacent outputs of a coarse DAC using the fine DAC current matrix.
3.7 Current switch circuit of the sine-weighted DAC.
3.8 Diagram of the current source matrix.
3.9 Simplified clock tree distribution.
3.10 Die photo of the 11-bit ROM-less DDS RFIC.
3.11 Evaluation board for the 11-bit ROM-less DDS RFIC.
3.12 Measured DDS output spectrum with a 4.2 MHz output and a maximum 8.6 GHz clock (FCW = 1), illustrating about 50 dBc SFDR. The tone at 91.7 MHz is from the nearby campus FM radio station.
3.13 Measured DDS output waveform with a 4.2 MHz output and an 8.6 GHz clock.
3.14 Measured DDS Nyquist output spectrum with a 4.2958 GHz output and a maximum 8.6 GHz clock (FCW = 1023), illustrating about 45 dBc SFDR. The image tone is located at 4.3042 GHz.
3.15 Measured DDS output waveform with a 4.2958 GHz Nyquist output and an 8.6 GHz clock. The 8.4 MHz envelope frequency results from mixing the output and its image.
3.16 The measured DDS SFDR versus FCW at a clock frequency of 7.2 GHz, illustrating a worst-case SFDR of 33 dBc for the Nyquist band (3.6 GHz) and 42 dBc for the narrow band (100 MHz), respectively.
3.17 The measured DDS phase noise at an output frequency of 1.57 GHz with a 7.2 GHz clock input frequency. The input clock is generated from an Agilent E8257D analog signal generator. The graph illustrates a −118.55 dBc/Hz phase noise at a 10 kHz frequency offset.
4.1 Block diagram of 9-bit ROM-less DDS.
4.2 Block diagram of 9-bit CLA accumulator (full adder).
4.3 Block diagram of 7-bit sine-weighted DAC.
4.4 Diagram of DAC switch and current source matrix cell.
4.5 Measured DDS output spectrum with 509 MHz output under 2.5 GHz clock (FCW = 104), showing about 48 dBc narrow band SFDR.
4.6 Measured DDS output spectrum with 1.444 GHz output and 2.9 GHz clock (FCW = 255), showing about 35 dBc narrow band SFDR. The image tone is located at 1.455 GHz.
4.7 Measured DDS output waveform with 1.444 GHz output and 2.9 GHz clock (FCW = 255). The envelope frequency is 12 MHz.
4.8 Measured DDS output with FCW = 2 frequency modulated by a frequency step of FCW = 1. The frequency before the step is 9.375 MHz with FCW = 2; after the step it is 14.062 MHz with FCW = 3.
4.9 Measured DDS output with FCW = 2 phase modulated by a phase step of PCW = 256, corresponding to a 180° phase shift. The output frequency is 10 MHz with a 2.5 GHz clock.
4.10 Die photo of the 9-bit DDS with direct digital modulations.
5.1 Block diagram of the 24-bit 5.0 GHz DDS RFIC.
5.2 Lumped RC model for a wire with length of L.
5.3 Test bench to simulate the wire propagation delay.
5.4 Simulated wire propagation delay versus length.
5.5 Diagram of N-bit RCA.
5.6 Estimated adder propagation delays with number of bits.
5.7 Block diagram of the 10-bit segmented sine-weighted DAC.
5.8 Diagram of the DAC switch and current source matrix cell.
5.9 DAC switch core circuit and its small signal equivalent circuit.
5.10 Measured DDS output with a 469.360351 MHz output and the maximum 5.0 GHz clock (FCW = 0x180800), showing a 38 dBc Nyquist band SFDR.
5.11 Measured DDS output with a 1.246258914 GHz output and the maximum 5.0 GHz clock (FCW = 0x3FCFE7), showing an 82 dBc narrow band SFDR.
5.12 Measured DDS LFM output with the FCW swept from 1 to 0x005AD9C, using a 300 MHz clock.
5.13 Measured DDS output with FCW = 7 phase modulated by a phase step of PCW = 0x800, causing a 180° phase shift. The output frequency is 1.251 kHz with a 3.0 GHz clock.
5.14 Measured DDS narrow band SFDR versus output frequency within a 50 MHz bandwidth.
5.15 Die photo of the 24-bit DDS RFIC.
6.1 Quadrature VCO circuits with parallel coupling.
6.2 Schematic of transformer-coupled varactor-less QCCO.
6.3 AC equivalent circuit of the transformer tank.
6.4 Stacked octagonal transformer.
6.5 AC equivalent circuit of the varactor-less QCCO.
6.6 Equivalent circuits of the varactor-less QCCO.
6.7 Octagonal symmetrical transformer: (a) concentric, (b) inter-wound, and (c) stacked.
6.8 Diagram of the (a) three-dimension PGS substrate and (b) two-dimension deep trench lattice.
6.9 Transformer time domain equivalent circuit model.
6.10 Simulated parameters of the transformer windings: (a) self-inductance L, (b) coupling factor k, and (c) quality factor Q.
6.11 Simulated capacitance parallel with the transformer primary winding.
6.12 Fabricated QCCO RFIC die photo.
6.13 Measured QCCO tuning range.
6.14 Measured QCCO outputs at 10.5 GHz with tuning current of 1.5 mA and core current of 2 mA.
6.15 Measured QCCO phase noise with output frequency of 11.02 GHz.

List of Tables

2.1 Simulated Segmentation FOM for Different Segmentations with 11-bit Phase and 10-bit Amplitude Resolutions
3.1 Performance Comparison of Ultrahigh Speed DDS RFICs with over 8 GHz Maximum Clock Frequency
4.1 Current Source Matrix in Sine-weighted DAC
4.2 Selected Ultrahigh Speed DDS RFIC Performance Comparison
5.1 Ultrahigh Speed DDS RFIC Performance Comparison
6.1 QCCO Performance Summary
6.2 Performance Comparison of Variable-frequency Oscillators
List of Abbreviations

11B DDS  the 11-Bit 8.6 GHz DDS
24B DDS  the 24-Bit 5.0 GHz DDS
9B DDS  the 9-Bit 2.9 GHz DDS
ADC  Analog-to-Digital Converter
BFSK  Binary Frequency Shift-Keying
BPSK  Binary Phase Shift-Keying
CCO  Current-Controlled Oscillator
CCW  Chirp Control Word
CLA  Carry-Look-Ahead
CLCC  Ceramic Leadless Chip Carrier
CML  Current Mode Logic
CMOS  Complementary Metal-Oxide-Semiconductor
DAC  Digital-to-Analog Converter
DDFS  Direct Digital Frequency Synthesizer
DDS  Direct Digital Synthesizer
DFF  D-Flip-Flop
DT  Deep Trench
ENOB  Effective Number of Bits
FA  Full Adder
FCW  Frequency Control Word
FM  Frequency Modulation
FOM  Figure-of-Merit
GCD  Greatest Common Divisor
HBT  Heterojunction Bipolar Transistor
IC  Integrated Circuit
InP  Indium Phosphide
LO  Local Oscillator
LFM  Linear Frequency Modulation
LPF  Low-Pass Filter
LSB  Least-Significant Bit
LUT  Look-Up Table
MOSFET  Metal-Oxide-Semiconductor Field-Effect Transistor
MSB  Most-Significant Bit
P-QVCO  Parallel Quadrature Voltage-Controlled Oscillator
PAC  Phase-to-Amplitude Conversion/Converter
PCB  Printed Circuit Board
PCW  Phase Control Word
PGS  Patterned Ground Shield
PLL  Phase-Locked Loop
PM  Phase Modulation
PSAC-DAC  Phase-to-Sine Amplitude Conversion Digital-to-Analog Converter
QCCO  Quadrature Current-Controlled Oscillator
RCA  Ripple Carry Adder
RFIC  Radio Frequency Integrated Circuit
RMS  Root-Mean-Square
ROM  Read-Only Memory
S-QVCO  Series Quadrature Voltage-Controlled Oscillator
SFDR  Spurious-Free Dynamic Range
SiGe  Silicon Germanium
SINAD  Signal-to-Noise and Total Harmonic Distortion
SMA  SubMiniature version A
SNR  Signal-to-Noise Ratio
THD  Total Harmonic Distortion
VCO  Voltage-Controlled Oscillator
VNA  Vector Network Analyzer

Chapter 1

Introduction

Ultrahigh speed¹ direct digital synthesizer (DDS)² RFICs³ will play key roles in next-generation radar and communication systems.
Recent developments in radar systems require frequency synthesis with low power consumption, high output frequency, fine frequency resolution, fast channel switching and versatile modulation capabilities. Linear frequency modulation (LFM), or chirp modulation, is widely used in radars to achieve high range resolution, while pulsed phase modulation (PM) can provide anti-jamming capability. With fine frequency resolution, fast channel switching and versatile modulation capabilities, the DDS provides frequency synthesis and direct modulation capabilities that cannot be easily implemented by other synthesizer tools such as analog phase-locked loop (PLL) synthesizers. It is difficult for conventional PLL-based frequency synthesizers to meet these requirements due to internal loop delay, low resolution, modulation problems and the limited tuning range of the voltage-controlled oscillator (VCO). Ultrahigh-speed heterojunction bipolar transistors (HBT) allow a DDS to operate up to mm-wave frequencies, making it a preferable solution for synthesizing the sine waveforms used in modern ultrahigh-speed radar and other communication systems [1,2].

¹ Ultra high frequency (UHF) in the ITU radio band designation means 300 MHz to 3 GHz. In this dissertation, "ultrahigh speed" means that the DDS output frequency is above 1 GHz.
² The term direct digital frequency synthesizer (DDFS) is sometimes used instead; both terms refer to the same circuits and systems.
³ Usually, the term RFIC refers to radio frequency or wireless integrated circuits fabricated in Si/SiGe CMOS/BiCMOS technologies, while the term MMIC refers to microwave monolithic integrated circuits fabricated in GaAs/InP high-$f_T$ technologies. With the development of modern technology, the two terms have largely merged. In this dissertation, both RFIC and MMIC therefore denote an RF/microwave monolithic integrated circuit, regardless of the technology in which it is fabricated.
1.1 DDS Architectures

1.1.1 Conventional DDS

A conventional DDS design normally consists of a phase accumulator, a ROM lookup table (LUT) and a linear digital-to-analog converter (DAC). The phase accumulator computes the correct phase angle for the output sine wave by accumulating the input frequency control word (FCW) on each clock cycle. If the size of the accumulator is N bits, as shown in Fig. 1.1, the maximum phase value is $2\pi(2^N - 1)/2^N$. To save power and reduce the complexity of the sinusoidal LUT, the N-bit output of the accumulator may be truncated to P bits before addressing the ROM. The ROM LUT performs the phase-to-amplitude conversion (PAC) of the output sinusoidal wave. Once the amplitude information is obtained, it may be further truncated to D bits, corresponding to the number of input bits of the DAC. The digital amplitude codes are then fed into a linear DAC that generates an analog replica of the synthesized waveform. A low-pass filter (LPF) usually follows the DAC to remove the unwanted frequency components. The input clock frequency and the accumulator size determine the frequency step size of the DDS as

$$\Delta f = \frac{f_{clk}}{2^N}, \tag{1.1}$$

and the output frequency of the DDS is given by

$$f_{out} = \frac{f_{clk} \cdot FCW}{2^N}, \tag{1.2}$$

where $f_{clk}$ is the DDS clock frequency, FCW is the input frequency control word, and N is the size of the phase accumulator. Based upon the Nyquist theorem, at least two samples per output cycle are required to reconstruct a sinusoidal wave without aliasing. Thus, the largest value of the FCW is $2^{N-1}$, and the maximum output frequency of the DDS is limited to less than $f_{clk}/2$. However, the output frequency of the DDS is usually constrained to be less than $f_{clk}/3$ so that the deglitch LPF can be implemented in practice.

Figure 1.1: Block diagram of the conventional ROM-based DDS.
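As a quick numerical check of Eqs. (1.1) and (1.2), the following Python sketch computes the frequency step and the output frequency for an 11-bit accumulator clocked at 8.6 GHz, the values quoted for the first chip in the list of figures. The function names are illustrative only, and the accumulator loop is a behavioral model of modulo-$2^N$ accumulation, not the pipelined hardware.

```python
def frequency_step(f_clk, n_bits):
    """Frequency resolution of Eq. (1.1): f_clk / 2^N."""
    return f_clk / 2 ** n_bits


def output_frequency(f_clk, fcw, n_bits):
    """Output frequency of Eq. (1.2): f_clk * FCW / 2^N."""
    if not 0 <= fcw <= 2 ** (n_bits - 1):
        raise ValueError("FCW must lie in [0, 2^(N-1)] to respect the Nyquist limit")
    return f_clk * fcw / 2 ** n_bits


def count_overflows(fcw, n_bits, n_cycles):
    """Behavioral N-bit phase accumulator: count wrap-arounds in n_cycles clocks.

    Each wrap-around marks the start of a new output period, so the count over
    2^N clock cycles equals FCW, which is what Eq. (1.2) states.
    """
    modulus = 1 << n_bits
    phase = 0
    overflows = 0
    for _ in range(n_cycles):
        phase += fcw
        if phase >= modulus:  # fcw < modulus, so one subtraction is enough
            phase -= modulus
            overflows += 1
    return overflows


if __name__ == "__main__":
    f_clk = 8.6e9  # clock frequency in Hz
    print(frequency_step(f_clk, 11))          # step size in Hz
    print(output_frequency(f_clk, 1023, 11))  # output near the Nyquist limit
    print(count_overflows(3, 11, 2 ** 11))    # output periods in 2^N clocks
```

With these inputs, FCW = 1 gives about 4.2 MHz and FCW = 1023 gives about 4.2958 GHz, consistent with the measured outputs quoted in the captions of Figs. 3.12 and 3.14.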
1.1.2 ROM-less DDS with Sine-weighted DAC

The ROM size of the conventional DDS increases exponentially with the number of phase bits used to address the LUT. In general, increasing the ROM size results in higher power consumption and larger area in ROM-based DDS designs. Numerous attempts have been made to compress or eliminate the ROM LUT in the PAC. Langlois has published a comprehensive review of the PAC techniques [3], including angular decomposition [4,5,6], angular rotation, sine amplitude LUT compression [7], polynomial approximation and phase-to-sine amplitude conversion (PSAC)-DAC combinations. All of these phase-to-amplitude conversion methods, with the exception of the PSAC-DAC, involve either a large ROM or a complex architecture, yet operate at relatively low speed. To overcome the speed and power limits of the high-resolution ROM-based DDS, a ROM-less DDS with a sine-weighted DAC (identified as PSAC-DAC by Langlois) has been developed for both low-speed and ultrahigh-speed DDSs.

The conceptual block diagram of the ROM-less DDS employing a sine-weighted DAC is shown in Fig. 1.2. The ROM-less DDS replaces the ROM and linear DAC with a sine-weighted DAC that serves as a PAC block as well as a DAC. It eliminates the sine LUT, which is the speed and area bottleneck for high-speed DDS implementations. However, it is a design challenge to achieve high resolution in the sine-weighted DAC due to the required nonlinear segmentation process.

Figure 1.2: Block diagram of the ROM-less DDS.

Fig. 1.3 shows a ROM-less DDS with a segmented sine-weighted DAC [8,9]. The major parts of the ROM-less DDS are an N-bit phase accumulator and a current-steering sine-weighted DAC. Since the output frequency cannot exceed the Nyquist rate, the most-significant bit (MSB) of the accumulator input is tied to zero.
The N-bit FCW (including MSB = 0) feeds the accumulator, which controls the output frequency of the synthesized sine wave. The two MSBs of the accumulator output are used to determine the quadrant of the sine waveform. The remaining (P-2) bits are used to control the segmented sine-weighted DAC in generating the amplitude for a quarter-phase ($0$ to $\pi/2$) sine wave. With the segmentation method described in the following sections, the (a + b) MSBs are used to control the coarse DAC, while the a-bit MSBs and the c-bit least-significant bits (LSBs) are used to control the fine DACs.

Figure 1.3: Block diagram of the ROM-less DDS with segmented sine-weighted DAC.

1.1.3 ROM-less DDS with Direct Digital Modulations

With proper design, a DDS can be used to implement modulations and generate waveforms such as phase modulation (PM), linear frequency modulation (LFM), step frequency modulation (frequency hopping), binary frequency shift-keying (BFSK), binary phase shift-keying (BPSK) and other hybrid modulations. Fig. 1.4 shows a general architecture of a ROM-less DDS with direct digital modulation capabilities designed for radar systems. The architecture has four parts: a D-bit sine-weighted DAC, a P-bit adder used as a phase modulator, an N-bit phase accumulator, and another N-bit accumulator used as an N-bit chirp ramp signal generator. Chirp control words (CCW), frequency control words (FCW) and phase control words (PCW) provide the control signals for the chirp accumulator, phase accumulator and phase modulator, respectively. Through the direct use of digital control words to change the values of registers in the data path of the DDS, the frequency, phase, and amplitude of the output waveforms can be precisely controlled.
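The data path described above can be sketched behaviorally in Python. This is a minimal model under stated assumptions rather than the circuit itself: an ideal `math.sin` stands in for the sine-weighted DAC, a nonzero `ccw` is taken to select the chirp path through the MUX, and the function and argument names are hypothetical.

```python
import math


def dds_samples(n_cycles, n_bits, p_bits, fcw=0, ccw=0, pcw=0):
    """Idealized output samples of a DDS with chirp, frequency and phase control.

    Assumptions (not taken from the source): a nonzero ccw selects the chirp
    accumulator as the frequency word, and an ideal sine replaces the DAC.
    """
    if p_bits > n_bits:
        raise ValueError("p_bits must not exceed n_bits")
    mod_n = 1 << n_bits
    mod_p = 1 << p_bits
    freq_word = 0 if ccw else fcw
    phase = 0
    samples = []
    for _ in range(n_cycles):
        if ccw:
            # Chirp accumulator: the effective frequency word ramps by CCW per clock.
            freq_word = (freq_word + ccw) % mod_n
        # Phase accumulator: N-bit modulo accumulation of the frequency word.
        phase = (phase + freq_word) % mod_n
        # Truncate to P bits, then add the phase control word (phase modulator).
        phase_p = ((phase >> (n_bits - p_bits)) + pcw) % mod_p
        samples.append(math.sin(2.0 * math.pi * phase_p / mod_p))
    return samples


if __name__ == "__main__":
    plain = dds_samples(64, 16, 16, fcw=255)
    shifted = dds_samples(64, 16, 16, fcw=255, pcw=1 << 15)  # half of 2^16: 180 degrees
    print(max(abs(a + b) for a, b in zip(plain, shifted)))   # near zero: inverted copy
    chirp = dds_samples(64, 16, 16, ccw=2)                   # frequency word ramps 2, 4, 6, ...
    print(len(chirp))
```

Setting `pcw` to half of $2^P$ inverts the waveform, which is the BPSK case; switching `fcw` between two values between calls would give BFSK.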
Since all the modulations are done in the digital domain, many disadvantages associated with conventional analog modulation can be avoided. In this ROM-less DDS architecture, the sine-weighted DAC performs the phase-to-amplitude conversion as well as the digital-to-analog conversion. Without a ROM, which is usually the speed bottleneck, this DDS architecture can be developed to produce waveforms at frequencies above 1 GHz. To perform the direct digital modulations, the accumulators and the modulator (full adder) must be updated in every clock cycle. As a result of this requirement, a pipeline accumulator is not suitable for the modulation, and a carry-look-ahead (CLA) or ripple carry adder (RCA) architecture is used, with an attendant sacrifice in speed.

Figure 1.4: DDS block diagram with direct digital modulations.

Fig. 1.5 shows some direct digital modulation waveforms generated from a DDS with 16-bit phase resolution. This DDS has 16-bit FM resolution as well as 14-bit PM capability. Fig. 1.5(A) displays a BFSK modulation waveform. The input FCW switches between 16 and 32 for the frequencies $f_1$ and $f_2$ labeled in the waveform. Fig. 1.5(B) shows an LFM waveform with CCW = 2, which behaves as though the FCW is swept from 2 to 32 repeatedly. Fig. 1.5(C) shows a BPSK modulation waveform with FCW = 255 and PCW = $2^{15}$ for a phase shift of 180°.

1.2 Direct Digital Synthesizer Used in Modern Radar Systems

Range resolution is the ability of a radar system to distinguish between two or more targets on the same bearing but at different distances. Weapon-control radar, which requires great precision, should be able to distinguish between targets that are only yards apart. Search radar is usually less precise and only distinguishes between targets that are hundreds of yards or even miles apart.
The degree of range resolution depends on the width of the transmitted pulse, the types and sizes of the targets, and the efficiency of the receiver and indicator. The range resolution of a simple single-pulse radar is $cT/2$, where $c$ is the propagation velocity of the pulse and $T$ is the transmitted pulse width. In the pulse compression radar shown in Fig. 1.6, with the help of a versatile modulated signal generated by a DDS, such as LFM, nonlinear FM or phase-coded waveforms, the range resolution can be improved to $c/(2B)$ without losing received pulse strength [10], where $c$ is the signal propagation velocity and $B$ is the bandwidth of the transmitted signal. In comparison to the simple single-pulse radar, the range resolution is improved by a factor of $TB$, the ratio of $cT/2$ to $c/(2B)$, while the transmitted signal maintains the same instantaneous power. The quantity $TB$ is the pulse compression ratio, and it is usually much greater than 1.

Figure 1.5: DDS direct digital modulations, plotted as output voltage versus time: (A) BFSK (FCW = 16, FCW = 32); (B) LFM (CCW = 2, or FCW sweeps from 2 to 32); (C) BPSK (FCW = 255, PCW = $2^{15}$).

Figure 1.6: Simplified pulse compression radar with stretch processing.

The traditional radar receiver uses a wide-bandwidth convolution processor with a matched filter to process the received pulse compression signal. This requires high bandwidth for the analog-to-digital converter (ADC) as well as for the back-end processing. In modern radar systems, stretch processing is used to reduce the bandwidth requirement of the ADC and back-end processing.
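The two range-resolution expressions, $cT/2$ and $c/(2B)$, can be compared numerically. The sketch below is illustrative only: the pulse width and bandwidth are hypothetical values chosen for the example, the propagation velocity is assumed to be the free-space speed of light, and the function names are not from the source.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, assumed free-space propagation


def single_pulse_resolution(pulse_width_s):
    """Range resolution of a simple single-pulse radar: c * T / 2."""
    return SPEED_OF_LIGHT * pulse_width_s / 2.0


def compressed_resolution(bandwidth_hz):
    """Range resolution with pulse compression: c / (2 * B)."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    pulse_width = 10e-6  # hypothetical 10 microsecond pulse
    bandwidth = 100e6    # hypothetical 100 MHz chirp bandwidth
    coarse = single_pulse_resolution(pulse_width)
    fine = compressed_resolution(bandwidth)
    # The ratio of the two resolutions is the time-bandwidth product T * B.
    print(coarse, fine, coarse / fine)
```

For these example values the uncompressed resolution is roughly 1.5 km and the compressed resolution roughly 1.5 m; their ratio is the time-bandwidth product of the pulse.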
Stretch processing is a technique for processing LFM, or other modulated wideband waveforms, using a signal processor whose bandwidth is much smaller than the transmitted signal bandwidth, without loss in signal-to-noise ratio (SNR) or range resolution [11, 12]. As shown in Fig. 1.6, stretch processing can be implemented in modern radar systems with the help of a simple mixer and a modulated reference signal generated from the same DDS as in the transmit path.

1.3 DDS Spectral Purity

In order to achieve a fine step size, a large phase accumulator is desired. However, the phase accumulator output is normally truncated to save die area and power. For instance, the output of the phase accumulator is truncated into P bits (P < N) [...]

\[
I_k =
\begin{cases}
\left\lfloor 2^{M-1}\sin\left(\dfrac{\pi}{2}\cdot\dfrac{0.5}{2^{P-2}}\right)\right\rfloor, & \text{for } k = 0\\[2ex]
\left\lfloor 2^{M-1}\sin\left(\dfrac{\pi}{2}\cdot\dfrac{k+0.5}{2^{P-2}}\right) - \displaystyle\sum_{n=0}^{k-1} I_n \right\rfloor, & 0 < k \le 2^{P-2}-1
\end{cases}
\tag{2.1}
\]

In Eq. (2.1), P is the phase resolution of the sine-weighted DAC, that is, the total number of input bits of the sine-weighted DAC. M is the amplitude resolution including the mirroring effect of the MSB. Usually, M = P - 1, generated by the (P-2)-bit quarter sine wave and the mirroring of the MSB.

2.2 Segmented Sine-Weighted DAC

It is quite difficult to build a non-segmented DAC with more than 10-bit resolution because of the exponential increase in area and power consumption that results from increasing the DAC resolution. The problem is even more pronounced for sine-weighted DAC designs than for linear DACs. In linear DAC design, high accuracy can be achieved using segmentation. For instance, a 10-bit DAC can be segmented into a 5-bit coarse DAC and a 5-bit fine DAC, i.e., a 5+5 segmentation, while a 12-bit DAC can be segmented into an 8-bit coarse DAC and a 4-bit fine DAC, i.e., an 8+4 segmentation [16, 13]. Similarly, a sine-weighted DAC can also be segmented into a coarse DAC and fine DACs [17].
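The recursive definition of the non-segmented current weights in Eq. (2.1) can be sketched in a few lines. This is only an illustration under stated assumptions: the amplitude factor is read as $2^{M-1}$, the floor is applied to the difference between the target amplitude and the running sum, and the function name `sine_weights` is hypothetical rather than taken from the dissertation.

```python
import math


def sine_weights(P, M):
    """Unit-current weights I_k for one quarter sine wave, following Eq. (2.1).

    Assumption: the amplitude factor is 2**(M-1); P is the phase resolution
    and M the amplitude resolution (including the MSB mirroring).
    """
    amplitude = 2 ** (M - 1)
    n_steps = 2 ** (P - 2)      # phase steps within one quarter period
    weights = []
    running_sum = 0             # sum of I_0 .. I_{k-1}
    for k in range(n_steps):
        target = amplitude * math.sin(math.pi / 2 * (k + 0.5) / n_steps)
        i_k = math.floor(target - running_sum)
        weights.append(i_k)
        running_sum += i_k
    return weights


w = sine_weights(P=11, M=10)
print(len(w), sum(w), max(w))
```

Because each weight is the increment between successive quantized amplitudes, switching on the first k+1 current sources reproduces the quantized quarter sine wave directly, which is what lets the DAC replace the ROM lookup table.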
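2.2.1 Quantization and Segmentation of the Sine Wave

For the P-bit phase word, since the quadrant of the sine waveform is determined by the two MSBs, only one quarter of the sine wave needs to be generated by the remaining P-2 bits. If we further segment the remaining P-2 phase bits into three parts with a, b and c bits (a + b + c = P-2), there are $2^{a+b+c}$ phase words for one quarter of the sine wave. The phase word can thus be represented as

\[
\phi = x\,2^{b+c} + y\,2^{c} + z \tag{2.2}
\]

with $0 \le x \le 2^{a}-1$, $0 \le y \le 2^{b}-1$ and $0 \le z \le 2^{c}-1$, where x, y and z are the phase sequence numbers related to the segmented parts a, b and c. Thus, if the amplitude of the sine wave is given by $A = 2^{M}-1$, where M is the number of amplitude bits, then for a specific phase word $\phi$ the quarter sine wave can be represented as

\[
\begin{aligned}
A\sin\left(\frac{\pi}{2}\cdot\frac{\phi}{2^{a+b+c}}\right)
&= (2^{M}-1)\sin\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}+z}{2^{a+b+c}}\right)\\
&= (2^{M}-1)\sin\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right)\cos\left(\frac{\pi}{2}\cdot\frac{z}{2^{a+b+c}}\right)\\
&\quad+(2^{M}-1)\cos\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right)\sin\left(\frac{\pi}{2}\cdot\frac{z}{2^{a+b+c}}\right).
\end{aligned}
\tag{2.3}
\]

Since

\[
z \ll x\,2^{b+c} + y\,2^{c} < 2^{a+b+c}, \tag{2.4}
\]

we have

\[
\cos\left(\frac{\pi}{2}\cdot\frac{z}{2^{a+b+c}}\right) \approx 1. \tag{2.5}
\]

Thus, the sine wave can be approximated as

\[
\begin{aligned}
A\sin\left(\frac{\pi}{2}\cdot\frac{\phi}{2^{a+b+c}}\right)
&\approx (2^{M}-1)\sin\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right)\\
&\quad+(2^{M}-1)\cos\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right)\sin\left(\frac{\pi}{2}\cdot\frac{z}{2^{a+b+c}}\right)\\
&= C(x,y) + F(x,y,z),
\end{aligned}
\tag{2.6}
\]

with

\[
C(x,y) = (2^{M}-1)\sin\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right), \tag{2.7}
\]

\[
F(x,y,z) = (2^{M}-1)\cos\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right)\sin\left(\frac{\pi}{2}\cdot\frac{z}{2^{a+b+c}}\right), \tag{2.8}
\]

where C(x,y) is the sinusoidal value to be stored in the coarse DAC, and F(x,y,z) denotes the sinusoidal value to be stored in the fine DACs. From the above decomposition, two sub-DACs can be designed to convert a complete sine wave to its analog waveform. The fine DAC data F(x,y,z) can be used to interpolate the coarse DAC data C(x,y). In order to quantize C(x,y), the amplitude differences between two adjacent coarse phase words are derived as shown in Eq. (2.9).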
\[
\Delta C(x,y) =
\begin{cases}
\left\lfloor (2^{M}-1)\sin\left(\dfrac{2\pi\,(0.5)}{2^{P}}\right)\right\rfloor, & \text{for } x = y = 0\\[2ex]
\left\lfloor (2^{M}-1)\sin\left(\dfrac{2\pi\,(x\,2^{b+c}+y\,2^{c})}{2^{P}}\right) - \displaystyle\sum_{m=0}^{x-1}\sum_{n=0}^{2^{b}-1}\Delta C(m,n) - \sum_{n=0}^{y-1}\Delta C(x,n)\right\rfloor, & \text{for } 0 \le x \le 2^{a}-1,\ 1 \le y \le 2^{b}-1
\end{cases}
\tag{2.9}
\]

To simplify the quantization of F(x,y,z), the average of y is used to represent every y value, and F(x,y,z) is thus simplified to F(x,z). Hence, the amplitude difference between two adjacent fine phase words for the fine DACs can be obtained as shown in Eq. (2.10).

\[
\Delta F(x,z) =
\begin{cases}
\left\lfloor (2^{M}-1)\cos\left(\dfrac{2\pi\,(x\,2^{b+c}+\bar{y}\,2^{c})}{2^{P}}\right)\sin\left(\dfrac{2\pi\,(0.5)}{2^{P}}\right)\right\rfloor, & \text{for } z = 0\\[2ex]
\left\lfloor (2^{M}-1)\cos\left(\dfrac{2\pi\,(x\,2^{b+c}+\bar{y}\,2^{c})}{2^{P}}\right)\sin\left(\dfrac{2\pi\,(z+0.5)}{2^{P}}\right) - \displaystyle\sum_{n=0}^{z-1}\Delta F(x,n)\right\rfloor, & \text{for } 1 \le z \le 2^{c}-1
\end{cases}
\tag{2.10}
\]

In Eqs. (2.9) and (2.10), it should be pointed out that

a. P is the truncated phase resolution, P = a + b + c + 2;
b. $\lfloor A \rfloor$ denotes rounding the number A to the nearest integer toward zero;
c. $\bar{y} = \dfrac{0+1+\cdots+(2^{b}-1)}{2^{b}} = \dfrac{2^{b}-1}{2}$ is the average value of y; and
d. $F(x,z) = F(x,\bar{y},z) \approx F(x,y,z)$, where y is replaced with its averaged value.

With Eqs. (2.9) and (2.10), the sine function can be rewritten as

\[
(2^{M}-1)\sin\left(\frac{\pi}{2}\cdot\frac{\phi}{2^{a+b+c}}\right) \approx C(x,y) + F(x,z) = \sum_{i=0}^{x}\sum_{j=0}^{y}\Delta C(x,y) + \sum_{i=0}^{x}\sum_{k=0}^{y}\Delta F(x,z), \tag{2.11}
\]

where the first term denotes the data stored in the coarse DAC current sources and the second term denotes the data stored in the fine DAC current sources. This trigonometric decomposition is similar to the ROM compression used in ROM-based DDSs. In the approaches by Sunderland [4] and Nicholas [5],

\[
\sin(A+B+C) = \sin(A+B)\cos(C) + \cos(A+B)\sin(C) \approx \sin(A+B) + \cos(A)\sin(C). \tag{2.12}
\]

The following two approximations

\[
\begin{cases}
\cos(C) \approx 1\\
\cos(A+B) \approx \cos(A)
\end{cases}
\tag{2.13}
\]

have been made, while in the approach adopted here, the approximation is improved by using

\[
\begin{cases}
\cos(C) \approx 1\\
\cos(A+B) \approx \cos(A+\bar{B}),
\end{cases}
\tag{2.14}
\]

where $\bar{B}$ is the mean value of B. The approximation error is analyzed in the next subsection.

2.2.2 Approximation Error Analysis

In the previous subsection, two approximations are used for the coarse DAC and the fine DACs, respectively. The first is represented in Eq. (2.5).
The second is the use of the mean value of y for the computation of F(x,y,z). Both approximations lead to errors in the computation of the sine wave's amplitude. For the coarse DAC the approximation error is

\[
E_C = \cos\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right) - 1. \tag{2.15}
\]

The maximum value of $E_C$ is

\[
\max\{E_C\} = \sin\left(\frac{\pi}{2}\cdot\frac{1}{2^{a+b}}\right), \tag{2.16}
\]

when $x = 2^{a}-1$ and $y = 2^{b}-1$. For the fine DACs,

\[
E_F = \cos\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+y\,2^{c}}{2^{a+b+c}}\right) - \cos\left(\frac{\pi}{2}\cdot\frac{x\,2^{b+c}+\bar{y}\,2^{c}}{2^{a+b+c}}\right), \tag{2.17}
\]

and the maximum value of $E_F$ is

\[
\max\{E_F\} \approx 2\sin\left(\frac{\pi}{2}\cdot\frac{1}{2^{a+2}}\right), \tag{2.18}
\]

when $x = 2^{a}-1$, $y = 2^{b}-1$ and $\bar{y} = (2^{b}-1)/2$. If the whole DAC requires a 9-bit amplitude resolution, excluding the MSB mirroring, then the coarse DAC should have at least a 9-bit resolution and the fine DACs should have c-bit resolution. From Eqs. (2.16) and (2.18),

\[
\begin{cases}
\sin\left(\dfrac{\pi}{2}\cdot\dfrac{1}{2^{a+b}}\right) \le \dfrac{1}{2^{9}}\\[2ex]
2\sin\left(\dfrac{\pi}{2}\cdot\dfrac{1}{2^{a+2}}\right) \le \dfrac{1}{2^{c}}.
\end{cases}
\tag{2.19}
\]

From Eq. (2.19),

\[
\begin{cases}
a + b \ge 4\\
c \le 6, & \text{when } a = 0\\
c \le 10, & \text{when } a = 4.
\end{cases}
\tag{2.20}
\]

As long as a, b and c are in the range of Eq. (2.20), the approximation errors are less than the quantization noise and can be ignored.

2.2.3 Optimizing the Segmentation

From the above discussion, the quantization noise is significantly affected by the segmentation. To optimize the segmentation for better performance, one or more optimization parameters need to be considered. SFDR, power consumption and die area are the most critical parameters in ultrahigh speed DDS design. An optimized segmentation figure-of-merit, normalized by the non-segmented values, is defined as

\[
\mathrm{FOM}_{sg} = (\mathrm{SFDR}_{sg} - \mathrm{SFDR}_{ns})\cdot\frac{P_{sg}}{P_{ns}}\cdot\frac{A_{sg}}{A_{ns}}, \tag{2.21}
\]

where $\mathrm{FOM}_{sg}$, SFDR, P and A represent the segmentation figure-of-merit, spurious-free dynamic range, power consumption and occupied area, respectively. The subscript "sg" means segmented DAC and "ns" denotes non-segmented DAC. Unlike CMOS logic design, where the power consumption results mainly from dynamic power, the primary power consumed by the current-mode-logic (CML) circuits that are used in the ultrahigh speed DDS designs is the static bias current in the CML current sources.
Moreover, we assume that both the DAC power consumption and area are proportional to the number of DAC switch cells. If we segment the switch cells into a, b and c, the normalized number of switch cells is given by

\[
\frac{2^{a+b} + 2^{a+c}}{2^{a+b+c}}, \tag{2.22}
\]

which can be used to represent either the normalized power consumption $P_{sg}/P_{ns}$ or the normalized area $A_{sg}/A_{ns}$.

Table 2.1: Simulated Segmentation FOM for Different Segmentations with 11-bit Phase and 10-bit Amplitude Resolutions

Segmentation (a-b-c)   SFDR (dBc)   Normalized power consumption or area   FOMsg
2-2-5                  51.08        0.2813                                 2.1895
2-3-4                  57.73        0.1875                                 0.7390
2-4-3                  65.44        0.1875                                 0.4679
3-2-4                  64.03        0.3125                                 1.4375
3-3-3                  71.07        0.2500                                 0.4675
3-4-2                  72.19        0.3125                                 0.6406
4-2-3                  72.05        0.3750                                 0.9422
4-3-2                  70.87        0.3750                                 1.1081
4-4-1                  71.27        0.5625                                 2.3667
4-5-0                  78.75        1                                      0

For a sine-weighted DAC with a total of 9 input bits, Table 2.1 shows the simulated SFDR, the normalized power consumption or area, and the FOMsg. The results in Table 2.1 demonstrate that with a larger a or b, a better SFDR can be achieved, but power consumption and area increase as well. Segmentation with a + b = 9 yields the best SFDR, yet it also leads to the highest power consumption and largest area. This result is understandable since a + b = 9 means a non-segmented DAC. The segmentation with a = b = c = 3 results in good power and area efficiency and a relatively high SFDR. Moreover, it achieves the best FOMsg of 0.47. Note that the simulated SFDR in Table 2.1 includes only the effect of static quantization errors of the sine-weighted DAC, whereas a practical integrated circuit also suffers from other nonlinearities and distortions. As a result, the measured SFDR will be worse than what is given in Table 2.1.
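The normalized switch-cell count of Eq. (2.22) and the figure-of-merit of Eq. (2.21) are simple enough to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the tabulated FOMsg values use the magnitude of the SFDR difference, with the same normalized cell count standing in for both power and area.

```python
def normalized_cells(a, b, c):
    """Normalized number of switch cells, Eq. (2.22)."""
    return (2 ** (a + b) + 2 ** (a + c)) / 2 ** (a + b + c)


def fom_sg(sfdr_sg, sfdr_ns, a, b, c):
    """Segmentation FOM, Eq. (2.21).

    Assumptions: the magnitude of the SFDR difference is used, and the
    normalized cell count represents both P_sg/P_ns and A_sg/A_ns.
    """
    n = normalized_cells(a, b, c)
    return abs(sfdr_sg - sfdr_ns) * n * n


SFDR_NS = 78.75  # non-segmented (4-5-0) SFDR from Table 2.1, in dBc

for seg, sfdr in [((2, 2, 5), 51.08), ((2, 3, 4), 57.73), ((2, 4, 3), 65.44)]:
    print(seg, normalized_cells(*seg), round(fom_sg(sfdr, SFDR_NS, *seg), 4))
```

The non-segmented 4-5-0 row is listed as 1 in Table 2.1 and is not evaluated by this sketch. A smaller value indicates a better trade-off, which is consistent with the a = b = c = 3 choice having the lowest tabulated FOMsg.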
Chapter 3

An 11-bit 8.6 GHz DDS RFIC with 10-bit Segmented Sine-weighted DAC

3.1 Introduction

Ultrahigh-speed HBTs allow a DDS to operate up to mm-wave frequencies, which makes the DDS a preferable solution for the synthesis of sine waveforms with fine frequency resolution, fast channel switching and versatile modulation capability [1, 2]. Several ultrahigh speed DDS designs have been reported with clock frequencies from 9 GHz to 32 GHz and DAC resolutions from 5 bits to a maximum of 8 bits [18, 19, 20]. These DDSs have been implemented in indium phosphide (InP) HBT technology and only tested on-wafer [18, 19, 20]. The maximum SFDR achieved in these DDS designs is less than 30 dBc, which is not sufficient for typical radar and wireless applications. The low yield and high power consumption of InP HBTs limit InP HBT-based DDSs from achieving higher resolution. Several DDSs have been developed in SiGe BiCMOS technology, which offers more robust and higher-yield devices than its InP counterpart [21, 22]. However, these earlier SiGe DDSs still suffer from an SFDR of less than 30 dBc. Higher spectral purity and higher amplitude resolution are required in modern radar and communication systems.

With a segmented sine-weighted DAC, the DDS presented in this chapter achieves 11-bit phase and 10-bit amplitude resolutions with a maximum clock frequency of 8.6 GHz [8, 9]. The DDS consumes 4.8 W with a leading power efficiency FOM of 81.1 GHz·$2^{\mathrm{SFDR}/6}$/W and the best reported Nyquist band worst-case SFDR of 33 dBc among ultrahigh speed DDS designs. The proposed DDS adopts a ROM-less architecture, which combines the sine/cosine mapping and the digital-to-analog conversion in a sine-weighted DAC [8, 9]. The block diagram of the ROM-less DDS, with 11-bit phase and 10-bit amplitude resolution, is shown in Fig. 3.1. The major parts of the ROM-less DDS are an 11-bit pipeline phase accumulator and a 10-bit current-steering segmented sine-weighted DAC.
Since the output frequency cannot exceed the Nyquist rate, the MSB of the accumulator input is tied to zero. The 11-bit FCW (including MSB = 0) feeds the accumulator, which controls the output frequency of the synthesized sine wave. The two MSBs of the accumulator output are used to determine the quadrant of the sine waveform. The remaining 9 bits are used to control the segmented sine-weighted DAC in generating the amplitude for a quarter-phase (0 to $\pi/2$) sine wave. With the segmentation method described in Chapter 2, the 3+3 MSBs are used to control the coarse DAC, while the 3 MSBs and the 3 LSBs are used to control the fine DACs.

Figure 3.1: Block diagram of the ROM-less DDS with 10-bit segmented sine-weighted DAC.

3.2 Circuit Implementation of the 11-bit ROM-less DDS

With a 3.3 V power supply and a SiGe HBT base-collector voltage of 0.85 V to 0.9 V, all of the digital logic is implemented using 3-level CML with differential output swings of 400 mV. A trade-off has been made between the DDS operational speed and its power consumption. For an 11-bit packaged DDS RFIC, power consumption is the primary concern. To save power, each tail current in a CML current source is set to 0.3 mA, which is close to 40% of the peak $f_T$ current. In contrast, traditional CML circuits are biased at 70% to 80% of the peak $f_T$ current. A traditional implementation of the CML circuits would result in a DDS with a power consumption larger than 9.0 W.

3.2.1 11-Bit Pipeline Accumulator

To achieve the maximum operating speed with a fixed FCW, a pipeline accumulator is used in this design. It uses the most hardware, but achieves the fastest speed. The total delay of the accumulator is one full adder (FA) propagation delay plus one D flip-flop (DFF) propagation delay. Fig. 3.2 illustrates the architecture of the 11-bit pipeline phase accumulator, which has a total of 11 pipelined rows. Each row has a total of 12 DFF delay stages placed at the input and output of a 1-bit FA. Eleven DFF stages are needed for an 11-bit pipeline accumulator. One more DFF is used in each row to retime the signal for data synchronization. This scheme retimes the signal to remove the timing mismatch due to the metal wire delays from the accumulator output to its input. The pipeline accumulator therefore has a propagation delay of 12 clock cycles, consisting of a latency of 11 clock cycles plus one retiming clock cycle. Note that an accumulator requires at least one delay stage even without any pipelined stages. So the pipeline architecture shown in Fig. 3.2 allows the 11-bit accumulator to operate at the speed of a 1-bit accumulator consisting of an FA and a DFF.

Figure 3.2: The 11-bit pipeline phase accumulator.

3.2.2 10-Bit Segmented Sine-Weighted DAC

The block diagram of the 10-bit sine-weighted DAC is shown in Fig. 3.3. It has a 9-bit complementor and a current-steering sine-weighted DAC, which includes a 6-bit coarse DAC and eight 3-bit fine DACs. The MSB of the DAC input is used to provide the proper mirroring of the sine waveform about the $\pi$ phase point. The 2nd MSB is used by the complementor to invert the remaining 9 bits for the 2nd and 4th quadrants of the sine waveform. The outputs of the complementor are applied to the segmented sine-weighted DAC to form a quarter of the sine waveform. Because of the quadrant mirroring, the total amplitude resolution of the sine-weighted DAC is 10 bits, while a 9-bit segmented sine-weighted DAC is used to generate the amplitude for a quarter-phase (0 to $\pi/2$) sine wave.
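The quadrant handling described above can be modeled behaviorally. The following is a minimal sketch, not the CML implementation: the function name `split_phase` is hypothetical, and it assumes the 2nd MSB triggers a bitwise 1's complement of the lower 9 bits while the MSB is passed through to select the half period.

```python
def split_phase(phase):
    """Fold an 11-bit phase word into (msb, quarter_index).

    msb           : selects the half period (mirroring about the pi phase point)
    quarter_index : 9-bit index into the quarter sine wave after the
                    1's complementor (assumed behavior, see lead-in)
    """
    assert 0 <= phase < 2 ** 11
    msb = (phase >> 10) & 1
    second_msb = (phase >> 9) & 1      # set in the 2nd and 4th quadrants
    low9 = phase & 0x1FF
    if second_msb:
        low9 ^= 0x1FF                  # bitwise 1's complement of 9 bits
    return msb, low9


# Walk through the quadrant boundaries of one period.
for p in (0, 511, 512, 1023, 1024, 1535, 1536, 2047):
    print(p, split_phase(p))
```

With this folding, the quarter index rises from 0 to 511 in the 1st and 3rd quadrants and falls from 511 to 0 in the 2nd and 4th, so a single quarter-wave DAC can serve the full period.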
Based on the discussion in Chapter 2, setting a = b = c = 3 results in the segmentation with the best segmentation FOM. Therefore, the 9-bit sine-weighted DAC is divided into a 6-bit coarse sine-weighted DAC and eight 3-bit fine sine-weighted DACs. The first 6 bits of the complementor output control the coarse sine-weighted DAC, and the highest 3 bits also address the selection of the fine DACs. The lowest 3 bits of the complementor output determine the output value of each of the fine DACs.

Figure 3.3: 10-bit segmented sine-weighted DAC.

With 11-bit phase and 10-bit amplitude resolutions, the weighted current sources of the coarse DAC and fine DACs can be calculated from Eqs. (2.9) and (2.10). The numbers within the coarse DAC and fine DACs in Fig. 3.3 represent the weights of the various current sources. To describe the DAC core architecture and its operation, an operator is defined between two 8 × 8 square matrices:

\[
A \otimes B = a_{ij} \otimes b_{ij} = \sum_{i=0}^{7}\sum_{j=0}^{7} a_{ij}\, b_{ij}. \tag{3.1}
\]

To match the sine-weighted DAC description, the matrix indices start from 0 instead of 1. As an example, for a specific phase word

\[
\phi = x\,2^{b+c} + y\,2^{c} + z = 64x + 8y + z, \tag{3.2}
\]

the quarter sine wave is rebuilt using Eq. (2.11) and represented in Eq. (3.3),

\[
\begin{aligned}
(2^{M}-1)\sin\left(\frac{\pi}{2}\cdot\frac{\phi}{2^{a+b+c}}\right)
&= 1023\,\sin\left(\frac{\pi}{2}\cdot\frac{64x+8y+z}{2^{9}}\right)\\
&= \underbrace{\begin{bmatrix}
1 & 12 & 13 & 12 & 13 & 12 & 13 & 12\\
12 & 13 & 12 & 12 & 12 & 12 & 12 & 12\\
11 & 12 & 11 & 11 & 12 & 11 & 10 & 11\\
11 & 10 & 10 & 10 & 10 & 9 & 10 & 9\\
9 & 9 & 8 & 8 & 9 & 7 & 8 & 7\\
7 & 7 & 7 & 6 & 6 & 6 & 5 & 5\\
5 & 5 & 4 & 4 & 4 & 4 & 3 & 3\\
2 & 3 & 2 & 1 & 2 & 1 & 0 & 1
\end{bmatrix}}_{\text{Coarse current matrix}}
\otimes
\underbrace{\begin{bmatrix}
c_{00} & c_{01} & \cdots & c_{07}\\
c_{10} & c_{11} & \cdots & c_{17}\\
\vdots & \vdots & \ddots & \vdots\\
c_{70} & c_{71} & \cdots & c_{77}
\end{bmatrix}}_{\text{Coarse switch matrix}}\\
&\quad+
\underbrace{\begin{bmatrix}
0 & 2 & 1 & 2 & 1 & 2 & 1 & 2\\
0 & 2 & 1 & 2 & 1 & 2 & 1 & 2\\
0 & 1 & 2 & 1 & 2 & 1 & 1 & 2\\
0 & 1 & 1 & 2 & 1 & 1 & 1 & 2\\
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
0 & 1 & 1 & 0 & 1 & 1 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}}_{\text{Fine current matrix}}
\otimes
\underbrace{\begin{bmatrix}
f_{00} & f_{01} & \cdots & f_{07}\\
f_{10} & f_{11} & \cdots & f_{17}\\
\vdots & \vdots & \ddots & \vdots\\
f_{70} & f_{71} & \cdots & f_{77}
\end{bmatrix}}_{\text{Fine switch matrix}}
\end{aligned}
\tag{3.3}
\]

where $c_{ij}$ and $f_{ij}$ represent the operation state (0 means open and 1 means closed) of the respective coarse DAC and fine DAC switches. Comparing Eqs. (2.11) and (3.3), we have

\[
c_{ij} =
\begin{cases}
1, & \text{when } i < x\\
1, & \text{when } i = x,\ j \le y\\
0, & \text{others},
\end{cases}
\tag{3.4}
\]

\[
f_{ij} =
\begin{cases}
1, & \text{when } i = x,\ j \le z\\
0, & \text{others},
\end{cases}
\tag{3.5}
\]

and

\[
0 \le x \le 7,\quad 0 \le y \le 7,\quad \text{and}\quad 0 \le z \le 7. \tag{3.6}
\]

From Eq. (3.4), the control bits of the coarse DAC switch matrix can be generated through thermometer decoders. Fig. 3.4 shows the coarse DAC decoders. $d_{10}$ to $d_{4}$ represent the input bits to the coarse DAC, and $e_{9}$ to $e_{4}$ represent the complemented bits at the complementor output. The full 6-bit thermometer decoder includes three parts: a column decoder, a row decoder and second-level decoders. $e_{9}$ to $e_{7}$ and $e_{6}$ to $e_{4}$ are the inputs to the column decoder and row decoder, respectively.
Following the second-level thermometer decoder, the 6-bit binary codes are converted to 64-bit thermometer codes represented by $c_{ij}$. As shown in Fig. 3.5, the control bits of the fine DACs' switch matrix can be generated through a thermometer decoder, a binary decoder and a second-level address-select decoder. $d_{10}$ to $d_{7}$ and $d_{3}$ to $d_{1}$ represent the input bits to the fine DACs. $e_{9}$ to $e_{7}$ and $e_{3}$ to $e_{1}$ represent the complemented bits at the complementor output. $e_{3}$ to $e_{1}$ pass through the thermometer decoder, which converts the input bits for each fine DAC, and $e_{9}$ to $e_{7}$ pass through the binary decoder to generate the address-select code. The binary decoder and the address-select decoder work together to select which fine DAC is used to interpolate the respective coarse DAC steps. Through a combination of all the decoders, the 64-bit fine DAC control matrix is generated and represented by $f_{ij}$ as described in Eq. (3.5).

Figure 3.4: Coarse DAC thermometer decoder.

Figure 3.5: Fine DACs thermometer decoders.

As shown in Fig. 3.6, the coarse DAC current source matrix provides 512 unit current sources. Each fine DAC uses about 8 unit current sources to interpolate between two adjacent outputs of the coarse DAC. For example, for a phase value represented by x = 2, y = 3 and z = 5, the coarse DAC current output is the sum of all the numbers in the gray-shaded boxes of the coarse DAC current matrix in Fig. 3.6. The fine DAC current output, which is the sum of all the numbers in the gray-shaded boxes of the fine DAC current matrix, is added to the coarse DAC output. As a result, the total current output of the DAC is the sum of all the gray-shaded boxes and is equal to 237 unit current sources. The unit current of each current source is set at 26 µA. The largest current in the current source matrix of this 10-bit sine-weighted DAC is 338 µA, which is composed of 13 unit current sources.

Figure 3.6: Illustration of interpolating the two adjacent outputs of a coarse DAC using the fine DAC current matrix. (The coarse and fine current matrices are those of Eq. (3.3); rows are indexed by x, coarse columns by y and fine columns by z.)

The current switch contains two differential pairs with cascode current sources for improved output impedance and current mirror accuracy. The current outputs are converted to differential voltages by a pair of off-chip 15 Ω pull-up resistors. Fig. 3.7 shows that the currents from the cascode current sources are fed to the outputs OUTp and OUTm by pairs of switches (Msw). The MSB controls the selection between the different half periods. The current switch contains two differential pairs with minimum-size transistors and a cascode transistor to isolate the current sources from the switches, which also improves the bandwidth of the switching circuits.
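The worked example above (x = 2, y = 3, z = 5) can be checked directly from the current weights printed in Fig. 3.6. The sketch below copies the two 8 × 8 weight matrices from the text and applies the switch rules as understood here: all coarse rows below x are fully on, row x is on up to column y, and only row x of the fine matrix is on up to column z. The function name `dac_output` is hypothetical.

```python
COARSE = [
    [1, 12, 13, 12, 13, 12, 13, 12],
    [12, 13, 12, 12, 12, 12, 12, 12],
    [11, 12, 11, 11, 12, 11, 10, 11],
    [11, 10, 10, 10, 10, 9, 10, 9],
    [9, 9, 8, 8, 9, 7, 8, 7],
    [7, 7, 7, 6, 6, 6, 5, 5],
    [5, 5, 4, 4, 4, 4, 3, 3],
    [2, 3, 2, 1, 2, 1, 0, 1],
]

FINE = [
    [0, 2, 1, 2, 1, 2, 1, 2],
    [0, 2, 1, 2, 1, 2, 1, 2],
    [0, 1, 2, 1, 2, 1, 1, 2],
    [0, 1, 1, 2, 1, 1, 1, 2],
    [0, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]


def dac_output(x, y, z):
    """Quarter-wave DAC output in unit currents for phase segments (x, y, z)."""
    # Coarse DAC: every row below x is fully on; row x is on up to column y.
    coarse = sum(sum(COARSE[i]) for i in range(x)) + sum(COARSE[x][: y + 1])
    # Fine DACs: only the fine DAC selected by x is used, up to column z.
    fine = sum(FINE[x][: z + 1])
    return coarse + fine


units = dac_output(2, 3, 5)
print(units, "unit currents")
print(units * 26e-6 * 1e3, "mA at 26 uA per unit")
```

The matrix totals also agree with the cell counts quoted later for the layout: the coarse weights sum to 511 and the fine weights to 57.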
Figure 3.7: Current switch circuit of the sine-weighted DAC.

In order to achieve current source matching in the layout, each current source is split into four identical small current sources, each of which carries a quarter of the required current. To further improve this matching, all the current source transistors, including those in the coarse DAC and fine DACs, are distributed in the current source matrix with a pseudo-double-centroid switching scheme [23]. The coarse DAC and fine DACs use a total of 568 current sources. Therefore, a 24-row by 24-column array of current source cells is used to build the current matrix in Fig. 3.8. All the current sources are distributed through a rotation from the matrix center to the edge. Of these current source cells, 511 are used for the coarse DAC and 57 are used for the fine DACs. The remaining 8 current sources are used for bias. Four 24 by 24 current source matrices are placed around a common centroid. Two dummy rows and columns are added around the current source matrix to avoid edge effects. So the complete current matrix has 52 rows and 52 columns.

Figure 3.8: Diagram of the current source matrix. (Each of the four sub-matrices has 24 rows and 24 columns, with cells numbered 1 to 568 spiraling outward from the center; the complete matrix has 52 rows and 52 columns.)

3.2.3 Clock Distribution

To synchronize the signals in high speed circuit design, numerous DFFs are used between the logic elements. In the accumulator design, the number of DFFs in the pipeline accumulator increases rapidly with the number of pipeline stages. Hence there are more than 100 DFFs used in the 11-bit pipeline accumulator.
Including the DFFs used in the sine-weighted DAC to synchronize the current switches, the total is approximately 300 DFFs. All of these DFFs must be synchronized with a simultaneous clock edge. In order to minimize the phase difference and maintain the same drive strength between the clock and the DFFs, an H-tree clock scheme is used to ensure that the clock signal reaches each block simultaneously. Fig. 3.9 shows a simplified architecture of the "H"-shaped clock tree. The actual clock tree is 3 times bigger than the simplified version shown in Fig. 3.9. The external clock is buffered by a differential pair and then drives two emitter follower pairs, which are used as level shifters as well as buffers. Each emitter follower pair drives two or four differential pairs, and each differential pair drives other emitter follower pairs, until the clock reaches the leaves that finally drive the DFFs. The number of differential pairs or emitter follower pairs driven by the previous stage depends on the driving strength of the previous stage and is proportional to the CML tail current. To keep enough swing to fully switch the next stage, a 1-driving-2 current ratio is maintained throughout the whole clock buffer tree.

3.3 Experimental Results

The die photo of the SiGe DDS RFIC is shown in Fig. 3.10. This DDS design is quite compact, with an active area of 3 × 2.5 mm² and a total die area of 4 × 3.5 mm². The DDS was tested in a CLCC-52 package. Fig. 3.11 shows the packaged chip soldered on a PCB fabricated with RO4004 material. The clock signal is generated by an Agilent E8257D analog signal generator and is converted to differential signals by a hybrid coupler. Two SMA connectors and symmetrical tracks are used to send the clock signal to the DDS chip. The DDS current outputs are converted to voltage outputs through a pair of 15 Ω on-board resistors and connected to the spectrum analyzer or oscilloscope through SMA connectors and RF cables.
The package has a thermal resistance of approximately 30 °C/W. With a 4.8 W power consumption at room ambient temperature, the junction temperature of the SiGe devices can theoretically reach as high as 180 °C. At such a high temperature, the device performance is greatly degraded and the DAC current switches are no longer synchronized due to increased internal delays, which introduces noticeable distortion in the output waveform. When the device is effectively cooled, the DDS operates at a maximum clock frequency of 8.6 GHz. At room temperature, the packaged DDS operates at a maximum clock frequency of 7.2 GHz.

Figure 3.9: Simplified clock tree distribution. (The clock input drives alternating stages of differential pairs and emitter follower pairs.)

Figure 3.10: Die photo of the 11-bit ROM-less DDS RFIC. (Labeled blocks: pipeline accumulator, 1.4 mm × 0.8 mm; fine DAC switches, 1.4 mm × 1.0 mm; coarse DAC switches, 1.4 mm × 1.0 mm; current source matrix, 2.8 mm × 0.5 mm.)

Figure 3.11: Evaluation board for the 11-bit ROM-less DDS RFIC. (Labeled connections: FCW input D0 to D9, clock input, DDS output, DDS in CLCC-52.)

Figs. 3.12-3.15 illustrate the measured DDS output spectra and waveforms for different output and clock frequencies. All measurements were done with packaged parts and without calibrating out the losses of the cables and PCB tracks. Fig. 3.12 presents a 4.2 MHz DDS output spectrum with an 8.6 GHz clock input and the minimum FCW of 1. The measured output power is approximately -8.3 dBm, and the measured SFDR is about 50 dBc. The tone at 91.7 MHz was generated by the nearby campus FM radio station. To show the signal tone clearly, a 100 MHz band spectrum plot is used instead of the full Nyquist band. However, the worst-case spur is located within this band, so the SFDR is 50 dBc within both the Nyquist band and the narrow band. Fig. 3.13 shows the waveform for the spectrum in Fig. 3.12.
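The 4.2 MHz output quoted above for FCW = 1 follows from the standard DDS tuning relation $f_{out} = \mathrm{FCW}\cdot f_{clk}/2^{N}$ with N = 11. The sketch below is a minimal illustration (the function name `dds_output_hz` is hypothetical); it also evaluates the near-Nyquist setting FCW = $2^{10}-1$ treated in Eqs. (3.7) to (3.9), including the first-order image and the resulting envelope frequency.

```python
def dds_output_hz(fcw, f_clk_hz, n_bits=11):
    """Output frequency of a DDS with an n_bits-wide phase accumulator."""
    return fcw * f_clk_hz / 2 ** n_bits


F_CLK = 8.6e9  # maximum clock frequency reported for the cooled device

f_min = dds_output_hz(1, F_CLK)             # smallest step, FCW = 1
f_nyq = dds_output_hz(2 ** 10 - 1, F_CLK)   # near-Nyquist output, FCW = 1023
f_image = F_CLK - f_nyq                     # first-order image of the output
f_envelope = f_image - f_nyq                # beat between output and image

print(f"FCW = 1    : {f_min / 1e6:.4f} MHz")
print(f"FCW = 1023 : {f_nyq / 1e9:.4f} GHz")
print(f"image      : {f_image / 1e9:.4f} GHz")
print(f"envelope   : {f_envelope / 1e6:.4f} MHz")
```

The same relation gives the frequency resolution of the synthesizer, since the FCW = 1 output is also the smallest frequency step.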
Figure 3.12: Measured DDS output spectrum with a 4.2 MHz output and a maximum 8.6 GHz clock (FCW = 1), illustrating about 50 dBc SFDR. The tone at 91.7 MHz is from the nearby campus FM radio station.

Figure 3.13: Measured DDS output waveform with a 4.2 MHz output and an 8.6 GHz clock.

Fig. 3.14 demonstrates the operation of the DDS at a maximum clock frequency of 8.6 GHz with Nyquist output (i.e., FCW = 1023). Thus, the output frequency is set as

\[
\frac{2^{10}-1}{2^{11}}\, f_{clk} = 4.2958\ \text{GHz}. \tag{3.7}
\]

The first-order image tone due to mixing the clock frequency and the DDS output frequency occurs at

\[
8.6\ \text{GHz} - 4.2958\ \text{GHz} = 4.3042\ \text{GHz}. \tag{3.8}
\]

The measured SFDR of the device is approximately 45 dBc. The tone at 91.7 MHz once again appears in the spectrum. Fig. 3.15 illustrates the measured DDS output waveform with a 4.2958 GHz Nyquist output and an 8.6 GHz clock. The signal envelope frequency results from mixing the output and its image, which is

\[
\frac{2^{10}+1}{2^{11}}\, f_{clk} - \frac{2^{10}-1}{2^{11}}\, f_{clk} \approx 8.4\ \text{MHz}. \tag{3.9}
\]

Fig. 3.16 shows the measured DDS SFDR in both the Nyquist band (3.6 GHz) and the narrow band (100 MHz) versus the FCW with a clock frequency of 7.2 GHz. The worst-case SFDR is 33 dBc and 42 dBc for the Nyquist band and narrow band, respectively.

Fig. 3.17 shows the measured DDS phase noise at an output frequency of 1.57 GHz with a 7.2 GHz clock input frequency. The phase noise is -118.55 dBc/Hz at a 10 kHz frequency offset. The input clock is generated by an Agilent E8257D analog signal generator. The spurs visible in the measurement are not harmonically related to the synthesized output frequency; they are related to the test environment.

Figure 3.14: Measured DDS Nyquist output spectrum with a 4.2958 GHz output and a maximum 8.6 GHz clock (FCW = 1023), illustrating about 45 dBc SFDR. The image tone is located at 4.3042 GHz.

To evaluate the performance of ultrahigh speed DDSs, an easily measured and calculated FOM must be defined from a combination of performance parameters.
In the previous literature [24], a power efficiency FOM has been defined as

\[
\mathrm{FOM} = \frac{\text{Max. Clock (GHz)}}{\text{Power (W)}}. \tag{3.10}
\]

This previously defined FOM includes the maximum update frequency as well as the power consumption, but does not consider the amplitude resolution information, which is limited by the DAC. For an ultrahigh speed DDS, this lack of information is unfortunate since the DAC is the most challenging part of these DDS designs. Thus, we define a new FOM including the effective number of bits (ENOB), which measures the DAC spurious performance.

Figure 3.15: Measured DDS output waveform with a 4.2958 GHz Nyquist output and an 8.6 GHz clock. The 8.4 MHz envelope frequency results from mixing the output and its image.

Figure 3.16: The measured DDS SFDR (dBc) versus frequency control word at a clock frequency of 7.2 GHz, illustrating a worst-case SFDR of 33 dBc for the Nyquist band (3.6 GHz) and 42 dBc for the narrow band (100 MHz), respectively.

Figure 3.17: The measured DDS phase noise at an output frequency of 1.57 GHz with a 7.2 GHz clock input frequency. The input clock is generated by an Agilent E8257D analog signal generator. The graph illustrates a -118.55 dBc/Hz phase noise at a 10 kHz frequency offset.

From [25], the signal-to-noise-and-distortion ratio (SINAD) is used to calculate the ENOB as follows:

\[
\mathrm{ENOB} = \frac{\mathrm{SINAD_{dB}} - 1.76}{6.02}. \tag{3.11}
\]

SINAD is the ratio of the root-mean-square (RMS) value of the sine wave (the reconstructed output of a DAC) to the RMS value of the noise plus the total harmonic distortion (THD) up to the Nyquist frequency, excluding the fundamental and the DC offset.
SINAD is typically expressed in dB as

\[ \mathrm{SINAD} = \frac{S}{N + \mathrm{THD}}, \tag{3.12} \]

where S and N are the RMS energy values of the signal and noise; THD is the total harmonic distortion defined as

\[
\mathrm{THD} = \frac{P_{HD1} + P_{HD2} + \cdots}{P_{signal}}
= \frac{\text{power of the largest spur}}{P_{signal}} + \frac{\text{sum of the power of all other spurs}}{P_{signal}}
= \frac{1}{\mathrm{SFDR}} + \frac{\text{sum of the power of all other spurs}}{P_{signal}}. \tag{3.13}
\]

P_HD1 and P_HD2 are the first and second harmonic distortion energies. P_signal is the energy of the fundamental (signal) tone.

Table 3.1: Performance Comparison of Ultrahigh Speed DDS RFICs with over 8 GHz Maximum Clock Frequency

Parameter | [18] | [19] | [20] | [21] | This work
Technology | InP | InP | InP | SiGe | SiGe
fT/fMAX [GHz] | 137/267 | 300/300 | 300/300 | 100/120 | 200/250
Phase resolution [bit] | 8 | 8 | 8 | 9 | 11
Amplitude resolution [bit] | 7 | 7 | 5 | 8 | 10
Maximum clock [GHz] | 9.2 | 13 | 32 | 12.3 | 8.6
Nyquist band SFDR [dBc] | <30 | 26.67 | 21.56 | 20 | 33
Power consumption [W] | 15 | 5.42 | 9.45 | 1.9 | 4.8
Die area [mm^2] | 8 x 5 | 2.7 x 1.45 | 2.7 x 1.45 | 3 x 3 | 4 x 3.5
FOM [GHz·2^(SFDR/6)/W] | <16.0 | 42.6 | 34.8 | 65.3 | 81.1

Although the second term in Eq. (3.13) may be larger than the first term, the SFDR is easily obtained since it can be read directly from the spectrum analyzer. Herein, we use 1/SFDR to represent the THD. In general, the RMS value of the noise is far below the THD. As a result, the SFDR is used in place of SINAD to calculate the FOM, which can be defined as

\[
\mathrm{FOM} = \frac{\text{Max. Clock (GHz)} \times 2^{(\mathrm{SFDR}_{dB} - 1.76)/6.02}}{\text{Power (W)}}
\approx \frac{\text{Max. Clock (GHz)} \times 2^{\mathrm{SFDR}_{dB}/6}}{\text{Power (W)}}. \tag{3.14}
\]

SFDR_dB/6 represents the ENOB obtained from the SFDR measurement [26]. Although the SFDR is defined in the Nyquist band, the narrow band SFDR is often more important since wideband spurs can be removed relatively easily. Many applications are interested only in a specific narrow band near the output, which is usually less than 1% of the update frequency. Table 3.1 is a performance comparison of ultrahigh speed DDS RFICs with more than 8 GHz maximum clock frequency.
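As a rough numerical check of Eqs. (3.11) and (3.14), the following sketch evaluates the ENOB and the FOM from a clock frequency, an SFDR and a power figure. The function names are illustrative assumptions, not part of the original design flow, and only the two SiGe columns of Table 3.1 are evaluated.

```python
def enob_from_sinad(sinad_db):
    """ENOB from SINAD in dB, following Eq. (3.11)."""
    return (sinad_db - 1.76) / 6.02


def fom_exact(fclk_ghz, sfdr_db, power_w):
    """FOM with SFDR used in place of SINAD (first form of Eq. (3.14))."""
    return fclk_ghz * 2 ** ((sfdr_db - 1.76) / 6.02) / power_w


def fom_approx(fclk_ghz, sfdr_db, power_w):
    """Simplified FOM using ENOB ~ SFDR/6 (second form of Eq. (3.14))."""
    return fclk_ghz * 2 ** (sfdr_db / 6.0) / power_w


if __name__ == "__main__":
    # Values taken from the "This work" and [21] columns of Table 3.1.
    print(f"this work: {fom_approx(8.6, 33.0, 4.8):.1f} GHz*2^(SFDR/6)/W")
    print(f"ref. [21]: {fom_approx(12.3, 20.0, 1.9):.1f} GHz*2^(SFDR/6)/W")
```

Because the FOM scales as a power of two in the SFDR, every additional 6 dB of SFDR doubles the figure at a fixed clock rate and power.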
Compared to the InP DDS RFICs, this SiGe DDS significantly improves the resolution, and it is the most complicated ultrahigh speed DDS design, containing approximately twenty thousand transistors. Most of the InP DDS RFICs were measured using probe stations [18, 19, 20], while this DDS RFIC was packaged. As mentioned earlier, the package has a thermal resistance of approximately 30 °C/W, and at room ambient temperature, the junction temperature of the SiGe devices can theoretically reach as high as 180 °C. At such a high temperature, the device performance is greatly degraded and the DAC current switches are no longer synchronized due to increased internal delays. When the device is effectively cooled, the DDS operates at a maximum clock frequency of 8.6 GHz. At room temperature, the packaged DDS operates at a maximum clock frequency of 7.2 GHz. When compared with the 9-bit 12.3 GHz DDS [21], this design achieves two more bits for both phase and amplitude. As a result, this DDS achieves a 10 dB larger SFDR.

3.4 Conclusion

This chapter presented an 11-bit 8.6 GHz SiGe DDS RFIC design with a 10-bit segmented sine-weighted DAC, implemented in 0.13 µm SiGe BiCMOS technology with fT/fMAX of 200/250 GHz. With Nyquist output, the DDS achieves a maximum clock frequency of 8.6 GHz. The power consumption of the DDS is approximately 4.8 W and the power efficiency FOM is 81.1 GHz·2^(SFDR/6)/W. This DDS RFIC is the first ultrahigh speed DDS with 11-bit phase and 10-bit DAC amplitude resolutions that achieves a record high SFDR of 33 dBc with leading power efficiency.

Chapter 4

A 9-bit 2.9 GHz DDS RFIC with Direct Digital Modulations

4.1 Introduction

So far, none of the DDSs developed with over-GHz output provide the modulation capabilities desired for next-generation radar and communication systems [18, 19, 20, 27, 22, 21, 8].
To achieve an over-GHz output frequency, all existing DDS RFICs use pipeline accumulators that work only with a constant input FCW, and thus no FM can be performed [18, 19, 20, 27, 22, 21, 8]. To implement direct FM or PM, a CLA or RCA must be used, with the attendant penalty of reduced speed. Ref. [28] reported a 9-bit DDS with an RCA accumulator. It is capable of FM, but only at low frequency because the FCW cannot change too fast with the bipolar plus NMOS adder architecture, and no PM can be performed. This chapter presents the 9-bit DDS, which uses a CLA accumulator and adder to implement the direct digital modulation capabilities. The next chapter presents the 24-bit DDS, which uses an RCA accumulator and adder for the same purpose. These two DDS RFICs represent the first reported GHz range output DDSs with direct digital frequency and phase modulation capabilities.

4.2 Circuit Implementation

The 9-bit DDS adopts a ROM-less architecture which combines the sine/cosine mapping and the digital-to-analog conversion in a sine-weighted digital-to-analog converter (DAC). The block diagram of the 9-bit 2.9 GHz ROM-less DDS is shown in Fig. 4.1. The major parts of the ROM-less DDS are a 9-bit CLA phase accumulator, a 9-bit CLA full adder and a 7-bit sine-weighted DAC. The 9-bit phase accumulator output is modulated by the 9-bit PCW and truncated to 8 bits. After phase modulation and truncation, the highest 8 bits are fed into the sine-weighted DAC. The two MSBs of this 8-bit word are used to determine the quadrant of the sine wave. The MSB is used to provide the proper mirroring of the sine waveform about the π phase point. The 2nd MSB is used to invert the remaining 6 bits for the 2nd and 4th quadrants of the sine wave by a 1's complementor, and the outputs of the complementor are applied to the sine-weighted DAC to form a quarter of the sine waveform.
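The quadrant logic described above can be sketched behaviorally as follows. This is only an illustration under stated assumptions: the quantized current sources are replaced by an ideal sine amplitude sampled at half-step offsets, and all function names are hypothetical rather than taken from the chip design.

```python
import math


def phase_word(acc9, pcw9):
    """Add the 9-bit PCW to the 9-bit accumulator output (mod 2^9), keep the top 8 bits."""
    return ((acc9 + pcw9) & 0x1FF) >> 1


def quarter_wave_index(word8):
    """Split the 8-bit phase word into the MSB (sign) and a 6-bit quarter-wave index."""
    msb = (word8 >> 7) & 1         # mirrors the waveform about the pi phase point
    second_msb = (word8 >> 6) & 1  # selects the 2nd and 4th quadrants
    low6 = word8 & 0x3F
    if second_msb:
        low6 = (~low6) & 0x3F      # 1's complement of the remaining 6 bits
    return msb, low6


def ideal_output(word8):
    """Ideal (unquantized) stand-in for the sine-weighted DAC output, in [-1, 1]."""
    msb, k = quarter_wave_index(word8)
    amplitude = math.sin(math.pi / 2 * (k + 0.5) / 64)  # assumption: half-step sampling
    return -amplitude if msb else amplitude


if __name__ == "__main__":
    one_period = [ideal_output(m) for m in range(256)]
    print(min(one_period), max(one_period))
```

With these assumptions, stepping the 8-bit word from 0 to 255 traces one full sine period, and a PCW of 256 flips only the MSB, which inverts the output.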
Because of the phase point mirroring, the total amplitude resolution of the sine-weighted DAC is 7 bits.

Figure 4.1: Block diagram of 9-bit ROM-less DDS.

4.2.1 9-bit Carry Look Ahead Adder/Accumulator

To perform direct digital modulation, the adder must have no latency. A pipelined accumulator is not suitable because of its large latency and because it can only handle a fixed FCW. In this design, a CLA adder is used to implement the direct digital modulations because its delay is smaller than that of other zero-latency architectures. A 9-bit CLA adder is used to implement the 9-bit accumulator. Fig. 4.2 shows the architecture of the 9-bit CLA adder.

Figure 4.2: Block diagram of 9-bit CLA accumulator (full adder).

The output and carry out for each bit are calculated as

\[
\begin{cases}
\text{Carry out: } c_i = g_i + p_i c_{i-1} \\
\text{Sum: } s_i = p_i \oplus c_{i-1}
\end{cases} \tag{4.1}
\]

where c_i is the carry out and c_{i-1} is the carry in, or the carry out from the previous bit. g_i and p_i are the carry generate and carry propagate in the level I CLA. The first level carry out can be obtained by

\[
\begin{cases}
c_0 = g_0 + p_0 C_{in} \\
c_1 = g_1 + p_1 g_0 + p_1 p_0 C_{in}
\end{cases} \tag{4.2}
\]

where

\[
\begin{cases}
\text{Carry generate: } g_i = A_i B_i \\
\text{Carry propagate: } p_i = A_i \oplus B_i
\end{cases} \tag{4.3}
\]

and the second level carry out can be obtained by

\[
\begin{cases}
C_0 = G_0 + P_0 C_{in} \\
C_1 = G_1 + P_1 G_0 + P_1 P_0 C_{in}
\end{cases} \tag{4.4}
\]

where the second level propagates are obtained by

\[
\begin{cases}
P_0 = p_2 p_1 p_0 \\
P_1 = p_5 p_4 p_3
\end{cases} \tag{4.5}
\]

and the second level generates are obtained by

\[
\begin{cases}
G_0 = g_2 + p_2 g_1 + p_2 p_1 g_0 \\
G_1 = g_5 + p_5 g_4 + p_5 p_4 g_3
\end{cases} \tag{4.6}
\]

In the above equations, every logic function must be implemented with no more than three inputs. This limit is chosen as a compromise between the power supply voltage and the CML logic.
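Eqs. (4.1) to (4.6) can be exercised at the bit level with a short behavioral model. The sketch below is a hedged illustration only: it assumes three 3-bit level I groups whose carry-ins are C_in, C_0 and C_1, discards the final carry out (as a wrapping phase accumulator would), and uses hypothetical function names.

```python
def cla9_add(a, b, cin=0):
    """Behavioral 9-bit two-level carry look-ahead addition; returns the sum modulo 2**9."""
    A = [(a >> i) & 1 for i in range(9)]
    B = [(b >> i) & 1 for i in range(9)]
    g = [x & y for x, y in zip(A, B)]  # carry generate, Eq. (4.3)
    p = [x ^ y for x, y in zip(A, B)]  # carry propagate, Eq. (4.3)

    # Level II group propagate/generate for bits 0-2 and 3-5, Eqs. (4.5) and (4.6).
    P0 = p[2] & p[1] & p[0]
    P1 = p[5] & p[4] & p[3]
    G0 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
    G1 = g[5] | (p[5] & g[4]) | (p[5] & p[4] & g[3])

    # Level II carries, Eq. (4.4): carry-ins of the second and third groups.
    C0 = G0 | (P0 & cin)
    C1 = G1 | (P1 & G0) | (P1 & P0 & cin)

    total = 0
    for base, group_cin in ((0, cin), (3, C0), (6, C1)):
        # Level I carries inside the group, following the pattern of Eq. (4.2).
        c_first = g[base] | (p[base] & group_cin)
        c_second = (g[base + 1] | (p[base + 1] & g[base])
                    | (p[base + 1] & p[base] & group_cin))
        carry_in = (group_cin, c_first, c_second)
        for j in range(3):
            total |= (p[base + j] ^ carry_in[j]) << (base + j)  # sum, Eq. (4.1)
    return total


def accumulate(fcw, steps, start=0):
    """Phase accumulator built on the adder: phase[n+1] = (phase[n] + FCW) mod 2**9."""
    phase, trace = start, []
    for _ in range(steps):
        phase = cla9_add(phase, fcw)
        trace.append(phase)
    return trace


if __name__ == "__main__":
    print(accumulate(fcw=104, steps=6))
```

No term in the model combines more than three signals, which mirrors the three-input limit stated above.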
Under a 3.3 V power supply and a SiGe HBT base-collector voltage of 0.85 V to 0.9 V, all the digital logic is implemented using 3-level CML with differential output swings of 400 mV. Level shifters may be needed to shift between inputs at different voltage levels. A level shifter usually runs much faster than other CML gates, so it can be ignored when counting the gate delays. Suppose the delay of an XOR gate is twice that of an AND gate. The total delay can then be calculated from the equations and the diagram: (A) two gate delays to calculate the level I carry generate g_i and propagate p_i in Eq. (4.3); (B) two gate delays to calculate the level II carry generate G_i and propagate P_i in Eqs. (4.5) and (4.6); (C) two gate delays to calculate the level II carry in Eq. (4.4); (D) two gate delays to calculate the level I carry in Eq. (4.2); and (E) two gate delays to calculate the sum and carry out from Eq. (4.1). Therefore the 9-bit CLA adder needs only 10 AND gate delays, which is much less than the (2N-1) = 17 gate delays of the ripple carry adder, especially for high resolution adders, while it is much slower than the pipelined counterpart. (This holds without considering the wire delay. The effect of the wire delay will be discussed in Chapter 5.)

4.2.2 7-bit Sine-weighted DAC

The structure of the sine-weighted DAC is shown in Fig. 4.3. Since the quadrant of the sine waveform is determined by the two MSBs, the remaining 6 bits are used to control the switch matrix and generate the amplitude for a quarter phase (0 to π/2) of the sine wave.

Figure 4.3: Block diagram of 7-bit sine-weighted DAC.

Table 4.1: Current Source Matrix in Sine-weighted DAC

2 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3
3 3 3 2 3 3 3 2
3 2 3 2 3 2 2 3
2 2 2 2 2 2 2 2
1 2 2 1 2 1 1 2
1 1 1 1 1 1 1 0
1 0 1 0 1 0 0 0

The current source matrix is calculated by the following equation and shown in Table 4.1.
\[
I_k =
\begin{cases}
\left\lfloor (2^9 - 1) \sin\!\left( \dfrac{\pi}{2} \cdot \dfrac{0.5}{2^6} \right) \right\rfloor, & k = 0 \\[2ex]
\left\lfloor (2^9 - 1) \sin\!\left( \dfrac{\pi}{2} \cdot \dfrac{k + 0.5}{2^6} \right) - \displaystyle\sum_{n=0}^{k-1} I_n \right\rfloor, & 0 < k \le 2^6 - 1
\end{cases} \tag{4.7}
\]

The sine-weighted DAC current source matrix provides a total of 128 unit current sources. The unit current of each current source is set to 105 µA. The largest current in the current source matrix is 315 µA, which is composed of 3 unit current sources. The current switch contains two differential pairs with cascode current sources for better isolation and current mirror accuracy. The current outputs are converted to differential voltages by a pair of off-chip 15 Ω pull-up resistors. Fig. 4.4 shows that the currents from the cascode current sources are fed to outputs OUTp and OUTm by pairs of switches (Msw). The MSB controls the selection between different half periods. The current switch contains two differential pairs with minimum size transistors and a cascode transistor to isolate the current sources from the switches, which improves the bandwidth of the switching circuits.

For the layout, vertical SiGe HBT devices are used with a degeneration resistor to improve the current source matching. All the current source transistors are randomly distributed in the current source matrix. Two dummy rows and columns have been added around the current source array to avoid edge effects. In order to minimize the phase difference of the clock to the flip-flops, an H-tree clock scheme is used to make the clock signal reach each block simultaneously in both the adder/accumulator and the DAC.

4.3 Experimental Results

Figs. 4.5-4.7 illustrate the measured DDS output spectra and waveforms for different output and clock frequencies without modulation. Fig. 4.5 presents a 509 MHz DDS output with a 2.5 GHz clock input and an FCW of 104. The measured output power is approximately 0 dBm and the measured narrow band SFDR is approximately 48 dBc. Fig.
4.6 gives the measured DDS output spectrum with a 1.444 GHz Nyquist output under a 2.9 GHz clock. Since FCW = 2^8 - 1 = 255, the output frequency is

\[ \frac{FCW}{2^N} f_{clk} = \frac{255}{2^9} \times 2.9\ \text{GHz} = 1.444\ \text{GHz}. \]

The first-order image tone produced by mixing the clock frequency and the DDS output frequency occurs at

\[ 2.9\ \text{GHz} - 1.444\ \text{GHz} = 1.456\ \text{GHz}. \]

Fig. 4.7 shows the time domain waveform of Fig. 4.6. The envelope frequency of the waveform is

\[ \frac{2^8 + 1}{2^9} f_{clk} - \frac{2^8 - 1}{2^9} f_{clk} \approx 12\ \text{MHz}. \]

Fig. 4.8 shows the measured DDS output with FCW = 2 frequency modulated by a step of FCW = 1. The frequency before the step is 9.375 MHz with FCW = 2 and after the step is 14.0625 MHz with FCW = 3. Fig. 4.9 shows the measured DDS output with FCW = 2 phase modulated by a step of PCW = 256, corresponding to a 180° phase shift. The output frequency is 10 MHz with a 2.5 GHz clock.

Figure 4.4: Diagram of DAC switch and current source matrix cell.

Figure 4.5: Measured DDS output spectrum with 509 MHz output under 2.5 GHz clock (FCW = 104), showing about 48 dBc narrow band SFDR.

Figure 4.6: Measured DDS output spectrum with 1.444 GHz output and 2.9 GHz clock (FCW = 255), showing about 35 dBc narrow band SFDR. The image tone is located at 1.455 GHz.

Figure 4.7: Measured DDS output waveform (output amplitude in V versus time in ns) with 1.444 GHz output and 2.9 GHz clock (FCW = 255). The envelope frequency is 12 MHz.

Figure 4.8: Measured DDS output with FCW = 2 frequency modulated by a frequency step of FCW = 1. The frequency before the step is 9.375 MHz with FCW = 2; after the step it is 14.062 MHz with FCW = 3.

All measurements were done on CLCC-44 packaged parts without a deglitch filter and without calibrating out the losses of the cables and PCB tracks.
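The tuning arithmetic used in this section can be reproduced with a few lines. The sketch below is illustrative only, with hypothetical helper names; it assumes an ideal N-bit phase accumulator, so that f_out = FCW·f_clk/2^N, the first image lies at f_clk - f_out, and a PCW shifts the phase by 360°·PCW/2^N. The 2.4 GHz clock used for the FM example is an inference from the quoted 9.375 MHz and 14.0625 MHz outputs, not a value stated in the text.

```python
def dds_output_hz(fcw, n_bits, fclk_hz):
    """Output frequency of an ideal N-bit accumulator DDS."""
    return fcw * fclk_hz / 2 ** n_bits


def image_hz(fcw, n_bits, fclk_hz):
    """First-order image produced by mixing the clock with the output tone."""
    return fclk_hz - dds_output_hz(fcw, n_bits, fclk_hz)


def envelope_hz(fcw, n_bits, fclk_hz):
    """Beat (envelope) frequency between the output tone and its first image."""
    return image_hz(fcw, n_bits, fclk_hz) - dds_output_hz(fcw, n_bits, fclk_hz)


def phase_shift_deg(pcw, n_bits):
    """Phase shift produced by a phase control word on an N-bit phase word."""
    return 360.0 * pcw / 2 ** n_bits


if __name__ == "__main__":
    fclk = 2.9e9
    print(f"output   : {dds_output_hz(255, 9, fclk) / 1e9:.4f} GHz")
    print(f"image    : {image_hz(255, 9, fclk) / 1e9:.4f} GHz")
    print(f"envelope : {envelope_hz(255, 9, fclk) / 1e6:.2f} MHz")
    print(f"PCW = 256: {phase_shift_deg(256, 9):.0f} degrees")
```

Evaluated without rounding, the envelope comes out near 11.3 MHz; the 12 MHz quoted above follows from subtracting the rounded 1.444 GHz and 1.456 GHz values.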
Figure 4.9: Measured DDS output with FCW = 2 phase modulated by a phase step of PCW = 256, corresponding to a 180° phase shift. The output frequency is 10 MHz with a 2.5 GHz clock.

Table 4.2 compares mm-wave DDS RFIC performances. Although this DDS has a lower clock frequency than the others, it is the first DDS that combines direct digital frequency and phase modulation capabilities with an output frequency above 1 GHz. Some commercial parts have FM and PM capabilities, but all of them are clocked at no more than 1 GHz and can only output frequencies below 500 MHz. The die photo of the SiGe DDS RFIC is shown in Fig. 4.10. This DDS design is quite compact, with an active area of 1.7 × 2.0 mm^2 and a total die area of 2.5 × 3.0 mm^2.

Table 4.2: Selected Ultrahigh Speed DDS RFIC Performance Comparison

Parameter | [18] | [19] | [20] | [22] | [8] | 9-bit DDS
Technology | InP | InP | InP | SiGe | SiGe | SiGe
fT/fmax [GHz] | 137/267 | 300/300 | 300/300 | 100/120 | 200/250 | 200/250
Phase [bit] | 8 | 8 | 8 | 9 | 11 | 9
Amplitude [bit] | 7 | 7 | 5 | 8 | 10 | 7
FM [bit] | None | None | None | None | None | 9
PM [bit] | None | None | None | None | None | 9
Max clock [GHz] | 9.2 | 13 | 32 | 9.6 | 8.6 | 2.9
SFDR [dBc] | 30 | 26.67 | 21.56 | 30 | 40 | 35
Power [W] | 15 | 5.42 | 9.45 | 1.9 | 4.8 | 2.0
Area [mm^2] | 8.0 x 5.0 | 2.7 x 1.45 | 2.7 x 1.45 | 3.0 x 3.0 | 4.0 x 3.5 | 2.5 x 3.0

Figure 4.10: Die photo of the 9-bit DDS with direct digital modulations.

4.4 Conclusion

This chapter presented a 9-bit 2.9 GHz SiGe DDS RFIC design with direct digital 9-bit frequency and 9-bit phase modulations, implemented in a 0.13 µm SiGe BiCMOS technology with fT/fmax of 200/250 GHz. With Nyquist output, the DDS achieves a maximum clock frequency of 2.9 GHz and a narrow band SFDR of 35 dBc. It has low power consumption as well: approximately 2.0 W under a single 3.3 V power supply, even with the added modulation blocks. This DDS RFIC is the first reported GHz range output DDS with direct digital frequency and phase modulation capabilities.
Chapter 5

A 24-bit 5.0 GHz DDS RFIC with Direct Digital Modulations

5.1 Introduction

This chapter presents a 24-bit 5.0 GHz DDS with over-GHz output frequency and direct digital modulation capabilities. This work represents one of the first DDS RFICs with over-GHz range output as well as direct digital FM and PM capabilities. The 24-bit DDS implements the direct digital FM and PM capabilities using RCA adders. The block diagram of the 24-bit 5.0 GHz ROM-less DDS with RCA accumulator and modulator is shown in Fig. 5.1 [2, 29]. The major parts of the ROM-less DDS are a 24-bit RCA phase accumulator, a 12-bit RCA modulator, and a 10-bit sine-weighted DAC. The 24-bit RCA phase accumulator output is truncated to 12 bits and modulated with a 12-bit PCW. After PM, the output is truncated again, and the highest 11 bits are fed into the sine-weighted DAC. The sine-weighted DAC maps the 11-bit linear phase word to the digital amplitude and generates the analog waveform. The ultrahigh speed RCA accumulator/adder and the sine-weighted DAC will be described in the following two sections, respectively.

Figure 5.1: Block diagram of the 24-bit 5.0 GHz DDS RFIC.

5.2 Ultrahigh Speed Adder Design

5.2.1 Wire Delay in the 0.13 µm SiGe BiCMOS Technology

With the introduction of deep submicron semiconductor technology, the parasitic effects introduced by the wire delay begin to dominate the performance of high speed digital integrated circuits. The typical buffer delay in the 0.13 µm SiGe BiCMOS technology is less than 4 ps, while the wire delay of a 2 µm wide and 100 µm long wire can be as high as 10 ps. From [30], the transmission line effects should be considered when the rise or fall time of the input signal is smaller than the time of flight of the transmission line. The following equation is used to determine when transmission line effects should be considered.

\[ t_{rf} < 2.5\, t_{flight} = 2.5\, \frac{L}{v} \tag{5.1} \]

In Eq.
(5.1), t_rf is the rise and fall time of the signal transmitted through the wire. t_flight is the flight time, which is the time it takes for the wave to propagate from one end of the wire to the other; L is the wire length, and v is the propagation velocity, which is 15 cm/ns in silicon dioxide (SiO2). So the minimum length that must be considered as a transmission line for a signal is

\[ L_{min} = 0.4\, t_{rf}\, v. \tag{5.2} \]

For a 5.0 GHz signal, the rise and fall time should not be longer than 67 ps. If the wire length is less than 4 mm, a lumped RC model can be used to evaluate the propagation delay through the wire. Fig. 5.2 shows the equivalent circuit of a wire with length L. From the Elmore delay rule, the dominant time constant is

\[ \tau_D = R_s c L + 0.5\, r c L^2, \tag{5.3} \]

where R_s is the internal resistance of the driver, and r and c are the parasitic resistance and capacitance of the wire per unit length. The delay introduced by the wire resistance becomes dominant when the second term is bigger than the first, i.e., when L ≥ 2R_s/r. In the 0.13 µm SiGe BiCMOS technology, the first term in Eq. (5.3) dominates the propagation delay as long as L < 2 mm, and as a result the propagation delay of the wire is approximately proportional to its length.

Figure 5.2: Lumped RC model for a wire with length of L.

To evaluate wire delay effects in high speed digital logic design, several simulations have been performed in a 0.13 µm SiGe process for a current-mode-logic (CML) cell, implemented using a differential pair without an emitter follower as the output buffer, and its connection wires. Fig. 5.3 shows the test bench used to simulate the wire delay effects. Fig. 5.3(A) is the schematic view. It is used to find the intrinsic propagation delay of the CML buffer that is 2 µm wide and 100 µm long. Differential wires in the third metal layer are inserted between the two buffers in Fig. 5.3(B). The space between the two differential wires is typically maintained at 2 µm in the layout. Fig.
5.3(C) uses the same test bench as that employed in Fig. 5.3(B), with the exception of an additional piece of metal under the differential wires. Fig. 5.3(D) uses the same test bench as that in Fig. 5.3(B), except that in this case two pieces of metal are used to sandwich the differential wires. Clearly, the cases in Fig. 5.3(C) and (D) result in a larger parasitic capacitance than Fig. 5.3(B). However, it is not always possible to place the wire without any overlap with the metals under and above it, especially for modern processes with more than five metal layers.

The simulated results are plotted in Fig. 5.4. In Fig. 5.4, plot (A) represents the propagation delay of Fig. 5.3(A), and illustrates the propagation delay of only the input and output buffers. It does not include the wire delay, so it is constant along the wire length. Plots (B), (C) and (D) show the propagation delay of test benches (B), (C) and (D) but do not include the buffer delays in the test benches. These three plots reflect the third metal wire propagation delay in a 0.13 µm SiGe process. The delay is proportional to the wire length as long as the length is less than 2 mm. Comparing (B), (C) and (D), the delay of a shielded metal wire is two (for (C)) or three (for (D)) times larger than that of an unshielded metal wire. This conclusion, as well as the linear relationship between the wire delay and the wire length, indicates that the wire delay is dominated by the time constant formed by the parasitic capacitance and the output impedance of the input buffer, as described by the first term of Eq. (5.3). Note that test benches (C) and (D) are more practical cases than (B), because in a real layout environment all the wires overlap each other and produce several times more parasitic capacitance than the coupling capacitance to the substrate. From the wire delay plot of Fig. 5.4, the wire delay coefficient (the delay of a 1 µm wire) for case (C) is about 0.10 ps/µm.
This number will be used to estimate the adder delay in the following sections.

5.2.2 Propagation Delay Comparison Between the CLA and RCA Accumulator/Adder

A pipeline accumulator can only handle a fixed input FCW. Direct modulations such as linear frequency modulation (LFM) require a varying input FCW. Thus, either RCA or CLA adders must be used to implement direct modulations. Chapter 4 calculated the critical path delay of a 9-bit CLA adder with 3-input CML implementations. However, that calculation did not count the level shifters that are used to shift between different input voltage levels, nor the wire delays. In general, level shifters run much faster than CML gates, and thus can be ignored when counting the gate delays. Furthermore, the XOR gate delay is treated the same as the AND gate delay since CML gate delays are essentially identical for different gates. For example, both the carry and the sum logic in a full adder can be implemented by a CML gate with only one tail current. Given this information, the 9-bit CLA adder has a total of 8 CML gate delays. One CML gate delay is about 9 ps in the 0.13 µm SiGe BiCMOS technology. In the 9-bit full adder, the total logic delay is approximately 72 ps without the wire delay.

Figure 5.3: Test bench to simulate the wire propagation delay.

Figure 5.4: Simulated wire propagation delay (ps) versus wire length (µm) for cases (A) to (D).

From Fig. 4.2 in Chapter 4 and the actual layout, the wire delay in the critical
path of the 9-bit CLA adder is calculated as follows:
(A) A 200 µm wire is added to calculate the delay of the level I generate g_i and propagate p_i;
(B) A 200 µm wire is added to calculate the delay of the level II generate G_i and propagate P_i;
(C) A 300 µm wire is added to calculate the delay of the level II carry;
(D) A 200 µm wire is added to calculate the delay of the level I carry;
(E) A 200 µm wire is added to calculate the delay of the sum and carry-out.
Therefore, the total wire length used to calculate the delay of the critical path is 1100 µm. If the adder has more than 9 bits and up to 27 bits, a level III CLA block is needed to calculate the carry-out. To generate the third level CLA logic and the carry-in signal for the second level CLA block, the total gate delay increases to 12 CML gate delays and the total wire length increases to 1800 µm. The worst-case delay is the same for every CLA adder from 10 bits to 27 bits since these adders have an identical critical path.

The calculation of the propagation delay for the RCA is much easier than for the CLA adder. Fig. 5.5 shows the architecture of an N-bit RCA. It represents the layout floor plan and the component placement as well as the schematic wire connection. The output and carry-out for each bit are calculated as

\[
\begin{cases}
\text{Carry out: } c_i = A_i B_i + B_i c_{i-1} + c_{i-1} A_i \\
\text{Sum: } s_i = A_i \oplus B_i \oplus c_{i-1},
\end{cases} \tag{5.4}
\]

where A_i and B_i are the inputs of the N-bit full adder, i = 0, 1, ..., (N-1). c_i is the carry-out of the ith-bit full adder. c_{-1} = C_in = 0 is the initial carry-in of the N-bit full adder. C_out = c_{N-1} is the last bit carry-out of the N-bit full adder. Therefore, the worst-case propagation delay of the N-bit full adder is the delay of N-1 carry logic gates and one sum logic gate. There is almost no wire delay since the carry logic gates can be placed as close together as possible to minimize the amount of wire in the connection.
The level shifter delay can be eliminated as well because level shifting can be deliberately kept out of the critical path.

Figure 5.5: Diagram of N-bit RCA.

Fig. 5.6 shows the comparison of the estimated propagation delay of the CLA adder and the RCA. Not counting the wire delay, the speed of the CLA adder is close to that of the RCA for small numbers of bits. At high numbers of bits, the CLA adder runs much faster than the RCA. With the added wire delay, the RCA delay does not change much because the RCA is very compact and can be laid out very closely, thus having almost no wire delay. However, the layout of the CLA is very complex and introduces significant wire delay. So the CLA adder runs much slower than the RCA, especially for 10-bit to 25-bit adders. In addition to the internal wire delays of the CLA adder, the CLA adder layout area is several times larger than that of the RCA, which results in more wire delay for global wiring.

Figure 5.6: Estimated adder propagation delay (ps) versus number of bits for the CLA and RCA adders, with and without wire delay.

In conclusion, at low or medium speed with an older and slower fabrication technology, the CLA speeds up the adder operation by using additional logic for carry calculations. However, for high speed implementation with a fast technology (e.g., 0.13 µm and below), the adder delay is mainly dominated by the wire delays. When compared to a CLA adder, the RCA adder has a simple ripple architecture, which can be laid out in cascaded format one bit after another, leading to a very compact layout with short wire interconnections between stages.
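The estimate behind this comparison can be reproduced from the numbers quoted in this section: about 9 ps per CML gate, about 0.10 ps/µm of wire for case (C), 8 gates and 1100 µm of wire for the 9-bit CLA, and 12 gates and 1800 µm for 10-bit to 27-bit CLAs. The sketch below is a simplified model with hypothetical function names; it assumes zero wire delay for the RCA, which the text describes only as "almost no" wire delay.

```python
GATE_DELAY_PS = 9.0    # one CML gate delay in the 0.13 um SiGe BiCMOS process
WIRE_PS_PER_UM = 0.10  # wire delay coefficient for case (C)


def cla_delay_ps(n_bits, with_wire=True):
    """Critical path delay of the CLA adder for the bit widths covered in the text."""
    if n_bits == 9:
        gates, wire_um = 8, 1100.0
    elif 10 <= n_bits <= 27:
        gates, wire_um = 12, 1800.0   # level III CLA block required
    else:
        raise ValueError("only 9-bit and 10-bit to 27-bit CLA adders are modeled")
    delay = gates * GATE_DELAY_PS
    if with_wire:
        delay += wire_um * WIRE_PS_PER_UM
    return delay


def rca_delay_ps(n_bits):
    """N-1 carry gates plus one sum gate; wire delay assumed negligible (assumption)."""
    return ((n_bits - 1) + 1) * GATE_DELAY_PS


if __name__ == "__main__":
    for n in (9, 12, 24):
        print(f"{n:2d} bits: CLA {cla_delay_ps(n):6.1f} ps "
              f"(logic only {cla_delay_ps(n, with_wire=False):6.1f} ps), "
              f"RCA {rca_delay_ps(n):6.1f} ps")
```

Under these assumptions the 24-bit CLA is faster than the RCA on logic delay alone, but slower once the wire delay is included, which is the trend summarized above.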
5.2.3 Circuit Implementation of the 24-bit 5.0 GHz RCA

In this DDS design, a 24-bit RCA is used to implement the 24-bit accumulator. The 24-bit RCA is composed of 24 1-bit full adders carefully designed in a compact manner. The output of the carry-out logic remains at the top CML level, and no level shifter is needed to convert the signal level for the critical path. Therefore the longest delay from input to output of the 24-bit RCA is 23 carry-out CML gate delays and 1 sum CML gate delay. The wire delay in the RCA can be minimized since the carry-in can be directly connected to the carry-out of the previous bit, leading to a compact layout in a cascaded format. Another 12-bit RCA was implemented for the 12-bit phase modulator.

In addition, the 24-bit CLA adder runs slower than the RCA adder, as shown in Fig. 5.6. When the 0.13 µm SiGe BiCMOS technology is used, long wires contribute much more delay than the logic gates. So a 24-bit CLA adder cannot run as fast as the RCA adder, not only because of the amount of cascaded CLA logic with the attendant limited CML fan-in, but also because of the much longer wires needed by the CLA logic.

5.3 10-Bit Segmented Sine-weighted DAC

5.3.1 Architecture of the 10-bit Sine-weighted DAC

The structure of the 10-bit sine-weighted DAC is shown in Fig. 5.7. The total phase word input for the sine-weighted DAC is 11 bits. The two MSBs are used to determine the quadrant of the sine wave. The MSB of the phase word is used to provide the proper mirroring of the sine waveform about the π phase point. The 2nd MSB is used to invert the remaining 9 bits for the 2nd and 4th quadrants of the sine wave by a 1's complementor, and the outputs of the complementor are applied to a 9-bit sine-weighted DAC core to form a quarter of the sine waveform.
Because of this phase point mirroring, the total amplitude resolution of the sine-weighted DAC is 10 bits.

Figure 5.7: Block diagram of the 10-bit segmented sine-weighted DAC. The figure also lists the current source matrices of the fine DACs (above) and the coarse DAC (below), in unit currents.

Fine DACs current source matrix:

2 2 2 2 1 0 0 0
1 1 1 1 1 1 1 0
2 2 1 1 1 1 0 0
1 1 2 1 1 1 1 0
2 2 1 2 1 0 0 1
1 1 2 1 1 1 1 0
2 2 1 1 1 1 0 0
0 0 0 0 0 0 0 0

Coarse DAC current source matrix:

1 12 11 11 9 7 5 2
12 13 12 10 9 7 5 3
13 12 11 10 8 7 4 2
12 12 11 10 8 6 4 1
13 12 12 10 9 6 4 2
12 12 11 9 7 6 4 1
13 12 10 10 8 5 3 0
12 12 11 9 7 5 3 1

To reduce the complexity of the sine-weighted DAC, segmentation has been employed [8, 9]. The 9-bit sine-weighted DAC core is divided into a 6-bit thermometer-decoded coarse sine-weighted DAC and eight 3-bit thermometer-decoded fine sine-weighted DACs. The first 6 bits of the complementor's output control the coarse DAC, and the highest 3 bits also address the selection of the fine DACs. The lowest 3 bits of the complementor's output determine the output value of each fine DAC. The bit division between the 6-bit coarse DAC and the 3-bit fine DACs is based on the trade-off among static and dynamic accuracy, chip area and power consumption. As shown in Fig. 5.7, the bottom current source array implements the coarse DAC. The coarse DAC current source array provides 512 unit current sources. The top current source array implements the fine DACs. Each column of the fine DAC current source array forms the current sources of one fine DAC. Every fine DAC has about 8 unit current sources used to interpolate the coarse DAC.
The unit current of both the coarse and fine DACs is set at 26 µA. The largest current in the current source matrix is 338 µA, which is composed of 13 unit current sources. In the layout, with consideration for current source matching, each current source is split into four identical current sources, each carrying a quarter of the whole current. To further improve their matching, all current source transistors, including those used in both the coarse and fine DACs, are randomly distributed over the whole current source matrix. The current switch contains two differential pairs and improves the bandwidth of the switching operation with minimum size logic transistors and cascode current sources that provide better isolation and current mirror accuracy. The current outputs are converted to differential voltages by a pair of off-chip 15 Ω pull-up resistors. Fig. 5.8 shows that the currents from the cascode current sources are fed to outputs OUTp and OUTm by pairs of switches (Msw). The MSB controls the selection between different half periods.

Figure 5.8: Diagram of the DAC switch and current source matrix cell.

5.3.2 Bandwidth Limitation of the DAC Switch Output Impedance

The dynamic performance of the current-steering DAC is closely related to the current switch output impedance as well as to the frequency response of that impedance. With a fully thermometer-decoded DAC, the SFDR can be written as a function of the output impedance Z_out and the DAC resolution N in bits [23]:

\[ \mathrm{SFDR} = 20 \log\!\left( \frac{Z_{out}}{R_L} \right) - 6.02\,(N - 2), \tag{5.5} \]

where R_L is the load resistance of the current switch. The core switch cell of the sine-weighted DAC is shown in Fig. 5.9(A).
$C_0$ and $C_1$ denote the parasitic capacitances at the drain of the current source transistor and at the collector of the cascode transistor, respectively, including the device parasitic capacitance and the wire capacitance. The small signal equivalent circuit is drawn in Fig. 5.9(B). If $r_\pi$ is neglected, the total output impedance looking into the switch output is given by

$$ Z_{out} = g_{msw} r_{osw} \left\{ \frac{1}{sC_1} \parallel \left[ g_{mcas} r_{ocas} \left( \frac{1}{sC_0} \parallel r_{ocs} \right) \right] \right\}. \tag{5.6} $$

In Eq. (5.6), $g_{msw}$ and $g_{mcas}$ denote the transconductance of the switch and the cascode transistor, respectively. $r_{osw}$, $r_{ocas}$ and $r_{ocs}$ denote the output resistance of the switch, the cascode and the current source transistor, respectively. At low frequency, the output impedance simplifies to

$$ Z_{out} = g_{msw} r_{osw} \left( g_{mcas} r_{ocas} \right) r_{ocs}. \tag{5.7} $$

$g_{mcas} r_{ocas}$ is the maximum gain of the cascode transistor. A bipolar transistor is chosen for the cascode because it has a higher maximum gain than its MOS counterpart, while an MOS transistor is chosen for the current source because it has a higher output resistance than a bipolar transistor as well as a lower overdrive voltage.

Figure 5.9: DAC switch core circuit (A) and its small signal equivalent circuit (B).

From Eq. (5.6), the dominant pole of the output impedance is

$$ \omega_p = \frac{1}{r_{ocs}\left( C_0 + r_{ocas} g_{mcas} C_1 \right)}. \tag{5.8} $$

To increase the bandwidth of the output impedance, the parasitic capacitances $C_0$ and $C_1$ must be kept as small as possible. Because the maximum gain of the cascode transistor, $g_{mcas} r_{ocas}$, is much larger than 1, the parasitic capacitance $C_1$ affects the bandwidth of the output impedance much more significantly than $C_0$. Therefore, the long wire connecting the current source to the current switch is placed at the drain of the current source transistor, where its capacitance adds to $C_0$ rather than to $C_1$. This results in a much larger $C_0$ than $C_1$.

5.4 Experimental Results

Figs.
5.10 and 5.11 illustrate the measured DDS output spectra and waveforms for different output and clock frequencies. All measurements were taken single-ended on CLCC-68 packaged parts, without calibrating out the losses of the cables and PCB tracks.

Fig. 5.10 shows a 469.360351 MHz DDS output with a 5.0 GHz clock input and FCW = 0x180800 (hexadecimal). Because of the MSB mirroring shown in Fig. 5.8, the single-ended output peak-to-peak voltage is 400 mV. The theoretical output power can therefore be calculated as

$$ 10\log\left[ \frac{\left( 400\ \mathrm{mV} / (2\sqrt{2}) \right)^2}{15\ \Omega \times 1\ \mathrm{mW}} \right] \approx 1.25\ \mathrm{dBm}. \tag{5.9} $$

Taking into account the loss due to the parasitic capacitances, PCB tracks and RF cables, the measured output power is approximately -2.12 dBm. The measured SFDR is approximately 38 dBc in the Nyquist bandwidth. Since FCW = 0x180800, the output frequency is given by

$$ f_{out} = \frac{\mathrm{FCW}}{2^N} f_{clk} = \frac{\mathrm{0x180800}}{2^{24}} \times 5.0\ \mathrm{GHz} = 469.360351\ \mathrm{MHz}. \tag{5.10} $$

In a stretch processing radar, which is essentially a narrow band system, the narrow band SFDR of the DDS is often more important, since wideband spurs can be removed relatively easily. For many applications, only a specific narrow band near the output, usually less than 1% of the update frequency, is of interest [9]. Fig. 5.11 provides an example of a 1.246258914 GHz output frequency (FCW = 0x3FCFE7) with a 5.0 GHz clock frequency, measured within a 50 MHz band around the output. The measured narrow band SFDR is about 82 dBc.

Fig. 5.12 demonstrates the operation of the DDS with an LFM output. With a 300 MHz clock input, a 24-bit 300 MHz ramp from 0x000001 to 0x00AD9C is fed into the FCW input. The output sweeps from 18 Hz to 397.367947 kHz. In this DDS, CMOS logic was used to provide the modulation data inputs. The speed is limited by the data source, an Agilent pattern generator that drives the chip through the PCB. A maximum of 2.5 GHz LFM can be reached if the modulation ramp is generated inside the DDS chip. Fig.
5.13 shows the measured DDS output with FCW = 7, phase modulated by a step of PCW = 0x800, resulting in a 180° phase shift. The output frequency is 1.251 kHz with a 3.0 GHz clock. Both the LFM and the PM waveforms can be used in radar transceivers as the source of the transmitted chirp signal and of the reference chirp signal for stretch processing, as described in Section II. Based on the discussion in Section II, the chirp modulated waveform improves the radar range resolution, while stretch processing using LFM reduces the bandwidth requirement for the ADC in the receiver path.

Fig. 5.14 provides a plot of the measured SFDR versus the output frequency for the 24-bit 5.0 GHz DDS within a 50 MHz bandwidth, and demonstrates a worst-case narrow band SFDR of about 45 dBc. In addition, the DDS has several sweet spots, in which its output spectral purity and dynamic performance are much better than in other frequency bands. Fig. 5.11 gave an example of an 82 dBc SFDR in a 50 MHz narrow band.

Figure 5.10: Measured DDS output with a 469.360351 MHz output and the maximum 5.0 GHz clock (FCW = 0x180800), showing a 38 dBc Nyquist band SFDR.

Figure 5.11: Measured DDS output with a 1.246258914 GHz output and the maximum 5.0 GHz clock (FCW = 0x3FCFE7), showing an 82 dBc narrow band SFDR.

Figure 5.12: Measured DDS LFM output with the FCW swept from 1 to 0x005AD9C, using a 300 MHz clock.

Figure 5.13: Measured DDS output with FCW = 7, phase modulated by a phase step of PCW = 0x800 causing a 180° phase shift. The output frequency is 1.251 kHz with a 3.0 GHz clock.

Figure 5.14: Measured DDS narrow band SFDR versus output frequency within a 50 MHz bandwidth. [Axes: output frequency (GHz) vs. measured SFDR (dBc); F_CLK = 5.0 GHz.]

The die photo of the DDS is shown in Fig. 5.15. This DDS design is quite compact, with an active area of 3.0 × 2.5 mm² and a total die area of 3.7 × 3.0 mm². Table 5.1 compares the performance of all reported ultrahigh speed DDS RFICs with over-GHz output frequency, including the DDSs presented in the previous chapters. Although the DDS reported here has a relatively low clock frequency compared with the others, it is the first over-GHz output frequency implementation with direct digital FM and PM capabilities. Some commercial parts have direct digital FM and PM capabilities, but they only work up to a 1.0 GHz clock with an output of less than 500 MHz [31].

Figure 5.15: Die photo of the 24-bit DDS RFIC. [Labeled blocks: 24-bit ripple accumulator, 12-bit ripple FA, 10-bit segmented sine-weighted DAC, PRBS.]

5.5 Conclusion

This chapter presented a 24-bit 5.0 GHz DDS RFIC with direct digital modulation capabilities, developed in a 0.13 µm SiGe BiCMOS technology for pulse compression radar applications. A 24-bit RCA accumulator and a 12-bit RCA are implemented for modulator designs with over-GHz frequency output. For high-speed DDS implementation, the adder delay is dominated by wire delays. A comparison between the RCA and the CLA adder has been performed in this chapter. Compared to a CLA adder, the RCA has a simple ripple architecture, which can be laid out in a cascaded format one bit after another, resulting in a very compact layout with short wire interconnections between stages. Thus, the RCA actually ends up with a higher operating frequency than the CLA adder.

This 24-bit DDS has more than 20,000 transistors and achieves a maximum clock frequency of 5.0 GHz. The measured worst-case SFDR is 45 dBc under a 5.0 GHz clock frequency and within a 50 MHz bandwidth. The best Nyquist band SFDR is 38 dBc with a 469.360351 MHz output using a 5.0 GHz clock frequency. This DDS represents the first implemented RFIC with direct digital modulation at over-GHz output frequency.
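The figures quoted in Section 5.4 and the FOM column of Table 5.1 can be cross-checked with a short calculation. The sketch below is an illustration under stated assumptions rather than part of the measurement flow: the function names are hypothetical, the FOM is taken as clock frequency in GHz times 2^(SFDR/6) divided by power in W, as read from the table header, and the phase word is assumed to be 12 bits wide.

```python
import math


def dds_output_hz(fcw, acc_bits, f_clk_hz):
    """Output frequency of Eq. (5.10): FCW / 2^N * f_clk."""
    return fcw * f_clk_hz / (1 << acc_bits)


def phase_step_deg(pcw, phase_bits):
    """Phase shift in degrees produced by a phase control word (assumed width)."""
    return 360.0 * pcw / (1 << phase_bits)


def sine_power_dbm(v_pp, r_load_ohm):
    """Power of a sine wave with peak-to-peak voltage v_pp, as in Eq. (5.9)."""
    v_rms = v_pp / (2.0 * math.sqrt(2.0))
    return 10.0 * math.log10(v_rms ** 2 / r_load_ohm / 1e-3)


def dds_fom(f_clk_ghz, sfdr_dbc, power_w):
    """Figure of merit as read from the Table 5.1 header (assumed form)."""
    return f_clk_ghz * 2.0 ** (sfdr_dbc / 6.0) / power_w


print(dds_output_hz(0x180800, 24, 5.0e9) / 1e6, "MHz")  # Fig. 5.10 setting
print(dds_output_hz(7, 24, 3.0e9), "Hz")                 # Fig. 5.13 setting
print(phase_step_deg(0x800, 12), "degrees")              # PCW = 0x800
print(round(sine_power_dbm(0.4, 15.0), 2), "dBm")        # Eq. (5.9)
print(round(dds_fom(5.0, 45.0, 4.7), 1))                 # 24-bit DDS row
```

With a 24-bit accumulator, the frequency resolution at a 5.0 GHz clock is 5.0 GHz / 2^24, or roughly 298 Hz per FCW step.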
Table 5.1: Ultrahigh Speed DDS RFIC Performance Comparison

Ref.         Technology  fT/fMAX  Phase  Amplitude  Max clock  FM     PM     SFDR   Power  Area        FOM
                         [GHz]    [bit]  [bit]      [GHz]      [bit]  [bit]  [dBc]  [W]    [mm²]       [GHz·2^(SFDR/6)/W]
[18]         InP         137/267  8      7          9.2        None   None   <30    15     8 × 5       <16.0
[19]         InP         300/300  8      7          13         None   None   26.67  5.42   2.7 × 1.45  42.6
[20]         InP         300/300  8      5          32         None   None   21.56  9.45   2.7 × 1.45  34.8
[27]         InP         370/370  12     7.5        24         None   None   30.7   19.8   5.0 × 3.3   42.1
[21]         SiGe        100/120  9      8          12.3       None   None   20     1.9    3.0 × 3.0   65.3
[8, 9]       SiGe        200/250  11     10         8.6        None   None   33     4.8    4.0 × 3.5   81.1
[28]         SiGe:C      fT = 70  9      8          6.0        9      None   17     0.308  1           138.8
[9B DDS]     SiGe        200/250  9      7          2.9        9      9      35     2.0    2.5 × 3.0   82.7
[24B DDS]    SiGe        200/250  24     10         5.0        24     12     45     4.7    3.7 × 3.0   192.6

Chapter 6

An 8.7-13.8 GHz Transformer-coupled Varactor-less QCCO RFIC

6.1 Introduction

Quadrature signals are widely used in wireless transceivers as the local oscillator (LO) for up- and down-conversion with image-reject mixing. There are several ways to generate quadrature signals. A frequency divider can be used to divide a higher-frequency voltage-controlled oscillator (VCO) output down to quadrature phase outputs. Divide-by-four is usually used because the divide-by-two method requires a 50% duty cycle at the VCO output. However, the divide-by-four method requires a VCO running at four times the LO frequency, which results in higher power consumption and poor phase noise. A VCO followed by a passive poly-phase complex filter can also be used to generate the quadrature outputs. However, the output has poor phase accuracy for wide band inputs. In addition, the large loss of the poly-phase network requires power-hungry buffers to boost the LO magnitude. At higher frequencies, a poly-phase filter is very difficult to implement because the reduced component values are more sensitive to process variations and parasitic influences.
Cross-coupling two single-phase LC-VCOs is widely used to generate quadrature outputs at high frequency. This technique provides wide-band quadrature accuracy and superior phase noise performance at the cost of increased power consumption.

There are various ways to couple the two VCOs and lock their oscillation frequency. The most common quadrature VCO (QVCO) topology, shown in Fig. 6.1, utilizes the parallel coupling proposed by Rofougaran et al. [32]. The parallel QVCO (P-QVCO) delivers quadrature signals with low phase and amplitude errors, yet has a narrow tuning range set by the tuning limit of the varactor. Series QVCOs (S-QVCO) have been proposed in CMOS or BiCMOS technology by connecting the coupling transistors in series [33, 34, 35]. This reduces the noise by using the cascode devices and provides better isolation between the VCO output and its current sources. However, the S-QVCO also suffers from a narrow frequency tuning range because of the varactor's small tuning capability. A magnetically tuned quadrature oscillator has been reported by Cusmai et al., whose output frequency can be tuned from 3.2 GHz to 7.3 GHz [36]. Modern communication and radar systems require quadrature signal generation at X- and Ku-bands with a wide tuning range for the frequency source used in phase-locked loops (PLL) or direct digital frequency synthesizers (DDS) [8,9,21].

Figure 6.1: Quadrature VCO circuits with parallel coupling.

An 8.7-13.8 GHz transformer-coupled varactor-less quadrature current-controlled oscillator (QCCO) is presented in this chapter [37,38]. It employs the same mechanism as that presented in [36] but has a higher output frequency. This chapter will present the principle and oscillator implementation as well as the phase accuracy and phase noise analysis. The implementation and modeling of the adopted stacked octagonal transformers will be discussed.
Finally, the experimental results are given and conclusions are drawn.

Figure 6.2: Schematic of transformer-coupled varactor-less QCCO.

6.2 Analysis and Design of Transformer Coupled Quadrature Oscillator

6.2.1 Oscillation Analysis and Design

The varactor-less QCCO presented here is a transformer-coupled current-controlled LC oscillator that utilizes SiGe hetero-junction bipolar transistors (HBT) for oscillation and current tuning. The NPN HBTs achieve very high oscillation frequency and low phase noise. The proposed QCCO circuit is illustrated in Fig. 6.2, in which two pairs of cross-coupled NPN HBTs, T1, T2 and T3, T4, are used to generate the negative resistance for the in-phase CCO (I-CCO) and quadrature phase CCO (Q-CCO) outputs, respectively. Another two pairs of NPN HBTs, T5, T6 and T7, T8, are used to provide the tuning currents for the transformers. All the HBTs operate near the peak $f_T$ bias current in order to maximize the switching speed.

Fig. 6.3 shows an AC equivalent circuit of either the I-CCO or the Q-CCO. The discussion below takes the I-CCO as an example, since the Q-CCO has the same structure. The primary winding of the transformer has the same function as the LC tank in a conventional LC coupled VCO. The negative resistance $-1/g_m$ is generated by the cross-coupled transistor pair T1 and T2. $r_p$ and $c_p$ are the total parasitic resistance and capacitance between the two terminals of the primary transformer winding in the oscillator circuit. The capacitance $c_p$ includes all the transformer capacitances as well as the transistor parasitic capacitance. The secondary winding parasitics are not shown since they have little effect on the CCO output. To achieve high frequency, no extra capacitor or varactor is used.

Figure 6.3: AC equivalent circuit of the transformer tank.

From an intuitive analysis based on Fig. 6.3, the output voltage $V_o$ of the I-CCO equivalent circuit can be expressed as

$$ V_o = j\omega L_p I_{core} + j\omega M I_{tune} = j\omega I_{core}\left( L_p + \alpha M \right), \tag{6.1} $$

where $\alpha = I_{tune}/I_{core}$ and $M$ is the mutual inductance between the primary and secondary windings. The mutual inductance $M$ can be calculated using

$$ M = k\sqrt{L_p L_s}, \tag{6.2} $$

where $k$ is the coupling factor of the transformer. Thus, the effective inductance of the oscillation tank is given by

$$ L_{eff} = L_p + \alpha M. \tag{6.3} $$

For either the I-CCO or the Q-CCO, the oscillation frequency is

$$ f_{osc} = \frac{1}{2\pi\sqrt{\left( L_p + \alpha M \right) c_p}}. \tag{6.4} $$

Changing the tuning current $I_{tune}$ changes $\alpha$, and therefore the oscillation frequency of the QCCO output. Because $\alpha$ can be set arbitrarily by the QCCO core current and tuning current, and can be negative or positive, the ideal oscillation frequency can be tuned from a small value to infinity as $\alpha$ is tuned from positive infinity to $-L_p/M$. The QCCO output frequency can therefore be tuned very widely with carefully selected devices and current ratios.

To determine the output voltage and oscillation frequency more accurately, circuit analysis of the AC equivalent circuit shown in Fig. 6.3 gives the output voltage of the oscillator as

Vo = Icore 1 j!cp==(rp +j!Lp) j!M: (6.5)

Separating Eq. (6.5) into real and imaginary parts leads to the following expressions for the oscillation amplitude $V_o$ and frequency $\omega_{osc}$:

$$ V_o = I_{core}\,\frac{L_p}{r_p c_p} = I_{core}\,\omega_0 Q L_p, \tag{6.6} $$

!2osc = !20 2 4(1 12m 12Q 2) 1 2m s m!20 !2c + (3 2m) 2 4 [(m 1)2 + 1] 3 5 !20 (1 12m 12Q2) 12mp1 4m : (6.7)

where $Q = \omega_0 L_p / r_p$, $\omega_0 = 1/\sqrt{L_p c_p}$ and $\omega_c = 1/(r_p c_p)$ are the quality factor, self-resonance frequency and corner frequency of the transformer primary winding. $m = \frac{M}{L_p}\,\frac{I_{tune}}{I_{core}}$ can be considered as the coupling strength of the transformer [36,39,40].
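The current-tuning mechanism of Eqs. (6.2) to (6.4) can be explored numerically. The following is a minimal sketch of the idealized model only (no losses, no parasitic effects beyond the tank capacitance). All component values are placeholders chosen for illustration and are not the design values of the fabricated oscillator.

```python
import math


def mutual_inductance(k, l_p, l_s):
    """Eq. (6.2): M = k * sqrt(Lp * Ls)."""
    return k * math.sqrt(l_p * l_s)


def qcco_frequency_hz(alpha, l_p, m, c_p):
    """Eq. (6.4): ideal oscillation frequency for current ratio alpha."""
    l_eff = l_p + alpha * m                      # Eq. (6.3)
    if l_eff <= 0.0:
        raise ValueError("alpha must be greater than -Lp/M")
    return 1.0 / (2.0 * math.pi * math.sqrt(l_eff * c_p))


# Placeholder values (assumptions for this example only).
L_P = 1.0e-9     # primary self-inductance, H
L_S = 1.0e-9     # secondary self-inductance, H
K = 0.8          # coupling factor
C_P = 0.6e-12    # total tank capacitance, F

M = mutual_inductance(K, L_P, L_S)
for alpha in (0.5, 0.0, -0.5, -1.0):
    f_ghz = qcco_frequency_hz(alpha, L_P, M, C_P) / 1e9
    print(f"alpha = {alpha:+.1f}  ->  f_osc = {f_ghz:.2f} GHz")
```

A negative current ratio reduces the effective inductance and raises the frequency, and the model diverges as alpha approaches -Lp/M, which is the "tuned to infinity" limit mentioned in the text.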
The oscillation amplitude is independent of the tuning current and is determined only by the core current and the transformer parameters. The approximation of $\omega_{osc}$ is acceptable when $\omega_c \gg \omega_0$, which is true for the transformer windings. The oscillation frequency is determined by the quality factor as well as by $m$, which is a function of the self-inductance and mutual inductance of the transformer and of the ratio between the tuning current and the core current of the oscillator. To increase the tuning capability with a small tuning current, the transformer needs to be carefully designed to maximize its mutual inductance. A stacked octagonal transformer, shown in Fig. 6.4, which in theory has the maximum mutual inductance, is designed to reduce the magnetic flux leakage [41]. The transformer design will be discussed in the following section.

Figure 6.4: Stacked octagonal transformer. [Panels (A) and (B): windings on the top metal and second top metal with terminals Pp, Pm, Sp, Sm, and the transformer symbol with Lp, Ls and M.]

6.2.2 Quadrature Coupling Phase Accuracy and Phase Noise

Fig. 6.5 shows the full AC equivalent circuit of the varactor-less QCCO. If T5, T6 and T7, T8 are treated as $-G_m$ amplifiers, Fig. 6.5 can be further simplified to Fig. 6.6. Fig. 6.6(a) shows the transformer in-phase case of the QCCO, while Fig. 6.6(b) shows the transformer anti-phase case. Suppose the phase delay of both $-G_m$ amplifiers is $\varphi$. From the Barkhausen criteria, the phase delay for the in-phase and anti-phase oscillators can be determined by

$$ \varphi + \varphi + \pi = 2n\pi, \quad n = 1, 2, \cdots \tag{6.8} $$

and

$$ \varphi + \pi + \varphi + \pi + \pi = 2n\pi, \quad n = 1, 2, \cdots \tag{6.9} $$

Thus, the phase delay of the $-G_m$ amplifier is given by

$$ \varphi = \pm\frac{\pi}{2}. \tag{6.10} $$

Therefore, regardless of the in-phase or anti-phase case, the phase delay between I and Q is $\pi/2$ or $-\pi/2$. In other words, quadrature outputs can be generated.

In practice, there are mismatches between the devices used in the oscillator circuit, which result in slightly different phase delays between the two $-G_m$ amplifiers as well as the two transformers.
The device and transformer mismatches are the major contributors to the phase error between the quadrature outputs. Another phase error source is the coupling between the two transformers used in the I oscillator and the Q oscillator. From [40], the total phase error $\phi_e$ can be determined by

$$ \phi_e = \frac{1}{2} Q m^2 \varepsilon + k_0 Q m, \tag{6.11} $$

where $\varepsilon$ represents the mismatches between the devices and transformers, $k_0$ is the coupling factor between the primary windings of the two transformers, and $m$ is the coupling strength of the transformer, defined as in Eq. (6.7). The phase error therefore increases with a better quality factor and a larger coupling strength of the transformer.

Figure 6.5: AC equivalent circuit of the varactor-less QCCO.

Figure 6.6: Equivalent circuits of the varactor-less QCCO: (a) transformer in phase, (b) transformer anti-phase.

The phase noise of oscillators has been intensively investigated previously [41,42,43]. The analysis of the conventional quadrature oscillator has been proposed in [33,39,44]. The phase noise of the transformer coupled oscillator is similar to that of the conventional quadrature oscillators defined in [39], namely,

$$ L(\Delta\omega) = \frac{kT}{C}\,\frac{\omega_{osc}}{Q}\,\frac{1+m^2}{2}\,\frac{1}{\Delta\omega^2}\,\frac{1+(1+m)F}{A_0^2}, \tag{6.12} $$

where $A_0$ is the oscillation amplitude across one of the tanks and $F$ is the noise factor of the conventional single-phase oscillator. From Eq. (6.12), the phase noise of the quadrature oscillator increases rapidly with increasing $m$ and is reduced by a better quality factor of the coupling inductors or transformers. The same conclusion can be obtained from the phase noise analysis in [36]. A trade-off between phase accuracy and phase noise with respect to $m$ and $Q$ therefore needs to be considered in quadrature oscillator designs.
6.3 Transformer Implementation

6.3.1 Geometry Design of Transformers

The transformer coupled QCCO has been designed in a 0.18 µm SiGe BiCMOS technology. The transformer design has been optimized in simulation, by means of the full-wave electromagnetic solver Agilent Momentum, in order to maximize the magnetic coupling $k$ (or the $M/L$ ratio) and the primary winding quality factor $Q$. For different transformer structures, the self-inductance $L$, the mutual inductance $M$ or the coupling coefficient $k$, the turn ratio $n$, the quality factor $Q$ and the self-resonance frequency $\omega_0$ may vary significantly. Depending on the transformer structure and the magnetic coupling method (lateral or vertical), different approaches to transformer layout have been proposed [45,46].

Usually, transformers are formed by magnetically coupling two or more inductors. There are four commonly used inductor shapes: square, hexagonal, octagonal and circular. Based on these inductors, inter-wound or stacked transformers can be built with different geometric shapes. In terms of transformer performance (usually inductance and quality factor), the circular shape is the best choice, followed by octagonal and hexagonal, and the square is the worst. However, a circular transformer layout is not compatible with most design rules, so octagonal is the most commonly used shape for inductors and transformers. For differential circuits, such as the one used in the proposed quadrature oscillator, a symmetrical shape is required.

Based on the above discussion, Fig. 6.7 illustrates three transformers: concentric, inter-wound and stacked. The concentric transformer has a worse coupling factor than the inter-wound and stacked structures. A stacked transformer is used in this design because it has a smaller layout area than the inter-wound one as well as better magnetic coupling than the concentric one. Fig.
6.4 shows the layout of the adopted stacked transformer, with terminal names labeled according to the transformer symbol. In this SiGe technology, the top metal layer is much thicker than any of the other metals and lies farthest from the substrate. It is thus used to fabricate the primary winding in order to achieve a better quality factor. The second top metal layer is used to fabricate the secondary winding. While the top metal is thicker than the second top metal, both top and second top metals in the chosen 0.18 µm SiGe technology are much thicker than the other metal layers, and both are optimized for low sheet resistance for analog routing. Both the primary and secondary windings are 10 µm wide and have two turns with a diameter of 200 µm. The two windings overlap exactly, and the spacing between the two turns is 5 µm, the minimum allowed by the design rules, to maximize the fill ratio (defined later).

At low frequencies, the Q of the inductor is limited primarily by the resistance of the metal layer. At high frequency, Q degradation is dominated by the loss mechanisms caused by the substrate [32]. To minimize the dependence of Q on the substrate resistivity, the transformer is placed on top of a patterned ground shield (PGS), which minimizes the current injected into the substrate. The PGS is a patterned conductive layer and is formed in this design by a lattice of a highly resistive deep trench (DT) isolation layer. Fig. 6.8 shows the diagram of the three-dimensional substrate and the two-dimensional DT lattice used in this design. The PGS substrate is used to reduce the parasitic capacitance of the transformer to the substrate as well as to increase the parasitic resistance to the substrate.

Figure 6.7: Octagonal symmetrical transformer: (a) concentric, (b) inter-wound, and (c) stacked.

Figure 6.8: Diagram of the (a) three-dimensional PGS substrate and (b) two-dimensional deep trench lattice.
6.3.2 Transformer Equivalent Circuit and Parameters

Usually the frequency domain model of the transformer is more important, since most oscillator performance parameters are analyzed in the frequency domain. However, the time domain equivalent circuit is more intuitive. Fig. 6.9 shows the 2-π equivalent circuit of the stacked transformer. $L_p$ and $L_s$ are the self-inductances of the primary and secondary windings. $R_p$ and $R_s$ are the series resistances of the primary and secondary windings. $C_{pp}$ and $C_{ss}$ are the inter-winding capacitances between the two turns of the primary or secondary winding. $C_{ps}$ is the capacitance between the stacked primary and secondary windings. $C_{bp}$ and $C_{bs}$ are the parasitic capacitances of the primary and secondary windings coupled to the PGS. $R_b$ is the parasitic resistance to the PGS substrate.

Figure 6.9: Transformer time domain equivalent circuit model.

The self-inductance of the octagonal inductor can be estimated using the equation developed by Mohan [46,47], namely,

$$ L = \frac{2.25\,\mu_0\, n^2\, d_{avg}}{1 + 3.55\,\rho}, \tag{6.13} $$

where $d_{avg} = 0.5\,(d_{out} + d_{in})$ is the average of the outer diameter $d_{out}$ and the inner diameter $d_{in}$ of the octagonal inductor, and $\rho$ is the fill ratio defined as $\rho = (d_{out} - d_{in})/(d_{out} + d_{in})$. The mutual inductance $M$ is defined in Eq. (6.2). The coupling factor $k$ of the stacked transformer is over 0.8.

More accurate values of the self-inductance and of the mutual inductance or coupling factor can be obtained by electromagnetic simulation and vector network analyzer (VNA) measurement. Fig. 6.10 shows the simulated parameters of the octagonal stacked transformer. Fig. 6.10(a) plots the primary and secondary self-inductance. They are almost identical, since the metal material and thickness do not affect the inductance greatly [45]. Fig. 6.10(b) shows the coupling factor of the transformer. It is around 0.8 and approaches 1 at high frequency. Fig. 6.10(c) plots the quality factor of the primary and secondary windings. The peak Q of the primary winding is about twice that of the secondary winding, because the primary winding metal is thicker and has a lower sheet resistance than the secondary one.

In this oscillator design, the parallel capacitance between the terminals of the transformer primary winding is used as the oscillation tank capacitance. The total parallel capacitance is given by

$$ C_p = C_{pp} + \frac{C_{bp}}{8} + \frac{C_{ps}}{4} \parallel \left( C_{ss} + \frac{C_{bs}}{8} \right). \tag{6.14} $$

Given the geometry and electrical parameters of the transformer, the capacitances $C_{pp}$, $C_{bp}$, $C_{ps}$, $C_{ss}$ and $C_{bs}$ represent the total capacitance at the respective nodes and can be calculated using a simple parallel-plate capacitor model. The PGS is far away from the windings, so $C_{bp}$ is much smaller than the other capacitances. The accurate frequency dependent capacitance in parallel with the primary winding can also be obtained from electromagnetic simulation tools. Fig. 6.11 plots the total capacitance, with a simulated value of 0.6 pF at 10 GHz.

In practice, it is not easy to find the exact inductance and capacitance associated with the transformer. With the help of S-parameters, all the simulations can be performed through hybrid simulation tools. In this design, the Agilent Dynamic Link tool is used to call the SPICE simulator and the Momentum electromagnetic simulator for all time domain and frequency domain simulations. The oscillator was therefore designed by directly specifying the geometric parameters of the transformers instead of supplying the L and C parameters as in traditional design flows. This more flexible electromagnetic and circuit co-simulation approach speeds up the design dramatically, since the modeling step is removed.
6.4 Experimental Results

The transformer-coupled varactor-less QCCO was implemented and fabricated in a 0.18 µm SiGe BiCMOS process, with the chip die photo shown in Fig. 6.12. The QCCO core area is 0.5 × 0.4 mm². As shown in the die photo, the I-CCO and Q-CCO are symmetrically placed. The layout is also optimized to reduce the effect of layout parasitics on the QCCO performance, including the harmonic distortion and phase noise. The QCCO is tested in a CLCC-28 package. A buffer is included on-chip in order to drive the 50 Ω load presented by the input of a spectrum analyzer or a digital oscilloscope. Due to the limitations of the test setup, all the test results were measured on the single-ended output, although the QCCO has a fully differential output capability. Single-ended testing results in degraded phase noise and I-Q mismatch.

Figure 6.10: Simulated parameters of the transformer windings: (a) self-inductance L, (b) coupling factor k, and (c) quality factor Q. [Horizontal axes: frequency (GHz); curves for primary and secondary windings in (a) and (c).]

Figure 6.11: Simulated capacitance parallel with the transformer primary winding. [Axes: frequency (GHz) vs. capacitance (pF).]

A wide tuning range of 45.3% is achieved with the tuning current $I_{tune}$ tuned from 0.4 to 2.9 mA and the QCCO core current $I_{core}$ tuned from 1.2 to 5.5 mA. The measured QCCO tuning range is given in Fig. 6.13. It shows a continuous tuning range from 8.7 to 13.8 GHz. Fig. 6.14 shows the measured quadrature outputs with 11.5 GHz frequency. The measured phase noise with an 11.02 GHz output frequency is shown in Fig. 6.15.
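The 45.3% figure is consistent with the common definition of tuning range as the frequency span divided by the center frequency. That definition is an assumption here, since the text does not state which one is used. A one-function check:

```python
def tuning_range_percent(f_min, f_max):
    """Tuning range relative to the center frequency (assumed definition)."""
    return 100.0 * 2.0 * (f_max - f_min) / (f_max + f_min)


# Measured band edges from the text, in GHz.
print(f"{tuning_range_percent(8.7, 13.8):.1f} %")
```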
With the single-ended test, the transformer-coupled varactor-less QCCO achieves -86.8 dBc/Hz phase noise at 1 MHz offset frequency and -110 dBc/Hz at 10 MHz offset frequency. A widely accepted figure-of-merit (FOM) for VCO designs is proposed in [48]. The FOM takes into account the output frequency, phase noise performance and power consumption, and is expressed by

$$ \mathrm{FOM} = L(\Delta f) - 20\log\left( \frac{f_0}{\Delta f} \right) + 10\log\left( \frac{P_{diss}}{1\ \mathrm{mW}} \right), \tag{6.15} $$

where $L(\Delta f)$ is the phase noise at the offset frequency $\Delta f$ from the carrier frequency $f_0$, and $P_{diss}$ is the total core power dissipation of the I-CCO and Q-CCO. The FOM of this transformer-coupled varactor-less QCCO is calculated as -154 dBc/Hz.

Figure 6.12: Fabricated QCCO RFIC die photo.

Figure 6.13: Measured QCCO tuning range. [Axes: tuning current (mA) vs. output frequency (GHz); curves for Ic = 1.2, 2.0, 2.9, 3.8, 4.6 and 5.5 mA.]

Figure 6.14: Measured QCCO outputs at 10.5 GHz with tuning current of 1.5 mA and core current of 2 mA.

Figure 6.15: Measured QCCO phase noise with output frequency of 11.02 GHz.

Table 6.1 summarizes the measured results of the transformer-coupled varactor-less QCCO. It achieves a wide tuning range of 45.3%, and the core circuit occupies 0.4 × 0.5 mm² of chip area in a 0.18 µm SiGe BiCMOS process. It draws 8-18 mA over the tuning range from a 1.8 V power supply. Table 6.2 compares the frequency, tuning range, power consumption and phase noise of several variable-frequency oscillators. Compared to [32,33,35], the proposed oscillator has a much higher output frequency and tuning range. Compared to [36], it has a higher frequency, yet a worse phase noise.

Table 6.1: QCCO Performance Summary

Technology             0.18 µm SiGe BiCMOS
Supply voltage         1.8 V
Oscillation frequency  8.7-13.8 GHz
Tuning range           45.3%
Core current           8-18 mA
Buffer current         8 mA
Phase noise @ 1 MHz    -86.8 dBc/Hz
Phase noise @ 10 MHz   -110 dBc/Hz
QCCO area              500 µm × 400 µm
FOM                    -154 dBc/Hz

Table 6.2: Performance Comparison of Variable-frequency Oscillators

Ref.            Technology       Frequency  Tuning range  Power      Phase noise
                                 [GHz]                    [mW]       [dBc/Hz]
[32]            1 µm CMOS        0.9        17%           30         -110 @ 1 MHz
[33]            0.35 µm CMOS     1.8        18%           50         -140 @ 3 MHz
[35]            0.5 µm BiCMOS    4.3-5      14.6%         19.8       -115 @ 2 MHz
[36]            65 nm CMOS       3.2-7.3    67%           7.2-24     -120 to -135 @ 1 MHz
[37]/This work  0.18 µm BiCMOS   8.7-13.8   45.3%         14.4-32.4  -86.8 @ 1 MHz, -110 @ 10 MHz

6.5 Conclusion

A transformer-coupled varactor-less wide tuning QCCO is presented in this chapter. It achieves a wide tuning range of 45.3% by tuning the oscillator currents flowing through the primary and secondary windings of the stacked octagonal transformers. The prototype QCCO is fabricated in a 0.18 µm SiGe BiCMOS technology, and the core circuit occupies 0.4 × 0.5 mm² of chip area. It draws 8-18 mA from a 1.8 V power supply. The measured phase noise of the single-ended output is about -86.8 dBc/Hz at 1 MHz offset and -110 dBc/Hz at 10 MHz offset with an 11.02 GHz quadrature output. The phase noise FOM of this transformer-coupled QCCO is -154 dBc/Hz.

Chapter 7

Summary and Future Work

7.1 Summary of Original Work

This dissertation presents a detailed design procedure for ultrahigh speed DDS. The main target is to achieve microwave range speed and high resolution while keeping moderate power consumption. A transformer-coupled variable-frequency oscillator is designed to provide the clock of the DDS system.

Three DDSs with different architectures have been implemented in 0.13 µm SiGe BiCMOS technology. The sine-weighted DAC adds spurs to the final DDS output spectra. The major problem is that the non-ideal nature of a sine-weighted DAC is more noticeable than that of a linear DAC. The detailed analysis shows that the DAC-related spurs come from two major sources. One is the static performance of a DAC, such as DNL or INL.
The other is the dynamic performance of the DAC, which depends on both the input code and the clock frequency. To complicate the situation further, noise from the digital blocks is also injected into the substrate and the power supply. To suppress the crosstalk and the power and ground bounce, more unknown variables need to be taken into account.

7.2 Possible Future Directions

The main difficulty lies in finding a good model that accurately reflects the real behavior of a working DDS. Although some theoretical work has appeared in this field, it still leaves something to be desired. In particular, hardly anyone has so far been able to predict the dynamic performance of the DAC and the DDS with good accuracy. The following work is intended to study the modeling of the DDS further, provide some new thoughts on this problem and, hopefully, find alternative solutions to the questions that arise when designing a DDS.

Bibliography

[1] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 9-bit 2.9 GHz direct digital synthesizer MMIC with direct digital frequency and phase modulations," in IEEE MTT-S International Microwave Symposium Digest, June 2009, pp. 1125-1128.
[2] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 24-bit 5.0 GHz direct digital synthesizer MMIC with direct digital modulations and spur randomization," in Radio Frequency Integrated Circuits (RFIC) Symposium, 2009 IEEE, June 2009, pp. 419-422.
[3] J. M. P. Langlois and D. Al-Khalili, "Phase to sinusoid amplitude conversion techniques for direct digital frequency synthesis," in IEE Proc. Circuits Devices Syst., Dec. 2004, pp. 519-528.
[4] D. A. Sunderland, R. A. Strauch, S. S. Wharfield, H. T. Peterson, and C. R. Cole, "CMOS/SOS frequency synthesizer LSI circuit for spread spectrum communications," IEEE J. of Solid-State Circuits, vol. SC-19, no. 4, pp. 497-506, Aug. 1984.
[5] H. T. Nicholas, III, H. Samueli, and B. Kim, "The optimization of direct digital frequency synthesizer performance in the presence of finite word length effects," in Proc. of the 42nd Annual Frequency Control Symposium, 1988, pp. 257-263.
[6] C. C. Wang, Y. L. Tseng, H. C. She, C. C. Li, and R. Hu, "A 13-bit resolution ROM-less direct digital frequency synthesizer based on a trigonometric quadruple angle formula," IEEE Trans. on Very Large Scale Integration Systems, vol. 12, no. 9, pp. 895-900, Sept. 2004.
[7] D. De Caro, N. Petra, and A. G. M. Strollo, "Reducing lookup-table size in direct digital frequency synthesizers using optimized multipartite table method," IEEE Trans. on Circuits and Systems II, vol. 55, no. 7, pp. 2116-2127, Aug. 2008.
[8] Xueyang Geng, X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "An 11-bit 8.6 GHz direct digital synthesizer MMIC with 10-bit segmented nonlinear DAC," in 34th European Solid-State Circuits Conference (ESSCIRC), Sept. 2008, pp. 362-365.
[9] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "An 11-bit 8.6 GHz direct digital synthesizer MMIC with 10-bit segmented sine-weighted DAC," IEEE J. of Solid-State Circuits, vol. 45, no. 2, pp. 300-313, Feb. 2010.
[10] P. Lacomme, J.-P. Hardange, J.-C. Marchais, and E. Normant, Air and Spaceborne Radar Systems: An Introduction, ISBN 1-891121-13-8, William Andrew Publishing, LLC, 2001.
[11] J. R. Klauder, A. C. Price, S. Darlington, and W. J. Albersheim, "The theory and design of chirp radars," The Bell System Technical Journal, vol. 39, no. 4, pp. 745-808, 1960.
[12] M. Skolnik, Radar Handbook, 3rd ed., ISBN-10: 0071485473, ISBN-13: 978-0071485470, McGraw-Hill Professional, 2008.
[13] F. F. Dai, W. Ni, Y. Shi, and R. C. Jaeger, "A direct digital frequency synthesizer with fourth-order phase domain noise shaper and 12-bit current steering DAC," IEEE J. of Solid-State Circuits, vol. 41, no. 4, pp. 839-850, April 2006.
[14] A. Van den Bosch, M. Steyaert, and W. Sansen, Static and Dynamic Performance Limitations for High Speed D/A Converters, ISBN 1402077610, Kluwer Academic Publishers, chapter 5.
[15] J. J. Wikner and N. Tan, "Modeling of CMOS digital-to-analog converters for telecommunication," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no. 5, pp. 489-499, May 1999.
[16] W. Ni, Xueyang Geng, Y. Shi, and F. Dai, "A 12-bit 300 MHz CMOS DAC for high-speed system applications," in IEEE International Symposium on Circuits and Systems (ISCAS), Kos, Greece, May 2006, pp. 1402-1405.
[17] J. Jiang and E. K. F. Lee, "A low-power segmented nonlinear DAC-based direct digital frequency synthesizer," IEEE J. of Solid-State Circuits, vol. 37, no. 10, pp. 1326-1330, Oct. 2002.
[18] A. Gutierrez-Aitken, J. Matsui, E. Kaneshiro, B. Oyama, A. Oki, and D. Streit, "Ultra high speed direct digital synthesizer using InP DHBT technology," IEEE J. of Solid-State Circuits, vol. 37, no. 9, pp. 1115-1121, Sept. 2002.
[19] S. E. Turner and D. E. Kotecki, "Direct digital synthesizer with ROM-less architecture at 13-GHz clock frequency in InP DHBT technology," IEEE Microwave and Wireless Components Letters, vol. 16, no. 5, pp. 296-298, May 2006.
[20] S. E. Turner and D. E. Kotecki, "Direct digital synthesizer with sine-weighted DAC at 32-GHz clock frequency in InP DHBT technology," IEEE J. of Solid-State Circuits, vol. 41, no. 10, pp. 2284-2290, Oct. 2006.
[21] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 12 GHz 1.9 W direct digital synthesizer RFIC implemented in 0.18 μm SiGe BiCMOS technology," IEEE J. of Solid-State Circuits, vol. 43, no. 6, pp. 1384-1393, June 2008.
[22] X. Yu, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 9-bit quadrature direct digital synthesizer implemented in 0.18-μm SiGe BiCMOS technology," IEEE Trans. on Microwave Theory and Techniques, vol. 56, no. 5, pp. 1257-1266, May 2008.
[23] A. Van den Bosch, M. A. F. Borremans, M. S. J. Steyaert, and W. Sansen, "A 10-bit 1-Gsample/s Nyquist current-steering CMOS D/A converter," IEEE J. of Solid-State Circuits, vol. 36, no. 3, pp. 315-324, March 2001.
[24] K. R. Elliott, "Direct digital synthesis for enabling next generation RF systems," in IEEE Compound Semiconductor Integrated Circuit Symposium (CSIC), Nov. 2005, pp. 125-128.
[25] P. G. A. Jespers, Integrated Converters: D to A and A to D Architectures, Analysis and Simulation, ISBN 0198564465, Oxford University Press.
[26] R. H. Walden, "Analog-to-digital converter survey and analysis," IEEE J. on Selected Areas in Communications, vol. 17, no. 4, pp. 539-550, April 1999.
[27] S. E. Turner, R. T. Chan, and J. T. Feng, "ROM-based direct digital synthesizer at 24 GHz clock frequency in InP DHBT technology," IEEE Microwave and Wireless Components Letters, vol. 18, no. 8, pp. 566-568, Aug. 2008.
[28] S. Thuries, E. Tournier, A. Cathelin, S. Godet, and J. Graffeuil, "A 6-GHz low-power BiCMOS SiGe:C 0.25 μm direct digital synthesizer," IEEE Microwave and Wireless Components Letters, vol. 16, no. 1, pp. 46-48, Jan. 2006.
[29] Xueyang Geng, F. F. Dai, J. D. Irwin, and R. C. Jaeger, "A 24-bit 5.0 GHz direct digital synthesizer RFIC with direct digital modulations in 0.13 μm SiGe BiCMOS Technology," IEEE J. of Solid-State Circuits, vol. 45, no. 5, pp. 944-954, May 2010.
[30] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., ISBN 0-13-597444-5, Pearson, Inc, 2003.
[31] AD9912 datasheet, Analog Devices.
[32] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, "A 900 MHz CMOS LC-oscillator with quadrature outputs," in IEEE Int. Solid-State Circuits Conf. (ISSCC), 1996, pp. 392-393.
[33] P. Andreani, A. Bonfanti, L. Romano, and C. Samori, "Analysis and design of a 1.8-GHz CMOS LC quadrature VCO," IEEE J. of Solid-State Circuits, vol. 37, no. 12, pp. 1737-1747, Dec. 2002.
[34] P. Andreani and X. Wang, "On the phase-noise and phase error performance of multi-phase LC CMOS VCOs," IEEE J. of Solid-State Circuits, vol. 39, no. 11, pp. 1883-1893, Nov. 2004.
[35] V. Kakani, F. Dai, and R. C. Jaeger, "A 5GHz BiCMOS quadrature LC VCO with wide tuning range," in IEEE Bipolar/BiCMOS Circuits and Technology Meeting (BCTM), 2006, pp. 138-141.
[36] G. Cusmai, M. Repossi, G. Albasini, A. Mazzanti, and F. Svelto, "A magnetically tuned quadrature oscillator," IEEE J. of Solid-State Circuits, vol. 42, no. 12, pp. 2870-2877, Dec. 2007.
[37] Xueyang Geng and F. F. Dai, "An 8.7-13.8 GHz transformer-coupled varactor-less quadrature current-controlled oscillator," in the 2009 Bipolar/BiCMOS Circuits and Technology Meeting (BCTM), IEEE, 2009, pp. 63-66.
[38] Xueyang Geng and F. F. Dai, "An X-band transformer-coupled varactor-less quadrature current-controlled oscillator in 0.18 μm SiGe BiCMOS technology," invited to submit to IEEE J. of Solid-State Circuits, BCTM Special Issue, vol. 45, no. 9, Sept. 2010.
[39] L. Romano, S. Levantino, A. Bonfanti, C. Samori, and A. L. Lacaita, "Phase noise and accuracy in quadrature oscillators," in IEEE International Symposium on Circuits and Systems (ISCAS), 2006, pp. 161-164.
[40] A. Mazzanti, F. Svelto, and P. Andreani, "On the amplitude and phase errors of the quadrature LC-tank CMOS oscillators," IEEE J. of Solid-State Circuits, vol. 41, no. 6, pp. 1305-1313, June 2006.
[41] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," in Proc. IEEE, 1966, pp. 329-330.
[42] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," IEEE J. of Solid-State Circuits, vol. 33, no. 2, pp. 179-194, Feb. 1998.
[43] T. H. Lee and A. Hajimiri, "Oscillator phase noise: A tutorial," IEEE J. of Solid-State Circuits, vol. 35, no. 3, pp. 326-336, Mar. 2000.
[44] A. Andreani, "A time-variant analysis of the 1/f^2 phase noise in CMOS parallel LC-tank quadrature oscillators," IEEE Trans. Circuits Syst. I, vol. 53, no. 8, pp. 1749-1770, Aug. 2006.
[45] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, 2nd ed., Cambridge University Press.
[46] S. S. Mohan, M. D. Hershenson, S. P. Boyd, and T. H. Lee, "Simple accurate expressions for planar spiral inductances," IEEE J. of Solid-State Circuits, vol. 34, pp. 1419-1424, 1999.
[47] J. Rogers and C. Plett, Radio Frequency Integrated Circuit Design, Artech House.
[48] P. Kinget, Integrated GHz Voltage Controlled Oscillators, New York: Kluwer Academic Publisher, 1999.
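
As a numerical cross-check of the QCCO figures reported in Section 6.4 and Table 6.1, the following Python sketch evaluates the figure-of-merit of Eq. (6.15), i.e. the phase noise minus 20 log(f0/Δf) plus 10 log(Pdiss/1 mW), together with the tuning range and the core power. It is only an illustration under stated assumptions. The function names are hypothetical. The tuning range is assumed to be defined relative to the center frequency. The core current at the 11.02 GHz measurement point is not stated in the text, so the FOM is evaluated at both ends of the reported 8-18 mA core current range rather than at a single operating point.

```python
import math


def tuning_range_percent(f_min_ghz, f_max_ghz):
    """Tuning range relative to the center frequency, in percent (assumed definition)."""
    center = (f_min_ghz + f_max_ghz) / 2.0
    return 100.0 * (f_max_ghz - f_min_ghz) / center


def core_power_mw(current_ma, supply_v):
    """DC power in mW drawn from the supply: mA times V gives mW."""
    return current_ma * supply_v


def vco_fom(phase_noise_dbc_hz, f0_hz, offset_hz, p_diss_mw):
    """Eq. (6.15): FOM = L(df) - 20*log10(f0/df) + 10*log10(Pdiss / 1 mW)."""
    return (phase_noise_dbc_hz
            - 20.0 * math.log10(f0_hz / offset_hz)
            + 10.0 * math.log10(p_diss_mw / 1.0))


# Values taken from Table 6.1 and the measured phase noise at 11.02 GHz.
tr = tuning_range_percent(8.7, 13.8)
p_lo = core_power_mw(8.0, 1.8)    # lower end of the core current range
p_hi = core_power_mw(18.0, 1.8)   # upper end of the core current range

# The exact core power at 11.02 GHz is an unknown here, so bracket the FOM.
fom_lo = vco_fom(-86.8, 11.02e9, 1e6, p_lo)
fom_hi = vco_fom(-86.8, 11.02e9, 1e6, p_hi)

print(f"tuning range = {tr:.1f} %")
print(f"core power   = {p_lo:.1f} to {p_hi:.1f} mW")
print(f"FOM bounds   = {fom_lo:.1f} to {fom_hi:.1f} dBc/Hz")
```

Under these assumptions the tuning range comes out at about 45.3% and the core power at 14.4 to 32.4 mW, matching Tables 6.1 and 6.2. The two FOM bounds, roughly -156 and -152.5 dBc/Hz, bracket the reported value of -154 dBc/Hz.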