Doubly-Selective Channel Estimation and Equalization Using Superimposed Training and Basis Expansion Models

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Shuangchi He

Certificate of Approval:
Stanley J. Reeves, Professor, Electrical and Computer Engineering
Jitendra K. Tugnait, Chair, Professor, Electrical and Computer Engineering
Soo-Young Lee, Professor, Electrical and Computer Engineering
Joe F. Pittman, Interim Dean, Graduate School

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 4, 2007

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Dissertation Abstract

Shuangchi He
Doctor of Philosophy, August 4, 2007
(M.S., Tsinghua University, 2003)
(B.E., Tsinghua University, 2000)
261 Typed Pages
Directed by Jitendra K. Tugnait

Owing to multipath propagation and Doppler spread, typical wireless channels are both frequency- and time-selective (doubly-selective). In this dissertation, we concentrate on channel estimation and equalization over doubly-selective channels by exploiting both superimposed training and basis expansion models (BEM).
In contrast to conventional time-multiplexed (TM) training schemes, superimposed training schemes arithmetically add a periodic training sequence, at low power, to the information sequence at the transmitter. There is no loss in data transmission rate, but some useful power has to be allocated to the superimposed training. We also employ various BEMs to describe the temporal variations of the doubly-selective channel, so that the estimation of a time-varying process is reduced to the estimation of a smaller number of time-invariant BEM coefficients.

First, a channel estimator based on superimposed training and the first-order statistics of the observations is presented for various BEMs; in this estimator, the information sequence acts as interference in channel estimation. By using user-specific training sequences, the estimator can be extended to multiple-user systems.

We next analyze the information-induced self-interference of this estimator, and investigate its performance and the optimization of its parameters.

We propose two schemes to alleviate the self-interference in channel estimation. Using the channel estimates from the first-order statistics-based estimator as an initial guess, a deterministic maximum likelihood (DML) approach jointly estimates the channel and the information sequence. By exploiting the channel estimates and the detected information data from the previous iteration, the self-interference can be significantly reduced at the present iteration. We also propose a data-dependent superimposed training scheme, in which the training sequence is designed based on the current information sequence so that the self-interference can be entirely eliminated at the receiver. However, total elimination of the interference may lead to information loss. We therefore modify the scheme into partially-data-dependent (PDD) training, striking a compromise between interference cancellation and information integrity.
Using superimposed training and a BEM, direct equalization of doubly-selective channels is also considered, without estimating the channel first. The direct equalizer is also extended to a multiple-user scenario, which can be used in a wireless ad hoc network.

The proposed approaches are illustrated by computer simulation examples and compared with conventional TM training-based approaches. When self-interference is sufficiently suppressed by our proposed schemes, the performance of superimposed training-based approaches is competitive with that of approaches using conventional TM training, without incurring any data-rate loss.

Acknowledgments

I have had a phenomenal time during my four years in Auburn. I attribute my good fortune to the wonderful people I have come to know through the past four years.

First and foremost, I would like to thank my advisor, Prof. Jitendra K. Tugnait, for his generosity and support over the years. As the guide of my research career, he gave me careful and rigorous instructions, from which I will surely benefit for life.

Many thanks also go to my committee members, Profs. Stanley J. Reeves and Soo-Young Lee; they have provided me with invaluable guidance and friendliness during my studies and contributed much to my dissertation work. Thanks to Prof. Douglas A. Leonard, who served as my outside reader, for his suggestions and instruction. These acknowledgements would be far from complete without thanks to Profs. Tin-Yau Tam, Xiaoli Ma, and Shiwen Mao, for their sage advice during my study.

I want to thank Weilin Luo and Xiaohong Meng for their foundational work relating to my dissertation, and especially Xiaohong, for her elegant simulation programs that offered me a good model to follow.

Finally, I would like to express my gratitude to my parents and my wife, Fan, whose love and faith in me are the source of my strength.

My studies were funded by the National Science Foundation under Grant ECS 0424145 and by a Vodafone Fellowship.
Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty.

Table of Contents

List of Figures
1 Introduction
  1.1 Previous Work
  1.2 Contributions
  1.3 Organization
2 Representations of Wireless Channels
  2.1 Introduction
  2.2 Jakes' Model
  2.3 Complex Exponential Basis Expansion Model (CE-BEM)
  2.4 Orthogonal Polynomial Basis Expansion Model (OP-BEM)
  2.5 Discrete Prolate Spheroidal Basis Expansion Model (DPS-BEM)
  2.6 Modeling Error of BEMs
    2.6.1 LS Approximation by Basis Expansion Models
    2.6.2 Simulation Model
    2.6.3 Simulation Example: Modeling Error of CE-, OP-, and DPS-BEMs in Approximating a Doubly-Selective Channel
  2.7 Conclusions
3 First-Order Statistics-Based Estimation of Doubly-Selective Channels
  3.1 Introduction
  3.2 First-Order Statistics-Based Channel Estimation Using CE-BEM [81]
  3.3 First-Order Statistics-Based Channel Estimation Using DPS-BEM
  3.4 First-Order Statistics-Based Channel Estimation Using OP-BEM
  3.5 First-Order Statistics-Based Channel Estimation: Multiple-User (MIMO) Channels
  3.6 Simulation Examples
    3.6.1 First-Order Statistics-Based Estimator: Single User
    3.6.2 First-Order Statistics-Based Estimator: Multiple Users
  3.7 Conclusions
4 Performance Analysis and Parameter Design for First-Order Statistics-Based Estimator
  4.1 Introduction
  4.2 Performance Analysis for the First-Order Statistics-Based Estimator Using BEM
    4.2.1 Performance Analysis for CE-BEM-Based Estimator
    4.2.2 Performance Analysis for DPS-BEM-Based Estimator
    4.2.3 Performance Analysis for OP-BEM-Based Estimator
    4.2.4 Performance Analysis for Multiple-User (MIMO) Channels
  4.3 Performance Analysis for the First-Order Statistics-Based Estimator: with Modeling Error
  4.4 Training Power Allocation
  4.5 Bias-Variance Trade-Off
  4.6 Simulation Examples
    4.6.1 Performance Analysis for the First-Order Statistics-Based Estimator
    4.6.2 Training Power Allocation
    4.6.3 Bias-Variance Trade-Off
  4.7 Conclusions
5 Deterministic Maximum Likelihood (DML) Approach
  5.1 Introduction
  5.2 DML Approach Using BEM
  5.3 DML Approach: Multiple-User (MIMO) Channels
  5.4 Simulation Examples
    5.4.1 DML Approach: Single User
    5.4.2 DML Approach: Multiple Users
  5.5 Conclusions
6 Doubly-Selective Channel Estimation Using Data-Dependent Superimposed Training
  6.1 Introduction
  6.2 Data-Dependent Superimposed Training Using CE-BEM
    6.2.1 Data-Dependent Processing at the Transmitter
    6.2.2 Data Detection
    6.2.3 Performance Analysis
  6.3 Data-Dependent Superimposed Training Using DPS-BEM
    6.3.1 Partially-Data-Dependent (PDD) Superimposed Training
    6.3.2 Performance Analysis
    6.3.3 Power Allocation and Self-Interference Suppression
    6.3.4 Recovery of Suppressed Frequencies via DML Approach
  6.4 Simulation Examples
    6.4.1 Data-Dependent Superimposed Training Using CE-BEM
    6.4.2 (Partially) Data-Dependent Superimposed Training Using DPS-BEM
  6.5 Conclusions
7 Direct FIR Linear Equalization of Doubly-Selective Channels Based on Superimposed Training
  7.1 Introduction
  7.2 Direct FIR Linear Equalization Using CE-BEM
    7.2.1 Time-Varying FIR Equalizers
    7.2.2 Linear LS Equalizers Based on CE-BEM
  7.3 Direct FIR Linear Equalization: Multiple Users
    7.3.1 User-Specific Training Sequences
    7.3.2 Linear LS Equalizers for the Desired User
  7.4 Simulation Examples
    7.4.1 Direct FIR Equalization: Single User
    7.4.2 Direct FIR Equalization: Multiple Users
  7.5 Conclusions
8 Concluding Remarks and Future Work
  8.1 Summary of Original Work
  8.2 Possible Future Directions
Bibliography
Appendices
A Optimal Time-Multiplexed Training for Block Transmissions over Doubly-Selective Channels [40]
B Symbol Detection
  B.1 Maximum Likelihood Sequence Detector (Viterbi Detector) [64]
  B.2 Kalman Filtering
C Mathematical Notations
D Abbreviations

List of Figures

2.1 Modeling error of CE-, OP-, and DPS-BEMs in approximating a three-tap (L = 2) Rayleigh fading channel following Jakes' model.
3.1 First-order statistics-based estimator (SISO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.2 First-order statistics-based estimator (SISO): BER vs SNR under fd = 50 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.3 First-order statistics-based estimator (SISO): BER vs SNR under fd = 100 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.4 First-order statistics-based estimator (SISO): BER vs SNR under fd = 200 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.5 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.6 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 50 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.7 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 100 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.8 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 200 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.9 First-order statistics-based estimator (MIMO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
3.10 First-order statistics-based estimator (MIMO): BER vs SNR under fd = 50 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
3.11 First-order statistics-based estimator (MIMO): BER vs SNR under fd = 100 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
3.12 First-order statistics-based estimator (MIMO): BER vs SNR under fd = 200 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
3.13 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
3.14 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 50 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
3.15 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 100 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
3.16 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 200 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
4.1 Estimation variance: NCMSE vs SNR under fd = 0 Hz (time-invariant). The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
4.2 Estimation variance: NCMSE vs SNR under fd = 50 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
4.3 Estimation variance: NCMSE vs SNR under fd = 100 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
4.4 Estimation variance: NCMSE vs SNR under fd = 200 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
4.5 Training power allocation: BER vs ? under fd = 0 Hz (time-invariant). The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
4.6 Training power allocation: BER vs ? under fd = 50 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
4.7 Training power allocation: BER vs ? under fd = 100 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
4.8 Training power allocation: BER vs ? under fd = 200 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
4.9 Training power allocation: optimum ? vs SNR for CE-BEM. ("sim.": simulation results; "analy.": ?opt in (4.46); "analy. app.": ??opt in (4.48).)
4.10 Training power allocation: optimum ? vs SNR for OP-BEM. ("sim.": simulation results; "analy.": ?opt in (4.46); "analy. app.": ??opt in (4.48).)
4.11 Training power allocation: optimum ? vs SNR for DPS-BEM. ("sim.": simulation results; "analy.": ?opt in (4.46); "analy. app.": ??opt in (4.48).)
4.12 Bias-variance trade-off: BER vs Q under TIR = 0.3 for different values of fd.
4.13 Bias-variance trade-off: SNRd(Q) (defined in (4.50)) vs Q under TIR = 0.3 for different values of fd.
4.14 Bias-variance trade-off: BER vs Q under TIR = 1.0 for different values of fd.
4.15 Bias-variance trade-off: SNRd(Q) (defined in (4.50)) vs Q under TIR = 1.0 for different values of fd.
5.1 DML approach (SISO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.2 DML approach (SISO): BER vs SNR under fd = 50 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.3 DML approach (SISO): BER vs SNR under fd = 100 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.4 DML approach (SISO): BER vs SNR under fd = 200 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.5 DML approach (SISO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.6 DML approach (SISO): NCMSE vs SNR under fd = 50 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.7 DML approach (SISO): NCMSE vs SNR under fd = 100 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.8 DML approach (SISO): NCMSE vs SNR under fd = 200 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.9 DML approach (MIMO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.10 DML approach (MIMO): BER vs SNR under fd = 50 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.11 DML approach (MIMO): BER vs SNR under fd = 100 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.12 DML approach (MIMO): BER vs SNR under fd = 200 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.13 DML approach (MIMO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.14 DML approach (MIMO): NCMSE vs SNR under fd = 50 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.15 DML approach (MIMO): NCMSE vs SNR under fd = 100 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.16 DML approach (MIMO): NCMSE vs SNR under fd = 200 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
6.1 Data-dependent superimposed training (CE-BEM): BER vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under fd = 0 and 50 Hz. (SI: superimposed training; TM: time-multiplexed training; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference.)
6.2 Data-dependent superimposed training (CE-BEM): BER vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under fd = 100 and 200 Hz. (SI: superimposed training; TM: time-multiplexed training; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference.)
6.3 Data-dependent superimposed training (CE-BEM): NCMSE vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under fd = 0 and 50 Hz. (SI: superimposed training; TM: time-multiplexed training; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference.)
6.4 Data-dependent superimposed training (CE-BEM): NCMSE vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under fd = 100 and 200 Hz. (SI: superimposed training; TM: time-multiplexed training; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference.)
6.5 Data-dependent superimposed training (CE-BEM): BER vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under fd = 100 Hz and N = 1, 2, and 3. (SI: superimposed training; TM: time-multiplexed training; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference.)
6.6 Data-dependent superimposed training (fast fading): BER vs SNR under fd = 100 and 250 Hz. (SI: superimposed training; TM: time-multiplexed training; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference.)
6.7 Estimation variance: NCMSE vs fd under SNR = 25 dB for comparison between analytical and simulation-based results of non-data-dependent and data-dependent superimposed training. (? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ?: standard deviation.)
6.8 PDD superimposed training: NCMSE vs SNR for CE- and DPS-BEM-based estimators, under fd = 100 Hz. (? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
6.9 PDD superimposed training: BER vs SNR for CE- and DPS-BEM-based estimators, under fd = 100 Hz. (? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
6.10 PDD superimposed training: BER vs (?,?) under SNR = 15 dB and fd = 100 Hz.
6.11 PDD superimposed training: optimum (?,?) vs SNR under fd = 100 Hz.
6.12 PDD superimposed training: BER vs SNR under fd = 0 Hz (time-invariant). (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
6.13 PDD superimposed training: BER vs SNR under fd = 100 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
6.14 PDD superimposed training: BER vs SNR under fd = 200 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
6.15 PDD superimposed training: NCMSE vs SNR under fd = 0 Hz (time-invariant). (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
6.16 PDD superimposed training: NCMSE vs SNR under fd = 100 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
6.17 PDD superimposed training: NCMSE vs SNR under fd = 200 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent training; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination of self-interference at the channel estimation stage.)
7.1 Single-user direct FIR equalization: BER vs SNR under fd = 0 Hz and length of equalizer Le = 6 with different TIR and number of receivers.
7.2 Single-user direct FIR equalization: BER vs SNR under fd = 50 Hz and length of equalizer Le = 6 with different TIR and number of receivers.
206 7.3 Single-user direct FIR equalization: BER vs SNR under fd = 100Hz and length of equalizer Le = 6 with different TIR and number of receivers. . . . 207 7.4 Multiple-user direct FIR equalization (ad hoc): BER vs SNR under fd = 100Hz and length of equalizer Le = 4 with different number of receivers. . . 208 xviii 7.5 Multiple-user direct FIR equalization (ad hoc): BER vs Doppler spread fd under SNR = 25dB and length of equalizer Le = 4 with different number of receivers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 7.6 Multiple-user direct FIR equalization (ad hoc): BER vs length of equalizer Le under fd = 50Hz and SNR = 25dB with different number of receivers. . 210 xix Chapter 1 Introduction With the emergence of next-generation wireless mobile communications, multimedia services have increasing demands for higher data rates, better quality of service, and higher network capacity. In efforts to support such demands, researchers have paid special atten- tion to wireless channels. Phenomena occurring in wireless channels, such as fading, delay spread, Doppler spread, co-channel interference, and multi-user interference, may impair signal transmission and data reception. A wireless channel is a challenging communica- tions medium with limited bandwidth, relatively low capacity per unit bandwidth, random amplitude and phase fluctuations, and inter-symbol interference (ISI). To design a physical link with data rates approaching the fundamental information capacity limits of the wireless channel, accurate knowledge of the channel state information (CSI) becomes a prerequisite for many physical layer approaches. Channel estimation thus plays a key role. At the re- ceive ends, equalizers are usually used to compensate for the signal distortion. One may design an equalizer based on a channel estimate, or by directly using the received signals. 

Traditionally, receivers rely on a transmitter-assisted training session to extract the desired reference signal for channel estimation or equalization [64]. In a fast-varying environment, training sessions have to be transmitted frequently and periodically to keep up with the temporal variation of the channel. For a band-limited wireless application, frequent use of training sessions decreases the effective information rate. To save valuable spectrum resources, blind (self-recovering) channel estimation and equalization, which rely solely on the noisy received data and exploit its statistical or other properties without any training session, have attracted researchers' interest [23]. Semi-blind channel estimation, combining explicit (time-multiplexed) training and blind cost functions, has also attracted considerable attention due to the need for fast and robust channel estimation and the fact that, for many packet transmission systems, embedded known symbols can be exploited for channel estimation. In semi-blind approaches, there are training sessions, but the information data are also used to improve the training-based results [23,87]. More recently, superimposed training-based approaches have been explored, where the training sequence is "on" all the time and is transmitted (at low power) concurrently with (superimposed on) the information data. In contrast to explicit training, there is no loss in data transmission rate. On the other hand, some useful power is spent on the superimposed training sequence that could otherwise have been allocated to the information data. In this dissertation, we discuss doubly-selective channel estimation and equalization using superimposed training.

Common wireless channels are frequency-selective (due to delay spread and multipath propagation) and time-selective (due to mobility). An accurate model of realistic wireless channels can be complicated and involve too many parameters for estimation purposes.
Therefore, a parsimonious representation is preferred. We employ basis expansion models (BEM) to represent the doubly-selective channel with far fewer parameters [24]. In a BEM, the channel is represented as a finite impulse response (FIR) filter where each tap is a superposition of distinct basis functions that describe the temporal variations of the channel. Three BEMs are considered: the complex exponential basis expansion model (CE-BEM), the orthogonal polynomial basis expansion model (OP-BEM), and the discrete prolate spheroidal basis expansion model (DPS-BEM).

1.1 Previous Work

In this section, we summarize previous research on superimposed training-based channel estimation, equalization, and related areas.

To the best of our knowledge, the idea of superimposed training (simultaneous transmission of an information-bearing signal and channel sounding) was first proposed in [36] in 1965 for analog communications, where a pseudo-random channel sounding signal was superimposed upon a frequency-modulated (FM) information-bearing signal by amplitude modulation (AM). This idea was extended to digital systems in [17] in 1995, where both least squares (LS) and least mean squares (LMS) methods were considered to build an adaptive filter, treating the known superimposed training sequence as the input and the received signal as the desired output. Periodic superimposed training sequences allowed for the use of first-order statistics (time-varying mean) of the received signal, which were also exploited for time-invariant channel estimation in [59,82,98], among others. Using CE-BEM, such periodic superimposed training schemes were extended to doubly-selective channel environments in [81,97]. Direct design of FIR equalizers using periodic superimposed training was investigated in [58] for time-invariant channels.
The Cramér-Rao lower bound (CRLB) on channel estimation variance was given in [98], under the assumption of Gaussian source symbols, for a special class of training sequences. Such bounds were extended to a general class of training sequences in [59]. Non-periodic random or pseudo-noise (PN) superimposed training sequences (known at the receiver) were used in [33,44]. A linear predictor was designed in [33] to estimate the time-varying flat fading channel, and channel estimation and equalization based on the minimum mean square error (MMSE) criterion were discussed in [44] for M-quadrature amplitude modulated (QAM) symbols.

The formulations of [59,82] allowed for the presence of an unknown "direct current" (DC) offset at the receiver, whereas those of [17,98] did not. The two schemes in [59,82] were compared in [45], where their structural equivalence was verified; identical estimates are therefore obtained for a zero (or known) DC offset. In the presence of an unknown DC offset, the basic approach of [59] yielded biased channel estimates, so that the DC offset had to be estimated from the biased estimates and the received data by finding the roots of a fifth-degree polynomial [59]. In contrast, the method of [82] yielded unbiased channel estimates directly. Performance analysis (a closed-form solution for the channel estimation variance) was also performed in [59] for a zero (or known) DC offset, and was then used for an optimal training sequence synthesis that yields channel-independent performance. Unfortunately, the synthesized training sequences in [59] do not necessarily have a small peak-to-average power ratio, whereas the training sequence in [82] attains the optimal value of one.
A performance analysis of the approach proposed in [82], valid for any DC offset, was conducted in [84,85], where power allocation for superimposed training in Rayleigh fading channels was also addressed by maximizing the equivalent signal-to-noise ratio (SNR) for equalizer design under a fixed power constraint. As in [17], the period of the superimposed training sequence of [59] was equal to the number of channel taps, whereas this condition was relaxed in [82] to a period greater than or equal to the number of channel taps. Synchronization of the training sequence (frame synchronization), based on the correlation and the fourth-order cumulant functions of the observations, was also discussed in [59]. Under mis-synchronization, however, the method yields a circularly shifted channel estimate whose "shift" cannot be resolved via the first-order statistics of the data [59]. Synchronization of the approach in [82] was discussed in [83,85], where the problem of a shifted channel estimate was avoided. A synchronization technique based on subspace projections was discussed in [1]. The estimator proposed in [82] offers the fundamental approach to channel estimation in this dissertation.

To exploit the enormous capacity potential of multiple-input multiple-output (MIMO) communications [18], superimposed training-based channel estimation was considered in [4,5,21,47,48] for MIMO systems. Superimposed training in multi-carrier systems was considered in [9,10,13,93]. We also note that a more general framework of superimposed training, based on affine precoding, has attracted much interest and was investigated in [42,56,89], among others.

Since superimposed training-based methods usually use statistical properties of the information data, they can be treated as semi-blind approaches [17].
In contrast to the slow convergence and possible convergence toward incorrect solutions that occur in blind methods [92], identifiability conditions for superimposed training-based methods are much less stringent [17]. Furthermore, blind approaches cannot resolve the complex scaling factor ambiguity, so that differential coding and decoding, which results in a 3 dB SNR loss, is required [85], whereas the power allocated to superimposed training is typically much less than 3 dB (1 dB or less in [17]).

In superimposed training-based approaches, the unknown information data are typically incorporated in the noise term, in essence yielding a lower SNR [73]. In other words, the information data may act as interference at the receiver to the superimposed training and adversely affect channel estimation and data detection performance: the absence of any loss in data transmission rate may come at the price of degraded data reception.

Several methods were investigated to reduce the interference from information data (we call it self-interference in this dissertation since it comes from the transmitted signal itself). A selective superimposed training scheme was proposed in [76], where the selection of the training sequence depends on each frame of information data. From a candidate set of orthogonal sequences maintained at the transmitter, a training sequence is chosen and superimposed on the incoming information frame so as to minimize the correlation between the training and the information frame. In [20], a data-dependent superimposed training scheme was proposed for time-invariant channel estimation, where the training sequence is distorted before transmission so that the self-interference is eliminated at the receiver. This scheme was extended to MIMO systems in [21], and was modified to allow for an unknown DC offset in [19]. However, in Chapter 6 we will show that the cost of using data-dependent superimposed training for interference cancelation is some loss of information.
Channel estimation and data detection can also be enhanced in an iterative way, i.e., the detected data can be utilized to cancel the self-interference in the next iteration. Examples of such iterative schemes can be found in [13,49-51,97].

Now, one may wonder what the advantages and disadvantages of superimposed training are in comparison with conventional time-multiplexed (TM) training. Since the ultimate goal of communications is to bring the capacity of communication systems closer to the Shannon bound, how can superimposed training help?

A superimposed training-based scheme for space-time coded transmission over flat block fading (quasi-static) channels was considered in [8]. The analysis revealed the weakness of superimposed training in block-stationary (and thus time-invariant) environments, showing that superimposed training has a higher CRLB than TM training due to the presence of self-interference. On the other hand, if training must be included in every block and the channel estimation is accurate, the superimposed scheme gives higher mutual information (capacity) [8]. Performance bounds for TM and superimposed training-based semi-blind estimation of time-varying flat fading channels were considered in [16]. Under the same overall power allocation, it was shown that superimposed training performs better for fast fading channels, which confirms the intuition that the constant presence of training has considerable benefit. For slow fading and high SNR, this advantage disappears and there is a penalty for using superimposed training, since data transmission interferes with channel estimation [74]. This viewpoint was also confirmed by [73], which showed that when the coherence time is relatively short (compared with the time devoted to training in each block), superimposed training achieves higher capacity than TM training.
This is because superimposed training allows for data transmission over the entire block, whereas TM training sessions occupy a large portion of the time in this situation and hence not much time remains for information transmission. The capacity of superimposed training-based MIMO systems was considered in [5], where similar results were derived: in scenarios of high SNR, many receiver antennas, and short coherence time, it is beneficial to employ superimposed training; otherwise, TM training is better. An important conclusion appeared in [4], where the author answered the following question: how much does the capacity increase if the channel is re-estimated once the detected symbols are available? It was shown that the capacity after re-estimation can be very close to the fundamental capacity of the non-coherent channel, especially when the channel coherence time is short. After several iterations, significant improvement is achieved, at the expense of increased complexity.

1.2 Contributions

In this dissertation, we investigate superimposed training-based approaches to the estimation and equalization of doubly-selective channels. In order to model channel variation by a parsimonious representation, we explore various BEMs, including CE-, OP-, and DPS-BEMs, to approximate the multipath channel with Doppler spread.

Our starting point is the first-order statistics-based estimator proposed in [81], which uses CE-BEM. In approximating band-limited time-varying channels, the modeling error of CE-BEM is noticeable. Therefore, we extend this estimator using DPS- and OP-BEMs to reduce the modeling error. A more general estimator that applies to arbitrary BEMs is also provided. We then apply this estimator to a multiple-user scenario. Channel estimation across different users is decoupled by means of user-specific superimposed training sequences.

Performance analysis is then conducted for the first-order statistics-based estimator for doubly-selective channels, in which we demonstrate that the interference in estimation mainly comes from the unknown information sequence (self-interference). Based on the results of the performance analysis, we cast the issues of power allocation and bias-variance trade-off as ones of optimizing an SNR for equalizer design, following the method proposed for time-invariant channels in [85].

The major drawback of superimposed training is that the self-interference from information data may adversely affect channel estimation and data reception performance. To alleviate the effect of self-interference, we propose two methods. First, a deterministic maximum likelihood (DML) approach is employed at the receiver to enhance the channel estimation iteratively, exploiting the detected symbols from the previous iteration to reduce the self-interference. Second, the same goal can be achieved by transmitter-end processing: the superimposed training sequence is modified based on the information sequence or, equivalently, the information sequence is distorted before transmission so that training and information data occupy distinct frequencies and hence can be separated at the receiver. However, distortion of the information sequence may cause "information" loss before transmission, which cannot be fully recovered by receiver-end processing. A partially-data-dependent (PDD) superimposed training scheme is proposed in order to strike a trade-off between interference cancelation and information integrity.

We also design a direct equalizer, without first estimating the channel, using superimposed training and CE-BEM. With the aid of periodic white training sequences, we show that the optimal linear equalizer for the training sequence is also a scaled version of the optimal equalizer for the information sequence.
By employing user-specific training sequences, this direct equalizer can be extended to a multiple-user scenario, which can be used in a wireless ad hoc network.

Computer simulation examples illustrate our proposed approaches. Analytical results are also compared with simulation results to show their validity. Comparisons with conventional TM training-based approaches are also presented. When self-interference is sufficiently suppressed by our proposed schemes, the performance of superimposed training-based approaches is competitive with that of approaches using TM training, without incurring any data-rate loss.

1.3 Organization

The rest of this dissertation is organized as follows.

In Chapter 2, representations of time-varying wireless channels are briefly reviewed, including Jakes' model and the CE-, OP-, and DPS-BEMs.

Chapter 3 introduces the first-order statistics-based channel estimator using superimposed training. We explore this estimator under different channel representations, and extend it from a single-user scenario to a multiple-user situation by exploiting user-specific superimposed training sequences.

We consider performance analysis for the first-order statistics-based estimators in Chapter 4. Training power allocation and the bias-variance trade-off are also optimized from the viewpoint of equalization.

The DML approach is considered in Chapter 5. By exploiting detected symbols from the previous iteration, the self-interference is reduced at the current iteration, and channel estimation and data reception performance are therefore improved. A multiple-user scenario is also considered.

In Chapter 6, we investigate the data-dependent superimposed training scheme. The data-dependent processing at the transmitter results in information loss. We propose a PDD superimposed training scheme to mitigate this problem. Performance analysis and parameter design are also provided.

Superimposed training-based direct equalization is considered in Chapter 7, with the aid of periodic white training sequences. This algorithm is also extended to a multiple-user wireless ad hoc network.

The dissertation concludes in Chapter 8. Future directions are also suggested.

Chapter 2 Representations of Wireless Channels

2.1 Introduction

Due to multipath propagation and Doppler spread, wireless channels are characterized by frequency- and time-selectivity [66]. A radio signal, distorted during transmission by fading, background noise, and interference of every sort, appears stochastic to an observer at the receiver. Small-scale fading (or simply fading) is the term used to describe the rapid fluctuations of the amplitudes, phases, or multipath delays of a signal over a short period of time or travel distance, so that large-scale path loss may be ignored [66]. The goal of channel estimation and equalization is mainly to combat small-scale fading.

Fading can be attributed to physical factors such as multipath propagation, relative motion between the transmitter and the receiver or surrounding objects, and the transmission bandwidth of the signal [35]. The presence of reflecting objects and scatterers makes the wireless channel change constantly, which dissipates the signal energy and distorts the signal in amplitude, phase, and time. Multiple versions of the transmitted signal arrive at the receiver through different paths. The random amplitudes and phases of the different multipath components induce fading. The relative motion between the transmitter and the receiver, as well as the motion of objects within the wireless channel, induces Doppler spreads, which are typically time-varying and are also a source of fading.

For channel estimation or tracking purposes, accurate modeling of the temporal evolution of the channel plays an important role. A parsimonious and accurate channel representation is always preferred.
Among various models for channel time variations, the autoregressive (AR) process, particularly the first-order AR model, is regarded as a tractable model to describe a time-varying channel, where the channel is assumed to be Markovian, i.e., for the current channel symbol, the effect of channel symbols other than the immediately preceding one is negligible [90]. This Markovian assumption has been verified for Rayleigh fading channels in [90] by considering the mutual information between channel symbols. The AR model has been used for time-varying channel estimation in [7,11,16,37,38].

The AR model, based on symbol-by-symbol updates, is suitable for sequential time-domain processing. When we deal with block processing schemes, it is more convenient to use block-based channel models such as BEMs. The BEM that is optimal in the mean square error (MSE) sense is the discrete Karhunen-Loève BEM (DKL-BEM), which is a reduced-rank decomposition of a certain type of Doppler spectrum [72]. The CE-BEM can be viewed as a special DKL-BEM based on a white Doppler spectrum, and the DPS-BEM corresponds to the DKL-BEM with a rectangular Doppler spectrum [72].

In this chapter, we briefly review representations of time-varying channels. In Section 2.2, Jakes' model is introduced, which will be used as the model of the "real" channel in the simulation examples of this dissertation. In Sections 2.3-2.5, the CE-, OP-, and DPS-BEM representations are discussed. The modeling errors of these BEMs are compared with one another in Section 2.6 via a simulation example. Section 2.7 summarizes this chapter.

2.2 Jakes' Model

Assume that many statistically independent scattered waves with random amplitudes and phases reach the receiver, with the phases uniformly distributed over $[0, 2\pi)$, and that no dominant non-fading signal component is present (no line-of-sight). Then, by the central limit theorem, the real and imaginary parts of the sum of the scattered waves are both Gaussian.
The signal envelope $A$ as a function of time $t$ obeys a Rayleigh distribution, which has a probability density function (pdf) given by

\[ f_A(a) := \begin{cases} \dfrac{a}{\sigma^2} \exp\!\left(-\dfrac{a^2}{2\sigma^2}\right), & a \ge 0, \\ 0, & a < 0, \end{cases} \tag{2.1} \]

with $\sigma^2$ being the time-average power of the received signal before envelope detection. The phase $\Theta$ of the received signal is uniformly distributed with pdf

\[ f_\Theta(\theta) := \frac{1}{2\pi}, \quad \theta \in [0, 2\pi). \tag{2.2} \]

The autocorrelation function of the received signal for two-dimensional isotropic scattering and an omnidirectional receiving antenna is given by [12,71]

\[ R_A(\tau) = \sigma^2 \cos(\omega_c \tau)\, J_0(\omega_m \tau), \tag{2.3} \]

where $\omega_c$ is the carrier radian frequency, $J_0(\cdot)$ is the zero-order Bessel function of the first kind, and $\omega_m$ is the maximum Doppler radian frequency spread. Any model that attempts to represent the Rayleigh flat-fading narrow-band wireless channel has to exhibit the statistical behaviors given by (2.1)-(2.3).

Clarke summarized the important characteristics of fading channels and provided a useful mathematical model [12]. Based on this model, Jakes proposed a sum-of-sinusoids-based simulator [35] that has been widely used and studied over the past decades. The simulator takes the received signal $S(t)$ to be a superposition of waves,

\[ S(t) = E_0 \sum_{n=1}^{N} C_n \cos(\omega_c t + \omega_m t \cos A_n + \phi_n), \]

where $E_0$ is the amplitude of the transmitted cosine wave, $C_n$ is a random variable representing the attenuation of the $n$-th path, $A_n$ is a random variable representing the angle of arrival of the $n$-th ray with respect to the direction of motion of the receiver, and $\phi_n$ is a random variable representing the phase shift undergone by the $n$-th ray. Note that the stochastic signal $S(t)$ representing the flat fading signal can be characterized by $N$ triples $(C_n, A_n, \phi_n)$. The random variables $C_n$, $A_n$, and $\phi_n$ are assumed statistically independent. To reduce the complexity, Jakes' model selects

\[ C_n = \frac{1}{\sqrt{N}}, \quad n = 1, 2, \ldots, N, \tag{2.4a} \]
\[ A_n = \frac{2\pi n}{N}, \quad n = 1, 2, \ldots, N, \tag{2.4b} \]
\[ \phi_n = 0, \quad n = 1, 2, \ldots, N. \tag{2.4c} \]

Furthermore, $N$ is of the form $N = 4M + 2$, where $M$ is a positive integer.

However, the simplification in (2.4) makes this simulation model deterministic and wide-sense nonstationary [63,96]. In [96], a modified Jakes' simulator was proposed. It is wide-sense stationary, and its autocorrelation and cross-correlation functions match those of the desired reference model exactly. Following [96], the normalized low-pass fading process of the statistical sum-of-sinusoids simulation model is defined by

\[ X(t) = X_c(t) + j X_s(t), \tag{2.5a} \]
\[ X_c(t) = \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \cos(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi), \tag{2.5b} \]
\[ X_s(t) = \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \sin(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi), \tag{2.5c} \]

with

\[ \alpha_n = \frac{2\pi n - \pi + \theta}{4M}, \quad n = 1, 2, \ldots, M, \]

where $\theta$, $\phi$, and $\psi_n$ are statistically independent and uniformly distributed over $[-\pi, \pi)$ for all $n$. As $M \to \infty$, the envelope $|X|$ is Rayleigh distributed and the phase $\Theta_X(t)$ is uniformly distributed over $[-\pi, \pi)$, with pdfs given by

\[ f_{|X|}(x) = x \exp\!\left(-\frac{x^2}{2}\right), \quad x \ge 0, \]
\[ f_{\Theta_X}(\theta) = \frac{1}{2\pi}, \quad \theta \in [-\pi, \pi). \]

A minor defect, however, occurs in model (2.5) when $\omega_m = 0$ or the Doppler spread is small: a Rayleigh distribution cannot be guaranteed [94]. This problem can be easily resolved by replacing the common phase $\phi$ by $\phi_n$, which is also uniformly distributed over $[-\pi, \pi)$ for all $n$. The simulation model is revised as [94]:

\[ X(t) = X_c(t) + j X_s(t), \tag{2.6a} \]
\[ X_c(t) = \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \cos(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi_n), \tag{2.6b} \]
\[ X_s(t) = \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \sin(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi_n). \tag{2.6c} \]

2.3 Complex Exponential Basis Expansion Model (CE-BEM)

Recently, deterministic complex exponential basis expansion models (CE-BEM) have been widely investigated in wireless applications, especially when the multipath is caused by a few strong reflectors and the path delays vary due to the kinematics of the mobiles [24].
In these models, the time-varying taps are expressed as a superposition of time-varying basis functions that model the Doppler effects, weighted by time-invariant coefficients. By assigning the temporal variations to the basis functions, rapidly fading channels with coherence times as small as a few tens of symbols can be captured. If the delay spread and the Doppler spread of the channel (or at least upper bounds on them) are known, one can infer the basis functions of the CE-BEM [40]. Treating the basis functions as known parameters, estimation of a time-varying process is reduced to estimating time-invariant coefficients.

Consider a time-varying channel with impulse response $h(t;\tau)$ (the response at time $t$ to a unit impulse at time $t-\tau$), which includes transmit-receive filters as well as doubly-selective propagation effects. Let $s(t)$ denote the complex baseband, continuous-time input signal (with symbol duration $T_s$), and let $x(t)$ denote the complex baseband, continuous-time received signal. The noise-free received signal $x(t)$ is the convolution of $s(t)$ and $h(t;\tau)$ [64]:

\[ x(t) = \int_{0}^{\infty} h(t;\tau)\, s(t-\tau)\, d\tau. \tag{2.7} \]

Let $H(f;\tau) = \int_{-\infty}^{\infty} h(t;\tau)\, e^{-j2\pi f t}\, dt$ be the Fourier transform of $h(t;\tau)$ with respect to $t$. If $|H(f;\tau)| \approx 0$ for $|\tau| > \tau_d$, then $\tau_d$ is defined as the delay spread of the channel; if $|H(f;\tau)| \approx 0$ for $|f| > f_d$, then $f_d$ is defined as the Doppler spread of the channel [40]. Sampling $s(t)$, $x(t)$, and $h(t;\tau)$ in (2.7) at the symbol rate, for $t = nT_s \in [t_0, t_0 + TT_s)$ the sampled signal $x(n) := x(t)|_{t=nT_s}$ has the representation

\[ x(n) = \sum_{l=0}^{L} h(n;l)\, s(n-l). \tag{2.8} \]

Over the block interval $[t_0, t_0 + TT_s)$, the channel impulse response $\{h(n;l)\}_{n=0}^{T-1}$ can be represented by $Q$ coefficients $\{h_q(l)\}_{q=1}^{Q}$ (which remain invariant throughout this block but are allowed to change at the next block) and the corresponding $Q$ Fourier basis functions, which are common to all blocks.
Then over the interval [t0,t0 + TTs), the discrete-time baseband equivalent channel model for the block can be described as [39,40]: h(n;l) = Qsummationdisplay q=1 hq(l)ej?qn (2.9a) Q := 2?fdTTs?+1, (2.9b) L := ??d/Ts?, (2.9c) ?q := 2piT (q? Q+12 ), q = 1,...,Q. (2.9d) 17 2.4 Orthogonal Polynomial Basis Expansion Model (OP-BEM) A time-varying channel over a fixed time interval can also be expressed as a superpo- sition of polynomials with invariant coefficients. Following [6], by a Taylor series expan- sion, the continuous-time channel impulse response h(t;?) within a window of time interval [t0,t0 +TTs) with respect to a midpoint nTs +t0 is given by h(t;?) = Ksummationdisplay i=0 ?(i)n (?) parenleftbiggt?nT s ?t0 Ts parenrightbiggi +RK(t;?), (2.10) where the coefficients ?(i)n (?) are defined as ?(i)n (?) := T is i! bracketleftbiggdih(t;?) dti bracketrightbigg t=nTs+t0 (2.11) and RK(t;?) is the remainder of the Taylor series, given by RK(t;?) := (t?nTs ?t0) K+1 (K +1)! bracketleftbiggdK+1h(t;?) dtK+1 bracketrightbigg t=s? (2.12) for some s? ? [t,nTs +t0]. The polynomials [(t?nTs ?t0)/Ts]i (i = 0,1,...,K) serve as the basis functions in (2.10). In mobile wireless channels, the bandwidth of h(t;?) in t (the Doppler spread) is strictly bounded above by v/? where v is the velocity of the mobile and ? is the carrier wavelength. Therefore, h(t;?) can be differentiated to any order with respect to t in the mean square sense, and so (2.11) and (2.12) are well defined [6]. Since h(t;?) is band-limited in t, for a given window size T, limK??|RK(t;?)|2 = 0. Thus with increasing K, the polynomial approximation becomes more and more accurate. As pointed out in [6], increasing the polynomial order K allows the window size to be 18 increased significantly, without the remainder term (2.12) becoming large. Sampling h(t;?) every Ts seconds (t = nTs +t0 ? 
$[t_0, t_0 + TT_s)$) and ignoring the remainder term, we obtain the discrete time-varying channel impulse response
$$h(n;l) = \sum_{i=0}^{K} \gamma^{(i)}(l)\,(n - \bar{n})^{i}, \tag{2.13}$$
which is valid over a duration of $TT_s$ seconds ($T$ samples).

The polynomials $\{1, t, t^2, \ldots, t^K\}$ are linearly independent over $[-1,1]$, but not orthogonal. A QR-decomposition was suggested in [6] to generate a unitary matrix offering an orthonormal set of basis vectors. Equivalently, via the Gram-Schmidt procedure over the interval $[-1,1]$, we get the Legendre polynomials [53]. By appropriate scaling and translation of the (original) Legendre polynomials, we can obtain modified Legendre polynomials which are orthonormal over the interval $[t_0, t_0 + TT_s]$. Sampling these polynomials at the symbol interval $T_s$, we get the orthogonal polynomial basis expansion model (OP-BEM).

Let $p^{(i)}(\tilde t)$ denote the orthonormal Legendre polynomial of degree (order) $i$ over the interval $[-1,1]$. To extend $[-1,1]$ to $[t_0, t_0 + TT_s]$, we set $t = (TT_s/2)\,\tilde t + t_0 + (TT_s/2)$, leading to $\tilde t = (2/(TT_s))[t - t_0] - 1$ and modified Legendre polynomials $p^{(i)\prime}(t) = p^{(i)}\big((2/(TT_s))[t - t_0] - 1\big)$ orthonormal over the interval $[t_0, t_0 + TT_s]$. Sampling the $p^{(i)\prime}(t)$ at $t = nT_s + t_0$ ($n = 0,1,\ldots$) yields the discretized modified Legendre polynomials
$$\psi_i(n) := p^{(i)}\!\left(\frac{2n}{T} - 1\right). \tag{2.14}$$
The discretized modified Legendre polynomials up to degree (order) five are as follows:
$$\begin{aligned}
\psi_0(n) &= 1, \\
\psi_1(n) &= c_1\left[\tfrac{2n}{T} - 1\right], \\
\psi_2(n) &= c_2\left[\left(\tfrac{2n}{T} - 1\right)^2 - \tfrac{1}{3}\right], \\
\psi_3(n) &= c_3\left[\left(\tfrac{2n}{T} - 1\right)^3 - \tfrac{3}{5}\left(\tfrac{2n}{T} - 1\right)\right], \\
\psi_4(n) &= c_4\left[\left(\tfrac{2n}{T} - 1\right)^4 - \tfrac{6}{7}\left(\tfrac{2n}{T} - 1\right)^2 + \tfrac{3}{35}\right], \\
\psi_5(n) &= c_5\left[\left(\tfrac{2n}{T} - 1\right)^5 - \tfrac{10}{9}\left(\tfrac{2n}{T} - 1\right)^3 + \tfrac{5}{21}\left(\tfrac{2n}{T} - 1\right)\right],
\end{aligned}$$
where $0 \le n \le T-1$ and $c_i = 1\big/\sqrt{\sum_{n=0}^{T-1}\psi_i(n)^2}$ for $i = 0,1,\ldots,5$. Using the basis functions of (2.14), the polynomial representation of equation (2.13) is written as
$$h(n;l) = \sum_{i=0}^{K} h_i(l)\,\psi_i(n), \quad 0 \le l \le L. \tag{2.15}$$
Note that the number of basis functions is $Q = K + 1$.

2.5 Discrete Prolate Spheroidal Basis Expansion Model (DPS-BEM)

When CE-BEM is used to describe a band-limited channel, it has been observed that this truncated discrete Fourier transform (DFT)-based model has the following drawback: the rectangular window associated with the DFT introduces spectral leakage, i.e., the energy at each individual frequency leaks to the full frequency range [65]. An effect similar to the Gibbs phenomenon results in significant amplitude and phase distortion at the beginning and the end of the block [94]. The modeling error of CE-BEM may cause a noticeable floor in bit error rate (BER) curves, as shown in [2]. Using OP-BEM reduces the spectral leakage to some extent [51], but the polynomial functions are neither time-limited nor band-limited, and the square bias of OP-BEM varies significantly over the range of the Doppler spread [94].

An ideal basis function should have at least two properties: it is band-limited to the normalized frequency range $[-f_dT_s, f_dT_s]$, and its energy is time-concentrated in a certain time interval $0 \le n \le T-1$. Given the maximum normalized Doppler bandwidth $f_dT_s$ and the window size $T$, we seek a sequence to maximize
$$\lambda = \frac{\sum_{n=0}^{T-1}|u(n)|^2}{\sum_{m=-\infty}^{\infty}|u(m)|^2} \tag{2.16}$$
with the band-limited constraint
$$u(n) = \int_{-f_dT_s}^{f_dT_s} U(f)\, e^{j2\pi f n}\, df,$$
where $U(f) = \sum_{m=-\infty}^{\infty} u(m)\, e^{-j2\pi f m}$. The discrete prolate spheroidal (DPS) sequences $\{u_i(n)\}$ give the solution of the above constrained maximization problem [69]; they are defined as the real-valued solutions of
$$\sum_{n=0}^{T-1} \frac{\sin[2\pi(n-m)f_dT_s]}{\pi(n-m)}\, u_i(n) = \lambda_i u_i(m)$$
for $i = 1,\ldots,T$ and $-\infty < m < \infty$. For the discrete time index $0 \le n \le T-1$, the $i$-th time-limited DPS vector $\mathbf{u}_i := [\,u_i(0)\ \ u_i(1)\ \ \cdots\ \ u_i(T-1)\,]^T$ is the $i$-th eigenvector of a matrix $\mathbf{C}$:
$$\mathbf{C}\mathbf{u}_i = \lambda_i \mathbf{u}_i, \tag{2.17}$$
where the $(n,m)$-th entry of the $T \times T$ matrix $\mathbf{C}$ is
$$[\mathbf{C}]_{n,m} = \frac{\sin[2\pi(n-m)f_dT_s]}{\pi(n-m)}$$
and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_T$ are the eigenvalues of $\mathbf{C}$. The DPS sequences are orthonormal on the finite time interval $0 \le n \le T-1$, and orthogonal on the doubly infinite interval, i.e.,
$$\sum_{n=0}^{T-1} u_i(n)u_k(n) = \lambda_i \sum_{m=-\infty}^{\infty} u_i(m)u_k(m) = \delta(i-k).$$
The band-limited (infinite) sequence $\{u_1(m)\}$ has the maximum energy concentration in $0 \le m \le T-1$, $\{u_2(m)\}$ is the band-limited sequence with the most energy concentration among the DPS sequences orthogonal to $\{u_1(m)\}$, and so on. By (2.16), the eigenvalues $\lambda_i$ are a measure of energy concentration; they are clustered near 1 for $i \le \lceil 2f_dT_sT \rceil + 1$ and drop rapidly toward zero when $i > \lceil 2f_dT_sT \rceil + 1$ [94]. Therefore, the number of dimensions of time-limited snapshots of a band-limited channel is approximately given by [69]
$$Q = \lceil 2f_dT_sT \rceil + 1. \tag{2.18}$$
The properties described so far make it possible to greatly reduce the spectral leakage induced by the CE-BEM, by using several DPS sequences to form the basis set that approximates a band-limited time-varying channel.

In a DPS-BEM representation, we assume that
$$h(n;l) = \sum_{q=1}^{Q} h_q(l)\, u_q(n). \tag{2.19}$$
The square bias of the DPS-BEM in approximating a time-varying channel is several orders of magnitude lower than that of the CE-BEM over the range of Doppler spreads [94].

2.6 Modeling Error of BEMs

We illustrate the modeling error of CE-, OP-, and DPS-BEMs by a simulation example.

2.6.1 LS Approximation by Basis Expansion Models

In a BEM, we assume that the time-varying channel satisfies
$$h(n;l) = \sum_{q=1}^{Q} h_q(l)\,\phi_q(n), \tag{2.20}$$
where $\phi_q(n)$ is the $q$-th basis function (corresponding to $e^{j\omega_q n}$ in (2.9), $\psi_{q-1}(n)$ in (2.15), and $u_q(n)$ in (2.19)), and $Q$ is the number of basis functions ($Q = K + 1$ for (2.15)).
However, the true channel may not follow this expression exactly, since some modeling error always occurs. We revise (2.20) as
$$h(n;l) = \sum_{q=1}^{Q} h_q(l)\,\phi_q(n) + e(n;l),$$
where $e(n;l)$ denotes the modeling error. By the orthogonality principle, $e(n;l)$ is orthogonal to the basis set $\{\phi_q(n)\}_{q=1}^{Q}$ when the square error $\sum_{n=0}^{T-1}|e(n;l)|^2$ is minimized. Then, using the mutual orthogonality of the basis functions in the second equality,
$$\sum_{n=0}^{T-1} h(n;l)\,\phi_{q'}^{*}(n) = \sum_{q=1}^{Q} h_q(l) \sum_{n=0}^{T-1} \phi_q(n)\,\phi_{q'}^{*}(n) = h_{q'}(l) \sum_{n=0}^{T-1} \left|\phi_{q'}(n)\right|^2.$$
Therefore
$$h_{q'}(l) = \frac{\sum_{n=0}^{T-1} h(n;l)\,\phi_{q'}^{*}(n)}{\sum_{n=0}^{T-1} \left|\phi_{q'}(n)\right|^2}.$$
The LS approximation by a BEM is given by
$$\hat h(n;l) = \sum_{q=1}^{Q} \frac{\sum_{n'=0}^{T-1} h(n';l)\,\phi_q^{*}(n')}{\sum_{n'=0}^{T-1} \left|\phi_q(n')\right|^2}\;\phi_q(n). \tag{2.21}$$

2.6.2 Simulation Model

In the simulations of this dissertation, we use the modified Jakes' model (2.6) to represent the "real" channel. We emphasize that BEM representations are only used for processing at the receiver. A discrete-time baseband Rayleigh fading channel (which can be SISO, SIMO, or MIMO in the subsequent chapters) of order $L$ is generated (see (2.8)). For different taps (i.e., different $l$), the $h(n;l)$ are mutually independent, and for a given tap, we follow (2.6) to generate $h(n;l)$ by sampling $X(t)$ with symbol period $T_s$: $h(n;l) = X(t)|_{t=nT_s}$. In simulations, we take $M = 25$ in (2.6).

[Plot: "Modeling Error of BEMs: $L = 2$, $T_s = 25\,\mu$s, $T = 400$, 1000 runs." Horizontal axis: $f_d$ (Hz), 0 to 200. Vertical axis: normalized channel MSE (dB), $-60$ to 10. Curves: CE-BEM ($Q = 3, 5, 7$); OP-BEM ($Q = 4, 5, 6$); DPS-BEM with known Doppler ($Q = 4, 5, 6$); DPS-BEM with unknown Doppler ($Q = 4, 5, 6$).]

Figure 2.1: Modeling error of CE-, OP-, and DPS-BEMs in approximating a three-tap ($L = 2$) Rayleigh fading channel following Jakes' model.
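The LS projection of Section 2.6.1 can be tried out numerically. The following is a minimal sketch, assuming NumPy; the helper names (`ce_basis`, `op_basis`, `ls_fit`, `nmse`, `toy_channel`) are hypothetical, and `toy_channel` is only a simple sum-of-sinusoids stand-in for a band-limited tap, not the modified Jakes' model (2.6) used in the dissertation. `ls_fit` applies the per-basis-function projection of (2.21), which is valid only when the basis columns are mutually orthogonal.

```python
import numpy as np

def ce_basis(T, Q):
    """CE-BEM basis of (2.9): columns exp(j*w_q*n), w_q = (2*pi/T)*(q-(Q+1)/2)."""
    n = np.arange(T)[:, None]
    q = np.arange(1, Q + 1)[None, :]
    return np.exp(1j * 2 * np.pi / T * (q - (Q + 1) / 2) * n)

def op_basis(T, Q):
    """Orthonormal polynomial basis: QR of the monomials in x = 2n/T - 1."""
    x = 2 * np.arange(T) / T - 1
    monomials = np.vander(x, Q, increasing=True)   # columns x^0 ... x^(Q-1)
    ortho, _ = np.linalg.qr(monomials)
    return ortho

def ls_fit(h, B):
    """LS approximation (2.21); assumes the columns of B are mutually orthogonal."""
    coef = (B.conj().T @ h) / np.sum(np.abs(B) ** 2, axis=0)
    return B @ coef

def nmse(h, h_hat):
    """Normalized squared error of one realization."""
    return np.sum(np.abs(h - h_hat) ** 2) / np.sum(np.abs(h) ** 2)

def toy_channel(T, fd_Ts, n_paths=25, seed=0):
    """Hypothetical stand-in for one tap: sinusoids with Doppler fd*cos(theta)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, n_paths)
    phase = rng.uniform(0, 2 * np.pi, n_paths)
    n = np.arange(T)[:, None]
    paths = np.exp(1j * (2 * np.pi * fd_Ts * np.cos(theta) * n + phase))
    return paths.sum(axis=1) / np.sqrt(n_paths)
```

A typical use is `nmse(h, ls_fit(h, ce_basis(400, 5)))` for `h = toy_channel(400, 0.005)`; because the bases for increasing $Q$ are nested, the error cannot increase when more basis functions are used.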
2.6.3 Simulation Example: Modeling Error of CE-, OP-, and DPS-BEMs in Approximating a Doubly-Selective Channel

We consider a system with a carrier frequency of 2 GHz and a data rate of 40 kBd (kilo-Bauds), so that $T_s = 25\,\mu$s. The maximum Doppler spread (in Hz) $f_d = \omega_m/2\pi$ ranges from 0 Hz to 200 Hz (i.e., the normalized Doppler spread $f_dT_s$ ranges from 0 to 0.005), corresponding to a maximum mobile velocity in the range 0 to 108 km/h. A SISO three-tap ($L = 2$) Rayleigh fading channel is generated using (2.6). We approximate this Jakes' model using different BEMs: CE-BEM, OP-BEM, DPS-BEM with known Doppler spread, and DPS-BEM with unknown Doppler spread. If the Doppler spread is known, we can follow (2.17) to obtain the DPS sequences using the exact Doppler spread, whereas we can only use predetermined DPS sequences if the Doppler spread is unknown. In this example, for the unknown-Doppler case we assume $f_d = 200$ Hz in (2.17) to get the DPS sequences. We pick a data record length of 400 symbols, average over 1000 realizations of randomly generated channels, and plot the normalized channel mean square error (NCMSE) defined as
$$\mathrm{NCMSE} := \frac{\sum_{i=1}^{1000}\sum_{n=0}^{T-1}\sum_{l=0}^{2}\left|h^{(i)}(n;l) - \hat h^{(i)}(n;l)\right|^2}{\sum_{i=1}^{1000}\sum_{n=0}^{T-1}\sum_{l=0}^{2}\left|h^{(i)}(n;l)\right|^2},$$
where $h^{(i)}(n;l)$ denotes the realization of the channel in the $i$-th run, and $\hat h^{(i)}(n;l)$ denotes the BEM-based approximation by (2.21). For CE-BEM, we plot the NCMSE curves with $Q = 3$, 5, and 7 (note that in (2.9), only odd $Q$ is allowed). For OP- and DPS-BEM representations, we take $Q = 4$, 5, and 6.

As expected, using more basis functions reduces the modeling error, which is confirmed by Figure 2.1. The OP-BEM and the DPS-BEM with known Doppler spread outperform CE-BEM significantly for small $f_d$.
As $f_d$ grows, OP-BEM deteriorates quickly, whereas DPS-BEM with known Doppler spread is still much better than the other two. If the Doppler spread is unknown, the performance of DPS-BEM is a little worse, but it still outperforms CE-BEM. In Figure 2.1, the NCMSE of DPS-BEM is at least two orders of magnitude lower than that of the CE-BEM, whether the Doppler spread is known or not; among the three, DPS-BEM is clearly the best at describing a band-limited channel. For a fixed $Q$, the NCMSEs of the CE-BEM and of the DPS-BEM with unknown Doppler spread fluctuate mildly over the range of Doppler spreads. In these two scenarios, the BEMs are both band-limited. If the "true" channel (which is also band-limited within the Doppler shifts) lies within the frequency band of the BEM, the resulting error should not fluctuate significantly for different Doppler spreads.

2.7 Conclusions

In this chapter, we reviewed characteristics and representations of wireless channels. We first discussed Jakes' model, which will be used as the "real" channel in simulation examples in the following chapters. For channel estimation and data processing at the receiver, a BEM, a more parsimonious representation that is independent of the "real" channel, will be used to describe the temporal variation of the channel. We discussed CE-, OP-, and DPS-BEMs. Although the CE-BEM is more convenient in theoretical analysis, its modeling error is noticeable due to spectral leakage. We may employ OP- and DPS-BEMs to reduce this leakage. We also compared the modeling error of the three BEMs: the DPS-BEM has the minimum modeling error among the three, since spectral leakage is greatly reduced due to the energy concentration of the DPS sequences.
Chapter 3

First-Order Statistics-Based Estimation of Doubly-Selective Channels

3.1 Introduction

An estimator for time-invariant frequency-selective channels, using periodic superimposed training and the first-order statistics of the observations, was proposed in [82]. This estimator was soon extended to doubly-selective channels by exploiting CE-BEM in [81]. We start with the CE-BEM-based channel estimator of [81], which offers the basic framework for our superimposed training-based channel estimation schemes.

In this chapter, we first review the first-order statistics-based doubly-selective channel estimator of [81] in Section 3.2. By exploiting the band-limitedness of DPS sequences, this estimator is extended to DPS-BEM in Section 3.3. Since polynomial models are not band-limited, we propose a more general estimator for OP-BEM in Section 3.4, which can be applied to arbitrary BEM representations. In Section 3.5, we extend our CE- and DPS-BEM-based channel estimators to multiuser systems. Exploiting the band-limitedness of CE- and DPS-BEMs, the channel estimation across different users is decoupled by assigning user-specific training sequences at different frequencies to distinct users. Our approaches are illustrated by simulation examples in Section 3.6. Section 3.7 concludes this chapter.

3.2 First-Order Statistics-Based Channel Estimation Using CE-BEM [81]

Consider a single-input multiple-output (SIMO) discrete-time baseband communication system. Let $\{s(n)\}$ denote the input symbol sequence that is transmitted over an FIR linear channel with $N$ outputs and discrete-time impulse response $\{\mathbf{h}(n;l)\}$ ($N$-column vector channel response at time $n$ to a unit input at time $n-l$). The vector channel may be the result of multiple receiver antennas or over-sampling at the receiver. The channel output is given by
$$\mathbf{x}(n) = \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l). \tag{3.1}$$
The noisy measurement is given by
$$\mathbf{y}(n) = \mathbf{x}(n) + \mathbf{v}(n), \tag{3.2}$$
where $\mathbf{v}(n)$ is an $N$-column white complex-Gaussian noise vector. To allow for mean-value ambiguity, we take $E\{\mathbf{v}(n)\} = \mathbf{m}$, with $\mathbf{m}$ unknown. In practice, linear systems arise because of linearization about some operating (set) point (the "bias" in BJT/FET amplifiers). These set points are typically unknown (at least not known precisely) a priori, and one does not normally worry about them, since unknown means are estimated and removed before processing (blocked by capacitor-coupling, etc.) and they are not needed in any processing. However, we will initially use the first-order statistics, i.e., $E\{\mathbf{y}(n)\}$, of the noisy data, so we must include a term such as a nonzero $\mathbf{m}$.

In CE-BEM, channel taps are superpositions of complex exponentials weighted by time-invariant coefficients. By (2.9), the time-varying SIMO channel response is given by
$$\mathbf{h}(n;l) = \sum_{q=1}^{Q} \mathbf{h}_q(l)\, e^{j\omega_q n}, \tag{3.3}$$
for
$$Q := 2\lceil f_d T T_s \rceil + 1, \quad L := \lfloor \tau_d/T_s \rfloor, \quad \omega_q := \frac{2\pi}{T}\left(q - \frac{1}{2} - \frac{Q}{2}\right), \quad q = 1,2,\ldots,Q,$$
where $\tau_d$ is the (multipath) delay spread and $f_d$ is the Doppler spread. The above representation is valid over a duration of $TT_s$ sec. with symbol interval $T_s$ sec. If $\tau_d$ and $f_d$ (or their upper bounds) are known (typically true), then $\mathbf{h}(n;l)$ is unknown only up to the time-invariant quantities $\mathbf{h}_q(l)$.

In superimposed training, the transmitted signal $\{s(n)\}$ is the superposition of the information sequence $\{b(n)\}$ and a training sequence $\{c(n)\}$, i.e.,
$$s(n) = b(n) + c(n). \tag{3.4}$$
Assume the following:

(H3.2.1) The time-varying channel $\{\mathbf{h}(n;l)\}$ satisfies (3.3), where the frequencies $\omega_q$ ($q = 1,2,\ldots,Q$) are distinct and known with $\omega_q \in [0, 2\pi)$. Also $N \ge 1$.

(H3.2.2) The information sequence $\{b(n)\}$ is zero-mean, white with $E\{|b(n)|^2\} = \sigma_b^2$.

(H3.2.3) The measurement noise $\{\mathbf{v}(n)\}$ may be nonzero-mean ($E\{\mathbf{v}(n)\} = \mathbf{m}$), white, uncorrelated with $\{b(n)\}$, with $E\{[\mathbf{v}(n+\tau) - \mathbf{m}][\mathbf{v}(n) - \mathbf{m}]^H\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$. The mean vector $\mathbf{m}$ may be unknown.
(H3.2.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$.

By (H3.2.4), we have
$$c_m := \frac{1}{P}\sum_{n=0}^{P-1} c(n)\, e^{-j\alpha_m n}, \tag{3.5a}$$
$$c(n) = \sum_{m=0}^{P-1} c_m\, e^{j\alpha_m n}, \quad \text{for all } n, \tag{3.5b}$$
$$\alpha_m = 2\pi m/P, \quad m = 0,1,\ldots,P-1. \tag{3.5c}$$
The coefficients $c_m$ are known at the receiver since the training sequence $\{c(n)\}$ is known. We then have
$$\mathbf{y}(n) = \sum_{l=0}^{L}\sum_{q=1}^{Q} \mathbf{h}_q(l)\, e^{j\omega_q n}\left[b(n-l) + \sum_{m=0}^{P-1} c_m\, e^{j\alpha_m (n-l)}\right] + \mathbf{v}(n).$$
By (H3.2.2) and (H3.2.3), the expectation of the observation at time $n$ is
$$E\{\mathbf{y}(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1}\left[\sum_{l=0}^{L} c_m \mathbf{h}_q(l)\, e^{-j\alpha_m l}\right] e^{j(\omega_q + \alpha_m)n} + \mathbf{m}.$$
Defining
$$\mathbf{d}_{mq} := \sum_{l=0}^{L} c_m \mathbf{h}_q(l)\, e^{-j\alpha_m l}, \tag{3.6}$$
we then have
$$E\{\mathbf{y}(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} \mathbf{d}_{mq}\, e^{j(\omega_q + \alpha_m)n} + \mathbf{m}. \tag{3.7}$$
Suppose that we pick $P$ such that the $(\omega_q + \alpha_m)$ are all distinct for any choice of $m$ and $q$; for instance, take $TP^{-1} \ge Q$. In fact, we will take $TP^{-1} = K \ge Q$ where $K$ is an integer, so that the $\omega_q$ and the $\alpha_m$ are on the same frequency grid of resolution $T^{-1}$. Then the sequence $E\{\mathbf{y}(n)\}$ is periodic with cycle frequencies $(\omega_q + \alpha_m)$, for $1 \le q \le Q$ and $0 \le m \le P-1$. By (3.7),
$$\mathbf{y}(n) = E\{\mathbf{y}(n)\} + \mathbf{e}(n) = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} \mathbf{d}_{mq}\, e^{j(\omega_q + \alpha_m)n} + \mathbf{m} + \mathbf{e}(n),$$
where $\{\mathbf{e}(n)\}$ is a zero-mean random "error" sequence. Define the cost function
$$J = \sum_{n=0}^{T-1} \|\mathbf{e}(n)\|^2 = \sum_{n=0}^{T-1} \left\|\mathbf{y}(n) - \sum_{q=1}^{Q}\sum_{m=0}^{P-1} \mathbf{d}_{mq}\, e^{j(\omega_q + \alpha_m)n} - \mathbf{m}\right\|^2. \tag{3.8}$$
Choose the $\mathbf{d}_{mq}$ to minimize $J$. We must have
$$\left.\frac{\partial J}{\partial \mathbf{d}_{mq}^{*}}\right|_{\mathbf{d}_{mq} = \hat{\mathbf{d}}_{mq}} = -\sum_{n=0}^{T-1} e^{-j(\omega_q + \alpha_m)n}\left[\mathbf{y}(n) - \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{\mathbf{d}}_{m'q'}\, e^{j(\omega_{q'} + \alpha_{m'})n} - \mathbf{m}\right] = \mathbf{0}$$
for each $\mathbf{d}_{mq}$.
A consistent mean-square (m.s.) estimate of $\mathbf{d}_{mq}$, for $\omega_q + \alpha_m \ne 0$, follows as
$$\hat{\mathbf{d}}_{mq} = \frac{1}{T}\sum_{n=0}^{T-1} \mathbf{y}(n)\, e^{-j(\omega_q + \alpha_m)n}. \tag{3.9}$$
It follows from (3.7) and (3.9) that
$$E\{\hat{\mathbf{d}}_{mq}\} = \mathbf{d}_{mq} \tag{3.10}$$
for $\omega_q + \alpha_m \ne 0$. As $T \to \infty$, $\hat{\mathbf{d}}_{mq} \to \mathbf{d}_{mq}$ m.s. if $\omega_q + \alpha_m \ne 0$, and $\hat{\mathbf{d}}_{mq} \to \mathbf{d}_{mq} + \mathbf{m}$ m.s. if $\omega_q + \alpha_m = 0$.

It is established in [82] that given $\mathbf{d}_{mq}$ for $1 \le q \le Q$ and $1 \le m \le P-1$, we can (uniquely) estimate the $\mathbf{h}_q(l)$ if $P \ge L+2$, $\alpha_m \ne 0$, and $c_m \ne 0$ for all $m \ne 0$. Since $\mathbf{m}$ is unknown and $\omega_q + \alpha_m = 0$ only when $m = 0$, we omit the term $m = 0$ in the following discussion. For $1 \le m \le P-1$, define the $NQ$-column vector
$$\mathbf{D}_m := \left[\mathbf{d}_{m1}^T,\ \mathbf{d}_{m2}^T,\ \ldots,\ \mathbf{d}_{mQ}^T\right]^T, \tag{3.11}$$
and for $0 \le l \le L$, define the $NQ$-column vector
$$\mathbf{H}_l := \left[\mathbf{h}_1^T(l),\ \mathbf{h}_2^T(l),\ \ldots,\ \mathbf{h}_Q^T(l)\right]^T. \tag{3.12}$$
Then by (3.6), we have
$$\mathbf{D}_m = \sum_{l=0}^{L} c_m\, e^{-j\alpha_m l}\, \mathbf{H}_l$$
for $1 \le m \le P-1$. Define the $NQ(P-1) \times NQ(L+1)$ matrix
$$\mathcal{C} := \left(\mathrm{diag}\{c_1, \ldots, c_{P-1}\}\,\mathbf{V}\right) \otimes \mathbf{I}_{NQ}, \tag{3.13}$$
where
$$\mathbf{V} := \begin{bmatrix} 1 & e^{-j\alpha_1} & \cdots & e^{-j\alpha_1 L} \\ 1 & e^{-j\alpha_2} & \cdots & e^{-j\alpha_2 L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{P-1}} & \cdots & e^{-j\alpha_{P-1} L} \end{bmatrix}, \tag{3.14}$$
the $NQ(L+1)$-column vector
$$\mathcal{H} := \left[\mathbf{H}_0^T,\ \mathbf{H}_1^T,\ \ldots,\ \mathbf{H}_L^T\right]^T, \tag{3.15}$$
and the $NQ(P-1)$-column vector
$$\mathcal{D} := \left[\mathbf{D}_1^T,\ \mathbf{D}_2^T,\ \ldots,\ \mathbf{D}_{P-1}^T\right]^T. \tag{3.16}$$
Then (3.6) leads to
$$\mathcal{C}\mathcal{H} = \mathcal{D}. \tag{3.17}$$
Since the $\alpha_m$ are distinct and $c_m \ne 0$ for all $m \ne 0$, $\mathrm{rank}(\mathcal{C}) = NQ(L+1)$ if $P \ge L+2$; hence, we can determine the $\mathbf{h}_q(l)$ uniquely. Define $\hat{\mathbf{D}}_m$ as in (3.11) with $\mathbf{d}_{mq}$ replaced by $\hat{\mathbf{d}}_{mq}$, and define $\hat{\mathcal{D}}$ as in (3.16) with $\mathbf{D}_m$ replaced by $\hat{\mathbf{D}}_m$. Then the estimate of $\mathcal{H}$ is given by
$$\hat{\mathcal{H}} = \left(\mathcal{C}^H\mathcal{C}\right)^{-1}\mathcal{C}^H\hat{\mathcal{D}}. \tag{3.18}$$
By (3.10) and (3.17), it follows that
$$E\{\hat{\mathcal{H}}\} = \mathcal{H}. \tag{3.19}$$
Denote the corresponding estimate of $\mathbf{h}_q(l)$ by $\hat{\mathbf{h}}_q(l)$ for $q = 1,2,\ldots,Q$ and $l = 0,1,\ldots,L$. Following (3.3), the time-varying channel estimates are given by
$$\hat{\mathbf{h}}(n;l) = \sum_{q=1}^{Q} \hat{\mathbf{h}}_q(l)\, e^{j\omega_q n}, \quad l = 0,1,\ldots,L, \quad 0 \le n \le T-1. \tag{3.20}$$
We summarize this estimator in the following lemmas:

Lemma 3.2.1: Under assumptions (H3.2.1)-(H3.2.4), the channel estimator (3.18) is unbiased by (3.19) if the periodic training sequence is such that $c_m \ne 0$ for all $m \ne 0$, $P \ge L+2$, and $P$ and $T$ are such that $T = KP$ for integer $K \ge Q$.

Lemma 3.2.2: Under assumptions (H3.2.1)-(H3.2.4), the channel estimator (3.18) is consistent in probability if the periodic training sequence is such that $c_m \ne 0$ for all $m \ne 0$, $P \ge L+2$, $P$ is such that $\omega_q + \alpha_m \ne 0$ for $q = 1,2,\ldots,Q$ and $m \ne 0$, and $Q$ is fixed as $T$ becomes large.

Remark 3.2.1: If the channel length $L$ is unknown, an upper bound $L_u$ suffices. Then we are estimating $\hat{\mathbf{h}}(n;l)$ for $l = 0,1,\ldots,L_u$, and $\lim_{T\to\infty}\hat{\mathbf{h}}(n;l) = \mathbf{0}$ for $l > L$.

Remark 3.2.2: We do not need $c_m \ne 0$ for every $m$. We need at least $L+2$ nonzero $c_m$.

Remark 3.2.3: If the noise $\mathbf{v}(n)$ is zero-mean, i.e., $\mathbf{m} = \mathbf{0}$, we do not have to discard $\mathbf{d}_{0q}$. Thus, by setting
$$\tilde{\mathbf{V}} := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & e^{-j\alpha_1} & \cdots & e^{-j\alpha_1 L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{P-1}} & \cdots & e^{-j\alpha_{P-1} L} \end{bmatrix}, \tag{3.21}$$
$$\tilde{\mathcal{C}} := \left(\mathrm{diag}\{c_0, c_1, \ldots, c_{P-1}\}\,\tilde{\mathbf{V}}\right) \otimes \mathbf{I}_{NQ}, \tag{3.22}$$
$$\tilde{\mathcal{D}} := \left[\mathbf{D}_0^T,\ \mathbf{D}_1^T,\ \ldots,\ \mathbf{D}_{P-1}^T\right]^T, \tag{3.23}$$
we have $\tilde{\mathcal{C}}\mathcal{H} = \tilde{\mathcal{D}}$ and
$$\hat{\mathcal{H}} = \left(\tilde{\mathcal{C}}^H\tilde{\mathcal{C}}\right)^{-1}\tilde{\mathcal{C}}^H\hat{\tilde{\mathcal{D}}}. \tag{3.24}$$
To identify the BEM coefficients, we now need $P \ge L+1$. All our results hold true with the appropriate substitutions.

3.3 First-Order Statistics-Based Channel Estimation Using DPS-BEM

The first-order statistics-based channel estimator described in Section 3.2, which uses the CE-BEM representation, can be easily extended to DPS-BEM.

As discussed in Section 2.5, the band-limitedness and energy concentration of DPS sequences greatly reduce the spectral leakage intrinsic to CE-BEM. Therefore, better performance can be expected if we use DPS-BEM instead in the first-order statistics-based estimator.
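Since the estimator in this section requires the DPS sequences to be available at the receiver, it may help to recall how they can be obtained numerically from the eigen-decomposition (2.17). The following is a minimal sketch, assuming NumPy; the function name `dps_basis` is hypothetical, and a dense symmetric eigen-solver is used purely for illustration (the dissertation does not specify a numerical method).

```python
import numpy as np

def dps_basis(T, fd_Ts, Q):
    """Leading Q eigenpairs of the T x T matrix C in (2.17).

    [C]_{n,m} = sin(2*pi*(n-m)*fd_Ts) / (pi*(n-m)), with 2*fd_Ts on the diagonal.
    Returns (lam, U): eigenvalues in descending order and the matching
    time-limited DPS vectors as the columns of U.
    """
    n = np.arange(T)
    diff = n[:, None] - n[None, :]
    # np.sinc(x) = sin(pi*x)/(pi*x), so 2W*sinc(2W*d) = sin(2*pi*W*d)/(pi*d)
    C = 2 * fd_Ts * np.sinc(2 * fd_Ts * diff)
    lam, U = np.linalg.eigh(C)           # ascending eigenvalues, orthonormal U
    order = np.argsort(lam)[::-1]        # reorder to descending
    return lam[order][:Q], U[:, order][:, :Q]
```

For example, `lam, U = dps_basis(400, 0.005, 5)` corresponds to $T = 400$ and $f_dT_s = 0.005$; the columns of `U` are orthonormal over $0 \le n \le T-1$, and the returned `lam` values show how quickly the energy concentration falls off with the index.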
In the DPS-BEM representation, we assume that
$$\mathbf{h}(n;l) = \sum_{q=1}^{Q} \mathbf{h}_q(l)\, u_q(n), \tag{3.25}$$
where $\{u_q(n)\}$ is the $q$-th DPS sequence. Similar to (H3.2.1)-(H3.2.4), we assume:

(H3.3.1) The time-varying channel $\{\mathbf{h}(n;l)\}$ satisfies (3.25) with the DPS sequences $\{u_q(n)\}$ known at the receiver. Also $N \ge 1$.

(H3.3.2) The information sequence $\{b(n)\}$ is zero-mean, white with $E\{|b(n)|^2\} = \sigma_b^2$.

(H3.3.3) The measurement noise $\{\mathbf{v}(n)\}$ is zero-mean, white, uncorrelated with $\{b(n)\}$, with $E\{\mathbf{v}(n+\tau)\mathbf{v}^H(n)\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$.

(H3.3.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$.

Note that for the time being we assume that the measurement noise is zero-mean, i.e., $\mathbf{m} = \mathbf{0}$.

By the SIMO channel model (3.1), (3.2), and the DPS-BEM representation (3.25), we have
$$\mathbf{y}(n) = \sum_{l=0}^{L}\sum_{q=1}^{Q} \mathbf{h}_q(l)\, u_q(n)\left[b(n-l) + c(n-l)\right] + \mathbf{v}(n) = \sum_{l=0}^{L}\sum_{q=1}^{Q} \mathbf{h}_q(l)\, u_q(n)\left[b(n-l) + \sum_{m=0}^{P-1} c_m\, e^{j\alpha_m(n-l)}\right] + \mathbf{v}(n).$$
By (H3.3.2) and (H3.3.3), the expectation of the observation at time $n$ is
$$E\{\mathbf{y}(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1}\left[\sum_{l=0}^{L} c_m \mathbf{h}_q(l)\, e^{-j\alpha_m l}\right] u_q(n)\, e^{j\alpha_m n}. \tag{3.26}$$
Using (3.6), we have
$$E\{\mathbf{y}(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} \mathbf{d}_{mq}\, u_q(n)\, e^{j\alpha_m n}. \tag{3.27}$$
It follows that
$$\mathbf{y}(n) = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} \mathbf{d}_{mq}\, u_q(n)\, e^{j\alpha_m n} + \mathbf{e}(n),$$
where $\{\mathbf{e}(n)\}$ is a zero-mean random sequence. Define the cost function as in (3.8):
$$J = \sum_{n=0}^{T-1}\|\mathbf{e}(n)\|^2 = \sum_{n=0}^{T-1}\left\|\mathbf{y}(n) - \sum_{q=1}^{Q}\sum_{m=0}^{P-1} \mathbf{d}_{mq}\, u_q(n)\, e^{j\alpha_m n}\right\|^2.$$
Choose the $\mathbf{d}_{mq}$ to minimize $J$. Setting the derivative with respect to $\mathbf{d}_{mq}^{*}$ to zero at $\mathbf{d}_{mq} = \hat{\mathbf{d}}_{mq}$ requires
$$\sum_{n=0}^{T-1}\left[\mathbf{y}(n) - \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{\mathbf{d}}_{m'q'}\, u_{q'}(n)\, e^{j\alpha_{m'}n}\right] u_q(n)\, e^{-j\alpha_m n} = \mathbf{0},$$
which leads to
$$\sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{\mathbf{d}}_{m'q'}\left[\sum_{n=0}^{T-1} u_{q'}(n)\, u_q(n)\, e^{j(\alpha_{m'} - \alpha_m)n}\right] = \sum_{n=0}^{T-1} \mathbf{y}(n)\, u_q(n)\, e^{-j\alpha_m n}. \tag{3.28}$$
We then define
$$\mathbf{g}_{mq} := \sum_{n=0}^{T-1} \mathbf{y}(n)\, u_q(n)\, e^{-j\alpha_m n}$$
and substitute it into (3.28):
$$\mathbf{g}_{mq} = \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{\mathbf{d}}_{m'q'}\left[\sum_{n=0}^{T-1} u_{q'}(n)\, u_q(n)\, e^{j(\alpha_{m'} - \alpha_m)n}\right]. \tag{3.29}$$
By the definitions (3.11), (3.12), (3.15), and (3.21)-(3.23), we also have
$$\tilde{\mathcal{C}}\mathcal{H} = \tilde{\mathcal{D}}. \tag{3.30}$$
If $P \ge L+1$, then $\mathrm{rank}(\tilde{\mathcal{C}}) = NQ(L+1)$ [82], and we can determine the $\mathbf{h}_q(l)$ uniquely. Define $\hat{\tilde{\mathcal{D}}}$ and $\mathcal{G}$ in a similar way to (3.23), with $\mathbf{d}_{mq}$ replaced by $\hat{\mathbf{d}}_{mq}$ and $\mathbf{g}_{mq}$, respectively. Then (3.29) becomes
$$\mathcal{G} = \left(\boldsymbol{\Gamma} \otimes \mathbf{I}_N\right)\hat{\tilde{\mathcal{D}}},$$
where the entries of the $PQ \times PQ$ matrix $\boldsymbol{\Gamma}$ are
$$[\boldsymbol{\Gamma}]_{mQ+q,\, m'Q+q'} = \sum_{n=0}^{T-1} u_{q'}(n)\, u_q(n)\, e^{j(\alpha_{m'} - \alpha_m)n}.$$
The estimate of $\tilde{\mathcal{D}}$ is given by
$$\hat{\tilde{\mathcal{D}}} = \left(\boldsymbol{\Gamma}^{-1} \otimes \mathbf{I}_N\right)\mathcal{G}. \tag{3.31}$$
By (3.30) and (3.31), the estimate of the channel coefficients is
$$\hat{\mathcal{H}} = \tilde{\mathcal{C}}^{\dagger}\hat{\tilde{\mathcal{D}}} = \left(\tilde{\mathcal{C}}^H\tilde{\mathcal{C}}\right)^{-1}\tilde{\mathcal{C}}^H\left(\boldsymbol{\Gamma}^{-1} \otimes \mathbf{I}_N\right)\mathcal{G}. \tag{3.32}$$
The channel estimate is then given by
$$\hat{\mathbf{h}}(n;l) = \sum_{q=1}^{Q} \hat{\mathbf{h}}_q(l)\, u_q(n). \tag{3.33}$$

Remark 3.3.1: Since DPS sequences are approximately band-limited to the normalized frequency range $[-f_dT_s, f_dT_s]$, it follows that
$$\sum_{n=0}^{T-1} u_{q'}(n)\, u_q(n)\, e^{j(\alpha_{m'} - \alpha_m)n} \approx \delta(m' - m)\,\delta(q' - q) \tag{3.34}$$
when $f_dT_s \ll 1/P$ and $T$ is a multiple of $P$ or $T$ is "large". This is usually true, and a short period $P$ helps to achieve it. Under the assumption (3.34), $\boldsymbol{\Gamma} \approx \mathbf{I}_{PQ}$. By (3.28) and (3.34),
$$\hat{\mathbf{d}}_{mq} = \sum_{n=0}^{T-1} \mathbf{y}(n)\, u_q(n)\, e^{-j\alpha_m n}. \tag{3.35}$$
The estimate (3.32) is then given by
$$\hat{\mathcal{H}} = \left(\tilde{\mathcal{C}}^H\tilde{\mathcal{C}}\right)^{-1}\tilde{\mathcal{C}}^H\hat{\tilde{\mathcal{D}}}. \tag{3.36}$$

Remark 3.3.2: If the mean of the noise $\mathbf{v}(n)$ is unknown, suppose $E\{\mathbf{v}(n)\} = \mathbf{m}$.
Under the approximation (3.34), we should omit the first row (corresponding to $\alpha_0$) of $\tilde{\mathbf{V}}$ in (3.21) (denote the resulting $(P-1) \times (L+1)$ matrix by $\mathbf{V}$, as in (3.14)), and also omit the block $\mathbf{D}_0$ from $\tilde{\mathcal{D}}$ in (3.23) (denote the resulting vector by $\mathcal{D}$, as in (3.16)). We have $\mathcal{C}\mathcal{H} = \mathcal{D}$ and
$$\hat{\mathcal{H}} = \left(\mathcal{C}^H\mathcal{C}\right)^{-1}\mathcal{C}^H\hat{\mathcal{D}}, \tag{3.37}$$
where $\mathcal{C}$ is defined in (3.13) and $\hat{\mathcal{D}}$ is acquired by (3.35). To identify the BEM coefficients, we now need $P \ge L+2$. All our results hold true if appropriate substitutions are used.

3.4 First-Order Statistics-Based Channel Estimation Using OP-BEM

The first-order statistics-based channel estimators using CE- and DPS-BEMs exploit the band-limitedness of the basis functions. OP-BEM, however, does not have this property.

We assume the SIMO channel with $N$ outputs satisfies the OP-BEM representation (2.15), i.e.,
$$\mathbf{h}(n;l) = \sum_{q=0}^{K} \mathbf{h}_q(l)\,\psi_q(n), \quad 0 \le l \le L, \tag{3.38}$$
where $\psi_q(n)$ is the discretized modified Legendre polynomial of degree $q$.

We make the following assumptions:

(H3.4.1) The time-varying channel $\{\mathbf{h}(n;l)\}$ satisfies (3.38), where $\{\psi_q(n)\}_{q=0}^{K}$ are known at the receiver. Also $N \ge 1$.

(H3.4.2) The information sequence $\{b(n)\}$ is zero-mean, white with $E\{|b(n)|^2\} = \sigma_b^2$.

(H3.4.3) The measurement noise $\{\mathbf{v}(n)\}$ may be nonzero-mean ($E\{\mathbf{v}(n)\} = \mathbf{m}$), white, uncorrelated with $\{b(n)\}$, with $E\{[\mathbf{v}(n+\tau) - \mathbf{m}][\mathbf{v}(n) - \mathbf{m}]^H\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$. The mean vector $\mathbf{m}$ may be unknown.

(H3.4.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$.

By the SIMO channel model (3.1)-(3.2) and the OP-BEM representation (3.38), we have
$$\mathbf{y}(n) = \sum_{l=0}^{L}\sum_{q=0}^{K} \mathbf{h}_q(l)\,\psi_q(n)\left[b(n-l) + c(n-l)\right] + \mathbf{v}(n)$$
with mean
$$E\{\mathbf{y}(n)\} = \sum_{l=0}^{L}\sum_{q=0}^{K} \mathbf{h}_q(l)\,\psi_q(n)\, c(n-l) + \mathbf{m}.$$
It follows that $\mathbf{y}(n) = E\{\mathbf{y}(n)\} + \mathbf{e}(n)$, where $\{\mathbf{e}(n)\}$ is a zero-mean random "error" sequence and
$$\mathbf{e}(n) = \mathbf{y}(n) - \sum_{l=0}^{L}\sum_{q=0}^{K} \mathbf{h}_q(l)\,\psi_q(n)\, c(n-l) - \mathbf{m}.$$
We define the cost function as
$$J = \sum_{n=0}^{T-1}\|\mathbf{e}(n)\|^2 = \sum_{n=0}^{T-1}\left\|\mathbf{y}(n) - \sum_{l=0}^{L}\sum_{q=0}^{K} \mathbf{h}_q(l)\,\psi_q(n)\, c(n-l) - \mathbf{m}\right\|^2.$$
Choose $\mathbf{m}$ and the $\mathbf{h}_q(l)$ ($q = 0,1,\ldots,K$; $l = 0,1,\ldots,L$) to minimize the cost function $J$. We must have
$$\left.\frac{\partial J}{\partial \mathbf{m}^{*}}\right|_{\mathbf{m} = \hat{\mathbf{m}},\ \mathbf{h}_q(l) = \hat{\mathbf{h}}_q(l)} = \mathbf{0} \quad \text{and} \quad \left.\frac{\partial J}{\partial \mathbf{h}_q^{*}(l)}\right|_{\mathbf{m} = \hat{\mathbf{m}},\ \mathbf{h}_q(l) = \hat{\mathbf{h}}_q(l)} = \mathbf{0},$$
leading to
$$\hat{\mathbf{m}} = \frac{1}{T}\sum_{n=0}^{T-1}\left[\mathbf{y}(n) - \sum_{l=0}^{L}\sum_{q=0}^{K} \hat{\mathbf{h}}_q(l)\,\psi_q(n)\, c(n-l)\right] \tag{3.39}$$
and
$$\sum_{l=0}^{L}\sum_{q=0}^{K} \hat{\mathbf{h}}_q(l)\left[\frac{1}{T}\sum_{n=0}^{T-1}\psi_q(n)\,\psi_{q_1}^{*}(n)\, c(n-l)\, c^{*}(n-l_1)\right] = \frac{1}{T}\sum_{n=0}^{T-1}\left[\mathbf{y}(n) - \hat{\mathbf{m}}\right]\psi_{q_1}^{*}(n)\, c^{*}(n-l_1). \tag{3.40}$$
Substituting (3.39) into (3.40), we then have
$$\sum_{l=0}^{L}\sum_{q=0}^{K} \hat{\mathbf{h}}_q(l)\,\Omega[(q,l),(q_1,l_1)] = \frac{1}{T}\sum_{n=0}^{T-1}\mathbf{y}(n)\left[\psi_{q_1}^{*}(n)\, c^{*}(n-l_1) - \frac{1}{T}\sum_{n'=0}^{T-1}\psi_{q_1}^{*}(n')\, c^{*}(n'-l_1)\right], \tag{3.41}$$
where we define
$$\Omega[(q,l),(q_1,l_1)] := \frac{1}{T}\sum_{n=0}^{T-1}\psi_q(n)\,\psi_{q_1}^{*}(n)\, c(n-l)\, c^{*}(n-l_1) - \left[\frac{1}{T}\sum_{n=0}^{T-1}\psi_q(n)\, c(n-l)\right]\left[\frac{1}{T}\sum_{n=0}^{T-1}\psi_{q_1}^{*}(n)\, c^{*}(n-l_1)\right]. \tag{3.42}$$
We further define a $(K+1)(L+1) \times (K+1)(L+1)$ matrix
$$\boldsymbol{\Omega}_{\mathrm{OP}} := \begin{bmatrix}
\Omega[(0,0),(0,0)] & \cdots & \Omega[(K,0),(0,0)] & \cdots & \Omega[(K,1),(0,0)] & \cdots & \Omega[(K,L),(0,0)] \\
\vdots & & \vdots & & \vdots & & \vdots \\
\Omega[(0,0),(K,0)] & \cdots & \Omega[(K,0),(K,0)] & \cdots & \Omega[(K,1),(K,0)] & \cdots & \Omega[(K,L),(K,0)] \\
\vdots & & \vdots & & \vdots & & \vdots \\
\Omega[(0,0),(K,1)] & \cdots & \Omega[(K,0),(K,1)] & \cdots & \Omega[(K,1),(K,1)] & \cdots & \Omega[(K,L),(K,1)] \\
\vdots & & \vdots & & \vdots & & \vdots \\
\Omega[(0,0),(K,L)] & \cdots & \Omega[(K,0),(K,L)] & \cdots & \Omega[(K,1),(K,L)] & \cdots & \Omega[(K,L),(K,L)]
\end{bmatrix} \tag{3.43}$$
whose $[(K+1)l_1 + q_1 + 1,\ (K+1)l + q + 1]$-th entry is $\Omega[(q,l),(q_1,l_1)]$, a $(K+1)(L+1)$-column vector
$$\boldsymbol{\theta}(n) := \begin{bmatrix}
\psi_0^{*}(n)\, c^{*}(n) - \frac{1}{T}\sum_{n'=0}^{T-1}\psi_0^{*}(n')\, c^{*}(n') \\
\vdots \\
\psi_K^{*}(n)\, c^{*}(n) - \frac{1}{T}\sum_{n'=0}^{T-1}\psi_K^{*}(n')\, c^{*}(n') \\
\psi_0^{*}(n)\, c^{*}(n-1) - \frac{1}{T}\sum_{n'=0}^{T-1}\psi_0^{*}(n')\, c^{*}(n'-1) \\
\vdots \\
\psi_K^{*}(n)\, c^{*}(n-1) - \frac{1}{T}\sum_{n'=0}^{T-1}\psi_K^{*}(n')\, c^{*}(n'-1) \\
\vdots \\
\psi_K^{*}(n)\, c^{*}(n-L) - \frac{1}{T}\sum_{n'=0}^{T-1}\psi_K^{*}(n')\, c^{*}(n'-L)
\end{bmatrix},$$
and the channel coefficient vectors
$$\hat{\mathbf{H}}_l := \left[\hat{\mathbf{h}}_0^T(l)\ \ \hat{\mathbf{h}}_1^T(l)\ \ \cdots\ \ \hat{\mathbf{h}}_K^T(l)\right]^T, \tag{3.44}$$
$$\hat{\mathcal{H}} := \left[\hat{\mathbf{H}}_0^T\ \ \hat{\mathbf{H}}_1^T\ \ \cdots\ \ \hat{\mathbf{H}}_L^T\right]^T. \tag{3.45}$$
We also define $\mathbf{H}_l$ and $\mathcal{H}$ as the vectors of true values of the channel coefficients, corresponding to (3.44) and (3.45), respectively. Then (3.41) can be written as
$$\left(\boldsymbol{\Omega}_{\mathrm{OP}} \otimes \mathbf{I}_N\right)\hat{\mathcal{H}} = \frac{1}{T}\sum_{n=0}^{T-1}\boldsymbol{\theta}(n) \otimes \mathbf{y}(n),$$
which yields
$$\hat{\mathcal{H}} = \frac{1}{T}\sum_{n=0}^{T-1}\left[\boldsymbol{\Omega}_{\mathrm{OP}}^{-1}\boldsymbol{\theta}(n)\right] \otimes \mathbf{y}(n). \tag{3.46}$$
The estimate of the time-varying channel $\mathbf{h}(n;l)$ is then given by
$$\hat{\mathbf{h}}(n;l) = \sum_{q=0}^{K}\hat{\mathbf{h}}_q(l)\,\psi_q(n). \tag{3.47}$$

Remark 3.4.1: We did not use assumption (H3.4.4) anywhere in the manipulations, so aperiodic training can also be used.

Remark 3.4.2: Neither orthogonality nor any property of polynomials was used to obtain (3.46). We can use any basis set $\{\phi_q(n)\}_{n=0}^{T-1}$ ($q = 1,2,\ldots,Q$), as in (2.20), instead of polynomials in (3.46). The basis functions are only required to be linearly independent, not necessarily orthogonal. This offers a general channel estimator using BEMs and superimposed training.

If we consider CE-BEM in (3.42) and replace $\psi_q(n)$ with $\phi_q(n) = e^{j\omega_q n}$ for $\omega_q := \frac{2\pi}{T}\left(q - \frac{1}{2} - \frac{Q}{2}\right)$, we have
$$\Omega[(q,l),(q_1,l_1)] = \frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_q n}\, e^{-j\omega_{q_1} n}\, c(n-l)\, c^{*}(n-l_1) - \left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_q n}\, c(n-l)\right]\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{-j\omega_{q_1} n}\, c^{*}(n-l_1)\right]. \tag{3.48}$$
Taking (3.5) into account, we have
$$\frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_q n}\, e^{-j\omega_{q_1} n}\, c(n-l)\, c^{*}(n-l_1) = \sum_{m=0}^{P-1}|c_m|^2\, e^{-j\alpha_m(l - l_1)}\,\delta(q - q_1) \tag{3.49}$$
and
$$\frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_q n}\, c(n-l) = c_0\,\delta\!\left(q - \frac{Q+1}{2}\right), \tag{3.50}$$
so that (3.48) becomes
$$\Omega[(q,l),(q_1,l_1)] = \sum_{m=0}^{P-1}|c_m|^2\, e^{-j\alpha_m(l - l_1)}\,\delta(q - q_1) - |c_0|^2\,\delta\!\left(q - \frac{Q+1}{2}\right)\delta\!\left(q_1 - \frac{Q+1}{2}\right). \tag{3.51}$$
Substituting (3.50) and (3.51) into (3.41), it follows that
$$\sum_{m=0}^{P-1} c_m^{*}\, e^{j\alpha_m l_1}\left[\sum_{l=0}^{L} c_m \hat{\mathbf{h}}_q(l)\, e^{-j\alpha_m l}\right] - \sum_{l=0}^{L}\hat{\mathbf{h}}_{\frac{Q+1}{2}}(l)\,|c_0|^2\,\delta\!\left(q - \frac{Q+1}{2}\right) = \sum_{m=0}^{P-1} c_m^{*}\, e^{j\alpha_m l_1}\left[\frac{1}{T}\sum_{n=0}^{T-1}\mathbf{y}(n)\, e^{-j(\omega_q + \alpha_m)n}\right] - \frac{1}{T}\sum_{n=0}^{T-1}\mathbf{y}(n)\, c_0^{*}\,\delta\!\left(q - \frac{Q+1}{2}\right). \tag{3.52}$$
By (3.6) and (3.9) we have
$$\sum_{l=0}^{L} c_m \hat{\mathbf{h}}_q(l)\, e^{-j\alpha_m l} = \frac{1}{T}\sum_{n=0}^{T-1}\mathbf{y}(n)\, e^{-j(\omega_q + \alpha_m)n} \tag{3.53}$$
for $q = 1,2,\ldots,Q$ and $m = 1,2,\ldots,P-1$. If we discard the terms corresponding to $m = 0$, then (3.52) reduces to
$$\sum_{m=1}^{P-1} c_m^{*}\, e^{j\alpha_m l_1}\left[\sum_{l=0}^{L} c_m \hat{\mathbf{h}}_q(l)\, e^{-j\alpha_m l}\right] = \sum_{m=1}^{P-1} c_m^{*}\, e^{j\alpha_m l_1}\left[\frac{1}{T}\sum_{n=0}^{T-1}\mathbf{y}(n)\, e^{-j(\omega_q + \alpha_m)n}\right] \tag{3.54}$$
for $q = 1,2,\ldots,Q$. The solution to (3.54) coincides with that of (3.53) if the matrix $\mathcal{C}$ in (3.13) has full column rank.

This result demonstrates that the estimator given by (3.46) is essentially the same as the one proposed in Section 3.2. The only difference is that in Section 3.2, since $\mathbf{m}$ is unknown, we simply discard all the terms corresponding to $m = 0$. However, omitting only the terms corresponding to $\omega_q + \alpha_m = 0$ is enough, and the terms corresponding to $m = 0$ but $\omega_q + \alpha_m \ne 0$ are still useful; retaining them gives the estimator proposed in this section.
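The general estimator of this section can be sketched numerically as follows, assuming NumPy. Rather than forming $\boldsymbol{\Omega}_{\mathrm{OP}}$ and $\boldsymbol{\theta}(n)$ explicitly, this sketch hands the same least-squares problem (minimizing $J$ jointly over $\mathbf{m}$ and the $\mathbf{h}_q(l)$, whose normal equations are (3.39) and (3.40)) to a generic solver. The function names `regressors` and `estimate_bem` are hypothetical, and the training is assumed to be periodic and already in steady state, so that $c(n-l)$ for $n < l$ is taken from the periodic extension.

```python
import numpy as np

def regressors(basis, c, L):
    """T x Q(L+1) matrix with columns psi_q(n) * c(n - l).

    Columns are ordered lag by lag (l = 0..L), with q running fastest,
    matching the stacking of (3.44)-(3.45). `c` holds one training period.
    """
    T = basis.shape[0]
    n = np.arange(T)
    cols = []
    for l in range(L + 1):
        c_shift = c[(n - l) % len(c)]        # periodic extension (assumption)
        cols.append(basis * c_shift[:, None])
    return np.hstack(cols)

def estimate_bem(y, basis, c, L):
    """Joint LS estimate of the BEM coefficients and the unknown mean.

    y: T x N observations; basis: T x Q (any linearly independent basis).
    Returns (H_hat, m_hat) with H_hat[l, q] the N-vector estimate for lag l
    and basis function q, and m_hat the N-vector mean estimate.
    """
    T, Q = basis.shape
    A = np.hstack([regressors(basis, c, L), np.ones((T, 1))])  # last column: mean
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    H_hat = theta[:-1].reshape(L + 1, Q, -1)
    return H_hat, theta[-1]
```

With the information sequence and noise switched off, the observations equal their mean and the coefficients are recovered exactly whenever the regressor matrix has full column rank; with data present, the estimate is perturbed by the information sequence, which acts as interference.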
Using the band-limitedness of the DPS-BEM (3.34), we have
$$\sum_{l=0}^{L} c_m \hat{\mathbf{h}}_q(l)\, e^{-j\alpha_m l} = \sum_{n=0}^{T-1}\mathbf{y}(n)\, u_q(n)\, e^{-j\alpha_m n},$$
which is similar to (3.53). Thus the estimator proposed in Section 3.3 is also a special case of the estimator defined by (3.41).

Remark 3.4.3: If the mean value of the noise $\mathbf{m}$ is zero or known, then instead of (3.41), the solution of the estimator is given by (3.40) with $\hat{\mathbf{m}} = \mathbf{m}$. We define
$$\tilde{\Omega}[(q,l),(q_1,l_1)] := \frac{1}{T}\sum_{n=0}^{T-1}\psi_q(n)\,\psi_{q_1}^{*}(n)\, c(n-l)\, c^{*}(n-l_1) \tag{3.55}$$
and $\tilde{\boldsymbol{\Omega}}_{\mathrm{OP}}$ in the same way as in (3.43), with $\Omega[(q,l),(q_1,l_1)]$ replaced by $\tilde{\Omega}[(q,l),(q_1,l_1)]$. Also define
$$\tilde{\boldsymbol{\theta}}(n) := \left[c^{*}(n)\ \ \cdots\ \ c^{*}(n-L)\right]^T \otimes \left[\psi_0^{*}(n)\ \ \cdots\ \ \psi_K^{*}(n)\right]^T.$$
The estimator is given by
$$\hat{\mathcal{H}} = \frac{1}{T}\sum_{n=0}^{T-1}\left[\tilde{\boldsymbol{\Omega}}_{\mathrm{OP}}^{-1}\tilde{\boldsymbol{\theta}}(n)\right] \otimes \mathbf{y}(n). \tag{3.56}$$
Note that the estimators given by (3.24) and (3.36), with zero or known $\mathbf{m}$, are special cases of the estimator of (3.56), using CE- and DPS-BEM respectively.

3.5 First-Order Statistics-Based Channel Estimation: Multiple-User (MIMO) Channels

The first-order statistics-based channel estimator using CE- or DPS-BEM can be easily extended to a multiple-user (MIMO) system by exploiting the band-limitedness of the basis functions.

Consider a MIMO FIR linear channel with $K$ inputs and $N$ outputs. Let $\{s_k(n)\}$ denote the $k$-th user's information sequence, which is input to the MIMO channel with the $k$-th user's time-varying discrete-time impulse response $\{\mathbf{h}_k(n;l)\}$ (channel response for the $k$-th user at time instance $n$ to a unit input at time instance $n-l$). The symbol-rate, $N$-column channel output vector is given by
$$\mathbf{x}(n) = \sum_{k=1}^{K}\sum_{l=0}^{L}\mathbf{h}_k(n;l)\, s_k(n-l). \tag{3.57}$$
The noisy measurement is given by
$$\mathbf{y}(n) = \mathbf{x}(n) + \mathbf{v}(n), \tag{3.58}$$
where $\mathbf{v}(n)$ is the additive $N$-column white complex Gaussian noise vector.
In superimposed training-based approaches, for the $k$-th user, one takes
$$s_k(n) = b_k(n) + c_k(n) \tag{3.59}$$
where $\{b_k(n)\}$ and $\{c_k(n)\}$ are the information sequence and the non-random periodic training sequence of the $k$-th user, respectively. Then the noisy channel output becomes
$$y(n) = \sum_{k=1}^{K}\sum_{l=0}^{L} h_k(n;l)\left[b_k(n-l) + c_k(n-l)\right] + v(n).$$
Assume the following:

(H3.5.1) The time-varying channel $\{h_k(n;l)\}$ satisfies CE- or DPS-BEM, i.e.,
$$h_k(n;l) = \sum_{q=1}^{Q} h_{qk}(l)\, e^{j\omega_q n} \tag{3.60}$$
where $h_{qk}(l)$ is the $N$-column time-invariant coefficient vector for the $k$-th user, or
$$h_k(n;l) = \sum_{q=1}^{Q} h_{qk}(l)\, u_q(n). \tag{3.61}$$
Also $N \ge 1$.

(H3.5.2) The information sequence $\{b_k(n)\}$ is zero-mean, white with $E\{|b_k(n)|^2\} = \sigma_{bk}^2$, and mutually independent for $k = 1, 2, \ldots, K$.

(H3.5.3) The measurement noise $\{v(n)\}$ is zero-mean, white, uncorrelated with $\{b_k(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$.

(H3.5.4) The superimposed training sequence $c_k(n) = c_k(n+P)$ for all $n$ is a non-random periodic sequence with period $P$ such that $c_{mk} \neq 0$ for all $m, k$, and $\tilde{P}$ is an integer with $P = \tilde{P}K$.

The expected value of the noisy channel output is given by
$$E\{y(n)\} = \sum_{k=1}^{K}\sum_{l=0}^{L} h_k(n;l)\, c_k(n-l). \tag{3.62}$$
We pick user-specific training sequences so that channel estimation is decoupled across the users; this allows us to use the single-user superimposed training-based approach outlined in Section 3.2. We assign distinct cycle frequencies of the periodic training sequences to distinct users. Suppose that for each user $k$, $\{c_k(n)\}$ is periodic with period $P = \tilde{P}K$ where $\tilde{P}$ is a positive integer. Then
$$c_k(n) = \sum_{m'=0}^{P-1} c_{m'k}\, e^{j(2\pi m'/P)n} \quad \text{for all } n,$$
where
$$c_{m'k} := \frac{1}{P}\sum_{n=0}^{P-1} c_k(n)\, e^{-j(2\pi m'/P)n}.$$
Pick $\{c_k(n)\}$ so that only $\tilde{P}$ of the $P$ coefficients $c_{m'k}$, associated with $\tilde{P}$ distinct frequencies, are nonzero.
For instance, we may choose
$$c_k(n) = \sum_{m=0}^{\tilde{P}-1} c_{mk}\, e^{j(2\pi/P)(Km+k-1)n}, \quad \text{for all } n \tag{3.63}$$
such that $c_{mk} \neq 0$ for all $m, k$. Define the frequencies
$$\alpha_{mk} := \frac{2\pi}{P}(Km + k - 1) \tag{3.64}$$
for $m = 0, 1, \ldots, \tilde{P}-1$ and $k = 1, 2, \ldots, K$.

For example, we show how to use m-sequences (maximal length pseudo-random binary sequences) for training. Pick $\tilde{P} = 2^n - 1$ for some integer $n$ such that $\tilde{P} \ge L + 2$. Let $\{\tilde{c}_0(n)\}$ be an m-sequence of length $\tilde{P}$. Pick the superimposed training sequence $\{c_1(n)\}_{n=0}^{P-1}$ for user 1 as $K$ repetitions of $\{\tilde{c}_0(n)\}$ multiplied by a factor $\sigma_{c1}$ so that $P^{-1}\sum_{n=0}^{P-1}|c_1(n)|^2 = \sigma_{c1}^2$. This choice satisfies (3.63) and (3.64) for $k = 1$. Pick $\tilde{c}_k(n) = \tilde{c}_1(n)\, e^{j(2\pi/P)(k-1)n}$ for $k = 2, 3, \ldots, K$ and $c_k(n) = \sigma_{ck}\tilde{c}_k(n)$. Then $\{c_k(n)\}_{n=0}^{P-1}$ satisfies (3.63) and (3.64) for $k \ge 2$. The above procedure can be used to generate a user-specific training sequence of period $P = \tilde{P}K$ from a sequence of period $\tilde{P}$.

It follows that
$$E\{y(n)\} = \sum_{k=1}^{K}\sum_{l=0}^{L} h_k(n;l)\left[\sum_{m=0}^{\tilde{P}-1} c_{mk}\, e^{j\alpha_{mk}(n-l)}\right], \quad \text{for all } n. \tag{3.65}$$
We can write (3.60) and (3.61) in a unified form, as in (2.20),
$$h_k(n;l) = \sum_{q=1}^{Q} h_{qk}(l)\, \psi_q(n), \tag{3.66}$$
where $\psi_q(n) = e^{j\omega_q n}$ in CE-BEM, and $\psi_q(n) = u_q(n)$ in DPS-BEM. The expected value of the observations (3.65) can be rewritten as
$$E\{y(n)\} = \sum_{k=1}^{K}\sum_{m=0}^{\tilde{P}-1}\sum_{q=1}^{Q}\left[\sum_{l=0}^{L} h_{qk}(l)\, c_{mk}\, e^{-j\alpha_{mk}l}\right]\psi_q(n)\, e^{j\alpha_{mk}n}, \quad \text{for all } n.$$
Define
$$d_{mqk} := \sum_{l=0}^{L} h_{qk}(l)\, c_{mk}\, e^{-j\alpha_{mk}l}. \tag{3.67}$$
We have
$$E\{y(n)\} = \sum_{k=1}^{K}\sum_{m=0}^{\tilde{P}-1}\sum_{q=1}^{Q} d_{mqk}\, \psi_q(n)\, e^{j\alpha_{mk}n}.$$
It follows that
$$y(n) = \sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{m=0}^{\tilde{P}-1} d_{mqk}\, \psi_q(n)\, e^{j\alpha_{mk}n} + e(n) \tag{3.68}$$
where $\{e(n)\}$ is a zero-mean random sequence.
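The m-sequence construction described above can be checked numerically. The following is a minimal sketch, assuming unit scale factors for all users and using the length-7 m-sequence from the simulation section; the function names are hypothetical and not part of the dissertation. It builds the K = 2 user sequences of period P = 14 and confirms that user k occupies only the frequency indices Km + k − 1, so the users' cycle frequencies do not overlap.

```python
import cmath


def user_training_sequences(base, K):
    """Build K user-specific sequences of period P = len(base) * K.

    User 1 gets K repetitions of `base`; user k gets user 1's sequence
    modulated by exp(j*2*pi*(k-1)*n/P). Unit scale factors are assumed.
    """
    p_tilde = len(base)
    P = p_tilde * K
    c1 = [base[n % p_tilde] for n in range(P)]
    return [
        [c1[n] * cmath.exp(2j * cmath.pi * (k - 1) * n / P) for n in range(P)]
        for k in range(1, K + 1)
    ]


def fourier_coefficients(c):
    """Coefficients c_m = (1/P) * sum_n c(n) * exp(-j*2*pi*m*n/P)."""
    P = len(c)
    return [
        sum(c[n] * cmath.exp(-2j * cmath.pi * m * n / P) for n in range(P)) / P
        for m in range(P)
    ]


def support(coeffs, tol=1e-9):
    """Indices of the coefficients that are numerically nonzero."""
    return {m for m, v in enumerate(coeffs) if abs(v) > tol}


base = [1, -1, -1, 1, 1, 1, -1]  # m-sequence of period 7
K = 2
seqs = user_training_sequences(base, K)
supports = [support(fourier_coefficients(c)) for c in seqs]
for k, s in enumerate(supports, start=1):
    print(f"user {k}: nonzero frequency indices {sorted(s)}")
```

With these inputs, user 1 occupies the even frequency indices and user 2 the odd ones, which is the decoupling property that the per-user estimator relies on. Note that the modulated sequence for user 2 is complex-valued.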
Define the cost function by (3.68)
$$J = \sum_{n=0}^{T-1}\|e(n)\|^2 = \sum_{n=0}^{T-1}\left\| y(n) - \sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{m=0}^{\tilde{P}-1} d_{mqk}\, \psi_q(n)\, e^{j\alpha_{mk}n}\right\|^2.$$
Choose the $d_{mqk}$'s to minimize $J$. We must have
$$\left.\frac{\partial J}{\partial d_{mqk}^*}\right|_{d_{mqk}=\hat{d}_{mqk}} = -\sum_{n=0}^{T-1}\left[y(n) - \sum_{k'=1}^{K}\sum_{q'=1}^{Q}\sum_{m'=0}^{\tilde{P}-1}\hat{d}_{m'q'k'}\, \psi_{q'}(n)\, e^{j\alpha_{m'k'}n}\right]\psi_q^*(n)\, e^{-j\alpha_{mk}n} = 0,$$
leading to
$$\sum_{k'=1}^{K}\sum_{q'=1}^{Q}\sum_{m'=0}^{\tilde{P}-1}\hat{d}_{m'q'k'}\left[\sum_{n=0}^{T-1}\psi_{q'}(n)\,\psi_q^*(n)\, e^{j(\alpha_{m'k'}-\alpha_{mk})n}\right] = \sum_{n=0}^{T-1} y(n)\,\psi_q^*(n)\, e^{-j\alpha_{mk}n}. \tag{3.69}$$
For CE-BEM, suppose that we pick $P$ such that the $(\omega_q + \alpha_{mk})$'s are all distinct for any choice of $m$, $k$ and $q$, e.g., take $T/P \ge Q$. Then the sequence $E\{y(n)\}$ is periodic [14] with cycle frequencies $(\omega_q + \alpha_{mk})$, where $1 \le q \le Q$, $0 \le m \le \tilde{P}-1$ and $1 \le k \le K$, so that we have
$$\sum_{n=0}^{T-1} e^{j(\omega_{q'}-\omega_q+\alpha_{m'k'}-\alpha_{mk})n} = T\,\delta(m'-m)\,\delta(k'-k)\,\delta(q'-q). \tag{3.70}$$
It follows that
$$\hat{d}_{mqk} = \frac{1}{T}\sum_{n=0}^{T-1} y(n)\, e^{-j(\omega_q+\alpha_{mk})n}. \tag{3.71}$$
If DPS-BEM applies, we use the fact that DPS sequences are approximately band-limited to the normalized frequency range $[-f_dT_s, f_dT_s]$. Then if $2f_dT_s \le 1/P$, we use the approximation
$$\sum_{n=0}^{T-1} u_{q'}(n)\, u_q(n)\, e^{j(\alpha_{m'k'}-\alpha_{mk})n} \approx \delta(k'-k)\,\delta(m'-m)\,\delta(q'-q). \tag{3.72}$$
By using the approximation (3.72),
$$\hat{d}_{mqk} = \sum_{n=0}^{T-1} y(n)\, u_q(n)\, e^{-j\alpha_{mk}n}. \tag{3.73}$$
For $0 \le m \le \tilde{P}-1$ and $1 \le k \le K$, define an $NQ$-column vector
$$D_{mk} := \begin{bmatrix} d_{m1k}^T & d_{m2k}^T & \cdots & d_{mQk}^T \end{bmatrix}^T, \tag{3.74}$$
and for $0 \le l \le L$ and $1 \le k \le K$, define an $NQ$-column vector
$$H_{kl} := \begin{bmatrix} h_{1k}^H(l) & h_{2k}^H(l) & \cdots & h_{Qk}^H(l) \end{bmatrix}^H.$$
Then by equation (3.67), we have
$$D_{mk} = \sum_{l=0}^{L} c_{mk}\, e^{-j\alpha_{mk}l}\, H_{kl} \tag{3.75}$$
for $0 \le m \le \tilde{P}-1$ and $1 \le k \le K$. Define a $\tilde{P} \times (L+1)$ matrix
$$\tilde{V}_k := \begin{bmatrix} 1 & e^{-j\alpha_{0k}} & \cdots & e^{-j\alpha_{0k}L} \\ 1 & e^{-j\alpha_{1k}} & \cdots & e^{-j\alpha_{1k}L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{(\tilde{P}-1)k}} & \cdots & e^{-j\alpha_{(\tilde{P}-1)k}L} \end{bmatrix}, \tag{3.76}$$
an $NQ\tilde{P} \times NQ(L+1)$ matrix
$$\tilde{C}_k := \left(\mathrm{diag}\left\{c_{0k}, c_{1k}, \ldots, c_{(\tilde{P}-1)k}\right\}\tilde{V}_k\right) \otimes I_{NQ},$$
an $NQ(L+1)$-column vector
$$H_k = \begin{bmatrix} H_{k0}^H & H_{k1}^H & \cdots & H_{kL}^H \end{bmatrix}^H \tag{3.77}$$
and an $NQ\tilde{P}$-column vector
$$\tilde{D}_k = \begin{bmatrix} D_{0k}^H & D_{1k}^H & \cdots & D_{(\tilde{P}-1)k}^H \end{bmatrix}^H. \tag{3.78}$$
Then (3.75) leads to
$$\tilde{C}_k H_k = \tilde{D}_k. \tag{3.79}$$
Since the $\alpha_{mk}$'s are distinct and $c_{mk} \neq 0$ for all $m, k$, $\mathrm{rank}(\tilde{C}_k) = NQ(L+1)$ if $\tilde{P} \ge L+1$; hence, we can determine the $h_{qk}(l)$'s uniquely. Define $\hat{D}_{mk}$ as in (3.74) with $d_{mqk}$ replaced with $\hat{d}_{mqk}$, and define $\hat{\tilde{D}}_k$ as in (3.78) with $D_{mk}$ replaced with $\hat{D}_{mk}$. The estimate of $H_k$ is given by
$$\hat{H}_k = (\tilde{C}_k^H\tilde{C}_k)^{-1}\tilde{C}_k^H\hat{\tilde{D}}_k. \tag{3.80}$$
Denote the corresponding estimate of $h_{qk}(l)$ as $\hat{h}_{qk}(l)$. Following (3.66), for $k = 1, 2, \ldots, K$, $l = 0, 1, \ldots, L$ and for all $n$, the estimate of the time-varying channel is given by
$$\hat{h}_k(n;l) = \sum_{q=1}^{Q}\hat{h}_{qk}(l)\,\psi_q(n). \tag{3.81}$$

[Figure 3.1: First-order statistics-based estimator (SISO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-BEM's completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). Plot settings: Viterbi detector, K = N = 1, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)]

Remark 3.5.1: Suppose the additive noise $v(n)$ is nonzero-mean ($E\{v(n)\} = \mathbf{m}$) with the mean $\mathbf{m}$ unknown. Then (3.69) can be modified as
$$\sum_{k'=1}^{K}\sum_{q'=1}^{Q}\sum_{m'=0}^{\tilde{P}-1}\hat{d}_{m'q'k'}\left[\sum_{n=0}^{T-1}\psi_{q'}(n)\,\psi_q^*(n)\, e^{j(\alpha_{m'k'}-\alpha_{mk})n}\right] = \sum_{n=0}^{T-1}\left[y(n) - \mathbf{m}\right]\psi_q^*(n)\, e^{-j\alpha_{mk}n}.$$
Note that
$$\sum_{n=0}^{T-1} e^{-j(\omega_q+\alpha_{mk})n} = T\,\delta(\omega_q+\alpha_{mk}),$$
and if $2f_dT_s \le 1/P$, approximately we have
$$\sum_{n=0}^{T-1} u_q(n)\, e^{-j\alpha_{mk}n} \approx 0, \quad \text{for all } \alpha_{mk} \neq 0.$$
Since $T/P \ge Q$ and $\alpha_{mk} = 0$ only happens when $m = 0$ and $k = 1$, for every index $m \neq 0$ we have
$$\sum_{n=0}^{T-1}\mathbf{m}\,\psi_q^*(n)\, e^{-j\alpha_{mk}n} = 0.$$
We hence omit the terms corresponding to $m = 0$, and then the noise mean $\mathbf{m}$ has no effect on estimation. We need only omit the first row of $\tilde{V}_k$ in (3.76), and the resulting $(\tilde{P}-1) \times (L+1)$ matrix is denoted by
$$V_k := \begin{bmatrix} 1 & e^{-j\alpha_{1k}} & \cdots & e^{-j\alpha_{1k}L} \\ 1 & e^{-j\alpha_{2k}} & \cdots & e^{-j\alpha_{2k}L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{(\tilde{P}-1)k}} & \cdots & e^{-j\alpha_{(\tilde{P}-1)k}L} \end{bmatrix}.$$
We also define an $NQ(\tilde{P}-1) \times NQ(L+1)$ matrix
$$C_k := \left(\mathrm{diag}\left\{c_{1k}, c_{2k}, \ldots, c_{(\tilde{P}-1)k}\right\}V_k\right) \otimes I_{NQ}$$
and an $NQ(\tilde{P}-1)$-column vector
$$D_k = \begin{bmatrix} D_{1k}^H & D_{2k}^H & \cdots & D_{(\tilde{P}-1)k}^H \end{bmatrix}^H.$$
Then as in (3.79), we have $C_kH_k = D_k$, so that the channel estimate is given by
$$\hat{H}_k = (C_k^HC_k)^{-1}C_k^H\hat{D}_k,$$
where $\hat{D}_k$ is also acquired by (3.73). For identifiability, we now need $\tilde{P} \ge L+2$.

[Figure 3.2: First-order statistics-based estimator (SISO): BER vs SNR under fd = 50 Hz and K = N = 1. Plot settings: Viterbi detector, K = N = 1, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)]

[Figure 3.3: First-order statistics-based estimator (SISO): BER vs SNR under fd = 100 Hz and K = N = 1. Plot settings and curves as in Figure 3.2, with fd = 100 Hz.]

[Figure 3.4: First-order statistics-based estimator (SISO): BER vs SNR under fd = 200 Hz and K = N = 1. Plot settings and curves as in Figure 3.2, with fd = 200 Hz.]

[Figure 3.5: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-BEM's completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). Plot settings: K = N = 1, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Axes: SNR (dB) vs Normalized Channel MSE (dB). Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)]

[Figure 3.6: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 50 Hz and K = N = 1. Plot settings and curves as in Figure 3.5, with fd = 50 Hz.]

3.6 Simulation Examples

3.6.1 First-Order Statistics-Based Estimator: Single User

In this example, we generate a doubly-selective Rayleigh fading channel as described in Section 2.6.2, with N = 1 and L = 2, satisfying the modified Jakes' model. We also employ the communication system described in Section 2.6.3. We emphasize again that BEM's are only used for processing at the receiver; the "true" channel follows Jakes' model, not BEM's.

In simulations, we pick a data record length of 420 symbols (time duration of approximately 10 ms). We consider the system operating under different Doppler spreads. For the Doppler spreads fd = 0, 50, 100, and 200 Hz, we take Q = 1, 3, 5, and 7 in CE-BEM by (2.9b), and Q = 1, 3, 4, and 6 for OP- and DPS-BEM representations by (2.18). The average transmitted power in {c(n)} is 0.3 of that in {b(n)}, leading to a training-to-information power ratio (TIR) of 0.3.

We consider a single-user scenario. The information sequence {b(n)} and the training sequence {c(n)} are both modulated by binary phase-shift keying (BPSK). The periodic training sequence {c(n)} is generated from the m-sequence of period P = 7, one period of which is given by
$$\{c_1(n)\}_{n=0}^{6} = \{1, -1, -1, 1, 1, 1, -1\}. \tag{3.82}$$
To compare different estimators under equal conditions, we assume the additive noise $\{v(n)\}$ is zero-mean (i.e., $\mathbf{m} = 0$), white complex-Gaussian, uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N\delta(\tau)$, so that no terms are discarded. The (receiver) SNR refers to the energy per bit over one-sided noise spectral density, with both information and superimposed training sequence counting toward the bit energy.

The results are shown in Figures 3.1–3.8 for various Doppler spreads and SNR's. The results are based on 500 Monte Carlo runs for Viterbi detectors (see Appendix B.1). For comparison, CE-, OP- and DPS-BEM-based periodically placed TM training with zero-padding (see Appendix A) is also considered for doubly-selective channel estimation.
We take a training session of length 2L + 1 = 5 symbols with the training sequence $\{0, 0, \sqrt{2L+1}, 0, 0\}$, and at the receiver an LS estimation is performed. A data session of 17 symbols is inserted between two such training sessions to form a frame of length 22 symbols. Such a frame is repeated over a record length of 418 symbols. Thus, we have a training-to-information bit ratio as well as a training-to-information power ratio of about 0.3 (5/17 ≈ 0.29).

[Figure 3.7: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 100 Hz and K = N = 1. Plot settings: K = N = 1, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Axes: SNR (dB) vs Normalized Channel MSE (dB). Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)]

[Figure 3.8: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 200 Hz and K = N = 1. Plot settings and curves as in Figure 3.7, with fd = 200 Hz.]

For comparison, we plot the results of CE-, OP-, and DPS-based superimposed and TM training approaches in each figure. Figures 3.1–3.4 show the BER's with a Viterbi detector at the receiver. Figures 3.5–3.8 show the corresponding normalized channel mean square error (NCMSE), which is defined as
$$\mathrm{NCMSE} := \frac{\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2}\left\|h^{(i)}(n;l) - \hat{h}^{(i)}(n;l)\right\|^2}{\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2}\left\|h^{(i)}(n;l)\right\|^2} \tag{3.83}$$
where $M_c$ denotes the number of Monte Carlo runs, $h^{(i)}(n;l)$ denotes the $i$-th realization of the time-varying channel, and $\hat{h}^{(i)}(n;l)$ denotes the acquired channel estimate.

[Figure 3.9: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEM's completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). Plot settings: K = N = 2, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS, each with a Kalman and a Viterbi detector. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)]

It is seen from the figures that the DPS-BEM-based estimators, no matter whether superimposed or TM training is used, outperform the CE- and OP-BEM-based solutions. (For fd = 0 and Q = 1, all three models give the same results, since they all use constants as the basis functions. The performances of superimposed and TM training, however, are different.) This is consistent with the fact that DPS-BEM can efficiently remove spectral leakage, and it is a much better model for describing a band-limited channel. Due to severe spectral leakage, the CE-BEM-based results are often the worst among the three.

[Figure 3.10: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 50 Hz and K = N = 2. Plot settings and curves as in Figure 3.9, with fd = 50 Hz.]

Comparing superimposed training with TM training, we see that the superimposed training-based estimators perform worse than their TM counterparts, although superimposed training can offer a higher data transmission rate. As we have discussed in Chapter 1, the major issue of superimposed training is the information-induced interference (self-interference), which results in a notable error floor in the BER and NCMSE curves.
We will discuss the issue of self-interference in Chapters 4–6.

3.6.2 First-Order Statistics-Based Estimator: Multiple Users

In this example, we follow the conditions addressed in Section 3.6.1 except that a multiple-user scenario is considered.

[Figure 3.11: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 100 Hz and K = N = 2. Plot settings: K = N = 2, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS, each with a Kalman and a Viterbi detector. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)]

In simulations, all the users have the same transmitted power in training and information data. The average transmitted power in $\{c_k(n)\}$ is 0.3 of that in $\{b_k(n)\}$ ($k = 1, 2, \ldots, K$), leading to the same TIR as in Section 3.6.1. We consider a simple two-user scenario, i.e., K = 2, with two receive antennas, i.e., N = 2. The information sequences $\{b_k(n)\}$ and the training sequences $\{c_k(n)\}$ are all BPSK modulated. The training sequence is generated from the m-sequence of period $\tilde{P} = 7$ by the procedure introduced in Section 3.5. The training sequences are of length P = 14, and the training sequence for the first user is
$$\{c_1(n)\}_{n=0}^{13} = \{1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1\}, \tag{3.84}$$
i.e., two repetitions of the m-sequence of period $\tilde{P} = 7$.

[Figure 3.12: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 200 Hz and K = N = 2. Plot settings and curves as in Figure 3.11, with fd = 200 Hz.]

The additive noise $\{v(n)\}$ is also zero-mean, white complex-Gaussian, uncorrelated with $\{b_k(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_2\delta(\tau)$. The (receiver) SNR refers to the energy per bit per user over one-sided noise spectral density, with both information and superimposed training sequence counting toward the bit energy.

At the receiver end, a Viterbi detector or a Kalman filter (see Appendix B.2) acts as the symbol detector. We consider different Doppler spreads of fd = 0, 50, 100, and 200 Hz for this communications system. We also pick Q for CE-BEM as 1, 3, 5, 7 by (2.9b) and for DPS-BEM as 1, 3, 4, 6 by (2.18).

The results for a record length of T = 420 symbols are shown in Figures 3.9–3.16 for various Doppler spreads and SNR's. The results are based on 500 Monte Carlo runs for Viterbi detectors, and 1000 runs for Kalman filters. For comparison, CE-BEM and DPS-BEM-based periodically placed TM training (see Appendix A) is also considered for doubly-selective channel estimation. We take a training session of length $(K+1)L + K = 8$ symbols with the first user's training sequence $\{0, 0, \sqrt{(K+1)L+K}, 0, 0, 0, 0, 0\}$ and the second user's $\{0, 0, 0, 0, 0, \sqrt{(K+1)L+K}, 0, 0\}$.

[Figure 3.13: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEM's completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). Plot settings: K = N = 2, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Axes: SNR (dB) vs Normalized Channel MSE (dB). Curves: SI&CE, SI&DPS, TM&CE, TM&DPS. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)]
An information data session of 27 symbols is inserted between two such training sessions to form a frame of length 35 symbols. Such a frame is repeated over a record length of 420 symbols. Thus, we have a training-to-information bit and power ratio of about 0.3. For multiple-user communications, the NCMSE is defined as
$$\mathrm{NCMSE} := \frac{\sum_{k=1}^{K}\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2}\left\|h_k^{(i)}(n;l) - \hat{h}_k^{(i)}(n;l)\right\|^2}{\sum_{k=1}^{K}\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2}\left\|h_k^{(i)}(n;l)\right\|^2}. \tag{3.85}$$

[Figure 3.14: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 50 Hz and K = N = 2. Plot settings: K = N = 2, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Axes: SNR (dB) vs Normalized Channel MSE (dB). Curves: SI&CE, SI&DPS, TM&CE, TM&DPS. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)]

We plot the curves for Viterbi detectors and Kalman filters in each figure. The discussion in Section 3.6.1 for the SISO channel applies to the MIMO channel also: DPS-BEM performs best and CE-BEM is the worst; TM training outperforms its superimposed rival. The optimal Viterbi detector shows its advantage in error probability over the Kalman filter, at the expense of increased computational complexity.

[Figure 3.15: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 100 Hz and K = N = 2. Plot settings and curves as in Figure 3.14, with fd = 100 Hz.]

3.7 Conclusions

In this chapter, we discussed a first-order statistics-based estimator of doubly-selective channels using superimposed training and BEM's.
Our starting point was the CE-BEM-based estimator proposed by [81]. Due to the spectral leakage of CE-BEM, this estimator does not perform well in estimating a band-limited channel. We thus extended the estimator to use DPS- and OP-BEM's to reduce the modeling error. We further considered this estimator in a multiple-user scenario. By assigning distinct cycle frequencies of the periodic training sequences to distinct users, channel estimation across the users is decoupled so that the single-user estimator can be used. Our schemes are illustrated by simulation examples and compared with conventional TM training: although a higher transmission rate is achieved by superimposed training, the performance of the proposed estimator using superimposed training is inferior to that of TM training, due to the existence of information-induced self-interference. Analyzing and reducing self-interference will be the topic of the following three chapters.

[Figure 3.16: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 200 Hz and K = N = 2. Plot settings: K = N = 2, L = 2, T = 420, Ts = 25 μs, TIR = 0.3, P = 7, 500 runs. Axes: SNR (dB) vs Normalized Channel MSE (dB). Curves: SI&CE, SI&DPS, TM&CE, TM&DPS. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)]

Chapter 4

Performance Analysis and Parameter Design for First-Order Statistics-Based Estimator

4.1 Introduction

A first-order statistics-based channel estimator, using superimposed training and various BEM's, was discussed in Chapter 3. We present performance analysis of this estimator in this chapter.

Several parameters may affect the performance of the estimator.
For example, a portion of the transmitted power is allocated to the superimposed training, and this portion clearly affects the estimator's behavior: more training power leads to higher estimation accuracy, but the reduction in information power worsens the effective information SNR at the same time. Therefore, a trade-off must be made to balance the power allocation between training and information. A similar consideration lies in selecting the number of basis functions for BEM's. More basis functions yield a more accurate approximation (or less bias, see Section 2.6), but higher estimation variance. A trade-off between bias and variance should also be studied to achieve better estimation.

In Section 4.2, assuming that the "true" channel follows a BEM, performance analysis for the estimators using CE-, DPS-, and OP-BEM's is explored; performance analysis for the estimator of multiple-user channels is also discussed in this section. In Section 4.3, the modeling error of a BEM is taken into account in the performance analysis. Based on the results of the performance analysis, we cast the issues of power allocation (Section 4.4) and bias-variance trade-off (Section 4.5) as ones of optimizing an SNR for equalizer design. Simulation results are provided in Section 4.6. Section 4.7 concludes this chapter.

4.2 Performance Analysis for the First-Order Statistics-Based Estimator Using BEM

We assume the following.

(H4.2.1) The SIMO channel satisfies a BEM representation as in (2.20), i.e.,
$$h(n;l) = h_{\mathrm{BEM}}(n;l) = \sum_{q=1}^{Q} h_q(l)\,\psi_q(n), \tag{4.1}$$
where $\psi_q(n)$ is the $q$-th basis function (corresponding to $e^{j\omega_q n}$ in CE-BEM (2.9), the $(q-1)$-th orthogonal polynomial in OP-BEM (2.15), and $u_q(n)$ in DPS-BEM (2.19)), and $Q$ is the number of basis functions (note that $Q = K + 1$ for $K$ in (2.15)). Also $N \ge 1$.

(H4.2.2) The information sequence $\{b(n)\}$ is zero-mean, white with $E\{|b(n)|^2\} = \sigma_b^2$.

(H4.2.3) The measurement noise $\{v(n)\}$ is zero-mean, white, uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N\delta(\tau)$.
(H4.2.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$, with average power $\sigma_c^2 := \sum_{n=0}^{P-1}|c(n)|^2/P$.

(H4.2.5) The time-varying channel $\{h(n;l)\}$ is zero-mean, complex Gaussian with variance $\sigma_h^2$, and mutually independent for distinct $l$'s: $E\{h(n;l)h^H(n;l)\} = \sigma_h^2 I_N$ and $E\{h(n_1;l_1)h^H(n_2;l_2)\} = 0$ for $l_1 \neq l_2$ and all $n_1, n_2$, i.e., different channel taps are independent of each other and are identically distributed zero-mean complex Gaussian.

We wish to evaluate the MSE of channel estimation when the true channel follows (4.1). The MSE of estimation is defined as
$$\mathrm{MSE}_1 = \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\left\|h_{\mathrm{BEM}}(n;l) - \hat{h}_{\mathrm{BEM}}(n;l)\right\|^2\right\} \tag{4.2}$$
where $\hat{h}_{\mathrm{BEM}}(n;l)$ is also given by the BEM
$$\hat{h}_{\mathrm{BEM}}(n;l) = \sum_{q=1}^{Q}\hat{h}_q(l)\,\psi_q(n). \tag{4.3}$$
By (H4.2.4), we define the normalized training sequence as $\bar{c}(n) = c(n)/\sigma_c$ for all $n$, and $\bar{c}_m := c_m/\sigma_c$, $m = 0, 1, \ldots, P-1$, where
$$c_m := \frac{1}{P}\sum_{n=0}^{P-1} c(n)\, e^{-j\alpha_m n}, \qquad c(n) = \sum_{m=0}^{P-1} c_m\, e^{j\alpha_m n} \ \text{ for all } n, \qquad \alpha_m = 2\pi m/P, \quad m = 0, 1, \ldots, P-1.$$

4.2.1 Performance Analysis for CE-BEM-Based Estimator

Since we have assumed the measurement noise $\{v(n)\}$ is zero-mean, the CE-BEM-based channel estimate is acquired through (3.24), and
$$\hat{h}_{\mathrm{BEM}}(n;l) = \sum_{q=1}^{Q}\hat{h}_q(l)\, e^{j\omega_q n}, \quad l = 0, 1, \ldots, L, \quad 0 \le n \le T-1.$$
Let
$$E_m(n) = \begin{bmatrix} e^{-j(\omega_1+\alpha_m)n} & e^{-j(\omega_2+\alpha_m)n} & \cdots & e^{-j(\omega_Q+\alpha_m)n} \end{bmatrix}^T,$$
and
$$E(n) = \begin{bmatrix} E_0^H(n) & E_1^H(n) & \cdots & E_{P-1}^H(n) \end{bmatrix}^H. \tag{4.4}$$
By (3.6),
$$\hat{\tilde{D}} = \frac{1}{T}\sum_{n=0}^{T-1} E(n) \otimes y(n),$$
and by (3.24), we have
$$\hat{H} = \frac{1}{T}(\tilde{C}^H\tilde{C})^{-1}\tilde{C}^H\sum_{n=0}^{T-1} E(n) \otimes y(n). \tag{4.5}$$
We define
$$\tilde{x}(n) := y(n) - E\{y(n)\,|\,H\} = \sum_{l=0}^{L} h(n;l)\, b(n-l) + v(n), \tag{4.6}$$
then
$$E\left\{\tilde{x}(n_1)\tilde{x}^H(n_2)\,\middle|\,H\right\} = \sum_{l_1=0}^{L}\sum_{l_2=0}^{L} h(n_1;l_1)h^H(n_2;l_2)\,\sigma_b^2\,\delta(n_1-n_2-l_1+l_2) + \sigma_v^2 I_N\delta(n_1-n_2).$$
Using (H4.2.5),
$$E_H\left\{E\left\{\tilde{x}(n_1)\tilde{x}^H(n_2)\,\middle|\,H\right\}\right\} = \left[(L+1)\sigma_h^2\sigma_b^2 + \sigma_v^2\right]I_N\delta(n_1-n_2).$$
Since $\sum_{n=0}^{T-1}E(n)E^H(n) = TI_{PQ}$, by defining $\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\} := E\{[\hat{H}-H][\hat{H}-H]^H\}$ and using (4.5) we have (see also (3.21)–(3.23))
$$\begin{aligned}
E_H\left\{\mathrm{cov}\left\{\hat{H},\hat{H}\,\middle|\,H\right\}\right\}
&= \frac{1}{T^2}(\tilde{C}^H\tilde{C})^{-1}\tilde{C}^H\,\mathrm{cov}\left\{\sum_{n=0}^{T-1}E(n)\otimes y(n),\ \sum_{n=0}^{T-1}E(n)\otimes y(n)\right\}\tilde{C}(\tilde{C}^H\tilde{C})^{-1} \\
&= \frac{1}{T^2}(\tilde{C}^H\tilde{C})^{-1}\tilde{C}^H\left[\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}E(n_1)E^H(n_2)\otimes\mathrm{cov}\{y(n_1),y(n_2)\}\right]\tilde{C}(\tilde{C}^H\tilde{C})^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T^2}(\tilde{C}^H\tilde{C})^{-1}\tilde{C}^H\left[\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}E(n_1)E^H(n_2)\otimes I_N\delta(n_1-n_2)\right]\tilde{C}(\tilde{C}^H\tilde{C})^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}(\tilde{C}^H\tilde{C})^{-1}\tilde{C}^H\left(I_{PQ}\otimes I_N\right)\tilde{C}(\tilde{C}^H\tilde{C})^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\left(\tilde{C}^H\tilde{C}\right)^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\left(\tilde{V}^H\mathrm{diag}\left\{|c_0|^2,|c_1|^2,\ldots,|c_{P-1}|^2\right\}\tilde{V}\right)^{-1}\otimes I_{NQ}.
\end{aligned}$$
Since $\hat{\tilde{D}} = \tilde{C}\hat{H}$, then $E\{\hat{\tilde{D}}\} = \tilde{C}E\{\hat{H}\}$. By (3.9), it follows that
$$\begin{aligned}
E\{\hat{d}_{mq}\} &= \frac{1}{T}\sum_{n=0}^{T-1}E\{y(n)\}\,e^{-j(\omega_q+\alpha_m)n} \\
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}h(n;l)\,c(n-l)\,e^{-j(\omega_q+\alpha_m)n} \\
&= \sum_{l=0}^{L}\sum_{q'=1}^{Q}h_{q'}(l)\sum_{m'=0}^{P-1}c_{m'}e^{-j\alpha_{m'}l}\left[\frac{1}{T}\sum_{n=0}^{T-1}e^{-j(\omega_q-\omega_{q'}+\alpha_m-\alpha_{m'})n}\right] \\
&= \sum_{l=0}^{L}\sum_{q'=1}^{Q}h_{q'}(l)\sum_{m'=0}^{P-1}c_{m'}e^{-j\alpha_{m'}l}\,\delta(m-m')\,\delta(q-q') \\
&= \sum_{l=0}^{L}h_q(l)\,c_m\,e^{-j\alpha_m l}.
\end{aligned}$$
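The step in the derivation above that replaces the time average of a complex exponential with a product of Kronecker deltas can be verified numerically. The sketch below is an illustration under stated assumptions: the CE-BEM frequencies are taken as ω_q = 2π(q − (Q+1)/2)/T, the training frequencies as α_m = 2πm/P, and the function name `time_average` is hypothetical. The values T = 420, P = 7, Q = 3 come from the simulation settings of Chapter 3 and satisfy T/P ≥ Q.

```python
import cmath


def time_average(T, P, Q, q, qp, m, mp):
    """(1/T) * sum_n exp(-j*(omega_q - omega_qp + alpha_m - alpha_mp)*n).

    Assumes omega_q = 2*pi*(q - (Q+1)/2)/T and alpha_m = 2*pi*m/P.
    """
    omega = lambda i: 2 * cmath.pi * (i - (Q + 1) / 2) / T
    alpha = lambda i: 2 * cmath.pi * i / P
    w = omega(q) - omega(qp) + alpha(m) - alpha(mp)
    return sum(cmath.exp(-1j * w * n) for n in range(T)) / T


T, P, Q = 420, 7, 3
max_err = 0.0
for q in range(1, Q + 1):
    for qp in range(1, Q + 1):
        for m in range(P):
            for mp in range(P):
                # Expected value: delta(m - m') * delta(q - q')
                expected = 1.0 if (q == qp and m == mp) else 0.0
                err = abs(time_average(T, P, Q, q, qp, m, mp) - expected)
                max_err = max(max_err, err)
print("largest deviation from the delta product:", max_err)
```

Because T/P is an integer no smaller than Q here, every cycle frequency falls on a distinct DFT bin of length T, which is why the average collapses exactly to the delta product.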
Therefore, E{??D} = ?CH, or ?CE braceleftBig? H bracerightBig = ?CH. Since ?C is full column-rank and P ?L+1, we have E braceleftBig? H bracerightBig = H. Now we evaluate the MSE of the channel estimate given by (4.3): MSE1 = 1T T?1summationdisplay n=0 Lsummationdisplay l=0 E ? ? ? ? ? Qsummationdisplay q1=1 bracketleftBig hq1 (l)??hq1 (l) bracketrightBigH e?j?q1n ? ? ? ? Qsummationdisplay q2=1 bracketleftBig hq2 (l)??hq2 (l) bracketrightBig ej?q2n ? ? ? ? ? = Lsummationdisplay l=0 E ?? ? Qsummationdisplay q1=1 Qsummationdisplay q2=1 bracketleftBig hq1 (l)??hq1 (l) bracketrightBigHbracketleftBig hq2 (l)??hq2 (l) bracketrightBigbracketleftBigg1 T T?1summationdisplay n=0 e?j(?q1??q2)n bracketrightBigg?? ? = Lsummationdisplay l=0 E ? ? ? Qsummationdisplay q1=1 Qsummationdisplay q2=1 bracketleftBig hq1 (l)??hq1 (l) bracketrightBigHbracketleftBig hq2 (l)??hq2 (l) bracketrightBig ?(q1 ?q2) ? ? ? = tr braceleftBig EH braceleftBig cov braceleftBig ? H, ?H vextendsinglevextendsingle vextendsingleH bracerightBigbracerightBigbracerightBig = bracketleftbig(L+1)?2 h? 2 b +? 2vbracketrightbigNQ T?2c tr braceleftbiggparenleftBig ?VH diagbraceleftBig|?c0|2,|?c1|2,...,|?cP?1|2bracerightBig?VparenrightBig?1 bracerightbigg . (4.7) 76 Remark 4.2.1.1: If the zero-mean assumption of (H4.2.1.3) is relaxed, i.e., we follow the assumption (H3.2.3) instead, then the estimator follows (3.18) and the terms corre- sponding to m = 0 are discarded. We omit the entry E0 (n) in (4.4), and use C, V, and ?D in (3.13) (3.14), and (3.16) instead of ?C, ?V, and ??D. We have the MSE of the channel estimation when the mean of the noise is unknown: MSE1 = bracketleftbig(L+1)?2 h? 2 b +? 2vbracketrightbigNQ T?2c tr braceleftbiggparenleftBig VH diag braceleftBig |?c1|2,|?c2|2,...,|?cP?1|2 bracerightBig V parenrightBig?1bracerightbigg . 
(4.8)

Remark 4.2.1.2: If we define an interference factor $I_f$ as
\[
I_f = \frac{NQ}{T\sigma_c^2}\,\mathrm{tr}\Big\{\big(\tilde{V}^H\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{P-1}|^2\}\,\tilde{V}\big)^{-1}\Big\}
\quad\text{or}\quad
I_f = \frac{NQ}{T\sigma_c^2}\,\mathrm{tr}\Big\{\big(V^H\,\mathrm{diag}\{|\bar{c}_1|^2,|\bar{c}_2|^2,\ldots,|\bar{c}_{P-1}|^2\}\,V\big)^{-1}\Big\},
\]
we can clearly see that the MSE of the channel estimation consists of two parts: one, $(L+1)\sigma_h^2\sigma_b^2 I_f$, comes from the self-interference, and the other, $\sigma_v^2 I_f$, is the noise-induced part. Normally $(L+1)\sigma_h^2\sigma_b^2 \gg \sigma_v^2$, so the estimation error mainly comes from the interference from information data.

4.2.2 Performance Analysis for DPS-BEM-Based Estimator

Consider (3.35). From the observation $y(n)$, the estimate $\hat{d}_{mq}$ has contributions from the information sequence $\{b(n)\}$, which is unknown at the receiver, the superimposed training $\{c(n)\}$, which is known at the receiver, and the measurement noise $v(n)$. It follows from (3.1), (3.2), (3.4), (3.25), and (3.35) that
\[
\begin{aligned}
\hat{d}_{mq}
&= \sum_{n=0}^{T-1}y(n)\,u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{n=0}^{T-1}\Big[\sum_{l=0}^{L}h(n;l)\,c(n-l)+\sum_{l=0}^{L}h(n;l)\,b(n-l)+v(n)\Big]u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{n=0}^{T-1}\Big[E\{y(n)\}+\sum_{l=0}^{L}h(n;l)\,b(n-l)+v(n)\Big]u_q(n)\,e^{-j\alpha_m n}.
\end{aligned}
\]
Then by (H4.2.1)-(H4.2.4), (3.27), (3.34), and (3.35)
\[
\begin{aligned}
E\{\hat{d}_{mq}\}
&= \sum_{n=0}^{T-1}E\{y(n)\}\,u_q(n)\,e^{-j\alpha_m n}
 = \sum_{n=0}^{T-1}\sum_{q'=1}^{Q}\sum_{m'=0}^{P-1}d_{m'q'}\,u_{q'}(n)\,e^{j\alpha_{m'}n}\,u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1}d_{m'q'}\,\delta(m'-m)\,\delta(q'-q) = d_{mq}.
\end{aligned}
\tag{4.9}
\]
Define
\[
w_{mq} := \sum_{n=0}^{T-1}v(n)\,u_q(n)\,e^{-j\alpha_m n},
\]
which is zero-mean and, by (H4.2.3) and (3.34),
\[
\begin{aligned}
E\{w_{m'q'}w_{mq}^H\}
&= \sum_{n=0}^{T-1}\sum_{n'=0}^{T-1}E\{v(n')v^H(n)\}\,u_{q'}(n')\,u_q(n)\,e^{j(\alpha_m n-\alpha_{m'}n')} \\
&= \sum_{n=0}^{T-1}\sum_{n'=0}^{T-1}\sigma_v^2 I_N\,\delta(n'-n)\,u_{q'}(n')\,u_q(n)\,e^{j(\alpha_m n-\alpha_{m'}n')} \\
&= \sigma_v^2 I_N\,\delta(m'-m)\,\delta(q'-q).
\end{aligned}
\tag{4.10}
\]
Thus,
\[
\hat{d}_{mq} = d_{mq}+s_{mq}+w_{mq}, \tag{4.11}
\]
where
\[
s_{mq} := \sum_{n=0}^{T-1}\Big[\sum_{l=0}^{L}h(n;l)\,b(n-l)\Big]u_q(n)\,e^{-j\alpha_m n}. \tag{4.12}
\]
Clearly, the information sequence's contribution, given by $s_{mq}$ above, interferes with the estimation of $d_{mq}$, and hence with channel estimation from the observations (see (3.36)). Since $\tilde{C}$ is full column-rank when $P \ge L+1$, by (3.30) and (4.9) we have $E\{\hat{\tilde{D}}\} = \tilde{D}$ and $E\{\hat{H}\} = H$. Then from (3.36)
\[
\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\} = (\tilde{C}^H\tilde{C})^{-1}\tilde{C}^H\,\mathrm{cov}\{\hat{\tilde{D}},\hat{\tilde{D}}\}\,\tilde{C}\,(\tilde{C}^H\tilde{C})^{-1}. \tag{4.13}
\]
Consider the zero-mean interference $s_{mq}$ in (4.12). By (H4.2.2), (H4.2.5), and (3.34),
\[
\begin{aligned}
E\{s_{m'q'}s_{mq}^H\}
&= \sum_{n'=0}^{T-1}\sum_{l'=0}^{L}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\{h(n';l')h^H(n;l)\}\,E\{b(n'-l')\,b^*(n-l)\}\,u_q(n)\,u_{q'}(n')\,e^{j(\alpha_m n-\alpha_{m'}n')} \\
&= (L+1)\sigma_h^2\sigma_b^2 I_N\sum_{n=0}^{T-1}u_q(n)\,u_{q'}(n)\,e^{j(\alpha_m-\alpha_{m'})n} \\
&= (L+1)\sigma_h^2\sigma_b^2 I_N\,\delta(m'-m)\,\delta(q'-q).
\end{aligned}
\tag{4.14}
\]
By (H4.2.3), since the noise $v(n)$ is uncorrelated with the zero-mean information sequence $\{b(n)\}$,
\[
E\{s_{m'q'}w_{mq}^H\} = 0.
\]
Then by (4.10), (4.11), and (4.14),
\[
\begin{aligned}
E\big\{[\hat{d}_{mq}-d_{mq}][\hat{d}_{m'q'}-d_{m'q'}]^H\big\}
&= E\big\{[s_{mq}+w_{mq}][s_{m'q'}+w_{m'q'}]^H\big\} \\
&= E\{s_{m'q'}s_{mq}^H\}+E\{w_{m'q'}w_{mq}^H\} \\
&= \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]I_N\,\delta(m'-m)\,\delta(q'-q).
\end{aligned}
\]
We further have
\[
\mathrm{cov}\{\hat{\tilde{D}},\hat{\tilde{D}}\} = \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]I_{NPQ}.
\]
(4.15)

Substituting (4.15) into (4.13) gives
\[
\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\} = \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right](\tilde{C}^H\tilde{C})^{-1}.
\]
Using the orthonormality of the DPS sequences, the MSE in channel estimation (4.2) is then given by
\[
\begin{aligned}
\mathrm{MSE}_1
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\Big\{\Big[\sum_{q_1=1}^{Q}\big[h_{q_1}(l)-\hat{h}_{q_1}(l)\big]^H u_{q_1}(n)\Big]\Big[\sum_{q_2=1}^{Q}\big[h_{q_2}(l)-\hat{h}_{q_2}(l)\big]u_{q_2}(n)\Big]\Big\} \\
&= \frac{1}{T}\sum_{l=0}^{L}E\Big\{\sum_{q_1=1}^{Q}\sum_{q_2=1}^{Q}\big[h_{q_1}(l)-\hat{h}_{q_1}(l)\big]^H\big[h_{q_2}(l)-\hat{h}_{q_2}(l)\big]\,\delta(q_1-q_2)\Big\} \\
&= \frac{1}{T}\,\mathrm{tr}\big\{\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\}\big\} \\
&= \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,\mathrm{tr}\Big\{\big(\tilde{V}^H\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{P-1}|^2\}\,\tilde{V}\big)^{-1}\Big\}.
\end{aligned}
\tag{4.16}
\]

Remark 4.2.2.1: If the mean of the noise $v(n)$ is unknown, we replace $\tilde{C}$ in (4.16) with $C$ as in (3.13). Then the MSE in channel estimation is
\[
\mathrm{MSE}_1 = \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,\mathrm{tr}\Big\{\big(V^H\,\mathrm{diag}\{|\bar{c}_1|^2,|\bar{c}_2|^2,\ldots,|\bar{c}_{P-1}|^2\}\,V\big)^{-1}\Big\}. \tag{4.17}
\]
Whether the noise is zero-mean or not, the interference from the information sequence contributes the majority of the estimation error.

Remark 4.2.2.2: We assume (3.34), which holds exactly for the CE-BEM if $u_q(n)$ is replaced with $e^{j\omega_q n}$, so that the CE-BEM-based estimator and the DPS-BEM-based estimator give the same MSE results (compare (4.7) and (4.8) with (4.16) and (4.17)), given the assumption that the "true" channel follows the CE- or DPS-BEM, respectively. In this section, the modeling error of a BEM in describing a real channel has been omitted.
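The orthogonality relation that drives (4.9), (4.10), and (4.14) can be checked numerically for the CE-BEM case mentioned in Remark 4.2.2.2. The following is a minimal sketch, not the author's code: it assumes the CE-BEM frequencies $\omega_q = 2\pi(q-(Q+1)/2)/T$, the training harmonics $\alpha_m = 2\pi m/P$, and basis functions normalized by $1/\sqrt{T}$. The function names are hypothetical, and $Q = 3$ is an assumed value. $T = 399$ and $P = 7$ are the values quoted in the simulation captions of this chapter.

```python
import cmath
import math


def ce_bem_gram(T, P, Q):
    """Gram matrix of f_{m,q}(n) = exp(j*(omega_q + alpha_m)*n)/sqrt(T), n = 0..T-1.

    Assumed conventions (not restated in this section):
      omega_q = 2*pi*(q - (Q + 1)/2)/T, q = 1..Q    (CE-BEM frequencies)
      alpha_m = 2*pi*m/P,               m = 0..P-1  (training harmonics)
    """
    pairs = [(m, q) for m in range(P) for q in range(1, Q + 1)]

    def f(m, q, n):
        omega = 2 * math.pi * (q - (Q + 1) / 2) / T
        alpha = 2 * math.pi * m / P
        return cmath.exp(1j * (omega + alpha) * n) / math.sqrt(T)

    # Build each length-T sequence once, then take all pairwise inner products.
    seqs = [[f(m, q, n) for n in range(T)] for (m, q) in pairs]
    return [[sum(a * b.conjugate() for a, b in zip(s1, s2)) for s2 in seqs]
            for s1 in seqs]


def max_deviation_from_identity(gram):
    """Largest |G[i][j] - delta(i - j)| over all entries of the Gram matrix."""
    size = len(gram)
    return max(abs(gram[i][j] - (1.0 if i == j else 0.0))
               for i in range(size) for j in range(size))


if __name__ == "__main__":
    # T a multiple of P and Q <= T/P: the relation should hold to rounding error.
    print(max_deviation_from_identity(ce_bem_gram(T=399, P=7, Q=3)))
    # Q > T/P: distinct (m, q) pairs alias onto the same frequency.
    print(max_deviation_from_identity(ce_bem_gram(T=21, P=7, Q=5)))
```

Under these assumptions the first case returns a value at rounding-error level, while the second shows what goes wrong when the Doppler harmonics of neighbouring training tones overlap: the delta-function structure used in the covariance derivations no longer holds.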
4.2.3 Performance Analysis for OP-BEM-Based Estimator

Now we turn to the OP-BEM-based channel estimator (3.46). The noise $v(n)$ is of unknown mean $m$. Similar to (4.6) we define
\[
\tilde{x}(n) := y(n)-E\{y(n)\,|\,H\} = \sum_{l=0}^{L}h(n;l)\,b(n-l)+v(n)-m,
\]
then
\[
E\{\tilde{x}(n_1)\tilde{x}^H(n_2)\,|\,H\} = \sum_{l_1=0}^{L}\sum_{l_2=0}^{L}h(n_1;l_1)\,h^H(n_2;l_2)\,\sigma_b^2\,\delta(n_1-n_2-l_1+l_2)+\sigma_v^2 I_N\,\delta(n_1-n_2).
\]
Since the $h(n;l)$'s are independent for different $l$'s,
\[
\begin{aligned}
E_H\{E\{\tilde{x}(n_1)\tilde{x}^H(n_2)\,|\,H\}\}
&= E\Big\{\sum_{l=0}^{L}h(n_1;l)\,h^H(n_2;l)\Big\}\sigma_b^2\,\delta(n_1-n_2)+\sigma_v^2 I_N\,\delta(n_1-n_2) \\
&= \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]I_N\,\delta(n_1-n_2).
\end{aligned}
\tag{4.18}
\]
By (3.46) and (4.18), we have
\[
\begin{aligned}
E_H\{\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\}\}
&= \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}\Omega_{\mathrm{OP}}^{\dagger}\Lambda(n_1)\Lambda^H(n_2)\,\Omega_{\mathrm{OP}}^{\dagger H}\otimes E_H\{E\{\tilde{x}(n_1)\tilde{x}^H(n_2)\,|\,H\}\} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T^2}\,\Omega_{\mathrm{OP}}^{\dagger}\Big[\sum_{n=0}^{T-1}\Lambda(n)\Lambda^H(n)\Big]\Omega_{\mathrm{OP}}^{\dagger H}\otimes I_N.
\end{aligned}
\tag{4.19}
\]
Note that $\frac{1}{T}\sum_{n=0}^{T-1}\Lambda(n)\Lambda^H(n) = \Omega_{\mathrm{OP}}$ and $\Omega_{\mathrm{OP}} = \Omega_{\mathrm{OP}}^H$. We define the normalized $\bar{\Omega}_{\mathrm{OP}} := \Omega_{\mathrm{OP}}/\sigma_c^2$; then (4.19) becomes
\[
E_H\{\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\}\} = \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T\sigma_c^2}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N.
\]
Let
\[
\psi(n) = \big[\,\phi_0(n)I_N\ \ \phi_1(n)I_N\ \ \cdots\ \ \phi_K(n)I_N\,\big],\qquad \Psi(n) = I_{L+1}\otimes\psi(n).
\]
By the orthonormality of $\{\phi_q(n)\}_{n=0}^{T-1}$, $\sum_{n=1}^{T}\Psi^H(n)\Psi(n) = I_{N(L+1)(K+1)}$. It follows from the OP-BEM (3.38) and (3.47) that $h_{\mathrm{BEM}}(n;l) = \psi(n)H_l$ and $\hat{h}_{\mathrm{BEM}}(n;l) = \psi(n)\hat{H}_l$.
The estimation MSE is given by
\[
\begin{aligned}
\mathrm{MSE}_1
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E_H\Big\{E\Big\{\big\|\hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l)\big\|^2\,\Big|\,H\Big\}\Big\} \\
&= \frac{1}{T}\sum_{n=0}^{T-1}\mathrm{tr}\Big\{\Psi(n)\,E_H\{\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\}\}\,\Psi^H(n)\Big\} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T^2\sigma_c^2}\sum_{n=0}^{T-1}\mathrm{tr}\Big\{\Psi(n)\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)\Psi^H(n)\Big\} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T^2\sigma_c^2}\,\mathrm{tr}\Big\{I_{N(L+1)(K+1)}\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)\Big\} \\
&= \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]N}{T^2\sigma_c^2}\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}.
\end{aligned}
\]

Remark 4.2.3.1: If the measurement noise is zero-mean, i.e., $m = 0$, the channel MSE is given by
\[
\mathrm{MSE}_1 = \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]N}{T^2\sigma_c^2}\,\mathrm{tr}\,\tilde{\bar{\Omega}}_{\mathrm{OP}}^{\dagger}, \tag{4.20}
\]
where $\tilde{\bar{\Omega}}_{\mathrm{OP}} = \tilde{\Omega}_{\mathrm{OP}}/\sigma_c^2$ (see Remark 3.4.3 for the definition of $\tilde{\Omega}_{\mathrm{OP}}$ and related discussions).

Remark 4.2.3.2: In Remark 3.4.2, we have shown that this estimator can apply to any BEM representation. To reconfirm this, we now consider the channel MSE given by (4.7), (4.16), and (4.20), where we assume $m = 0$ in all three cases. The entries of $\tilde{\bar{\Omega}}_{\mathrm{OP}}$ are
\[
\tilde{\bar{\Omega}}(q,q_1,l,l_1) := \frac{1}{T}\sum_{n=0}^{T-1}\phi_q(n)\,\phi_{q_1}^*(n)\,\bar{c}(n-l)\,\bar{c}^*(n-l_1).
\]
We write $\tilde{\bar{\Omega}}_{\mathrm{CE}}$ and $\tilde{\bar{\Omega}}_{\mathrm{DPS}}$ when $\phi_q(n)$ in $\tilde{\bar{\Omega}}_{\mathrm{OP}}$ is replaced with $e^{j\omega_q n}/\sqrt{T}$ and $u_q(n)$, respectively. Since we have
\[
\frac{1}{T}\sum_{n=0}^{T-1}\frac{e^{j\omega_q n}}{\sqrt{T}}\,\frac{e^{-j\omega_{q_1}n}}{\sqrt{T}}\,\bar{c}(n-l)\,\bar{c}^*(n-l_1) = \frac{1}{T}\sum_{m=0}^{P-1}|\bar{c}_m|^2 e^{-j\alpha_m(l-l_1)}\,\delta(q-q_1)
\]
by (3.49), and
\[
\frac{1}{T}\sum_{n=0}^{T-1}u_q(n)\,u_{q_1}(n)\,\bar{c}(n-l)\,\bar{c}^*(n-l_1) = \frac{1}{T}\sum_{m=0}^{P-1}|\bar{c}_m|^2 e^{-j\alpha_m(l-l_1)}\,\delta(q-q_1)
\]
by (3.34), it follows that
\[
\tilde{\bar{\Omega}}_{\mathrm{DPS}} \approx \tilde{\bar{\Omega}}_{\mathrm{CE}} = \tilde{V}^H\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{P-1}|^2\}\,\tilde{V}\otimes I_Q.
\]
Thus, if we replace $\phi_q(n)$ with $e^{j\omega_q n}/\sqrt{T}$ or $u_q(n)$ in $\tilde{\bar{\Omega}}_{\mathrm{OP}}$, (4.20) gives the same result as (4.7) or (4.16).

4.2.4 Performance Analysis for Multiple-User (MIMO) Channels

We now analyze the estimation performance of the MIMO estimator proposed in Section 3.5. Due to band-limitedness, the analysis approach used in Section 4.2.2 can also be applied to the CE-BEM. We therefore treat the CE- and DPS-BEM-based MIMO channel estimators together in this way.

We assume the following:

(H4.2.4.1) The time-varying channel satisfies a BEM representation, as in (2.20), i.e.,
\[
h_k(n;l) = h_{\mathrm{BEM},k}(n;l) = \sum_{q=1}^{Q}h_{qk}(l)\,\phi_q(n),
\]
where $\phi_q(n)$ is the $q$-th basis function (corresponding to $e^{j\omega_q n}$ in the CE-BEM (2.9) and $u_q(n)$ in the DPS-BEM (2.19)), and $Q$ is the number of basis functions. Also $N \ge 1$.

(H4.2.4.2) The information sequence $\{b_k(n)\}$ is zero-mean and white with $E\{|b_k(n)|^2\} = \sigma_{bk}^2$, and the sequences are mutually independent for $k = 1,2,\ldots,K$.

(H4.2.4.3) The measurement noise $\{v(n)\}$ is zero-mean ($m = 0$), white, uncorrelated with $\{b_k(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N\,\delta(\tau)$.

(H4.2.4.4) The superimposed training sequence $c_k(n) = c_k(n+P)$ for all $n$ is a non-random periodic sequence with period $P$ and average power $\sigma_{ck}^2 := \sum_{n=0}^{P-1}|c_k(n)|^2/P$ such that $c_{mk} \neq 0$ for all $m$, $k$, and $\bar{P}$ is an integer with $P = \bar{P}K$.

(H4.2.4.5) The time-varying channel $\{h_k(n;l)\}$ is zero-mean, complex Gaussian with variance $\sigma_{hk}^2$, and mutually independent for distinct $l$'s: $E\{h_k(n;l)h_k^H(n;l)\} = \sigma_{hk}^2 I_N$ and $E\{h_k(n_1;l_1)h_k^H(n_2;l_2)\} = 0$ for $l_1 \neq l_2$ and all $n_1$, $n_2$, i.e., different channel taps are independent of each other and are identically distributed zero-mean complex Gaussian. In addition, $E\{h_{k'}(n_1;l_1)h_k^H(n_2;l_2)\} = 0$ for all $n_1$, $n_2$, $l_1$, and $l_2$ if $k' \neq k$, i.e., the channels of different users are mutually independent.
Considering (3.71) and (3.73), we have
\[
\hat{d}_{mqk} = \sum_{n=0}^{T-1}y(n)\,\phi_q^*(n)\,e^{-j\alpha_{mk}n}, \tag{4.21}
\]
where $\phi_q(n) := e^{j\omega_q n}/\sqrt{T}$ for the CE-BEM and $\phi_q(n) := u_q(n)$ for the DPS-BEM. By (3.70) and (3.72),
\[
\sum_{n=0}^{T-1}\phi_{q'}(n)\,\phi_q^*(n)\,e^{j(\alpha_{m'k'}-\alpha_{mk})n} \approx \delta(k'-k)\,\delta(m'-m)\,\delta(q'-q). \tag{4.22}
\]
From $y(n)$, the estimate $\hat{d}_{mqk}$ has contributions from the information sequences $\{b_k(n)\}$ ($k = 1,2,\ldots,K$), unknown at the receiver, the superimposed training $\{c_k(n)\}$, known at the receiver, and the measurement noise $v(n)$. It follows from (3.57)-(3.59) that
\[
\hat{d}_{mqk} = \sum_{n=0}^{T-1}y(n)\,\phi_q^*(n)\,e^{-j\alpha_{mk}n}
= \sum_{n=0}^{T-1}\Big[E\{y(n)\}+\sum_{k_1=1}^{K}\sum_{l=0}^{L}h_{k_1}(n;l)\,b_{k_1}(n-l)+v(n)\Big]\phi_q^*(n)\,e^{-j\alpha_{mk}n}.
\]
By (H4.2.4.1)-(H4.2.4.5), (3.62) (note that $m = 0$), (4.21), and (4.22),
\[
\begin{aligned}
E\{\hat{d}_{mqk}\}
&= \sum_{n=0}^{T-1}E\{y(n)\}\,\phi_q^*(n)\,e^{-j\alpha_{mk}n} \\
&= \sum_{n=0}^{T-1}\sum_{k'=1}^{K}\sum_{m'=0}^{\bar{P}-1}\sum_{q'=1}^{Q}d_{m'q'k'}\,\phi_{q'}(n)\,e^{j\alpha_{m'k'}n}\,\phi_q^*(n)\,e^{-j\alpha_{mk}n} = d_{mqk}.
\end{aligned}
\tag{4.23}
\]
Define
\[
w_{mqk} := \sum_{n=0}^{T-1}v(n)\,\phi_q^*(n)\,e^{-j\alpha_{mk}n},
\]
which is zero-mean and, by (H4.2.4.3) and (4.22),
\[
\begin{aligned}
E\{w_{m'q'k'}w_{mqk}^H\}
&= \sum_{n=0}^{T-1}\sum_{n'=0}^{T-1}E\{v(n')v^H(n)\}\,\phi_{q'}^*(n')\,\phi_q(n)\,e^{j(\alpha_{mk}n-\alpha_{m'k'}n')} \\
&= \sigma_v^2 I_N\,\delta(m'-m)\,\delta(q'-q)\,\delta(k'-k).
\end{aligned}
\tag{4.24}
\]
Thus $\hat{d}_{mqk} = d_{mqk}+s_{mqk}+w_{mqk}$, where
\[
s_{mqk} := \sum_{n=0}^{T-1}\Big[\sum_{k_1=1}^{K}\sum_{l=0}^{L}h_{k_1}(n;l)\,b_{k_1}(n-l)\Big]\phi_q^*(n)\,e^{-j\alpha_{mk}n}. \tag{4.25}
\]
As before, the information sequences' contribution, given by $s_{mqk}$ above, interferes with the estimation of $d_{mqk}$, and hence with channel estimation from the observations. Since $\tilde{C}_k$ is full column-rank when $\bar{P} \ge L+1$, by (3.79) and (4.23) we have $E\{\hat{\tilde{D}}_k\} = \tilde{D}_k$ and $E\{\hat{H}_k\} = H_k$. Then by (3.79)
\[
\mathrm{cov}\{\hat{H}_k,\hat{H}_k\,|\,H_k\} = (\tilde{C}_k^H\tilde{C}_k)^{-1}\tilde{C}_k^H\,\mathrm{cov}\{\hat{\tilde{D}}_k,\hat{\tilde{D}}_k\}\,\tilde{C}_k\,(\tilde{C}_k^H\tilde{C}_k)^{-1}. \tag{4.26}
\]
Consider the zero-mean interference $s_{mqk}$ in (4.25). By (H4.2.4.2), (H4.2.4.5), and (4.22),
\[
\begin{aligned}
E\{s_{m'q'k'}s_{mqk}^H\}
&= \sum_{n'=0}^{T-1}\sum_{l'=0}^{L}\sum_{n=0}^{T-1}\sum_{l=0}^{L}\sum_{k_1=1}^{K}\sum_{k_2=1}^{K}E\{h_{k_1}(n';l')h_{k_2}^H(n;l)\}\,E\{b_{k_1}(n'-l')\,b_{k_2}^*(n-l)\}\,\phi_q(n)\,\phi_{q'}^*(n')\,e^{j(\alpha_{mk}n-\alpha_{m'k'}n')} \\
&= \sum_{n=0}^{T-1}\sum_{l=0}^{L}\sum_{k_1=1}^{K}E\{h_{k_1}(n;l)h_{k_1}^H(n;l)\}\,\sigma_{bk_1}^2\,\phi_q(n)\,\phi_{q'}^*(n)\,e^{j(\alpha_{mk}-\alpha_{m'k'})n} \\
&= \sum_{k_1=1}^{K}(L+1)\sigma_{hk_1}^2\sigma_{bk_1}^2 I_N\sum_{n=0}^{T-1}\phi_q(n)\,\phi_{q'}^*(n)\,e^{j(\alpha_{mk}-\alpha_{m'k'})n} \\
&= (L+1)I_N\Big(\sum_{k_1=1}^{K}\sigma_{hk_1}^2\sigma_{bk_1}^2\Big)\delta(m'-m)\,\delta(q'-q)\,\delta(k'-k).
\end{aligned}
\tag{4.27}
\]
By (H4.2.4.3), since the noise $v(n)$ is uncorrelated with $\{b_k(n)\}$,
\[
E\{s_{m'q'k'}w_{mqk}^H\} = 0.
\]
Then by (4.24) and (4.27),
\[
\begin{aligned}
E\big\{[\hat{d}_{mqk}-d_{mqk}][\hat{d}_{m'q'k'}-d_{m'q'k'}]^H\big\}
&= E\{s_{m'q'k'}s_{mqk}^H\}+E\{w_{m'q'k'}w_{mqk}^H\} \\
&= \Big[(L+1)\sum_{k_1=1}^{K}\sigma_{hk_1}^2\sigma_{bk_1}^2+\sigma_v^2\Big]I_N\,\delta(m'-m)\,\delta(q'-q)\,\delta(k'-k).
\end{aligned}
\]
We have
\[
\mathrm{cov}\{\hat{\tilde{D}}_k,\hat{\tilde{D}}_k\} = \Big[(L+1)\sum_{k_1=1}^{K}\sigma_{hk_1}^2\sigma_{bk_1}^2+\sigma_v^2\Big]I_{N\bar{P}Q}. \tag{4.28}
\]
Substituting (4.28) into (4.26) gives
\[
\mathrm{cov}\{\hat{H}_k,\hat{H}_k\,|\,H_k\} = \Big[(L+1)\sum_{k_1=1}^{K}\sigma_{hk_1}^2\sigma_{bk_1}^2+\sigma_v^2\Big](\tilde{C}_k^H\tilde{C}_k)^{-1}.
\]
Using the orthonormality of the basis functions, the channel MSE for the $k$-th user is then given by
\[
\begin{aligned}
\mathrm{MSE}_{1k}
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\Big\{\Big\|\sum_{q=1}^{Q}\big[h_{qk}(l)-\hat{h}_{qk}(l)\big]^H\phi_q^*(n)\Big\|^2\Big\} \\
&= \frac{1}{T}\,\mathrm{tr}\big\{\mathrm{cov}\{\hat{H}_k,\hat{H}_k\,|\,H_k\}\big\} \\
&= \frac{1}{T}\Big[(L+1)\sum_{k_1=1}^{K}\sigma_{hk_1}^2\sigma_{bk_1}^2+\sigma_v^2\Big]\mathrm{tr}\big\{(\tilde{C}_k^H\tilde{C}_k)^{-1}\big\} \\
&= \frac{\big[(L+1)\sum_{k_1=1}^{K}\sigma_{hk_1}^2\sigma_{bk_1}^2+\sigma_v^2\big]NQ}{T\sigma_c^2}\,\mathrm{tr}\Big\{\big(\tilde{V}_k^H\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{\bar{P}-1}|^2\}\,\tilde{V}_k\big)^{-1}\Big\}.
\end{aligned}
\tag{4.29}
\]

Remark 4.2.4.1: If the mean of $v(n)$ is unknown, we have
\[
\mathrm{MSE}_{1k} = \frac{\big[(L+1)\sum_{k_1=1}^{K}\sigma_{hk_1}^2\sigma_{bk_1}^2+\sigma_v^2\big]NQ}{T\sigma_c^2}\,\mathrm{tr}\Big\{\big(V_k^H\,\mathrm{diag}\{|\bar{c}_1|^2,|\bar{c}_2|^2,\ldots,|\bar{c}_{\bar{P}-1}|^2\}\,V_k\big)^{-1}\Big\}.
\]

Remark 4.2.4.2: Comparing (4.29) with (4.7) and (4.16), we can see that for the estimation of a multiple-user channel, multiple-user interference (MUI) exists and the information data from all users act as interference. The MUI increases linearly as more users join the system.

4.3 Performance Analysis for the First-Order Statistics-Based Estimator: with Modeling Error

In Section 4.2, we assumed that the channel (SIMO or MIMO) follows a BEM representation, i.e., the modeling error was omitted. In practice, modeling error always exists (see Section 2.6). In approximating a band-limited channel, the CE-, OP-, and DPS-BEM perform differently.
If modeling error is considered, (4.1) is revised as
\[
h(n;l) = h_{\mathrm{BEM}}(n;l)+e_{\mathrm{BEM}}(n;l) = \sum_{q=1}^{Q}h_q(l)\,\phi_q(n)+e_{\mathrm{BEM}}(n;l),
\]
where $e_{\mathrm{BEM}}(n;l)$ is the modeling error, which is intrinsic to the BEM representation and has nothing to do with BEM-based channel estimation. Therefore, the MSE of channel estimation consists of two parts: one comes from the estimation, which has been discussed in Section 4.2; the other arises from the modeling error.

Consider a "complete" basis matrix
\[
\Phi_T := \begin{bmatrix}
\phi_1(0) & \phi_2(0) & \cdots & \phi_T(0) \\
\phi_1(1) & \phi_2(1) & \cdots & \phi_T(1) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_1(T-1) & \phi_2(T-1) & \cdots & \phi_T(T-1)
\end{bmatrix},
\]
where $\{\phi_q(n)\}_{q=1}^{T}$ represents $\{e^{j2\pi[q-(T+1)/2]n/T}/\sqrt{T}\}_{q=1}^{T}$ in the CE-BEM, or the modified Legendre polynomials of degree $0$ to $T-1$ in the OP-BEM, or all $T$ eigenvectors of the matrix $C$ defined in (2.17). Note that $\Phi_T$ is unitary, and for an arbitrary channel $\{h(n;l)\}_{n=0}^{T-1}$ the following always holds:
\[
h(n;l) = \sum_{q=1}^{T}h_q(l)\,\phi_q(n),
\]
or equivalently
\[
\big[\,h^T(0;l)\ \ h^T(1;l)\ \ \cdots\ \ h^T(T-1;l)\,\big]^T = (\Phi_T\otimes I_N)\big[\,h_1^T(l)\ \ h_2^T(l)\ \ \cdots\ \ h_T^T(l)\,\big]^T, \tag{4.30}
\]
where
\[
h_q(l) = \sum_{n=0}^{T-1}h(n;l)\,\phi_q^*(n). \tag{4.31}
\]
Given the above representations, one may view the BEM representation (2.20) as an approximation in which only $Q$ (out of $T$) basis functions are used to describe the channel. In a BEM, (4.30) is approximated by
\[
\big[\,h_{\mathrm{BEM}}^T(0;l)\ \ h_{\mathrm{BEM}}^T(1;l)\ \ \cdots\ \ h_{\mathrm{BEM}}^T(T-1;l)\,\big]^T = (\Phi_Q\otimes I_N)\big[\,h_1^T(l)\ \ h_2^T(l)\ \ \cdots\ \ h_Q^T(l)\,\big]^T,
\]
where $\Phi_Q$ consists of the first $Q$ columns of $\Phi_T$. The modeling error is given by
\[
\big[\,e_{\mathrm{BEM}}^T(0;l)\ \ e_{\mathrm{BEM}}^T(1;l)\ \ \cdots\ \ e_{\mathrm{BEM}}^T(T-1;l)\,\big]^T = (\Phi_{T-Q}\otimes I_N)\big[\,h_{Q+1}^T(l)\ \ h_{Q+2}^T(l)\ \ \cdots\ \ h_T^T(l)\,\big]^T, \tag{4.32}
\]
where $\Phi_{T-Q}$ consists of the last $T-Q$ columns of $\Phi_T$.
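The split of a channel into its BEM part and its modeling error in (4.30)-(4.32) can be illustrated with a scalar ($N = 1$) tap and the complete CE basis. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, $T$ and $Q$ are taken to be odd, and the $Q$ retained exponentials are the ones closest to zero frequency (i.e., the columns of $\Phi_T$ are assumed to be ordered so that these come first). Because $\Phi_T$ is unitary, the energy of the modeling error should equal the energy of the discarded coefficients.

```python
import cmath
import math


def ce_basis(T):
    """Row q-1 holds phi_q(n) = exp(j*2*pi*(q - (T+1)/2)*n/T)/sqrt(T), q = 1..T."""
    return [[cmath.exp(2j * math.pi * (q - (T + 1) / 2) * n / T) / math.sqrt(T)
             for n in range(T)] for q in range(1, T + 1)]


def truncate(h, Q):
    """Expand h on the complete CE basis and keep the Q lowest-frequency terms.

    Returns (h_bem, error_energy, discarded_energy). Assumes odd T and odd Q.
    """
    T = len(h)
    basis = ce_basis(T)
    # Coefficients as in (4.31): h_q = sum_n h(n) * conj(phi_q(n)).
    coeffs = [sum(hn * p.conjugate() for hn, p in zip(h, phi)) for phi in basis]
    centre = (T + 1) / 2
    kept = {i for i in range(T) if abs((i + 1) - centre) <= (Q - 1) / 2}
    # BEM approximation built from the retained basis functions only.
    h_bem = [sum(coeffs[i] * basis[i][n] for i in kept) for n in range(T)]
    error_energy = sum(abs(a - b) ** 2 for a, b in zip(h, h_bem))
    discarded_energy = sum(abs(coeffs[i]) ** 2 for i in range(T) if i not in kept)
    return h_bem, error_energy, discarded_energy


if __name__ == "__main__":
    T = 31
    # A tap with a fractional Doppler shift, so energy leaks outside the Q kept terms.
    h = [cmath.exp(2j * math.pi * 0.4 * n / T) for n in range(T)]
    _, err, disc = truncate(h, Q=3)
    print(err, disc)
```

Setting `Q = T` in this sketch reproduces the channel exactly, which is the statement that (4.30) holds for an arbitrary channel.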
The MSE in channel estimation is now given by
\[
\mathrm{MSE}_c = \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\Big\{\big\|h(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big\|^2\Big\}, \tag{4.33}
\]
where $\hat{h}_{\mathrm{BEM}}(n;l)$ follows (4.3). Since $\Phi_Q^H\Phi_{T-Q} = 0$, we have
\[
\frac{1}{T}\sum_{n=0}^{T-1}\big[h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big]^H e_{\mathrm{BEM}}(n;l) = 0,
\]
so that
\[
\mathrm{MSE}_c = \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\Big\{\big\|h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big\|^2\Big\}+\frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\big\{\|e_{\mathrm{BEM}}(n;l)\|^2\big\} = \mathrm{MSE}_1+\mathrm{MSE}_2, \tag{4.34}
\]
where we define
\[
\mathrm{MSE}_2 := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\big\{\|e_{\mathrm{BEM}}(n;l)\|^2\big\}
\]
as the mean square modeling error. It follows from (4.31) and (4.32) that
\[
\mathrm{MSE}_2 = \frac{1}{T}\sum_{q=Q+1}^{T}\sum_{l=0}^{L}E\big\{\|h_q(l)\|^2\big\}
= \frac{1}{T}\sum_{q=Q+1}^{T}\sum_{l=0}^{L}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}R_h(n_1,n_2;l)\,\phi_q(n_1)\,\phi_q^*(n_2),
\]
where $R_h(n_1,n_2;l) := E\{h^H(n_1;l)\,h(n_2;l)\}$.

For example, we consider the CE-BEM representation for a modified Jakes' channel. We follow all the other assumptions in Section 4.2 except (H4.2.1). Then
\[
R_h(n_1,n_2;l) = N\sigma_h^2 J_0\big(2\pi f_d T_s(n_1-n_2)\big),
\]
where $J_0(\cdot)$ denotes the zero-th order Bessel function of the first kind. We have
\[
\begin{aligned}
\mathrm{MSE}_2
&= \frac{L+1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}N\sigma_h^2 J_0\big(2\pi f_d T_s(n_1-n_2)\big)\Bigg[\sum_{q=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}e^{j\frac{2\pi q(n_1-n_2)}{T}}\Bigg] \\
&= \frac{L+1}{T}N\sigma_h^2\Bigg[(T-Q)-2\sum_{\tau=1}^{T-1}\Big(1-\frac{\tau}{T}\Big)J_0(2\pi f_d T_s\tau)\,\frac{\sin\frac{\pi\tau Q}{T}}{\sin\frac{\pi\tau}{T}}\Bigg].
\end{aligned}
\]
(4.35)

4.4 Training Power Allocation

We address the issue of superimposed training power allocation in this section, i.e., we seek the optimal power assignment to training and information under a fixed transmitted power budget. To this end, we consider the more general OP-BEM-based estimator of Section 3.4, which can easily be applied to other BEM representations. For convenience, we assume (H4.2.1) holds, i.e., the channel satisfies a BEM, so that modeling error is omitted. We define the training power overhead $\beta$ as
\[
\beta := \frac{\frac{1}{P}\sum_{n=1}^{P}|c(n)|^2}{\frac{1}{P}\sum_{n=1}^{P}E\{|s(n)|^2\}} = \frac{\sigma_c^2}{\sigma_b^2+\sigma_c^2}. \tag{4.36}
\]
For a fixed SNR or transmitted power budget, higher $\beta$ implies smaller effective SNR at the receiver, due to decreased power in the information sequence, but higher channel estimation accuracy.

Removing the estimated time-varying mean from the received data, define
\[
\tilde{y}(n) := y(n)-\sum_{l=0}^{L}\hat{h}_{\mathrm{BEM}}(n;l)\,c(n-l)-\hat{m}.
\]
Assuming that $\hat{m} = m$, we have
\[
\tilde{y}(n) \approx \sum_{l=0}^{L}\hat{h}_{\mathrm{BEM}}(n;l)\,b(n-l)+\sum_{l=0}^{L}\big[h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big]\big[b(n-l)+c(n-l)\big]+\tilde{v}(n). \tag{4.37}
\]
In (4.37), define the effective signal as
\[
x_s(n) := \sum_{l=0}^{L}\hat{h}_{\mathrm{BEM}}(n;l)\,b(n-l), \tag{4.38}
\]
and the effective noise as
\[
w(n) := \sum_{l=0}^{L}\big[h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big]\big[b(n-l)+c(n-l)\big]+\tilde{v}(n). \tag{4.39}
\]
When $\hat{h}_{\mathrm{BEM}}(n;l)$ is used for equalization or detection, the variance of the effective noise $w(n)$ contains the channel estimation error variance as a component, which in turn depends on $\beta$. An "optimum" value of $\beta$ for the superimposed training method may be obtained by maximizing the SNR in (4.37) with respect to $\beta$, defined as
\[
\mathrm{SNR}_e(\beta,n) = \frac{\sigma_{x_s}^2(n)}{\sigma_w^2(n)}, \tag{4.40}
\]
under the constraint of a fixed transmitted power, i.e., $\sigma_b^2+\sigma_c^2 = P_T$.
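The trade-off described above can be made concrete with a small numerical sketch. It is not the author's procedure: it uses the simplified time-averaged expressions that this section arrives at later for constant-modulus training, namely signal power $\sigma_b^2[\mathrm{MSE}_1+(L+1)\sigma_h^2]$ from (4.42) and noise power $\mathrm{MSE}_1(\sigma_b^2+\sigma_c^2)+\sigma_v^2$, and it compares a brute-force grid search over the training power overhead (written `beta` here) with the closed form (4.48). The function names and all numerical values are hypothetical. The parameter `kappa` is an assumed stand-in for the normalized trace factor that multiplies $[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2]/\sigma_c^2$ in $\mathrm{MSE}_1$, and $N = 1$ is assumed.

```python
import math


def mse1(beta, L, sigma_h2, sigma_v2, P_T, kappa):
    """Estimation MSE as a function of the overhead beta (kappa is an assumed constant)."""
    sigma_b2 = (1.0 - beta) * P_T   # information power
    sigma_c2 = beta * P_T           # training power
    return ((L + 1) * sigma_h2 * sigma_b2 + sigma_v2) * kappa / sigma_c2


def snr_d(beta, L, sigma_h2, sigma_v2, P_T, kappa):
    """Time-averaged effective SNR under the simplified noise-power expression."""
    m = mse1(beta, L, sigma_h2, sigma_v2, P_T, kappa)
    signal = (1.0 - beta) * P_T * (m + (L + 1) * sigma_h2)
    noise = m * P_T + sigma_v2
    return signal / noise


def beta_grid(L, sigma_h2, sigma_v2, P_T, kappa, steps=10000):
    """Brute-force maximizer of snr_d over the open interval (0, 1)."""
    best = max(range(1, steps),
               key=lambda i: snr_d(i / steps, L, sigma_h2, sigma_v2, P_T, kappa))
    return best / steps


def beta_closed_form(L, sigma_h2, sigma_v2, P_T, kappa):
    """Closed form in the style of (4.48); all coefficients are divided by T^2.

    Assumes g1 != 0 and a non-negative argument under the square root.
    """
    a = (L + 1) * sigma_h2
    f1 = a * (kappa - 1.0)
    f2 = -a * (2.0 * kappa - 1.0) - sigma_v2 * kappa / P_T
    f3 = a * kappa + sigma_v2 * kappa / P_T
    g1 = sigma_v2 / P_T - a * kappa
    g2 = f3
    return (g2 / g1) * (-1.0 + math.sqrt(1.0 + g1 * (f3 * g1 - f2 * g2) / (g2 * g2 * f1)))


if __name__ == "__main__":
    params = dict(L=2, sigma_h2=1.0, sigma_v2=0.1, P_T=1.0, kappa=0.03)
    print(beta_grid(**params), beta_closed_form(**params))
```

For the assumed values the two estimates of the optimum overhead agree closely, which is a useful sanity check on the algebra that turns the constrained problem into a rational function of a single variable.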
In (4.38), the signal power at time $n$ is given by
\[
\begin{aligned}
\sigma_{x_s}^2(n) = E\{\|x_s(n)\|^2\}
&= \sigma_b^2\sum_{l=0}^{L}E_H\Big\{E\Big\{\big\|\hat{h}_{\mathrm{BEM}}(n;l)\big\|^2\,\Big|\,H\Big\}\Big\}+O(1/T^2) \\
&= \sigma_b^2\sum_{l=0}^{L}\Big[E_H\Big\{E\Big\{\big\|h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big\|^2\,\Big|\,H\Big\}\Big\}+E\big\{\|h_{\mathrm{BEM}}(n;l)\|^2\big\}\Big]+O(1/T^2),
\end{aligned}
\tag{4.41}
\]
where the $O(1/T^2)$ term accounts for the dependence between $\{\hat{h}_{\mathrm{BEM}}(n;l)\}$ and $\{b(n)\}$ (see the Appendix of [32] for the corresponding details in the time-invariant case). Therefore, the time average of the signal power is given by (omitting $O(1/T^2)$ terms)
\[
\bar{\sigma}_{x_s}^2 := \frac{1}{T}\sum_{n=0}^{T-1}\sigma_{x_s}^2(n) = \sigma_b^2\big[\mathrm{MSE}_1+(L+1)\sigma_h^2\big]. \tag{4.42}
\]
The noise power at time $n$ is given by
\[
\begin{aligned}
\sigma_w^2(n) = E\{\|w(n)\|^2\}
&= \sigma_b^2\sum_{l=0}^{L}E_H\Big\{E\Big\{\big\|h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big\|^2\,\Big|\,H\Big\}\Big\}+N\sigma_v^2 \\
&\quad+\sigma_c^2\sum_{l_1=0}^{L}\sum_{l_2=0}^{L}E\Big\{\big[\hat{h}_{\mathrm{BEM}}(n;l_1)-h_{\mathrm{BEM}}(n;l_1)\big]^H\big[\hat{h}_{\mathrm{BEM}}(n;l_2)-h_{\mathrm{BEM}}(n;l_2)\big]\Big\}\,\bar{c}^*(n-l_1)\,\bar{c}(n-l_2)+O(1/T^2),
\end{aligned}
\tag{4.43}
\]
where, in a manner similar to (3.41), the $O(1/T^2)$ term accounts for the dependence between $\{\hat{h}_{\mathrm{BEM}}(n;l)\}$ and $\{b(n)\}$. We define
\[
E(n) := \big[\,\bar{c}(n)I_{N(K+1)}\ \ \bar{c}(n-1)I_{N(K+1)}\ \ \cdots\ \ \bar{c}(n-L)I_{N(K+1)}\,\big]
\]
and consider
\[
\begin{aligned}
&\sum_{l_1=0}^{L}\sum_{l_2=0}^{L}E\Big\{\big[\hat{h}_{\mathrm{BEM}}(n;l_1)-h_{\mathrm{BEM}}(n;l_1)\big]^H\big[\hat{h}_{\mathrm{BEM}}(n;l_2)-h_{\mathrm{BEM}}(n;l_2)\big]\Big\}\,\bar{c}^*(n-l_1)\,\bar{c}(n-l_2) \\
&\quad= \sum_{l_1=0}^{L}\sum_{l_2=0}^{L}\mathrm{tr}\,E_H\Big\{E\Big\{\psi(n)\big(\hat{H}_{l_1}-H_{l_1}\big)\big(\hat{H}_{l_2}-H_{l_2}\big)^H\psi^H(n)\,\Big|\,H\Big\}\Big\}\,\bar{c}(n-l_1)\,\bar{c}^*(n-l_2) \\
&\quad= \mathrm{tr}\Big\{\psi(n)\,E_H\Big\{E\Big\{E(n)\big(\hat{H}-H\big)\big(\hat{H}-H\big)^H E^H(n)\,\Big|\,H\Big\}\Big\}\,\psi^H(n)\Big\} \\
&\quad= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T\sigma_c^2}\,\mathrm{tr}\Big\{\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)E^H(n)\,\psi^H(n)\,\psi(n)\,E(n)\Big\}.
\end{aligned}
\]
Therefore (4.43) can be written as
\[
\begin{aligned}
\sigma_w^2(n)
&= \sigma_b^2\sum_{l=0}^{L}E_H\Big\{E\Big\{\big\|h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\big\|^2\,\Big|\,H\Big\}\Big\}+N\sigma_v^2+O(1/T^2) \\
&\quad+\frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,\mathrm{tr}\Big\{\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)E^H(n)\,\psi^H(n)\,\psi(n)\,E(n)\Big\}.
\end{aligned}
\]
Taking the time average of the noise power and omitting $O(1/T^2)$ terms, we have
\[
\bar{\sigma}_w^2 := \frac{1}{T}\sum_{n=0}^{T-1}\sigma_w^2(n) = \sigma_b^2\,\mathrm{MSE}_1+N\sigma_v^2+\frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T^2}\,\mathrm{tr}\Big\{\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)\mathcal{K}\Big\}, \tag{4.44}
\]
where
\[
\mathcal{K} := \sum_{n=1}^{T}E^H(n)\,\psi^H(n)\,\psi(n)\,E(n).
\]
We define the time-averaged version of (4.40) as
\[
\mathrm{SNR}_d(\beta) = \frac{\bar{\sigma}_{x_s}^2}{\bar{\sigma}_w^2}. \tag{4.45}
\]
Using the constraint $\sigma_b^2+\sigma_c^2 = P_T$, we have $\sigma_c^2 = \beta P_T$ and $\sigma_b^2 = (1-\beta)P_T$. Substituting these into (4.42), (4.44), and (4.45), we obtain the unconstrained cost
\[
\mathrm{SNR}_d(\beta) = \frac{f_1\beta^2+f_2\beta+f_3}{g_1\beta^2+g_2\beta+g_3},
\]
where
\[
\begin{aligned}
f_1 &= (L+1)\sigma_h^2\big(N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}-T^2\big), \\
f_2 &= -(L+1)\sigma_h^2\big(2N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}-T^2\big)-\frac{\sigma_v^2 N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}}{P_T}, \\
f_3 &= (L+1)\sigma_h^2 N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}+\frac{N\sigma_v^2\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}}{P_T}, \\
g_1 &= (L+1)\sigma_h^2\Big(N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}-\mathrm{tr}\big\{\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)\mathcal{K}\big\}\Big), \\
g_2 &= -(L+1)\sigma_h^2\Big(2N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}-\mathrm{tr}\big\{\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)\mathcal{K}\big\}\Big)+\frac{\sigma_v^2\Big(\mathrm{tr}\big\{\big(\bar{\Omega}_{\mathrm{OP}}^{\dagger}\otimes I_N\big)\mathcal{K}\big\}-N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}+T^2\Big)}{P_T}, \\
g_3 &= f_3.
\end{aligned}
\]
We seek the optimum value of $\beta$ by setting the derivative to zero:
\[
\frac{d[\mathrm{SNR}_d(\beta)]}{d\beta} = \frac{(f_1g_2-f_2g_1)\beta^2+2(f_1g_3-f_3g_1)\beta+f_2g_3-f_3g_2}{(g_1\beta^2+g_2\beta+g_3)^2} = 0,
\]
the root of which lying in $[0,1]$ is
\[
\beta_{\mathrm{opt}} = (f_1g_2-f_2g_1)^{-1}\Big(f_3g_1-f_1g_3-\sqrt{-f_1f_2g_2g_3-2f_1f_3g_1g_3-f_2f_3g_1g_2+f_2^2g_1g_3+f_1f_3g_2^2+f_1^2g_3^2+f_3^2g_1^2}\,\Big). \tag{4.46}
\]

Since the $h(n;l)$'s are mutually independent for different $l$'s, the calculation can be simplified if we suppose that the estimation errors of $\hat{h}_{\mathrm{BEM}}(n;l)$ are also approximately uncorrelated for distinct $l$'s, i.e.,
\[
E\Big\{\big[\hat{h}_{\mathrm{BEM}}(n;l_1)-h(n;l_1)\big]^H\big[\hat{h}_{\mathrm{BEM}}(n;l_2)-h(n;l_2)\big]\Big\} \approx 0,\quad\text{for } l_1 \neq l_2.
\]
Then (4.43) becomes (omitting $O(1/T^2)$ terms)
\[
\sigma_w^2(n) \approx \sigma_b^2\sum_{l=0}^{L}E_H\Big\{E\Big\{\big\|\hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l)\big\|^2\,\Big|\,H\Big\}\Big\}+N\sigma_v^2+\sigma_c^2\sum_{l=0}^{L}E\Big\{\big\|\hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l)\big\|^2\Big\}\,\bar{c}^*(n-l)\,\bar{c}(n-l). \tag{4.47}
\]
In addition, if $\bar{c}^*(n)\bar{c}(n)$ is constant for all $n$ (e.g., $\bar{c}(n) = \pm 1$), (4.47) can be further reduced to
\[
\sigma_w^2(n) \approx \sum_{l=0}^{L}E_H\Big\{E\Big\{\big\|\hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l)\big\|^2\,\Big|\,H\Big\}\Big\}\big(\sigma_b^2+\sigma_c^2\big)+\sigma_v^2.
\]
Then
\[
\bar{\sigma}_{x_s}^2 = \sigma_b^2\big[\mathrm{MSE}_1+(L+1)\sigma_h^2\big]\quad\text{and}\quad\bar{\sigma}_w^2 = \mathrm{MSE}_1\big(\sigma_b^2+\sigma_c^2\big)+\sigma_v^2,
\]
and
\[
\mathrm{SNR}_d(\beta) = \frac{\bar{\sigma}_{x_s}^2}{\bar{\sigma}_w^2} = \frac{f_1\beta^2+f_2\beta+f_3}{g_1'\beta+g_2'},
\]
where
\[
\begin{aligned}
f_1 &= (L+1)\sigma_h^2\big(N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}-T^2\big), \\
f_2 &= -(L+1)\sigma_h^2\big(2N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}-T^2\big)-\frac{N\sigma_v^2\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}}{P_T}, \\
f_3 &= (L+1)\sigma_h^2 N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}+\frac{N\sigma_v^2\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}}{P_T}, \\
g_1' &= \frac{\sigma_v^2T^2}{P_T}-(L+1)\sigma_h^2 N\,\mathrm{tr}\,\bar{\Omega}_{\mathrm{OP}}^{\dagger}, \\
g_2' &= f_3.
\end{aligned}
\]
Setting the first derivative of $\mathrm{SNR}_d(\beta)$ to zero to get the optimum $\beta$, we have
\[
\beta_{\mathrm{opt}}' = \frac{g_2'}{g_1'}\Bigg[-1+\sqrt{1+\frac{g_1'(f_3g_1'-f_2g_2')}{g_2'^2f_1}}\,\Bigg]. \tag{4.48}
\]

4.5 Bias-Variance Trade-Off

In Section 4.4, we addressed the issue of training power allocation from an equalization viewpoint by maximizing the SNR for data detection. This method can also be applied to the selection of the number of basis functions $Q$, since $\mathrm{SNR}_d$ in (4.45) is also a function of $Q$. In this section, we give the CE-BEM-based analysis as an example to clarify the trade-off. Note that the modeling error of a BEM must be considered here.

Consider (4.37)-(4.39), which will be used for equalization with $x_s(n)$ acting as "signal" and $w(n)$ as "noise". Since the modeling error is now considered, by (4.41) the time average of $\sigma_{x_s}^2(n)$ becomes (omitting $O(1/T^2)$ terms)
\[
\begin{aligned}
\bar{\sigma}_{x_s}^2 := \frac{1}{T}\sum_{n=0}^{T-1}\sigma_{x_s}^2(n)
&= \sigma_b^2\Big[\mathrm{MSE}_1+\frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\big\{\|h_{\mathrm{BEM}}(n;l)\|^2\big\}\Big] \\
&= \sigma_b^2\Big[\mathrm{MSE}_1+\frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\big\{\|h(n;l)\|^2\big\}-\frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\big\{\|e_{\mathrm{BEM}}(n;l)\|^2\big\}\Big] \\
&= \sigma_b^2\big[\mathrm{MSE}_1+N(L+1)\sigma_h^2-\mathrm{MSE}_2\big].
\end{aligned}
\]
Next consider the power of the noise (4.43), the time average of which (omitting $O(1/T^2)$ terms) is given by
\[
\bar{\sigma}_w^2 := \frac{1}{T}\sum_{n=0}^{T-1}\sigma_w^2(n) = \sigma_b^2\,\mathrm{MSE}_c+N\sigma_v^2+R,
\]
where we define
\[
R := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l_1=0}^{L}\sum_{l_2=0}^{L}E\{e_2^H(n;l_1)\,e_2(n;l_2)\}\,c^*(n-l_1)\,c(n-l_2),
\]
\[
e_1(n;l) := \hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l),\qquad e_2(n;l) := e_1(n;l)-e_{\mathrm{BEM}}(n;l).
\]
It turns out that $R = R_1+R_2$, where
\[
R_1 := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l_1=0}^{L}\sum_{l_2=0}^{L}E\{e_1^H(n;l_1)\,e_1(n;l_2)\}\,c^*(n-l_1)\,c(n-l_2),
\]
\[
R_2 := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l_1=0}^{L}\sum_{l_2=0}^{L}E\{e_{\mathrm{BEM}}^H(n;l_1)\,e_{\mathrm{BEM}}(n;l_2)\}\,c^*(n-l_1)\,c(n-l_2).
\]
By (H4.2.5), $E\{e_{\mathrm{BEM}}^H(n;l_1)\,e_{\mathrm{BEM}}(n;l_2)\} = 0$ for $l_1 \neq l_2$. Thus
\[
\begin{aligned}
R_2
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L}E\{e_{\mathrm{BEM}}^H(n;l)\,e_{\mathrm{BEM}}(n;l)\}\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}c_{m_1}^*c_{m_2}\,e^{j(\alpha_{m_1}-\alpha_{m_2})l}\,e^{-j(\alpha_{m_1}-\alpha_{m_2})n} \\
&= \sum_{l=0}^{L}\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}c_{m_1}^*c_{m_2}\,e^{j(\alpha_{m_1}-\alpha_{m_2})l}\,E\Bigg\{\sum_{q_1=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}\sum_{q_2=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}h_{q_1}^H(l)\,h_{q_2}(l)\Bigg[\frac{1}{T}\sum_{n=0}^{T-1}e^{j(\omega_{q_2}-\omega_{q_1}-\alpha_{m_1}+\alpha_{m_2})n}\Bigg]\Bigg\}.
\end{aligned}
\]
We consider the correlation between $h_{q_1}(l)$ and $h_{q_2}(l)$,
\[
E\{h_{q_1}^H(l)\,h_{q_2}(l)\} = \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}E\{h^H(n_1;l)\,h(n_2;l)\}\,e^{-j\omega_{q_1}n_1}\,e^{j\omega_{q_2}n_2}.
\]
By defining $R_h(n_1-n_2;l) := E\{h^H(n_1;l)\,h(n_2;l)\}$ and setting $\tau := n_1-n_2$, we have
\[
E\{h_{q_1}^H(l)\,h_{q_2}(l)\} = \frac{1}{T}\sum_{\tau=-(T-1)}^{T-1}R_h(\tau;l)\,e^{-j\omega_{q_1}\tau}\Bigg[\frac{1}{T}\sum_{n_2=\max(0,-\tau)}^{\min(T-1,\,T-1-\tau)}e^{j(\omega_{q_2}-\omega_{q_1})n_2}\Bigg].
\]
Note that

$$\left|\frac{1}{T}\sum_{n_2=\max(0,-\tau)}^{\min(T-1,\,T-1-\tau)}e^{j(\omega_{q_2}-\omega_{q_1})n_2}-\frac{1}{T}\sum_{n_2=0}^{T-1}e^{j(\omega_{q_2}-\omega_{q_1})n_2}\right|\le\frac{|\tau|}{T}.$$

Since the coherence time of the channel is limited, we can select a number $T_{\mathrm{coh}}$ such that $|R_h(n_1-n_2;l)|\approx 0$ for $|\tau|>T_{\mathrm{coh}}$. Therefore,

$$\frac{1}{T}\sum_{n_2=\max(0,-\tau)}^{\min(T-1,\,T-1-\tau)}e^{j(\omega_{q_2}-\omega_{q_1})n_2}=\delta(q_1-q_2)+O\!\left(\frac{1}{T}\right).$$

This fact leads to

$$E\left\{h_{q_1}^{H}(l)\,h_{q_2}(l)\right\}\approx 0\tag{4.49}$$

for $q_1\neq q_2$ and "large" $T$. Omitting the $O(1/T)$ term, $R_2$ can be rewritten as

$$\begin{aligned}
R_2&=\sum_{l=0}^{L}E\left\{\sum_{q_1=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}\;\sum_{q_2=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}h_{q_1}^{H}(l)\,h_{q_2}(l)\right\}\delta(q_1-q_2)\,\delta(m_1-m_2)\times\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}c_{m_1}^{*}c_{m_2}\,e^{j(\alpha_{m_1}-\alpha_{m_2})l}\\
&=\sum_{l=0}^{L}E\left\{\sum_{q=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}h_{q}^{H}(l)\,h_{q}(l)\right\}\sum_{m=0}^{P-1}|c_m|^2=\mathrm{MSE}_2\sum_{m=0}^{P-1}|c_m|^2.
\end{aligned}$$

For the first part of $R$, by using (4.49),

$$\begin{aligned}
R_1&=\sum_{l_1=0}^{L}\sum_{l_2=0}^{L}E\left\{\sum_{q_1=1}^{Q}\sum_{q_2=1}^{Q}\left[\hat{h}_{q_1}^{H}(l_1)-h_{q_1}^{H}(l_1)\right]\left[\hat{h}_{q_2}(l_2)-h_{q_2}(l_2)\right]\right\}\\
&\qquad\times\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}c_{m_1}^{*}c_{m_2}\,e^{j(\alpha_{m_1}l_1-\alpha_{m_2}l_2)}\left[\frac{1}{T}\sum_{n=0}^{T-1}e^{j(\omega_{q_2}-\omega_{q_1}-\alpha_{m_1}+\alpha_{m_2})n}\right]\\
&=\sum_{l_1=0}^{L}\sum_{l_2=0}^{L}E\left\{\sum_{q_1=-\frac{Q-1}{2}}^{\frac{Q-1}{2}}\;\sum_{q_2=-\frac{Q-1}{2}}^{\frac{Q-1}{2}}\left[\hat{h}_{q_1}^{H}(l_1)-h_{q_1}^{H}(l_1)\right]\left[\hat{h}_{q_2}(l_2)-h_{q_2}(l_2)\right]\right\}\\
&\qquad\times\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}c_{m_1}^{*}c_{m_2}\,e^{j(\alpha_{m_1}l_1-\alpha_{m_2}l_2)}\,\delta(q_1-q_2)\,\delta(m_1-m_2)\\
&=E\left\{\left[\sum_{l=0}^{L}\left(\hat{H}_l-H_l\right)e^{-j\alpha_m l}\right]^{H}\left[\sum_{l=0}^{L}\left(\hat{H}_l-H_l\right)e^{-j\alpha_m l}\right]\right\}\sum_{m=0}^{P-1}|c_m|^2.
\end{aligned}$$

Figure 4.1: Estimation variance: NCMSE vs SNR under $f_d$ = 0 Hz (time-invariant). The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels ($Q$ = 1). (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).) [Plot: normalized channel MSE (dB) vs SNR (dB); L=2, T=399, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs; curves SI&CE, SI&OP, SI&DPS for K=N=1, SI&CE, SI&DPS for K=N=2, and MSE1, MSE for K=N=1 and K=N=2.]

Define

$$B(m):=\begin{bmatrix}1 & e^{-j\alpha_m} & \cdots & e^{-j\alpha_m L}\end{bmatrix},$$

then $R_1$ becomes

$$\begin{aligned}
R_1&=E\left\{\left[\left(B(m)\otimes I_{NQ}\right)\left(\hat{H}-H\right)\right]^{H}\left[\left(B(m)\otimes I_{NQ}\right)\left(\hat{H}-H\right)\right]\right\}\sum_{m=0}^{P-1}|c_m|^2\\
&=E\left\{\left(\hat{H}-H\right)^{H}\mathcal{C}^{H}\mathcal{C}\left(\hat{H}-H\right)\right\}=\frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ.
\end{aligned}$$

Thus, using $\sum_{m=0}^{P-1}|c_m|^2=\sigma_c^2$,

$$R=\frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ+\mathrm{MSE}_2\,\sigma_c^2,$$

and

$$\bar{\sigma}_w^2=\sigma_b^2\,\mathrm{MSE}_c+N\sigma_v^2+\frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ+\mathrm{MSE}_2\,\sigma_c^2.$$

Figure 4.2: Estimation variance: NCMSE vs SNR under $f_d$ = 50 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).) [Plot: normalized channel MSE (dB) vs SNR (dB); L=2, T=399, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs; same curves as Figure 4.1.]

Figure 4.3: Estimation variance: NCMSE vs SNR under $f_d$ = 100 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).) [Plot: normalized channel MSE (dB) vs SNR (dB); L=2, T=399, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs; same curves as Figure 4.1.]

From an equalization viewpoint, based on (4.37)–(4.39), an equivalent SNR for data detection is defined as

$$\mathrm{SNR}_d(Q)=\frac{\bar{\sigma}_{xs}^2(Q)}{\bar{\sigma}_w^2(Q)}=\frac{\sigma_b^2\left[\mathrm{MSE}_1+(L+1)N\sigma_h^2-\mathrm{MSE}_2\right]}{\sigma_b^2\,\mathrm{MSE}_c+N\sigma_v^2+\dfrac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ+\mathrm{MSE}_2\,\sigma_c^2}.\tag{4.50}$$

We pick $Q$ to maximize $\mathrm{SNR}_d(Q)$, as we expect the detection performance to improve with increasing $\mathrm{SNR}_d(Q)$.

Figure 4.4: Estimation variance: NCMSE vs SNR under $f_d$ = 200 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).) [Plot: normalized channel MSE (dB) vs SNR (dB); L=2, T=399, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs; same curves as Figure 4.1.]

4.6 Simulation Examples

4.6.1 Performance Analysis for the First-Order Statistics-Based Estimator

In this example, we explore the variance of channel estimation of the first-order statistics-based estimator. Simulation results are compared with theoretical results to show the validity of our analysis. We employ the same simulation model as in Section 3.6, i.e., a doubly-selective Rayleigh fading channel with $L$ = 2, satisfying the modified Jakes' model. The performance of channel estimation is investigated for both SISO ($K = N = 1$) and MIMO ($K = N = 2$) channels. We emphasize once more that BEMs are only used for processing at the receiver; random channels are generated using Jakes' model, not BEM representations.

Figure 4.5: Training power allocation: BER vs $\beta$ under $f_d$ = 0 Hz (time-invariant). The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels ($Q$ = 1). (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) [Plot: bit error rate vs $\beta$; Kalman filter, K=N=1, L=2, T=399, Ts=25 μs, P=7, SNR=0, 10, 20, 30 dB, fd=0 Hz, 1000 runs; curves SI&CE, SI&OP, SI&DPS.]

In simulations, we pick a data record length of 399 symbols (time duration of approximately 10 ms). We consider the system operating under different Doppler spreads. For the Doppler spreads $f_d$ = 0, 50, 100, and 200 Hz, we take the number of basis functions $Q$ = 1, 3, 5, and 7 for CE-BEM, and $Q$ = 1, 3, 4, and 6 for the OP- and DPS-BEM representations. The average transmitted power in $\{c(n)\}$ is 0.3 of that in $\{b(n)\}$, leading to TIR = 0.3. In the single-user scenario, the information sequences $\{b(n)\}$ and the training sequences $\{c(n)\}$ are all BPSK modulated. The periodic training sequence $\{c(n)\}$ is generated from the m-sequence of period $P$ = 7, one period of which is given by (3.82).

Figure 4.6: Training power allocation: BER vs $\beta$ under $f_d$ = 50 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) [Plot: bit error rate vs $\beta$; Kalman filter, K=N=1, L=2, T=399, Ts=25 μs, P=7, SNR=0, 10, 20, 30 dB, fd=50 Hz, 1000 runs; curves SI&CE, SI&OP, SI&DPS.]

For the MIMO (multiple-user) case with $K = N = 2$, all the users have the same transmitted power in training and information data. The average transmitted power in $\{c_k(n)\}$ is 0.3 of that in $\{b_k(n)\}$ ($k = 1, 2, \ldots, K$). The information sequences $\{b_k(n)\}$ and the training sequences $\{c_k(n)\}$ are also both BPSK modulated. The training sequences are generated from the above m-sequence of period 7 by the procedure introduced in Section 3.5. They are of length $P$ = 14, and the training sequence for the first user is given by (3.84).
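The single-user training setup described above can be sketched as follows. This is only an illustration under stated assumptions: the generator polynomial $x^3 + x + 1$, the seed, and all function names are hypothetical choices, and the period actually used in the simulations is the one listed in (3.82), which is not reproduced here. The relation $\beta = \mathrm{TIR}/(1+\mathrm{TIR})$ follows from $\beta$ being the ratio of training power to total transmitted power.

```python
import math

def m_sequence_bits(length=7):
    """Binary m-sequence from the recurrence x[n+3] = x[n+1] XOR x[n].

    Assumption: primitive polynomial x^3 + x + 1 with seed (1, 0, 0);
    the period used in the text is the one given in (3.82).
    """
    bits = [1, 0, 0]
    while len(bits) < length:
        bits.append(bits[-2] ^ bits[-3])
    return bits[:length]

def bpsk(bits):
    """Map bit 0 -> +1 and bit 1 -> -1."""
    return [1 - 2 * b for b in bits]

def periodic_autocorrelation(x, lag):
    """Unnormalized periodic autocorrelation of one period x at the given lag."""
    p = len(x)
    return sum(x[n] * x[(n + lag) % p] for n in range(p))

def superimposed_training(T, tir, sigma_b2=1.0):
    """Periodic training c(n), n = 0..T-1, with average power tir * sigma_b2."""
    amplitude = math.sqrt(tir * sigma_b2)
    period = [amplitude * v for v in bpsk(m_sequence_bits())]
    return [period[n % len(period)] for n in range(T)]

T, TIR = 399, 0.3                       # record length and TIR from the text
c = superimposed_training(T, TIR)
training_power = sum(v * v for v in c) / T   # average power in {c(n)}
beta = TIR / (1.0 + TIR)                     # training share of the total power
sidelobes = [periodic_autocorrelation(bpsk(m_sequence_bits()), k)
             for k in range(1, 7)]
```

With unit information power, the transmitted sequence would then be formed as $s(n) = b(n) + c(n)$; the flat out-of-phase autocorrelation of the m-sequence is the property that makes it a convenient periodic training choice.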
To explore different estimators under equal conditions, we assume the additive noise $\{v(n)\}$ is zero-mean (i.e., $m = 0$), white complex-Gaussian, and uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau)v^{H}(n)\}=\sigma_v^2 I_N\,\delta(\tau)$, so that no terms are discarded. The (receiver) SNR refers to the energy per bit per user over the one-sided noise spectral density, with both the information and the superimposed training sequence counting toward the bit energy. The results for SISO and MIMO scenarios are shown in Figures 4.1–4.4, for various Doppler spreads and SNRs, based on 500 Monte Carlo runs.

Figure 4.7: Training power allocation: BER vs $\beta$ under $f_d$ = 100 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) [Plot: bit error rate vs $\beta$; Kalman filter, K=N=1, L=2, T=399, Ts=25 μs, P=7, SNR=0, 10, 20, 30 dB, fd=100 Hz, 1000 runs; curves SI&CE, SI&OP, SI&DPS.]

In each figure, the performance of the CE-, OP-, and DPS-BEM-based estimators using first-order statistics is shown for the SISO case, together with the CE- and DPS-BEM-based MIMO estimators. The normalized channel MSE in simulation is defined as (3.83) for the SISO channel and (3.85) for the MIMO channel. Simulation results are compared with the theoretical analysis of the "pure" estimation error $\mathrm{MSE}_1$ (defined in (4.2)) as well as the "entire" error $\mathrm{MSE}_c$ (defined in (4.33)), which counts the modeling error. We plot $\mathrm{MSE}_1$ based on CE- (using (4.7)), DPS- (using (4.16)), and OP-BEM (using (4.20)), for SISO and MIMO respectively, and $\mathrm{MSE}_c$ based only on CE-BEM (using (4.7), (4.34), and (4.35)).

Figure 4.1 exhibits the normalized channel MSE for simulation and theoretical results for Doppler spread $f_d$ = 0 Hz. Since the channel is time-invariant, the basis sets of CE-, OP-, and DPS-BEM are the same (constant sequences), leading to the same simulated and analytical results. Since no modeling error is present, $\mathrm{MSE}_1$ and $\mathrm{MSE}_c$ are equal. From this figure, we can see that the simulation results and the analytical results agree very well, whether the channel is SISO or MIMO.

Figure 4.8: Training power allocation: BER vs $\beta$ under $f_d$ = 200 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) [Plot: bit error rate vs $\beta$; Kalman filter, K=N=1, L=2, T=399, Ts=25 μs, P=7, SNR=0, 10, 20, 30 dB, fd=200 Hz, 1000 runs; curves SI&CE, SI&OP, SI&DPS.]

For time-varying channels, the modeling error cannot be eliminated. In Figures 4.2–4.4, corresponding to $f_d$ = 50, 100, and 200 Hz, different channel models now give distinct estimation errors. The CE-BEM-based estimator has the highest estimation variance, and the DPS-BEM-based one the lowest. For low Doppler spreads (slow fading channels), the OP-BEM-based solution performs similarly to the DPS-BEM-based one (see Figure 4.2), since both modeling errors are tiny. As the Doppler spread increases, however, the OP-BEM-based estimator deteriorates until it reaches performance similar to that of the CE-BEM-based estimator at $f_d$ = 200 Hz. In these three figures, the theoretically derived $\mathrm{MSE}_1$ fits the simulation results of the DPS-BEM-based estimator very well, which confirms that DPS-BEM offers the smallest (usually negligible) modeling error among the three. Counting the modeling error, the curves for $\mathrm{MSE}_c$ (based on CE-BEM) also fit the simulation results of the CE-BEM-based estimator well.

Figure 4.9: Training power allocation: optimum $\beta$ vs SNR for CE-BEM. ("sim.": simulation results; "analy.": $\beta_{\mathrm{opt}}$ in (4.46); "analy. app.": $\tilde{\beta}_{\mathrm{opt}}$ in (4.48).) [Plot: optimum $\beta$ vs SNR (dB); Kalman filter (CE-BEM), K=N=1, L=2, T=399, Ts=25 μs, P=7; simulated curves for fd=0, 50, 100, 200 Hz; analytical and approximate analytical curves for Q=1, 3, 5, 7.]
4.6.2 Training Power Allocation

Under the same settings, we now consider the issue of training power allocation for the first-order statistics-based estimator. Only the SISO scenario is considered. Figures 4.5–4.8 show the BER versus $\beta$ (defined in (4.36)), the ratio of the power assigned to superimposed training to the total transmitted power, for different Doppler spreads. At the receiver, a Kalman filter based on the estimated channel is applied to detect the information sequence. All three BEM representations are studied.

Figure 4.10: Training power allocation: optimum $\beta$ vs SNR for OP-BEM. ("sim.": simulation results; "analy.": $\beta_{\mathrm{opt}}$ in (4.46); "analy. app.": $\tilde{\beta}_{\mathrm{opt}}$ in (4.48).) [Plot: optimum $\beta$ vs SNR (dB); Kalman filter (OP-BEM), K=N=1, L=2, T=399, Ts=25 μs, P=7; simulated curves for fd=0, 50, 100, 200 Hz; analytical and approximate analytical curves for Q=1, 3, 4, 6.]

We choose the optimal $\beta$ as that corresponding to the smallest BER. As expected, the optimal $\beta$ grows with increasing SNR. At low SNRs, more power assigned to training for better channel estimation cannot offset the effective SNR loss for the information symbols, so more power should be allocated to information for a higher effective SNR. At higher SNRs, noise is no longer the key factor, so more training power for better estimation achieves better BERs.

Figures 4.9–4.11 compare the analytical results for the optimal $\beta$ with the simulated results based on a Kalman filter. We consider three cases: the value of $\beta$ minimizing the BER (denoted by "sim." in the figures), the theoretical result $\beta_{\mathrm{opt}}$ in (4.46) (denoted by "analy." in the figures), and its approximation in (4.48) (denoted by "analy. app." in the figures). Note that we select the $\beta$ that minimizes the BER from Figures 4.5–4.8.
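As a numerical sanity check of the closed form (4.48), the following sketch maximizes a quadratic-over-linear ratio of the same shape as $\mathrm{SNR}_d(\beta)$ and compares the closed-form stationary point with a grid search. The coefficient values are hypothetical and chosen only for illustration (they are not taken from the simulations, apart from respecting $\tilde{g}_2 = f_3$), and the function names are assumptions. A negative or non-real solution is clipped to $\beta = 0$.

```python
import math

def snr_d(beta, f1, f2, f3, g1, g2):
    """Quadratic-over-linear ratio (f1*b^2 + f2*b + f3) / (g1*b + g2)."""
    return (f1 * beta ** 2 + f2 * beta + f3) / (g1 * beta + g2)

def beta_opt(f1, f2, f3, g1, g2):
    """Stationary point of snr_d in the form of (4.48), clipped at zero."""
    radicand = 1.0 + g1 * (f3 * g1 - f2 * g2) / (g2 ** 2 * f1)
    if radicand < 0.0:
        return 0.0  # assumption: no real stationary point, fall back to 0
    root = (g2 / g1) * (-1.0 + math.sqrt(radicand))
    return max(root, 0.0)

# Hypothetical coefficients: the numerator vanishes at beta = 1 (no data power).
coeffs = dict(f1=-4.0, f2=3.0, f3=1.0, g1=1.0, g2=1.0)
closed_form = beta_opt(**coeffs)

# Brute-force check on a fine grid over [0, 1].
grid = [k / 10000.0 for k in range(10001)]
grid_best = max(grid, key=lambda b: snr_d(b, **coeffs))

# A case where the stationary point is negative and is therefore clipped.
clipped = beta_opt(f1=-4.0, f2=0.5, f3=1.0, g1=1.0, g2=1.0)
```

For these illustrative coefficients the stationary-point condition reduces to $\beta^2 + 2\beta - 0.5 = 0$, whose positive root $-1+\sqrt{1.5}$ is what both the closed form and the grid search locate.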
The analytical results in (4.46) and (4.48) may produce negative solutions if the SNR is extremely low (e.g., 0 dB); we simply take $\beta = 0$ in that case. In the three figures, the approximate solution of (4.48) agrees well with that of (4.46). It is also seen that for all the curves, the optimal $\beta$ grows as the SNR increases. We do not consider the modeling error in (4.46) and (4.48). In simulations, the modeling error acts as a noise term in data reception and decreases the actual $\mathrm{SNR}_d(\beta)$, so that at high SNRs the simulated optimal $\beta$ is smaller than the analytical one. Due to larger modeling errors, the simulated optimal $\beta$ for CE- and OP-BEM is smaller than that for DPS-BEM. The simulation results for DPS-BEM fit the analytical solutions well for SNRs from 5 dB to 20 dB.

Figure 4.11: Training power allocation: optimum $\beta$ vs SNR for DPS-BEM. ("sim.": simulation results; "analy.": $\beta_{\mathrm{opt}}$ in (4.46); "analy. app.": $\tilde{\beta}_{\mathrm{opt}}$ in (4.48).) [Plot: optimum $\beta$ vs SNR (dB); Kalman filter (DPS-BEM), K=N=1, L=2, T=399, Ts=25 μs, P=7; simulated curves for fd=0, 50, 100, 200 Hz; analytical and approximate analytical curves for Q=1, 3, 4, 6.]

Figure 4.12: Bias-variance trade-off: BER vs $Q$ under TIR = 0.3 for different $f_d$'s. [Plot: bit error rate vs $Q$; Kalman filter (CE-BEM), K=N=1, L=2, T=399, Ts=25 μs, TIR=0.3, SNR=20 dB, 1000 runs; curves for fd=0, 50, 100, 200 Hz.]

Table 4.1: Selected optimal Q.

                         fd = 0 Hz   fd = 50 Hz   fd = 100 Hz   fd = 200 Hz
TIR = 0.3   Analytical       1           5             3             5
            Simulation       1           5             3             5
            (2.9b)           1           3             3             5
TIR = 1.0   Analytical       1           7             5             7
            Simulation       1           7             5             7
            (2.9b)           1           3             3             5

4.6.3 Bias-Variance Trade-Off

Under the same settings, we now consider the bias-variance trade-off, i.e., selecting an appropriate number of basis functions. Only CE-BEM and SISO channels are considered.
Figure 4.12 shows BERs for different $Q$, the number of basis functions employed in CE-BEM, for Doppler spreads $f_d$ = 0, 50, 100, and 200 Hz with TIR = 0.3. At the receiver, a Kalman filter is adopted for information symbol detection. Figure 4.14 displays the corresponding BERs for TIR = 1.0. Figures 4.13 and 4.15 show the detection SNR defined in (4.50) as a function of $Q$. The agreement between the two sets of figures is very good: BERs are minimized for the same values of $Q$ that maximize $\mathrm{SNR}_d(Q)$.

Figure 4.13: Bias-variance trade-off: $\mathrm{SNR}_d(Q)$ (defined in (4.50)) vs $Q$ under TIR = 0.3 for different $f_d$'s. [Plot: $\mathrm{SNR}_d(Q)$ vs $Q$; CE-BEM, K=N=1, L=2, T=399, Ts=25 μs, TIR=0.3, SNR=20 dB; curves for fd=0, 50, 100, 200 Hz.]

In CE-BEM, the number of basis functions is usually given by (2.9b). With superimposed training, the TIR also needs to be considered when selecting $Q$: if more power has been assigned to superimposed training, more basis functions can be employed to obtain a more accurate estimate, and vice versa. In Table 4.1, we compare the $Q$ selected by simulation, the $Q$ that maximizes $\mathrm{SNR}_d(Q)$ (denoted by "Analytical"), and the $Q$ given by (2.9b). Our analytical results agree with the simulations better than those given by (2.9b).

Figure 4.14: Bias-variance trade-off: BER vs $Q$ under TIR = 1.0 for different $f_d$'s. [Plot: bit error rate vs $Q$; Kalman filter (CE-BEM), K=N=1, L=2, T=399, Ts=25 μs, TIR=1.0, SNR=20 dB, 1000 runs; curves for fd=0, 50, 100, 200 Hz.]

4.7 Conclusions

In this chapter, performance analysis of the first-order statistics-based estimator proposed in the previous chapter was discussed, under different BEM settings. Modeling error was also considered. We clearly showed that in this estimator, the major interference using superimposed training comes from the unknown information sequences.
Power allocation and bias-variance trade-off of the first-order statistics-based estimator were also considered in this chapter, based on the results of performance analysis. We cast these optimization issues as maximization of an SNR for equalizer design. Numerical examples illustrated good agreement of our analytical results with the simulations.

Figure 4.15: Bias-variance trade-off: $\mathrm{SNR}_d(Q)$ (defined in (4.50)) vs $Q$ under TIR = 1.0 for different $f_d$'s. [Plot: $\mathrm{SNR}_d(Q)$ vs $Q$; CE-BEM, K=N=1, L=2, T=399, Ts=25 μs, TIR=1.0, SNR=20 dB; curves for fd=0, 50, 100, 200 Hz.]

Chapter 5

Deterministic Maximum Likelihood (DML) Approach

5.1 Introduction

By the performance analysis in Chapter 4, we can clearly see that the first-order statistics-based estimator proposed in Chapter 3 views the information sequence as interference in channel estimation, which leads to a poor received SNR. Since the training and information sequences pass through an identical channel, we exploit this fact to enhance channel estimation. We now consider joint channel and information sequence estimation via an iterative DML approach, assuming that the noise $v(n)$ is complex Gaussian. Convergence to a local extremum is guaranteed; moreover, if the initial superimposed training-based solution is "good", the global extremum (minimum error probability sequence detector) can be achieved by the DML approach.

We discuss the DML approach in this chapter. Section 5.2 deals with the single-user scenario and Section 5.3 considers the multiple-user case. Simulation examples illustrate our approach in Section 5.4, and Section 5.5 concludes this chapter.

5.2 DML Approach Using BEM

Consider the first-order statistics-based channel estimator described in Chapter 3. As in (3.1) and (3.2), the SIMO channel output is given by

$$x(n)=\sum_{l=0}^{L}h(n;l)\,s(n-l),\tag{5.1}$$

and its noisy measurement is given by

$$y(n)=x(n)+v(n).\tag{5.2}$$

We make the following assumptions:

(H5.2.1) The time-varying channel $\{h(n;l)\}$ satisfies a BEM representation as in (2.20),

$$h(n;l)=\sum_{q=1}^{Q}h_q(l)\,\phi_q(n)\tag{5.3}$$

where the basis functions $\{\phi_q(n)\}_{q=1}^{Q}$ are known at the receiver. Also $N\ge 1$.

(H5.2.2) The complex Gaussian noise $\{v(n)\}$ may be of unknown mean $E\{v(n)\}=m$, is white and uncorrelated with $\{b(n)\}$, and satisfies $E\{[v(n+\tau)-m][v(n)-m]^{H}\}=\sigma_v^2 I_N\,\delta(\tau)$.

We collect $T-L$ samples of the observations into the vector

$$Y:=\begin{bmatrix}y^{T}(T-1) & y^{T}(T-2) & \cdots & y^{T}(L)\end{bmatrix}^{T}.\tag{5.4}$$

Define

$$s:=\begin{bmatrix}s(T-1) & s(T-2) & \cdots & s(0)\end{bmatrix}^{T},$$

and let $\tilde{v}(n):=v(n)-m$. Given the vectors of the BEM coefficients in (3.12) and (3.15),

$$H_l:=\begin{bmatrix}h_1^{T}(l), & h_2^{T}(l), & \ldots, & h_Q^{T}(l)\end{bmatrix}^{T},\qquad H:=\begin{bmatrix}H_0^{T}, & H_1^{T}, & \ldots, & H_L^{T}\end{bmatrix}^{T}.$$

Define

$$\tilde{V}:=\begin{bmatrix}\tilde{v}^{T}(T-1) & \tilde{v}^{T}(T-2) & \cdots & \tilde{v}^{T}(L)\end{bmatrix}^{T},\qquad M:=\begin{bmatrix}m^{T} & m^{T} & \cdots & m^{T}\end{bmatrix}^{T}$$

where $M$ is of the same size as $\tilde{V}$, and $V=\tilde{V}+M$ is a column vector consisting of samples of the noise $\{v(n)\}$ in a manner similar to (5.4). Using (5.1)–(5.3), we have the following linear model

$$Y=\mathcal{T}(s)H+\tilde{V}+M\tag{5.5}$$

where $\mathcal{T}(s)$ is a block Hankel matrix (a block Hankel matrix has identical block entries on its antidiagonals) given by

$$\mathcal{T}(s):=\begin{bmatrix}
s(T-1)\Phi_{T-1} & s(T-2)\Phi_{T-1} & \cdots & s(T-L-1)\Phi_{T-1}\\
s(T-2)\Phi_{T-2} & s(T-3)\Phi_{T-2} & \cdots & s(T-L-2)\Phi_{T-2}\\
\vdots & \vdots & \ddots & \vdots\\
s(L)\Phi_{L} & s(L-1)\Phi_{L} & \cdots & s(0)\Phi_{L}
\end{bmatrix},\qquad
\Phi_n:=\begin{bmatrix}\phi_1(n)I_N & \phi_2(n)I_N & \cdots & \phi_Q(n)I_N\end{bmatrix}.$$

Also using (5.1) and (5.2), an alternative linear model for $Y$ is given by

$$Y=\mathcal{F}(H)s+\tilde{V}+M\tag{5.6}$$

where

$$\mathcal{F}(H):=\begin{bmatrix}
h(T-1;0) & \cdots & h(T-1;L) & & \\
 & \ddots & & \ddots & \\
 & & h(L;0) & \cdots & h(L;L)
\end{bmatrix}$$

is a "filtering matrix".

Consider (5.1), (5.2), and (5.5). Under the assumption of temporally white complex Gaussian measurement noise, consider the joint estimation

$$\left\{\hat{H},\hat{s},\hat{m}\right\}=\arg\min_{H,s,m}\left\|Y-\mathcal{T}(s)H-M\right\|^2,\tag{5.7}$$

where $\hat{s}$ is the estimate of $s$.
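The equivalence of the two linear models (5.5) and (5.6) can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: it takes $N = 1$, uses a complex-exponential basis with an assumed frequency convention, reads $\mathcal{F}(H)$ as a banded matrix in which row $n$ places $h(n;l)$ in the column that multiplies $s(n-l)$, and all function names and numerical values are hypothetical.

```python
import cmath

def basis(q, n, Q, T):
    """Illustrative complex-exponential basis phi_q(n), q = 0..Q-1 (assumed form)."""
    omega = 2.0 * cmath.pi * (q - (Q - 1) / 2.0) / T
    return cmath.exp(1j * omega * n)

def channel_tap(h, n, l, T):
    """h(n;l) = sum_q h_q(l) phi_q(n), the BEM representation (5.3)."""
    Q = len(h[l])
    return sum(h[l][q] * basis(q, n, Q, T) for q in range(Q))

def build_T(s, Q, L):
    """Rows n = T-1, ..., L of T(s): [s(n) Phi_n, ..., s(n-L) Phi_n] for N = 1."""
    T = len(s)
    return [[s[n - l] * basis(q, n, Q, T) for l in range(L + 1) for q in range(Q)]
            for n in range(T - 1, L - 1, -1)]

def build_F(h, T, L):
    """Banded filtering matrix acting on [s(T-1), ..., s(0)] for N = 1."""
    rows = []
    for r, n in enumerate(range(T - 1, L - 1, -1)):
        row = [0j] * T
        for l in range(L + 1):
            row[r + l] = channel_tap(h, n, l, T)  # column that multiplies s(n-l)
        rows.append(row)
    return rows

def matvec(A, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

T_len, L, Q = 6, 1, 3
s = [1.5, -0.5, 0.5, -1.5, 1.5, 0.5]                       # s(0), ..., s(T-1)
h = [[0.8 + 0.1j, 0.2j, -0.1], [0.3, -0.05j, 0.1 + 0.1j]]  # h[l][q], hypothetical
H_vec = [h[l][q] for l in range(L + 1) for q in range(Q)]  # stacked as H
s_vec = s[::-1]                                            # [s(T-1), ..., s(0)]

y_from_T = matvec(build_T(s, Q, L), H_vec)    # noise-free part of (5.5)
y_from_F = matvec(build_F(h, T_len, L), s_vec)  # noise-free part of (5.6)
```

Both products give the same $T - L$ noise-free outputs $x(T-1), \ldots, x(L)$ of (5.1), which is what allows the cost in (5.7) to be viewed as linear in $H$ for fixed $s$ and linear in $s$ for fixed $H$.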
We follow a DML approach assuming no statistical model for the input sequence $\{s(n)\}$. Under a white Gaussian noise assumption, the DML estimates are obtained by the nonlinear LS optimization (5.7). Using (5.5) and (5.6), we have a separable nonlinear LS problem that can be solved sequentially as (joint optimization with respect to $H$ and $m$ can be further "separated")

$$\left\{\hat{H},\hat{s},\hat{m}\right\}=\arg\min_{s}\left\{\min_{H,m}\left\|Y-\mathcal{T}(s)H-M\right\|^2\right\}=\arg\min_{H,m}\left\{\min_{s}\left\|Y-\mathcal{F}(H)s-M\right\|^2\right\}.$$

The finite alphabet properties of the information sequences can also be incorporated into the DML methods. These algorithms, first proposed by [68] and also applied in [50], iterate between estimates of the channel and the input sequences. At iteration $i$, with an initial guess of the channel $H^{(i)}$ and the mean $m^{(i)}$, the algorithm estimates the input sequence $s^{(i)}$, and the channel $H^{(i+1)}$ and mean $m^{(i+1)}$ for the next iteration, by

$$s^{(i)}=\arg\min_{s\in\mathcal{S}}\left\|Y-\mathcal{F}\!\left(H^{(i)}\right)s-M^{(i)}\right\|^2,\tag{5.8a}$$

$$H^{(i+1)}=\arg\min_{H}\left\|Y-\mathcal{T}\!\left(s^{(i)}\right)H-M^{(i)}\right\|^2,\tag{5.8b}$$

$$m^{(i+1)}=\arg\min_{m}\left\|Y-\mathcal{T}\!\left(s^{(i)}\right)H^{(i+1)}-M\right\|^2\tag{5.8c}$$

where $\mathcal{S}$ is the (discrete) domain of $s$. The optimizations in (5.8b) and (5.8c) are linear LS problems, whereas the optimization in (5.8a) can be carried out using the Viterbi algorithm. Since the above iterative procedure involving (5.8a)–(5.8c) decreases the cost at every iteration, one achieves a local minimum of the nonlinear LS cost (local maximum of the DML function).
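The alternating structure of (5.8a)–(5.8c) can be illustrated with a deliberately small toy problem. The sketch below is not the dissertation's implementation: it assumes a time-invariant two-tap channel ($Q = 1$, $L = 1$, $N = 1$), a record of only 10 BPSK symbols, and it replaces the Viterbi algorithm in (5.8a) by an exhaustive search, which returns the same minimizer but is feasible only for very short records. The initial channel guess stands in for the first-order statistics-based estimate, and all names and numerical values are hypothetical.

```python
import itertools
import math
import random

L = 1  # channel memory: two taps h(0), h(1), constant in time (Q = 1)

def predict(h, s, n):
    """Noise-free output sum_l h(l) s(n-l)."""
    return sum(h[l] * s[n - l] for l in range(L + 1))

def cost(y, s, h, m):
    """||Y - F(H)s - M||^2 over n = L, ..., T-1."""
    return sum(abs(y[n] - m - predict(h, s, n)) ** 2 for n in range(L, len(y)))

def detect(y, c, h, m):
    """(5.8a) by exhaustive search over BPSK b(n); a stand-in for Viterbi."""
    candidates = ([b + cn for b, cn in zip(bits, c)]
                  for bits in itertools.product((-1.0, 1.0), repeat=len(y)))
    return min(candidates, key=lambda s: cost(y, s, h, m))

def ls_channel(y, s, m):
    """(5.8b): 2x2 normal equations solved by Cramer's rule (s is real)."""
    a11 = a12 = a22 = 0.0
    b1 = b2 = 0j
    for n in range(L, len(y)):
        a11 += s[n] * s[n]
        a12 += s[n] * s[n - 1]
        a22 += s[n - 1] * s[n - 1]
        b1 += s[n] * (y[n] - m)
        b2 += s[n - 1] * (y[n] - m)
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

def noise_mean(y, s, h):
    """(5.8c): average residual over n = L, ..., T-1."""
    return sum(y[n] - predict(h, s, n) for n in range(L, len(y))) / (len(y) - L)

def dml(y, c, h, m=0j, iterations=3):
    """Run a few passes of (5.8a)-(5.8c) and record the cost after each pass."""
    costs = []
    for _ in range(iterations):
        s = detect(y, c, h, m)
        h = ls_channel(y, s, m)
        m = noise_mean(y, s, h)
        costs.append(cost(y, s, h, m))
    return s, h, m, costs

rng = random.Random(0)
T = 10
amp = math.sqrt(0.3)                                  # TIR = 0.3 as in the text
c = [amp * v for v in (-1, 1, 1, -1, 1, -1, -1, -1, 1, 1)]
b_true = [rng.choice((-1.0, 1.0)) for _ in range(T)]
s_true = [b + cn for b, cn in zip(b_true, c)]
h_true, m_true = [1.0 + 0.5j, 0.4 - 0.3j], 0.1 + 0.0j
y = [0j] + [m_true + predict(h_true, s_true, n)
            + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05))
            for n in range(1, T)]                     # y[0] is unused (n >= L)
s_hat, h_hat, m_hat, costs = dml(y, c, h=[0.8 + 0.3j, 0.5 - 0.2j])
```

Because each of the three updates exactly minimizes the same cost over one block of variables with the others held fixed, the recorded costs cannot increase from one pass to the next, which is the monotonicity property stated above.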
The maximum likelihood estimate of the noise mean in the optimization (5.8c) may be obtained by setting

$$\left.\frac{\partial\left\|Y-\mathcal{T}\!\left(s^{(i)}\right)H^{(i+1)}-M\right\|^2}{\partial m}\right|_{m=m^{(i+1)}}=0,$$

which yields

$$m^{(i+1)}=\frac{1}{T-L}\sum_{n=L}^{T-1}\left[y(n)-\sum_{l=0}^{L}h^{(i+1)}(n;l)\,s^{(i)}(n-l)\right].$$

We now summarize our DML approach:

1. a. Use the first-order statistics-based approach described in Chapter 3 to estimate the channel. Denote the estimates of the channel coefficients by $\hat{H}^{(1)}$ and $\hat{h}_q^{(1)}(l)$. In this method $\{c(n)\}$ is known and $\{b(n)\}$ is regarded as interference. (If $m$ is assumed to be zero or known, the following steps to estimate $m$ should be skipped, and $\hat{m}$ is simply set to 0 or to the known value.)

   b. Estimate the mean $\hat{m}^{(1)}$ as

   $$\hat{m}^{(1)}:=\frac{1}{T}\sum_{n=0}^{T-1}\left[y(n)-\sum_{l=0}^{L}\hat{h}^{(1)}(n;l)\,c(n-l)\right]\tag{5.9}$$

   where $\hat{h}^{(1)}(n;l):=\sum_{q=1}^{Q}\hat{h}_q^{(1)}(l)\,\phi_q(n)$ is given by (5.3).

   c. Design a Viterbi sequence detector (see Appendix B.1) to estimate $\{s(n)\}$ as $\{\hat{s}(n)\}$ using the estimated channel $\hat{H}^{(1)}$, the mean $\hat{m}^{(1)}$, and the cost function in (5.8a) with $i = 1$. Note that knowledge of $\{c(n)\}$ is used in $s(n) = b(n) + c(n)$; therefore, we are in essence estimating $\{b(n)\}$.

2. a. Substitute $\hat{s}(n)$ for $s(n)$ in (5.1) and use the corresponding formulation in (5.5) to estimate the channel $H$ as $\hat{H}^{(2)}=\mathcal{T}^{\dagger}(\hat{s})\left[Y-\hat{M}^{(1)}\right]$. Following (5.9), the mean $m$ is estimated as $\hat{m}^{(2)}$ for $i = 1$.

   b. Design a Viterbi sequence detector using the estimated channel $\hat{H}^{(2)}$, the mean $\hat{m}^{(2)}$, and the cost (5.8a) with $i = 2$, as in Step 1.c.

3. Step 2 provides one iteration of (5.8a)–(5.8c). Repeat a few times until the (relative) improvement in channel estimation over the previous iteration is below a pre-specified threshold.
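The stopping rule in Step 3 can be sketched as a generic outer loop. The text does not spell out the exact measure of "relative improvement", so the normalized squared change between successive channel estimates used below is an assumption, as are the function names, the default threshold, and the iteration cap. The `step` argument stands for one pass of Step 2 (detect, then re-estimate) and is replaced here by a simple contraction for demonstration.

```python
def relative_change(h_new, h_old):
    """||h_new - h_old||^2 / ||h_old||^2 (assumed measure of improvement)."""
    num = sum(abs(a - b) ** 2 for a, b in zip(h_new, h_old))
    den = sum(abs(b) ** 2 for b in h_old)
    return num / den

def iterate_until_converged(step, h_init, tol=1e-3, max_iter=10):
    """Apply `step` repeatedly until the relative change drops below `tol`.

    Returns the last estimate and the number of passes performed.
    """
    h = list(h_init)
    for i in range(1, max_iter + 1):
        h_next = step(h)
        if relative_change(h_next, h) < tol:
            return h_next, i
        h = h_next
    return h, max_iter

# Stand-in for one DML pass: a contraction with fixed point 2 (illustrative only).
toy_step = lambda h: [0.5 * x + 1.0 for x in h]
h_final, n_passes = iterate_until_converged(toy_step, [1.0])
```

In an actual receiver, `step` would wrap sequence detection followed by the LS channel and mean updates, and the cap on the number of passes reflects the "repeat a few times" wording of Step 3.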
Since the Viterbi detector used in the proposed DML approach is computationally burdensome, we can replace it with a Kalman filter with hard decisions to expedite the iterations?at the expense of a little BER loss. This iterative method can follow these steps: 1. a. As Step 1.a of the DML approach. b. As Step 1.b of the DML approach. c. Design a Kalman filter (see Appendix B.2) of delay d to estimate {s(n)} as {?s(n)} using the estimated channel ?H(1) and mean ?m(1). Quantize {?s(n)} into {?s(n)} withthe knowledge ofthe symbolalphabet (harddecisions). Note that knowledge of {c(n)} is used in s(n) = b(n) +c(n), therefore, we are in essence estimating {b(n)}. 2. a. As Step 2.a of the DML approach. b. Design a Kalman filter using the estimated channel ?H(2), the mean ?m(2), as in Step 1.c. 3. Step 2 provides one iteration of our proposed iterative method. Repeat a few times till any (relative) improvement in channel estimation over previous iteration is below a pre-specified threshold. 124 5.3 DML Approach: Multiple-User (MIMO) Channels In this section we extend the DML approach to multiple-user (MIMO) channels corre- sponding to the estimator described in Section 3.5. We collect T ?L samples of the observations to form the N(T ?L)-column vector as in (5.4) Y : = bracketleftbigg yT(T ?1) yT(T ?2) ??? yT(L) bracketrightbiggT and the KT-column vector s : = bracketleftbigg s1(T ?1) ??? sK(T ?1) ??? s1(0) ??? sK(0) bracketrightbiggT . (5.10) Define the N ?NQ matrix ?n := bracketleftbigg ?1 (n)IN ?2 (n)IN ??? ?Q (n)IN bracketrightbigg , the N(T ?L)?NQ(L+1)K matrix T (s) := ? ?? ?? ?? ?? ?? s1(T ?1)?T?1 ??? s1(T ?L?1)?T?1 ??? sK(T ?L?1)?T?1 s1(T ?2)?T?2 ??? s1(T ?L?2)?T?2 ??? sK(T ?L?2)?T?2 ... ... ... ... ... s1(L)?L ??? s1(0)?L ??? sK(0)?L ? ?? ?? ?? ?? ?? 125 the NQ(L+1)K-column vector (by (3.77)) H := bracketleftbigg HT1 HT2 ??? 
HTK bracketrightbiggT , and the N(T ?L)-column vector ?V : = bracketleftbigg ?vT(T ?1), ?vT(T ?2), ..., ?vT(L) bracketrightbiggT where ?v(n) := v(n)?m. We also have the following linear model Y = T (s)H+ ?V+M (5.11) where M := bracketleftbigg mT mT ??? mT bracketrightbiggT . We further define the N(T ?L)?K(L+1) matrix F(H) := ? ?? ?? ?? h1(T ?1;0) ??? hK(T ?1;0) ??? hK(T ?1;L) ... ... ... h1(L;0) ??? h1(L;L) ??? hK(L;L) ? ?? ?? ?? and obtain another linear model as follows Y = F(H)s+ ?V +M. (5.12) 126 By (5.11) and (5.12), the DML approach described in Section 5.2 can be followed. Under the assumption of white complex Gaussian measurement noise, the joint estimators of the relevant parameters are given by the following nonlinear optimization problem { ?H,?s, ?m} = arg braceleftbigg min H,s,m bardblY?T (s)?Mbardbl2 bracerightbigg = arg braceleftbigg min H,s,m bardblY?F (H)?Mbardbl2 bracerightbigg . We also follow the DML approach assuming no statistical model for the input sequences {sk(n)}. We may separate the nonlinear LS problem sequentially as { ?H,?s, ?m} = argmins {min H,m ||Y?T (s)H?M||2} = argmin H,m {mins ||Y?F(H)s?M||2}. At iterationi, with an initial guess of the channelH(i) and the mean m(i), the algorithm estimates the input sequence s(i) and the channel H(i+1) and mean m(i+1) for the next iteration by s(i) = argmin s?S ||Y ?C(H(i))s?M(i)||2, (5.13a) H(i+1) = argmin H ||Y ?T (s(i))H?M(i)||2, (5.13b) m(i+1) = argminm ||Y ?T (s(i))H(i+1) ?M||2. (5.13c) We now summarize our MIMO DML approach: 127 0 5 10 15 20 25 3010 ?6 10?5 10?4 10?3 10?2 10?1 100 SNR (dB) Bit Error Rate Viterbi detector: K=N=1, L=2, T=420, Ts=25?s, TIR=0.3, P=7, fd=0Hz, 500 runs. SI&DPS: step 1 SI&DPS: 1st iter. SI&DPS: 2nd iter. SI&DPS: 3rd iter. SI&CE: step 1 SI&CE: 1st iter. SI&CE: 2nd iter. SI&CE: 3rd iter. TM&DPS TM&CE Figure 5.1: DML approach (SISO): BER vs SNR under fd = 0Hz (time-invariant) and K = N = 1. 
The curves for CE- and DPS-BEM?s completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposedtraining; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; ?step 1?: the first-order statistics-based estimator; ?1st iter.?: the first DML iteration; ?2nd iter.?: the second DML iteration; ?3rd iter.?: the third DML iteration.) 1. a. Use (3.80) and (3.81) to estimate the channel. Denote the channel estimates by ?H(1)k and ?h(1)k (n;l). In this method {ck(n)} is known and {bk(n)} is regarded as interference. b. The noise mean m is estimated as ?m(1) = 1T Tsummationdisplay n=1 bracketleftBigg y(n)? Ksummationdisplay k=1 Lsummationdisplay l=0 ?h(1)k (n;l)ck(n?l) bracketrightBigg . (5.14) c. Design a Viterbi sequence detector (see Appendix B.1) to estimate {sk(n)} as {?sk(n)} using the estimated channel ?H(1), mean ?m(1) and cost (5.13a) withi= 1. 128 0 5 10 15 20 25 3010 ?6 10?5 10?4 10?3 10?2 10?1 100 SNR (dB) Bit Error Rate Viterbi detector: K=N=1, L=2, T=420, Ts=25?s, TIR=0.3, P=7, fd=50Hz, 500 runs. SI&DPS: step 1 SI&DPS: 1st iter. SI&DPS: 2nd iter. SI&DPS: 3rd iter. SI&CE: step 1 SI&CE: 1st iter. SI&CE: 2nd iter. SI&CE: 3rd iter. TM&DPS TM&CE Figure 5.2: DML approach (SISO): BER vs SNR under fd = 50Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; ?step 1?: the first-order statistics-based estimator; ?1st iter.?: the first DML iteration; ?2nd iter.?: the second DML iteration; ?3rd iter.?: the third DML iteration.) Note that knowledge of {ck(n)} is used in sk(n) = bk(n) +ck(n), therefore, we are in essence estimating bk(n) in the Viterbi detector. 2. a. Substitute ?sk(n) for sk(n) in (5.10) and use the corresponding formulation in equation (5.13b) to estimate the time-invariant channel coefficient matrix H as ?H(2) = T ?(?s)bracketleftBigY? ?M(1)bracketrightBig and estimate the time-varying channel as ?h(2)k (n;l) using (3.81). 
The mean m is estimated as ?m(2) using (5.14) with ?h(1)k (n;l) replaced with ?h(2)k (n;l). 129 0 5 10 15 20 25 3010 ?6 10?5 10?4 10?3 10?2 10?1 100 SNR (dB) Bit Error Rate Viterbi detector: K=N=1, L=2, T=420, Ts=25?s, TIR=0.3, P=7, fd=100Hz, 500 runs. SI&DPS: step 1 SI&DPS: 1st iter. SI&DPS: 2nd iter. SI&DPS: 3rd iter. SI&CE: step 1 SI&CE: 1st iter. SI&CE: 2nd iter. SI&CE: 3rd iter. TM&DPS TM&CE Figure 5.3: DML approach (SISO): BER vs SNR under fd = 100Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; ?step 1?: the first-order statistics-based estimator; ?1st iter.?: the first DML iteration; ?2nd iter.?: the second DML iteration; ?3rd iter.?: the third DML iteration.) b. Design a Viterbi sequence detector using the estimated channel ?H(2), mean ?m(2) and cost (5.13a) with i= 2, as in Step 1.c. 3. Step 2 provides one iteration of (5.13a)?(5.13c). Repeat a few times until reaching the desired point. An approximation of the MIMO DML approach by replacing Viterbi detector with multiple-user Kalman filter is given by the following steps: 1. a. As Step 1.a of the MIMO DML approach. b. As Step 1.b of the MIMO DML approach. 130 0 5 10 15 20 25 3010 ?5 10?4 10?3 10?2 10?1 100 SNR (dB) Bit Error Rate Viterbi detector: K=N=1, L=2, T=420, Ts=25?s, TIR=0.3, P=7, fd=200Hz, 500 runs. SI&DPS: step 1 SI&DPS: 1st iter. SI&DPS: 2nd iter. SI&DPS: 3rd iter. SI&CE: step 1 SI&CE: 1st iter. SI&CE: 2nd iter. SI&CE: 3rd iter. TM&DPS TM&CE Figure 5.4: DML approach (SISO): BER vs SNR under fd = 200Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; ?step 1?: the first-order statistics-based estimator; ?1st iter.?: the first DML iteration; ?2nd iter.?: the second DML iteration; ?3rd iter.?: the third DML iteration.) c. Design a multiple-user Kalman filter (see Appendix B.2) of delay d to estimate {sk (n)} as {?sk (n)} using the estimated channel ?H(1) and mean ?m(1). 
   Quantize the Kalman filter output into hard decisions $\{\hat{s}_k(n)\}$ using knowledge of the symbol alphabet. Note that knowledge of $\{c_k(n)\}$ is used in $s_k(n) = b_k(n) + c_k(n)$; therefore, we are in essence estimating $\{b_k(n)\}$.

2. a. As Step 2.a of the DML approach.

   b. Design a multiple-user Kalman filter using the estimated channel $\hat{H}^{(2)}$ and the mean $\hat{m}^{(2)}$, as in Step 1.c.

3. Step 2 provides one iteration of our proposed iterative method. Repeat a few times until any (relative) improvement in channel estimation over the previous iteration is below a pre-specified threshold.

Figure 5.5: DML approach (SISO): NCMSE vs SNR under $f_d = 0$ Hz (time-invariant) and $K = N = 1$. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels ($Q = 1$). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=1, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=0 Hz, 500 runs.]

5.4 Simulation Examples

5.4.1 DML Approach: Single User

In this example, we adopt the same simulation conditions as in Section 3.6.1 to compare the first-order statistics-based estimator of Chapter 3 with our DML approach. We generate a doubly-selective Rayleigh fading channel following Jakes' model with $N = 1$ and $L = 2$.

In simulations, we pick a data record length of 420 symbols (time duration of approximately 10 ms). We consider the system operating under different Doppler spreads with different numbers of basis functions $Q$. For the Doppler spreads $f_d = 0$, 50, 100, and 200 Hz, we take $Q = 1$, 3, 5, and 7 for the CE-BEM-based solution, and $Q = 1$, 3, 4, and 6 for the DPS-BEM representations. The average transmitted power in $\{c(n)\}$ is 0.3 of that in $\{b(n)\}$, leading to a TIR of 0.3.

Figure 5.6: DML approach (SISO): NCMSE vs SNR under $f_d = 50$ Hz and $K = N = 1$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=1, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=50 Hz, 500 runs.]

Figure 5.7: DML approach (SISO): NCMSE vs SNR under $f_d = 100$ Hz and $K = N = 1$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=1, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=100 Hz, 500 runs.]

We first consider a single-user scenario. The information sequences $\{b(n)\}$ and the training sequences $\{c(n)\}$ are all BPSK modulated. The periodic training sequence $\{c(n)\}$ is generated from the m-sequence of period $P = 7$, one period of which is given by (3.82).
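The CE-BEM orders quoted above for Section 5.4.1 can be reproduced by a short calculation. The sketch below assumes the rule $Q = 2\lceil f_d T T_s\rceil + 1$ that appears in the setup line of Figure 6.7; that this rule also underlies the values used here is an assumption, supported only by the fact that it returns the same numbers. The function name `ce_bem_order` is hypothetical. Integer arithmetic (with $T_s$ in microseconds) is used so that the ceiling is exact. The DPS-BEM orders (1, 3, 4, 6) are not produced by this rule and are not modeled.

```python
def ce_bem_order(fd_hz: int, T: int, Ts_us: int) -> int:
    """Assumed rule Q = 2*ceil(fd*T*Ts) + 1, with Ts given in whole microseconds."""
    scaled = fd_hz * T * Ts_us             # fd * T * Ts multiplied by 1e6
    ceil_val = -(-scaled // 1_000_000)     # exact integer ceiling of fd * T * Ts
    return 2 * ceil_val + 1

T, Ts_us = 420, 25                         # record length and symbol duration of Section 5.4.1
print("record duration (ms):", T * Ts_us / 1000)
for fd in (0, 50, 100, 200):
    print(f"fd = {fd:3d} Hz  ->  Q = {ce_bem_order(fd, T, Ts_us)}")
```

The same function can be evaluated for other settings, for example the fast-fading case of Chapter 6 with $T = 840$ and $T_s = 200$ µs.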
To explore different estimators and their iterative versions under identical conditions, we assume the additive noise $\{v(n)\}$ is zero-mean (i.e., $m = 0$), white complex-Gaussian, and uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$, so that no terms are discarded in the first-order statistics-based estimator, the first step of our DML approach. The (receiver) SNR refers to the energy per bit over one-sided noise spectral density, with both information and superimposed training sequence counting toward the bit energy. At the receiver, a Viterbi detector is used for data reception.

Figure 5.8: DML approach (SISO): NCMSE vs SNR under $f_d = 200$ Hz and $K = N = 1$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=1, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=200 Hz, 500 runs.]

The results for a record length of $T = 420$ symbols are shown in Figures 5.1–5.8 for various Doppler spreads and SNRs. The results are based on 500 Monte Carlo runs. To compare with other possible approaches, the CE- and DPS-BEM-based TM training described in Appendix A is also considered for doubly-selective channel estimation. Training sessions are periodically inserted in the transmitted symbol frame. We take a training session of length $2L + 1 = 5$ symbols with the training sequence $\{0, 0, \sqrt{2L+1}, 0, 0\}$, and at the receiver an LS estimation is performed. A data session of 18 symbols is inserted between two successive training sessions to form a frame of length 23 symbols.
Such a frame is repeated over a record length of 418 symbols. Thus, we have a training-to-information bit and power ratio of about 0.3.

Figure 5.9: DML approach (MIMO): BER vs SNR under $f_d = 0$ Hz (time-invariant) and $K = N = 2$. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels ($Q = 1$). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: bit error rate vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=0 Hz, 500 runs.]

For comparison, we plot the results of the CE- and DPS-BEM-based superimposed training schemes (denoted as SI in the figures), including the first-order statistics-based estimator (denoted as "step 1" in the figures) and the DML approach after one, two, and three iterations (denoted as "1st iter.", "2nd iter.", and "3rd iter." in the figures), and of the TM training approaches (denoted as TM in the figures). Figures 5.1–5.4 show the BERs for $f_d = 0$, 50, 100, and 200 Hz, respectively. Figures 5.5–5.8 show the corresponding normalized channel MSE, which is defined in (3.83).

From the eight figures, we can see that after iterations the superimposed training-based estimation and detection performance improves substantially, because the information data that are viewed as interference by the first-order statistics-based estimator are now exploited to enhance the channel estimation for the next iteration. Therefore, the self-interference is efficiently removed after iterations. The DML approach provides error performance comparable to that of TM training, but at a higher data transmission rate. Valuable bandwidth resources can thus be saved by using iterative estimation, at the expense of increased computational complexity.

Figure 5.10: DML approach (MIMO): BER vs SNR under $f_d = 50$ Hz and $K = N = 2$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: bit error rate vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=50 Hz, 500 runs.]

Figure 5.11: DML approach (MIMO): BER vs SNR under $f_d = 100$ Hz and $K = N = 2$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: bit error rate vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=100 Hz, 500 runs.]

Since we assume the channel satisfies a BEM, the modeling error of the prescribed BEM sets a limit on the estimation performance. Except for $f_d = 0$ Hz, the DPS-BEM is much more accurate in describing a band-limited time-varying channel, so that it provides much better estimation performance than the CE-BEM. For both the BER and NCMSE curves, the error floors of the DPS-BEM-based solutions are lower than those of the CE-BEM-based ones.
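The modeling-error limit for CE-BEM can be illustrated with a small numerical experiment. The sketch below is only an illustration under stated assumptions: a single complex exponential stands in for one Jakes-faded channel tap, the CE-BEM frequencies are taken as $\omega_q = 2\pi\,(q - (Q+1)/2)/T$ (an assumed form of (3.3), which is not reproduced here), and the function name `ce_bem_fit_error` is hypothetical. It fits the tone by least squares with $Q = 5$ exponentials over $T = 420$ samples and reports the relative residual energy. A tone on the $1/T$ grid is fitted essentially exactly, whereas a tone between grid points (here about 70 Hz for $T_s = 25$ µs) leaves a residual that no amount of training can remove. DPS-BEM is not modeled in this sketch.

```python
import numpy as np

def ce_bem_fit_error(f_norm: float, T: int, Q: int) -> float:
    """Relative LS residual when a tone exp(j*2*pi*f_norm*n) is fitted by Q CE-BEM exponentials."""
    n = np.arange(T)
    h = np.exp(2j * np.pi * f_norm * n)              # stand-in for one time-varying tap
    q = np.arange(1, Q + 1)
    omega_q = 2 * np.pi * (q - (Q + 1) / 2) / T      # assumed CE-BEM frequency grid
    basis = np.exp(1j * np.outer(n, omega_q))        # T x Q basis matrix
    coeffs, *_ = np.linalg.lstsq(basis, h, rcond=None)
    resid = h - basis @ coeffs
    return float(np.sum(np.abs(resid) ** 2) / np.sum(np.abs(h) ** 2))

T, Q = 420, 5
on_grid = ce_bem_fit_error(1.0 / T, T, Q)            # tone exactly on a basis frequency
off_grid = ce_bem_fit_error(0.735 / T, T, Q)         # tone between two basis frequencies
print(f"on-grid residual:  {on_grid:.2e}")
print(f"off-grid residual: {off_grid:.2e}")
```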
As the Doppler spread $f_d$ increases, the performance of the DML-based superimposed training deteriorates compared with the TM training. Figures 5.4 and 5.8 for $f_d = 200$ Hz clearly show this result. This is partly because the first-order statistics-based estimator performs worse for fast fading channels, since the larger number of basis functions needed to represent the channel results in higher estimation variance. More iterations are required to approach the performance of TM training.

Figure 5.12: DML approach (MIMO): BER vs SNR under $f_d = 200$ Hz and $K = N = 2$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: bit error rate vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=200 Hz, 500 runs.]

5.4.2 DML Approach: Multiple Users

In this example, we follow the settings in Section 5.4.1 except that a multiple-user scenario is considered. It can also be viewed as an extension of Section 3.6.2, since we now consider the iterative DML approach built on the multiple-user channel estimator using the first-order statistics. In simulations, we assume that all the users have the same transmitted power in training and information data. The average transmitted power in $\{c_k(n)\}$ is 0.3 of that in $\{b_k(n)\}$ ($k = 1, 2, \ldots, K$), leading to the same TIR as in Section 5.4.1. We consider a simple two-user scenario, i.e., $K = 2$, each user with two receive antennas, i.e., $N = 2$. The information sequences $\{b_k(n)\}$ and the training sequences $\{c_k(n)\}$ are all BPSK modulated.
The training sequence is generated from the m-sequence of period $\tilde{P} = 7$ by the procedure we introduced in Section 3.5. The additive noise $\{v(n)\}$ is also zero-mean, white complex-Gaussian, and uncorrelated with $\{b_k(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_2 \delta(\tau)$. The (receiver) SNR refers to the energy per bit per user over one-sided noise spectral density, with both information and superimposed training sequence counting toward the bit energy. At the receive end, a Viterbi detector is used for symbol detection. We consider different Doppler spreads of $f_d = 0$, 50, 100, and 200 Hz for this communications system. We also pick $Q$ as 1, 3, 5, 7 for CE-BEM and as 1, 3, 4, 6 for DPS-BEM.

Figure 5.13: DML approach (MIMO): NCMSE vs SNR under $f_d = 0$ Hz (time-invariant) and $K = N = 2$. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels ($Q = 1$). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=0 Hz, 500 runs.]

Figure 5.14: DML approach (MIMO): NCMSE vs SNR under $f_d = 50$ Hz and $K = N = 2$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=50 Hz, 500 runs.]

Figure 5.15: DML approach (MIMO): NCMSE vs SNR under $f_d = 100$ Hz and $K = N = 2$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=100 Hz, 500 runs.]

The results for a record length of $T = 420$ symbols are shown in Figures 5.9–5.16 for various Doppler spreads and SNRs. The results are based on 500 Monte Carlo runs. For comparison, CE-BEM- and DPS-BEM-based periodically placed TM training with zero-padding, as described in Appendix A, is also considered. We take a training session of length $(K+1)L + K = 8$ symbols, with the first user's training $\{0_{1\times 2}, \sqrt{(K+1)L+K}, 0_{1\times 5}\}$ and the second user's $\{0_{1\times 5}, \sqrt{(K+1)L+K}, 0_{1\times 2}\}$. A data session of 27 symbols is inserted between two such training sessions to form a frame of length 35 symbols. Such a frame is repeated over a record length of 420 bits. Thus, we have a training-to-information bit and power ratio of about 0.3. At the receiver an LS estimation is performed.
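The frame arithmetic behind the TM benchmark can be checked in a few lines. The sketch below is a bookkeeping aid rather than part of the estimator; the helper name `tm_frame` is hypothetical, and treating the single-user session length $2L + 1$ as the $K = 1$ case of $(K+1)L + K$ is an observation about the two quoted lengths, not a statement taken from Appendix A. It also confirms that a single pulse of amplitude $\sqrt{(K+1)L+K}$ in an otherwise zero session carries unit average power per training symbol, which is why the bit ratio and the power ratio coincide for unit-power data.

```python
from fractions import Fraction
import math

def tm_frame(K: int, L: int, data_len: int):
    """Training-session length, frame length, and training-to-information ratio."""
    train_len = (K + 1) * L + K
    frame_len = train_len + data_len
    return train_len, frame_len, Fraction(train_len, data_len)

siso = tm_frame(K=1, L=2, data_len=18)      # Section 5.4.1 settings
mimo = tm_frame(K=2, L=2, data_len=27)      # Section 5.4.2 settings

for name, (train_len, frame_len, ratio) in (("SISO", siso), ("MIMO", mimo)):
    pulse_energy = math.sqrt(train_len) ** 2        # energy of the single nonzero training symbol
    avg_power = pulse_energy / train_len            # average power per training symbol
    print(f"{name}: training {train_len}, frame {frame_len}, "
          f"ratio {float(ratio):.3f}, avg training power {avg_power:.2f}")

print("MIMO frames in a 420-symbol record:", 420 // mimo[1])
```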
Figures 5.9–5.16 show results similar to those of the SISO case: the DML approach enhances the channel estimation and data detection performance significantly over the first-order statistics-based estimator for a multiple-user channel, and DPS-BEM clearly outperforms CE-BEM, so that it appears to be a good choice for approximating the time-varying channel.

Figure 5.16: DML approach (MIMO): NCMSE vs SNR under $f_d = 200$ Hz and $K = N = 2$. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.) [Plot: normalized channel MSE (dB) vs SNR (dB); Viterbi detector, K=N=2, L=2, T=420, Ts=25 µs, TIR=0.3, P=7, fd=200 Hz, 500 runs.]

5.5 Conclusions

We explored the DML approach in this chapter. By exploiting the fact that training and information sequences pass through an identical channel, the iterative DML approach was used to jointly improve the channel and sequence estimation. Beginning with the first-order statistics-based channel estimator, the detected data symbols from the preceding iteration are used to reduce the self-interference at the current iteration. A local maximum of the DML function is guaranteed. Symbol detection techniques such as Kalman filtering can also be adopted instead of the Viterbi algorithm to reduce the computational complexity of the iterations; the resulting method can be viewed as an approximation of the DML approach.

Chapter 6

Doubly-Selective Channel Estimation Using Data-Dependent Superimposed Training

6.1 Introduction

For the first-order statistics-based channel estimator proposed in Chapter 3, the information sequence acts as interference, resulting in a poor training SNR.
Simulation results have shown that noticeable error floors occur in the BER and channel MSE curves for this estimator. Although we can employ the DML method described in Chapter 5 to reduce the interference, DML iterations add to the computational complexity and to the delay in symbol detection at the receiver. Inspired by the work of [20], where the training sequence is distorted according to the information data before transmission so as to eliminate the self-interference on reception, we extend this data-dependent method to time-varying channels with the aid of BEM representations.

For the first-order statistics-based estimator using CE- or DPS-BEM, (4.11) addresses the source of the estimation error (see Remark 4.2.2.2 for a detailed discussion):
\[ \hat{d}_{mq} = d_{mq} + s_{mq} + w_{mq}, \]
where the information sequence's contribution, given by $s_{mq}$, interferes with the estimation of $d_{mq}$ from $\hat{d}_{mq}$, and hence with channel estimation from the observations. For the CE-BEM-based estimator,
\[ s_{mq} := \frac{1}{T}\sum_{n=0}^{T-1}\Big\{\sum_{l=0}^{L} h(n;l)\,b(n-l)\Big\}\, e^{-j(\omega_q+\alpha_m)n}, \tag{6.1} \]
and for the DPS-BEM-based estimator,
\[ s_{mq} := \sum_{n=0}^{T-1}\Big[\sum_{l=0}^{L} h(n;l)\,b(n-l)\Big]\, u_q(n)\, e^{-j\alpha_m n} \tag{6.2} \]
as in (4.12). The goal of data-dependent superimposed training is to null out the influence of $s_{mq}$ on the channel estimation by transmitter-end processing.

In this chapter, we focus on transmitter-end processing techniques to reduce the self-interference of superimposed training. In Section 6.2, we present the data-dependent superimposed training based on CE-BEM; this scheme is extended to channels satisfying the DPS-BEM representation in Section 6.3, where the approach of partially-data-dependent superimposed training is also proposed. Our approaches are demonstrated by simulation examples in Section 6.4, and Section 6.5 concludes this chapter.
6.2 Data-Dependent Superimposed Training Using CE-BEM

We assume:

(H6.2.1) The time-varying channel satisfies CE-BEM (3.3), where the frequencies $\omega_q$ ($q = 1, 2, \ldots, Q$) are distinct and known with $\omega_q \in [0, 2\pi)$. Also $N \ge 1$.

(H6.2.2) The information data sequence $\{b(n)\}$ is zero-mean and white, with variance $E\{|b(n)|^2\} = \sigma_b^2$.

(H6.2.3) The measurement noise $\{v(n)\}$ is zero-mean, white, and uncorrelated with $\{b(n)\}$, with autocorrelation $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$.

(H6.2.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$ and average power $\sigma_c^2 := \sum_{n=0}^{P-1}|c(n)|^2/P$.

6.2.1 Data-Dependent Processing at the Transmitter

Consider the DFT of the information sequence $\{b(n)\}$ over the block $n = 0, 1, \ldots, T-1$,
\[ b_r := \frac{1}{T}\sum_{n=0}^{T-1} b(n)\,e^{-j\omega_r n}, \qquad \omega_r := \frac{2\pi r}{T}, \tag{6.3} \]
for $r = 0, 1, \ldots, T-1$, so that $b(n) = \sum_{r=0}^{T-1} b_r e^{j\omega_r n}$. Then the interference $s_{mq}$ of (6.1) can be expressed as
\begin{align*}
s_{mq} &= \frac{1}{T}\sum_{n=0}^{T-1}\Big\{\sum_{q_1=1}^{Q}\sum_{l=0}^{L} h_{q_1}(l)\,e^{j\omega_{q_1}n}\sum_{r=0}^{T-1} b_r e^{j\omega_r(n-l)}\Big\}\,e^{-j(\omega_q+\alpha_m)n} \\
&= \sum_{q_1=1}^{Q}\sum_{l=0}^{L}\sum_{r=0}^{T-1}\Big[h_{q_1}(l)\,e^{-j\omega_r l}\,b_r\Big]\Big[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_{q_1}-\omega_q+\omega_r-\alpha_m)n}\Big] \\
&= \sum_{q_1=1}^{Q}\sum_{l=0}^{L}\sum_{r=0}^{T-1}\Big[h_{q_1}(l)\,e^{-j\omega_r l}\,b_r\Big]\,\delta\big((q_1-q+r-mK)\bmod T\big),
\end{align*}
where the last step uses $T = KP$, so that the training frequencies fall on the DFT grid at $r = mK$. Therefore, if we can make $b_r = 0$ for $r = q + mK - q_1$, $1 \le q, q_1 \le Q$, $m = 0, 1, \ldots, P-1$, then $s_{mq} = 0$. We do so by modifying $\{c(n)\}$ based on $\{b(n)\}$ (at the transmitter).

Define a set
\[ \Omega := \{r : -(Q-1) + mK \le r \le (Q-1) + mK,\ m = 0, 1, \ldots, P-1\}. \tag{6.4} \]
The frequency components $\{b_r : r \in \Omega\}$ of the information sequence are hence the "self-interference". Define a "self-interference" sequence
\[ b_e(n) := \sum_{r\in\Omega} b_r\, e^{j2\pi rn/T} \tag{6.5} \]
and a data-dependent superimposed training $\tilde{c}(n)$ over the block $n = 0, 1, \ldots, T-1$ such that
\[ \tilde{c}(n) := c(n) - b_e(n). \tag{6.6} \]
Note that $\{\tilde{c}(n)\}$ is no longer periodic with period $P$. At the transmitter, we transmit $\tilde{c}(n) + b(n) = c(n) + [b(n) - b_e(n)]$. The model (3.1)–(3.4) holds with $c(n)$ replaced by $\tilde{c}(n)$. By construction, the DFT of $b(n) - b_e(n)$ over the block $n = 0, 1, \ldots, T-1$ vanishes at the frequencies in the set $\Omega$. The DFT of $b(n-l) - b_e(n-l)$ over the same block also vanishes at the frequencies in $\Omega$, provided that a cyclic prefix of length $M \ge L$ is used. A cyclic prefix of length $M$ is added at the transmitter by choosing
\[ s(-i) = s(T-i), \qquad i = 1, 2, \ldots, M, \quad M \ge L, \]
where $s(i) = \tilde{c}(i) + b(i)$. This makes the linear convolution in (3.1) equal to the circular convolution (implicit in the DFT operation) over the block $n = 0, 1, \ldots, T-1$.

We summarize our data-dependent channel estimation solution as follows:

1. At the transmitter, we are given the information sequence over a block as $\{b(n)\}$ for $n = 0, 1, \ldots, T-1$, with $T$ chosen as $T = KP$, $K \ge Q$. Calculate the DFT by (6.3).

2. To eliminate interference with channel estimation at the receiver, we need to set the $b_r$ to zero for $r \in \Omega$. Define the self-interference sequence $\{b_e(n)\}$ as in (6.5).

3. Define the data-dependent superimposed training $\tilde{c}(n)$ as in (6.6). Use a cyclic prefix of length $M \ge L$ and transmit.

The channel estimation given in (3.18) stays the same for data-dependent superimposed training, because we still use the periodic $\{c(n)\}$ at the receiver, and we do not know $b_e(n)$ or $b(n)$ at the receiver. It is easily established that there is now no contribution of $\{b(n)\}$ to $\hat{d}_{mq}$ for $0 \le m \le P-1$ and $1 \le q \le Q$.

6.2.2 Data Detection

Now the "information sequence" is $\{b(n) - b_e(n)\}$, whereas we are interested in $\{b(n)\}$. We will follow an iterative solution, similar to the time-invariant results of [20]. The first step in our solution is to use the estimated channel to detect $\{b(n)\}$ via the Viterbi algorithm (ignoring $b_e(n)$ but accounting for the known $\{c(n)\}$).
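Before continuing with the detection iterations, the transmitter-side steps summarized in Section 6.2.1 can be sketched numerically. The NumPy sketch below is an illustration under assumed settings, not the simulation code of this dissertation: the sizes $T = 420$, $P = 7$, $Q = 5$, the prefix length $M = 2$, the BPSK data, and the scaled ±1 period-7 training pattern (a placeholder, not the sequence of (3.82)) are all assumptions, and the function names are hypothetical. It forms the DFT of (6.3), the index set of (6.4), the self-interference sequence of (6.5), and the modified training of (6.6), and then checks that the DFT of $b(n) - b_e(n)$ vanishes on the set.

```python
import numpy as np

def interference_set(T, P, Q):
    """Indices r of (6.4): within Q-1 of each training line m*K, taken modulo T."""
    K = T // P  # T = K * P; K is the block ratio of Section 6.2, not the number of users
    offsets = np.arange(-(Q - 1), Q)
    return np.unique((K * np.arange(P)[:, None] + offsets[None, :]) % T)

def data_dependent_training(b, c, P, Q):
    """Return (c_tilde, b_e, omega) following (6.3), (6.5), and (6.6)."""
    T = len(b)
    omega = interference_set(T, P, Q)
    b_r = np.fft.fft(b) / T                  # DFT coefficients b_r of (6.3)
    kept = np.zeros(T, dtype=complex)
    kept[omega] = b_r[omega]                 # keep only the components in the set
    b_e = T * np.fft.ifft(kept)              # sum of b_r * exp(j*2*pi*r*n/T) over the set, (6.5)
    return c - b_e, b_e, omega

rng = np.random.default_rng(0)
T, P, Q, M = 420, 7, 5, 2                    # illustrative sizes; M is the cyclic-prefix length
b = rng.choice([-1.0, 1.0], size=T)          # BPSK information block
pattern = np.array([1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0])   # placeholder period-7 pattern
c = np.tile(np.sqrt(0.3) * pattern, T // P)  # periodic training, illustrative power

c_tilde, b_e, omega = data_dependent_training(b, c, P, Q)
s = c_tilde + b                              # transmitted block, equal to c + (b - b_e)
s_cp = np.concatenate([s[-M:], s])           # cyclic prefix: s(-i) = s(T - i), i = 1..M

leak = np.max(np.abs(np.fft.fft(b - b_e)[omega]))
print("indices in the set:", len(omega))
print("largest DFT magnitude of b - b_e on the set:", leak)
```

With these sizes the set holds $P(2Q-1) = 63$ of the 420 frequencies, which gives a sense of how much of the information spectrum is removed; on that set the DFT of the transmitted block coincides with that of the periodic training alone.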
Use the detected $\{b(n)\}$ to estimate $\{b_e(n)\}$, and iterate the detection procedure (but not the channel estimation) with the known $\{c(n)\}$ and the estimate $\{\hat{b}_e(n)\}$ from the previous iteration. Note that although iterations are also employed, as in the DML algorithm, we do not have to re-estimate the channel in the subsequent iterations.

6.2.3 Performance Analysis

If the true channel follows the CE-BEM representation (3.3), the MSE in channel estimation is then given by (4.2). We now relax the assumption (H6.2.3), i.e., the measurement noise $\{v(n)\}$ may be nonzero-mean, as in (H3.2.3). We further assume that

(H6.2.5) The time-varying channel $\{h(n;l)\}$ is zero-mean, complex Gaussian with variance $\sigma_h^2$, and mutually independent for distinct $l$: $E\{h(n;l)h^H(n;l)\} = \sigma_h^2 I_N$ and $E\{h(n_1;l_1)h^H(n_2;l_2)\} = 0$ for $l_1 \ne l_2$ and all $n_1$, $n_2$; i.e., different channel taps are independent of each other and are identically distributed zero-mean complex Gaussian.

In the data-dependent superimposed training, the interference from the information sequence $\{b(n)\}$ has been canceled out, so $s_{mq} = 0$. Then by (4.11), $\hat{d}_{mq} = d_{mq} + w_{mq}$, so that
\[ E\big\{[\hat{d}_{m_1q_1} - d_{m_1q_1}][\hat{d}_{m_2q_2} - d_{m_2q_2}]^H\big\} = E\big\{w_{m_1q_1}w^H_{m_2q_2}\big\} = \frac{1}{T}\sigma_v^2 I_N\,\delta(m_1-m_2)\,\delta(q_1-q_2). \]
By (3.11) and (3.16), it follows that
\[ \mathrm{cov}\{\hat{D}, \hat{D}\} = \frac{1}{T}\sigma_v^2 I_{NQ(P-1)}. \tag{6.7} \]
By (3.18), we also have
\[ \mathrm{cov}\{\hat{H}, \hat{H}\} := E\big\{(\hat{H}-H)(\hat{H}-H)^H\big\} = (C^HC)^{-1}C^H\,\mathrm{cov}\{\hat{D}, \hat{D}\}\,C\,(C^HC)^{-1}. \tag{6.8} \]
Substituting (6.7) into (6.8), we have
\[ \mathrm{cov}\{\hat{H}, \hat{H}\} = \frac{1}{T}\sigma_v^2\,(C^HC)^{-1}. \]
In a manner similar to (4.16), it then follows that
\[ \mathrm{MSE}_1 = E\Big\{\sum_{l=0}^{L}\sum_{q=1}^{Q}\big\|h_q(l) - \hat{h}_q(l)\big\|^2\Big\} = \frac{NQ\sigma_v^2}{T}\,\mathrm{tr}\Big\{\big(V^H\,\mathrm{diag}\{|c_1|^2, |c_2|^2, \ldots, |c_{P-1}|^2\}\,V\big)^{-1}\Big\}. \]

We note that the real channel over a block cannot be exactly equal to the CE-BEM representation. Counting the modeling error, the total channel MSE can be expressed by (4.34),
\[ \mathrm{MSE}_c = \mathrm{MSE}_1 + \mathrm{MSE}_2, \]
where $\mathrm{MSE}_1$ comes from the estimation and $\mathrm{MSE}_2$ is the mean square modeling error. Under the assumption (H4.2.5) and Jakes' model, the modeling error of CE-BEM is given by (4.35).

6.3 Data-Dependent Superimposed Training Using DPS-BEM

Exploiting the fact that DPS sequences are also approximately band-limited, in this section we extend the data-dependent superimposed training to DPS-BEM, so that the spectral leakage arising from CE-BEM is efficiently reduced. We make the following assumptions:

(H6.3.1) The time-varying channel $\{h(n;l)\}$ satisfies (3.25) with the DPS sequences $\{u_q(n)\}$ known at the receiver. Also $N \ge 1$.

(H6.3.2) The information sequence $\{b(n)\}$ is zero-mean and white, with $E\{|b(n)|^2\} = \sigma_b^2$.

(H6.3.3) The measurement noise $\{v(n)\}$ is zero-mean, white, and uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$.

(H6.3.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$.

Consider the interference $s_{mq}$ in (6.2), which can be expressed by using (6.3) as
\begin{align*}
s_{mq} &= \sum_{n=0}^{T-1}\Big[\sum_{l=0}^{L} h(n;l)\,b(n-l)\Big]\,u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{n=0}^{T-1}\Big\{\sum_{q_1=1}^{Q}\sum_{l=0}^{L} h_{q_1}(l)\,u_{q_1}(n)\sum_{r=0}^{T-1} b_r e^{j\omega_r(n-l)}\Big\}\,u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{q_1=1}^{Q}\sum_{l=0}^{L}\sum_{r=0}^{T-1}\Big[h_{q_1}(l)\,e^{-j\omega_r l}\,b_r\Big]\Big[\sum_{n=0}^{T-1} u_{q_1}(n)\,u_q(n)\,e^{j2\pi(r-mK)n/T}\Big].
\end{align*}
We exploit the approximate band-limitedness of the time-limited DPS sequences, assuming that
\[ \sum_{n=0}^{T-1} u_{q'}(n)\,u_q(n)\,e^{j2\pi(r-mK)n/T} \approx 0 \quad \text{for } |r - mK| \ge Q + k, \]
where $k$ is an integer and $(Q+k)/T > 2f_dT_s$ ($k \ge -1$). Therefore, the information-induced interference comes from the frequency components $b_r$ for those $r$ belonging to the set
\[ \Omega := \{r : -(Q+k) + mK \le r \le (Q+k) + mK,\ m = 0, \ldots, P-1\}. \tag{6.9} \]
The frequency components $\{b_r : r \in \Omega\}$ of the information sequence are hence the self-interference. If we set $b_r = 0$ for those $r \in \Omega$, then $s_{mq} = 0$. We do so by modifying $\{c(n)\}$ based on $\{b(n)\}$ at the transmitter. Define a self-interference sequence
\[ b_e(n) := \sum_{r=0,\ r\in\Omega}^{T-1} b_r\,e^{j\omega_r n} \tag{6.10} \]
and a data-dependent superimposed training sequence $\{\tilde{c}(n)\}_{n=0}^{T-1}$ such that
\[ \tilde{c}(n) := c(n) - b_e(n). \]
All the other steps for the DPS-BEM-based channel estimator using data-dependent superimposed training follow the steps described in Section 6.2.1. For symbol detection, we can also follow the iterative approach addressed in Section 6.2.2.

6.3.1 Partially-Data-Dependent (PDD) Superimposed Training

In the data-dependent superimposed training method, by setting $b_r = 0$ for $r \in \Omega$ before transmission, we discard the "frequency components" of the information sequence corresponding to $\Omega$, so that the information-induced self-interference is eliminated. By (6.9), however, the information contained at those $P(2Q + 2k + 1)$ (among $T$ in total) frequencies is also discarded. Though it may be partially recovered by exploiting other properties (e.g., the finite alphabet of the information sequence [22]), this can cause severe detection errors when many frequencies are lost. For the first-order statistics-based estimator we discussed in Section 3.3, the sequence $\{b_e(n)\}$ acts as self-interference in channel estimation, whereas it also bears "information", so that it should not be totally discarded.

Given $\{b_e(n)\}$ as in (6.10), we now transmit
\[ s(n) = c(n) + b(n) - (1-\beta)\,b_e(n) \tag{6.11} \]
at the transmitter, where $0 \le \beta \le 1$ is the self-interference factor. When $\beta = 1$, the information sequence $\{b(n)\}$ remains intact, corresponding to the first-order statistics-based estimator in Section 3.3. When $\beta = 0$, the interference-inducing frequency components $b_r$ ($r \in \Omega$) are totally annihilated, corresponding to the data-dependent solution described in Section 6.2 and Section 6.3. If $0 < \beta \ll 1$, which is the partially-data-dependent (PDD) case, at each $r \in \Omega$ the frequency component $b_r$ is reduced to $\beta b_r$. Then the self-interference will be "sufficiently" suppressed when conducting channel estimation, while the frequency components at $r \in \Omega$ remain "partially" intact. Note that in this PDD method the self-interference is not completely removed, so that the channel estimation is not as accurate as that of the data-dependent solution. But since no information-bearing frequencies are nulled out, the information contained there can be recovered in data reception.

Our PDD superimposed training-based channel estimation follows the data-dependent solution described in Section 6.2, except that we define the PDD superimposed training sequence as
\[ \tilde{c}(n) := c(n) - (1-\beta)\,b_e(n) \]
instead of (6.6), with a prescribed interference factor $\beta$. For data detection at the receiver, the "information sequence" is $\{b(n) - (1-\beta)b_e(n)\}$ ($0 \le \beta \le 1$), while we are interested in $\{b(n)\}$. We can first use the estimated channel to detect $\{b(n)\}$ via the Viterbi algorithm (ignoring $(1-\beta)b_e(n)$ but accounting for the known $\{c(n)\}$). Since the training and information sequences pass through an identical channel, the DML approach described in Section 5.2 can be exploited to recover the suppressed frequency components $b_r$ ($r \in \Omega$), as well as to enhance the channel estimation, in an iterative way (see Section 6.3.4).

6.3.2 Performance Analysis

We wish to evaluate the MSE in channel estimation, defined in (4.2), using PDD superimposed training when the true channel follows DPS-BEM.
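As a numerical aside on the PDD construction (6.11) of Section 6.3.1, the effect of the self-interference factor can be checked with a short NumPy sketch. It is only an illustration under assumed settings: the block length, the BPSK data, the ±1 period-7 training pattern, and the half-width $Q + k = 4$ are placeholders rather than the simulation settings of this chapter, and the function name `pdd_transmit` and its argument `factor` (standing for the self-interference factor in (6.11)) are hypothetical. The sketch confirms that, on the index set of (6.9), the DFT of the information-bearing part of the transmitted block is scaled by the factor, while all other frequencies are left untouched.

```python
import numpy as np

def pdd_transmit(b, c, P, half_width, factor):
    """Return s(n) = c(n) + b(n) - (1 - factor) * b_e(n) and the index set of (6.9)."""
    T = len(b)
    K = T // P                                   # T = K * P; block ratio, not the user count
    offsets = np.arange(-half_width, half_width + 1)
    omega = np.unique((K * np.arange(P)[:, None] + offsets[None, :]) % T)
    B = np.fft.fft(b)                            # equals T * b_r of (6.3)
    B_kept = np.zeros(T, dtype=complex)
    B_kept[omega] = B[omega]
    b_e = np.fft.ifft(B_kept)                    # self-interference sequence (6.10)
    s = c + b - (1.0 - factor) * b_e
    return s, omega

rng = np.random.default_rng(1)
T, P, half_width = 420, 7, 4                     # half_width plays the role of Q + k
b = rng.choice([-1.0, 1.0], size=T)              # BPSK information block
c = np.tile(np.array([1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0]), T // P)  # placeholder training
B = np.fft.fft(b)

for factor in (0.0, 0.5, 1.0):
    s, omega = pdd_transmit(b, c, P, half_width, factor)
    S_info = np.fft.fft(s - c)                   # spectrum of the information-bearing part
    off = np.setdiff1d(np.arange(T), omega)
    scaled_on_set = np.allclose(S_info[omega], factor * B[omega])
    untouched_elsewhere = np.allclose(S_info[off], B[off])
    print(factor, len(omega), scaled_on_set, untouched_elsewhere)
```

Setting the factor to 0 reproduces the fully data-dependent scheme and setting it to 1 reproduces plain superimposed training, so a single routine covers the three cases discussed above.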
We make the following assumption about the channel $h(n;l)$:

(H6.3.5) The time-varying channel $\{h(n;l)\}$ is zero-mean, complex Gaussian with variance $\sigma_h^2$, and mutually independent for distinct $l$: $E\{h(n;l)h^H(n;l)\} = \sigma_h^2 I_N$ and $E\{h(n_1;l_1)h^H(n_2;l_2)\} = 0$ for $l_1 \ne l_2$ and all $n_1$, $n_2$; i.e., different channel taps are independent of each other and are identically distributed zero-mean complex Gaussian.

Note that in PDD superimposed training, (4.11) still holds if $s_{mq}$ is revised as
\[ s_{mq} := \sum_{n=0}^{T-1}\Big\{\sum_{l=0}^{L} h(n;l)\,\big[b(n-l) - (1-\beta)\,b_e(n-l)\big]\Big\}\,u_q(n)\,e^{-j\alpha_m n}. \]
By (3.34) and (6.10), we have
\[ E\big\{s_{m'q'}\,s^H_{mq}\big\} = \beta^2(L+1)\,\sigma_h^2\sigma_b^2\,I_N\,\delta(m'-m)\,\delta(q'-q). \tag{6.12} \]
Then from (3.36),
\[ \mathrm{cov}\{\hat{H}, \hat{H}\} := E\big\{[\hat{H}-H][\hat{H}-H]^H\big\} = \big(\tilde{C}^H\tilde{C}\big)^{-1}\tilde{C}^H\,\mathrm{cov}\{\hat{D}, \hat{D}\}\,\tilde{C}\,\big(\tilde{C}^H\tilde{C}\big)^{-1}. \tag{6.13} \]
Since
\[ E\big\{[\hat{d}_{mq} - d_{mq}][\hat{d}_{m'q'} - d_{m'q'}]^H\big\} = E\big\{s_{m'q'}\,s^H_{mq}\big\} + E\big\{w_{m'q'}\,w^H_{mq}\big\}, \]
by (4.10) and (6.12) we have
\[ \mathrm{cov}\{\hat{D}, \hat{D}\} = \big[\beta^2(L+1)\,\sigma_h^2\sigma_b^2 + \sigma_v^2\big]\,I_{NPQ}. \tag{6.14} \]
Substituting (6.14) into (6.13) gives
\[ \mathrm{cov}\{\hat{H}, \hat{H}\} = \big[\beta^2(L+1)\,\sigma_h^2\sigma_b^2 + \sigma_v^2\big]\,\big(\tilde{C}^H\tilde{C}\big)^{-1}. \]
Using the orthonormality of the DPS sequences, the MSE in channel estimation is given by
\[ \mathrm{MSE}_1 = \frac{1}{T}E\Big\{\sum_{l=0}^{L}\sum_{q=1}^{Q}\big\|h_q(l) - \hat{h}_q(l)\big\|^2\Big\} = \frac{1}{T}\,\mathrm{tr}\big\{\mathrm{cov}\{\hat{H}, \hat{H}\}\big\} = \frac{\beta^2(L+1)\,\sigma_h^2\sigma_b^2 + \sigma_v^2}{T}\,\mathrm{tr}\Big\{\big(\tilde{C}^H\tilde{C}\big)^{-1}\Big\}. \tag{6.15} \]
The MSE of the first-order statistics-based channel estimator in Section 3.3 is given by (6.15) for $\beta = 1$, and that of the data-dependent solution in Section 6.3 corresponds to $\beta = 0$. For a PDD scheme with 0 [...] $n_f$. We had to settle for $n_f = 7$, leading to a loss of parameter identifiability (more unknowns than equations).
In Figures 6.2 and 6.4 we also show the results for $n_f = 11$, leading to a shorter data session with a training-to-information bit and power ratio of 0.167; it is so labeled in Figures 6.2 and 6.4. The performance clearly improves and is better than that of data-dependent superimposed training, but at the cost of a 16.7% reduction in transmission rate.

Figure 6.5: Data-dependent superimposed training (CE-BEM): BER vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under $f_d = 100$ Hz and $N = 1$, 2, and 3. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.) [Plot of bit error rate vs SNR (dB). Viterbi detector: K=1, L=5, T=840, fd=100 Hz, Ts=25 µs, TIR=0.1, P=7, 500 runs.]

Figure 6.5 shows the detection results (based on the estimated channel and the Viterbi algorithm) for multiple receivers when $f_d = 100$ Hz and $N = 1$, 2, and 3. Again we see that data-dependent superimposed training is better than TM training without incurring the 10% training overhead penalty (which results in a transmission rate penalty).

Now we consider a fast fading channel: $N = 1$, $L = 1$ (2 taps), a uniform power delay profile, and $T_s = 200$ µs; the rest is as for the above channel. Therefore, for $f_d = 100$ and 250 Hz, the normalized Doppler spreads are $f_d T_s = 0.02$ and 0.05 (the corresponding values of $Q$ are 35 and 85, respectively). Here we also keep $T = 840$.

Figure 6.6: Data-dependent superimposed training (fast fading): BER vs SNR under $f_d = 100$ and 250 Hz. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.) [Plot of bit error rate vs SNR (dB). Viterbi detector: K=N=1, L=1, T=840, Ts=25 µs, TIR=0.1, P=2, 500 runs. Curves: SI with $\lambda = 1$ and $\lambda = 0$ and TM, each at fd=100 Hz and 250 Hz, plus TM with TIR=0.143 (fd=100 Hz) and TIR=0.429 (fd=250 Hz).]

Now the performance of all schemes is worse because of the large number of unknowns to be estimated. However, data-dependent superimposed training still outperforms TM training when we enforce the constraint of a training-to-information bit and power ratio of 0.1, because we cannot get $n_f \ge Q$. With the length of a training session $2L + 1 = 3$ bits, we show in Figure 6.6 the results for $n_f = 35$ and 85 for $f_d = 100$ and 250 Hz respectively, leading to reduced data sessions with a training-to-information bit and power ratio of 0.143 or 0.429. The performance clearly improves and is better than that of data-dependent superimposed training, but at the cost of a 14.3% or 42.9% reduction in the transmission rate. It is also seen from Figure 6.6 that superimposed training does not perform well for $f_d = 250$ Hz because of loss of "information" due to nulling of the contribution from the information sequence at a "large" number of frequencies (related to $Q$).

Figure 6.7: Estimation variance: NCMSE vs $f_d$ under SNR = 25 dB for comparison between analytical and simulation-based results of non-data-dependent and data-dependent superimposed training. ($\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\sigma$: standard deviation.) [Plot of normalized channel MSE (dB) vs $f_d$ (Hz). K=N=1, L=2, T=400, Ts=25 µs, TIR=0.1, SNR=25 dB, $Q = 2\lceil f_d T T_s \rceil + 1$, P=4, 500 runs.]

We next consider the performance analysis of our data-dependent superimposed training scheme. We revise the channel to have $N = 1$, $L = 2$ (3 taps), a uniform power delay profile, and $T_s = 25$ µs. We also omit $\alpha_0$ in the receiver-end processing.
In Figure 6.7, we show the channel MSE versus Doppler spread, where we compare our theoretical expressions with simulation-based MSE results and $\pm\sigma$ bounds. The channel MSE for non-data-dependent superimposed training is given by (4.8) and (4.35). The agreement between the theoretical and simulation-based results is good. Note the discontinuities in the theoretical curves as a function of $f_d$; they arise because we picked $Q$ values based on $f_d$ per (2.9b).

Figure 6.8: PDD superimposed training: NCMSE vs SNR for CE- and DPS-BEM-based estimators, under $f_d = 100$ Hz. ($\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) [Plot of normalized channel MSE (dB) vs SNR (dB). K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=100 Hz, P=5, 500 runs. Curves: CE-BEM with $\lambda = 1$, 0; DPS-BEM with $\lambda = 1$, 0, 0.2.]

6.4.2 (Partially) Data-Dependent Superimposed Training Using DPS-BEM

We consider a random doubly-selective Rayleigh fading channel. We take $N = 1$ and $L = 4$ (5 taps) with $h(n;l)$ as in (H6.3.5), satisfying Jakes' model. We consider a communication system described in Section 2.6.3 with symbol interval $T_s = 25$ µs, record length $T = 420$ symbols, and varying Doppler spreads $f_d$ in the range of 0 Hz to 200 Hz. For $f_d = 100$ Hz, the normalized Doppler spread is $f_d T_s = 0.0025$, and for $f_d = 200$ Hz, $f_d T_s = 0.005$. We emphasize that in the simulations the DPS-BEM is used only for processing at the receiver; the random channels are generated by Jakes' model, not the DPS-BEM. In the processing of channel estimation, we select $Q$, the number of basis functions, as $Q = \lceil 2 f_d T_s T \rceil + 1$. Using the estimated channel, a Viterbi detector is used for data symbol detection at the receiver. For the DPS-BEM-based estimator using PDD superimposed training, we took $k = -1$ (the minimum allowed value) in (6.9), so that the information loss is comparatively mild. The additive noise is zero-mean complex white Gaussian. The (receiver) SNR refers to the energy per bit over the one-sided noise spectral density, with both the information and superimposed training sequences counting toward the bit energy. Information sequences are BPSK. We take the superimposed training sequence of period $P = L + 1 = 5$ as $c(n) = \sigma_c\, e^{j\pi n(n+\nu)/P}$, where $\nu = 1$ if $P$ is odd and $\nu = 2$ if $P$ is even, as in [59]. All simulated results were based on 500 Monte Carlo runs.

Figure 6.9: PDD superimposed training: BER vs SNR for CE- and DPS-BEM-based estimators, under $f_d = 100$ Hz. ($\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) [Plot of bit error rate vs SNR (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=100 Hz, P=5, 500 runs. Curves: CE-BEM with $\lambda = 1$, 0; DPS-BEM with $\lambda = 1$, 0, 0.2.]

To show the advantage of DPS-BEM over the Fourier-based CE-BEM, we compare the performance of our proposed channel estimators using the two BEMs. Figure 6.8 shows the comparison for the normalized channel MSE (NCMSE). In an environment of $f_d = 100$ Hz, we had $Q = 5$ for CE-BEM following (2.9b) and $Q = 4$ for DPS-BEM. The channel estimators have identical design parameters except for using different basis models. In the simulation, the average transmitted power $\sigma_c^2$ in $c(n)$ is 0.15 of the power in $b(n)$, leading to a training-to-information power ratio $\mathrm{TIR} := \sigma_c^2/\sigma_b^2 = \beta/(1-\beta) = 0.15$ (or $\beta = 0.13$). We consider the CE-BEM-based estimator using superimposed training with $\lambda = 1$ (corresponding to the first-order statistics-based estimator in Chapter 3) and $\lambda = 0$ (corresponding to the "fully" data-dependent scheme), and the DPS-BEM-based estimator using PDD superimposed training with $\lambda = 0$, 0.2, and 1. At the receiver, we follow the data detection scheme described in Section 6.2.2.

Figure 6.10: PDD superimposed training: BER vs $(\beta, \lambda)$ under SNR = 15 dB and $f_d = 100$ Hz. [Surface plot of bit error rate over the two parameter axes. Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, fd=100 Hz, SNR=15 dB, Q=4, 500 runs.]

It is seen from Figure 6.8 that the data-dependent ($\lambda = 0$) training offers much lower estimation variance than the "non-data-dependent" ($\lambda = 1$) training, whether using CE- or DPS-BEM, since the self-interference is eliminated or greatly reduced with $\lambda = 0$. For the same value of $\lambda$, the estimator exploiting DPS-BEM provides better estimation than the CE-BEM-based one, because DPS-BEM has a smaller modeling error than CE-BEM. These conclusions are also confirmed by the BER results in Figure 6.9. Note that for the DPS-BEM-based estimator with $\lambda = 0.2$, although the estimation cannot be as accurate as in the "completely-data-dependent" ($\lambda = 0$) case, the BER result is still better, since the "information" lost in the "completely-data-dependent" training is now partially retained.

Figure 6.11: PDD superimposed training: optimum $(\beta, \lambda)$ vs SNR under $f_d = 100$ Hz. [Plot of optimum value vs SNR (dB), with analytical and simulated curves for each of the two parameters. Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, fd=100 Hz, Q=4, 500 runs.]

Figure 6.10 depicts the BER surface as a function of $\beta$ (defined in (6.21)) and $\lambda$, for SNR = 15 dB and $f_d = 100$ Hz. The BER performance varies along the $\beta$- and $\lambda$-axes. We selected the coordinate point $(\hat{\beta}_o, \hat{\lambda}_o)$ corresponding to the minimum value of the BER surface as the simulation-based optimum value for the given SNR. In Figure 6.11 we compare the simulation-based optimum value with the "analytical" optimum value $(\beta_o, \lambda_o)$ derived recursively (starting from an initial value of zero) by (6.23) and (6.24). In Figure 6.11, the analytical and the simulation-based results follow the same trend, and the agreement between them is good. It is also seen in Figure 6.11 that as the received signal SNR increases, the optimum $\beta$ and $\lambda$ increase too. Higher $\beta$ implies that a higher fraction of the transmitted power is allocated to training, leading to more accurate channel estimates (with smaller estimation variance). Intuitively, for higher SNRs, it pays to achieve more accurate channel estimates in order to achieve a lower effective noise power $\tilde{\sigma}_w^2$. On the other hand, when the SNR is low, improving channel estimation does not have much effect on the effective noise power $\tilde{\sigma}_w^2$. Similar comments apply to changes in the optimum $\lambda$ with SNR. Higher $\lambda$ implies lower power in the effective noise component $(1-\lambda)\, b_e(n-l)$ but higher self-interference, hence higher channel estimation variance; these two effects need to be counter-balanced. Finally, observe from Figure 6.10 that the "bottom" (corresponding to the neighborhood of the minimum BER point) of the BER surface is rather "flat": the BER performance is not sensitive to changes in $\beta$ and $\lambda$ over a rather "wide" area around the minimum, so that the analysis described in Section 6.3.3 provides an effective means for power allocation and interference suppression.

Figure 6.12: PDD superimposed training: BER vs SNR under $f_d = 0$ Hz (time-invariant). (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) [Plot of bit error rate vs SNR (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=0 Hz, P=5, 500 runs. Curves: step 1 and 3rd iteration for $\lambda = 0$, 1, 0.2, and TM.]

Figure 6.13: PDD superimposed training: BER vs SNR under $f_d = 100$ Hz. (Abbreviations as in Figure 6.12.) [Plot of bit error rate vs SNR (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=100 Hz, P=5, 500 runs. Curves: step 1 and 3rd iteration for $\lambda = 0$, 1, 0.2, and TM.]

Figure 6.14: PDD superimposed training: BER vs SNR under $f_d = 200$ Hz. (Abbreviations as in Figure 6.12.) [Plot of bit error rate vs SNR (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=200 Hz, P=5, 500 runs. Curves: step 1 and 3rd iteration for $\lambda = 0$, 1, 0.2, and TM.]

The DML approach is now investigated to enhance the channel estimation and data detection performance. We considered channels with Doppler spread $f_d = 0$, 100, and 200 Hz, with the corresponding number of basis functions $Q = 1$, 4, and 6. Note that the channel with $f_d = 0$ Hz is time-invariant; this case is considered because here there are no modeling errors. The "non-data-dependent" ($\lambda = 1$) superimposed training was compared with the "completely" ($\lambda = 0$) and "partially" ($\lambda = 0.2$) data-dependent schemes. We again set TIR = 0.15 for the superimposed training. At the receiver, the DML iterations follow the data detection scheme described in Section 6.2.2 (denoted by "step 1" in the figures). We show NCMSE and BER in Figures 6.12-6.17, where only the results of step 1 and the third iteration are depicted. It is seen that the DML algorithm significantly improves the performance.
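The basis counts quoted in this section can be reproduced from the two selection rules that appear in the text, $Q = 2\lceil f_d T T_s \rceil + 1$ for CE-BEM and $Q = \lceil 2 f_d T_s T \rceil + 1$ for DPS-BEM. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, and exact rational arithmetic is an implementation choice made here so that an integer product such as $f_d T T_s = 42$ is not rounded up by floating-point error.

```python
from fractions import Fraction
from math import ceil


def doppler_time_product(fd_hz: int, record_len: int, ts_us: int) -> Fraction:
    """Exact value of fd * T * Ts, with the symbol interval Ts given in microseconds."""
    return Fraction(fd_hz * record_len * ts_us, 10**6)


def ce_bem_count(fd_hz: int, record_len: int, ts_us: int) -> int:
    """CE-BEM rule: Q = 2 * ceil(fd * T * Ts) + 1."""
    return 2 * ceil(doppler_time_product(fd_hz, record_len, ts_us)) + 1


def dps_bem_count(fd_hz: int, record_len: int, ts_us: int) -> int:
    """DPS-BEM rule: Q = ceil(2 * fd * Ts * T) + 1."""
    return ceil(2 * doppler_time_product(fd_hz, record_len, ts_us)) + 1


# Section 6.4.2 setting: T = 420 symbols, Ts = 25 microseconds.
for fd in (0, 100, 200):
    print(fd, "Hz: DPS-BEM Q =", dps_bem_count(fd, 420, 25),
          ", CE-BEM Q =", ce_bem_count(fd, 420, 25))

# Fast-fading setting: T = 840 symbols, Ts = 200 microseconds.
for fd in (100, 250):
    print(fd, "Hz: CE-BEM Q =", ce_bem_count(fd, 840, 200))
```

Because both rules involve a ceiling, $Q$ is a step function of $f_d$, which is the reason given above for the discontinuities in the analytical curves of Figure 6.7.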
For the purpose of comparison, the TM training approach described in Appendix A, originally proposed for CE-BEM as an optimal scheme, is applied to the DPS-BEM case. We take a training block of $2L + 1 = 9$ symbols as $\{0, 0, 0, 0, \sqrt{2L+1}, 0, 0, 0, 0\}$, which follows an information data block of length 60, leading to a frame of 69 symbols. This subblock was repeated over a record length of 414 symbols, for a total of 6 subblocks. The information data is also BPSK and has unit power. Thus, the training-to-information bit and power ratios are both 0.15 (the amplitude of the single nonzero training bit was picked to achieve this power ratio). Using the training sequence, we can uniquely determine the $h_q(l)$'s via an LS approach.

Figure 6.15: PDD superimposed training: NCMSE vs SNR under $f_d = 0$ Hz (time-invariant). (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) [Plot of normalized channel MSE (dB) vs SNR (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=0 Hz, P=5, 500 runs. Curves: step 1 and 3rd iteration for $\lambda = 0$, 1, 0.2, and TM.]

By (6.9), note that the index set of affected frequencies has cardinality $P(2Q + 2k + 1)$ ($k = -1$ in the simulations). For the time-invariant channel ($f_d = 0$), $Q = 1$, so that the information contained in the self-interference part of the information sequence is comparatively small. Therefore, total suppression ($\lambda = 0$) of the self-interference does not have a significant deleterious effect on the BER, while the improvement in channel estimation is significant; this is the scenario described by [20] and depicted in Figures 6.12 and 6.15. Exploiting the PDD scheme ($\lambda = 0.2$) in this case does not have much impact. All three schemes ($\lambda = 0$, 0.2, and 1) after three iterations of the DML scheme have BER performance in Figure 6.12 similar to that of the TM training.

Figure 6.16: PDD superimposed training: NCMSE vs SNR under $f_d = 100$ Hz. (Abbreviations as in Figure 6.15.) [Plot of normalized channel MSE (dB) vs SNR (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=100 Hz, P=5, 500 runs. Curves: step 1 and 3rd iteration for $\lambda = 0$, 1, 0.2, and TM.]

As the Doppler spread $f_d$ increases, we have to employ a larger $Q$ to describe the channel, and the self-interference thus grows. Now total suppression ($\lambda = 0$) is no longer a wise option, whereas the PDD scheme still performs well. In Figures 6.13 and 6.14, with $\lambda = 0.2$ the PDD scheme is superior to the other two superimposed training-based schemes in data detection, whether before or after DML iterations, since the self-interference has been greatly suppressed while the information loss has been effectively reduced. In Figures 6.16 and 6.17, although the "completely" data-dependent ($\lambda = 0$) training has the best channel estimation before iterations, the PDD scheme with $\lambda = 0.2$ yields better performance after three iterations, due to the lower BER in the DML approach. The BER performance of the PDD scheme after several DML iterations is competitive with TM training, without incurring any training overhead penalty.

Figure 6.17: PDD superimposed training: NCMSE vs SNR under $f_d = 200$ Hz. (Abbreviations as in Figure 6.15.) [Plot of normalized channel MSE (dB) vs SNR (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=200 Hz, P=5, 500 runs. Curves: step 1 and 3rd iteration for $\lambda = 0$, 1, 0.2, and TM.]

6.5 Conclusions

In this chapter, we presented a data-dependent superimposed training scheme to reduce the self-interference in channel estimation. Inspired by the work of [20], we observed that over a channel satisfying a band-limited BEM such as CE- or DPS-BEM, the periodic superimposed training components within the received signal occur only at certain frequencies. By designing a data-dependent superimposed training sequence that suppresses the information sequence at those frequencies, the self-interference adversely affecting the channel estimation at the receiver is greatly reduced. However, the suppressed frequency components of the information sequence carry "information" as well. A PDD superimposed training method was proposed to strike a trade-off between self-interference cancelation and information integrity. Performance analysis and related optimization of parameters were also discussed. Computer simulation examples demonstrated that, by using PDD superimposed training, performance competitive with TM training can be achieved with no training overhead penalty.
Chapter 7

Direct FIR Linear Equalization of Doubly-Selective Channels Based on Superimposed Training

7.1 Introduction

Two powerful tools have been exploited so far in our previous chapters on channel estimation: superimposed training, providing us a means to track the temporal variation of the channel, and BEMs, reducing the problem of estimating a time-varying channel over a period of time to the estimation of time-invariant parameters.

In wireless channels, signal distortion due to multipath propagation or band-limited transmission may cause ISI at reception. An equalizer is the device that compensates for ISI at the receive end. The noise-free received signal is the convolution of the transmitted symbols with the impulse response of the channel. Therefore, the equalizer, whose task is to recover the transmitted symbols, is a deconvolution device. Since a doubly-selective channel can be well described by BEMs, we expect that an equalizer, as an inverse of the channel, can also be well represented by BEMs. Given knowledge of the time-varying channel described by CE-BEM, the design of serial time-varying FIR equalizers has been discussed in [2]. Direct design of time-invariant FIR equalizers based on superimposed training, for time-invariant channels, has been investigated in [58]. In this chapter, we investigate the direct design of time-varying FIR linear equalizers for doubly-selective channels using superimposed training and without first estimating the underlying channel response. We exploit the prior results of [2,58].

In Section 7.2, direct FIR linear equalization using superimposed training and CE-BEM is discussed. By utilizing user-specific training sequences, this direct equalizer is extended to a multiple-user wireless ad hoc network in Section 7.3.

7.2 Direct FIR Linear Equalization Using CE-BEM

Consider a time-varying SIMO FIR linear channel with $N$ outputs.
Let $\{s(n)\}$ denote a scalar sequence which is input to the SIMO time-varying channel with discrete-time impulse response $\{h(n;l)\}$ ($N$-vector channel response at time $n$ to a unit input at time $n-l$). Then the symbol-rate channel output vector is given by (3.1)

$$x(n) := \sum_{l=0}^{L} h(n;l)\, s(n-l). \tag{7.1}$$

The noisy measurements of $x(n)$ are given by $y(n) = x(n) + v(n)$. In a CE-BEM representation it is assumed that the channel follows (2.9)

$$h(n;l) = \sum_{q=1}^{\bar{Q}} h_q(l)\, e^{j\omega_q n} \tag{7.2}$$

where the $N$-column vectors $h_q(l)$ are invariant for the whole block $n = 0, 1, \ldots, T-1$, and

$$\bar{Q} := 2\lceil f_d T T_s \rceil + 1, \qquad L := \lfloor \tau_d / T_s \rfloor, \qquad \omega_q := \frac{2\pi}{T}\left( q - \frac{1}{2} - \frac{Q}{2} \right), \quad q = 1, 2, \ldots, Q.$$

In superimposed training one takes

$$s(n) = b(n) + c(n),$$

where $\{b(n)\}$ is the information sequence and $\{c(n)\}$ is the training sequence added (superimposed) at low power to the information sequence at the transmitter before modulation and transmission.

Given this channel model, the main problem considered here is: how to design an equalizer to estimate $\{b(n)\}$ when one knows only $\{c(n)\}$ but not (obviously) $\{b(n)\}$, and one also does not have (frame) synchronization with $\{c(n)\}$ at the receiver. We will design an equalizer to estimate $\{c(n)\}$ with a delay $d$. We will then show that this equalizer is a scaled version of the corresponding equalizer designed to estimate $\{b(n)\}$ with a delay $d$.

7.2.1 Time-Varying FIR Equalizers

We will restrict ourselves to serial linear equalizers instead of block linear equalizers since, as shown in [2], the latter are computationally prohibitive (compared with the former). We look for a time-varying linear equalizer $g(n;l)$ ($l = 0, 1, \ldots, L_e$) over the same time block as the received data with channel model (7.2). We note that for an arbitrary time-varying impulse response $\bar{g}(n;l)$, the following is always true:

$$\bar{g}(n;l) = \sum_{q=1}^{T} \bar{g}_q(l)\, e^{j\omega_q n}, \quad n = 0, 1, \ldots, T-1.$$
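The claim that $T$ complex exponentials always suffice over a block of $T$ samples can be checked numerically. The sketch below is illustrative only: it assumes the centered frequency grid $\omega_q = (2\pi/T)(q - 1/2 - Q/2)$ with $Q$ set to the number of exponentials used, treats a single scalar tap as a length-$T$ sequence, and uses hypothetical function names. It also shows that keeping only three exponentials generally does not reproduce an arbitrary sequence exactly.

```python
import cmath
import math
import random


def basis_frequencies(num_basis: int, record_len: int) -> list:
    """Centered grid w_q = (2*pi/T) * (q - 1/2 - Q/2), q = 1..Q (assumed convention)."""
    return [2 * math.pi / record_len * (q - 0.5 - num_basis / 2)
            for q in range(1, num_basis + 1)]


def expand(seq: list, freqs: list) -> list:
    """Projection coefficients; valid because the exponentials are orthogonal over the block."""
    block = len(seq)
    return [sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(seq)) / block
            for w in freqs]


def synthesize(coeffs: list, freqs: list, block: int) -> list:
    """Rebuild the sequence as a sum of weighted complex exponentials."""
    return [sum(c * cmath.exp(1j * w * n) for c, w in zip(coeffs, freqs))
            for n in range(block)]


def max_error(seq: list, freqs: list) -> float:
    rebuilt = synthesize(expand(seq, freqs), freqs, len(seq))
    return max(abs(a - b) for a, b in zip(seq, rebuilt))


T = 16
random.seed(0)
# An arbitrary time-varying tap, one complex value per time index.
tap = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(T)]

err_full = max_error(tap, basis_frequencies(T, T))   # T exponentials: exact up to rounding
err_few = max_error(tap, basis_frequencies(3, T))    # 3 exponentials: approximation only
print("full basis error:", err_full)
print("3-term basis error:", err_few)
```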
We would like to use a more parsimonious (but approximate) representation for $\bar{g}(n;l)$, denoted by $g(n;l)$, given by

$$g(n;l) = \sum_{q=1}^{Q} g_q(l)\, e^{j\omega_q n}, \quad n = 0, 1, \ldots, T-1,$$

where $Q \le T - 1$. In order to estimate the input sequence $\{s(n)\}$ (see (7.1)), we may seek a linear time-varying FIR estimator using CE-BEM to yield an estimate with equalization delay $d$ ($0 \le d \le L_e$):

$$\hat{s}(n-d) = \sum_{i=0}^{L_e} g^H(n;i)\, y(n-i).$$

The existence of a zero-forcing linear equalizer has been discussed in [2]. Their conclusion is that if $N$ is at least two, then with probability one, one has a zero-forcing solution for sufficiently large $L_e$ and $Q$. For a linear MMSE solution, existence is not an issue, although MMSE equalizer performance can be expected to be "good" if zero-forcing equalizers exist [2]. Here we will seek a least squares solution $g(n;l)$ to minimize a cost such as

$$J = \frac{1}{T} \sum_{n=0}^{T-1} \left| s(n-d) - \hat{s}(n-d) \right|^2.$$

7.2.2 Linear LS Equalizers Based on CE-BEM

Our algorithm is based on the following model assumptions:

(H7.2.1) The information sequence $\{b(n)\}$ is zero-mean, i.i.d. (independent and identically distributed), with $E\{|b(n)|^2\} = \sigma_b^2$.

(H7.2.2) The measurement noise $\{v(n)\}$ is zero-mean ($E\{v(n)\} = 0$), white, independent of $\{b(n)\}$, with $E\{[v(n+\tau)][v(n)]^H\} = \sigma_v^2 I_N\, \delta(\tau)$.

(H7.2.3) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$. Let $\sigma_c^2 := (1/P) \sum_{n=1}^{P} |c(n)|^2$.

(H7.2.4) The record length $T$ and period $P$ satisfy $T P^{-1} > \bar{Q}$. Moreover, $P > L + L_e - d$.

As in (3.5), the periodic training sequence of period $P$ can be written as

$$c(n) = \sum_{m=0}^{P-1} c_m\, e^{j\alpha_m n}$$

where $\alpha_m := 2\pi m / P$. To design the time-varying linear equalizer to estimate a delayed version of the training sequence $c(n-d)$ ($0 \le d \le L_e$), we have

$$\hat{c}(n-d) = \sum_{i=0}^{L_e} g_d^H(n;i)\, y(n-i)$$

where we assume that

$$g_d(n;i) = \sum_{q=1}^{Q} g_q(i)\, e^{j\omega_q n}.$$
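The Fourier-series representation of the periodic training sequence in (H7.2.3) can be illustrated with a short numerical sketch. It assumes, for illustration only, the chirp-type sequence $c(n) = \sigma_c\, e^{j\pi n(n+\nu)/P}$ used in the simulations of Section 6.4.2 ($\nu = 1$ for odd $P$, $\nu = 2$ for even $P$); the function names are hypothetical. The sketch computes the coefficients $c_m$, rebuilds $c(n)$ over several periods, and reports $\sum_m |c_m|^2$, which equals $\sigma_c^2$ by Parseval's relation.

```python
import cmath
import math


def chirp_training(period: int, sigma_c: float = 1.0) -> list:
    """One period of c(n) = sigma_c * exp(j*pi*n*(n+nu)/P), nu = 1 (odd P) or 2 (even P)."""
    nu = 1 if period % 2 else 2
    return [sigma_c * cmath.exp(1j * math.pi * n * (n + nu) / period)
            for n in range(period)]


def fourier_series_coeffs(one_period: list) -> list:
    """c_m = (1/P) * sum_n c(n) * exp(-j*alpha_m*n), with alpha_m = 2*pi*m/P."""
    period = len(one_period)
    return [sum(c * cmath.exp(-2j * math.pi * m * n / period)
                for n, c in enumerate(one_period)) / period
            for m in range(period)]


def synthesize(coeffs: list, n: int) -> complex:
    """c(n) = sum_m c_m * exp(j*alpha_m*n); valid for any integer n by periodicity."""
    period = len(coeffs)
    return sum(cm * cmath.exp(2j * math.pi * m * n / period)
               for m, cm in enumerate(coeffs))


P = 5
c = chirp_training(P)
c_m = fourier_series_coeffs(c)

# Rebuild three periods from the P coefficients and compare with the periodic extension.
max_err = max(abs(synthesize(c_m, n) - c[n % P]) for n in range(3 * P))
sigma_c_sq = sum(abs(x) ** 2 for x in c) / P
coeff_power = sum(abs(x) ** 2 for x in c_m)

print("reconstruction error:", max_err)
print("sigma_c^2:", sigma_c_sq, " sum |c_m|^2:", coeff_power)
print("|c_m|^2:", [round(abs(x) ** 2, 6) for x in c_m])
```

For this particular sequence all $|c_m|^2$ come out equal to $\sigma_c^2 / P$, i.e. the training power is spread evenly over the $P$ training frequencies.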
Choose the $g_q(i)$'s to minimize the time-averaged cost

$$J_c := \frac{1}{T} \sum_{n=0}^{T-1} \left| c(n-d) - \hat{c}(n-d) \right|^2 = \frac{1}{T} \sum_{n=0}^{T-1} \left| c(n-d) - \sum_{i=0}^{L_e} \sum_{q=1}^{Q} g_q^H(i)\, e^{-j\omega_q n}\, y(n-i) \right|^2.$$

By taking the derivative and setting it to zero, we have

$$\frac{\partial J_c}{\partial g_{q_1}^*(i_1)} = -\frac{1}{T} \sum_{n=0}^{T-1} e^{-j\omega_{q_1} n}\, y(n-i_1) \left[ c^*(n-d) - \sum_{i=0}^{L_e} \sum_{q=1}^{Q} e^{j\omega_q n}\, y^H(n-i)\, g_q(i) \right] = 0$$

for $i_1 = 0, 1, \ldots, L_e$ and $q_1 = 1, 2, \ldots, Q$. This leads to

$$\sum_{i=0}^{L_e} \sum_{q=1}^{Q} \left[ \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_q - \omega_{q_1}) n}\, y(n-i_1)\, y^H(n-i) \right] g_q(i) = \frac{1}{T} \sum_{n=0}^{T-1} c^*(n-d)\, e^{-j\omega_{q_1} n}\, y(n-i_1) =: R_c(q_1, i_1). \tag{7.3}$$

To design the time-varying linear equalizer to estimate the information sequence $b(n-d)$ ($0 \le d \le L_e$), we have

$$\hat{b}(n-d) = \sum_{i=0}^{L_e} \tilde{g}_d^H(n;i)\, y(n-i)$$

where we assume that

$$\tilde{g}_d(n;i) = \sum_{q=1}^{Q} \tilde{g}_q(i)\, e^{j\omega_q n}.$$

Choose the $\tilde{g}_q(i)$'s to minimize

$$J_b := \frac{1}{T} \sum_{n=0}^{T-1} \left| b(n-d) - \hat{b}(n-d) \right|^2.$$

Mimicking the results for the superimposed training sequence-based equalization,

$$\sum_{i=0}^{L_e} \sum_{q=1}^{Q} \left[ \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_q - \omega_{q_1}) n}\, y(n-i_1)\, y^H(n-i) \right] \tilde{g}_q(i) = \frac{1}{T} \sum_{n=0}^{T-1} b^*(n-d)\, e^{-j\omega_{q_1} n}\, y(n-i_1) =: R_b(q_1, i_1). \tag{7.4}$$

Comparing (7.3) and (7.4), we see that (ignoring the equalizer coefficients) the left sides of the two are identical. We now seek to establish that for large $T$, $R_c(q_1, i_1) = \gamma R_b(q_1, i_1)$ for all $q_1$, $i_1$, for some scalar $\gamma$, so that $g_q(i) = \gamma\, \tilde{g}_q(i)$ for all $i$.
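The normal equations (7.3) are those of an ordinary least-squares fit of the known training sequence by the basis-modulated observations $e^{-j\omega_q n} y(n-i)$. The sketch below solves them for a deliberately simplified toy case and is not the simulation setup of this chapter: it assumes a scalar output ($N = 1$), a single channel tap ($L = 0$) that rotates at one basis frequency, $L_e = 0$, $d = 0$, no information symbols, and no noise, so that the fit should be exact. All names, including the small Gaussian-elimination helper, are hypothetical.

```python
import cmath
import math


def solve_linear(matrix: list, rhs: list) -> list:
    """Solve a small complex linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0j] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


T, P, Q = 40, 5, 3   # record length, training period, number of equalizer basis functions
omega = [2 * math.pi / T * (q - 0.5 - Q / 2) for q in range(1, Q + 1)]

# Known periodic training sequence (chirp type, P odd) and a toy time-varying channel tap.
c = [cmath.exp(1j * math.pi * n * (n + 1) / P) for n in range(T)]
h0 = 0.8 - 0.6j
y = [h0 * cmath.exp(1j * omega[2] * n) * c[n] for n in range(T)]

# Regressors z_q(n) = exp(-j*w_q*n) * y(n); with Le = 0 only the lag i = 0 is present.
z = [[cmath.exp(-1j * w * n) * y[n] for w in omega] for n in range(T)]

# Left side of (7.3): time-averaged correlation of the regressors.
gram = [[sum(z[n][a] * z[n][b].conjugate() for n in range(T)) / T for b in range(Q)]
        for a in range(Q)]
# Right side of (7.3): R_c(q1, 0), the correlation with the conjugated training sequence.
r_c = [sum(c[n].conjugate() * z[n][a] for n in range(T)) / T for a in range(Q)]

g = solve_linear(gram, r_c)
c_hat = [sum(g[q].conjugate() * z[n][q] for q in range(Q)) for n in range(T)]
max_err = max(abs(a - b) for a, b in zip(c, c_hat))
print("equalizer coefficients:", g)
print("max |c(n) - c_hat(n)|:", max_err)
```

In this noise-free case only the coefficient at the channel's own frequency is nonzero. With information symbols and noise present, the same equations would be solved from the noisy data, and the fit would hold only approximately.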
We first consider

$$R_c(q_1, i_1) = \frac{1}{T} \sum_{n=0}^{T-1} c^*(n-d)\, e^{-j\omega_{q_1} n} \left\{ \sum_{l=0}^{L} h(n-i_1; l) \left[ b(n-i_1-l) + c(n-i_1-l) \right] + v(n-i_1) \right\}$$

$$= \frac{1}{T} \sum_{n=0}^{T-1} \sum_{m_1=0}^{P-1} c_{m_1}^*\, e^{-j\alpha_{m_1}(n-d)}\, e^{-j\omega_{q_1} n} \left\{ \sum_{l=0}^{L} \sum_{q=1}^{\bar{Q}} h_q(l)\, e^{j\omega_q (n-i_1)} \left[ \sum_{m_2=0}^{P-1} c_{m_2}\, e^{j\alpha_{m_2}(n-i_1-l)} + b(n-i_1-l) \right] + v(n-i_1) \right\}$$

$$= \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} \sum_{m_1=0}^{P-1} \sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\, e^{j\alpha_{m_1} d}\, e^{-j\alpha_{m_2}(i_1+l)}\, e^{-j\omega_q i_1}\, h_q(l)\, A_0 + \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} \sum_{m_1=0}^{P-1} c_{m_1}^*\, e^{j\alpha_{m_1} d}\, e^{-j\omega_q i_1}\, h_q(l)\, A_1 + \sum_{m_1=0}^{P-1} c_{m_1}^*\, e^{j\alpha_{m_1} d}\, A_2$$

where

$$A_0 := \frac{1}{T} \sum_{n=0}^{T-1} e^{j(-\alpha_{m_1} + \alpha_{m_2} - \omega_{q_1} + \omega_q) n},$$

$$A_1 := \frac{1}{T} \sum_{n=0}^{T-1} e^{j(-\alpha_{m_1} - \omega_{q_1} + \omega_q) n}\, b(n-i_1-l),$$

$$A_2 := \frac{1}{T} \sum_{n=0}^{T-1} e^{-j(\alpha_{m_1} + \omega_{q_1}) n}\, v(n-i_1).$$

Under the condition $T P^{-1} > \bar{Q}$ (then $(\alpha_{m_1} + \omega_{q_1}) = (\alpha_{m_2} + \omega_{q_2})$ if and only if $m_1 = m_2$ and $q_1 = q_2$), we have

$$A_0 = \delta(m_1 - m_2)\, \delta(q_1 - q).$$

Furthermore, we have

$$E\{ |A_1|^2 \} = \frac{1}{T^2} \sum_{n_1=0}^{T-1} \sum_{n_2=0}^{T-1} e^{j(-\alpha_{m_1} - \omega_{q_1} + \omega_q)(n_1 - n_2)}\, E\{ b(n_1-i_1-l)\, b^*(n_2-i_1-l) \}$$

$$= \frac{1}{T^2} \sum_{n_1=0}^{T-1} \sum_{n_2=0}^{T-1} e^{j(-\alpha_{m_1} - \omega_{q_1} + \omega_q)(n_1 - n_2)}\, \sigma_b^2\, \delta(n_1 - n_2) = \frac{\sigma_b^2}{T}.$$

Similarly, it follows that

$$E\{ \|A_2\|^2 \} = \frac{1}{T^2} \sum_{n_1=0}^{T-1} \sum_{n_2=0}^{T-1} e^{-j(\alpha_{m_1} + \omega_{q_1})(n_1 - n_2)}\, E\{ v^H(n_2-i_1)\, v(n_1-i_1) \}$$

$$= \frac{1}{T^2} \sum_{n_1=0}^{T-1} \sum_{n_2=0}^{T-1} e^{-j(\alpha_{m_1} + \omega_{q_1})(n_1 - n_2)}\, N \sigma_v^2\, \delta(n_1 - n_2) = \frac{N \sigma_v^2}{T}.$$

In the mean square sense (and thus in probability), we then have the following two limits:

$$\lim_{T \to \infty} A_1 \overset{\text{m.s.}}{=} 0, \quad \text{and} \quad \lim_{T \to \infty} A_2 \overset{\text{m.s.}}{=} 0.$$

Thus for "large" $T$, we have

$$\lim_{T \to \infty} R_c(q_1, i_1) \overset{\text{m.s.}}{=} \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} \sum_{m_1=0}^{P-1} \sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\, e^{j\alpha_{m_1} d}\, e^{-j\alpha_{m_2}(i_1+l)}\, e^{-j\omega_q i_1}\, h_q(l) \left[ \frac{1}{T} \sum_{n=0}^{T-1} e^{j(-\alpha_{m_1} + \alpha_{m_2} - \omega_{q_1} + \omega_q) n} \right]$$

$$= \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} \sum_{m_1=0}^{P-1} \sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\, e^{j\alpha_{m_1} d}\, e^{-j\alpha_{m_2}(i_1+l)}\, e^{-j\omega_q i_1}\, h_q(l)\, \delta(m_1 - m_2)\, \delta(q_1 - q)$$

$$= \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} \sum_{m=0}^{P-1} |c_m|^2\, e^{j\alpha_m (d-i_1-l)}\, e^{-j\omega_q i_1}\, h_q(l)\, \delta(q_1 - q).$$

If the training sequence $\{c(n)\}$ is periodic white, i.e.,

$$P^{-1} \sum_{n=0}^{P-1} c(n)\, c^*(n-l) = \sigma_c^2\, \delta(l \bmod P),$$

then

$$\sum_{m=0}^{P-1} |c_m|^2\, e^{j\alpha_m (d-i_1-l)} = \sigma_c^2\, \delta((d-i_1-l) \bmod P).$$

This fact then leads to

$$\lim_{T \to \infty} R_c(q_1, i_1) \overset{\text{m.s.}}{=} \begin{cases} \sigma_c^2\, e^{-j\omega_{q_1} i_1}\, h_{q_1}((d-i_1) \bmod P) & \text{if } 1 \le q_1 \le \bar{Q} \\ 0 & \text{otherwise} \end{cases} \tag{7.5}$$

for $i_1 = 0, 1, \ldots, L_e$ and $q_1 = 1, 2, \ldots, Q$.

Turning to (7.4), we have

$$R_b(q_1, i_1) = \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} \sum_{m=0}^{P-1} c_m\, h_q(l)\, e^{-j\alpha_m (i_1+l)}\, e^{-j\omega_q i_1}\, A_3 + \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} h_q(l)\, e^{-j\omega_q i_1}\, A_4 + A_5$$

where

$$A_3 := \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\alpha_m - \omega_{q_1} + \omega_q) n}\, b^*(n-d),$$

$$A_4 := \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_q - \omega_{q_1}) n}\, b(n-i_1-l)\, b^*(n-d),$$

$$A_5 := \frac{1}{T} \sum_{n=0}^{T-1} e^{-j\omega_{q_1} n}\, v(n-i_1)\, b^*(n-d).$$

We can show (as before) that

$$\lim_{T \to \infty} A_3 \overset{\text{m.s.}}{=} 0, \qquad \lim_{T \to \infty} A_5 \overset{\text{m.s.}}{=} 0.$$

Consider

$$A_6 := \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_q - \omega_{q_1}) n} \left[ b(n-i_1-l)\, b^*(n-d) - \sigma_b^2\, \delta(d-i_1-l) \right].$$

It then follows that

$$E\{ |A_6|^2 \} = \frac{1}{T^2} \sum_{n_1=0}^{T-1} \sum_{n_2=0}^{T-1} e^{j(\omega_q - \omega_{q_1})(n_1 - n_2)}\, E\left\{ \left| b(n-i_1-l)\, b^*(n-d) - \sigma_b^2\, \delta(d-i_1-l) \right|^2 \right\}$$

$$= \frac{1}{T^2} \sum_{n_1=0}^{T-1} \sum_{n_2=0}^{T-1} e^{j(\omega_q - \omega_{q_1})(n_1 - n_2)} \left[ E\{ |b(n)|^4 \} - \sigma_b^4 \right] \delta(n_1 - n_2)\, \delta(d-i_1-l)$$

$$= \frac{1}{T} \left[ E\{ |b(n)|^4 \} - \sigma_b^4 \right] \delta(d-i_1-l).$$

Therefore, we have $\lim_{T \to \infty} A_6 \overset{\text{m.s.}}{=} 0$, and consequently

$$\lim_{T \to \infty} A_4 \overset{\text{m.s.}}{=} \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_q - \omega_{q_1}) n}\, \sigma_b^2\, \delta(d-i_1-l) = \sigma_b^2\, \delta(d-i_1-l)\, \delta(q_1 - q).$$

Hence, for "large" $T$, we have

$$\lim_{T \to \infty} R_b(q_1, i_1) \overset{\text{m.s.}}{=} \sum_{q=1}^{\bar{Q}} \sum_{l=0}^{L} h_q(l)\, e^{-j\omega_q i_1}\, \sigma_b^2\, \delta(d-i_1-l)\, \delta(q_1 - q).$$

For $i_1 = 0, 1, \ldots, L_e$ and $q_1 = 1, 2, \ldots, Q$ but $q_1 \le \bar{Q}$, we therefore have

$$\lim_{T \to \infty} R_b(q_1, i_1) \overset{\text{m.s.}}{=} h_{q_1}(d-i_1)\, e^{-j\omega_{q_1} i_1}\, \sigma_b^2. \tag{7.6}$$

If $P > L + L_e - d$, then (7.5) equals (7.6) (within a scale factor). Therefore, for "large" $T$, $R_c(q_1, i_1) = \gamma R_b(q_1, i_1)$ for all $q_1$, $i_1$ with $\gamma = \sigma_c^2 / \sigma_b^2$; hence $g_q(i) = \gamma\, \tilde{g}_q(i)$ for all $i$.

For the desired linear time-varying equalizer, we execute the following steps:

1. Pick $L_e$ and $d$ ($= L_e/2$ in the following simulations). Pick $Q \ge \bar{Q}$ and $P > L + L_e - d$.

2. Solve (7.3), given data $y(n)$, for $g_q(i)$ where $0 \le i \le L_e$ and $1 \le q \le Q$. Then

$$g_d(n;i) = \sum_{q=1}^{Q} g_q(i)\, e^{j\omega_q n}.$$

3. The equalized output is then given by

$$e_1(n) = \sum_{i=0}^{L_e} g_d^H(n;i)\, y(n-i) \approx \gamma_1\, c(n-d) + \gamma_2\, b(n-d) + \tilde{v}(n)$$

where $\tilde{v}(n)$ is the equalized noise. Estimate $\gamma_1$ as

$$\hat{\gamma}_1 = \frac{\frac{1}{T} \sum_{n=0}^{T-1} e_1(n)\, c^*(n-d)}{\frac{1}{T} \sum_{n=0}^{T-1} |c(n-d)|^2} = \frac{\frac{1}{T} \sum_{n=0}^{T-1} e_1(n)\, c^*(n-d)}{\sigma_c^2}.$$

4. Define

$$e_2(n) = e_1(n) - \hat{\gamma}_1\, c(n-d) \approx \gamma_2\, b(n-d) + \tilde{v}(n).$$

Then we hard-quantize $e_2(n)$ to estimate $b(n-d)$.

7.3 Direct FIR Linear Equalization: Multiple Users

Multiple access schemes allow multiple users to share a common channel. Random access methods provide each user with a flexible way of gaining access to the channel whenever the user has information (packets) to be sent. In random access, typically when two packets collide, they are discarded and then have to be retransmitted. In wireless ad hoc networks (also known as mobile ad hoc networks, or MANETs), the absence of base stations limits the use of traditional media access control (MAC) protocols [34]. In ad hoc networks one needs some sort of distributed MAC requiring some form of random access, which makes avoiding collisions difficult. Collisions arising from uncoordinated users decrease system throughput and worsen delay performance.
Multiple packet reception (MPR) capability (or signal separation) is one way to resolve packet collisions and thereby enhance throughput, by using signal processing to separate multiple received signals [75]. Recently, wireless ad hoc networks with asynchronous transmissions have been considered in [15,58,60]. The approaches of [15,58] use user-specific modulation-induced cyclostationarity coupled with a receive antenna array to achieve MPR for frequency-selective time-invariant channels. In [60], user-specific superimposed training signals (also called hidden pilots or implicit training) have been used for MPR for frequency-selective time-invariant channels. The objective of this section is to investigate approaches using user-specific superimposed training signals for MPR in MANETs for transmissions over doubly-selective channels, with emphasis on asynchronous networks.

Consider a time-varying MIMO FIR linear channel with K inputs (users) and N outputs (a receiver array with N elements at the destination node). Let $\{s_k(n)\}$ denote the k-th user's input sequence, which is fed to the MIMO doubly-selective channel with the k-th user's discrete-time impulse response $\{h_k(n;l)\}$ (the N-vector channel response at time n to a unit input at time n − l). Consider a typical (one-hop) MANET structure in an asynchronous mode. Assume K active users with a packet length of S symbols, in the coverage area of the node under evaluation. Each node is equipped with N (≥ 1) receive antennas, and the receiver node processes a data record block of size T (≥ S) symbols. Various packets can be located anywhere within this observation block. Using a sliding block approach (as in [15,60]), we assume that the packet of interest is totally within the observation block. (An energy detector or related approaches can be used to ensure this [15,60].) The noisy received (baseband-equivalent, symbol-rate) signal at the node of interest at time n is an N-column vector $y(n)$, $n_1 \le n \le n_1 + T -$
1, given by ($n_1$ is the "initial" time of the observation block)
$$y(n) = \sum_{k=1}^{K} \sum_{l=0}^{L} h_k(n;l)\, s_k(n-l) + v(n). \tag{7.7}$$
In a CE-BEM representation it is assumed that the channel for each user follows (2.9):
$$h_k(n;l) = \sum_{q=1}^{\bar{Q}} h_{qk}(l)\, e^{j\omega_q n}, \quad k = 1,2,\dots,K \tag{7.8}$$
where the N-column vectors $h_{qk}(l)$ are invariant for the whole block $n = 0,1,\dots,T-1$, and
$$\bar{Q} := 2\lceil f_d T T_s \rceil + 1, \quad L := \lfloor \tau_d/T_s \rfloor, \quad \omega_q := \frac{2\pi}{T}\left( q - \frac{1}{2} - \frac{Q}{2} \right), \quad q = 1,2,\dots,Q.$$
In superimposed training-based approaches, for the k-th user, one takes
$$s_k(n) = b_k(n) + c_k(n)$$
where $\{b_k(n)\}$ is the information sequence and $\{c_k(n)\}$ is a user-specific non-random periodic training sequence.

The main problem considered here is: how to design an equalizer to estimate $\{b_1(n)\}$, the information sequence of user 1 (the desired user), when one knows only $\{c_1(n)\}$ but not (obviously) $\{b_1(n)\}$, and one also does not have (frame) synchronization with $\{c_1(n)\}$ at the receiver. We will design an equalizer to estimate $\{c_1(n)\}$ with a delay d. In a manner similar to Section 7.2, we will then show that this equalizer is a scaled version of the corresponding equalizer designed to estimate $\{b_1(n)\}$ with a delay d, provided that $\{c_k(n)\}$ satisfies certain properties.

7.3.1 User-Specific Training Sequences

Each user is assigned (or selects) a user-specific training sequence. The sequences
$$c_k(n) := \sigma_{ck} \exp\left[ j2\pi \left( \frac{n^2}{\bar{P}} + \epsilon_k n \right) \right], \tag{7.9a}$$
$$\epsilon_k := \frac{m-1}{D}, \quad m = 1,2,\dots,D \ge K \tag{7.9b}$$
have been used in [58,60]. They are periodic with period $P = D\bar{P}$, where $D$, $\bar{P}$, and $\sigma_{ck}$ are design parameters ($D$ and $\bar{P}$ are coprime). Different users are characterized by different $\epsilon_k$'s, and distinct sequences are mutually orthogonal and individually periodic white. (There is a common code book of size D at each node, containing the possible values of $\epsilon_k$. During the "initial contact" period, a given node searches for all possible D signals.)
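The orthogonality and periodic-whiteness claims for (7.9) can be checked numerically. The sketch below is only an illustration: it assumes that user k takes the k-th codebook entry, i.e. ε_k = (k − 1)/D, and it uses D = 4 and P̄ = 13 as in Section 7.4.2; the function names are hypothetical.

```python
import cmath
import math


def training_sequence(k, D, P_bar, sigma=1.0):
    """One period (P = D * P_bar samples) of the user-k sequence in (7.9).

    Assumption: user k is given the k-th codebook entry, eps_k = (k - 1) / D.
    """
    P = D * P_bar
    eps_k = (k - 1) / D
    return [sigma * cmath.exp(2j * math.pi * (n * n / P_bar + eps_k * n))
            for n in range(P)]


def periodic_correlation(ck, cm, tau):
    """(1/P) * sum_n ck(n) * conj(cm(n - tau)), with indices taken modulo P."""
    P = len(ck)
    return sum(ck[n] * cm[(n - tau) % P].conjugate() for n in range(P)) / P


D, P_bar, K = 4, 13, 3          # values used in Section 7.4.2
seqs = {k: training_sequence(k, D, P_bar) for k in range(1, K + 1)}

max_leak = 0.0                  # largest correlation that should vanish
min_peak = float("inf")         # smallest correlation that should equal sigma^2
for k in seqs:
    for m in seqs:
        for tau in range(D * P_bar):
            r = abs(periodic_correlation(seqs[k], seqs[m], tau))
            if k == m and tau % P_bar == 0:
                min_peak = min(min_peak, r)
            else:
                max_leak = max(max_leak, r)

print(f"largest off-peak |correlation|: {max_leak:.2e}")
print(f"smallest on-peak |correlation|: {min_peak:.6f}")
```

Only magnitudes are compared, so the check does not depend on the exact phase convention of the correlation at lags that are multiples of P̄.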
In a different context, as in Section 3.5, we have proposed the user-specific sequences in (3.63)–(3.64) by using an m-sequence of period $\bar{P}$ with $P = D\bar{P}$. These sequences are periodic with period P, mutually orthogonal, and individually "nearly" periodic white with period $\bar{P}$.

Given that the time-varying channel is described by CE-BEM, we investigate direct design of time-varying FIR linear equalizers for doubly-selective channels using superimposed training and without first estimating the underlying channel response.

7.3.2 Linear LS Equalizers for the Desired User

We look for a time-varying linear equalizer $g(n;l)$ ($l = 0,1,\dots,L_e$) over the same time block as the received data with channel model (7.8). Following the discussions in Section 7.2.2, we assume
$$g(n;l) = \sum_{q=1}^{Q} g_q(l)\, e^{j\omega_q n}, \quad n = 0,1,\dots,T-1.$$
In order to estimate the input sequence of the desired user (user 1, with no loss of generality) $\{s_1(n)\}$ (see (7.7)), we may seek a linear time-varying FIR estimator using CE-BEM to yield an estimate with equalization delay d:
$$\hat{s}_1(n-d) = \sum_{i=0}^{L_e} g^{H}(n;i)\, y(n-i).$$
Similar to Section 7.2, we also seek an LS solution $g(n;l)$ to minimize a cost such as
$$J = \frac{1}{T} \sum_{n=0}^{T-1} |s_1(n-d) - \hat{s}_1(n-d)|^2.$$
We first state the underlying model assumptions.

(H7.3.1) The information sequence $\{b_k(n)\}$ is zero-mean, i.i.d. (independent and identically distributed), with $E\{|b_k(n)|^2\} = \sigma_{bk}^2$. The sequences are also independent across users ($k = 1,2,\dots,K$).

(H7.3.2) The measurement noise $\{v(n)\}$ is zero-mean ($E\{v(n)\} = 0$), white, independent of $\{b_k(n)\}$, with $E\{[v(n+\tau)][v(n)]^{H}\} = \sigma_v^2 I_N \delta(\tau)$.

(H7.3.3) The superimposed training sequence $c_k(n) = c_k(n+P)$ for all n is a non-random periodic sequence with period P. Let $\sigma_{ck}^2 := (1/P) \sum_{n=1}^{P} |c_k(n)|^2$. The sequences are chosen as in (7.9), or as in Section 3.5.

(H7.3.4) The record length T and period P satisfy $TP^{-1} > \bar{Q}$, and $TP^{-1}$ is an integer. Moreover, $\bar{P} > L + L_e - d$, where $d$ ($\ge 0$) is the desired equalization delay.
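The conditions in (H7.3.4) are easy to get wrong when the Doppler spread or block length changes, so a small helper can be useful. The following is a minimal sketch with hypothetical function names; it evaluates the conditions for the multiple-user parameter values quoted later in Section 7.4.2 (T = 832, P = 52, P̄ = 13, f_d = 100 Hz, T_s = 25 μs, L = 2, L_e = 4, d = L_e/2).

```python
import math


def min_basis_count(fd_hz, T, Ts_s):
    """Q_bar = 2 * ceil(fd * T * Ts) + 1, as defined after (7.8)."""
    return 2 * math.ceil(fd_hz * T * Ts_s) + 1


def check_h734(T, P, P_bar, fd_hz, Ts_s, L, Le, d):
    """Return a dict with Q_bar and the individual conditions of (H7.3.4)."""
    q_bar = min_basis_count(fd_hz, T, Ts_s)
    return {
        "Q_bar": q_bar,
        "T/P is an integer": T % P == 0,
        "T/P > Q_bar": T / P > q_bar,
        "P_bar > L + Le - d": P_bar > L + Le - d,
    }


# Multiple-user settings of Section 7.4.2 (d = Le / 2).
report = check_h734(T=832, P=52, P_bar=13, fd_hz=100.0, Ts_s=25e-6, L=2, Le=4, d=2)
for name, value in report.items():
    print(f"{name}: {value}")
```

For these values the helper gives Q̄ = 7, which agrees with the Q = 7 used for Figure 7.4, and T/P = 16.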
It then follows that [58,77]
$$\frac{1}{P} \sum_{n=0}^{P-1} c_k(n)\, c_m^{*}(n-\tau) = \rho_k(\tau)\, \delta\left( \tau \bmod \bar{P} \right) \delta(k-m)$$
where
$$\rho_k(\tau) = \begin{cases} \sigma_{ck}^2\, e^{j2\pi k\tau/D} & \text{for (7.9)} \\ \sigma_{ck}^2\, e^{j2\pi(k-1)\tau/P} & \text{for the sequences in Section 3.5.} \end{cases}$$
The periodic training sequence can be written as
$$c_k(n) = \sum_{m=0}^{P-1} c_{km}\, e^{j\alpha_m n}$$
where $\alpha_m := 2\pi m/P$.

To design the time-varying linear equalizer to estimate a delayed version of the desired user's training sequence $c_1(n-d)$ ($0 \le d \le L_e$), we have
$$\hat{c}_1(n-d) = \sum_{i=0}^{L_e} g_d^{H}(n;i)\, y(n-i)$$
where we assume that
$$g_d(n;i) = \sum_{q=1}^{Q} g_q(i)\, e^{j\omega_q n}.$$
Choose the $g_q(i)$'s to minimize the time-averaged cost
$$J_c := \frac{1}{T} \sum_{n=0}^{T-1} |c_1(n-d) - \hat{c}_1(n-d)|^2.$$
Taking the derivative of $J_c$ and setting it to zero, we have
$$0 = \frac{\partial J_c}{\partial g_{q_1}^{*}(i_1)} = -\frac{1}{T} \sum_{n=0}^{T-1} e^{-j\omega_{q_1} n}\, y(n-i_1) \left[ c_1^{*}(n-d) - \sum_{i=0}^{L_e} \sum_{q=1}^{Q} e^{j\omega_q n}\, y^{H}(n-i)\, g_q(i) \right]$$
for $i_1 = 0,1,\dots,L_e$ and $q_1 = 1,2,\dots,Q$. This leads to
$$\sum_{i=0}^{L_e} \sum_{q=1}^{Q} \left[ \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\, y(n-i_1)\, y^{H}(n-i) \right] g_q(i) = \frac{1}{T} \sum_{n=0}^{T-1} c_1^{*}(n-d)\, e^{-j\omega_{q_1} n}\, y(n-i_1) =: R_c(q_1,i_1). \tag{7.10}$$

To design the time-varying linear equalizer to estimate the desired user's information sequence $b_1(n-d)$ ($0 \le d \le L_e$), we have
$$\hat{b}_1(n-d) = \sum_{i=0}^{L_e} \tilde{g}_d^{H}(n;i)\, y(n-i)$$
where we assume that
$$\tilde{g}_d(n;i) = \sum_{q=1}^{Q} \tilde{g}_q(i)\, e^{j\omega_q n}.$$
Choose the $\tilde{g}_q(i)$'s to minimize
$$J_b := \frac{1}{T} \sum_{n=0}^{T-1} \left| b_1(n-d) - \hat{b}_1(n-d) \right|^2.$$
Mimicking the results for the superimposed training sequence-based equalization, we obtain
$$\sum_{i=0}^{L_e} \sum_{q=1}^{Q} \left[ \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\, y(n-i_1)\, y^{H}(n-i) \right] \tilde{g}_q(i) = \frac{1}{T} \sum_{n=0}^{T-1} b_1^{*}(n-d)\, e^{-j\omega_{q_1} n}\, y(n-i_1) =: R_b(q_1,i_1). \tag{7.11}$$
Comparing (7.10) and (7.11), we see that (ignoring the equalizer coefficients) the left sides of the two are identical.
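For implementation it can help to read (7.10) and (7.11) as ordinary LS normal equations in stacked form. The block below is a sketch of that rewriting; the stacked regressor $z(n)$, the stacked coefficient vector $g$, and the matrix $R_{zz}$ are notation introduced here for illustration and are not used elsewhere in the chapter.

```latex
\begin{aligned}
z(n) &:= \big[\, e^{-j\omega_1 n}\, y^{T}(n),\ \dots,\ e^{-j\omega_Q n}\, y^{T}(n),\ \dots,\
          e^{-j\omega_1 n}\, y^{T}(n-L_e),\ \dots,\ e^{-j\omega_Q n}\, y^{T}(n-L_e) \,\big]^{T}, \\
g &:= \big[\, g_1^{T}(0),\ \dots,\ g_Q^{T}(0),\ \dots,\ g_1^{T}(L_e),\ \dots,\ g_Q^{T}(L_e) \,\big]^{T}, \\
\hat{c}_1(n-d) &= \sum_{i=0}^{L_e} \sum_{q=1}^{Q} g_q^{H}(i)\, e^{-j\omega_q n}\, y(n-i) = g^{H} z(n), \\
\underbrace{\Big[ \tfrac{1}{T} \sum_{n=0}^{T-1} z(n)\, z^{H}(n) \Big]}_{R_{zz}}\, g
  &= \tfrac{1}{T} \sum_{n=0}^{T-1} c_1^{*}(n-d)\, z(n).
\end{aligned}
```

The block of $R_{zz}$ indexed by $(q_1,i_1)$ and $(q,i)$ is the bracketed time average in (7.10), and the right side stacks $R_c(q_1,i_1)$. Replacing $c_1^{*}$ by $b_1^{*}$ gives (7.11) with the same $R_{zz}$, which has size $NQ(L_e+1) \times NQ(L_e+1)$. If $R_{zz}$ is invertible (an assumption made here), proportional right sides therefore yield proportional solutions.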
We now seek to establish that for large T, $R_c(q_1,i_1) = \beta R_b(q_1,i_1)$ for all $q_1, i_1$, for some scalar $\beta$, so that $g_q(i) = \beta\, \tilde{g}_q(i)$ for all $i$. Following similar derivations as in Section 7.2.2, by defining
$$\bar{\rho}_1(i) := \sum_{l=0}^{L} \rho_1(d-i-l)\, \delta\left( (d-i-l) \bmod \bar{P} \right)$$
we have
$$\lim_{T\to\infty} R_c(q_1,i_1) \overset{\text{m.s.}}{=} \begin{cases} \bar{\rho}_1(i_1)\, e^{-j\omega_{q_1} i_1}\, h_{q_1 1}((d-i_1) \bmod \bar{P}) & \text{if } 1 \le q_1 \le \bar{Q} \\ 0 & \text{otherwise} \end{cases} \tag{7.12}$$
for $i_1 = 0,1,\dots,L_e$ and $q_1 = 1,2,\dots,Q$. It can also be shown that for $i_1 = 0,1,\dots,L_e$ and $q_1 = 1,2,\dots,Q$ with $1 \le q_1 \le \bar{Q}$,
$$\lim_{T\to\infty} R_b(q_1,i_1) \overset{\text{m.s.}}{=} h_{q_1 1}(d-i_1)\, e^{-j\omega_{q_1} i_1}\, \sigma_{b1}^2. \tag{7.13}$$
If $\bar{P} > L + L_e - d$, then (7.12) equals (7.13) (within a scale factor) and $\bar{\rho}_1(i_1) = \sigma_{c1}^2$. Therefore, for "large" T, $R_c(q_1,i_1) = \beta R_b(q_1,i_1)$ for all $q_1, i_1$ with $\beta = \sigma_{c1}^2/\sigma_{b1}^2$; hence $g_q(i) = \beta\, \tilde{g}_q(i)$ for all $i$.

For the desired equalizer design, we execute the following steps:

1. Pick $L_e$ and $d$ ($= L_e/2$ in the following simulations). Pick $Q \ge \bar{Q}$ and $\bar{P} > L + L_e - d$.

2. Solve (7.10), given data $y(n)$, for $g_q(i)$ where $0 \le i \le L_e$ and $1 \le q \le Q$. Then
$$g_d(n;i) = \sum_{q=1}^{Q} g_q(i)\, e^{j\omega_q n}.$$

3. The equalized output is then given by
$$e_1(n) = \sum_{i=0}^{L_e} g_d^{H}(n;i)\, y(n-i) \approx \gamma_1 c_1(n-d) + \gamma_2 b_1(n-d) + \tilde{v}(n)$$
where $\tilde{v}(n)$ is the equalized noise plus multiple-user interference. Estimate $\gamma_1$ as
$$\hat{\gamma}_1 = \frac{\frac{1}{T} \sum_{n=0}^{T-1} e_1(n)\, c_1^{*}(n-d)}{\frac{1}{T} \sum_{n=0}^{T-1} |c_1(n-d)|^2} = \frac{\frac{1}{T} \sum_{n=0}^{T-1} e_1(n)\, c_1^{*}(n-d)}{\sigma_{c1}^2}.$$

4. Define $e_2(n) = e_1(n) - \hat{\gamma}_1 c_1(n-d) \approx \gamma_2 b_1(n-d) + \tilde{v}(n)$. Then we hard-quantize $e_2(n)$ to estimate $b_1(n-d)$.

7.4 Simulation Examples

7.4.1 Direct FIR Equalization: Single User

In this example, we consider direct FIR linear equalization using superimposed training and CE-BEM. We generate a doubly-selective SIMO Rayleigh fading channel following Jakes' model with N = 1, 2, and 3, and L = 2 (3 taps) in (7.1). In the simulation,
we pick a data record length of T = 405 symbols (a time duration of approximately 10 ms). The communications system described in Section 2.6.3 with $T_s = 25\,\mu$s is employed. We consider the system operating under different Doppler spreads with the number of basis functions Q = 5. We choose TIR = 0.3, 1.0, and 2.0, so that the average transmitted power in $\{c(n)\}$ can be less than, equal to, or larger than the power in $\{b(n)\}$. The information sequence $\{b(n)\}$ is BPSK modulated. We take the superimposed training sequence of period P = 15 as $c(n) = \sigma_c e^{j\pi n(n+\nu)/P}$, where $\nu = 1$ if P is odd and $\nu = 2$ if P is even, as in [59]; this sequence is periodic white. We first consider a single-user scenario. We assume the additive noise $\{v(n)\}$ is zero-mean, white complex-Gaussian, and uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau)v^{H}(n)\} = \sigma_v^2 I_N \delta(\tau)$. The (receiver) SNR refers to the energy per bit over one-sided noise spectral density, with both the information and superimposed training sequences counting toward the bit energy. All simulation results are based on 500 Monte Carlo runs.

[Plot omitted. Axes: SNR (dB) vs. bit error rate. Curves: TIR = 0.3, 1.0, 2.0, each with N = 1, 2, 3.]
Figure 7.1: Single-user direct FIR equalization: BER vs. SNR under $f_d$ = 0 Hz and length of equalizer $L_e$ = 6 with different TIR and number of receivers.

[Plot omitted. Panel title: Direct equalizer: $f_d$ = 50 Hz, $T_s$ = 25 μs, L = 2, $L_e$ = 6, Q = 5, P = 15, T = 405, 500 runs. Axes: SNR (dB) vs. bit error rate. Curves: TIR = 0.3, 1.0, 2.0, each with N = 1, 2, 3.]
Figure 7.2: Single-user direct FIR equalization: BER vs. SNR under $f_d$ = 50 Hz and length of equalizer $L_e$ = 6 with different TIR and number of receivers.

The BER results for various SNRs and $f_d$ = 0, 50, and 100 Hz are shown in Figures 7.1–7.3, respectively. We can clearly see that more receive antennas (larger N) improve the reception, since space diversity can be exploited.
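To make the training-removal and detection steps (steps 3 and 4 of the equalizer procedure) concrete for this example's setup, the sketch below uses BPSK data and the chirp training sequence with P = 15 and T = 405. It does not run an actual equalizer: the equalizer output e1(n) is synthesized directly as a gain times c(n − d), plus a gain times b(n − d), plus noise. The gains, the noise level, the random seed, and the function names are all hypothetical, and the second gain is assumed real and positive so that the sign of the real part is a valid BPSK decision.

```python
import cmath
import math
import random


def chirp_training(P, sigma_c=1.0):
    """One period of c(n) = sigma_c * exp(j*pi*n*(n + nu)/P), nu = 1 (P odd) or 2 (P even)."""
    nu = 1 if P % 2 else 2
    return [sigma_c * cmath.exp(1j * math.pi * n * (n + nu) / P) for n in range(P)]


def remove_training_and_detect(e1, c_delayed, sigma_c2):
    """Steps 3 and 4: estimate the training gain, subtract it, hard-quantize BPSK."""
    T = len(e1)
    gain_hat = sum(e * c.conjugate() for e, c in zip(e1, c_delayed)) / (T * sigma_c2)
    e2 = [e - gain_hat * c for e, c in zip(e1, c_delayed)]
    bits_hat = [1 if x.real >= 0 else -1 for x in e2]
    return gain_hat, bits_hat


random.seed(0)
T, P = 405, 15                     # record length and training period from this example
period = chirp_training(P)
c = [period[n % P] for n in range(T)]            # stands in for c(n - d)
b = [random.choice((-1, 1)) for _ in range(T)]   # stands in for b(n - d), BPSK

# Hypothetical equalizer output: NOT produced by an actual equalizer.
gain_c, gain_b, noise_std = 0.9, 0.9, 0.05
e1 = [gain_c * cn + gain_b * bn
      + complex(random.gauss(0, noise_std), random.gauss(0, noise_std))
      for cn, bn in zip(c, b)]

gain_hat, b_hat = remove_training_and_detect(e1, c, sigma_c2=1.0)
errors = sum(x != y for x, y in zip(b_hat, b))
print(f"estimated training gain: {gain_hat:.3f}, bit errors: {errors} of {T}")
```

The gain estimate is perturbed by the time average of b(n − d)c*(n − d), which is the information-induced self-interference; it shrinks as T grows.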
However, increasing TIR does not necessarily benefit the performance, since a higher TIR leads to more accurate equalizer taps but also to a lower effective SNR, because less power is assigned to the information sequence. Therefore, a trade-off has to be made in selecting TIR. Generally speaking, a higher SNR or more receive antennas allow more power to be allocated to superimposed training, because less interference is present. We also note the error floors in the BER curves, which are partially attributed to the modeling error of the CE-BEM in approximating the inverse of the channel.

[Plot omitted. Panel title: Direct equalizer: $f_d$ = 100 Hz, $T_s$ = 25 μs, L = 2, $L_e$ = 6, Q = 5, P = 15, T = 405, 500 runs. Axes: SNR (dB) vs. bit error rate. Curves: TIR = 0.3, 1.0, 2.0, each with N = 1, 2, 3.]
Figure 7.3: Single-user direct FIR equalization: BER vs. SNR under $f_d$ = 100 Hz and length of equalizer $L_e$ = 6 with different TIR and number of receivers.

7.4.2 Direct FIR Equalization: Multiple Users

In this example, we consider a multiple-user scenario in a wireless ad hoc network. We set the number of users K = 3. For each user, the channel follows that described in Section 7.4.1, i.e., a doubly-selective SIMO Rayleigh fading channel following Jakes' model with L = 2 (3 taps). Now we take the number of receive antennas N = 1, 2, 3, and 4. For different users, the channels $h_k(n;l)$ are mutually independent. We also employ the communications system described in Section 2.6.3 with $T_s = 25\,\mu$s. In the simulation, we pick a data record length of T = 832 symbols. We assume the additive noise $\{v(n)\}$ is zero-mean, white complex-Gaussian, and uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau)v^{H}(n)\} = \sigma_v^2 I_N \delta(\tau)$. The (receiver) SNR refers to the energy per bit over one-sided noise spectral density with
both information and superimposed training sequence counting toward the bit energy. All simulation results are based on 500 Monte Carlo runs. Information sequences for each user are BPSK. We take the superimposed training sequences' period P = 52 with D = 4 and $\bar{P}$ = 13 in (7.9). The average transmitted power in $\{c_k(n)\}$ is equal to the power in $\{b_k(n)\}$, leading to a TIR of 1.0. In the simulation, we consider an asynchronous case where the observation window fully contains the desired user's signal and the other two interfering signals (k = 2, 3) occupy the window $[t_k, t_k + T - 1]$, where $t_k$ is uniformly distributed in $[-T+1, T-1]$; $t_k$ changes from run to run.

[Plot omitted. Axes: SNR (dB) vs. bit error rate. Curves: N = 1, 2, 3, 4.]
Figure 7.4: Multiple-user direct FIR equalization (ad hoc): BER vs. SNR under $f_d$ = 100 Hz and length of equalizer $L_e$ = 4 with different number of receivers.

Figure 7.4 shows the BER results versus SNR for a channel with $f_d$ = 100 Hz. We take the equalizer length $L_e$ = 4 and the number of basis functions Q = 7. More receive antennas (larger N) enhance the reception significantly. However, due to the presence of MUI and the modeling error of CE-BEM, noticeable error floors can be observed in each curve.

[Plot omitted. Panel title: Direct equalizer (ad hoc): K = 3, SNR = 25 dB, $T_s$ = 25 μs, L = 2, $L_e$ = 4, $Q = 2\lceil f_d T T_s \rceil + 1$, P = 52, TIR = 1.0, T = 832, 500 runs. Axes: $f_d$ (Hz) vs. bit error rate. Curves: N = 1, 2, 3, 4.]
Figure 7.5: Multiple-user direct FIR equalization (ad hoc): BER vs. Doppler spread $f_d$ under SNR = 25 dB and length of equalizer $L_e$ = 4 with different number of receivers.

Figure 7.5 exhibits the BERs for various Doppler spreads. We also take $L_e$ = 4, but $Q = \bar{Q} = 2\lceil f_d T T_s \rceil + 1$, as a function of $f_d$. We can see that the BERs gradually deteriorate with increasing Doppler spread $f_d$. The curves for BER versus the equalizer length $L_e$ for $f_d$ = 50 Hz are displayed in Figure 7.6. A longer $L_e$ may equalize the received signal better, but more taps add to estimation variance.
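One crude way to see why more taps add to estimation variance is to count the complex unknowns in $\{g_q(i)\}$ (N-vectors for q = 1, ..., Q and i = 0, ..., L_e) against the T samples available in one block. The sketch below is only a back-of-the-envelope illustration, not an analysis from this chapter; the function name is hypothetical, and Q = 4 is the value listed for Figure 7.6.

```python
def equalizer_unknowns(N, Q, Le):
    """Number of complex coefficients in {g_q(i)}: N-vectors for q = 1..Q, i = 0..Le."""
    return N * Q * (Le + 1)


T = 832   # record length used in Section 7.4.2
Q = 4     # number of basis functions listed for Figure 7.6
for N in (1, 2, 3, 4):
    row = []
    for Le in range(2, 11):
        unknowns = equalizer_unknowns(N, Q, Le)
        row.append(f"Le={Le}: {unknowns} ({T / unknowns:.1f} samples each)")
    print(f"N={N} -> " + "; ".join(row))
```

The number of samples per unknown falls as L_e or N grows, which is consistent with the qualitative trade-off described above; the actual variance also depends on the SNR and on the interference.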
We can see that $L_e$ = 4 is in the neighborhood of the optimal length. The BERs in all these figures are rather high, due to MUI. This can be alleviated by error-correction coding.

[Plot omitted. Panel title: Direct equalizer (ad hoc): K = 3, $f_d$ = 50 Hz, SNR = 25 dB, $T_s$ = 25 μs, L = 2, Q = 4, P = 52, TIR = 1.0, T = 832, 500 runs. Axes: $L_e$ vs. bit error rate. Curves: N = 1, 2, 3, 4.]
Figure 7.6: Multiple-user direct FIR equalization (ad hoc): BER vs. length of equalizer $L_e$ under $f_d$ = 50 Hz and SNR = 25 dB with different number of receivers.

7.5 Conclusions

In this chapter, the design of doubly-selective linear equalizers for single- and multiple-user frequency-selective time-varying channels was considered, using superimposed training and without first estimating the underlying channel response. Assuming that both the time-varying channel and the linear equalizers can be described by a CE-BEM representation, we showed that if periodic white superimposed training sequences are used, the optimal linear equalizer designed to extract the known training sequence is also a scaled version of the optimal equalizer for the information sequence. Based on this fact, a single-user direct equalizer was designed. By employing user-specific training sequences, this equalizer was extended to a multiple-user scenario, which can be used in a wireless ad hoc network.

Chapter 8

Concluding Remarks and Future Work

This dissertation considered channel estimation, equalization, and data detection, using superimposed training and BEMs, for wireless systems in doubly-selective channels. Building on a detailed analysis of the interference from the information sequence in superimposed training-based methods, our most important contribution is a set of approaches to suppress this self-interference.
Typical wireless channels are characterized by time- and frequency-selectivity: multipath propagation and limited bandwidth result in frequency selectivity leading to ISI; temporal variation of the channel is attributed to the relative motion between the transmitter and the receiver, as well as oscillator drifts and phase noise.

In superimposed training-based estimation and equalization, at the transmitter, a periodic (non-random) training sequence is superimposed (at low power) on the information sequence, before modulation and transmission. Compared with conventional TM training, there is no loss in data transmission rate for superimposed training, but some useful power has to be allocated to the training sequences.

We described the doubly-selective channel, over a block of symbol intervals, by various BEMs (whereas we employed Jakes' model, which is independent of the BEMs, to generate the "true" channel in simulations), including CE-, OP-, and DPS-BEMs, so that estimating the time-varying channel was reduced to estimating fewer time-invariant parameters.

8.1 Summary of Original Work

In Chapter 3, beginning with the first-order statistics-based channel estimator proposed in [81] that exploits superimposed training and CE-BEM, we modified this estimator to use DPS- and OP-BEMs so that the spectral leakage induced by CE-BEM can be reduced. The OP-BEM-based estimator, moreover, can be applied to any BEM representation, and thus has a more general structure. By utilizing user-specific superimposed training sequences, we assigned distinct cycle frequencies of the periodic training sequences to distinct users so that channel estimation across different users is decoupled and the single-user estimator can be applied to this multiple-user scenario.

Performance analysis of the first-order statistics-based estimator proposed in the previous chapter was discussed in Chapter 4.
Although the modeling error of BEMs and noise contribute to the channel estimation mismatch, performance analysis clearly shows that the major interference when using superimposed training comes from the unknown information sequences. Power allocation and the bias-variance trade-off of the first-order statistics-based estimator were also considered in that chapter, based on the results of the performance analysis. We cast these optimization issues as maximization of an SNR for equalizer design.

How to reduce the information-induced self-interference was the topic of the following two chapters. In Chapter 5, by exploiting the fact that training and information sequences pass through an identical channel, an iterative DML approach was proposed to jointly improve the channel and sequence estimation. Beginning with the first-order statistics-based channel estimator, we used the detected data symbols from the preceding iteration to reduce the self-interference at the current iteration. This method is guaranteed to reach a local maximum of the DML function. To reduce the computational complexity of the ML detection, symbol detection techniques such as Kalman filtering can also be adopted instead of the Viterbi algorithm used in the DML approach.

In contrast to the receiver-end DML approach, a data-dependent superimposed training scheme, which is a transmitter-end processing technique, was proposed in Chapter 6. Inspired by the work of [20], we observed that over a channel satisfying a band-limited BEM such as CE- or DPS-BEM, the periodic superimposed training components within the received signal occur only at certain frequencies. By designing a data-dependent superimposed training sequence that suppresses the information sequence at those frequencies, the self-interference adversely affecting the channel estimation at the receiver is greatly reduced. However, the suppressed frequency components of the information sequence carry "information" as well.
Therefore, a PDD superimposed training method was proposed to strike a trade-off between self-interference cancelation and information integrity. Performance analysis and related optimization of parameters were also discussed.

In Chapter 7, we considered direct equalization, without first estimating the doubly-selective channel, using superimposed training and CE-BEM. By exploiting periodic white training sequences, we showed that the optimal linear equalizer designed to extract the known training sequence is also a scaled version of the optimal equalizer for the information sequence. A direct equalizer was designed based on this fact. By employing user-specific training sequences, this direct equalizer was extended to a multiple-user scenario, which can be used in a wireless ad hoc network.

Computer simulation examples illustrated the performance of our approaches and compared them with the conventional TM training schemes. The performance of the first-order statistics-based estimator is inferior to that of the TM training-based estimator, due to the existence of information-induced self-interference. However, when the self-interference is sufficiently suppressed by the DML approach or the PDD training, superimposed training can offer performance competitive with TM training, without incurring any data rate loss, and thus provides a promising training-based technique at a higher transmission rate.

8.2 Possible Future Directions

So far we have discussed channel estimation and equalization using superimposed training and various BEM representations. Future work may include the following areas.

First, the capacity of systems employing superimposed training should be investigated thoroughly. In other words, a fundamental question should be answered: how can superimposed training help achieve more capacity than TM training?
As we discussed in Chapter 1, several researchers have explored this area (e.g., [4,5,8,73], among others) and obtained important results; however, conclusive statements for more general situations (e.g., over time-selective, frequency-selective, or doubly-selective channels) are still open.

Another potential direction may lie in incorporating superimposed training into other widely used techniques. For example, in orthogonal frequency division multiplexing (OFDM) systems, frequency-multiplexed training is used, where equally spaced pilot (training) tones enable the receiver to achieve an MMSE estimate of the channel [54]. This frequency-multiplexed training scheme, however, can be viewed as superimposed training in the time domain. Therefore, how to utilize our results on superimposed training in OFDM systems becomes an interesting topic.

We should also consider a more general training structure, namely affine precoding with training [89], which treats the transmitted data block as
$$s = Fb + c,$$
where $s = [\, s(0)\ \ s(1)\ \ \cdots\ \ s(T-1) \,]^{T}$ is a transmitted block, and $c$ is the training sequence of the same size; $b = [\, b(0)\ \ b(1)\ \ \cdots\ \ b(T_b-1) \,]^{T}$ is the information sequence of length $T_b \le T$, and the $T \times T_b$ matrix $F$ is the affine precoder. We can clearly see that TM and superimposed training can both be viewed as special cases of affine precoding. In our data-dependent training scheme, our work is equivalent to designing a precoder $F$ that assigns training and information sequences to different dimensions so as to eliminate self-interference. The information loss of this scheme is due to $F$ not being full-rank. We hence design a full-rank $F$ that corresponds to the PDD superimposed training. Therefore, affine precoding can offer us more freedom in designing the communications system, so that better performance than superimposed or TM training can be expected; this is also a promising area.

Bibliography

[1] E. Alameda-Hernández, D. C.
McLernon, A. G. Orozco-Lugo, M. Lara, and M. Ghogho, "Synchronisation for superimposed training based channel estimation," Electron. Lett., vol. 41, no. 9, pp. 565–566, Apr. 2005.
[2] I. Barhumi, G. Leus, and M. Moonen, "Time-varying FIR equalization for doubly selective channels," IEEE Trans. Wireless Commun., vol. 4, no. 1, pp. 202–214, Jan. 2005.
[3] E. Biglieri, J. Proakis, and S. Shamai, "Fading channels: information-theoretic and communications aspects," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2619–2692, Oct. 1998.
[4] P. Bohlin and M. Tapio, "Optimized data aided training in MIMO systems," in Proc. IEEE VTC'04-Spring, Milan, Italy, May 17–19, 2004, pp. 679–683.
[5] P. Bohlin and M. Tapio, "Performance evaluation of MIMO communication systems based on superimposed pilots," in Proc. IEEE ICASSP'04, vol. 4, Montreal, Canada, May 17–21, 2004, pp. 425–428.
[6] D. K. Borah and B. D. Hart, "Frequency-selective fading channel estimation with a polynomial time-varying channel model," IEEE Trans. Commun., vol. 47, no. 6, pp. 862–873, June 1999.
[7] R. Bosisio, M. Nicoli, and U. Spagnolini, "Kalman filter of channel modes in time-varying wireless systems," in Proc. IEEE ICASSP'05, vol. 3, Philadelphia, PA, Mar. 18–23, 2005, pp. 785–788.
[8] C. Budianu and L. Tong, "Channel estimation for space-time orthogonal block codes," IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2515–2528, Oct. 2002.
[9] N. Chen and G. T. Zhou, "What is the price paid for superimposed training in OFDM," in Proc. IEEE ICASSP'04, vol. 4, Montreal, Canada, May 17–21, 2004, pp. 421–424.
[10] N. Chen and G. T. Zhou, "Superimposed training for OFDM: a peak-to-average power ratio analysis," IEEE Trans. Signal Process., vol. 54, no. 6, pp. 2277–2287, June 2006.
[11] W. Chen and R. Zhang, "Estimation of time and frequency selective channels in OFDM systems: a Kalman filter structure," in Proc. IEEE GLOBECOM'04, vol. 2, Dallas, TX, Nov. 29–Dec. 3, 2004, pp. 800–803.
[12] R. H.
Clarke, "A statistical theory of mobile-radio reception," Bell Syst. Tech. J., vol. 47, no. 6, pp. 957–1000, July–Aug. 1968.
[13] T. Cui and C. Tellambura, "Superimposed pilot symbols for channel estimation in OFDM systems," in Proc. IEEE GLOBECOM'05, San Francisco, CA, Nov. 28–Dec. 2, 2005, pp. 2229–2233.
[14] A. V. Dandawaté and G. B. Giannakis, "Asymptotic theory of mixed time averages and kth-order cyclic-moment and cumulant statistics," IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 216–232, Jan. 1995.
[15] R. Djapic, A.-J. van der Veen, and L. Tong, "Synchronization and packet separation in wireless ad hoc networks by known modulus algorithm," IEEE J. Sel. Areas Commun., vol. 23, no. 1, pp. 51–64, Jan. 2005.
[16] M. Dong, L. Tong, and B. M. Sadler, "Optimal insertion of pilot symbols for transmissions over time-varying flat fading channels," IEEE Trans. Signal Process., vol. 52, no. 5, pp. 1403–1418, May 2004.
[17] B. Farhang-Boroujeny, "Pilot-based channel identification: proposal for semi-blind identification of communication channels," Electron. Lett., vol. 31, no. 13, pp. 1044–1046, June 1995.
[18] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Commun., vol. 6, no. 3, pp. 311–335, Mar. 1998.
[19] M. Ghogho, "Channel and DC-offset estimation using data-dependent superimposed training," Electron. Lett., vol. 41, no. 22, pp. 1250–1251, Oct. 2005.
[20] M. Ghogho, D. McLernon, E. Alameda-Hernandez, and A. Swami, "Channel estimation and symbol detection for block transmission using data-dependent superimposed training," IEEE Signal Process. Lett., vol. 12, no. 3, pp. 226–229, Mar. 2005.
[21] M. Ghogho, D. McLernon, E. Alameda-Hernandez, and A. Swami, "SISO and MIMO channel estimation and symbol detection using data-dependent superimposed training," in Proc. IEEE ICASSP'05, vol. 3, Philadelphia, PA, Mar. 18–23, 2005, pp. 461–464.
[22] M. Ghogho and A.
Swami, "Optimal training for affine-precoded and cyclic-prefixed block transmissions," in Proc. IEEE Workshop Statistical Signal Processing, Bordeaux, France, July 17–20, 2005, pp. 1358–1363.
[23] G. B. Giannakis, Y. Hua, P. Stoica, and L. Tong, Eds., Signal Processing Advances in Wireless and Mobile Communications, Volume 1: Trends in Channel Estimation and Equalization. Englewood Cliffs, NJ: Prentice Hall, 2001.
[24] G. B. Giannakis and C. Tepedelenlioğlu, "Basis expansion models and diversity techniques for blind identification and equalization of time-varying channels," Proc. IEEE, vol. 86, no. 10, pp. 1969–1986, Oct. 1998.
[25] A. Gorokhov and P. Loubaton, "Semi-blind second order identification of convolutive channels," in Proc. IEEE ICASSP'97, vol. 5, Munich, Germany, Apr. 21–24, 1997, pp. 3905–3908.
[26] F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," Proc. IEEE, vol. 66, no. 1, pp. 51–83, Jan. 1978.
[27] S. He and J. K. Tugnait, "Direct equalization of multiuser doubly selective channels based on superimposed training," in Proc. EUSIPCO'06, Florence, Italy, Sept. 4–8, 2006.
[28] S. He and J. K. Tugnait, "Doubly-selective channel estimation using superimposed training and discrete prolate spheroidal basis models," in Proc. IEEE GLOBECOM'06, San Francisco, CA, Nov. 27–Dec. 1, 2006.
[29] S. He and J. K. Tugnait, "On bias-variance trade-off in superimposed training-based doubly selective channel estimation," in Proc. 2006 Conf. Inform. Sciences & Syst., Princeton University, NJ, Mar. 22–24, 2006, pp. 1308–1313.
[30] S. He and J. K. Tugnait, "Doubly-selective multiuser channel estimation using superimposed training and discrete prolate spheroidal basis expansion models," in Proc. IEEE ICASSP'07, vol. 2, Honolulu, HI, Apr. 15–20, 2007, pp. 861–864.
[31] S. He and J. K. Tugnait, "Self-interference suppression in doubly-selective channel estimation using superimposed training," in Proc. IEEE ICC'07, Glasgow, UK, June 24–28, 2007.
[32] S. He, J. K. Tugnait, and X.
Meng, "On superimposed training for MIMO channel estimation and symbol detection," IEEE Trans. Signal Process., vol. 55, no. 6, June 2007.
[33] P. Hoeher and F. Tufvesson, "Channel estimation with superimposed pilot sequence," in Proc. IEEE GLOBECOM'99, Rio de Janeiro, Dec. 5–9, 1999, pp. 2162–2166.
[34] IEEE J. Sel. Areas Commun., Special Issue on Wireless Ad Hoc Networks, vol. 23, Jan. 2005.
[35] W. C. Jakes, Microwave Mobile Communications. New York, NY: Wiley, 1974.
[36] C. E. Kastenholz and W. P. Birkemeier, "A simultaneous information transfer and channel sounding modulation technique for wide-band channels," IEEE Trans. Commun., vol. 13, no. 2, pp. 162–165, June 1965.
[37] X. Li and T. F. Wong, "Turbo equalization with nonlinear Kalman filtering for time-varying frequency-selective fading channels," IEEE Trans. Wireless Commun., vol. 6, no. 2, pp. 691–700, Feb. 2007.
[38] Z. Liu, X. Ma, and G. B. Giannakis, "Space-time coding and Kalman filtering for time-selective fading channels," IEEE Trans. Commun., vol. 50, no. 2, pp. 183–186, Feb. 2002.
[39] X. Ma and G. B. Giannakis, "Maximum-diversity transmissions over doubly selective wireless channels," IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1832–1840, July 2003.
[40] X. Ma, G. B. Giannakis, and S. Ohno, "Optimal training for block transmissions over doubly selective channels," IEEE Trans. Signal Process., vol. 51, no. 5, pp. 1351–1366, May 2003.
[41] X. Ma, L. Yang, and G. B. Giannakis, "Optimal training for MIMO frequency-selective fading channels," IEEE Trans. Wireless Commun., vol. 4, no. 2, pp. 453–466, Mar. 2005.
[42] J. H. Manton, I. Y. Mareels, and Y. Hua, "Affine precoders for reliable communications," in Proc. IEEE ICASSP'00, Istanbul, Turkey, June 5–9, 2000, pp. 2749–2752.
[43] M. Martone, "Wavelet-based separating kernels for array processing of cellular DS/CDMA signals in fast fading," IEEE Trans. Commun., vol. 48, no. 6, pp. 979–995, June 2000.
[44] F.
Mazzenga, ?Channel estimation and equalization for m-QAM transmission with a hidden pilot sequence,? IEEE Trans. Broadcast., vol. 46, no. 2, pp. 170?176, June 2000. [45] D. C. McLernon, A. G. Orozco-Lugo, and M. M. Lara, ?On the structural equivalence of two recent algorithms for implicitly trained channel estimation,? in Proc. IEEE Int. Symposium Signal Processing and Inform. Technol., Rome, Italy, Dec. 18?21 2004, pp. 132?135. [46] X. Meng, ?Estimation of wireless communications channels using superimposed train- ing: Approaches, analysis and applications,? Ph.D. dissertation, Auburn University, Auburn, AL, May 2005. [47] X. Meng and J. K. Tugnait, ?Doubly-selective MIMO channel estimation using super- imposed training,? in Proc. IEEE Sensor Array and Multichannel Signal Processing Workshop, Barcelona, Spain, July 18?21, 2004, pp. 407?411. [48] ??, ?MIMO channel estimation using superimposedtraining,? in Proc. IEEE ICC?04, Paris, France, June 20?24, 2004, pp. 2663?2667. [49] ??, ?Semi-blind channel estimation and detection using superimposed training,? in Proc. IEEE ICASSP?04, vol. 4, Montreal, Canada, May 17?21, 2004, pp. 417?420. 219 [50] ??, ?Semi-blind time-varying channel estimation using superimposed training,? in Proc. IEEE ICASSP?04, vol. 3, Montreal, Canada, May 17?21, 2004, pp. 797?800. [51] ??, ?Superimposed training-based doubly-selective channel estimation using expo- nential and polynomial bases models,? in Proc. 2004 Conf. Inform. Sciences & Syst., Princeton University, NJ, Mar. 17?19, 2004. [52] X. Meng, J. K. Tugnait, and S. He, ?Iterative joint channel estimation and data detec- tion using superimposed training: algorithms and performance analysis,? IEEE Trans. Veh. Technol., to be published. [53] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing. Upper Saddle River: NJ: Prentice Hall, 2000. [54] R. Negi and J. Cioffi, ?Pilot tone selection fro channel estimation in a mobile OFDM system,? IEEE Trans. 
Consum. Electron., vol. 44, no. 3, pp. 1122?1128, Aug. 1998. [55] M. Nied?zwiecki, Identification of Time-Varying Processes. New York, NY: Wiley, 2000. [56] S. Ohno and G. B. Giannakis, ?Optimal training and redundant precoding for block transmissions with application to wireless OFDM,? IEEE Trans. Commun., vol. 50, no. 12, pp. 2113?2123, Dec. 2002. [57] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Englewood Cliffs: NJ: Prentice Hall, 1999. [58] A. G. Orozco-Lugo, G. M. Galvan-Tejada, M. M. Lara, and D. C. McLernon, ?A new approach to achieve multiple packet reception for ad hoc networks,? in Proc. IEEE ICASSP?04, vol. 4, Montreal, Canada, May 17?21, 2004, pp. 429?432. [59] A. G. Orozco-Lugo, M. M. Lara, and D. C. McLernon, ?Channel estimation using implicit training,? IEEE Trans. Signal Process., vol. 52, no. 1, pp. 240?254, Jan. 2004. [60] A. G. Orozco-Lugo, M. M. Lara, D. C. McLernon, and H. J. Muro-Lemus, ?Multi- ple packet reception in wireless ad hoc networks using polynomial phase-modulating sequences,? IEEE Trans. Signal Process., vol. 51, no. 8, pp. 2093?2110, Aug. 2003. [61] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communica- tions. Cambridge, UK: Cambridge University Press, 2003. [62] H. V. Poor and S. Verd?u, ?Probability of error in MMSE multiuser detection,? IEEE Trans. Inf. Theory, vol. 43, no. 3, pp. 858?871, May 1997. [63] M. F. Pop and N. C. Beaulieu, ?Limitations of sum-of-sinusoids fading channel simu- lators,? IEEE Trans. Commun., vol. 49, no. 4, pp. 699?708, Apr. 2001. 220 [64] J. G. Proakis, Digital Communications, 4th ed. New York, NY: McGraw-Hill, 2001. [65] J. G. Proakis and D. G. Manolakis, Digital Signal Processing, 3rd ed. Upper Saddle River, NJ: Prentice Hall, 1996. [66] T. S. Rappaport, Wireless Communications: Principles and Practice, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2002. [67] A. M. Sayeed and B. 
Aazhang, ?Joint multipath-Doppler diversity in mobile wireless communications,? IEEE Trans. Commun., vol. 47, no. 1, pp. 123?132, Jan. 1999. [68] N. Seshadri, ?Joint data and channel estimation using blind trellis search techniques,? IEEE Trans. Commun., vol. 42, no. 2-4, pp. 1000?1011, Feb.?Apr. 1994. [69] D. Slepian, ?Prolate spheroidal wave functions, Fourier analysis, and uncertainty?V: the discrete case,? Bell Syst. Tech. J., vol. 57, no. 5, pp. 1371?1430, May-June 1978. [70] M. D. Srinath, P. K. Rajasekaran, and R. Viswanathan, Introduction to Statistical Signal Processing with Applications. Upper Saddle River, NJ: Prentice Hall, 1996. [71] G. L. St?uber, Principles of Mobile Communication, 2nd ed. Boston, MA: Kluwer, 2002. [72] Z. Tang, R. C. Cannizzaro, G. Leus, and P. Banelli, ?Pilot-assisted time-varying chan- nel estimation for OFDM systems,? IEEE Trans. Signal Process., vol. 55, no. 5, pp. 2226?2238, May 2007. [73] M. Tapio and P. Bohlin, ?A capacity comparison between time-multiplexed and super- imposed pilots,? in Proc. 38th Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 7?10, 2004, pp. 1049?1053. [74] L. Tong, B. M. Sadler, and M. Dong, ?Pilot-assisted wireless transmissions: genearal model, design criteria, and signal processing,? IEEE Signal Process. Mag., vol. 21, no. 6, pp. 12?25, Nov. 2004. [75] L. Tong, Q. Zhao, and G. Mergen, ?Multipacket reception in random access wireless networks: from signal processing to optimal medium access control,? IEEE Commun. Mag., vol. 39, no. 11, pp. 108?112, Nov. 2001. [76] F. Tsuzuki and T. Ohtsuki, ?Channel estimation with selective superimposed pilot sequences under fast fading environments,? in Proc. IEEE VTC?04-Fall, Los Angeles, CA, Sept. 26?29, 2004, pp. 62?66. [77] J. K.Tugnait and S.He, ?Performance analysis of anMIMO channel estimator basedon superimposed training and first-order statistics,? in Proc. IEEE Workshop Statistical Signal Processing, Bordeaux, France, July 17?20, 2005, pp. 
1336?1341. 221 [78] ??, ?Direct FIR linear equalization of doubly selective channels based on superim- posed training,? in Proc. IEEE ICASSP?06, vol. 4, Toulouse, France, May 14?19 2006, pp. 589?592. [79] ??, ?Doubly-selective channel estimation using data-dependent superimposed train- ing and exponential bases models,? in Proc. 2006 Conf. Inform. Sciences & Syst., Princeton University, NJ, Mar. 22?24, 2006, pp. 375?380. [80] J. K. Tugnait, S. He, and X. Meng, ?On superimposed-training power allocation for time-varying channel estimation,? in Proc. IEEE Workshop Statistical Signal Process- ing, Bordeaux, France, July 17?20, 2005, pp. 1330?1335. [81] J. K. Tugnait and W. Luo, ?On channel estimation using superimposed training and first-order statistics,? in Proc. IEEE ICASSP?03, vol. 4, Hong Kong, Apr. 6?10, 2003, pp. 624?627. [82] ??, ?On channel estimation using superimposed training and first-order statistics,? IEEE Commun. Lett., vol. 8, no. 9, pp. 413?415, Sept. 2003. [83] J. K. Tugnait and X. Meng, ?Synchronization of superimposed training for channel estimation,? in Proc. IEEE ICASSP?04, vol. 4, Montreal, Canada, May 17?21, 2004, pp. 853?856. [84] ??, ?Performance analysis and training power allocation for channel estimation using superimposed training,? in Proc. IEEE ICASSP?05, vol. 3, Philadelphia, PA, Mar. 18? 23 2005, pp. 457?460. [85] ??, ?On superimposed training for channel estimation: performance analysis, train- ing power allocation, and frame synchronization,? IEEE Trans. Signal Process., vol. 54, no. 2, pp. 752?765, Feb. 2006. [86] J.K. Tugnait, X. Meng, andS. He, ?Doubly-selective channel estimation usingsuperim- posed training and exponential bases models,? EURASIP J. Applied Signal Processing (Special Issue on Reliable Communications over Rapidly Time-Varying Channels), vol. 2006, Article ID 85303, 11 pages, 2006. [87] J. K. Tugnait, L. Tong, and Z. Ding, ?Single-user channel estimation and equalization,? IEEE Signal Process. Mag., vol. 17, no. 
3, pp. 17?28, May 2000. [88] S. Verd?u, ?Minimum probability of error for asynchronous Gaussian multiple-access channels,? IEEE Trans. Inf. Theory, vol. 32, no. 1, pp. 85?96, Jan. 1986. [89] A. Vosoughi and A. Scaglione, ?Everything you always wanted to know about training: guidelines derived using the affine precoding framework and the CRB,? IEEE Trans. Signal Process., vol. 54, no. 3, pp. 940?954, Mar. 2006. 222 [90] H. S. Wang and P.-C. Chang, ?On verifying the first-order Markovian assumption for a Rayleigh fading channel model,? IEEE Trans. Veh. Technol., vol. 45, no. 2, pp. 353?357, May 1996. [91] X. Wang and H. V. Poor, Wireless Communication Systems: Advanced Techiques for Signal Reception. Upper Saddle River, NJ: Prentice Hall, 2004. [92] J. J. Werner, J. Yang, D. D. Harman, and G. A. Dumont, ?Blind equalization for broadband access,? IEEE Commun. Mag., vol. 37, no. 4, pp. 87?93, Apr. 1999. [93] Q. Yang and K. S. Kwak, ?Superimposed-pilot-aided channel estimation for mobile OFDM,? Electron. Lett., vol. 42, no. 12, pp. 73?74, June 2006. [94] T. Zemen and C. F. Mecklenbr?auker, ?Time-variant channel estimation using discrete prolate spheroidal sequences,? IEEE Trans. Signal Process., vol. 53, no. 9, pp. 3597? 3607, Sept. 2005. [95] Q. Zhao and L. Tong, ?Semi-blind equalization by least squares smoothing,? in Proc. 32nd Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 1?4, 1998, pp. 645?649. [96] Y. R. Zheng and C. Xiao, ?Simulation models with correct statistical properties for Rayleigh fading channels,? IEEE Trans. Commun., vol. 51, no. 6, pp. 920?928, June 2003. [97] G. T. Zhou and N. Chen, ?Superimposed training for doubly selective channels,? in Proc. IEEE Workshop Statistical Signal Processing, St. Louis, MO, Sept. 28?Oct. 1, 2003, pp. 82?85. [98] G. T. Zhou, M. Viberg, and T. McKelvey, ?A first-order statistical method for channel estimation,? IEEE Signal Process. Lett., vol. 10, no. 3, pp. 57?60, Mar. 2003. 
Appendices

Appendix A
Optimal Time-Multiplexed Training for Block Transmissions over Doubly-Selective Channels [40]

Here we summarize the optimal TM training proposed in [40], based on CE-BEM representations, which acts as a "reference" training scheme in evaluating our superimposed training-based approaches. In [40], the authors made the following model assumptions:

(HA.1) The channel satisfies the CE-BEM, i.e., (2.9).
(HA.2) The delay spread $\tau_d$ and the Doppler spread $f_d$ are bounded, known (or at least their bounds are known), and satisfy $2 f_d \tau_d < 1$.
(HA.3) The coefficients $\{h_q(l)\}$ are zero-mean complex Gaussian random variables, independent of one another, and remain invariant per block but are allowed to change at the next block.

Under the above assumptions, the authors sought to design a TM training scheme that optimizes channel MSE and ergodic (average) capacity bounds, so as to jointly account not only for channel estimation performance but also for transmission rate. The optimal block structure $\mathbf{s}$ consists of sub-blocks of training and sub-blocks of information, which are transmitted alternately:

\[
\mathbf{s} = \begin{bmatrix} \mathbf{b}_0^T & \mathbf{c}_0^T & \cdots & \mathbf{b}_{P-1}^T & \mathbf{c}_{P-1}^T \end{bmatrix}^T
\]

where $\mathbf{b}_p$ and $\mathbf{c}_p$ denote the $p$-th information and training sub-blocks respectively ($p = 0, 1, \ldots, P-1$). The optimal training sequence contains an impulse guarded by zeros (silent period).

Table A.1: Optimal TM training.
Parameters | Optimal Training
Placement of information symbols | Equally long information sub-blocks (length $\bar{T}_b$)
Placement of training symbols | Equally long training sub-blocks
Structure of training sub-blocks | $\mathbf{c}_p = \begin{bmatrix} \mathbf{0}_L^T & c & \mathbf{0}_L^T \end{bmatrix}^T$, $\forall p$
Number of training symbols | $2L+1$ per sub-block
Number of sub-blocks | $Q$ training and $Q$ information sub-blocks
Power allocation | $\alpha = 1/\bigl(1 + \sqrt{(L+1)/\bar{T}_b}\bigr)$

The details are shown in Table A.1, where $\bar{T}_b$ denotes the length of the information sub-block, $c$ is a scalar denoting the training impulse, and $\alpha$ denotes the training-to-information power ratio (TIR). We also apply this TM training structure to the OP- and DPS-BEMs. It is not known whether it is optimal for these BEMs, but it can still act as a good reference when studying our superimposed training schemes.

For a channel satisfying a BEM (2.20), the noisy channel output is given by

\[
y(n) = \sum_{l=0}^{L} \sum_{q=1}^{Q} h_q(l)\, \psi_q(n)\, s(n-l) + v(n),
\]

where $v(n)$ is additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_v^2$. At the time slots

\[
n_{p,l} := \bar{T}_b + L + p\left(\bar{T}_b + 2L + 1\right) + l \quad (l = 0, 1, \ldots, L),
\]

the received signal depends only on the channel and the training impulse:

\[
y(n_{p,l}) = c \sum_{q=1}^{Q} h_q(l)\, \psi_q(n_{p,l}) + v(n_{p,l}). \tag{A.1}
\]

We define

\[
\mathbf{y}_c(l) = \begin{bmatrix} y(n_{0,l}) & y(n_{1,l}) & \cdots & y(n_{P-1,l}) \end{bmatrix}^T, \tag{A.2a}
\]
\[
\mathbf{v}_c(l) = \begin{bmatrix} v(n_{0,l}) & v(n_{1,l}) & \cdots & v(n_{P-1,l}) \end{bmatrix}^T, \tag{A.2b}
\]
\[
\boldsymbol{\Psi}_c(l) = \begin{bmatrix}
\psi_1(n_{0,l}) & \cdots & \psi_Q(n_{0,l}) \\
\psi_1(n_{1,l}) & \cdots & \psi_Q(n_{1,l}) \\
\vdots & \ddots & \vdots \\
\psi_1(n_{P-1,l}) & \cdots & \psi_Q(n_{P-1,l})
\end{bmatrix}, \tag{A.2c}
\]
\[
\mathbf{h}(l) = \begin{bmatrix} h_1(l) & h_2(l) & \cdots & h_Q(l) \end{bmatrix}^T;
\]

then by (A.1)

\[
\mathbf{y}_c(l) = c\, \boldsymbol{\Psi}_c(l)\, \mathbf{h}(l) + \mathbf{v}_c(l).
\]

The LS estimator of $\mathbf{h}(l)$ is given by

\[
\hat{\mathbf{h}}_{\mathrm{LS}}(l) = \frac{1}{c}\, \boldsymbol{\Psi}_c^{\dagger}(l)\, \mathbf{y}_c(l), \tag{A.3}
\]

and the linear MMSE estimator is given by

\[
\hat{\mathbf{h}}_{\mathrm{MMSE}}(l) = \frac{c}{\sigma_v^2} \left[ \mathbf{R}_h^{-1}(l) + \frac{c^2}{\sigma_v^2}\, \boldsymbol{\Psi}_c^H(l)\, \boldsymbol{\Psi}_c(l) \right]^{-1} \boldsymbol{\Psi}_c^H(l)\, \mathbf{y}_c(l), \tag{A.4}
\]

which requires that $\mathbf{R}_h(l) := E\{\mathbf{h}(l)\mathbf{h}^H(l)\}$ be known at the receiver.

This optimal TM training can be easily extended to an SIMO channel, since the above procedure can be carried out independently for each output. For a multiple-user time-invariant channel, an extension of this scheme was suggested in [41] for MIMO frequency-selective fading channels. We extend this scheme to a doubly-selective MIMO channel in a way similar to [40], although we do not know whether the extension is optimal.
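As a rough numerical illustration of (A.3) and (A.4), the following Python sketch builds the basis matrix of (A.2c) for a single tap $l$ and recovers the BEM coefficients from noise-free training observations. It is only a sketch under stated assumptions: the CE-BEM frequencies $2\pi(q - (Q+1)/2)/T$, the block dimensions, the identity prior covariance, the real-valued impulse $c$, and all function names are chosen for illustration and are not taken from [40].

```python
import cmath

def solve(A, b):
    """Solve the square complex system A x = b (Gauss-Jordan, partial pivoting)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gram_and_matched(B, y):
    """Return B^H B and B^H y for a P x Q basis matrix B (list of rows)."""
    P, Q = len(B), len(B[0])
    G = [[sum(B[p][i].conjugate() * B[p][j] for p in range(P)) for j in range(Q)]
         for i in range(Q)]
    z = [sum(B[p][i].conjugate() * y[p] for p in range(P)) for i in range(Q)]
    return G, z

def ls_estimate(B, y, c):
    """LS estimate as in (A.3); the pseudo-inverse is applied via the normal equations."""
    G, z = gram_and_matched(B, y)
    return [x / c for x in solve(G, z)]

def mmse_estimate(B, y, c, sigma2, Rh_inv):
    """Linear MMSE estimate as in (A.4), given the inverse prior covariance Rh_inv."""
    G, z = gram_and_matched(B, y)
    Q = len(G)
    A = [[Rh_inv[i][j] + (c * c / sigma2) * G[i][j] for j in range(Q)] for i in range(Q)]
    return [(c / sigma2) * x for x in solve(A, z)]

# Hypothetical dimensions: Q basis functions, channel order L, information
# sub-block length Tb, P sub-blocks, real training impulse c.
Q, L, Tb, P, c = 3, 1, 6, 8, 1.0
period = Tb + 2 * L + 1            # one information plus one training sub-block
T = P * period                     # total block length

def basis(q, n):
    """Assumed CE-BEM basis function exp(j*2*pi*(q - (Q+1)/2)*n/T), q = 1..Q."""
    return cmath.exp(2j * cmath.pi * (q - (Q + 1) / 2) * n / T)

l = 0                                               # tap under estimation
slots = [Tb + L + p * period + l for p in range(P)] # time slots n_{p,l}
B = [[basis(q, n) for q in range(1, Q + 1)] for n in slots]
h_true = [0.8 + 0.1j, -0.3 + 0.5j, 0.2 - 0.4j]
y = [c * sum(B[p][q] * h_true[q] for q in range(Q)) for p in range(P)]  # noise-free (A.1)

h_ls = ls_estimate(B, y, c)
identity = [[1.0 if i == j else 0.0 for j in range(Q)] for i in range(Q)]
h_mmse = mmse_estimate(B, y, c, sigma2=1e-6, Rh_inv=identity)
ls_error = max(abs(a - b) for a, b in zip(h_ls, h_true))
mmse_error = max(abs(a - b) for a, b in zip(h_mmse, h_true))
print(ls_error, mmse_error)
```

With noise-free observations and at least as many sub-blocks as basis functions, the LS estimate should match the true coefficients up to rounding, and the MMSE estimate approaches it as the assumed noise variance shrinks.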
For a multiple-user channel with $K$ users,

\[
y(n) = \sum_{k=1}^{K} \sum_{l=0}^{L} h_k(n;l)\, s_k(n-l) + v(n),
\]

where $\{s_k(n)\}$ denotes the $k$-th user's information sequence and the corresponding channel is denoted by $\{h_k(n;l)\}$. We assume that the channels satisfy a BEM representation, i.e.,

\[
h_k(n;l) = \sum_{q=1}^{Q} h_{qk}(l)\, \psi_q(n).
\]

The channel output can be expressed as

\[
y(n) = \sum_{k=1}^{K} \sum_{l=0}^{L} \sum_{q=1}^{Q} h_{qk}(l)\, \psi_q(n)\, s_k(n-l) + v(n).
\]

We design the training sub-block of the $k$-th user ($k = 1, 2, \ldots, K$) as

\[
\mathbf{c}_{k,p} = \begin{bmatrix} \mathbf{0}_{(k-1)(L+1)+L}^T & c & \mathbf{0}_{(K-k)(L+1)+L}^T \end{bmatrix}^T
\]

with a length of $K(L+1) + L$ symbols. At the time slots

\[
n_{k,p,l} := \bar{T}_b + (k-1)(L+1) + L + p\left[\bar{T}_b + K(L+1) + L\right] + l,
\]

the received signal depends only on the training impulse and the $k$-th user's channel. The channels for different users can thus be decoupled. We have

\[
\mathbf{y}_c(k,l) = c\, \boldsymbol{\Psi}_c(k,l)\, \mathbf{h}_k(l) + \mathbf{v}_c(k,l),
\]

where $\mathbf{y}_c(k,l)$, $\boldsymbol{\Psi}_c(k,l)$, and $\mathbf{v}_c(k,l)$ are defined as in (A.2), only with $n_{p,l}$ replaced by $n_{k,p,l}$, and

\[
\mathbf{h}_k(l) = \begin{bmatrix} h_{1k}(l) & h_{2k}(l) & \cdots & h_{Qk}(l) \end{bmatrix}^T.
\]

Then, similar to (A.3) and (A.4), we have the LS estimator

\[
\hat{\mathbf{h}}_{k,\mathrm{LS}}(l) = \frac{1}{c}\, \boldsymbol{\Psi}_c^{\dagger}(k,l)\, \mathbf{y}_c(k,l), \tag{A.5}
\]

and the linear MMSE estimator

\[
\hat{\mathbf{h}}_{k,\mathrm{MMSE}}(l) = \frac{c}{\sigma_v^2} \left[ \mathbf{R}_h^{-1}(k,l) + \frac{c^2}{\sigma_v^2}\, \boldsymbol{\Psi}_c^H(k,l)\, \boldsymbol{\Psi}_c(k,l) \right]^{-1} \boldsymbol{\Psi}_c^H(k,l)\, \mathbf{y}_c(k,l) \tag{A.6}
\]

if $\mathbf{R}_h(k,l) := E\{\mathbf{h}_k(l)\mathbf{h}_k^H(l)\}$ is known at the receiver.

Appendix B
Symbol Detection

The role of channel estimation is to aid in extracting the desired information data from the distorted received symbols. Two symbol detection techniques are reviewed: the Viterbi detector and the Kalman filter.

B.1 Maximum Likelihood Sequence Detector (Viterbi Detector) [64]

Consider an SIMO FIR linear channel with $N$ outputs and discrete-time impulse response $\{\mathbf{h}(n;l)\}$. Let $\{s(n)\}$ denote the input sequence to the SIMO channel.
The channel output is given by

\[
\mathbf{x}(n) = \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l), \tag{B.1}
\]

and the noisy measurement is given by

\[
\mathbf{y}(n) = \mathbf{x}(n) + \mathbf{v}(n) \tag{B.2}
\]

where $\mathbf{v}(n)$ is the AWGN. We assume:

(HB.1) The noise $\{\mathbf{v}(n)\}$ is uncorrelated with $\{s(n)\}$, with a possibly unknown mean $E\{\mathbf{v}(n)\} = \mathbf{m}$ and $E\{[\mathbf{v}(n+\tau) - \mathbf{m}][\mathbf{v}(n) - \mathbf{m}]^H\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$.

Given $\{s(n)\}$, $\{\mathbf{y}(n)\}$ is a sequence of $N$-dimensional Gaussian random vectors with mean $\sum_{l=0}^{L} \mathbf{h}(n;l) s(n-l) + \mathbf{m}$ and covariance $\sigma_v^2 \mathbf{I}_N$. The pdf of $\mathbf{y}(n)$ given $\{s(n), s(n-1), \ldots, s(n-L)\}$ is

\[
p(\mathbf{y}(n) \mid s(n), \ldots, s(n-L)) = \frac{1}{(\pi \sigma_v^2)^N} \exp\left\{ -\frac{1}{\sigma_v^2} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2 \right\}
\]

where $s(n) = 0$ for $n < 0$. The joint pdf of the random vectors $\mathbf{y}(0), \mathbf{y}(1), \ldots, \mathbf{y}(T-1)$ given the transmitted sequence $s(0), s(1), \ldots, s(T-1)$ is

\[
p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1)) = \frac{1}{(\pi \sigma_v^2)^{NT}} \exp\left\{ -\frac{1}{\sigma_v^2} \sum_{n=0}^{T-1} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2 \right\}.
\]

Taking the logarithm on both sides of the equation above, we have

\[
\log p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1)) = -NT \log(\pi \sigma_v^2) - \frac{1}{\sigma_v^2} \sum_{n=0}^{T-1} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2.
\]

The maximum likelihood (ML) estimate of the input sequence $\{s(0), \ldots, s(T-1)\}$ is the one that maximizes $p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1))$, or equivalently maximizes $\log p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1))$, or minimizes the Euclidean distance

\[
\sum_{n=0}^{T-1} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2.
\]
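For very short blocks, the Euclidean-distance criterion can be checked directly by exhaustive search over all candidate sequences. The Python sketch below does exactly that; it is an illustration only, assuming a single receive antenna ($N = 1$), BPSK symbols, a hypothetical two-tap time-varying channel, zero noise mean ($\mathbf{m} = 0$), and a small deterministic perturbation standing in for noise. The function names are illustrative.

```python
import cmath
from itertools import product

def ml_metric(y, h, s, m=0.0):
    """Sum over n of |y(n) - sum_l h(n;l) s(n-l) - m|^2, with s(n) = 0 for n < 0."""
    total = 0.0
    for n in range(len(y)):
        x = sum(h[n][l] * s[n - l] for l in range(len(h[n])) if n - l >= 0)
        total += abs(y[n] - x - m) ** 2
    return total

def ml_detect(y, h, alphabet, m=0.0):
    """Exhaustive ML sequence detection: evaluates all len(alphabet)**T candidates."""
    T = len(y)
    return min(product(alphabet, repeat=T), key=lambda s: ml_metric(y, h, s, m))

# Hypothetical example: T = 6 BPSK symbols, channel order L = 1,
# with a slowly rotating second tap h(n;1).
T = 6
h = [[1.0, 0.5 * cmath.exp(0.1j * n)] for n in range(T)]
s_true = (1, -1, -1, 1, 1, -1)

# Received samples: channel output plus a small fixed perturbation.
y = [sum(h[n][l] * s_true[n - l] for l in range(2) if n - l >= 0) + 0.05 * (-1) ** n
     for n in range(T)]

s_hat = ml_detect(y, h, alphabet=(-1, 1))
print(s_hat == s_true)
```

The search cost grows as $M^T$ for an $M$-ary alphabet, so this brute-force form is practical only as a sanity check on tiny examples.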
This ML sequence estimation (MLSE) criterion is equivalent to the problem of estimating the state of a discrete-time "finite-state machine". In this case, the finite-state machine is the discrete-time channel with coefficients $\{\mathbf{h}(n;l)\}$, and its state at any time instant $n$ is represented by the $L$ most recent input symbols

\[
\mathrm{state}(n) = (s(n), s(n-1), \ldots, s(n-L+1)) \tag{B.3}
\]

where $s(n) = 0$ for $n < 0$. If the input symbols are $M$-ary, the finite-state machine has $M^L$ states. Consequently, the channel is described by an $M^L$-state trellis, and the Viterbi algorithm may be used to determine the most probable path through the trellis. In brief, the Viterbi algorithm consists of the following three steps:

Step 1. We begin with $\mathbf{y}(L)$, from which we compute the $M^{L+1}$ metrics

\[
\sum_{n=0}^{L} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2. \tag{B.4}
\]

The $M^{L+1}$ possible sequences are divided into $M^L$ groups according to the $M^L$ states. From each group, we pick the one with the minimum metric, i.e., the most probable sequence, and assign to the surviving sequence the metric

\[
\mathrm{PM}_0(s(L), \ldots, s(1)) = \min_{s(0)} \left\{ \sum_{n=0}^{L} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2 \right\}. \tag{B.5}
\]

The $M-1$ remaining sequences from each of the $M^L$ groups are discarded.

Step 2. Upon reception of $\mathbf{y}(L+n)$, $n \geq 1$, compute the $M^{L+1}$ metrics

\[
\left\| \mathbf{y}(L+n) - \sum_{l=0}^{L} \mathbf{h}(L+n;l)\, s(L+n-l) - \mathbf{m} \right\|^2 + \mathrm{PM}_{n-1}(s(L+n-1), \ldots, s(n)). \tag{B.6}
\]

Again, the $M^{L+1}$ sequences are divided into $M^L$ groups corresponding to the $M^L$ possible states $(s(L+n-1), s(L+n-2), \ldots, s(n))$, and the most probable sequence from each group is selected while the other $M-1$ sequences are discarded. The surviving metrics are

\[
\mathrm{PM}_n(s(L+n), \ldots, s(n+1)) = \min_{s(n)} \left\{ \left\| \mathbf{y}(L+n) - \sum_{l=0}^{L} \mathbf{h}(L+n;l)\, s(L+n-l) - \mathbf{m} \right\|^2 + \mathrm{PM}_{n-1}(s(L+n-1), \ldots, s(n)) \right\}. \tag{B.7}
\]

Step 3. If $\mathbf{y}(L+n)$ is the last received sample, pick from the $M^L$ survivor sequences the one with the minimum metric as the ML sequence estimate; otherwise, set $n = n+1$ and go to Step 2.

In a multiple-user context with $K$ users, the noisy channel input-output relation is given by

\[
\mathbf{y}(n) = \sum_{k=1}^{K} \sum_{l=0}^{L} \mathbf{h}_k(n;l)\, s_k(n-l) + \mathbf{v}(n), \tag{B.8}
\]

where $\{s_k(n)\}$ denotes the $k$-th user's information sequence and $\{\mathbf{h}_k(n;l)\}$ denotes the $k$-th user's channel impulse response. The state at time $n$ is now represented by the $L$ most recent input symbols of all the $K$ users, i.e.,

\[
\mathrm{state}(n) = (s_1(n), \ldots, s_1(n-L+1), \ldots, s_2(n-L+1), \ldots, s_K(n-L+1)) \tag{B.9}
\]

where $s_k(n) = 0$ for $n < 0$. The channel is now described by an $M^{KL}$-state trellis. To adapt the Viterbi algorithm to multiple users, we simply replace the state (B.3) with (B.9), use $M' = M^K$ instead of $M$ in all three steps, and substitute the sum $\sum_{k=1}^{K} \sum_{l=0}^{L} \mathbf{h}_k(n;l) s_k(n-l)$ for $\sum_{l=0}^{L} \mathbf{h}(n;l) s(n-l)$ in (B.4)–(B.7).

B.2 Kalman Filtering

The Viterbi detector (or MLSD) is the optimal receiver that provides the minimum BER. Its computational complexity, however, depends on the number of states. By (B.9), the Viterbi detector has $M^{KL}$ states, given $M$-ary input symbols, $K$ users, and an $(L+1)$-tap MIMO channel. The computational complexity therefore grows exponentially with the channel length and the number of users, and polynomially with the constellation size of the transmitted signal.
The Viterbi detector may therefore be extremely expensive to implement [64]. The Kalman filter, based on the MMSE criterion, offers an alternative symbol detection technique with much lower computational complexity than that of the optimal detector, at the expense of a slight loss in error performance.

For the SIMO FIR linear channel with $N$ receivers described by (B.1) and (B.2), we define a $(d+1)$-dimensional column vector (with delay $d \geq L$) as the state vector of the Kalman filter:

\[
\mathbf{S}(n) := \begin{bmatrix} s(n) & s(n-1) & \cdots & s(n-d) \end{bmatrix}^T.
\]

We also define the state transition matrix

\[
\boldsymbol{\Phi} := \begin{bmatrix} \mathbf{0}_{1 \times d} & 0 \\ \mathbf{I}_d & \mathbf{0}_{d \times 1} \end{bmatrix},
\]

the control-input matrix

\[
\boldsymbol{\Gamma} := \begin{bmatrix} 1 & \mathbf{0}_{1 \times d} \end{bmatrix}^T,
\]

the observation matrix

\[
\mathbf{H}(n) := \begin{bmatrix} \mathbf{h}(n;0) & \mathbf{h}(n;1) & \cdots & \mathbf{h}(n;L) & \mathbf{0}_{N \times (d-L)} \end{bmatrix},
\]

and the input $w(n) := s(n+1)$. Then we have the time-invariant state equation:

\[
\mathbf{S}(n+1) = \boldsymbol{\Phi}\, \mathbf{S}(n) + \boldsymbol{\Gamma}\, w(n). \tag{B.10}
\]

By (B.1) and (B.2), the time-varying observation equation is given by

\[
\mathbf{y}(n) = \mathbf{H}(n)\, \mathbf{S}(n) + \mathbf{v}(n). \tag{B.11}
\]

We assume the AWGN $\mathbf{v}(n)$ is zero-mean. The prior statistics of the above parameters are given by

\[
\begin{aligned}
& E\{w(n)\} = 0, \quad E\{\mathbf{v}(n)\} = \mathbf{0}, \quad E\{\mathbf{S}(0)\} = \boldsymbol{\mu}_s(0), \\
& E\{w(m) w^*(n)\} = V_w(n)\, \delta(m-n), \quad E\{\mathbf{v}(m) \mathbf{v}^H(n)\} = \mathbf{V}_v(n)\, \delta(m-n), \\
& E\{w(m) \mathbf{v}^H(n)\} = \mathbf{0}, \quad E\{w(m) \mathbf{S}^H(n)\} = \mathbf{0}, \quad E\{\mathbf{v}(m) \mathbf{S}^H(n)\} = \mathbf{0}, \\
& E\{[\mathbf{S}(0) - E\{\mathbf{S}(0)\}][\mathbf{S}(0) - E\{\mathbf{S}(0)\}]^H\} = \mathbf{V}_s(0).
\end{aligned}
\]

Given the state equation (B.10), the observation equation (B.11), and the above prior statistics, the Kalman filter algorithm is as follows [70]:

Initialization: For the time $n = 0$, $\hat{\mathbf{S}}(1 \mid 0) = \boldsymbol{\mu}_s(0)$ and $\mathbf{V}_{\tilde{s}}(1 \mid 0) = \mathbf{V}_s(0)$.

Filtering: For $n = 1, 2, \ldots$

\[
\begin{aligned}
\mathbf{V}_{\nu}(n) &= \mathbf{H}(n)\, \mathbf{V}_{\tilde{s}}(n \mid n-1)\, \mathbf{H}^H(n) + \mathbf{V}_v(n); \\
\mathbf{K}(n) &= \mathbf{V}_{\tilde{s}}(n \mid n-1)\, \mathbf{H}^H(n)\, \mathbf{V}_{\nu}^{-1}(n); \\
\boldsymbol{\nu}(n) &= \mathbf{y}(n) - \mathbf{H}(n)\, \hat{\mathbf{S}}(n \mid n-1); \\
\hat{\mathbf{S}}(n \mid n) &= \hat{\mathbf{S}}(n \mid n-1) + \mathbf{K}(n)\, \boldsymbol{\nu}(n); \\
\hat{\mathbf{S}}(n+1 \mid n) &= \boldsymbol{\Phi}\, \hat{\mathbf{S}}(n \mid n); \\
\mathbf{V}_{\tilde{s}}(n \mid n) &= [\mathbf{I} - \mathbf{K}(n)\mathbf{H}(n)]\, \mathbf{V}_{\tilde{s}}(n \mid n-1); \\
\mathbf{V}_{\tilde{s}}(n+1 \mid n) &= \boldsymbol{\Phi}\, \mathbf{V}_{\tilde{s}}(n \mid n)\, \boldsymbol{\Phi}^H + \boldsymbol{\Gamma}\, V_w(n)\, \boldsymbol{\Gamma}^H.
\end{aligned}
\]

Since

\[
\hat{\mathbf{S}}(n \mid n) = \begin{bmatrix} \hat{s}(n \mid n) & \hat{s}(n-1 \mid n) & \cdots & \hat{s}(n-d \mid n) \end{bmatrix}^T,
\]

we extract its last entry $\hat{s}(n-d \mid n)$ as the desired equalized output. We then hard-quantize $\hat{s}(n-d \mid n)$ to obtain the detected symbol $\hat{s}(n-d)$.

For a multiple-user (MIMO) channel of (B.8) with a total of $K$ users, the state vector for the $k$-th user is

\[
\mathbf{S}_k(n) = \begin{bmatrix} s_k(n) & s_k(n-1) & \cdots & s_k(n-d) \end{bmatrix}^T
\]

where we also have $d \geq L$. The augmented $K(d+1)$-dimensional state vector is then given by

\[
\mathbf{S}(n) = \begin{bmatrix} \mathbf{S}_1^T(n) & \mathbf{S}_2^T(n) & \cdots & \mathbf{S}_K^T(n) \end{bmatrix}^T.
\]

In order to apply the Kalman filter, we revise the state transition matrix as

\[
\boldsymbol{\Phi} := \mathbf{I}_K \otimes \begin{bmatrix} \mathbf{0}_{1 \times d} & 0 \\ \mathbf{I}_d & \mathbf{0}_{d \times 1} \end{bmatrix},
\]

the control-input matrix as

\[
\boldsymbol{\Gamma} := \mathbf{I}_K \otimes \begin{bmatrix} 1 & \mathbf{0}_{1 \times d} \end{bmatrix}^T,
\]

the observation matrix as

\[
\mathbf{H}_k(n) := \begin{bmatrix} \mathbf{h}_k(n;0) & \mathbf{h}_k(n;1) & \cdots & \mathbf{h}_k(n;L) & \mathbf{0}_{N \times (d-L)} \end{bmatrix}
\quad \text{and} \quad
\mathbf{H}(n) := \begin{bmatrix} \mathbf{H}_1(n) & \mathbf{H}_2(n) & \cdots & \mathbf{H}_K(n) \end{bmatrix},
\]

and the input as

\[
\mathbf{w}(n) := \begin{bmatrix} s_1(n+1) & s_2(n+1) & \cdots & s_K(n+1) \end{bmatrix}^T.
\]

Then the state equation (B.10) and the observation equation (B.11) still hold. Applying Kalman filtering, we obtain

\[
\hat{\mathbf{S}}(n \mid n) = \begin{bmatrix} \hat{\mathbf{S}}_1^T(n \mid n) & \hat{\mathbf{S}}_2^T(n \mid n) & \cdots & \hat{\mathbf{S}}_K^T(n \mid n) \end{bmatrix}^T,
\]

where $\hat{\mathbf{S}}_k(n \mid n) = \begin{bmatrix} \hat{s}_k(n \mid n) & \hat{s}_k(n-1 \mid n) & \cdots & \hat{s}_k(n-d \mid n) \end{bmatrix}^T$ for $k = 1, 2, \ldots, K$. Finally, we take $\hat{s}_k(n-d \mid n)$ as the desired equalized output for the $k$-th user and hard-quantize it to obtain the detected symbol.

Appendix C
Mathematical Notations

$\approx$ : approximately equal to
$\otimes$ : Kronecker product
$\mathbf{0}_{M \times N}$ : $M \times N$ all-zeros matrix
$a$ : lower case letters for scalars
$\lceil a \rceil$ : integer ceiling of $a$
$\lfloor a \rfloor$ : integer floor of $a$
$|a|$ : magnitude of $a$
$\mathbf{a}$ : lower case letters in bold face for column vectors
$\|\mathbf{a}\|$ : Euclidean norm of $\mathbf{a}$
$\mathbf{A}$ : upper case letters in bold face for matrices
$\mathbf{A}^*$ : complex conjugate of $\mathbf{A}$
$\mathbf{A}^{\dagger}$ : Moore-Penrose pseudo-inverse of $\mathbf{A}$
$\mathbf{A}^H$ : complex conjugate transpose of $\mathbf{A}$
$\mathbf{A}^T$ : transpose of $\mathbf{A}$
$[\mathbf{A}]_{n,m}$ : $(n,m)$-th entry of $\mathbf{A}$
$\mathcal{A}$ : upper case calligraphic letters for matrices
$\arg\max_x f(x)$ : value of $x$ for which $f(x)$ attains its maximum
$\arg\min_x f(x)$ : value of $x$ for which $f(x)$ attains its minimum
$\mathrm{cov}\{\cdot\}$ : covariance operator
$\delta(\cdot)$ : Kronecker delta function, defined as $\delta(n) = 1$ if $n = 0$ and $\delta(n) = 0$ if $n \neq 0$, $n \in \mathbb{Z}$
$\mathrm{diag}\{a_1, \ldots, a_N\}$ : $N \times N$ diagonal matrix with $[\mathrm{diag}\{a_1, \ldots, a_N\}]_{n,n} = a_n$
$E\{\cdot\}$ : expectation operator
$E_H\{\cdot\}$ : expectation operator with respect to $H$
$\mathbf{I}_N$ : $N \times N$ identity matrix
$\max(\cdot)$ : maximum value operator
$\min(\cdot)$ : minimum value operator
$O(\cdot)$ : big O notation: $f(x) = O(g(x))$ as $x \to a$ ($a \in \mathbb{R} \cup \{\infty\}$), iff $|f(x)| \leq M |g(x)|$ as $x \to a$ for some constant $M > 0$
$\mathbb{R}$ : real field
$\mathrm{tr}\{\mathbf{A}\}$ : trace of a square matrix $\mathbf{A}$
$\mathbb{Z}$ : set of integers

Appendix D
Abbreviations

AM : amplitude modulation
AR : auto-regressive
AWGN : additive white Gaussian noise
BEM : basis expansion model
BER : bit error rate
BPSK : binary phase-shift keying
CE-BEM : complex exponential basis expansion model
CRLB : Cramér-Rao lower bound
CSI : channel state information
DC : direct current
DFT : discrete Fourier transform
DKL-BEM : discrete Karhunen-Loève basis expansion model
DML : deterministic maximum likelihood
DPS : discrete prolate spheroidal
DPS-BEM : discrete prolate spheroidal basis expansion model
FIR : finite impulse response
FM : frequency modulation
ISI : inter-symbol interference
LMS : least mean squares
LS : least squares
MAC : media access control
MANET : mobile ad hoc network
MIMO : multiple-input multiple-output
ML : maximum likelihood
MLSE : maximum likelihood sequence estimation
MSE : mean square error
MMSE : minimum mean square error
MPR : multiple packet reception
m.s. : mean-square
MUI : multiple-user interference
NCMSE : normalized channel mean square error
OFDM : orthogonal frequency division multiplexing
OP-BEM : orthogonal polynomial basis expansion model
PDD : partially-data-dependent
pdf : probability density function
PN : pseudo-noise
QAM : quadrature amplitude modulation
SIMO : single-input multiple-output
SISO : single-input single-output
SNR : signal-to-noise ratio
TIR : training-to-information power ratio
TM : time-multiplexed