Doubly-Selective Channel Estimation and Equalization Using Superimposed
Training and Basis Expansion Models
Except where reference is made to the work of others, the work described in this
dissertation is my own or was done in collaboration with my advisory committee. This
dissertation does not include proprietary or classified information.
Shuangchi He
Certificate of Approval:
Stanley J. Reeves
Professor
Electrical and Computer Engineering
Jitendra K. Tugnait, Chair
Professor
Electrical and Computer Engineering
Soo-Young Lee
Professor
Electrical and Computer Engineering
Joe F. Pittman
Interim Dean
Graduate School
Doubly-Selective Channel Estimation and Equalization Using Superimposed
Training and Basis Expansion Models
Shuangchi He
A Dissertation
Submitted to
the Graduate Faculty of
Auburn University
in Partial Fulfillment of the
Requirements for the
Degree of
Doctor of Philosophy
Auburn, Alabama
August 4, 2007
Doubly-Selective Channel Estimation and Equalization Using Superimposed
Training and Basis Expansion Models
Shuangchi He
Permission is granted to Auburn University to make copies of this dissertation at its
discretion, upon the request of individuals or institutions and at
their expense. The author reserves all publication rights.
Signature of Author
Date of Graduation
Dissertation Abstract
Doubly-Selective Channel Estimation and Equalization Using Superimposed
Training and Basis Expansion Models
Shuangchi He
Doctor of Philosophy, August 4, 2007
(M.S., Tsinghua University, 2003)
(B.E., Tsinghua University, 2000)
261 Typed Pages
Directed by Jitendra K. Tugnait
Owing to multipath propagation and Doppler spread, typical wireless channels are
both frequency- and time-selective (doubly-selective). In this dissertation, we concentrate
on channel estimation and equalization over doubly-selective channels, by exploiting both
superimposed training and basis expansion models (BEM).
In contrast to conventional time-multiplexed (TM) training schemes, superimposed
training schemes arithmetically add a periodic training sequence, at low power, to the
information sequence at the transmitter. There is no loss in data transmission rate,
but some useful power has to be allocated to the superimposed training. We also employ
various BEMs to describe the temporal variations of the doubly-selective channel, so that
the estimation of a time-varying process reduces to estimating a smaller number of
time-invariant BEM coefficients.
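As a minimal sketch of these two ideas (not the implementation used in this dissertation), the following NumPy code forms a superimposed-training transmit block and then fits a complex exponential BEM to one time-varying channel tap by least squares. The block length, training period, power split, Chu-type training sequence, toy tap trajectory, and the particular indexing of the exponentials are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: block length T, training period P, training power share.
T, P = 400, 8
train_power = 0.2          # fraction of the unit transmit power given to training

# BPSK information symbols plus a periodic (Chu-type) training sequence, scaled
# so that data power + training power = 1.
b = np.sqrt(1 - train_power) * rng.choice([-1.0, 1.0], size=T)
c_period = np.sqrt(train_power) * np.exp(1j * np.pi * np.arange(P) ** 2 / P)
c = np.tile(c_period, T // P)
s = b + c                  # superimposed training: no time slots are reserved

def ce_bem_basis(T, Q):
    """T x Q matrix with entries exp(j*2*pi*(q - (Q-1)/2)*n/T), q = 0..Q-1."""
    n = np.arange(T)[:, None]
    q = np.arange(Q)[None, :] - (Q - 1) / 2
    return np.exp(2j * np.pi * q * n / T)

# Toy time-varying tap: two slow complex sinusoids (a hypothetical channel).
n = np.arange(T)
h = 0.8 * np.exp(2j * np.pi * 0.7 * n / T) + 0.3 * np.exp(-2j * np.pi * 1.3 * n / T)

# Least-squares fit: T tap samples are summarized by Q time-invariant coefficients.
Q = 5
B = ce_bem_basis(T, Q)
coeffs, *_ = np.linalg.lstsq(B, h, rcond=None)
h_fit = B @ coeffs
nmse = np.sum(np.abs(h - h_fit) ** 2) / np.sum(np.abs(h) ** 2)
print(f"{Q} BEM coefficients describe {T} tap samples, NMSE = {nmse:.3e}")
```

The point of the sketch is the change in dimension: the unknowns drop from `T` samples per tap to `Q` coefficients per tap, at the price of a modeling error measured here by `nmse`.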
First, a channel estimator is presented that uses superimposed training and the first-order
statistics of the observations, based on various BEMs, where the information sequence acts as
interference in channel estimation. By using user-specific training sequences, the estimator
can be extended to multiple-user systems.
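To illustrate the principle, here is a hedged sketch of a first-order statistics-based estimator for the simplest special case, a time-invariant channel. It is a simplification for illustration only: the channel taps, noise level, block length, and Chu-type training sequence are assumptions, and the doubly-selective case treated in this dissertation additionally requires a BEM. Because the data are zero-mean, averaging the received signal over training periods leaves mainly the training contribution, from which the taps follow by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: time-invariant 3-tap channel, period-8 training.
T, P, L = 8000, 8, 2
M = T // P
train_power, noise_var = 0.2, 0.01
h = np.array([0.8 + 0.2j, -0.4 + 0.3j, 0.1 - 0.2j])   # hypothetical channel taps

b = np.sqrt(1 - train_power) * rng.choice([-1.0, 1.0], size=T)   # zero-mean data
c_period = np.sqrt(train_power) * np.exp(1j * np.pi * np.arange(P) ** 2 / P)
s = b + np.tile(c_period, M)

noise = np.sqrt(noise_var / 2) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
y = np.convolve(s, h)[:T] + noise

# First-order statistics: average the received samples period by period.
# The first period is dropped because it contains the start-up transient.
m_hat = y[P:].reshape(M - 1, P).mean(axis=0)

# E[y(n)] = sum_l h(l) c((n - l) mod P), i.e. m = C h with a P x (L+1) matrix C.
k = np.arange(P)[:, None]
l = np.arange(L + 1)[None, :]
C = c_period[(k - l) % P]
h_hat, *_ = np.linalg.lstsq(C, m_hat, rcond=None)

rel_err = np.linalg.norm(h_hat - h) / np.linalg.norm(h)
print(f"relative channel estimation error: {rel_err:.3f}")
```

The residual error in `h_hat` comes mostly from the information sequence, whose period-wise average is small but not zero for a finite block; this is the self-interference referred to above.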
We next analyze the information-induced self-interference of this estimator. Both the
performance of the estimator and the optimization of its parameters are investigated.
We propose two schemes to alleviate the self-interference in channel estimation. In the
first, a deterministic maximum likelihood (DML) approach jointly estimates the channel
and the information sequence, using the channel estimate from the first-order
statistics-based estimator as an initial guess. By exploiting the channel estimates and
the detected information data from the previous iteration, the self-interference can be
significantly reduced at the present iteration. In the second, we propose a data-dependent
superimposed training scheme, in which the training sequence is designed based on the
current information sequence so that the self-interference can be entirely eliminated at
the receiver. However, total elimination of the interference may lead to information loss.
We therefore modify the scheme into partially-data-dependent (PDD) training, striking a
compromise between interference cancellation and information integrity.
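The data-dependent idea can be sketched for the time-invariant special case, where the self-interference in a period-wise average comes from the cyclic mean of the data block. The code below is an assumption-laden illustration rather than the scheme developed in this dissertation: it works at the transmitter side only, ignores the channel, and uses a hypothetical scaling parameter `keep` that is only loosely analogous to the partial scheme described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumptions: a block of M = 9 training periods of length P = 8.
P, M = 8, 9
T = P * M
b = rng.choice([-1.0, 1.0], size=T)                       # BPSK information block
c_period = 0.5 * np.exp(1j * np.pi * np.arange(P) ** 2 / P)
c = np.tile(c_period, M)                                  # periodic training

def cyclic_mean(x, period):
    """Average x over its periods: one value per position within the period."""
    return x.reshape(-1, period).mean(axis=0)

def superimpose(b, c, period, keep=1.0):
    """keep = 1.0: plain superimposed training; keep = 0.0: the cyclic mean of
    the data is cancelled completely. `keep` is a hypothetical knob."""
    e = -(1.0 - keep) * np.tile(cyclic_mean(b, period), b.size // period)
    return b + e + c

for keep in (1.0, 0.2, 0.0):
    s = superimpose(b, c, P, keep)
    leak = np.linalg.norm(cyclic_mean(s, P) - c_period)   # data leaking into training
    print(f"keep={keep:.1f}  leakage={leak:.3f}")
```

With `keep = 0.0` the period-wise average of the transmitted block equals the training sequence exactly, but the data component that was subtracted is no longer transmitted, which is the information loss mentioned above; intermediate values trade one effect against the other.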
Using superimposed training and a BEM, direct equalization of doubly-selective chan-
nels is also considered, without estimating the channel first. The direct equalizer is also
extended to a multiple-user scenario, which can be used in a wireless ad hoc network.
The proposed approaches are illustrated by computer simulation examples and compared
with conventional TM training-based approaches. When self-interference is sufficiently
suppressed by our proposed schemes, the performance of superimposed training-based
approaches is competitive with that of approaches using conventional TM training, without
incurring any data-rate loss.
Acknowledgments
I have had a phenomenal time during my four years in Auburn. I attribute my good
fortune to the wonderful people I have come to know through the past four years.
First and foremost, I would like to thank my advisor, Prof. Jitendra K. Tugnait, for
his generosity and support over the years. As the guide of my research career, he gave me
careful and rigorous instructions, from which I will surely benefit for life.
Many thanks also go to my committee members, Profs. Stanley J. Reeves and Soo-
Young Lee; they have provided me with invaluable guidance and friendliness during my
studies and contributed much to my dissertation work. Thanks to Prof. Douglas A. Leonard
who served as my outside reader, for his suggestions and instruction. These acknowledge-
ments would be far from complete without thanks to Profs. Tin-Yau Tam, Xiaoli Ma, and
Shiwen Mao, for their sage advice during my study.
I want to thank Weilin Luo and Xiaohong Meng for their foundational work relating to
my dissertation, and especially Xiaohong, for her elegant simulation programs that offered
me a good model to follow.
Finally, I would like to express my gratitude to my parents and my wife, Fan, whose
love and faith in me are the source of my strength.
My studies were funded by the National Science Foundation under Grant ECS 0424145
and by a Vodafone Fellowship.
Style manual or journal used Journal of Approximation Theory (together with the style
known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.
Computer software used The document preparation package TeX (specifically LaTeX)
together with the departmental style-file aums.sty.
Table of Contents
List of Figures xi
1 Introduction 1
1.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Representations of Wireless Channels 11
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Jakes' Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Complex Exponential Basis Expansion Model (CE-BEM) . . . . . . . . . . 16
2.4 Orthogonal Polynomial Basis Expansion Model (OP-BEM) . . . . . . . . . 18
2.5 Discrete Prolate Spheroidal Basis Expansion Model (DPS-BEM) . . . . . . 20
2.6 Modeling Error of BEMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6.1 LS Approximation by Basis Expansion Models . . . . . . . . . . . . 23
2.6.2 Simulation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.3 Simulation Example: Modeling Error of CE-, OP-, and DPS-BEMs
in Approximating a Doubly-Selective Channel . . . . . . . . . . . . . 25
2.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 First-Order Statistics-Based Estimation of Doubly-Selective Channels 28
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 First-Order Statistics-Based Channel Estimation Using CE-BEM [81] . . . 28
3.3 First-Order Statistics-Based Channel Estimation Using DPS-BEM . . . . . 36
3.4 First-Order Statistics-Based Channel Estimation Using OP-BEM . . . . . . 40
3.5 First-Order Statistics-Based Channel Estimation: Multiple-User (MIMO)
Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.6 Simulation Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6.1 First-Order Statistics-Based Estimator: Single User . . . . . . . . . 59
3.6.2 First-Order Statistics-Based Estimator: Multiple Users . . . . . . . . 64
3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4 Performance Analysis and Parameter Design for First-Order
Statistics-Based Estimator 71
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Performance Analysis for the First-Order Statistics-Based Estimator Using
BEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.1 Performance Analysis for CE-BEM-Based Estimator . . . . . . . . . 74
4.2.2 Performance Analysis for DPS-BEM-Based Estimator . . . . . . . . 77
4.2.3 Performance Analysis for OP-BEM-Based Estimator . . . . . . . . . 81
4.2.4 Performance Analysis for Multiple-User (MIMO) Channels . . . . . 84
4.3 Performance Analysis for the First-Order Statistics-Based Estimator: with
Modeling Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4 Training Power Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5 Bias-Variance Trade-Off . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.6 Simulation Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.6.1 Performance Analysis for the First-Order Statistics-Based Estimator 106
4.6.2 Training Power Allocation . . . . . . . . . . . . . . . . . . . . . . . . 111
4.6.3 Bias-Variance Trade-Off . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5 Deterministic Maximum Likelihood (DML) Approach 118
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.2 DML Approach Using BEM . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.3 DML Approach: Multiple-User (MIMO) Channels . . . . . . . . . . . . . . 125
5.4 Simulation Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.4.1 DML Approach: Single User . . . . . . . . . . . . . . . . . . . . . . 132
5.4.2 DML Approach: Multiple Users . . . . . . . . . . . . . . . . . . . . . 139
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
6 Doubly-Selective Channel Estimation Using Data-Dependent Superim-
posed Training 144
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.2 Data-Dependent Superimposed Training Using CE-BEM . . . . . . . . . . . 145
6.2.1 Data-Dependent Processing at the Transmitter . . . . . . . . . . . . 146
6.2.2 Data Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.2.3 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.3 Data-Dependent Superimposed Training Using DPS-BEM . . . . . . . . . . 151
6.3.1 Partially-Data-Dependent (PDD) Superimposed Training . . . . . . 153
6.3.2 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.3.3 Power Allocation and Self-Interference Suppression . . . . . . . . . . 156
6.3.4 Recovery of Suppressed Frequencies via DML Approach . . . . . . . 163
6.4 Simulation Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.4.1 Data-Dependent Superimposed Training Using CE-BEM . . . . . . . 167
6.4.2 (Partially) Data-Dependent Superimposed Training Using DPS-BEM 174
6.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7 Direct FIR Linear Equalization of Doubly-Selective Channels Based
on Superimposed Training 185
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.2 Direct FIR Linear Equalization Using CE-BEM . . . . . . . . . . . . . . . . 186
7.2.1 Time-Varying FIR Equalizers . . . . . . . . . . . . . . . . . . . . . . 187
7.2.2 Linear LS Equalizers Based on CE-BEM . . . . . . . . . . . . . . . . 188
7.3 Direct FIR Linear Equalization: Multiple Users . . . . . . . . . . . . . . . . 196
7.3.1 User-Specific Training Sequences . . . . . . . . . . . . . . . . . . . . 199
7.3.2 Linear LS Equalizers for the Desired User . . . . . . . . . . . . . . . 199
7.4 Simulation Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
7.4.1 Direct FIR Equalization: Single User . . . . . . . . . . . . . . . . . . 204
7.4.2 Direct FIR Equalization: Multiple Users . . . . . . . . . . . . . . . . 207
7.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8 Concluding Remarks and Future Work 211
8.1 Summary of Original Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.2 Possible Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Bibliography 216
Appendices 224
A Optimal Time-Multiplexed Training for Block Transmissions over
Doubly-Selective Channels [40] 225
B Symbol Detection 230
B.1 Maximum Likelihood Sequence Detector (Viterbi Detector) [64] . . . . . . . 230
B.2 Kalman Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
C Mathematical Notations 239
D Abbreviations 241
List of Figures
2.1 Modeling error of CE-, OP-, and DPS-BEMs in approximating a three-tap
(L = 2) Rayleigh fading channel following Jakes' model. . . . . . . . . . . . 25
3.1 First-order statistics-based estimator (SISO): BER vs SNR under fd = 0Hz
(time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-
BEMs completely overlap, since the three basis functions are all constant
for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-
multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . 55
3.2 First-order statistics-based estimator (SISO): BER vs SNR under fd = 50Hz
and K = N = 1. (SI: superimposed training; TM: time-multiplexed training;
CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . . . . . . 56
3.3 First-order statistics-based estimator (SISO): BER vs SNR under fd =
100Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 57
3.4 First-order statistics-based estimator (SISO): BER vs SNR under fd =
200Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 58
3.5 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd =
0Hz (time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-
BEMs completely overlap, since the three basis functions are all constant
for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-
multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . 59
3.6 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd =
50Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 60
3.7 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd =
100Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 61
3.8 First-order statistics-based estimator (SISO): NCMSE vs SNR under fd =
200Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 62
3.9 First-order statistics-based estimator (MIMO): BER vs SNR under fd = 0Hz
(time-invariant) and K = N = 2. The curves for CE- and DPS-BEMs
completely overlap, since the two basis functions are both constant for
time-invariant channels (Q = 1). (SI: superimposed training; TM: time-
multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman
filter as the symbol detector; Viterbi: a Viterbi detector as the symbol de-
tector.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.10 First-order statistics-based estimator (MIMO): BER vs SNR under fd =
50Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the
symbol detector; Viterbi: a Viterbi detector as the symbol detector.) . . . . 64
3.11 First-order statistics-based estimator (MIMO): BER vs SNR under fd =
100Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the
symbol detector; Viterbi: a Viterbi detector as the symbol detector.) . . . . 65
3.12 First-order statistics-based estimator (MIMO): BER vs SNR under fd =
200Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the
symbol detector; Viterbi: a Viterbi detector as the symbol detector.) . . . . 66
3.13 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd =
0Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-
BEMs completely overlap, since the two basis functions are both constant
for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-
multiplexed training; CE: CE-BEM; DPS: DPS-BEM.) . . . . . . . . . . . . 67
3.14 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd =
50Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; DPS: DPS-BEM.) . . . . . . . . . . . . . . . . . . . 68
3.15 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd =
100Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; DPS: DPS-BEM.) . . . . . . . . . . . . . . . . . . . 69
3.16 First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd =
200Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed
training; CE: CE-BEM; DPS: DPS-BEM.) . . . . . . . . . . . . . . . . . . . 70
4.1 Estimation variance: NCMSE vs SNR under fd = 0Hz (time-invariant). The
curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis
functions are all constant for time-invariant channels (Q = 1). (SI: super-
imposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1:
defined in (4.2); MSE: defined in (4.33).) . . . . . . . . . . . . . . . . . . . . 103
4.2 Estimation variance: NCMSE vs SNR under fd = 50Hz. (SI: superimposed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in
(4.2); MSE: defined in (4.33).) . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.3 Estimation variance: NCMSE vs SNR under fd = 100Hz. (SI: superimposed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in
(4.2); MSE: defined in (4.33).) . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4 Estimation variance: NCMSE vs SNR under fd = 200Hz. (SI: superimposed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in
(4.2); MSE: defined in (4.33).) . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.5 Training power allocation: BER vs ? under fd = 0Hz (time-invariant). The
curves for CE-, OP- and DPS-BEMs completely overlap, since the three
basis functions are all constant for time-invariant channels (Q = 1). (SI:
superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . 107
4.6 Training power allocation: BER vs ? under fd = 50Hz. (SI: superimposed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 108
4.7 Training power allocation: BER vs ? under fd = 100Hz. (SI: superimposed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 109
4.8 Training power allocation: BER vs ? under fd = 200Hz. (SI: superimposed
training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.) . . . . . . . . . . 110
4.9 Training power allocation: optimum ? vs SNR for CE-BEM. ("sim.": simulation
results; "analy.": ?opt in (4.46); "analy. app.": ??opt in (4.48).) . . . . 111
4.10 Training power allocation: optimum ? vs SNR for OP-BEM. ("sim.": simulation
results; "analy.": ?opt in (4.46); "analy. app.": ??opt in (4.48).) . . . . 112
4.11 Training power allocation: optimum ? vs SNR for DPS-BEM. ("sim.": simulation
results; "analy.": ?opt in (4.46); "analy. app.": ??opt in (4.48).) . . . 113
4.12 Bias-variance trade-off: BER vs Q under TIR = 0.3 for different fd's. . . . . 114
4.13 Bias-variance trade-off: SNRd (Q) (defined in (4.50)) vs Q under TIR = 0.3
for different fd's. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.14 Bias-variance trade-off: BER vs Q under TIR = 1.0 for different fd's. . . . . 116
4.15 Bias-variance trade-off: SNRd (Q) (defined in (4.50)) vs Q under TIR = 1.0
for different fd's. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.1 DML approach (SISO): BER vs SNR under fd = 0Hz (time-invariant) and
K = N = 1. The curves for CE- and DPS-BEMs completely overlap, since
the two basis functions are both constant for time-invariant channels (Q = 1).
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 128
5.2 DML approach (SISO): BER vs SNR under fd = 50Hz and K = N = 1. (SI:
superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS:
DPS-BEM; "step 1": the first-order statistics-based estimator; "1st iter.":
the first DML iteration; "2nd iter.": the second DML iteration; "3rd iter.":
the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.3 DML approach (SISO): BER vs SNR under fd = 100Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 130
5.4 DML approach (SISO): BER vs SNR under fd = 200Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 131
5.5 DML approach (SISO): NCMSE vs SNR under fd = 0Hz (time-invariant)
and K = N = 1. The curves for CE- and DPS-BEMs completely overlap,
since the two basis functions are both constant for time-invariant channels
(Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE:
CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based esti-
mator; "1st iter.": the first DML iteration; "2nd iter.": the second DML
iteration; "3rd iter.": the third DML iteration.) . . . . . . . . . . . . . . . . 132
5.6 DML approach (SISO): NCMSE vs SNR under fd = 50Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 133
5.7 DML approach (SISO): NCMSE vs SNR under fd = 100Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 134
5.8 DML approach (SISO): NCMSE vs SNR under fd = 200Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 135
5.9 DML approach (MIMO): BER vs SNR under fd = 0Hz (time-invariant) and
K = N = 2. The curves for CE- and DPS-BEMs completely overlap, since
the two basis functions are both constant for time-invariant channels (Q = 1).
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 136
5.10 DML approach (MIMO): BER vs SNR under fd = 50Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 137
5.11 DML approach (MIMO): BER vs SNR under fd = 100Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 138
5.12 DML approach (MIMO): BER vs SNR under fd = 200Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 139
5.13 DML approach (MIMO): NCMSE vs SNR under fd = 0Hz (time-invariant)
and K = N = 2. The curves for CE- and DPS-BEMs completely overlap,
since the two basis functions are both constant for time-invariant channels
(Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE:
CE-BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based esti-
mator; "1st iter.": the first DML iteration; "2nd iter.": the second DML
iteration; "3rd iter.": the third DML iteration.) . . . . . . . . . . . . . . . . 140
5.14 DML approach (MIMO): NCMSE vs SNR under fd = 50Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM;
DPS: DPS-BEM; "step 1": the first-order statistics-based estimator; "1st
iter.": the first DML iteration; "2nd iter.": the second DML iteration; "3rd
iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . . . . 141
5.15 DML approach (MIMO): NCMSE vs SNR under fd = 100Hz and K =
N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-
BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator;
"1st iter.": the first DML iteration; "2nd iter.": the second DML iteration;
"3rd iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . 142
5.16 DML approach (MIMO): NCMSE vs SNR under fd = 200Hz and K =
N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-
BEM; DPS: DPS-BEM; "step 1": the first-order statistics-based estimator;
"1st iter.": the first DML iteration; "2nd iter.": the second DML iteration;
"3rd iter.": the third DML iteration.) . . . . . . . . . . . . . . . . . . . . . 143
6.1 Data-dependent superimposed training (CE-BEM): BER vs SNR for non-
data-dependent, data-dependent, and time-multiplexed training, under fd =
0 and 50Hz. (SI: superimposed training; TM: time-multiplexed training; ? =
1: non-data-dependent training; ? = 0: total elimination of self-interference.) 167
6.2 Data-dependent superimposed training (CE-BEM): BER vs SNR for non-
data-dependent, data-dependent, and time-multiplexed training, under fd =
100 and 200Hz. (SI: superimposed training; TM: time-multiplexed train-
ing; ? = 1: non-data-dependent training; ? = 0: total elimination of self-
interference.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.3 Data-dependent superimposed training (CE-BEM): NCMSE vs SNR for non-
data-dependent, data-dependent, and time-multiplexed training, under fd =
0 and 50Hz. (SI: superimposed training; TM: time-multiplexed training; ? =
1: non-data-dependent training; ? = 0: total elimination of self-interference.) 169
6.4 Data-dependent superimposed training (CE-BEM): NCMSE vs SNR for
non-data-dependent, data-dependent, and time-multiplexed training, under
fd = 100 and 200Hz. (SI: superimposed training; TM: time-multiplexed
training; ? = 1: non-data-dependent training; ? = 0: total elimination of
self-interference.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.5 Data-dependent superimposed training (CE-BEM): BER vs SNR for non-
data-dependent, data-dependent, and time-multiplexed training, under fd =
100Hz and N = 1, 2, and 3. (SI: superimposed training; TM: time-
multiplexed training; ? = 1: non-data-dependent training; ? = 0: total
elimination of self-interference.) . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.6 Data-dependent superimposed training (fast fading): BER vs SNR under
fd = 100 and 250Hz. (SI: superimposed training; TM: time-multiplexed
training; ? = 1: non-data-dependent training; ? = 0: total elimination of
self-interference.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.7 Estimation variance: NCMSE vs fd under SNR = 25dB for comparison
between analytical and simulation-based results of non-data-dependent and
data-dependent superimposed training. (? = 1: non-data-dependent train-
ing; ? = 0: total elimination of self-interference; σ: standard deviation.) . . 173
6.8 PDD superimposed training: NCMSE vs SNR for CE- and DPS-BEM-based
estimators, under fd = 100Hz. (? = 1: non-data-dependent training; ? = 0:
total elimination of self-interference; ? = 0.2: partial elimination of self-
interference at the channel estimation stage.) . . . . . . . . . . . . . . . . . 174
6.9 PDD superimposed training: BER vs SNR for CE- and DPS-BEM-based
estimators, under fd = 100Hz. (? = 1: non-data-dependent training; ? = 0:
total elimination of self-interference; ? = 0.2: partial elimination of self-
interference at the channel estimation stage.) . . . . . . . . . . . . . . . . . 175
6.10 PDD superimposed training: BER vs (?,?) under SNR = 15dB and fd =
100Hz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.11 PDD superimposed training: optimum (?,?) vs SNR under fd = 100Hz. . . 177
6.12 PDD superimposed training: BER vs SNR under fd = 0Hz (time-invariant).
(TM: time-multiplexed training; "step 1": the data detection scheme in
Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-
dependent training; ? = 0: total elimination of self-interference; ? = 0.2:
partial elimination of self-interference at the channel estimation stage.) . . . 178
6.13 PDD superimposed training: BER vs SNR under fd = 100Hz. (TM: time-
multiplexed training; "step 1": the data detection scheme in Section 6.2.2;
"3rd iteration": the third DML iteration; ? = 1: non-data-dependent train-
ing; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination
of self-interference at the channel estimation stage.) . . . . . . . . . . . . . 179
6.14 PDD superimposed training: BER vs SNR under fd = 200Hz. (TM: time-
multiplexed training; "step 1": the data detection scheme in Section 6.2.2;
"3rd iteration": the third DML iteration; ? = 1: non-data-dependent train-
ing; ? = 0: total elimination of self-interference; ? = 0.2: partial elimination
of self-interference at the channel estimation stage.) . . . . . . . . . . . . . 180
6.15 PDD superimposed training: NCMSE vs SNR under fd = 0Hz (time-
invariant). (TM: time-multiplexed training; "step 1": the data detection
scheme in Section 6.2.2; "3rd iteration": the third DML iteration; ? = 1:
non-data-dependent training; ? = 0: total elimination of self-interference;
? = 0.2: partial elimination of self-interference at the channel estimation
stage.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.16 PDD superimposed training: NCMSE vs SNR under fd = 100Hz. (TM:
time-multiplexed training; "step 1": the data detection scheme in Section
6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent
training; ? = 0: total elimination of self-interference; ? = 0.2: partial
elimination of self-interference at the channel estimation stage.) . . . . . . . . . 182
6.17 PDD superimposed training: NCMSE vs SNR under fd = 200Hz. (TM:
time-multiplexed training; "step 1": the data detection scheme in Section
6.2.2; "3rd iteration": the third DML iteration; ? = 1: non-data-dependent
training; ? = 0: total elimination of self-interference; ? = 0.2: partial
elimination of self-interference at the channel estimation stage.) . . . . . . . . . 183
7.1 Single-user direct FIR equalization: BER vs SNR under fd = 0Hz and length
of equalizer Le = 6 with different TIR and number of receivers. . . . . . . . 205
7.2 Single-user direct FIR equalization: BER vs SNR under fd = 50Hz and
length of equalizer Le = 6 with different TIR and number of receivers. . . . 206
7.3 Single-user direct FIR equalization: BER vs SNR under fd = 100Hz and
length of equalizer Le = 6 with different TIR and number of receivers. . . . 207
7.4 Multiple-user direct FIR equalization (ad hoc): BER vs SNR under fd =
100Hz and length of equalizer Le = 4 with different number of receivers. . . 208
7.5 Multiple-user direct FIR equalization (ad hoc): BER vs Doppler spread fd
under SNR = 25dB and length of equalizer Le = 4 with different number of
receivers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
7.6 Multiple-user direct FIR equalization (ad hoc): BER vs length of equalizer
Le under fd = 50Hz and SNR = 25dB with different number of receivers. . 210
Chapter 1
Introduction
With the emergence of next-generation wireless mobile communications, multimedia
services have increasing demands for higher data rates, better quality of service, and higher
network capacity. In efforts to support such demands, researchers have paid special atten-
tion to wireless channels. Phenomena occurring in wireless channels, such as fading, delay
spread, Doppler spread, co-channel interference, and multi-user interference, may impair
signal transmission and data reception. A wireless channel is a challenging communica-
tions medium with limited bandwidth, relatively low capacity per unit bandwidth, random
amplitude and phase fluctuations, and inter-symbol interference (ISI). To design a physical
link with data rates approaching the fundamental information capacity limits of the wireless
channel, accurate knowledge of the channel state information (CSI) becomes a prerequisite
for many physical layer approaches. Channel estimation thus plays a key role. At the re-
ceive ends, equalizers are usually used to compensate for the signal distortion. One may
design an equalizer based on a channel estimate, or by directly using the received signals.
Traditionally, receivers rely on a transmitter-assisted training session to extract the
desired reference signal for channel estimation or equalization [64]. In a fast-varying envi-
ronment, training sessions have to be transmitted frequently and periodically to keep up
with the temporal variation of the channel. For a band-limited wireless application, frequent
use of training sessions decreases the effective information rate. To save valuable spectrum
resources, blind (self-recovering) channel estimation and equalization, based solely on the
noisy received data and exploiting its statistical or other properties, has attracted researchers'
interest, where no training sessions are available or used [23]. Semi-blind channel estimation,
combining explicit (time-multiplexed) training and blind cost functions, has also
attracted considerable attention due to the need for fast and robust channel estimation
and the fact that, for many packet transmission systems, embedded known symbols can be
exploited for channel estimation. In semi-blind approaches, there are training sessions but
one uses information data also to improve the training-based results [23,87]. More recently,
superimposed training-based approaches have been explored where the training sequence is
"on" all the time, and is transmitted (at low power) concurrently with (superimposed on)
the information data. In contrast to explicit training, there is no loss in data transmission
rate. On the other hand, some useful power is wasted in superimposed training sequences
which could have otherwise been allocated to the information data.
In this dissertation, we will discuss doubly-selective channel estimation and equalization
using superimposed training. Common wireless channels are frequency-selective (due to
delay spread and multipath propagation) and time-selective (due to mobility). An accurate
model of realistic wireless channels can be complicated and involve too many parameters
for estimation purposes. Therefore, a parsimonious representation is preferred. We employ
basis expansion models (BEM) to represent the doubly-selective channel with many fewer
parameters [24]. In a BEM, the channel is represented as a finite impulse response (FIR)
filter where each tap is a superposition of distinct basis functions that describe the temporal
variations of the channel. Three BEMs are considered: the complex exponential basis
expansion model (CE-BEM), the orthogonal polynomial basis expansion model (OP-BEM),
and the discrete prolate spheroidal basis expansion model (DPS-BEM).
1.1 Previous Work
In this section, we summarize the previous research work on superimposed training-
based channel estimation, equalization, and related areas.
To the best of our knowledge, the idea of superimposed training (simultaneous trans-
mission of information-bearing signal and channel sounding) was first proposed in [36] in
1965 for analog communications, where a pseudo-random channel sounding signal was su-
perimposed upon a frequency-modulated (FM) information-bearing signal by amplitude
modulation (AM). This idea was extended to digital systems in [17] in 1995, where both
least squares (LS) and least mean squares (LMS) methods were considered to build an adap-
tive filter, treating the known superimposed training sequence as the input and the received
signal as the desired output. Periodic superimposed training sequences allowed for the use
of first-order statistics (time-varying mean) of the received signal, which were also exploited
for time-invariant channel estimation in [59,82,98], among others. Using CE-BEM, such
periodic superimposed training schemes were extended to doubly-selective channel environ-
ments in [81,97]. Direct design of FIR equalizers using periodic superimposed training was
investigated in [58] for time-invariant channels. The Cramér-Rao lower bound (CRLB) on
channel estimation variance was given in [98], under the assumption of Gaussian source
symbols for a special class of training sequences. Such bounds were extended to a general
class of training sequences in [59]. Non-periodic random or pseudo-noise (PN) superim-
posed training sequences (known at the receiver) were used in [33,44]. A linear predictor
was designed in [33] to estimate the time-varying flat fading channel, and based on the
minimum mean square error (MMSE) criterion channel estimation and equalization were
discussed in [44] for M-quadrature amplitude modulated (QAM) symbols.
The formulations of [59,82] allowed for the presence of an unknown "direct current"
(DC) offset at the receiver, whereas [17,98] did not. The two schemes in [59,82] were
compared in [45], where their structural equivalence was verified; they therefore yield
identical estimates for zero (or known) DC offset. In the presence of an unknown DC
offset, the basic approach of [59] yielded biased channel estimates, so the DC offset had
to be estimated from the biased estimates and the received data by finding the roots
of a fifth-degree polynomial [59]. In contrast, the method of [82] yielded unbiased channel
estimates directly. Performance analysis (a closed-form solution for the channel estimation
variance) was also performed in [59] for zero (or known) DC offset, which was then used
for an optimal training sequence synthesis to yield a channel-independent performance.
Unfortunately, the synthesized training sequences in [59] do not necessarily have a small
peak-to-average power ratio, whereas the training sequence in [82] attains the optimal
value of one. A performance analysis of the approach proposed in [82], valid for any DC
offset, was conducted in [84,85], where power allocation for superimposed training in
Rayleigh fading channels was also addressed by maximizing the equivalent signal-to-noise
ratio (SNR) for equalizer design under a fixed power constraint. As in [17], the period
of the superimposed training sequence of [59] was equal to the number of channel taps,
whereas this condition was relaxed in [82] to be greater than or equal to the number of
channel taps. Synchronization of the training sequence (frame synchronization), based on the
correlation and the fourth-order cumulant functions of the observations, was also discussed
in [59]. Under mis-synchronization, however, the channel estimator yields
a circularly shifted estimate whose "shift" cannot be resolved via the first-order statistics
of the data [59]. Synchronization of the approach in [82] was discussed in [83,85], where
the problem of a shifted channel estimate was avoided. A synchronization technique based
on subspace projections was discussed in [1]. The estimator proposed in [82] offers the
fundamental approach to channel estimation in this dissertation.
To exploit the enormous capacity potential of multiple-input multiple-output (MIMO)
communications [18], superimposed training-based channel estimation was considered in
[4,5,21,47,48] for MIMO systems. Superimposed training in multi-carrier systems was
considered in [9,10,13,93]. We also note that a more general framework of superimposed
training, engaged in affine precoding, has now attracted much interest and was investigated
in [42,56,89], among others.
Since superimposed training-based methods usually use statistical properties of the
information data, they can be treated as semi-blind approaches [17]. In contrast to slow
convergence and possible convergence toward incorrect solutions occurring in blind meth-
ods [92], identifiability conditions for superimposed training-based methods are much less
stringent [17]. Furthermore, blind approaches cannot resolve complex scaling factor ambi-
guity, so that differential coding and decoding resulting in 3dB SNR loss is required [85],
whereas power allocated to superimposed training is typically much less than 3dB (1dB or
less in [17]).
In superimposed training-based approaches, the unknown information data are typi-
cally incorporated in the noise term, in essence yielding a lower SNR [73]. In other words,
the information data may act as interference at the receiver to the superimposed training
and adversely affect channel estimation and data detection performance; the absence of loss
in data transmission rate may come at the price of degraded data reception.
Several methods were investigated to reduce the interference from information data
(we call it self-interference in this dissertation since it comes from the transmitted signal
itself). A selective superimposed training scheme was proposed in [76], where the selection
of the training sequence depends on each frame of information data. From a candidate
set of orthogonal sequences maintained at the transmitter, a training sequence is chosen
and superimposed on the incoming information frame so as to minimize the correlation be-
tween training and the information frame. In [20], a data-dependent superimposed training
scheme was proposed for time-invariant channel estimation, where the training sequence
is distorted before transmission in order that the self-interference is eliminated at the re-
ceiver. This scheme was extended to MIMO systems in [21], and was modified to allow for
unknown DC offset in [19]. However, in Chapter 6 we will show that the cost of using data-
dependent superimposed training for interference cancelation is information loss to some
extent. Channel estimation and data detection can also be enhanced in an iterative way,
i.e., the detected data can be utilized to cancel the self-interference in the next iteration.
Such applications are available in [13,49–51,97].
Now, one may wonder, in comparison with conventional time-multiplexed (TM) train-
ing, what are the advantages and disadvantages of superimposed training? Since the ulti-
mate goal of communications is to improve the capacity of communication systems to the
Shannon bound, how can superimposed training help?
A superimposed training-based scheme for space-time coded transmission over flat
block fading (quasi-static) channels was considered in [8]. The analysis revealed the weak-
ness of superimposed training in block-stationary (and thus time-invariant) environments,
showing that superimposed training has higher CRLB than that of TM training due to
the presence of self-interference. On the other hand, if training must be included in every
block and the channel estimation is accurate, the superimposed scheme gives higher mutual
information (capacity) [8]. Performance bounds for TM and superimposed training-based
semi-blind estimation of time-varying flat fading channels were considered in [16]. Under the
same overall power allocation, it was shown that the superimposed training performs better
for fast fading channels, which confirms the intuition that the constant presence of training
has considerable benefit. For slow fading and high SNR, such an advantage disappears and
there is a penalty for using superimposed training, since data transmission interferes with
channel estimation [74]. This viewpoint was also confirmed by [73], which showed that
when the coherence time is relatively short (compared with the time devoted to training in
each block), superimposed training achieves higher capacity than that of TM training. It is
because superimposed training allows for data transmission over the entire block, whereas
TM training sessions occupy a large portion of time in this situation and hence not much
time remains for information transmission. Capacity of superimposed training-based MIMO
systems was considered in [5], where similar results have been derived, i.e., in the scenarios
of high SNR, many receiver antennas, and short coherence time, it is beneficial to employ
superimposed training; otherwise, TM training will be better.
An important conclusion appeared in [4], where the author answered the following
question: How much will the capacity increase by allowing re-estimation of the channel when
the detected symbols are available? It was shown that the capacity after re-estimation can
be very close to the fundamental capacity of the non-coherent channel, especially when the
channel coherence time is short. After iterations, significant improvement is achieved, of
course at the expense of increased complexity.
1.2 Contributions
In this dissertation, we investigate superimposed training-based approaches to the es-
timation and equalization of doubly-selective channels. In order to model channel variation
by a parsimonious representation, we explore various BEMs, including CE-, OP-, and
DPS-BEMs, to approximate the multipath channel with Doppler spread.
Our starting point is the first-order statistics-based estimator proposed by [81], us-
ing CE-BEM. In approximating band-limited time-varying channels, the modeling error of
CE-BEM is noticeable. Therefore, we extend this estimator using DPS- and OP-BEMs
to reduce the modeling error. A more general estimator that applies to arbitrary BEMs
is also provided. We then further apply this estimator to a multiple-user scenario. Chan-
nel estimation across different users is decoupled by means of user-specific superimposed
training sequences.
Performance analysis is then conducted for the first-order statistics-based estimator
for doubly-selective channels, in which we demonstrate that the interference in estimation
mainly comes from the unknown information sequence (self-interference). Based on the
results of the performance analysis, we cast the issues of power allocation and bias-variance
trade-off as ones of optimizing an SNR for equalizer design, following the method proposed
for time-invariant channels in [85].
The major drawback of superimposed training is that the self-interference from infor-
mation data may adversely affect channel estimation and data reception performance. To
alleviate the effect of self-interference, we propose two methods. First, a deterministic maximum
likelihood (DML) approach is employed at the receiver to enhance the channel estimation
iteratively, exploiting the detected symbols from the previous iteration to reduce the
self-interference. Second, the same goal can be achieved by transmitter-end processing: the
superimposed training sequence is modified based on the information sequence or, equivalently,
the information sequence is distorted before transmission so that training and information data
occupy distinct frequencies and hence can be separated at the receiver. However, distortion
of the information sequence may cause "information" loss before transmission, which
cannot be fully recovered by receiver-end processing. A partially-data-dependent (PDD)
superimposed training scheme is proposed in order to strike a trade-off between interference
cancelation and information integrity.
We also design a direct equalizer, without first estimating the channel, using superim-
posed training and CE-BEM. With the aid of periodic white training sequences, we show
that the optimal linear equalizer for the training sequence is also a scaled version of the op-
timal equalizer for the information sequence. By employing user-specific training sequences,
this direct equalizer can be extended to a multiple-user scenario, which can be used in a
wireless ad hoc network.
Computer simulation examples illustrate our proposed approaches. Analytical results
are also compared with simulation results to show their validity. Comparisons with conventional
TM training-based approaches are also presented: when self-interference is sufficiently
suppressed by our proposed schemes, the performance of superimposed training-based
approaches is competitive with that of approaches using TM training, without incurring any
data-rate loss.
1.3 Organization
The rest of this dissertation is organized as follows.
In Chapter 2, representations of time-varying wireless channels are briefly reviewed,
including Jakes' model and the CE-, OP-, and DPS-BEMs.
Chapter 3 introduces the first-order statistics-based channel estimator using super-
imposed training. We explore this estimator under different channel representations, and
extend it from a single-user scenario to a multiple-user situation by exploiting user-specific
superimposed training sequences.
We consider performance analysis for the first-order statistics-based estimators in Chapter 4.
Training power allocation and bias-variance trade-off are also optimized from the
viewpoint of equalization.
The DML approach is considered in Chapter 5. Exploiting detected symbols from the
previous iteration, the self-interference is reduced at the current iteration and therefore,
channel estimation and data reception performance are improved. A multiple-user scenario
is also considered.
In Chapter 6, we investigate the data-dependent superimposed training scheme. The
data-dependent processing at the transmitter results in information loss. We propose a PDD
superimposed training scheme to mend this problem. Performance analysis and parameter
design are also provided.
Superimposed training-based direct equalization is considered in Chapter 7, with the aid
of periodic white training sequences. This algorithm is also extended to a multiple-user
wireless ad hoc network.
The dissertation concludes in Chapter 8. Future directions are also suggested.
Chapter 2
Representations of Wireless Channels
2.1 Introduction
Due to multipath propagation and Doppler spread, wireless channels are characterized
by frequency- and time-selectivity [66]. A radio signal, experiencing distortions through
transmission by fading, background noise, and interference of every sort, becomes stochastic
to an observer at the receiver. Small-scale fading (or simply fading) is the term to describe
the rapid fluctuations of the amplitudes, phases, or multipath delays of a signal over a short
period of time or travel distance, so that large-scale path loss may be ignored [66]. The
goal of channel estimation and equalization is mainly to combat small-scale fading.
Fading can be attributed to physical factors including multipath propagation, relative
motion between the transmitter and the receiver or surrounding objects, and the transmission
bandwidth of the signal [35]. The presence of reflecting objects and scatterers
makes the wireless channel constantly changing, which dissipates the signal energy and dis-
torts the signal in amplitude, phase, and time. Multiple versions of the transmitted signal
arrive at the receiver through different paths. The random amplitudes and phases of the
different multipath components induce fading. The relative motion between the transmitter
and the receiver, as well as the motion of objects within the wireless channel, induces
Doppler spreads, which are typically time-varying and are a further source of fading.
For channel estimation or tracking purposes, accurate modeling of the temporal evo-
lution of the channel plays an important role. A parsimonious and accurate channel repre-
sentation is always preferred. Among various models for channel time variations, the au-
toregressive (AR) process, particularly the first-order AR model, is regarded as a tractable
model to describe a time-varying channel, where the channel is assumed to be Markovian,
i.e., for the current channel symbol, the effect of channel symbols other than the immediately
preceding one is negligible [90]. This Markovian assumption has been verified for Rayleigh
fading channels in [90], by considering the mutual information between channel symbols.
The AR model has been used for time-varying channel estimation in [7,11,16,37,38].
The AR model, based on symbol-by-symbol update, is suitable for sequential time-
domain processing. When we deal with block processing schemes, it is more convenient to
use block-based channel models such as BEMs.
The BEM that is optimal in the mean square error (MSE) sense is the discrete Karhunen-Loève BEM (DKL-BEM),
which is a reduced-rank decomposition of a certain type of Doppler spectrum [72]. The
CE-BEM can be viewed as a special DKL-BEM based on a white Doppler spectrum, and
the DPS-BEM corresponds to the DKL-BEM with a rectangular Doppler spectrum [72].
In this chapter, we briefly review representations of time-varying channels. In Section
2.2, Jakes' model is introduced, which will be used as the model of the "real" channel in
the simulation examples of this dissertation. In Sections 2.3–2.5, CE-, OP-, and DPS-BEM
representations are discussed. The modeling errors of these BEMs are compared with one another
in Section 2.6 via a simulation example. Section 2.7 summarizes this chapter.
2.2 Jakes' Model
If we assume that many statistically independent scattering waves with random amplitudes
and phases reach the receiver with the phases uniformly lying in $[0, 2\pi)$, and there
is no dominant non-fading signal component present (no line-of-sight), by the central limit
theorem, the real and imaginary parts of the sum of the scattering waves are both Gaussian.
The signal envelope $A$ as a function of time $t$ obeys a Rayleigh distribution, which has a
probability density function (pdf) given by
\[
f_A(a) := \begin{cases} \dfrac{a}{\sigma^2} \exp\left( -\dfrac{a^2}{2\sigma^2} \right), & a \ge 0, \\ 0, & a < 0 \end{cases} \tag{2.1}
\]
with $\sigma^2$ being the time-average power of the received signal before envelope detection. The
phase $\Theta$ of the received signal is uniformly distributed with pdf
\[
f_\Theta(\theta) := \frac{1}{2\pi}, \quad \theta \in [0, 2\pi). \tag{2.2}
\]
The autocorrelation function of the received signal for two-dimensional isotropic scattering
and an omnidirectional receiving antenna is given by [12,71]
\[
R_A(\tau) = \sigma^2 \cos(\omega_c \tau) J_0(\omega_m \tau) \tag{2.3}
\]
where $\omega_c$ is the carrier radian frequency, $J_0(\cdot)$ is the zero-order Bessel function of the first
kind, and $\omega_m$ is the maximum Doppler radian frequency spread. Any model that attempts
to represent the Rayleigh flat fading narrow-band wireless channel has to exhibit the statistical
behaviors given by (2.1)–(2.3).
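As a rough numerical illustration of the central-limit argument behind (2.1), the following sketch sums a finite number of scattered waves with independent uniform phases and compares the sample mean of the envelope with the mean of the Rayleigh pdf, $\sigma\sqrt{\pi/2}$. The equal wave amplitudes, the number of waves (32), the number of trials, the seed, and the function names are assumptions made for this example only; they do not come from the text.

```python
import cmath
import math
import random


def scattered_sum(num_waves, sigma, rng):
    """Sum of num_waves equal-amplitude waves with i.i.d. uniform phases.

    The per-wave amplitude is chosen so that E|sum|^2 = 2 * sigma**2, i.e.
    the real and imaginary parts each have variance sigma**2.
    (Equal amplitudes are a simplifying assumption of this sketch.)
    """
    amp = sigma * math.sqrt(2.0 / num_waves)
    return sum(amp * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
               for _ in range(num_waves))


def rayleigh_pdf(a, sigma):
    """Rayleigh pdf as written in (2.1)."""
    if a < 0:
        return 0.0
    return (a / sigma**2) * math.exp(-a**2 / (2.0 * sigma**2))


rng = random.Random(0)  # fixed seed, only to make the run repeatable
sigma = 1.0
envelopes = [abs(scattered_sum(32, sigma, rng)) for _ in range(4000)]
mean_envelope = sum(envelopes) / len(envelopes)
rayleigh_mean = sigma * math.sqrt(math.pi / 2.0)  # mean of the pdf in (2.1)
print(mean_envelope, rayleigh_mean)
```

With a few dozen waves the two printed values should already be close; the agreement is statistical, not exact.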
Clarke summarized the important characteristics of fading channels and provided a use-
ful mathematical model [12]. According to this model, Jakes proposed a sum-of-sinusoids-
based simulator [35] that has been widely used and studied over the past decades. The
simulator supposes the received signal $S(t)$ to be a superposition of waves
\[
S(t) = E_0 \sum_{n=1}^{N} C_n \cos(\omega_c t + \omega_m t \cos A_n + \phi_n)
\]
where $E_0$ is the amplitude of the transmitted cosine wave, $C_n$ is a random variable representing
the attenuation of the $n$-th path, $A_n$ is a random variable representing the angle
of arrival of the $n$-th ray with respect to the direction of motion of the receiver, and $\phi_n$ is a
random variable representing the phase shift undergone by the $n$-th ray. Note that the
stochastic signal $S(t)$ representing the flat fading signal can be characterized by $N$ sets
of triples $(C_n, A_n, \phi_n)$. The random variables $C_n$, $A_n$, and $\phi_n$ are assumed statistically
independent.
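To reduce the complexity, Jakes' model selects
\begin{align*}
C_n &= \frac{1}{\sqrt{N}}, & n &= 1, 2, \ldots, N, \tag{2.4a} \\
A_n &= \frac{2\pi n}{N}, & n &= 1, 2, \ldots, N, \tag{2.4b} \\
\phi_n &= 0, & n &= 1, 2, \ldots, N. \tag{2.4c}
\end{align*}
Furthermore, $N$ is of the form $N = 4M + 2$ where $M$ is a positive integer.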
However, the simplification in (2.4) makes this simulation model deterministic and
wide-sense nonstationary [63,96]. In [96], a modified Jakes' simulator was proposed. It
is wide-sense stationary and its autocorrelation and cross-correlation functions match those of the
desired reference model exactly. Following [96], the normalized low-pass fading process of
the statistical sum-of-sinusoids simulation model is defined by
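\begin{align*}
X(t) &= X_c(t) + j X_s(t), \tag{2.5a} \\
X_c(t) &= \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \cos(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi), \tag{2.5b} \\
X_s(t) &= \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \sin(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi) \tag{2.5c}
\end{align*}
with
\[
\alpha_n = \frac{2\pi n - \pi + \theta}{4M}, \quad n = 1, 2, \ldots, M
\]
where $\theta$, $\phi$, and $\psi_n$ are statistically independent and uniformly distributed over $[-\pi, \pi)$
for all $n$. As $M \to \infty$, the envelope $|X|$ is Rayleigh distributed and the phase $\Theta_X(t)$ is
uniformly distributed over $[-\pi, \pi)$, for which the pdfs are given by
\begin{align*}
f_{|X|}(x) &= x \exp\left( -\frac{x^2}{2} \right), & x &\ge 0, \\
f_{\Theta_X}(\theta) &= \frac{1}{2\pi}, & \theta &\in [-\pi, \pi).
\end{align*}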
A minor defect, however, occurs in model (2.5) when $\omega_m = 0$ or the Doppler spread
is small: a Rayleigh distribution cannot be guaranteed [94]. This problem can be easily
resolved by replacing the common phase $\phi$ by $\phi_n$, which is also uniformly distributed over
$[-\pi, \pi)$ for all $n$. The simulation model is revised as [94]:
\begin{align*}
X(t) &= X_c(t) + j X_s(t), \tag{2.6a} \\
X_c(t) &= \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \cos(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi_n), \tag{2.6b} \\
X_s(t) &= \frac{2}{\sqrt{M}} \sum_{n=1}^{M} \sin(\psi_n) \cos(\omega_m t \cos\alpha_n + \phi_n). \tag{2.6c}
\end{align*}
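The revised sum-of-sinusoids model (2.6) can be sketched in a few lines. The code below is a minimal illustration, not the simulator used for the results in this dissertation: the function names, the choice $M = 8$, the Doppler spread of 100 Hz, the symbol duration of 25 microseconds, and the seed are all assumptions made for the example. It draws the random angles and phases, evaluates $X(t)$, and estimates the average power, which should be close to $E|X|^2 = 2$ as implied by the envelope pdf above.

```python
import math
import random


def draw_parameters(M, rng):
    """Draw theta, psi_n, phi_n uniformly over [-pi, pi) and form alpha_n."""
    theta = rng.uniform(-math.pi, math.pi)
    psi = [rng.uniform(-math.pi, math.pi) for _ in range(M)]
    phi = [rng.uniform(-math.pi, math.pi) for _ in range(M)]
    alpha = [(2.0 * math.pi * n - math.pi + theta) / (4.0 * M)
             for n in range(1, M + 1)]
    return alpha, psi, phi


def fading_sample(t, omega_m, alpha, psi, phi):
    """Evaluate X(t) = Xc(t) + j Xs(t) following (2.6a)-(2.6c)."""
    scale = 2.0 / math.sqrt(len(alpha))
    xc = 0.0
    xs = 0.0
    for a, p, f in zip(alpha, psi, phi):
        carrier = math.cos(omega_m * t * math.cos(a) + f)
        xc += math.cos(p) * carrier
        xs += math.sin(p) * carrier
    return complex(scale * xc, scale * xs)


rng = random.Random(1)               # fixed seed for repeatability
omega_m = 2.0 * math.pi * 100.0      # f_d = 100 Hz (illustrative value)
Ts = 25e-6                           # hypothetical symbol duration
powers = []
for _ in range(2000):                # independent channel realizations
    alpha, psi, phi = draw_parameters(8, rng)
    powers.append(abs(fading_sample(10 * Ts, omega_m, alpha, psi, phi)) ** 2)
mean_power = sum(powers) / len(powers)
print(mean_power)                    # statistically close to 2
```

Setting `omega_m = 0` in this sketch gives a channel that is constant in time for each realization, while the per-path phases $\phi_n$ still randomize the envelope across realizations.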
2.3 Complex Exponential Basis Expansion Model (CE-BEM)
Recently, deterministic complex exponential basis expansion models (CE-BEM) have
been widely investigated in wireless applications, especially when the multipath is caused
by a few strong reflectors, and path delays exhibit variations due to the kinematics of the
mobiles [24]. In these models, the time-varying taps are expressed as a superposition of
time-varying basis functions in modeling Doppler effects, with time-invariant coefficients.
By assigning temporal variations to basis functions, rapidly fading channels with coherence
time as small as a few tens of symbols can be captured. If the delay spread and the Doppler
spread of the channel (or at least the upper bounds of them) are known, one can infer the
basis functions of the CE-BEM [40]. Treating the basis functions as known parameters,
estimation of a time-varying process reduces to estimating time-invariant coefficients.
Consider a time-varying channel with impulse response $h(t;\tau)$ (response at time $t$ to a
unit impulse at time $t-\tau$) which includes transmit-receive filters as well as doubly-selective
propagation effects. Let $s(t)$ denote the complex baseband, continuous-time input signal
(with symbol duration $T_s$), and $x(t)$ denote the complex baseband, continuous-time received
signal. The noise-free received signal $x(t)$ is the convolution of $s(t)$ and $h(t;\tau)$ [64]:
\[
x(t) = \int_{0}^{\infty} h(t;\tau) s(t-\tau) \, d\tau. \tag{2.7}
\]
Let $H(f;\tau) = \int_{-\infty}^{\infty} h(t;\tau) e^{-j2\pi f t} \, dt$ be the Fourier transform of $h(t;\tau)$. If $|H(f;\tau)| \approx 0$
for $|\tau| > \tau_d$, then $\tau_d$ is defined as the delay spread of the channel; if $|H(f;\tau)| \approx 0$ for
$|f| > f_d$, then $f_d$ is defined as the Doppler spread of the channel [40]. Sampling $s(t)$, $x(t)$,
and $h(t;\tau)$ in (2.7) at the symbol rate, then for $t = nT_s \in [t_0, t_0 + TT_s)$, the sampled signal
$x(n) := x(t)|_{t=nT_s}$ has the representation
\[
x(n) = \sum_{l=0}^{L} h(n;l) s(n-l). \tag{2.8}
\]
Over the block interval $[t_0, t_0 + TT_s)$, the channel impulse response $\{h(n;l)\}_{n=0}^{T-1}$ can be
represented by $Q$ coefficients $\{h_q(l)\}_{q=1}^{Q}$ (which remain invariant throughout this block but
are allowed to change at the next block) and the corresponding $Q$ Fourier basis functions
that are common to each block. Then over the interval $[t_0, t_0 + TT_s)$, the discrete-time
baseband equivalent channel model for the block can be described as [39,40]:
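\begin{align*}
h(n;l) &= \sum_{q=1}^{Q} h_q(l) e^{j\omega_q n}, \tag{2.9a} \\
Q &:= 2 \lceil f_d T T_s \rceil + 1, \tag{2.9b} \\
L &:= \lfloor \tau_d / T_s \rfloor, \tag{2.9c} \\
\omega_q &:= \frac{2\pi}{T} \left( q - \frac{Q+1}{2} \right), \quad q = 1, \ldots, Q. \tag{2.9d}
\end{align*}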
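To make the bookkeeping in (2.8) and (2.9) concrete, the following sketch computes $Q$, $L$, and the frequencies $\omega_q$ from the channel spreads, synthesizes $h(n;l)$ from a set of BEM coefficients, and applies the time-varying convolution. It is only an illustration under stated assumptions: the function names, the numerical values ($f_d = 50$ Hz, $\tau_d = 60$ microseconds, $T = 400$, $T_s = 25$ microseconds), and the coefficient values are hypothetical, and the input is taken to be zero before the start of the block.

```python
import cmath
import math


def ce_bem_parameters(f_d, tau_d, T, Ts):
    """Return (Q, L, omegas) following (2.9b)-(2.9d)."""
    Q = 2 * math.ceil(f_d * T * Ts) + 1
    L = math.floor(tau_d / Ts)
    omegas = [2.0 * math.pi / T * (q - (Q + 1) / 2.0) for q in range(1, Q + 1)]
    return Q, L, omegas


def ce_bem_channel(coeffs, omegas, T):
    """Synthesize h(n;l) via (2.9a).

    coeffs[q][l] holds h_q(l) with q counted from 0 here;
    the result is indexed as h[n][l].
    """
    num_taps = len(coeffs[0])
    return [[sum(coeffs[q][l] * cmath.exp(1j * omegas[q] * n)
                 for q in range(len(omegas)))
             for l in range(num_taps)]
            for n in range(T)]


def noise_free_output(h, s):
    """Apply (2.8): x(n) = sum_l h(n;l) s(n-l), assuming s(n) = 0 for n < 0."""
    return [sum(h[n][l] * s[n - l] for l in range(len(h[n])) if n - l >= 0)
            for n in range(len(h))]


# Illustrative (assumed) values, not taken from the text.
Q, L, omegas = ce_bem_parameters(50.0, 60e-6, 400, 25e-6)
coeffs = [[0.1, 0.0, 0.05j],       # h_1(l), l = 0..L
          [1.0, 0.5, 0.2],         # h_2(l): the zero-frequency term
          [0.1j, 0.0, 0.05]]       # h_3(l)
h = ce_bem_channel(coeffs, omegas, 400)
x = noise_free_output(h, [1.0] * 400)
print(Q, L, omegas)
```

For these values the sketch gives $Q = 3$ and $L = 2$, so a block of 400 time-varying three-tap responses is described by only nine complex coefficients.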
2.4 Orthogonal Polynomial Basis Expansion Model (OP-BEM)
A time-varying channel over a fixed time interval can also be expressed as a superposition
of polynomials with invariant coefficients. Following [6], by a Taylor series expansion,
the continuous-time channel impulse response $h(t;\tau)$ within a window of time interval
$[t_0, t_0 + TT_s)$ with respect to a midpoint $\bar{n}T_s + t_0$ is given by
\[
h(t;\tau) = \sum_{i=0}^{K} \zeta_{\bar{n}}^{(i)}(\tau) \left( \frac{t - \bar{n}T_s - t_0}{T_s} \right)^{i} + R_K(t;\tau), \tag{2.10}
\]
where the coefficients $\zeta_{\bar{n}}^{(i)}(\tau)$ are defined as
\[
\zeta_{\bar{n}}^{(i)}(\tau) := \frac{T_s^{i}}{i!} \left[ \frac{d^{i} h(t;\tau)}{dt^{i}} \right]_{t=\bar{n}T_s+t_0} \tag{2.11}
\]
and $R_K(t;\tau)$ is the remainder of the Taylor series, given by
\[
R_K(t;\tau) := \frac{(t - \bar{n}T_s - t_0)^{K+1}}{(K+1)!} \left[ \frac{d^{K+1} h(t;\tau)}{dt^{K+1}} \right]_{t=s'} \tag{2.12}
\]
for some $s' \in [t, \bar{n}T_s + t_0]$. The polynomials $[(t - \bar{n}T_s - t_0)/T_s]^{i}$ ($i = 0, 1, \ldots, K$) serve as
the basis functions in (2.10). In mobile wireless channels, the bandwidth of $h(t;\tau)$ in $t$ (the
Doppler spread) is strictly bounded above by $v/\lambda$ where $v$ is the velocity of the mobile
and $\lambda$ is the carrier wavelength. Therefore, $h(t;\tau)$ can be differentiated to any order with
respect to $t$ in the mean square sense, and so (2.11) and (2.12) are well defined [6].
Since $h(t;\tau)$ is band-limited in $t$, for a given window size $T$, $\lim_{K\to\infty} |R_K(t;\tau)|^2 = 0$.
Thus with increasing $K$, the polynomial approximation becomes more and more accurate.
As pointed out in [6], increasing the polynomial order $K$ allows the window size to be
increased significantly, without the remainder term (2.12) becoming large. Sampling $h(t;\tau)$
every $T_s$ seconds ($t = nT_s + t_0 \in [t_0, t_0 + TT_s)$) and ignoring the error remainder term, we
have the discrete time-varying channel impulse response as follows
\[
h(n;l) = \sum_{i=0}^{K} \zeta^{(i)}(l) (n - \bar{n})^{i}, \tag{2.13}
\]
which is valid over a duration of $TT_s$ seconds ($T$ samples).
The polynomials $\{1, t, t^2, \ldots, t^K\}$ are linearly independent over $[-1, 1]$, but not
orthogonal. A QR-decomposition was suggested in [6] to generate a unitary matrix
offering an orthonormal set of basis vectors. Equivalently, via the Gram-Schmidt
procedure over the interval $[-1, 1]$, we get the Legendre polynomials [53]. By appropriate
scaling and translation of the (original) Legendre polynomials, we can obtain modified
Legendre polynomials which are orthonormal over the interval $[t_0, t_0 + TT_s]$. Sampling
these polynomials at the symbol interval $T_s$, we get the orthogonal polynomial basis
expansion model (OP-BEM). Let $p^{(i)}(\tilde{t})$ denote the orthonormal Legendre polynomial
of degree (order) $i$ over the interval $[-1, 1]$. To extend $[-1, 1]$ to $[t_0, t_0 + TT_s]$, we set
$t = (TT_s/2)\tilde{t} + t_0 + (TT_s/2)$, leading to $\tilde{t} = (2/(TT_s))[t - t_0] - 1$ and modified Legendre
polynomials $\tilde{p}^{(i)}(t) = p^{(i)}((2/(TT_s))[t - t_0] - 1)$ orthonormal over the interval $[t_0, t_0 + TT_s]$.
Sampling the polynomials $\tilde{p}^{(i)}(t)$ at $t = nT_s + t_0$ ($n = 0, 1, \ldots$) yields the discretized modified
Legendre polynomials
\[
\psi_i(n) := p^{(i)}\left( \frac{2n}{T} - 1 \right). \tag{2.14}
\]
The discretized modified Legendre polynomials up to degree (order) five are as follows:
\[
\begin{aligned}
\psi_0(n) &= 1,\\
\psi_1(n) &= c_1\left[\tfrac{2n}{T}-1\right],\\
\psi_2(n) &= c_2\left[\left(\tfrac{2n}{T}-1\right)^2 - \tfrac{1}{3}\right],\\
\psi_3(n) &= c_3\left[\left(\tfrac{2n}{T}-1\right)^3 - \tfrac{3}{5}\left(\tfrac{2n}{T}-1\right)\right],\\
\psi_4(n) &= c_4\left[\left(\tfrac{2n}{T}-1\right)^4 - \tfrac{6}{7}\left(\tfrac{2n}{T}-1\right)^2 + \tfrac{3}{35}\right],\\
\psi_5(n) &= c_5\left[\left(\tfrac{2n}{T}-1\right)^5 - \tfrac{10}{9}\left(\tfrac{2n}{T}-1\right)^3 + \tfrac{5}{21}\left(\tfrac{2n}{T}-1\right)\right],
\end{aligned}
\]
where $0 \le n \le T-1$ and $c_i = 1\big/\sqrt{\sum_{n=0}^{T-1}\psi_i(n)^2}$ for $i = 0,1,\ldots,5$.
Using the basis functions of (2.14), the polynomial representation of equation (2.13) is
written as
\[ h(n;l) = \sum_{i=0}^{K} h_i(l)\,\psi_i(n), \qquad 0 \le l \le L. \tag{2.15} \]
Note that the number of basis functions is $Q = K+1$.
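As an illustration of (2.14) and (2.15), the following minimal sketch samples the Legendre
polynomials on the grid $2n/T-1$ and scales each column to unit energy. The function name
`op_bem_basis` is hypothetical, and normalizing the degree-0 column as well is an assumption
made here for convenience; the sampled columns are only approximately orthogonal.

```python
import numpy as np
from numpy.polynomial import legendre


def op_bem_basis(T, K):
    """Return a T x (K+1) matrix whose i-th column is c_i * P_i(2n/T - 1), n = 0..T-1."""
    n = np.arange(T)
    x = 2.0 * n / T - 1.0                  # map the block 0..T-1 into [-1, 1)
    B = legendre.legvander(x, K)           # column i holds the Legendre polynomial P_i(x)
    return B / np.linalg.norm(B, axis=0)   # c_i scaling: unit energy over the block


T, K = 400, 4
Psi = op_bem_basis(T, K)
gram = Psi.T @ Psi                         # close to, but not exactly, the identity
print(np.round(gram, 3))
```

The Gram matrix shows how far the sampled polynomials are from exact orthogonality for the
chosen block length.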
2.5 Discrete Prolate Spheroidal Basis Expansion Model (DPS-BEM)
When CE-BEM is used to describe a band-limited channel, this truncated discrete Fourier
transform (DFT)-based model has the following drawback: the rectangular window associated
with the DFT introduces spectral leakage, i.e., the energy at each individual frequency leaks
to the full frequency range [65]. An effect similar to the Gibbs phenomenon results in
significant amplitude and phase distortion at the beginning and the end of the block [94].
The modeling error of CE-BEM may cause a noticeable floor in bit error rate (BER) curves, as
shown in [2]. Using OP-BEM reduces the spectral leakage to some extent [51], but the
polynomial functions are neither time-limited nor band-limited, and the square bias of
OP-BEM varies significantly over the range of the Doppler spread [94].
An ideal basis function should have at least two properties: it is band-limited to the
normalized frequency range $[-f_dT_s, f_dT_s]$, and its energy is time-concentrated in a
certain time interval $0 \le n \le T-1$. Given the maximum normalized Doppler bandwidth
$f_dT_s$ and the window size $T$, we seek a sequence to maximize
\[ \lambda = \frac{\sum_{n=0}^{T-1}|u(n)|^2}{\sum_{m=-\infty}^{\infty}|u(m)|^2} \tag{2.16} \]
with the band-limited constraint
\[ u(n) = \int_{-f_dT_s}^{f_dT_s} U(f)\,e^{j2\pi fn}\,df, \]
where $U(f) = \sum_{m=-\infty}^{\infty} u(m)\,e^{-j2\pi fm}$.
The discrete prolate spheroidal (DPS) sequences $\{u_i(n)\}$ give the solution of the above
constrained maximization problem [69]. They are defined as the real-valued solutions of
\[ \sum_{n=0}^{T-1} \frac{\sin[2\pi(n-m)f_dT_s]}{\pi(n-m)}\,u_i(n) = \lambda_i\,u_i(m) \]
for $i = 1,\ldots,T$ and $-\infty < m < \infty$. For the discrete time index $0 \le n \le T-1$,
the $i$-th time-limited DPS vector
$\mathbf{u}_i := [\,u_i(0)\;\; u_i(1)\;\; \cdots\;\; u_i(T-1)\,]^T$ is the $i$-th eigenvector
of a matrix $\mathbf{C}$:
\[ \mathbf{C}\mathbf{u}_i = \lambda_i\mathbf{u}_i, \tag{2.17} \]
where the $(n,m)$-th entry of the $T \times T$ matrix $\mathbf{C}$ is
\[ [\mathbf{C}]_{n,m} = \frac{\sin[2\pi(n-m)f_dT_s]}{\pi(n-m)} \]
and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_T$ are the eigenvalues of $\mathbf{C}$.
The DPS sequences are orthonormal on the finite time interval $0 \le n \le T-1$, and
orthogonal on the doubly infinite interval, i.e.,
\[ \sum_{n=0}^{T-1} u_i(n)u_k(n) = \lambda_i \sum_{m=-\infty}^{\infty} u_i(m)u_k(m) = \delta(i-k). \]
The band-limited (infinite) sequence $\{u_1(m)\}$ has the maximum energy concentration in
$0 \le m \le T-1$, $\{u_2(m)\}$ is the next band-limited sequence that has the most energy
concentration among the DPS sequences orthogonal to $\{u_1(m)\}$, and so on. By (2.16), the
eigenvalues $\lambda_i$ are a measure of energy concentration; they are clustered near 1 for
$i \le \lceil 2f_dT_sT \rceil + 1$ and drop rapidly toward zero when
$i > \lceil 2f_dT_sT \rceil + 1$ [94]. Therefore, the number of dimensions of time-limited
snapshots of a band-limited channel is approximately given by [69]
\[ Q = \lceil 2f_dT_sT \rceil + 1. \tag{2.18} \]
All the properties described so far make it possible to greatly reduce spectral leakage induced
by the CE-BEM, by using several DPS sequences to form the basis set to approximate a
band-limited time-varying channel.
In a DPS-BEM representation, we assume that
\[ h(n;l) = \sum_{q=1}^{Q} h_q(l)\,u_q(n). \tag{2.19} \]
The square bias of the DPS-BEM in approximating a time-varying channel is several orders of
magnitude lower than that of the CE-BEM over the range of Doppler spreads [94].
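The eigenvector characterization (2.17) and the dimension rule (2.18) can be sketched
numerically as follows. This is only an illustration under stated assumptions: the helper
name `dps_basis` and the values $T = 256$, $f_dT_s = 0.01$ are hypothetical, and a dense
eigendecomposition is used for simplicity rather than any method prescribed in the text.

```python
import numpy as np


def dps_basis(T, fd_Ts, Q=None):
    """Return all eigenvalues of C (descending) and the first Q time-limited DPS vectors."""
    n = np.arange(T)
    k = n[:, None] - n[None, :]
    # sin(2*pi*k*W)/(pi*k) = 2W*sinc(2W*k), with np.sinc(x) = sin(pi*x)/(pi*x);
    # this also gives the correct diagonal value 2W at k = 0.
    C = 2.0 * fd_Ts * np.sinc(2.0 * fd_Ts * k)
    lam, U = np.linalg.eigh(C)             # eigh returns ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]         # reorder so that lambda_1 >= lambda_2 >= ...
    if Q is None:
        Q = int(np.ceil(2.0 * fd_Ts * T)) + 1   # dimension rule (2.18)
    return lam, U[:, :Q]


lam, U = dps_basis(T=256, fd_Ts=0.01)
print(U.shape[1], np.round(lam[:10], 4))
```

Printing the leading eigenvalues makes the transition from values near 1 to values near 0
around index $\lceil 2f_dT_sT \rceil + 1$ visible.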
2.6 Modeling Error of BEMs
We illustrate the modeling error of CE-, OP-, and DPS-BEMs by a simulation example.
2.6.1 LS Approximation by Basis Expansion Models
In a BEM, we assume that the time-varying channel satisfies
\[ h(n;l) = \sum_{q=1}^{Q} h_q(l)\,\phi_q(n), \tag{2.20} \]
where $\phi_q(n)$ is the $q$-th basis function (corresponding to $e^{j\omega_qn}$ in (2.9),
$\psi_{q-1}(n)$ in (2.15), and $u_q(n)$ in (2.19)), and $Q$ is the number of basis functions
($Q = K+1$ for (2.15)). However, the true channel may not exactly follow this expression,
since some modeling error always occurs. We revise (2.20) as
\[ h(n;l) = \sum_{q=1}^{Q} h_q(l)\,\phi_q(n) + e(n;l), \]
where $e(n;l)$ denotes the modeling error. By the orthogonality principle, $e(n;l)$ is
orthogonal to the basis set $\{\phi_q(n)\}_{q=1}^{Q}$ when the square error
$\sum_{n=0}^{T-1}|e(n;l)|^2$ is minimized.
Then, for mutually orthogonal basis functions,
\[ \sum_{n=0}^{T-1} h(n;l)\,\phi_{q'}^{*}(n) = \sum_{q=1}^{Q} h_q(l) \sum_{n=0}^{T-1} \phi_q(n)\,\phi_{q'}^{*}(n) = h_{q'}(l) \sum_{n=0}^{T-1} \left|\phi_{q'}(n)\right|^2. \]
Therefore
\[ h_{q'}(l) = \frac{\sum_{n=0}^{T-1} h(n;l)\,\phi_{q'}^{*}(n)}{\sum_{n=0}^{T-1} \left|\phi_{q'}(n)\right|^2}. \]
The LS approximation by a BEM is given by
\[ \hat{h}(n;l) = \sum_{q=1}^{Q} \frac{\sum_{n'=0}^{T-1} h(n';l)\,\phi_q^{*}(n')}{\sum_{n'=0}^{T-1} |\phi_q(n')|^2}\;\phi_q(n). \tag{2.21} \]
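The projection in (2.21) can be sketched as below. The function names are hypothetical; the
first variant uses the per-basis projection, which assumes mutually orthogonal basis
functions, and the second uses a generic least-squares solve that also covers
non-orthogonal bases. The complex-exponential test basis is an assumption chosen only so
that the example is self-contained.

```python
import numpy as np


def orthogonal_bem_fit(h, B):
    """Projection (2.21). h: (T, L+1) channel taps; B: (T, Q) orthogonal basis."""
    energy = np.sum(np.abs(B) ** 2, axis=0)          # sum_n |phi_q(n)|^2
    coeffs = (B.conj().T @ h) / energy[:, None]      # h_q(l) for every q and l
    return coeffs, B @ coeffs


def ls_bem_fit(h, B):
    """Generic least-squares fit; does not require an orthogonal basis."""
    coeffs, *_ = np.linalg.lstsq(B, h, rcond=None)
    return coeffs, B @ coeffs


T, Q, L = 64, 3, 2
n = np.arange(T)[:, None]
k = np.arange(Q) - (Q - 1) // 2                      # integer frequency offsets -1, 0, 1
B = np.exp(2j * np.pi * k[None, :] * n / T)          # orthogonal over T samples
rng = np.random.default_rng(0)
c_true = rng.standard_normal((Q, L + 1)) + 1j * rng.standard_normal((Q, L + 1))
h = B @ c_true                                       # a channel that lies exactly in the span
c_orth, h_orth = orthogonal_bem_fit(h, B)
c_ls, h_ls = ls_bem_fit(h, B)
```

For a channel outside the span of the basis, `h - h_orth` is the modeling error $e(n;l)$.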
2.6.2 Simulation Model
In the simulations of this dissertation, we use the modified Jakes' model (2.6) to represent
the "real" channel. We emphasize that BEM representations are only used for processing at the
receiver. A discrete-time baseband Rayleigh fading channel (which can be SISO, SIMO, or MIMO
in the subsequent chapters) of order $L$ is generated (see (2.8)). For different taps (i.e.,
different $l$'s), the $h(n;l)$'s are mutually independent, and for a given tap, we follow
(2.6) to generate $h(n;l)$ by sampling $X(t)$ with symbol period $T_s$:
\[ h(n;l) = X(t)\big|_{t=nT_s}. \]
In simulations, we take $M = 25$ in (2.6).
[Figure 2.1 (plot): normalized channel MSE (dB) versus fd (Hz), for fd from 0 to 200 Hz.
Plot title: Modeling Error of BEMs: L = 2, Ts = 25 μs, T = 400, 1000 runs. Curves: CE-BEM
(Q = 3, 5, 7); OP-BEM (Q = 4, 5, 6); DPS-BEM with known Doppler (Q = 4, 5, 6); DPS-BEM with
unknown Doppler (Q = 4, 5, 6).]
Figure 2.1: Modeling error of CE-, OP-, and DPS-BEMs in approximating a three-tap
(L = 2) Rayleigh fading channel following Jakes' model.
2.6.3 Simulation Example: Modeling Error of CE-, OP-, and DPS-BEMs in
Approximating a Doubly-Selective Channel
We consider a system with a carrier frequency of 2 GHz and a data rate of 40 kBd (kilo-Bauds),
so that $T_s = 25\,\mu\mathrm{s}$. The maximum Doppler spread (in Hz) $f_d = \omega_m/(2\pi)$
ranges from 0 Hz to 200 Hz (i.e., the normalized Doppler shift $f_dT_s$ ranges from 0 to
0.005), corresponding to a maximum mobile velocity in the range 0 to 108 km/h.
A SISO three-tap ($L = 2$) Rayleigh fading channel is generated using (2.6). We try to
approximate this Jakes' model using different BEMs: CE-BEM, OP-BEM, DPS-BEM with known
Doppler spread, and DPS-BEM with unknown Doppler spread. If the Doppler spread is known, we
can follow (2.17) to obtain the DPS sequences using the exact Doppler spread, whereas we can
only use predetermined DPS sequences if the Doppler spread is unknown. In this example, for
the unknown-Doppler case we assume $f_d = 200$ Hz in (2.17) to get the DPS sequences.
We take a data record length of $T = 400$ symbols, average over 1000 realizations of randomly
generated channels, and plot the normalized channel mean square error (NCMSE) defined as
\[ \mathrm{NCMSE} := \frac{\sum_{i=1}^{1000}\sum_{n=0}^{T-1}\sum_{l=0}^{2} \left|h^{(i)}(n;l) - \hat{h}^{(i)}(n;l)\right|^2}{\sum_{i=1}^{1000}\sum_{n=0}^{T-1}\sum_{l=0}^{2} \left|h^{(i)}(n;l)\right|^2}, \]
where $h^{(i)}(n;l)$ denotes the realization of the channel in the $i$-th run, and
$\hat{h}^{(i)}(n;l)$ denotes the BEM-based approximation by (2.21). For CE-BEM, we plot the
NCMSE curves with $Q = 3$, 5, and 7 (note that in (2.9), only odd $Q$ is allowed). For OP- and
DPS-BEM representations, we take $Q = 4$, 5, and 6.
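The NCMSE definition above amounts to a ratio of summed squared magnitudes over all runs,
time indices, and taps. A minimal sketch follows; the function name `ncmse_db` and the array
layout (runs, T, L+1) are assumptions made for illustration.

```python
import numpy as np


def ncmse_db(h_true, h_hat):
    """NCMSE in dB. Both arrays have shape (runs, T, L+1); sums run over all three axes."""
    err = np.sum(np.abs(h_true - h_hat) ** 2)
    ref = np.sum(np.abs(h_true) ** 2)
    return 10.0 * np.log10(err / ref)


rng = np.random.default_rng(0)
h = rng.standard_normal((10, 400, 3)) + 1j * rng.standard_normal((10, 400, 3))
value = ncmse_db(h, 0.9 * h)   # a 10% amplitude error on every sample
print(value)
```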
Using more basis functions clearly reduces the modeling error, which is confirmed by
Figure 2.1. The OP- and DPS-BEM with known Doppler spreads outperform CE-BEM significantly
for small values of $f_d$. As $f_d$ grows, OP-BEM deteriorates quickly, whereas DPS-BEM with
known Doppler spread is still much better than the other two. If the Doppler spread is
unknown, the performance of DPS-BEM is a little worse, but it still outperforms CE-BEM. In
Figure 2.1, the NCMSE of DPS-BEM is at least two orders of magnitude lower than that of the
CE-BEM, whether the Doppler spread is known or not; DPS-BEM is the best among the three for
describing a band-limited channel.
For a fixed $Q$, the NCMSEs of the CE-BEM and DPS-BEM with unknown Doppler spread fluctuate
mildly over the range of Doppler spreads. In these two scenarios, the BEMs are both
band-limited. If the "true" channel (that is also band-limited within the Doppler shifts)
lies within the frequency band of the BEM, the resulting error should not fluctuate
significantly for different Doppler spreads.
2.7 Conclusions
In this chapter, we reviewed characteristics and representations of wireless channels. We
first discussed Jakes' model, which will be used as the "real" channel in simulation examples
in the following chapters. For channel estimation and data processing at the receiver, a BEM,
a more parsimonious representation that is independent of the "real" channel, will be used to
describe the temporal variation of the channel. We discussed CE-, OP-, and DPS-BEMs. Although
the CE-BEM is more convenient in theoretical analysis, its modeling error is noticeable due
to spectral leakage. We may employ OP- and DPS-BEMs to reduce this leakage. We also compared
the modeling error of the three BEMs: the DPS-BEM has the minimum modeling error among the
three, since spectral leakage is greatly reduced by the energy concentration of the DPS
sequences.
Chapter 3
First-Order Statistics-Based Estimation of Doubly-Selective Channels
3.1 Introduction
An estimator for time-invariant frequency-selective channels, using periodic superim-
posed training and the first-order statistics of the observations, was proposed in [82]. This
estimator was soon extended to doubly-selective channels by exploiting CE-BEM in [81].
We start with this CE-BEM-based channel estimator in [81], which offers us the basic
framework of our superimposed training-based channel estimation schemes.
In this chapter, we first review the first-order statistics-based doubly-selective channel
estimator of [81] in Section 3.2. By exploiting the band-limitedness of DPS sequences, in
Section 3.3 this estimator is extended to DPS-BEM. Since polynomial models are not band-
limited, we propose a more general estimator for OP-BEM in Section 3.4, which can apply
to arbitrary BEM representations. In Section 3.5, we extend our CE- and DPS-BEM-based
channel estimators to multiuser systems. Exploiting the band-limitedness of CE- and
DPS-BEMs, the channel estimation across different users is decoupled by assigning user-specific
training sequences to different frequencies for distinct users. Our approaches are illustrated
by simulation examples in Section 3.6. Section 3.7 concludes this chapter.
3.2 First-Order Statistics-Based Channel Estimation Using CE-BEM [81]
Consider a single-input multiple-output (SIMO) discrete-time baseband communication system.
Let $\{s(n)\}$ denote the input symbol sequence that is transmitted over an FIR linear channel
with $N$ outputs and discrete-time impulse response $\{h(n;l)\}$ ($N$-column vector channel
response at time $n$ to a unit input at time $n-l$). The vector channel may be the result of
multiple receiver antennas or over-sampling at the receiver. The channel output is given by
\[ x(n) = \sum_{l=0}^{L} h(n;l)\,s(n-l). \tag{3.1} \]
The noisy measurement is given by
\[ y(n) = x(n) + v(n), \tag{3.2} \]
where $v(n)$ is an $N$-column white complex-Gaussian noise vector. To allow for mean-value
ambiguity, we take $E\{v(n)\} = \mathbf{m}$, with $\mathbf{m}$ unknown. In practice, linear
systems arise because of linearization about some operating (set) point (e.g., the "bias" in
BJT/FET amplifiers). These set points are typically unknown (at least not known precisely) a
priori, and one does not normally worry about them since unknown means are estimated and
removed before processing (blocked by capacitor-coupling, etc.) and they are not needed in
any processing. However, we will initially use the first-order statistics, i.e.,
$E\{y(n)\}$, of the noisy data, so we must include a term such as a nonzero $\mathbf{m}$.
Channel taps are superpositions of complex exponentials weighted by time-invariant
coefficients in CE-BEM. The time-varying SIMO channel response is given by (2.9):
\[ h(n;l) = \sum_{q=1}^{Q} h_q(l)\,e^{j\omega_qn}, \tag{3.3} \]
for
\[ Q := 2\lceil f_dTT_s \rceil + 1, \qquad L := \lfloor \tau_d/T_s \rfloor, \qquad \omega_q := \frac{2\pi}{T}\left(q - \frac{1}{2} - \frac{Q}{2}\right), \quad q = 1,2,\ldots,Q, \]
where $\tau_d$ is the (multipath) delay-spread and $f_d$ is the Doppler spread. The above
representation is valid over a duration of $TT_s$ sec. with symbol interval $T_s$ sec. If
$\tau_d$ and $f_d$ (or their upper bounds) are known (typically true), then $h(n;l)$ is
unknown only up to the time-invariant quantities $h_q(l)$.
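The CE-BEM parameters in (3.3) can be sketched as follows. The function names are
hypothetical and the numerical values are chosen only for illustration; the point is that
the frequencies $\omega_q$ lie on a grid of spacing $2\pi/T$ centred on zero, so the basis
vectors are orthogonal over a block of $T$ samples.

```python
import numpy as np


def ce_bem_order(fd, T, Ts):
    """Number of CE-BEM basis functions, Q = 2*ceil(fd*T*Ts) + 1."""
    return 2 * int(np.ceil(fd * T * Ts)) + 1


def ce_bem_basis(T, Q):
    """Return the frequencies omega_q and the T x Q matrix with entries exp(j*omega_q*n)."""
    q = np.arange(1, Q + 1)
    omega = 2.0 * np.pi / T * (q - 0.5 - Q / 2.0)
    n = np.arange(T)[:, None]
    return omega, np.exp(1j * omega[None, :] * n)


Q = ce_bem_order(fd=150.0, T=400, Ts=25e-6)   # fd*T*Ts = 1.5, so Q = 5
omega, B = ce_bem_basis(T=64, Q=Q)
gram = B.conj().T @ B                         # equals T times the identity
```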
In superimposed training, the transmitted signal $\{s(n)\}$ is the superposition of the
information sequence $\{b(n)\}$ and a training sequence $\{c(n)\}$, i.e.,
\[ s(n) = b(n) + c(n). \tag{3.4} \]
Assume the following:
(H3.2.1) The time-varying channel $\{h(n;l)\}$ satisfies (3.3) where the frequencies
$\omega_q$ ($q = 1,2,\ldots,Q$) are distinct and known with $\omega_q \in [0,2\pi)$. Also
$N \ge 1$.
(H3.2.2) The information sequence $\{b(n)\}$ is zero-mean, white with
$E\{|b(n)|^2\} = \sigma_b^2$.
(H3.2.3) The measurement noise $\{v(n)\}$ may be nonzero-mean ($E\{v(n)\} = \mathbf{m}$),
white, uncorrelated with $\{b(n)\}$, with
$E\{[v(n+\tau)-\mathbf{m}][v(n)-\mathbf{m}]^H\} = \sigma_v^2 I_N\delta(\tau)$. The mean vector
$\mathbf{m}$ may be unknown.
(H3.2.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random
periodic sequence with period $P$.
By (H3.2.4), we have
\[ c_m := \frac{1}{P}\sum_{n=0}^{P-1} c(n)\,e^{-j\alpha_mn}, \tag{3.5a} \]
\[ c(n) = \sum_{m=0}^{P-1} c_m\,e^{j\alpha_mn}, \quad \text{for all } n, \tag{3.5b} \]
\[ \alpha_m = 2\pi m/P, \quad m = 0,1,\ldots,P-1. \tag{3.5c} \]
The coefficients $c_m$ are known at the receiver since the training sequence $\{c(n)\}$ is
known.
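The pair (3.5a) and (3.5b) is the discrete Fourier series of one training period, so the
coefficients $c_m$ can be obtained with an FFT. The sketch below is illustrative only: the
chirp sequence $c(n) = e^{j\pi n^2/P}$ with $P = 8$ is an assumed example, picked because
all of its coefficients are nonzero, and is not a training sequence taken from the text.

```python
import numpy as np


def training_dfs(c_period):
    """Return alpha_m = 2*pi*m/P and c_m = (1/P) * sum_n c(n) * exp(-j*alpha_m*n)."""
    P = len(c_period)
    alpha = 2.0 * np.pi * np.arange(P) / P
    c_m = np.fft.fft(c_period) / P
    return alpha, c_m


P = 8
c_period = np.exp(1j * np.pi * np.arange(P) ** 2 / P)    # assumed example sequence
alpha, c_m = training_dfs(c_period)

# Rebuild c(n) over three periods from (3.5b) to confirm the periodic extension.
n = np.arange(3 * P)
c_rebuilt = np.exp(1j * np.outer(n, alpha)) @ c_m
```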
We then have
\[ y(n) = \sum_{l=0}^{L}\sum_{q=1}^{Q} h_q(l)\,e^{j\omega_qn}\left[b(n-l) + \sum_{m=0}^{P-1} c_m e^{j\alpha_m(n-l)}\right] + v(n). \]
By (H3.2.2) and (H3.2.3), the expectation of the observation at time $n$ is
\[ E\{y(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1}\left[\sum_{l=0}^{L} c_m h_q(l)\,e^{-j\alpha_ml}\right] e^{j(\omega_q+\alpha_m)n} + \mathbf{m}. \]
Defining
\[ d_{mq} := \sum_{l=0}^{L} c_m h_q(l)\,e^{-j\alpha_ml}, \tag{3.6} \]
we then have
\[ E\{y(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} d_{mq}\,e^{j(\omega_q+\alpha_m)n} + \mathbf{m}. \tag{3.7} \]
Suppose that we pick $P$ to be such that the $(\omega_q+\alpha_m)$'s are all distinct for any
choice of $m$ and $q$; for instance, take $TP^{-1} \ge Q$. In fact, we will take
$TP^{-1} = K \ge Q$ where $K$ is an integer, so that the $\omega_q$'s and $\alpha_m$'s are on
the same frequency grid of resolution $T^{-1}$. Then the sequence $E\{y(n)\}$ is periodic with
cycle frequencies $(\omega_q+\alpha_m)$, for $1 \le q \le Q$ and $0 \le m \le P-1$.
By (3.7),
\[ y(n) = E\{y(n)\} + e(n) = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} d_{mq}\,e^{j(\omega_q+\alpha_m)n} + \mathbf{m} + e(n), \]
where $\{e(n)\}$ is a zero-mean random "error" sequence.
Define the cost function
\[ J = \sum_{n=0}^{T-1}\|e(n)\|^2 = \sum_{n=0}^{T-1}\left\| y(n) - \sum_{q=1}^{Q}\sum_{m=0}^{P-1} d_{mq}\,e^{j(\omega_q+\alpha_m)n} - \mathbf{m} \right\|^2. \tag{3.8} \]
Choose the $d_{mq}$'s to minimize $J$. We must have
\[ \left.\frac{\partial J}{\partial d_{mq}^{*}}\right|_{d_{mq}=\hat{d}_{mq}} = \sum_{n=0}^{T-1} e^{-j(\omega_q+\alpha_m)n}\left[ y(n) - \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{d}_{m'q'}\,e^{j(\omega_{q'}+\alpha_{m'})n} - \mathbf{m} \right] = 0 \]
for each $d_{mq}$. A consistent mean-square (m.s.) estimate of $d_{mq}$, for
$\omega_q+\alpha_m \ne 0$, follows as
\[ \hat{d}_{mq} = \frac{1}{T}\sum_{n=0}^{T-1} y(n)\,e^{-j(\omega_q+\alpha_m)n}. \tag{3.9} \]
It follows from (3.7) and (3.9) that
\[ E\{\hat{d}_{mq}\} = d_{mq} \tag{3.10} \]
for $\omega_q+\alpha_m \ne 0$. As $T \to \infty$, $\hat{d}_{mq} \to d_{mq}$ m.s. if
$\omega_q+\alpha_m \ne 0$, and $\hat{d}_{mq} \to d_{mq}+\mathbf{m}$ m.s. if
$\omega_q+\alpha_m = 0$.
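The estimate (3.9) is a bank of correlations of $y(n)$ with the cycle frequencies. The
sketch below checks (3.10) on the mean signal (3.7) alone, i.e., with the information
sequence and the zero-mean part of the noise left out, under the assumptions $N = 1$,
$T = 64$, $P = 8$, $Q = 3$, $L = 2$, and a chirp training sequence; all names and values are
hypothetical choices for illustration.

```python
import numpy as np


def cycle_coeffs(y, omega, alpha):
    """(3.9): d_hat[m, q] = (1/T) * sum_n y(n) * exp(-j*(omega_q + alpha_m)*n)."""
    y = np.asarray(y)
    if y.ndim == 1:
        y = y[:, None]                                 # treat a scalar channel as N = 1
    T = y.shape[0]
    freqs = alpha[:, None] + omega[None, :]            # shape (P, Q)
    E = np.exp(-1j * freqs[:, :, None] * np.arange(T))  # shape (P, Q, T)
    return E @ y / T                                   # shape (P, Q, N)


T, P, Q, L = 64, 8, 3, 2
omega = 2.0 * np.pi / T * (np.arange(1, Q + 1) - 0.5 - Q / 2.0)
alpha = 2.0 * np.pi * np.arange(P) / P
c_m = np.fft.fft(np.exp(1j * np.pi * np.arange(P) ** 2 / P)) / P
rng = np.random.default_rng(1)
h = rng.standard_normal((Q, L + 1)) + 1j * rng.standard_normal((Q, L + 1))  # h_q(l)
mean = 0.3 - 0.2j                                      # unknown noise mean (scalar, N = 1)

# d_mq from (3.6), then the mean signal (3.7)
d = c_m[:, None] * (np.exp(-1j * alpha[:, None] * np.arange(L + 1)) @ h.T)
freqs = alpha[:, None] + omega[None, :]
y_mean = np.einsum('mq,mqn->n', d, np.exp(1j * freqs[:, :, None] * np.arange(T))) + mean

d_hat = cycle_coeffs(y_mean, omega, alpha)[:, :, 0]
center = (Q - 1) // 2                                  # index of the q with omega_q = 0
```

With $T = KP$ and $K \ge Q$, all cycle frequencies are distinct bins of a length-$T$ DFT, so
the only coefficient contaminated by the mean is the one with $\omega_q+\alpha_m = 0$.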
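It is established in [82] that given $d_{mq}$ for $1 \le q \le Q$ and $1 \le m \le P-1$, we
can (uniquely) estimate the $h_q(l)$'s if $P \ge L+2$, $\alpha_m \ne 0$, and $c_m \ne 0$ for
all $m \ne 0$. Since the mean $\mathbf{m}$ is unknown and $\omega_q+\alpha_m = 0$ only when
the index $m = 0$, we will omit the term $m = 0$ for further discussion. For
$1 \le m \le P-1$, define the $NQ$-column vector
\[ \mathcal{D}_m := [\,d_{m1}^T,\; d_{m2}^T,\; \ldots,\; d_{mQ}^T\,]^T, \tag{3.11} \]
and for $0 \le l \le L$, define the $NQ$-column vector
\[ \mathcal{H}_l := [\,h_1^T(l),\; h_2^T(l),\; \ldots,\; h_Q^T(l)\,]^T. \tag{3.12} \]
Then by (3.6), we have
\[ \mathcal{D}_m = \sum_{l=0}^{L} c_m\,e^{-j\alpha_ml}\,\mathcal{H}_l \]
for $1 \le m \le P-1$. Define the $NQ(P-1) \times NQ(L+1)$ matrix
\[ \mathcal{C} := \left(\mathrm{diag}\{c_1,\ldots,c_{P-1}\}\,\mathcal{V}\right) \otimes I_{NQ}, \tag{3.13} \]
where
\[ \mathcal{V} := \begin{bmatrix} 1 & e^{-j\alpha_1} & \cdots & e^{-j\alpha_1L} \\ 1 & e^{-j\alpha_2} & \cdots & e^{-j\alpha_2L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{P-1}} & \cdots & e^{-j\alpha_{P-1}L} \end{bmatrix}, \tag{3.14} \]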
the $NQ(L+1)$-column vector
\[ \mathcal{H} := [\,\mathcal{H}_0^T,\; \mathcal{H}_1^T,\; \ldots,\; \mathcal{H}_L^T\,]^T, \tag{3.15} \]
and the $NQ(P-1)$-column vector
\[ \mathcal{D} := [\,\mathcal{D}_1^T,\; \mathcal{D}_2^T,\; \ldots,\; \mathcal{D}_{P-1}^T\,]^T. \tag{3.16} \]
Then (3.6) leads to
\[ \mathcal{C}\mathcal{H} = \mathcal{D}. \tag{3.17} \]
Since the $\alpha_m$'s are distinct and $c_m \ne 0$ for all $m$,
$\mathrm{rank}(\mathcal{C}) = NQ(L+1)$ if $P \ge L+2$; hence, we can determine the $h_q(l)$'s
uniquely. Define $\hat{\mathcal{D}}_m$ as in (3.11) with $d_{mq}$ replaced with
$\hat{d}_{mq}$. Define $\hat{\mathcal{D}}$ as in (3.16) with $\mathcal{D}_m$ replaced with
$\hat{\mathcal{D}}_m$. Then the estimate of $\mathcal{H}$ is given by
\[ \hat{\mathcal{H}} = (\mathcal{C}^H\mathcal{C})^{-1}\mathcal{C}^H\hat{\mathcal{D}}. \tag{3.18} \]
By (3.10) and (3.17), it follows that
\[ E\{\hat{\mathcal{H}}\} = \mathcal{H}. \tag{3.19} \]
Denote the corresponding estimate of $h_q(l)$ by $\hat{h}_q(l)$ for $q = 1,2,\ldots,Q$ and
$l = 0,1,\ldots,L$. Following (3.3), the time-varying channel coefficients are given by
\[ \hat{h}(n;l) = \sum_{q=1}^{Q} \hat{h}_q(l)\,e^{j\omega_qn}, \qquad l = 0,1,\ldots,L, \quad 0 \le n \le T-1. \tag{3.20} \]
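The linear system (3.17) and its least-squares solution (3.18) can be sketched as follows.
The function name, the array layout (index order $m$, $q$, output), and the test values are
assumptions for illustration; the check feeds in exact $d_{mq}$ values computed from (3.6),
so it only confirms identifiability for $P \ge L+2$, not statistical performance.

```python
import numpy as np


def ce_bem_ls_solve(d_hat, c_m, alpha, L):
    """Solve (3.17) in the LS sense. d_hat: (P, Q, N), row m = 0 is ignored.
    Returns h_hat with shape (L+1, Q, N)."""
    P, Q, N = d_hat.shape
    m = np.arange(1, P)
    V = np.exp(-1j * alpha[m, None] * np.arange(L + 1))   # (P-1) x (L+1) matrix of (3.14)
    A = c_m[m, None] * V                                  # diag{c_1..c_{P-1}} V
    C = np.kron(A, np.eye(N * Q))                         # matrix of (3.13)
    D = d_hat[1:].reshape(-1)                             # stacked D_1, ..., D_{P-1} as in (3.16)
    H, *_ = np.linalg.lstsq(C, D, rcond=None)             # LS solution, equivalent to (3.18)
    return H.reshape(L + 1, Q, N)


P, Q, N, L = 8, 3, 2, 2
alpha = 2.0 * np.pi * np.arange(P) / P
c_m = np.fft.fft(np.exp(1j * np.pi * np.arange(P) ** 2 / P)) / P   # assumed chirp training
rng = np.random.default_rng(2)
h_true = rng.standard_normal((L + 1, Q, N)) + 1j * rng.standard_normal((L + 1, Q, N))

# Exact d_mq from (3.6): d_mq = c_m * sum_l h_q(l) * exp(-j*alpha_m*l)
phase = np.exp(-1j * alpha[:, None] * np.arange(L + 1))            # (P, L+1)
d = c_m[:, None, None] * np.einsum('ml,lqn->mqn', phase, h_true)

h_hat = ce_bem_ls_solve(d, c_m, alpha, L)
```

Because of the Kronecker structure, the same solution could be obtained by solving the
small $(P-1) \times (L+1)$ system once per $(q, \text{output})$ pair; the explicit Kronecker
product is used here only to mirror (3.13).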
We summarize this estimator in the following lemmas:
Lemma 3.2.1: Under assumptions (H3.2.1)–(H3.2.4), the channel estimator (3.18) is unbiased
by (3.19) if the periodic training sequence is such that $c_m \ne 0$ for all $m \ne 0$,
$P \ge L+2$, and $P$ and $T$ are such that $T = KP$ for integer $K \ge Q$.
Lemma 3.2.2: Under assumptions (H3.2.1)–(H3.2.4), the channel estimator (3.18) is consistent
in probability if the periodic training sequence is such that $c_m \ne 0$ for all $m \ne 0$,
$P \ge L+2$, $P$ is such that $\omega_q+\alpha_m \ne 0$ for $q = 1,2,\ldots,Q$ and $m \ne 0$,
and $Q$ is fixed as $T$ becomes large.
Remark 3.2.1: If the channel length $L$ is unknown, an upper bound $L_u$ suffices. Then we
estimate $\hat{h}(n;l)$ for $l = 0,1,\ldots,L_u$, and $\lim_{T\to\infty}\hat{h}(n;l) = 0$ for
$l > L$.
Remark 3.2.2: We do not need $c_m \ne 0$ for every $m$. We need at least $L+2$ nonzero
$c_m$'s.
Remark 3.2.3: If the noise $v(n)$ is zero-mean, i.e., $\mathbf{m} = 0$, we do not have to
discard $d_{0q}$. Thus, by setting
\[ \tilde{\mathcal{V}} := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & e^{-j\alpha_1} & \cdots & e^{-j\alpha_1L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{P-1}} & \cdots & e^{-j\alpha_{P-1}L} \end{bmatrix}, \tag{3.21} \]
\[ \tilde{\mathcal{C}} := \left(\mathrm{diag}\{c_0,c_1,\ldots,c_{P-1}\}\,\tilde{\mathcal{V}}\right) \otimes I_{NQ}, \tag{3.22} \]
\[ \tilde{\mathcal{D}} := [\,\mathcal{D}_0^T,\; \mathcal{D}_1^T,\; \ldots,\; \mathcal{D}_{P-1}^T\,]^T, \tag{3.23} \]
we have $\tilde{\mathcal{C}}\mathcal{H} = \tilde{\mathcal{D}}$ and
\[ \hat{\mathcal{H}} = (\tilde{\mathcal{C}}^H\tilde{\mathcal{C}})^{-1}\tilde{\mathcal{C}}^H\hat{\tilde{\mathcal{D}}}. \tag{3.24} \]
To identify the BEM coefficients, we now need $P \ge L+1$. All our results hold true if we
use appropriate substitutions.
3.3 First-Order Statistics-Based Channel Estimation Using DPS-BEM
The first-order statistics-based channel estimator described in Section 3.2, using CE-
BEM representation, can be easily extended to DPS-BEM.
As we discussed in Section 2.5, band-limitedness and energy-concentration of DPS
sequences greatly reduce the spectral leakage intrinsic to CE-BEM. Therefore, better per-
formance can be expected if we use DPS-BEM instead in the first-order statistics-based
estimator.
In the DPS-BEM representation, we assume that
\[ h(n;l) = \sum_{q=1}^{Q} h_q(l)\,u_q(n), \tag{3.25} \]
where $\{u_q(n)\}$ is the $q$-th DPS sequence.
Similar to (H3.2.1)–(H3.2.4), we assume:
(H3.3.1) The time-varying channel $\{h(n;l)\}$ satisfies (3.25) with the DPS sequences
$\{u_q(n)\}$ known at the receiver. Also $N \ge 1$.
(H3.3.2) The information sequence $\{b(n)\}$ is zero-mean, white with
$E\{|b(n)|^2\} = \sigma_b^2$.
(H3.3.3) The measurement noise $\{v(n)\}$ is zero-mean, white, uncorrelated with $\{b(n)\}$,
with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N\delta(\tau)$.
(H3.3.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random
periodic sequence with period $P$.
Note that for the time being we assume that the measurement noise is zero-mean, i.e.,
$\mathbf{m} = 0$.
By the SIMO channel model (3.1), (3.2), and the DPS-BEM representation (3.25), we have
\[
\begin{aligned}
y(n) &= \sum_{l=0}^{L}\sum_{q=1}^{Q} h_q(l)\,u_q(n)\,[b(n-l)+c(n-l)] + v(n) \\
     &= \sum_{l=0}^{L}\sum_{q=1}^{Q} h_q(l)\,u_q(n)\left[b(n-l) + \sum_{m=0}^{P-1} c_m e^{j\alpha_m(n-l)}\right] + v(n).
\end{aligned}
\]
By (H3.3.2) and (H3.3.3), the expectation of the observation at time $n$ is
\[ E\{y(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1}\left[\sum_{l=0}^{L} c_m h_q(l)\,e^{-j\alpha_ml}\right] u_q(n)\,e^{j\alpha_mn}. \tag{3.26} \]
Using (3.6), we have
\[ E\{y(n)\} = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} d_{mq}\,u_q(n)\,e^{j\alpha_mn}. \tag{3.27} \]
It follows that
\[ y(n) = \sum_{q=1}^{Q}\sum_{m=0}^{P-1} d_{mq}\,u_q(n)\,e^{j\alpha_mn} + e(n), \]
where $\{e(n)\}$ is a zero-mean random sequence.
Define the cost function as in (3.8):
\[ J = \sum_{n=0}^{T-1}\|e(n)\|^2 = \sum_{n=0}^{T-1}\left\| y(n) - \sum_{q=1}^{Q}\sum_{m=0}^{P-1} d_{mq}\,u_q(n)\,e^{j\alpha_mn} \right\|^2. \]
Choose the $d_{mq}$'s to minimize $J$. We must have
\[ \left.\frac{\partial J}{\partial d_{mq}^{*}}\right|_{d_{mq}=\hat{d}_{mq}} = \sum_{n=0}^{T-1}\left[ y(n) - \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{d}_{m'q'}\,u_{q'}(n)\,e^{j\alpha_{m'}n} \right] u_q(n)\,e^{-j\alpha_mn} = 0, \]
which leads to
\[ \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{d}_{m'q'}\left[\sum_{n=0}^{T-1} u_{q'}(n)\,u_q(n)\,e^{j(\alpha_{m'}-\alpha_m)n}\right] = \sum_{n=0}^{T-1} y(n)\,u_q(n)\,e^{-j\alpha_mn}. \tag{3.28} \]
We then define
\[ g_{mq} := \sum_{n=0}^{T-1} y(n)\,u_q(n)\,e^{-j\alpha_mn} \]
and substitute it into (3.28):
\[ g_{mq} = \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} \hat{d}_{m'q'}\left[\sum_{n=0}^{T-1} u_{q'}(n)\,u_q(n)\,e^{j(\alpha_{m'}-\alpha_m)n}\right]. \tag{3.29} \]
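By the definitions (3.11), (3.12), (3.15), and (3.21)–(3.23), we also have
\[ \tilde{\mathcal{C}}\mathcal{H} = \tilde{\mathcal{D}}. \tag{3.30} \]
If $P \ge L+1$, then $\mathrm{rank}(\tilde{\mathcal{C}}) = NQ(L+1)$ [82], and we can determine
the $h_q(l)$'s uniquely.
Define $\hat{\tilde{\mathcal{D}}}$ and $\mathcal{G}$ in a similar way to (3.23), with $d_{mq}$
replaced with $\hat{d}_{mq}$ and $g_{mq}$, respectively; then (3.29) becomes
\[ \mathcal{G} = (\Gamma \otimes I_N)\,\hat{\tilde{\mathcal{D}}}, \]
where the entries of the $PQ \times PQ$ matrix $\Gamma$ are
\[ [\Gamma]_{mQ+q,\;m'Q+q'} = \sum_{n=0}^{T-1} u_{q'}(n)\,u_q(n)\,e^{j(\alpha_{m'}-\alpha_m)n}. \]
The estimate of $\tilde{\mathcal{D}}$ is given by
\[ \hat{\tilde{\mathcal{D}}} = (\Gamma^{-1} \otimes I_N)\,\mathcal{G}. \tag{3.31} \]
By (3.30) and (3.31) we have the estimate of the channel coefficients
\[ \hat{\mathcal{H}} = \tilde{\mathcal{C}}^{\dagger}\hat{\tilde{\mathcal{D}}} = (\tilde{\mathcal{C}}^H\tilde{\mathcal{C}})^{-1}\tilde{\mathcal{C}}^H(\Gamma^{-1} \otimes I_N)\,\mathcal{G}. \tag{3.32} \]
The channel estimate is then given by
\[ \hat{h}(n;l) = \sum_{q=1}^{Q} \hat{h}_q(l)\,u_q(n). \tag{3.33} \]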
Remark 3.3.1: Since DPS sequences are approximately band-limited to the normalized
frequency range $[-f_dT_s, f_dT_s]$, it follows that
\[ \sum_{n=0}^{T-1} u_{q'}(n)\,u_q(n)\,e^{j(\alpha_{m'}-\alpha_m)n} \approx \delta(m'-m)\,\delta(q'-q) \tag{3.34} \]
when $f_dT_s \ll 1/P$ and $T$ is a multiple of $P$ or $T$ is "large". This is usually true,
and a short period $P$ helps to achieve it. Under the assumption (3.34),
\[ \Gamma \approx I_{PQ}. \]
By (3.28) and (3.34),
\[ \hat{d}_{mq} = \sum_{n=0}^{T-1} y(n)\,u_q(n)\,e^{-j\alpha_mn}. \tag{3.35} \]
The estimate (3.32) is then given by
\[ \hat{\mathcal{H}} = (\tilde{\mathcal{C}}^H\tilde{\mathcal{C}})^{-1}\tilde{\mathcal{C}}^H\hat{\tilde{\mathcal{D}}}. \tag{3.36} \]
Remark 3.3.2: If the mean of the noise $v(n)$ is unknown, suppose $E\{v(n)\} = \mathbf{m}$.
Under the approximation (3.34), we should omit the first row (corresponding to $\alpha_0$) of
$\tilde{\mathcal{V}}$ in (3.21) (denote the resulting $(P-1) \times (L+1)$ matrix by
$\mathcal{V}$, as in (3.14)), and also omit the block $\mathcal{D}_0$ from
$\tilde{\mathcal{D}}$ in (3.23) (denote the resulting vector by $\mathcal{D}$, as in (3.16)).
We have $\mathcal{C}\mathcal{H} = \mathcal{D}$ and
\[ \hat{\mathcal{H}} = (\mathcal{C}^H\mathcal{C})^{-1}\mathcal{C}^H\hat{\mathcal{D}}, \tag{3.37} \]
where $\mathcal{C}$ is defined in (3.13) and $\hat{\mathcal{D}}$ is acquired by (3.35). To
identify the BEM coefficients, we now need $P \ge L+2$. All our results hold true if
appropriate substitutions are used.
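The correlation matrix in (3.28) and the approximation (3.34) can be examined numerically.
The sketch below is an illustration under assumptions: $T = 256$, $P = 4$, $f_dT_s = 0.02$,
and only the first $Q = 6$ DPS vectors (those with eigenvalues very close to 1) are used;
the names `Gamma`, `F`, and `dps_vectors` are hypothetical. It compares the exact solve
(3.31) with the simplified estimate (3.35) on a noise-free mean signal (3.27).

```python
import numpy as np


def dps_vectors(T, fd_Ts, Q):
    """First Q eigenvectors (largest eigenvalues) of the matrix C in (2.17)."""
    k = np.arange(T)[:, None] - np.arange(T)[None, :]
    C = 2.0 * fd_Ts * np.sinc(2.0 * fd_Ts * k)
    _, U = np.linalg.eigh(C)
    return U[:, ::-1][:, :Q]


T, P, Q, fd_Ts = 256, 4, 6, 0.02
U = dps_vectors(T, fd_Ts, Q)
n = np.arange(T)
alpha = 2.0 * np.pi * np.arange(P) / P

# Column m*Q + q of F holds u_q(n) * exp(j*alpha_m*n), so F^H F is the matrix in (3.28).
F = (np.exp(1j * alpha[None, :, None] * n[:, None, None]) * U[:, None, :]).reshape(T, P * Q)
Gamma = F.conj().T @ F
off_diag_max = np.max(np.abs(Gamma - np.diag(np.diag(Gamma))))

rng = np.random.default_rng(4)
d_true = rng.standard_normal((P, Q)) + 1j * rng.standard_normal((P, Q))
y_mean = F @ d_true.reshape(-1)              # mean signal (3.27) for N = 1
g = F.conj().T @ y_mean                      # g_mq stacked with index m*Q + q
d_exact = np.linalg.solve(Gamma, g).reshape(P, Q)    # (3.31)
d_approx = g.reshape(P, Q)                           # (3.35), assumes Gamma is close to I
print(off_diag_max, np.max(np.abs(d_approx - d_true)))
```

The printed values indicate how good the approximation (3.34) is for these particular
parameters; for other choices of $P$, $Q$, or $f_dT_s$ the off-diagonal terms may be larger.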
3.4 First-Order Statistics-Based Channel Estimation Using OP-BEM
The first-order statistics-based channel estimators using CE- and DPS-BEMs exploit the
band-limitedness of the basis functions. OP-BEM, however, does not have this property.
We assume the SIMO channel with $N$ outputs satisfies the OP-BEM representation (2.15), i.e.,
\[ h(n;l) = \sum_{q=0}^{K} h_q(l)\,\psi_q(n), \qquad 0 \le l \le L, \tag{3.38} \]
where $\psi_q(n)$ is the discretized modified Legendre polynomial of degree $q$.
We make the following assumptions:
(H3.4.1) The time-varying channel $\{h(n;l)\}$ satisfies (3.38) where
$\{\psi_q(n)\}_{q=0}^{K}$ are known at the receiver. Also $N \ge 1$.
(H3.4.2) The information sequence $\{b(n)\}$ is zero-mean, white with
$E\{|b(n)|^2\} = \sigma_b^2$.
(H3.4.3) The measurement noise $\{v(n)\}$ may be nonzero-mean ($E\{v(n)\} = \mathbf{m}$),
white, uncorrelated with $\{b(n)\}$, with
$E\{[v(n+\tau)-\mathbf{m}][v(n)-\mathbf{m}]^H\} = \sigma_v^2 I_N\delta(\tau)$. The mean vector
$\mathbf{m}$ may be unknown.
(H3.4.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random
periodic sequence with period $P$.
By the SIMO channel model (3.1)–(3.2), and the OP-BEM representation (3.38), we have
\[ y(n) = \sum_{l=0}^{L}\sum_{q=0}^{K} h_q(l)\,\psi_q(n)\,[b(n-l)+c(n-l)] + v(n) \]
with mean
\[ E\{y(n)\} = \sum_{l=0}^{L}\sum_{q=0}^{K} h_q(l)\,\psi_q(n)\,c(n-l) + \mathbf{m}. \]
It follows that
\[ y(n) = E\{y(n)\} + e(n), \]
where $\{e(n)\}$ is a zero-mean random "error" sequence and
\[ e(n) = y(n) - \sum_{l=0}^{L}\sum_{q=0}^{K} h_q(l)\,\psi_q(n)\,c(n-l) - \mathbf{m}. \]
We define the cost function as
\[ J = \sum_{n=0}^{T-1}\|e(n)\|^2 = \sum_{n=0}^{T-1}\left\| y(n) - \sum_{l=0}^{L}\sum_{q=0}^{K} h_q(l)\,\psi_q(n)\,c(n-l) - \mathbf{m} \right\|^2. \]
Choose $\mathbf{m}$ and the $h_q(l)$'s ($q = 0,1,\ldots,K$; $l = 0,1,\ldots,L$) to minimize
the cost function $J$. We must have
\[ \left.\frac{\partial J}{\partial \mathbf{m}^{*}}\right|_{\mathbf{m}=\hat{\mathbf{m}},\; h_q(l)=\hat{h}_q(l)} = 0 \quad \text{and} \quad \left.\frac{\partial J}{\partial h_q^{*}(l)}\right|_{\mathbf{m}=\hat{\mathbf{m}},\; h_q(l)=\hat{h}_q(l)} = 0, \]
leading to
\[ \hat{\mathbf{m}} = \frac{1}{T}\sum_{n=0}^{T-1}\left[ y(n) - \sum_{l=0}^{L}\sum_{q=0}^{K} \hat{h}_q(l)\,\psi_q(n)\,c(n-l) \right] \tag{3.39} \]
and
\[ \sum_{l=0}^{L}\sum_{q=0}^{K} \hat{h}_q(l)\left[\frac{1}{T}\sum_{n=0}^{T-1} \psi_q(n)\,\psi_{q_1}^{*}(n)\,c(n-l)\,c^{*}(n-l_1)\right] = \frac{1}{T}\sum_{n=0}^{T-1} [y(n)-\hat{\mathbf{m}}]\,\psi_{q_1}^{*}(n)\,c^{*}(n-l_1). \tag{3.40} \]
Substituting (3.39) in (3.40), we then have
\[ \sum_{l=0}^{L}\sum_{q=0}^{K} \hat{h}_q(l)\,\theta[(q,l),(q_1,l_1)] = \frac{1}{T}\sum_{n=0}^{T-1} y(n)\left[ \psi_{q_1}^{*}(n)\,c^{*}(n-l_1) - \frac{1}{T}\sum_{n'=0}^{T-1} \psi_{q_1}^{*}(n')\,c^{*}(n'-l_1) \right], \tag{3.41} \]
where we define
\[
\begin{aligned}
\theta[(q,l),(q_1,l_1)] := {} & \frac{1}{T}\sum_{n=0}^{T-1} \psi_q(n)\,\psi_{q_1}^{*}(n)\,c(n-l)\,c^{*}(n-l_1) \\
& - \left[\frac{1}{T}\sum_{n=0}^{T-1} \psi_q(n)\,c(n-l)\right]\left[\frac{1}{T}\sum_{n=0}^{T-1} \psi_{q_1}^{*}(n)\,c^{*}(n-l_1)\right].
\end{aligned}
\tag{3.42}
\]
We further define a $(K+1)(L+1) \times (K+1)(L+1)$ matrix
\[
\Theta_{OP} :=
\begin{bmatrix}
\theta[(0,0),(0,0)] & \cdots & \theta[(K,0),(0,0)] & \cdots & \theta[(K,1),(0,0)] & \cdots & \theta[(K,L),(0,0)] \\
\vdots & & \vdots & & \vdots & & \vdots \\
\theta[(0,0),(K,0)] & \cdots & \theta[(K,0),(K,0)] & \cdots & \theta[(K,1),(K,0)] & \cdots & \theta[(K,L),(K,0)] \\
\vdots & & \vdots & & \vdots & & \vdots \\
\theta[(0,0),(K,1)] & \cdots & \theta[(K,0),(K,1)] & \cdots & \theta[(K,1),(K,1)] & \cdots & \theta[(K,L),(K,1)] \\
\vdots & & \vdots & & \vdots & & \vdots \\
\theta[(0,0),(K,L)] & \cdots & \theta[(K,0),(K,L)] & \cdots & \theta[(K,1),(K,L)] & \cdots & \theta[(K,L),(K,L)]
\end{bmatrix}
\tag{3.43}
\]
whose $[(K+1)l_1+q_1+1,\,(K+1)l+q+1]$-th entry is $\theta[(q,l),(q_1,l_1)]$, a
$(K+1)(L+1)$-column vector
\[
\zeta(n) :=
\begin{bmatrix}
\psi_0^{*}(n)\,c^{*}(n) - \frac{1}{T}\sum_{n'=0}^{T-1} \psi_0^{*}(n')\,c^{*}(n') \\
\vdots \\
\psi_K^{*}(n)\,c^{*}(n) - \frac{1}{T}\sum_{n'=0}^{T-1} \psi_K^{*}(n')\,c^{*}(n') \\
\psi_0^{*}(n)\,c^{*}(n-1) - \frac{1}{T}\sum_{n'=0}^{T-1} \psi_0^{*}(n')\,c^{*}(n'-1) \\
\vdots \\
\psi_K^{*}(n)\,c^{*}(n-1) - \frac{1}{T}\sum_{n'=0}^{T-1} \psi_K^{*}(n')\,c^{*}(n'-1) \\
\vdots \\
\psi_K^{*}(n)\,c^{*}(n-L) - \frac{1}{T}\sum_{n'=0}^{T-1} \psi_K^{*}(n')\,c^{*}(n'-L)
\end{bmatrix},
\]
and the channel coefficient vectors
\[ \hat{\mathcal{H}}_l := [\,\hat{h}_0^T(l)\;\; \hat{h}_1^T(l)\;\; \cdots\;\; \hat{h}_K^T(l)\,]^T, \tag{3.44} \]
\[ \hat{\mathcal{H}} := [\,\hat{\mathcal{H}}_0^T\;\; \hat{\mathcal{H}}_1^T\;\; \cdots\;\; \hat{\mathcal{H}}_L^T\,]^T. \tag{3.45} \]
We also define $\mathcal{H}_l$ and $\mathcal{H}$ as the vectors of true values of channel
coefficients, corresponding to (3.44) and (3.45), respectively. Then (3.41) can be written as
\[ (\Theta_{OP} \otimes I_N)\,\hat{\mathcal{H}} = \frac{1}{T}\sum_{n=0}^{T-1} \zeta(n) \otimes y(n), \]
which yields
\[ \hat{\mathcal{H}} = \frac{1}{T}\sum_{n=0}^{T-1} \left[\Theta_{OP}^{\dagger}\,\zeta(n)\right] \otimes y(n). \tag{3.46} \]
The estimate of the time-varying channel $h(n;l)$ is then given by
\[ \hat{h}(n;l) = \sum_{q=0}^{K} \hat{h}_q(l)\,\psi_q(n). \tag{3.47} \]
Remark 3.4.1: We did not use assumption (H3.4.4) throughout the manipulations, so aperiodic
training can also be used.
Remark 3.4.2: Neither orthogonality nor properties of polynomials were used to obtain
(3.46). We can use any basis set $\{\phi_q(n)\}_{n=0}^{T-1}$ ($q = 1,2,\ldots,Q$), as in
(2.20), instead of polynomials in (3.46). The basis functions are only required to be
linearly independent, not necessarily orthogonal. This offers a general channel estimator
using BEMs and superimposed training.
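The normal equations (3.39) to (3.41) are those of a least-squares regression of $y(n)$ on
the regressors $\psi_q(n)c(n-l)$ with an unknown constant offset, so they can be solved by
centring the regressors in time. The sketch below takes that route instead of forming the
matrix of (3.43) explicitly; the function name, the array layout, and the test values
(Legendre basis with $K = 2$, chirp training with $P = 8$, noise-free mean signal) are all
assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def bem_ls_estimate(y, basis, c_ext, L):
    """Joint LS estimate of the BEM coefficients and the noise mean.
    y: (T, N); basis: (T, Q); c_ext holds c(-L), ..., c(T-1) (length T + L)."""
    T, Q = basis.shape
    # Regressor for (q, l) is basis_q(n) * c(n - l); its column index is Q*l + q.
    cols = [basis * c_ext[L - l:L - l + T, None] for l in range(L + 1)]
    R = np.hstack(cols)
    Rc = R - R.mean(axis=0)                      # centring removes the unknown mean
    X, *_ = np.linalg.lstsq(Rc, y, rcond=None)   # solves the centred normal equations
    m_hat = (y - R @ X).mean(axis=0)             # time average of the residual, as in (3.39)
    return X.reshape(L + 1, Q, -1), m_hat


T, P, K, L, N = 64, 8, 2, 2, 2
n = np.arange(T)
basis = legendre.legvander(2.0 * n / T - 1.0, K)
basis = basis / np.linalg.norm(basis, axis=0)
c_period = np.exp(1j * np.pi * np.arange(P) ** 2 / P)    # assumed chirp training
c_ext = c_period[np.arange(-L, T) % P]                   # periodic extension to n < 0
rng = np.random.default_rng(3)
h_true = rng.standard_normal((L + 1, K + 1, N)) + 1j * rng.standard_normal((L + 1, K + 1, N))
m_true = np.array([0.5 - 0.1j, -0.2 + 0.3j])
y = sum((basis * c_ext[L - l:L - l + T, None]) @ h_true[l] for l in range(L + 1)) + m_true

h_hat, m_hat = bem_ls_estimate(y, basis, c_ext, L)
```

Nothing in the function depends on the basis being polynomial, which is in line with
Remark 3.4.2: any linearly independent basis matrix can be passed in.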
If we consider CE-BEM in (3.42) and replace $\psi_q(n)$ with $\phi_q(n) = e^{j\omega_qn}$ for
$\omega_q := \frac{2\pi}{T}\left(q - \frac{1}{2} - \frac{Q}{2}\right)$, we have
\[
\begin{aligned}
\theta[(q,l),(q_1,l_1)] = {} & \frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_qn}\,e^{-j\omega_{q_1}n}\,c(n-l)\,c^{*}(n-l_1) \\
& - \left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_qn}\,c(n-l)\right]\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{-j\omega_{q_1}n}\,c^{*}(n-l_1)\right].
\end{aligned}
\tag{3.48}
\]
Taking (3.5) into account, we have
\[ \frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_qn}\,e^{-j\omega_{q_1}n}\,c(n-l)\,c^{*}(n-l_1) = \sum_{m=0}^{P-1} |c_m|^2\,e^{-j\alpha_m(l-l_1)}\,\delta(q-q_1) \tag{3.49} \]
and
\[ \frac{1}{T}\sum_{n=0}^{T-1} e^{j\omega_qn}\,c(n-l) = c_0\,\delta\!\left(q - \frac{Q+1}{2}\right), \tag{3.50} \]
so that (3.48) becomes
\[ \theta[(q,l),(q_1,l_1)] = \sum_{m=0}^{P-1} |c_m|^2\,e^{-j\alpha_m(l-l_1)}\,\delta(q-q_1) - |c_0|^2\,\delta\!\left(q - \frac{Q+1}{2}\right)\delta\!\left(q_1 - \frac{Q+1}{2}\right). \tag{3.51} \]
Substituting (3.50) and (3.51) into (3.41), it follows that
\[
\begin{aligned}
& \sum_{m=0}^{P-1} c_m^{*}\,e^{j\alpha_ml_1}\left[\sum_{l=0}^{L} c_m\,\hat{h}_q(l)\,e^{-j\alpha_ml}\right] - \sum_{l=0}^{L} \hat{h}_{\frac{Q+1}{2}}(l)\,|c_0|^2\,\delta\!\left(q - \frac{Q+1}{2}\right) \\
& \quad = \sum_{m=0}^{P-1} c_m^{*}\,e^{j\alpha_ml_1}\left[\frac{1}{T}\sum_{n=0}^{T-1} y(n)\,e^{-j(\omega_q+\alpha_m)n}\right] - \frac{1}{T}\sum_{n=0}^{T-1} y(n)\,c_0^{*}\,\delta\!\left(q - \frac{Q+1}{2}\right).
\end{aligned}
\tag{3.52}
\]
By (3.6) and (3.9) we have
\[ \sum_{l=0}^{L} c_m\,\hat{h}_q(l)\,e^{-j\alpha_ml} = \frac{1}{T}\sum_{n=0}^{T-1} y(n)\,e^{-j(\omega_q+\alpha_m)n} \tag{3.53} \]
for $q = 1,2,\ldots,Q$ and $m = 1,2,\ldots,P-1$. If we discard the terms corresponding to
$m = 0$, then (3.52) reduces to
\[ \sum_{m=1}^{P-1} c_m^{*}\,e^{j\alpha_ml_1}\left[\sum_{l=0}^{L} c_m\,\hat{h}_q(l)\,e^{-j\alpha_ml}\right] = \sum_{m=1}^{P-1} c_m^{*}\,e^{j\alpha_ml_1}\left[\frac{1}{T}\sum_{n=0}^{T-1} y(n)\,e^{-j(\omega_q+\alpha_m)n}\right] \tag{3.54} \]
for $q = 1,2,\ldots,Q$. The solution to (3.54) coincides with that of (3.53) if the matrix
$\mathcal{C}$ in (3.13) has full column rank. This result demonstrates that the estimator
given by (3.46) is indeed the same as the one proposed in Section 3.2. The only difference
is that, since the mean $\mathbf{m}$ is unknown, we simply discard all the terms
corresponding to the index $m = 0$ in Section 3.2. However, omitting the terms corresponding
to $\omega_q+\alpha_m = 0$ is enough, and the terms corresponding to $m = 0$ but
$\omega_q+\alpha_m \ne 0$ are still useful; this is the estimator proposed in this section.
Using the band-limitedness of the DPS-BEM (3.34), we have
\[ \sum_{l=0}^{L} c_m\,\hat{h}_q(l)\,e^{-j\alpha_ml} = \sum_{n=0}^{T-1} y(n)\,u_q(n)\,e^{-j\alpha_mn}, \]
which is similar to (3.53). Thus the estimator proposed in Section 3.3 is also a special case
of the estimator defined by (3.41).
Remark 3.4.3: If the mean value $\mathbf{m}$ of the noise is zero or known, then instead of (3.41), the solution of the estimator is given by (3.40) with $\hat{\mathbf{m}} = \mathbf{m}$. We define
$$\tilde{\Psi}[(q,l),(q_1,l_1)] := \frac{1}{T}\sum_{n=0}^{T-1} \phi_q(n)\, \phi_{q_1}^*(n)\, c(n-l)\, c^*(n-l_1) \tag{3.55}$$
and $\tilde{\Psi}_{\rm OP}$ in the same way as in (3.43), with $\Psi[(q,l),(q_1,l_1)]$ replaced with $\tilde{\Psi}[(q,l),(q_1,l_1)]$. Also define
$$\tilde{\boldsymbol{\psi}}(n) := \begin{bmatrix} c^*(n) & \cdots & c^*(n-L) \end{bmatrix}^T \otimes \begin{bmatrix} \psi_0^*(n) & \cdots & \psi_K^*(n) \end{bmatrix}^T.$$
The estimator is given by
$$\hat{\mathbf{H}} = \frac{1}{T}\sum_{n=0}^{T-1} \tilde{\Psi}_{\rm OP}^{-1}\, \tilde{\boldsymbol{\psi}}(n) \otimes \mathbf{y}(n). \tag{3.56}$$
Note that the estimators given by (3.24) and (3.36), with zero or known $\mathbf{m}$, are special cases of the estimator of (3.56), using CE- and DPS-BEM respectively.
3.5 First-Order Statistics-Based Channel Estimation: Multiple-User (MIMO)
Channels
The first-order statistics-based channel estimator using CE- or DPS-BEM can be easily
extended to a multiple-user (MIMO) system, by exploiting the band-limitedness of the basis
functions.
Consider a MIMO FIR linear channel with $K$ inputs and $N$ outputs. Let $\{s_k(n)\}$ denote the $k$-th user's information sequence, which is input to the MIMO channel with the $k$-th user's time-varying discrete-time impulse response $\{\mathbf{h}_k(n;l)\}$ (the channel response for the $k$-th user at time instance $n$ to a unit input at time instance $n-l$). The symbol-rate, $N$-column channel output vector is given by
$$\mathbf{x}(n) = \sum_{k=1}^{K}\sum_{l=0}^{L} \mathbf{h}_k(n;l)\, s_k(n-l). \tag{3.57}$$
The noisy measurement is given by
$$\mathbf{y}(n) = \mathbf{x}(n) + \mathbf{v}(n), \tag{3.58}$$
where $\mathbf{v}(n)$ is the additive $N$-column vector white complex Gaussian noise.
In superimposed training-based approaches, for the $k$-th user, one takes
$$s_k(n) = b_k(n) + c_k(n) \tag{3.59}$$
where $\{b_k(n)\}$ and $\{c_k(n)\}$ are the information sequence and the non-random periodic training sequence of the $k$-th user, respectively. Then the noisy channel output becomes
$$\mathbf{y}(n) = \sum_{k=1}^{K}\sum_{l=0}^{L} \mathbf{h}_k(n;l)\left[b_k(n-l) + c_k(n-l)\right] + \mathbf{v}(n).$$
Assume the following:
(H3.5.1) The time-varying channel $\{\mathbf{h}_k(n;l)\}$ satisfies CE- or DPS-BEM, i.e.,
$$\mathbf{h}_k(n;l) = \sum_{q=1}^{Q} \mathbf{h}_{qk}(l)\, e^{j\omega_q n} \tag{3.60}$$
where $\mathbf{h}_{qk}(l)$ is the $N$-column time-invariant coefficient vector for the $k$-th user, or
$$\mathbf{h}_k(n;l) = \sum_{q=1}^{Q} \mathbf{h}_{qk}(l)\, u_q(n). \tag{3.61}$$
Also $N \ge 1$.
(H3.5.2) The information sequence $\{b_k(n)\}$ is zero-mean, white with $E\{|b_k(n)|^2\} = \sigma_{bk}^2$ and mutually independent for $k = 1,2,\ldots,K$.
(H3.5.3) The measurement noise $\{\mathbf{v}(n)\}$ is zero-mean, white, uncorrelated with $\{b_k(n)\}$, with $E\{\mathbf{v}(n+\tau)\mathbf{v}^H(n)\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$.
(H3.5.4) The superimposed training sequence $c_k(n) = c_k(n+P)$ for all $n$ is a non-random periodic sequence with period $P$ such that $c_{mk} \neq 0$ for all $m,k$, and $\bar{P}$ is an integer with $P = \bar{P}K$.
The expected value of the noisy channel output is given by
$$E\{\mathbf{y}(n)\} = \sum_{k=1}^{K}\sum_{l=0}^{L} \mathbf{h}_k(n;l)\, c_k(n-l). \tag{3.62}$$
We pick user-specific training sequences so that channel estimation is decoupled across the users; this allows us to use the single-user superimposed training-based approach outlined in Section 3.2. We assign distinct cycle frequencies of the periodic training sequences to distinct users. Suppose that for each user $k$, $\{c_k(n)\}$ is periodic with period $P = \bar{P}K$ where $\bar{P}$ is a positive integer. Then
$$c_k(n) = \sum_{m'=0}^{P-1} c_{m'k}\, e^{j(2\pi m'/P)n} \quad \text{for all } n,$$
where
$$c_{m'k} := \frac{1}{P}\sum_{n=0}^{P-1} c_k(n)\, e^{-j(2\pi m'/P)n}.$$
Pick $\{c_k(n)\}$ so that only $\bar{P}$ of the coefficients $c_{m'k}$ (out of a total of $P$), associated with $\bar{P}$ distinct frequencies, are nonzero. For instance, we may choose
$$c_k(n) = \sum_{m=0}^{\bar{P}-1} c_{mk}\, e^{j(2\pi/P)(Km+k-1)n}, \quad \text{for all } n \tag{3.63}$$
such that $c_{mk} \neq 0$ for all $m,k$. Define the frequencies
$$\alpha_{mk} := \frac{2\pi}{P}(Km+k-1) \tag{3.64}$$
for $m = 0,1,\ldots,\bar{P}-1$ and $k = 1,2,\ldots,K$. As an example, we show how to use m-sequences (maximal length pseudo-random binary sequences) for training. Pick $\bar{P} = 2^n - 1$ for some integer $n$ such that $\bar{P} \ge L+2$. Let $\{\tilde{c}_0(n)\}$ be an m-sequence of length $\bar{P}$. Pick the superimposed training sequence $\{c_1(n)\}_{n=0}^{P-1}$ for user 1 as $K$ repetitions of $\{\tilde{c}_0(n)\}$ multiplied by a factor $\sigma_{c1}$ so that $P^{-1}\sum_{n=0}^{P-1} |c_1(n)|^2 = \sigma_{c1}^2$. This choice satisfies (3.63) and (3.64) for $k = 1$. Pick $\tilde{c}_k(n) = \tilde{c}_1(n)\, e^{j(2\pi/P)(k-1)n}$ for $k = 2,3,\ldots,K$ and $c_k(n) = \sigma_{ck}\tilde{c}_k(n)$. Then $\{c_k(n)\}_{n=0}^{P-1}$ satisfies (3.63) and (3.64) for $k \ge 2$. The above procedure can be used to generate a user-specific training sequence of period $P = \bar{P}K$ from a sequence of period $\bar{P}$.
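The construction just described can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `user_training` and `dfs_coefficients` are illustrative and not from the dissertation. It builds the $K$ user sequences from one period of an m-sequence and lists which discrete Fourier series bins of each user are nonzero, which should be the bins $Km+k-1$ of (3.63).

```python
import numpy as np

def user_training(mseq, K, sigma_c=1.0):
    """Return a (K, P) array; row k-1 holds c_k(n), n = 0..P-1, with P = len(mseq) * K."""
    base = np.tile(np.asarray(mseq, dtype=complex), K)   # K repetitions of the m-sequence
    base /= np.sqrt(np.mean(np.abs(base) ** 2))          # normalize to unit average power
    P = base.size
    n = np.arange(P)
    # User k is user 1 modulated by exp(j*2*pi*(k-1)*n/P), which shifts its spectrum by k-1 bins.
    return np.stack([sigma_c * base * np.exp(2j * np.pi * (k - 1) * n / P)
                     for k in range(1, K + 1)])

def dfs_coefficients(c):
    """c_m = (1/P) * sum_n c(n) * exp(-j*2*pi*m*n/P), m = 0..P-1."""
    return np.fft.fft(c) / len(c)

mseq = [1, -1, -1, 1, 1, 1, -1]   # one period of a length-7 m-sequence
K = 2
c = user_training(mseq, K)
for k in range(1, K + 1):
    support = np.flatnonzero(np.abs(dfs_coefficients(c[k - 1])) > 1e-9)
    print(f"user {k}: nonzero bins {support.tolist()}")
```

Because the users occupy disjoint sets of bins, correlating the received mean against one user's frequencies does not pick up the other users' training, which is what decouples the estimation.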
It follows that
$$E\{\mathbf{y}(n)\} = \sum_{k=1}^{K}\sum_{l=0}^{L} \mathbf{h}_k(n;l)\left[\sum_{m=0}^{\bar{P}-1} c_{mk}\, e^{j\alpha_{mk}(n-l)}\right], \quad \text{for all } n. \tag{3.65}$$
We can write (3.60) and (3.61) in a unified form as in (2.20):
$$\mathbf{h}_k(n;l) = \sum_{q=1}^{Q} \mathbf{h}_{qk}(l)\, \phi_q(n), \tag{3.66}$$
where $\phi_q(n) = e^{j\omega_q n}$ in CE-BEM, and $\phi_q(n) = u_q(n)$ in DPS-BEM. The expected value of the observations (3.65) can be rewritten as
$$E\{\mathbf{y}(n)\} = \sum_{k=1}^{K}\sum_{m=0}^{\bar{P}-1}\sum_{q=1}^{Q}\left[\sum_{l=0}^{L} \mathbf{h}_{qk}(l)\, c_{mk}\, e^{-j\alpha_{mk}l}\right]\phi_q(n)\, e^{j\alpha_{mk}n}, \quad \text{for all } n.$$
Define
$$\mathbf{d}_{mqk} := \sum_{l=0}^{L} \mathbf{h}_{qk}(l)\, c_{mk}\, e^{-j\alpha_{mk}l}. \tag{3.67}$$
We have
$$E\{\mathbf{y}(n)\} = \sum_{k=1}^{K}\sum_{m=0}^{\bar{P}-1}\sum_{q=1}^{Q} \mathbf{d}_{mqk}\, \phi_q(n)\, e^{j\alpha_{mk}n}.$$
It follows that
$$\mathbf{y}(n) = \sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{m=0}^{\bar{P}-1} \mathbf{d}_{mqk}\, \phi_q(n)\, e^{j\alpha_{mk}n} + \mathbf{e}(n) \tag{3.68}$$
where $\{\mathbf{e}(n)\}$ is a zero-mean random sequence.
Using (3.68), define the cost function
$$J = \sum_{n=0}^{T-1} \|\mathbf{e}(n)\|^2 = \sum_{n=0}^{T-1} \left\| \mathbf{y}(n) - \sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{m=0}^{\bar{P}-1} \mathbf{d}_{mqk}\, \phi_q(n)\, e^{j\alpha_{mk}n} \right\|^2.$$
Choose the $\mathbf{d}_{mqk}$'s to minimize $J$. We must have
$$\left.\frac{\partial J}{\partial \mathbf{d}_{mqk}^*}\right|_{\mathbf{d}_{mqk}=\hat{\mathbf{d}}_{mqk}} = -\sum_{n=0}^{T-1}\left[\mathbf{y}(n) - \sum_{k'=1}^{K}\sum_{q'=1}^{Q}\sum_{m'=0}^{\bar{P}-1} \hat{\mathbf{d}}_{m'q'k'}\, \phi_{q'}(n)\, e^{j\alpha_{m'k'}n}\right]\phi_q^*(n)\, e^{-j\alpha_{mk}n} = 0,$$
leading to
$$\sum_{k'=1}^{K}\sum_{q'=1}^{Q}\sum_{m'=0}^{\bar{P}-1} \hat{\mathbf{d}}_{m'q'k'}\left[\sum_{n=0}^{T-1} \phi_{q'}(n)\, \phi_q^*(n)\, e^{j(\alpha_{m'k'}-\alpha_{mk})n}\right] = \sum_{n=0}^{T-1} \mathbf{y}(n)\, \phi_q^*(n)\, e^{-j\alpha_{mk}n}. \tag{3.69}$$
For CE-BEM, suppose that we pick $P$ such that the frequencies $(\omega_q + \alpha_{mk})$ are all distinct for any choice of $m$, $k$ and $q$, e.g., take $T/P \ge Q$. Then the sequence $E\{\mathbf{y}(n)\}$ is periodic [14] with cycle frequencies $(\omega_q + \alpha_{mk})$, where $1 \le q \le Q$, $0 \le m \le \bar{P}-1$ and $1 \le k \le K$, so that we have
$$\sum_{n=0}^{T-1} e^{j(\omega_{q'}-\omega_q+\alpha_{m'k'}-\alpha_{mk})n} = T\,\delta(m'-m)\,\delta(k'-k)\,\delta(q'-q). \tag{3.70}$$
It follows from (3.69) that
$$\hat{\mathbf{d}}_{mqk} = \frac{1}{T}\sum_{n=0}^{T-1} \mathbf{y}(n)\, e^{-j(\omega_q+\alpha_{mk})n}. \tag{3.71}$$
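As a sanity check of (3.70) and (3.71), the sketch below synthesizes the noise-free mean of a scalar output ($N = 1$) from randomly drawn coefficients and recovers them by correlation. It assumes NumPy and the usual CE-BEM frequency grid $\omega_q = 2\pi(q - (Q+1)/2)/T$, which is not restated in this section; the parameter values are illustrative only.

```python
import numpy as np

T, K, P_bar, Q = 420, 2, 7, 3
P = K * P_bar
n = np.arange(T)

# Assumed CE-BEM frequencies omega_q, q = 1..Q (zero at q = (Q+1)/2).
omega = 2 * np.pi * (np.arange(1, Q + 1) - (Q + 1) / 2) / T
# Training frequencies alpha_mk of (3.64), stored as alpha[m, k-1].
m_idx, k_idx = np.meshgrid(np.arange(P_bar), np.arange(1, K + 1), indexing="ij")
alpha = 2 * np.pi * (K * m_idx + k_idx - 1) / P

rng = np.random.default_rng(0)
d_true = rng.standard_normal((P_bar, Q, K)) + 1j * rng.standard_normal((P_bar, Q, K))

# total[m, q, k] = omega_q + alpha_mk
total = omega[None, :, None] + alpha[:, None, :]
# Noise-free E{y(n)}: sum of d_mqk * exp(j*(omega_q + alpha_mk)*n) over m, q, k.
y = np.einsum("mqk,mqkn->n", d_true, np.exp(1j * total[..., None] * n))
# Correlation estimate (3.71).
d_hat = np.einsum("n,mqkn->mqk", y, np.exp(-1j * total[..., None] * n)) / T

print(np.max(np.abs(d_hat - d_true)))   # numerically negligible when T/P >= Q
```

With $T/P = 30 \ge Q = 3$, every $\omega_q + \alpha_{mk}$ is a distinct multiple of $2\pi/T$, so the complex exponentials are orthogonal over the record and the recovery is exact up to rounding.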
If DPS-BEM applies, we use the fact that DPS sequences are approximately band-limited to the normalized frequency range $[-f_dT_s, f_dT_s]$. Then if $2f_dT_s \le 1/P$, we use the approximation
$$\sum_{n=0}^{T-1} u_{q'}(n)\, u_q(n)\, e^{j(\alpha_{m'k'}-\alpha_{mk})n} \approx \delta(k'-k)\,\delta(m'-m)\,\delta(q'-q). \tag{3.72}$$
By using the approximation (3.72) in (3.69),
$$\hat{\mathbf{d}}_{mqk} = \sum_{n=0}^{T-1} \mathbf{y}(n)\, u_q(n)\, e^{-j\alpha_{mk}n}. \tag{3.73}$$
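For $0 \le m \le \bar{P}-1$ and $1 \le k \le K$, define an $NQ$-column vector
$$\mathbf{D}_{mk} := \begin{bmatrix} \mathbf{d}_{m1k}^T & \mathbf{d}_{m2k}^T & \cdots & \mathbf{d}_{mQk}^T \end{bmatrix}^T, \tag{3.74}$$
and for $0 \le l \le L$ and $1 \le k \le K$, define an $NQ$-column vector
$$\mathbf{H}_{kl} := \begin{bmatrix} \mathbf{h}_{1k}^H(l) & \mathbf{h}_{2k}^H(l) & \cdots & \mathbf{h}_{Qk}^H(l) \end{bmatrix}^H.$$
Then by equation (3.67), we have
$$\mathbf{D}_{mk} = \sum_{l=0}^{L} c_{mk}\, e^{-j\alpha_{mk}l}\, \mathbf{H}_{kl} \tag{3.75}$$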
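for $0 \le m \le \bar{P}-1$ and $1 \le k \le K$. Define a $\bar{P} \times (L+1)$ matrix
$$\tilde{\mathbf{V}}_k := \begin{bmatrix} 1 & e^{-j\alpha_{0k}} & \cdots & e^{-j\alpha_{0k}L} \\ 1 & e^{-j\alpha_{1k}} & \cdots & e^{-j\alpha_{1k}L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{(\bar{P}-1)k}} & \cdots & e^{-j\alpha_{(\bar{P}-1)k}L} \end{bmatrix}, \tag{3.76}$$
an $NQ\bar{P} \times NQ(L+1)$ matrix
$$\tilde{\mathbf{C}}_k := \left(\mathrm{diag}\left\{c_{0k}, c_{1k}, \ldots, c_{(\bar{P}-1)k}\right\}\tilde{\mathbf{V}}_k\right) \otimes \mathbf{I}_{NQ},$$
an $NQ(L+1)$-column vector
$$\mathbf{H}_k = \begin{bmatrix} \mathbf{H}_{k0}^H & \mathbf{H}_{k1}^H & \cdots & \mathbf{H}_{kL}^H \end{bmatrix}^H \tag{3.77}$$
and an $NQ\bar{P}$-column vector
$$\tilde{\mathbf{D}}_k = \begin{bmatrix} \mathbf{D}_{0k}^H & \mathbf{D}_{1k}^H & \cdots & \mathbf{D}_{(\bar{P}-1)k}^H \end{bmatrix}^H. \tag{3.78}$$
Then (3.75) leads to
$$\tilde{\mathbf{C}}_k \mathbf{H}_k = \tilde{\mathbf{D}}_k. \tag{3.79}$$
Since the $\alpha_{mk}$'s are distinct and $c_{mk} \neq 0$ for all $m,k$, $\mathrm{rank}(\tilde{\mathbf{C}}_k) = NQ(L+1)$ if $\bar{P} \ge L+1$; hence, we can determine the $\mathbf{h}_{qk}(l)$'s uniquely. Define $\hat{\mathbf{D}}_{mk}$ as in (3.74) with $\mathbf{d}_{mqk}$ replaced with $\hat{\mathbf{d}}_{mqk}$, and define $\hat{\tilde{\mathbf{D}}}_k$ as in (3.78) with $\mathbf{D}_{mk}$ replaced with $\hat{\mathbf{D}}_{mk}$. The estimate of $\mathbf{H}_k$ is given by
$$\hat{\mathbf{H}}_k = (\tilde{\mathbf{C}}_k^H \tilde{\mathbf{C}}_k)^{-1}\tilde{\mathbf{C}}_k^H \hat{\tilde{\mathbf{D}}}_k. \tag{3.80}$$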
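The least-squares step (3.79)-(3.80) can be illustrated with a small numerical sketch. It assumes NumPy; `build_C_tilde` is a hypothetical helper name, and the coefficients $c_{mk}$ and the channel vector are random placeholders rather than values from the dissertation. The sketch forms the Kronecker-structured matrix, applies it to a known $\mathbf{H}_k$, and solves the normal equations to recover it.

```python
import numpy as np

def build_C_tilde(c_mk, alpha_mk, L, NQ):
    """(diag{c_0k, ..., c_(Pbar-1)k} V_k) kron I_NQ, with V_k[m, l] = exp(-j*alpha_mk*l)."""
    V = np.exp(-1j * np.outer(alpha_mk, np.arange(L + 1)))
    return np.kron(np.diag(c_mk) @ V, np.eye(NQ))

K, P_bar, L, N, Q, k = 2, 7, 2, 2, 3, 1
P = K * P_bar
alpha = 2 * np.pi * (K * np.arange(P_bar) + k - 1) / P      # alpha_mk of (3.64) for user k

rng = np.random.default_rng(1)
c = rng.standard_normal(P_bar) + 1j * rng.standard_normal(P_bar)   # placeholder nonzero c_mk
H = rng.standard_normal(N * Q * (L + 1)) + 1j * rng.standard_normal(N * Q * (L + 1))

C = build_C_tilde(c, alpha, L, N * Q)        # shape (N*Q*P_bar, N*Q*(L+1))
D = C @ H                                    # noise-free version of (3.79)
H_hat = np.linalg.solve(C.conj().T @ C, C.conj().T @ D)   # least-squares solution (3.80)

print(C.shape, np.linalg.matrix_rank(C), np.max(np.abs(H_hat - H)))
```

In practice `np.linalg.lstsq(C, D, rcond=None)` would be a numerically safer way to obtain the same solution; the normal-equation form is used here only because it mirrors (3.80) directly.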
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.1: First-order statistics-based estimator (SISO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
Denote the corresponding estimate of $\mathbf{h}_{qk}(l)$ as $\hat{\mathbf{h}}_{qk}(l)$. Following (3.66), for $k = 1,2,\ldots,K$, $l = 0,1,\ldots,L$ and for all $n$, the estimate of the time-varying channel is given by
$$\hat{\mathbf{h}}_k(n;l) = \sum_{q=1}^{Q} \hat{\mathbf{h}}_{qk}(l)\, \phi_q(n). \tag{3.81}$$
Remark 3.5.1: If the additive noise $\mathbf{v}(n)$ has a nonzero mean ($E\{\mathbf{v}(n)\} = \mathbf{m}$) that is unknown, then (3.69) can be modified as
$$\sum_{k'=1}^{K}\sum_{q'=1}^{Q}\sum_{m'=0}^{\bar{P}-1} \hat{\mathbf{d}}_{m'q'k'}\left[\sum_{n=0}^{T-1} \phi_{q'}(n)\, \phi_q^*(n)\, e^{j(\alpha_{m'k'}-\alpha_{mk})n}\right] = \sum_{n=0}^{T-1} \left[\mathbf{y}(n) - \mathbf{m}\right]\phi_q^*(n)\, e^{-j\alpha_{mk}n}.$$
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.2: First-order statistics-based estimator (SISO): BER vs SNR under fd = 50 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
Note that
$$\sum_{n=0}^{T-1} e^{-j(\omega_q+\alpha_{mk})n} = T\,\delta(\omega_q+\alpha_{mk}),$$
and if $2f_dT_s \le 1/P$, approximately we have
$$\sum_{n=0}^{T-1} u_q(n)\, e^{-j\alpha_{mk}n} \approx 0, \quad \text{for all } \alpha_{mk} \neq 0.$$
Since $T/P \ge Q$ and $\alpha_{mk} = 0$ only happens when $m = 0$ and $k = 1$, if $m \neq 0$ we have
$$\sum_{n=0}^{T-1} \mathbf{m}\, \phi_q^*(n)\, e^{-j\alpha_{mk}n} = 0.$$
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.3: First-order statistics-based estimator (SISO): BER vs SNR under fd = 100 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
We hence omit the terms corresponding to $m = 0$, and then $\mathbf{m}$ has no effect on estimation. We need only omit the first row of $\tilde{\mathbf{V}}_k$ in (3.76), and the resulting $(\bar{P}-1) \times (L+1)$ matrix is denoted by
$$\mathbf{V}_k := \begin{bmatrix} 1 & e^{-j\alpha_{1k}} & \cdots & e^{-j\alpha_{1k}L} \\ 1 & e^{-j\alpha_{2k}} & \cdots & e^{-j\alpha_{2k}L} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j\alpha_{(\bar{P}-1)k}} & \cdots & e^{-j\alpha_{(\bar{P}-1)k}L} \end{bmatrix}.$$
We also define an $NQ(\bar{P}-1) \times NQ(L+1)$ matrix
$$\mathbf{C}_k := \left(\mathrm{diag}\left\{c_{1k}, c_{2k}, \ldots, c_{(\bar{P}-1)k}\right\}\mathbf{V}_k\right) \otimes \mathbf{I}_{NQ}$$
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.4: First-order statistics-based estimator (SISO): BER vs SNR under fd = 200 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
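and an $NQ(\bar{P}-1)$-column vector
$$\mathbf{D}_k = \begin{bmatrix} \mathbf{D}_{1k}^H & \mathbf{D}_{2k}^H & \cdots & \mathbf{D}_{(\bar{P}-1)k}^H \end{bmatrix}^H.$$
Then as in (3.79), we have
$$\mathbf{C}_k \mathbf{H}_k = \mathbf{D}_k,$$
so that the channel estimate is given by
$$\hat{\mathbf{H}}_k = (\mathbf{C}_k^H \mathbf{C}_k)^{-1}\mathbf{C}_k^H \hat{\mathbf{D}}_k,$$
where $\hat{\mathbf{D}}_k$ is also acquired by (3.73). For identifiability, we now need $\bar{P} \ge L+2$.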
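To see why discarding the $m = 0$ terms removes the effect of an unknown noise mean in the CE-BEM case, the sketch below adds a constant offset to an arbitrary scalar record and measures how much each correlation estimate changes. It assumes NumPy and the CE-BEM grid $\omega_q = 2\pi(q-(Q+1)/2)/T$; the record and the mean value are arbitrary placeholders.

```python
import numpy as np

T, K, P_bar, Q = 420, 2, 7, 3
P = K * P_bar
n = np.arange(T)
omega = 2 * np.pi * (np.arange(1, Q + 1) - (Q + 1) / 2) / T   # assumed CE-BEM frequencies

def d_hat(y, m, q, k):
    """Correlation estimate of d_mqk for one (m, q, k), scalar output (N = 1)."""
    alpha_mk = 2 * np.pi * (K * m + k - 1) / P
    return np.mean(y * np.exp(-1j * (omega[q - 1] + alpha_mk) * n))

rng = np.random.default_rng(2)
y = rng.standard_normal(T) + 1j * rng.standard_normal(T)   # any received record
mean = 0.7 - 0.2j                                          # hypothetical unknown noise mean

# Change in each estimate caused by adding the constant mean to the record.
shift = {(m, q, k): d_hat(y + mean, m, q, k) - d_hat(y, m, q, k)
         for m in range(P_bar) for q in range(1, Q + 1) for k in range(1, K + 1)}

affected = [key for key, s in shift.items() if abs(s) > 1e-9]
print(affected)
```

Only the term with $\omega_q + \alpha_{mk} = 0$ is perturbed, so dropping the whole $m = 0$ row is sufficient (though slightly more than necessary) to make the estimate insensitive to the mean.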
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.5: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 1. The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
3.6 Simulation Examples
3.6.1 First-Order Statistics-Based Estimator: Single User
In this example, we generate a doubly-selective Rayleigh fading channel as described in Section 2.6.2, with N = 1 and L = 2, satisfying the modified Jakes' model. We also employ the communication system described in Section 2.6.3. We emphasize again that BEMs are only used for processing at the receiver; the "true" channel follows Jakes' model, not a BEM.
In simulations, we pick a data record length of 420 symbols (time duration of approximately 10 ms). We consider the system operating under different Doppler spreads. For the
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.6: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 50 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
Doppler spreads fd = 0, 50, 100, and 200 Hz, we take Q = 1, 3, 5, and 7 in CE-BEM by (2.9b), and Q = 1, 3, 4, and 6 for OP- and DPS-BEM representations by (2.18). The average transmitted power in {c(n)} is 0.3 of that in {b(n)}, leading to a training-to-information power ratio (TIR) of 0.3.
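The quoted values of Q can be reproduced with a short calculation. The sketch below assumes that (2.9b) and (2.18), which are not reproduced in this section, take the commonly used forms $Q = 2\lceil f_d T T_s\rceil + 1$ for CE-BEM and $Q = \lceil 2 f_d T T_s\rceil + 1$ for OP- and DPS-BEM; these forms are an assumption, chosen because they match the numbers stated above for T = 420 and Ts = 25 μs.

```python
import math

def q_ce_bem(fd_hz, T, Ts):
    """Assumed form of (2.9b): Q = 2*ceil(fd*T*Ts) + 1."""
    return 2 * math.ceil(fd_hz * T * Ts) + 1

def q_dps_bem(fd_hz, T, Ts):
    """Assumed form of (2.18): Q = ceil(2*fd*T*Ts) + 1 (also used here for OP-BEM)."""
    return math.ceil(2 * fd_hz * T * Ts) + 1

T, Ts = 420, 25e-6   # record length in symbols, symbol duration in seconds
for fd in (0, 50, 100, 200):
    print(f"fd = {fd:3d} Hz: Q_CE = {q_ce_bem(fd, T, Ts)}, Q_DPS = {q_dps_bem(fd, T, Ts)}")
```

The CE-BEM count grows in steps of two because its exponentials come in symmetric pairs around zero frequency, whereas the DPS count grows with the time-bandwidth product one function at a time.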
We consider a single-user scenario. The information sequence {b(n)} and the training sequence {c(n)} are both modulated by binary phase-shift keying (BPSK). The periodic training sequence {c(n)} is generated from the m-sequence of period P = 7, one period of which is given by
$$\{c_1(n)\}_{n=0}^{6} = \{1, -1, -1, 1, 1, 1, -1\}. \tag{3.82}$$
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.7: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 100 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
To explore different estimators under equal conditions, we assume the additive noise $\{\mathbf{v}(n)\}$ is zero-mean (i.e., $\mathbf{m} = 0$), white complex-Gaussian, uncorrelated with $\{b(n)\}$, with $E\{\mathbf{v}(n+\tau)\mathbf{v}^H(n)\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$, so that no terms are discarded. The (receiver) SNR refers to the energy per bit over one-sided noise spectral density, with both the information and superimposed training sequences counting toward the bit energy.
The results are shown in Figures 3.1-3.8 for various Doppler spreads and SNRs. The results are based on 500 Monte Carlo runs for Viterbi detectors (see Appendix B.1). For comparison, CE-, OP- and DPS-BEM-based periodically placed TM training with zero-padding (see Appendix A) is also considered for doubly-selective channel estimation. We take a training session of length $2L+1 = 5$ symbols with the training sequence $\{0, 0, \sqrt{2L+1}, 0, 0\}$, and at the receiver an LS estimation is performed. A data session of 17 symbols is inserted
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Curves: SI&CE, SI&OP, SI&DPS, TM&CE, TM&OP, TM&DPS.]
Figure 3.8: First-order statistics-based estimator (SISO): NCMSE vs SNR under fd = 200 Hz and K = N = 1. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
between two such training sessions to form a frame of length 22 symbols. Such a frame is repeated over a record length of 418 symbols. Thus, we have a training-to-information bit ratio as well as a training-to-information power ratio of about 0.3.
For comparison, we plot the results of the CE-, OP-, and DPS-based superimposed and TM training approaches in each figure. Figures 3.1-3.4 show the BERs with a Viterbi detector at the receiver. Figures 3.5-3.8 show the corresponding normalized channel mean square error (NCMSE), which is defined as
$$\mathrm{NCMSE} := \frac{\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2} \left\|\mathbf{h}^{(i)}(n;l) - \hat{\mathbf{h}}^{(i)}(n;l)\right\|^2}{\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2} \left\|\mathbf{h}^{(i)}(n;l)\right\|^2} \tag{3.83}$$
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS, each with a Kalman and a Viterbi detector.]
Figure 3.9: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
where $M_c$ denotes the number of Monte Carlo runs, $\mathbf{h}^{(i)}(n;l)$ denotes the $i$-th realization of the time-varying channel, and $\hat{\mathbf{h}}^{(i)}(n;l)$ denotes the acquired channel estimate.
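The NCMSE of (3.83) is a ratio of summed squared norms, which can be computed in a few lines. The sketch below assumes NumPy and an array layout of `(Mc, T, L+1, N)` for the true and estimated channels; both the function name `ncmse_db` and the layout are illustrative choices, not taken from the dissertation.

```python
import numpy as np

def ncmse_db(h_true, h_est):
    """NCMSE in dB for complex arrays shaped (Mc, T, L+1, N): runs, time, taps, outputs."""
    err = np.sum(np.abs(h_true - h_est) ** 2)   # numerator: summed squared error norms
    ref = np.sum(np.abs(h_true) ** 2)           # denominator: summed squared channel norms
    return 10 * np.log10(err / ref)

rng = np.random.default_rng(3)
h = rng.standard_normal((5, 420, 3, 1)) + 1j * rng.standard_normal((5, 420, 3, 1))
print(ncmse_db(h, 0.9 * h))   # an estimate with a uniform 10% amplitude error
```

Because the sums run over all runs before the ratio is taken, realizations with larger channel energy weigh more heavily than they would in an average of per-run ratios.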
It is seen from the figures that the DPS-BEM-based estimators, regardless of whether superimposed or TM training is used, outperform the CE- and OP-BEM-based solutions. (For fd = 0 and Q = 1, all three models give the same results, since all three use constants as the basis functions. The performances of superimposed and TM training, however, are different.) This is consistent with the fact that DPS-BEM can efficiently remove spectral leakage, and that it is a much better model for describing a band-limited channel. Due to severe spectral leakage, the CE-BEM-based results are often the worst among the three.
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS, each with a Kalman and a Viterbi detector.]
Figure 3.10: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 50 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
Comparing superimposed training with TM training, we see that the superimposed training-based estimators perform worse than their TM counterparts, although superimposed training can offer a higher data transmission rate. As we have discussed in Chapter 1, the major issue of superimposed training is the information-induced interference (self-interference), which results in a notable error floor in the BER and NCMSE curves. We will discuss the issue of self-interference in Chapters 4-6.
3.6.2 First-Order Statistics-Based Estimator: Multiple Users
In this example, we follow the conditions addressed in Section 3.6.1 except that a
multiple-user scenario is considered.
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS, each with a Kalman and a Viterbi detector.]
Figure 3.11: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 100 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
In simulations, all the users have the same transmitted power in training and information data. The average transmitted power in $\{c_k(n)\}$ is 0.3 of that in $\{b_k(n)\}$ ($k = 1,2,\ldots,K$), leading to the same TIR as in Section 3.6.1. We consider a simple two-user scenario, i.e., $K = 2$, with two receive antennas, i.e., $N = 2$. The information sequences $\{b_k(n)\}$ and the training sequences $\{c_k(n)\}$ are all BPSK modulated. The training sequences are generated from the m-sequence of period $\bar{P} = 7$ by the procedure introduced in Section 3.5. The training sequences are of length $P = 14$, and the training sequence for the first user is
$$\{c_1(n)\}_{n=0}^{13} = \{1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, -1\}, \tag{3.84}$$
i.e., two repetitions of the m-sequence of period $\bar{P} = 7$.
[Plot: bit error rate vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS, each with a Kalman and a Viterbi detector.]
Figure 3.12: First-order statistics-based estimator (MIMO): BER vs SNR under fd = 200 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; Kalman: a Kalman filter as the symbol detector; Viterbi: a Viterbi detector as the symbol detector.)
The additive noise $\{\mathbf{v}(n)\}$ is also zero-mean, white complex-Gaussian, uncorrelated with $\{b_k(n)\}$, with $E\{\mathbf{v}(n+\tau)\mathbf{v}^H(n)\} = \sigma_v^2 \mathbf{I}_2 \delta(\tau)$. The (receiver) SNR refers to the energy per bit per user over one-sided noise spectral density, with both the information and superimposed training sequences counting toward the bit energy.
At the receiver end, a Viterbi detector or a Kalman filter (see Appendix B.2) acts as the symbol detector. We consider different Doppler spreads of fd = 0, 50, 100, and 200 Hz for this communications system. We also pick Q for CE-BEM as 1, 3, 5, 7 by (2.9b) and for DPS-BEM as 1, 3, 4, 6 by (2.18).
The results for a record length of T = 420 symbols are shown in Figures 3.9-3.16 for various Doppler spreads and SNRs. The results are based on 500 Monte Carlo runs
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS.]
Figure 3.13: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and K = N = 2. The curves for CE- and DPS-BEMs completely overlap, since the two basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
for Viterbi detectors, and 1000 runs for Kalman filters. For comparison, CE-BEM- and DPS-BEM-based periodically placed TM training (see Appendix A) is also considered for doubly-selective channel estimation. We take a training session of length $(K+1)L+K = 8$ symbols with the first user's training sequence $\{0, 0, \sqrt{(K+1)L+K}, 0, 0, 0, 0, 0\}$ and the second user's $\{0, 0, 0, 0, 0, \sqrt{(K+1)L+K}, 0, 0\}$. An information data session of 27 symbols is inserted between two such training sessions to form a frame of length 35 symbols. Such a frame is repeated over a record length of 420 symbols. Thus, we have a training-to-information bit and power ratio of about 0.3. For multiple-user communications, the
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS.]
Figure 3.14: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 50 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
NCMSE is defined as
$$\mathrm{NCMSE} := \frac{\sum_{k=1}^{K}\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2} \left\|\mathbf{h}_k^{(i)}(n;l) - \hat{\mathbf{h}}_k^{(i)}(n;l)\right\|^2}{\sum_{k=1}^{K}\sum_{i=1}^{M_c}\sum_{n=0}^{T-1}\sum_{l=0}^{2} \left\|\mathbf{h}_k^{(i)}(n;l)\right\|^2} \tag{3.85}$$
We plot the curves for Viterbi detectors and Kalman filters in each figure. The discussion in Section 3.6.1 for the SISO channel applies to the MIMO channel also: DPS-BEM performs best and CE-BEM is the worst; TM training outperforms its superimposed rival. The optimal Viterbi detector shows its advantage in error probability over the Kalman filter, at the expense of increased computational complexity.
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS.]
Figure 3.15: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 100 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
3.7 Conclusions
In this chapter, we discussed a first-order statistics-based estimator of doubly-selective channels using superimposed training and BEMs. Our starting point was the CE-BEM-based estimator proposed in [81]. Due to the spectral leakage of CE-BEM, this estimator does not perform well in estimating a band-limited channel. We thus extended the estimator to DPS- and OP-BEMs to reduce the modeling error. We further considered this estimator in a multiple-user scenario. By assigning distinct cycle frequencies of the periodic training sequences to distinct users, channel estimation across the users is decoupled so that the single-user estimator can be used. Our schemes are illustrated by simulation examples, and compared with conventional TM training: although a higher transmission rate is achieved by superimposed training, the performance of the proposed estimator
[Plot: normalized channel MSE (dB) vs SNR (dB), 0 to 30 dB. K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Curves: SI&CE, SI&DPS, TM&CE, TM&DPS.]
Figure 3.16: First-order statistics-based estimator (MIMO): NCMSE vs SNR under fd = 200 Hz and K = N = 2. (SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM.)
using superimposed training is inferior to that of TM training, due to the existence of information-induced self-interference. Analyzing and reducing self-interference will be the topic of the following three chapters.
Chapter 4
Performance Analysis and Parameter Design for First-Order
Statistics-Based Estimator
4.1 Introduction
A first-order statistics-based channel estimator, using superimposed training and various BEMs, was discussed in Chapter 3. We present a performance analysis of this estimator in this chapter.
Several parameters may affect the performance of the estimator. For example, a portion of the transmitted power is allocated to the superimposed training, and this portion affects the estimator's behavior: more training power leads to higher estimation accuracy, but the accompanying reduction of information power worsens the effective information SNR. Therefore, a trade-off must be made in allocating power between training and information.
A similar consideration lies in selecting the number of basis functions for BEMs. More basis functions yield a more accurate approximation (i.e., less bias, see Section 2.6) but a higher estimation variance. A trade-off between bias and variance should also be studied to achieve better estimation.
In Section 4.2, assuming that the "true" channel follows a BEM, performance analysis for the estimators using CE-, DPS-, and OP-BEMs is explored; performance analysis for the estimator of multiple-user channels is also discussed in this section. In Section 4.3, the modeling error of a BEM is taken into account in the performance analysis. Based on the results of the performance analysis, we cast the issues of power allocation (Section 4.4) and bias-variance trade-off (Section 4.5) as ones of optimizing an SNR for equalizer design. Simulation results are provided in Section 4.6. Section 4.7 concludes this chapter.
4.2 Performance Analysis for the First-Order Statistics-Based Estimator Using BEM
We assume the following.
(H4.2.1) The SIMO channel satisfies a BEM representation as in (2.20), i.e.,
$$\mathbf{h}(n;l) = \mathbf{h}_{\rm BEM}(n;l) = \sum_{q=1}^{Q} \mathbf{h}_q(l)\, \phi_q(n), \tag{4.1}$$
where $\phi_q(n)$ is the $q$-th basis function (corresponding to $e^{j\omega_q n}$ in CE-BEM (2.9), $\psi_{q-1}(n)$ in OP-BEM (2.15), and $u_q(n)$ in DPS-BEM (2.19)), and $Q$ is the number of basis functions (note that $Q = K+1$ for $K$ in (2.15)). Also $N \ge 1$.
(H4.2.2) The information sequence $\{b(n)\}$ is zero-mean, white with $E\{|b(n)|^2\} = \sigma_b^2$.
(H4.2.3) The measurement noise $\{\mathbf{v}(n)\}$ is zero-mean, white, uncorrelated with $\{b(n)\}$, with $E\{\mathbf{v}(n+\tau)\mathbf{v}^H(n)\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$.
(H4.2.4) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic sequence with period $P$, with average power $\sigma_c^2 := \sum_{n=0}^{P-1} |c(n)|^2 / P$.
(H4.2.5) The time-varying channel $\{\mathbf{h}(n;l)\}$ is zero-mean, complex Gaussian with variance $\sigma_h^2$, and mutually independent for distinct $l$'s: $E\{\mathbf{h}(n;l)\mathbf{h}^H(n;l)\} = \sigma_h^2 \mathbf{I}_N$ and $E\{\mathbf{h}(n_1;l_1)\mathbf{h}^H(n_2;l_2)\} = 0$ for $l_1 \neq l_2$ and all $n_1$, $n_2$, i.e., different channel taps are independent of each other and are identically distributed zero-mean complex Gaussian.
We wish to evaluate the MSE of channel estimation when the true channel follows (4.1). The MSE of estimation is defined as
$$\mathrm{MSE}_1 = \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\left\|\mathbf{h}_{\rm BEM}(n;l) - \hat{\mathbf{h}}_{\rm BEM}(n;l)\right\|^2\right\} \tag{4.2}$$
where $\hat{\mathbf{h}}_{\rm BEM}(n;l)$ is also given by the BEM
$$\hat{\mathbf{h}}_{\rm BEM}(n;l) = \sum_{q=1}^{Q} \hat{\mathbf{h}}_q(l)\, \phi_q(n). \tag{4.3}$$
By (H4.2.4), we define the normalized training sequence as
$$\tilde{c}(n) = c(n)/\sigma_c, \quad \text{for all } n$$
and
$$\tilde{c}_m := c_m/\sigma_c, \quad m = 0,1,\ldots,P-1$$
where
$$c_m := \frac{1}{P}\sum_{n=0}^{P-1} c(n)\, e^{-j\alpha_m n},$$
$$c(n) = \sum_{m=0}^{P-1} c_m\, e^{j\alpha_m n}, \quad \text{for all } n,$$
$$\alpha_m = 2\pi m/P, \quad m = 0,1,\ldots,P-1.$$
4.2.1 Performance Analysis for CE-BEM-Based Estimator
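Since we have assumed the measurement noise $\{\mathbf{v}(n)\}$ is zero-mean, the CE-BEM-based channel estimate is acquired through (3.24), and
$$\hat{\mathbf{h}}_{\rm BEM}(n;l) = \sum_{q=1}^{Q} \hat{\mathbf{h}}_q(l)\, e^{j\omega_q n}, \quad l = 0,1,\ldots,L, \quad 0 \le n \le T-1.$$
Let
$$\mathbf{E}_m(n) = \begin{bmatrix} e^{-j(\omega_1+\alpha_m)n} & e^{-j(\omega_2+\alpha_m)n} & \cdots & e^{-j(\omega_Q+\alpha_m)n} \end{bmatrix}^T,$$
and
$$\mathbf{E}(n) = \begin{bmatrix} \mathbf{E}_0^H(n) & \mathbf{E}_1^H(n) & \cdots & \mathbf{E}_{P-1}^H(n) \end{bmatrix}^H. \tag{4.4}$$
By (3.6),
$$\hat{\tilde{\mathbf{D}}} = \frac{1}{T}\sum_{n=0}^{T-1} \mathbf{E}(n) \otimes \mathbf{y}(n),$$
and by (3.24), we have
$$\hat{\mathbf{H}} = \frac{1}{T}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1}\tilde{\mathbf{C}}^H \sum_{n=0}^{T-1} \mathbf{E}(n) \otimes \mathbf{y}(n). \tag{4.5}$$
We define
$$\tilde{\mathbf{x}}(n) := \mathbf{y}(n) - E\{\mathbf{y}(n)\,|\,\mathbf{H}\} = \sum_{l=0}^{L} \mathbf{h}(n;l)\, b(n-l) + \mathbf{v}(n), \tag{4.6}$$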
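then
$$E\left\{\tilde{\mathbf{x}}(n_1)\tilde{\mathbf{x}}^H(n_2)\,\middle|\,\mathbf{H}\right\} = \sum_{l_1=0}^{L}\sum_{l_2=0}^{L} \mathbf{h}(n_1;l_1)\mathbf{h}^H(n_2;l_2)\, \sigma_b^2\, \delta(n_1-n_2-l_1+l_2) + \sigma_v^2 \mathbf{I}_N \delta(n_1-n_2).$$
Using (H4.2.5),
$$E_{\mathbf{H}}\left\{E\left\{\tilde{\mathbf{x}}(n_1)\tilde{\mathbf{x}}^H(n_2)\,\middle|\,\mathbf{H}\right\}\right\} = \left[(L+1)\sigma_h^2\sigma_b^2 + \sigma_v^2\right]\mathbf{I}_N \delta(n_1-n_2).$$
Since $\sum_{n=0}^{T-1} \mathbf{E}(n)\mathbf{E}^H(n) = T\mathbf{I}_{PQ}$, by defining $\mathrm{cov}\{\hat{\mathbf{H}}, \hat{\mathbf{H}}\,|\,\mathbf{H}\} := E\{[\hat{\mathbf{H}}-\mathbf{H}][\hat{\mathbf{H}}-\mathbf{H}]^H\,|\,\mathbf{H}\}$ and using (4.5) we have (see also (3.21)-(3.23))
$$\begin{aligned}
E_{\mathbf{H}}\left\{\mathrm{cov}\left\{\hat{\mathbf{H}}, \hat{\mathbf{H}}\,\middle|\,\mathbf{H}\right\}\right\}
&= \frac{1}{T^2}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1}\tilde{\mathbf{C}}^H\, \mathrm{cov}\left\{\sum_{n=0}^{T-1}\mathbf{E}(n)\otimes\mathbf{y}(n),\ \sum_{n=0}^{T-1}\mathbf{E}(n)\otimes\mathbf{y}(n)\right\}\tilde{\mathbf{C}}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1} \\
&= \frac{1}{T^2}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1}\tilde{\mathbf{C}}^H\left[\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}\mathbf{E}(n_1)\mathbf{E}^H(n_2)\otimes\mathrm{cov}\{\mathbf{y}(n_1),\mathbf{y}(n_2)\}\right]\tilde{\mathbf{C}}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T^2}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1}\tilde{\mathbf{C}}^H\left[\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1}\mathbf{E}(n_1)\mathbf{E}^H(n_2)\otimes\mathbf{I}_N\,\delta(n_1-n_2)\right]\tilde{\mathbf{C}}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1}\tilde{\mathbf{C}}^H\left(\mathbf{I}_{PQ}\otimes\mathbf{I}_N\right)\tilde{\mathbf{C}}(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}})^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\left(\tilde{\mathbf{C}}^H\tilde{\mathbf{C}}\right)^{-1} \\
&= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\left(\tilde{\mathbf{V}}^H\,\mathrm{diag}\left\{|c_0|^2,|c_1|^2,\ldots,|c_{P-1}|^2\right\}\tilde{\mathbf{V}}\right)^{-1}\otimes\mathbf{I}_{NQ}.
\end{aligned}$$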
Since $\hat{\tilde{\mathbf{D}}} = \tilde{\mathbf{C}}\hat{\mathbf{H}}$, we have $E\{\hat{\tilde{\mathbf{D}}}\} = \tilde{\mathbf{C}}E\{\hat{\mathbf{H}}\}$. By (3.9), it follows that
$$\begin{aligned}
E\{\hat{\mathbf{d}}_{mq}\} &= \frac{1}{T}\sum_{n=0}^{T-1} E\{\mathbf{y}(n)\}\, e^{-j(\omega_q+\alpha_m)n} \\
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} \mathbf{h}(n;l)\, c(n-l)\, e^{-j(\omega_q+\alpha_m)n} \\
&= \sum_{l=0}^{L}\sum_{q'=1}^{Q} \mathbf{h}_{q'}(l)\sum_{m'=0}^{P-1} c_{m'}\, e^{-j\alpha_{m'}l}\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{-j(\omega_q-\omega_{q'}+\alpha_m-\alpha_{m'})n}\right] \\
&= \sum_{l=0}^{L}\sum_{q'=1}^{Q} \mathbf{h}_{q'}(l)\sum_{m'=0}^{P-1} c_{m'}\, e^{-j\alpha_{m'}l}\, \delta(m-m')\,\delta(q-q') \\
&= \sum_{l=0}^{L} \mathbf{h}_q(l)\, c_m\, e^{-j\alpha_m l}.
\end{aligned}$$
Therefore, $E\{\hat{\tilde{\mathbf{D}}}\} = \tilde{\mathbf{C}}\mathbf{H}$, or $\tilde{\mathbf{C}}E\{\hat{\mathbf{H}}\} = \tilde{\mathbf{C}}\mathbf{H}$. Since $\tilde{\mathbf{C}}$ has full column rank when $P \ge L+1$, we have $E\{\hat{\mathbf{H}}\} = \mathbf{H}$. Now we evaluate the MSE of the channel estimate given by (4.3):
\[
\begin{aligned}
\mathrm{MSE}_1
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{
\left[\sum_{q_1=1}^{Q}\big[h_{q_1}(l)-\hat{h}_{q_1}(l)\big]^{H} e^{-j\omega_{q_1}n}\right]
\left[\sum_{q_2=1}^{Q}\big[h_{q_2}(l)-\hat{h}_{q_2}(l)\big] e^{j\omega_{q_2}n}\right]\right\} \\
&= \sum_{l=0}^{L} E\left\{\sum_{q_1=1}^{Q}\sum_{q_2=1}^{Q}
\big[h_{q_1}(l)-\hat{h}_{q_1}(l)\big]^{H}\big[h_{q_2}(l)-\hat{h}_{q_2}(l)\big]
\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{-j(\omega_{q_1}-\omega_{q_2})n}\right]\right\} \\
&= \sum_{l=0}^{L} E\left\{\sum_{q_1=1}^{Q}\sum_{q_2=1}^{Q}
\big[h_{q_1}(l)-\hat{h}_{q_1}(l)\big]^{H}\big[h_{q_2}(l)-\hat{h}_{q_2}(l)\big]\,\delta(q_1-q_2)\right\} \\
&= \mathrm{tr}\left\{E_H\{\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\}\}\right\} \\
&= \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(\tilde{V}^{H}\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{P-1}|^2\}\,\tilde{V}\right)^{-1}\right\}.
\end{aligned}
\tag{4.7}
\]
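The step from the second to the third line of the MSE derivation above relies on the orthogonality of the CE-BEM exponentials over one block of length $T$. The following Python sketch checks this numerically. It assumes the usual CE-BEM frequencies $\omega_q = 2\pi[q-(Q+1)/2]/T$, which are not restated in this section, and the block length and model order are hypothetical values chosen only for illustration.

```python
import cmath
import math


def ce_bem_frequencies(Q, T):
    """Assumed CE-BEM frequencies omega_q = 2*pi*(q - (Q+1)/2)/T for q = 1..Q."""
    return [2 * math.pi * (q - (Q + 1) / 2) / T for q in range(1, Q + 1)]


def time_averaged_product(omega_1, omega_2, T):
    """Compute (1/T) * sum_{n=0}^{T-1} exp(-j*(omega_1 - omega_2)*n)."""
    return sum(cmath.exp(-1j * (omega_1 - omega_2) * n) for n in range(T)) / T


T, Q = 400, 5  # hypothetical block length and number of basis functions
omegas = ce_bem_frequencies(Q, T)

# Gram matrix of the Q exponentials; it should equal the identity, i.e. delta(q1 - q2).
gram = [[time_averaged_product(w1, w2, T) for w2 in omegas] for w1 in omegas]

max_deviation = max(
    abs(gram[i][j] - (1.0 if i == j else 0.0))
    for i in range(Q)
    for j in range(Q)
)
print(max_deviation)
```

Because the frequency differences are integer multiples of $2\pi/T$, the off-diagonal averages vanish up to rounding error, which is what collapses the double sum over $q_1$ and $q_2$.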
Remark 4.2.1.1: If the zero-mean assumption of (H4.2.1.3) is relaxed, i.e., we follow the assumption (H3.2.3) instead, then the estimator follows (3.18) and the terms corresponding to $m = 0$ are discarded. We omit the entry $\mathcal{E}_0(n)$ in (4.4), and use $C$, $V$, and $\hat{D}$ in (3.13), (3.14), and (3.16) instead of $\tilde{C}$, $\tilde{V}$, and $\hat{\tilde{D}}$. The MSE of the channel estimate when the mean of the noise is unknown is then
\[
\mathrm{MSE}_1 = \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(V^{H}\,\mathrm{diag}\{|\bar{c}_1|^2,|\bar{c}_2|^2,\ldots,|\bar{c}_{P-1}|^2\}\,V\right)^{-1}\right\}. \tag{4.8}
\]
Remark 4.2.1.2: If we define an interference factor $I_f$ as
\[
I_f = \frac{NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(\tilde{V}^{H}\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{P-1}|^2\}\,\tilde{V}\right)^{-1}\right\}
\]
or
\[
I_f = \frac{NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(V^{H}\,\mathrm{diag}\{|\bar{c}_1|^2,|\bar{c}_2|^2,\ldots,|\bar{c}_{P-1}|^2\}\,V\right)^{-1}\right\},
\]
we can see that the MSE of the channel estimate consists of two parts: the term $(L+1)\sigma_h^2\sigma_b^2 I_f$ comes from the self-interference, and the term $\sigma_v^2 I_f$ is the noise-induced part. Normally $(L+1)\sigma_h^2\sigma_b^2 \gg \sigma_v^2$, so that the estimation error mainly comes from the interference caused by the information data.
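To make the two-part decomposition of Remark 4.2.1.2 concrete, the short Python sketch below evaluates the self-interference and noise-induced terms for a given interference factor. All numerical values (including the interference factor itself) are hypothetical and are not taken from the dissertation's simulations.

```python
def mse_components(L, sigma_h2, sigma_b2, sigma_v2, interference_factor):
    """Split MSE_1 into its self-interference and noise-induced parts.

    Returns ((L+1)*sigma_h^2*sigma_b^2*I_f, sigma_v^2*I_f).
    """
    self_interference = (L + 1) * sigma_h2 * sigma_b2 * interference_factor
    noise_induced = sigma_v2 * interference_factor
    return self_interference, noise_induced


# Hypothetical operating point: 3-tap channel, unit tap variance, 20 dB below for noise.
L = 2
sigma_h2 = 1.0
sigma_b2 = 0.7
sigma_v2 = 0.01
interference_factor = 0.05  # assumed value of I_f

self_part, noise_part = mse_components(L, sigma_h2, sigma_b2, sigma_v2, interference_factor)
total_mse = self_part + noise_part
self_share = self_part / total_mse
print(self_part, noise_part, self_share)
```

With these assumed numbers the self-interference accounts for more than 99% of the total, which illustrates why the information data, rather than the noise, limits the estimator at moderate to high SNR.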
4.2.2 Performance Analysis for DPS-BEM-Based Estimator
Consider (3.35). From the observation $y(n)$, the estimate $\hat{d}_{mq}$ has contributions from the information sequence $\{b(n)\}$, which is unknown at the receiver, the superimposed training $\{c(n)\}$, which is known at the receiver, and the measurement noise $v(n)$. It follows from (3.1), (3.2), (3.4), (3.25), and (3.35) that
\[
\begin{aligned}
\hat{d}_{mq} &= \sum_{n=0}^{T-1} y(n)\,u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{n=0}^{T-1}\left[\sum_{l=0}^{L} h(n;l)c(n-l) + \sum_{l=0}^{L} h(n;l)b(n-l) + v(n)\right] u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{n=0}^{T-1}\left[E\{y(n)\} + \sum_{l=0}^{L} h(n;l)b(n-l) + v(n)\right] u_q(n)\,e^{-j\alpha_m n}.
\end{aligned}
\]
Then by (H4.2.1)–(H4.2.4), (3.27), (3.34), and (3.35),
\[
\begin{aligned}
E\{\hat{d}_{mq}\} &= \sum_{n=0}^{T-1} E\{y(n)\}\,u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{n=0}^{T-1}\sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} d_{m'q'}\,u_{q'}(n)\,e^{j\alpha_{m'}n}\,u_q(n)\,e^{-j\alpha_m n} \\
&= \sum_{q'=1}^{Q}\sum_{m'=0}^{P-1} d_{m'q'}\,\delta(m'-m)\,\delta(q'-q) \\
&= d_{mq}.
\end{aligned}
\tag{4.9}
\]
Define
\[
w_{mq} := \sum_{n=0}^{T-1} v(n)\,u_q(n)\,e^{-j\alpha_m n},
\]
which is zero-mean and, by (H4.2.3) and (3.34), satisfies
\[
\begin{aligned}
E\{w_{m'q'}w_{mq}^{H}\} &= \sum_{n=0}^{T-1}\sum_{n'=0}^{T-1} E\{v(n')v^{H}(n)\}\,u_{q'}(n')\,u_q(n)\,e^{j(\alpha_m n-\alpha_{m'}n')} \\
&= \sum_{n=0}^{T-1}\sum_{n'=0}^{T-1} \sigma_v^2 I_N\,\delta(n'-n)\,u_{q'}(n')\,u_q(n)\,e^{j(\alpha_m n-\alpha_{m'}n')} \\
&= \sigma_v^2 I_N\,\delta(m'-m)\,\delta(q'-q).
\end{aligned}
\tag{4.10}
\]
Thus,
\[
\hat{d}_{mq} = d_{mq} + s_{mq} + w_{mq}, \tag{4.11}
\]
where
\[
s_{mq} := \sum_{n=0}^{T-1}\left[\sum_{l=0}^{L} h(n;l)\,b(n-l)\right] u_q(n)\,e^{-j\alpha_m n}. \tag{4.12}
\]
Clearly, the information sequence's contribution, given by $s_{mq}$ above, interferes with the estimation of $d_{mq}$, and hence with channel estimation from the observations (see (3.36)).
Since $\tilde{C}$ is full column-rank when $P \ge L+1$, by (3.30) and (4.9) we have $E\{\hat{\tilde{D}}\} = \tilde{D}$ and $E\{\hat{H}\} = H$. Then from (3.36)
\[
\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\} = \left(\tilde{C}^{H}\tilde{C}\right)^{-1}\tilde{C}^{H}\,\mathrm{cov}\{\hat{\tilde{D}},\hat{\tilde{D}}\}\,\tilde{C}\left(\tilde{C}^{H}\tilde{C}\right)^{-1}. \tag{4.13}
\]
Consider the zero-mean interference $s_{mq}$ in (4.12). By (H4.2.2), (H4.2.5), and (3.34),
\[
\begin{aligned}
E\{s_{m'q'}s_{mq}^{H}\} &= \sum_{n'=0}^{T-1}\sum_{l'=0}^{L}\sum_{n=0}^{T-1}\sum_{l=0}^{L}
E\{h(n';l')h^{H}(n;l)\}\,E\{b(n'-l')b^{*}(n-l)\}\,u_q(n)\,u_{q'}(n')\,e^{j(\alpha_m n-\alpha_{m'}n')} \\
&= (L+1)\sigma_h^2\sigma_b^2 I_N \sum_{n=0}^{T-1} u_q(n)\,u_{q'}(n)\,e^{j(\alpha_m-\alpha_{m'})n} \\
&= (L+1)\sigma_h^2\sigma_b^2 I_N\,\delta(m'-m)\,\delta(q'-q).
\end{aligned}
\tag{4.14}
\]
By (H4.2.3), since the noise $v(n)$ is uncorrelated with the zero-mean information sequence $\{b(n)\}$,
\[
E\{s_{m'q'}w_{mq}^{H}\} = 0.
\]
Then by (4.10), (4.11), and (4.14),
\[
\begin{aligned}
E\{[\hat{d}_{mq}-d_{mq}][\hat{d}_{m'q'}-d_{m'q'}]^{H}\}
&= E\{[s_{mq}+w_{mq}][s_{m'q'}+w_{m'q'}]^{H}\} \\
&= E\{s_{m'q'}s_{mq}^{H}\} + E\{w_{m'q'}w_{mq}^{H}\} \\
&= \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right] I_N\,\delta(m'-m)\,\delta(q'-q).
\end{aligned}
\]
We further have
\[
\mathrm{cov}\{\hat{\tilde{D}},\hat{\tilde{D}}\} = \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right] I_{NPQ}. \tag{4.15}
\]
Substituting (4.15) into (4.13) gives
\[
\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\} = \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]\left(\tilde{C}^{H}\tilde{C}\right)^{-1}.
\]
Using the orthonormality of the DPS sequences, the MSE in channel estimation (4.2) is then given by
\[
\begin{aligned}
\mathrm{MSE}_1
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{
\left[\sum_{q_1=1}^{Q}\big[h_{q_1}(l)-\hat{h}_{q_1}(l)\big]^{H} u_{q_1}(n)\right]
\left[\sum_{q_2=1}^{Q}\big[h_{q_2}(l)-\hat{h}_{q_2}(l)\big] u_{q_2}(n)\right]\right\} \\
&= \frac{1}{T}\sum_{l=0}^{L} E\left\{\sum_{q_1=1}^{Q}\sum_{q_2=1}^{Q}
\big[h_{q_1}(l)-\hat{h}_{q_1}(l)\big]^{H}\big[h_{q_2}(l)-\hat{h}_{q_2}(l)\big]\,\delta(q_1-q_2)\right\} \\
&= \frac{1}{T}\,\mathrm{tr}\left\{\mathrm{cov}\{\hat{H},\hat{H}\,|\,H\}\right\} \\
&= \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(\tilde{V}^{H}\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{P-1}|^2\}\,\tilde{V}\right)^{-1}\right\}.
\end{aligned}
\tag{4.16}
\]
Remark 4.2.2.1: If the mean of the noise $v(n)$ is unknown, we replace $\tilde{C}$ in (4.16) with $C$ as in (3.13). Then the MSE in channel estimation is
\[
\mathrm{MSE}_1 = \frac{\left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(V^{H}\,\mathrm{diag}\{|\bar{c}_1|^2,|\bar{c}_2|^2,\ldots,|\bar{c}_{P-1}|^2\}\,V\right)^{-1}\right\}. \tag{4.17}
\]
Whether the noise is zero-mean or not, the interference from the information sequence accounts for most of the estimation error.
Remark 4.2.2.2: We assume (3.34), which holds exactly for the CE-BEM if $u_q(n)$ is replaced with $e^{j\omega_q n}$, so that the CE-BEM-based estimator and the DPS-BEM-based estimator give the same MSE results (compare (4.7) and (4.8) with (4.16) and (4.17)), provided that the "true" channel follows the CE- or DPS-BEM, respectively. In this section, the modeling error of a BEM in describing a real channel has been omitted.
4.2.3 Performance Analysis for OP-BEM-Based Estimator
Now we turn to the OP-BEM-based channel estimator (3.46). The noise $v(n)$ has unknown mean $m$. Similar to (4.6), we define
\[
\tilde{x}(n) := y(n) - E\{y(n)\,|\,H\} = \sum_{l=0}^{L} h(n;l)\,b(n-l) + v(n) - m,
\]
then
\[
E\{\tilde{x}(n_1)\tilde{x}^{H}(n_2)\,|\,H\}
= \sum_{l_1=0}^{L}\sum_{l_2=0}^{L} h(n_1;l_1)h^{H}(n_2;l_2)\,\sigma_b^2\,\delta(n_1-n_2-l_1+l_2) + \sigma_v^2 I_N\,\delta(n_1-n_2).
\]
Since the $h(n;l)$ are independent for different $l$,
\[
\begin{aligned}
E_H\{E\{\tilde{x}(n_1)\tilde{x}^{H}(n_2)\,|\,H\}\}
&= E\left\{\sum_{l=0}^{L} h(n_1;l)h^{H}(n_2;l)\right\}\sigma_b^2\,\delta(n_1-n_2) + \sigma_v^2 I_N\,\delta(n_1-n_2) \\
&= \left[(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2\right] I_N\,\delta(n_1-n_2).
\end{aligned}
\tag{4.18}
\]
By (3.46) and (4.18), we have
EH
braceleftBig
cov
braceleftBig ?
H, ?H
vextendsinglevextendsingle
vextendsingleH
bracerightBigbracerightBig
= 1T2
T?1summationdisplay
n1=0
T?1summationdisplay
n2=0
??OP?(n1)?H (n2)??HOP ?EHbraceleftbigEbraceleftbig?x(n1)?xH (n2)vextendsinglevextendsingleHbracerightbigbracerightbig
=
bracketleftbig(L+1)?2
h?
2
b +?
2vbracketrightbig
T2 ?
?
OP
bracketleftBiggT?1summationdisplay
n=0
?(n)?H (n)
bracketrightBigg
??HOP ?IN. (4.19)
Note that 1T summationtextT?1n=0 ?(n)?H (n) = ?OP and ?OP = ?HOP. We define the normalized
??OP := ?OP/?2c, then (4.19) becomes
EH
braceleftBig
cov
braceleftBig ?
H, ?H
vextendsinglevextendsingle
vextendsingleH
bracerightBigbracerightBig
=
bracketleftbig(L+1)?2
h?
2
b +?
2vbracketrightbig
T?2c
???OP ?IN.
Let
?(n) =
bracketleftbigg
?0 (n)IN ?1 (n)IN ??? ?K (n)IN
bracketrightbigg
,
?(n) = IL+1 ??(n).
By the orthonormality of {?q (n)}T?1n=0, summationtextTn=1 ?H (n)?(n) = IN(L+1)(K+1). It follows the
OP-BEM (3.38) and (3.47) that
hBEM (n;l) = ?(n)Hl and ?hBEM (n;l) = ?(n) ?Hl.
The estimation MSE is given by
MSE1 = 1T
T?1summationdisplay
n=0
Lsummationdisplay
l=0
EHE
braceleftbiggbraceleftbiggvextendsinglevextendsingle
vextendsingle?hBEM (n;l)?hBEM (n;l)
vextendsinglevextendsingle
vextendsingle
2vextendsinglevextendsinglevextendsingle
vextendsingleH
bracerightbiggbracerightbigg
= 1T
T?1summationdisplay
n=0
tr
braceleftBig
?(n)EH
braceleftBig
cov
braceleftBig ?
H, ?H
vextendsinglevextendsingle
vextendsingleH
bracerightBigbracerightBig
?H (n)
bracerightBig
= (L+1)?
2
h?
2
b +?
2v
T2?2c
T?1summationdisplay
n=0
tr
braceleftBig
?(n)
parenleftBig?
??OP ?IN
parenrightBig
?H (n)
bracerightBig
= (L+1)?
2
h?
2
b +?
2v
T2?2c
T?1summationdisplay
n=0
tr
braceleftBig
IN(L+1)(K+1)
parenleftBig?
??OP ?IN
parenrightBigbracerightBig
=
bracketleftbig(L+1)?2
h?
2
b +?
2vbracketrightbigN
T2?2c tr
???OP.
Remark 4.2.3.1: If the measurement noise is zero-mean, i.e., m = 0, the channel
MSE is given by
MSE1 =
bracketleftbig(L+1)?2
h?
2
b +?
2vbracketrightbigN
T2?2c tr
????OP, (4.20)
where ???OP = ??OP/?2c (see Remark 3.4.3 for the definition of ??OP and related discus-
sions).
Remark 4.2.3.2: In Remark 3.4.2, we have shown that this estimator can apply to
any BEM representation. To reconfirm this, we now consider the channel MSE given by
(4.7), (4.16), and (4.20), where we assume m = 0 in all the three cases. The entries of ???OP
are
???(q,q1,l,l1) := 1
T
T?1summationdisplay
n=0
?q (n)??q1 (n)?c(n?l)?c?(n?l1).
We denote ???CE and ???DPS if we replace ?q (n) with ej?qn/?T and uq (n) in ???OP. Since we
have
1
T
T?1summationdisplay
n=0
ej?qn?
T
e?j?q1n?
T ?c(n?l)?c
?(n?l1) = 1
T
P?1summationdisplay
m=0
|?cm|2e?j?m(l?l?)?(q?q1)
by (3.49), and
1
T
T?1summationdisplay
n=0
uq (n)uq1 (n)?c(n?l)?c?(n?l1) = 1T
P?1summationdisplay
m=0
|?cm|2e?j?m(l?l?)?(q?q1)
by (3.34), then it follows that
???DPS ? ???CE = ?VH diagbraceleftBig|c0|2,|c1|2,...,|cP?1|2bracerightBig?V?IQ.
Thus, if we replace ?q (n) with ej?qn/?T or uq (n) in ???OP, (4.20) gives us the same result
as (4.7) or (4.16).
4.2.4 Performance Analysis for Multiple-User (MIMO) Channels
We now analyze the estimation performance of the MIMO estimator proposed in Sec-
tion 3.5.
Because the channel is band-limited, the analysis approach used in Section 4.2.2 also applies to the CE-BEM. We therefore analyze the CE- and DPS-BEM-based MIMO channel estimators in the same way.
We assume the following:
(H4.2.4.1) The time-varying channel satisfies a BEM representation, as in (2.20), i.e.,
\[
h_k(n;l) = h_{\mathrm{BEM}k}(n;l) = \sum_{q=1}^{Q} h_{qk}(l)\,\phi_q(n),
\]
where $\phi_q(n)$ is the $q$-th basis function (corresponding to $e^{j\omega_q n}$ in the CE-BEM (2.9) and $u_q(n)$ in the DPS-BEM (2.19)), and $Q$ is the number of basis functions. Also $N \ge 1$.
(H4.2.4.2) The information sequence $\{b_k(n)\}$ is zero-mean and white with $E\{|b_k(n)|^2\} = \sigma_{bk}^2$, and the sequences are mutually independent for $k = 1,2,\ldots,K$.
(H4.2.4.3) The measurement noise $\{v(n)\}$ is zero-mean ($m = 0$), white, uncorrelated with $\{b_k(n)\}$, with $E\{v(n+\tau)v^{H}(n)\} = \sigma_v^2 I_N\,\delta(\tau)$.
(H4.2.4.4) The superimposed training sequence $c_k(n) = c_k(n+P)$ for all $n$ is a non-random periodic sequence with period $P$ and average power $\sigma_{ck}^2 := \sum_{n=0}^{P-1}|c_k(n)|^2/P$ such that $c_{mk} \neq 0$ for all $m,k$, and $\bar{P}$ is an integer with $P = \bar{P}K$.
(H4.2.4.5) The time-varying channel $\{h_k(n;l)\}$ is zero-mean, complex Gaussian with variance $\sigma_{hk}^2$, and mutually independent for distinct $l$: $E\{h_k(n;l)h_k^{H}(n;l)\} = \sigma_{hk}^2 I_N$ and $E\{h_k(n_1;l_1)h_k^{H}(n_2;l_2)\} = 0$ for $l_1 \neq l_2$ and all $n_1$, $n_2$, i.e., different channel taps are independent of each other and are identically distributed zero-mean complex Gaussian. In addition, $E\{h_{k'}(n_1;l_1)h_k^{H}(n_2;l_2)\} = 0$ for all $n_1$, $n_2$, $l_1$, and $l_2$ if $k' \neq k$, i.e., the channels of different users are mutually independent.
Considering (3.71) and (3.73), we have
\[
\hat{d}_{mqk} = \sum_{n=0}^{T-1} y(n)\,\phi_q^{*}(n)\,e^{-j\alpha_{mk}n}, \tag{4.21}
\]
where $\phi_q(n) := e^{j\omega_q n}/\sqrt{T}$ for the CE-BEM and $\phi_q(n) := u_q(n)$ for the DPS-BEM. By (3.70) and (3.72),
\[
\sum_{n=0}^{T-1} \phi_{q'}(n)\,\phi_q^{*}(n)\,e^{j(\alpha_{m'k'}-\alpha_{mk})n} \approx \delta(k'-k)\,\delta(m'-m)\,\delta(q'-q). \tag{4.22}
\]
From $y(n)$, the estimate $\hat{d}_{mqk}$ has contributions from the information sequences $\{b_k(n)\}$ ($k = 1,2,\ldots,K$), unknown at the receiver, the superimposed training $\{c_k(n)\}$, known at the receiver, and the measurement noise $v(n)$. It follows from (3.57)–(3.59) that
\[
\begin{aligned}
\hat{d}_{mqk} &= \sum_{n=0}^{T-1} y(n)\,\phi_q^{*}(n)\,e^{-j\alpha_{mk}n} \\
&= \sum_{n=0}^{T-1}\left[E\{y(n)\} + \sum_{k'=1}^{K}\sum_{l=0}^{L} h_{k'}(n;l)\,b_{k'}(n-l) + v(n)\right]\phi_q^{*}(n)\,e^{-j\alpha_{mk}n}.
\end{aligned}
\]
By (H4.2.4.1)–(H4.2.4.5), (3.62) (note that $m = 0$), (4.21), and (4.22),
\[
\begin{aligned}
E\{\hat{d}_{mqk}\} &= \sum_{n=0}^{T-1} E\{y(n)\}\,\phi_q^{*}(n)\,e^{-j\alpha_{mk}n} \\
&= \sum_{n=0}^{T-1}\sum_{k'=1}^{K}\sum_{m'=0}^{\bar{P}-1}\sum_{q'=1}^{Q} d_{m'q'k'}\,\phi_{q'}(n)\,e^{j\alpha_{m'k'}n}\,\phi_q^{*}(n)\,e^{-j\alpha_{mk}n} \\
&= d_{mqk}.
\end{aligned}
\tag{4.23}
\]
Define
\[
w_{mqk} := \sum_{n=0}^{T-1} v(n)\,\phi_q^{*}(n)\,e^{-j\alpha_{mk}n},
\]
which is zero-mean and, by (H4.2.4.3) and (4.22), satisfies
\[
\begin{aligned}
E\{w_{m'q'k'}w_{mqk}^{H}\} &= \sum_{n=0}^{T-1}\sum_{n'=0}^{T-1} E\{v(n')v^{H}(n)\}\,\phi_{q'}^{*}(n')\,\phi_q(n)\,e^{j(\alpha_{mk}n-\alpha_{m'k'}n')} \\
&= \sigma_v^2 I_N\,\delta(m'-m)\,\delta(q'-q)\,\delta(k'-k).
\end{aligned}
\tag{4.24}
\]
Thus $\hat{d}_{mqk} = d_{mqk} + s_{mqk} + w_{mqk}$, where
\[
s_{mqk} := \sum_{n=0}^{T-1}\left[\sum_{k'=1}^{K}\sum_{l=0}^{L} h_{k'}(n;l)\,b_{k'}(n-l)\right]\phi_q^{*}(n)\,e^{-j\alpha_{mk}n}. \tag{4.25}
\]
As in the single-user case, the information sequences' contribution, given by $s_{mqk}$ above, interferes with the estimation of $d_{mqk}$, and hence with channel estimation from the observations.
Since $\tilde{C}_k$ is full column-rank when $\bar{P} \ge L+1$, by (3.79) and (4.23) we have $E\{\hat{\tilde{D}}_k\} = \tilde{D}_k$ and $E\{\hat{H}_k\} = H_k$. Then by (3.79)
\[
\mathrm{cov}\{\hat{H}_k,\hat{H}_k\,|\,H_k\} = \left(\tilde{C}_k^{H}\tilde{C}_k\right)^{-1}\tilde{C}_k^{H}\,\mathrm{cov}\{\hat{\tilde{D}}_k,\hat{\tilde{D}}_k\}\,\tilde{C}_k\left(\tilde{C}_k^{H}\tilde{C}_k\right)^{-1}. \tag{4.26}
\]
Consider the zero-mean interference $s_{mqk}$ in (4.25). By (H4.2.4.2), (H4.2.4.5), and (4.22), with $i$ and $i'$ indexing the users inside the sums,
\[
\begin{aligned}
E\{s_{m'q'k'}s_{mqk}^{H}\}
&= \sum_{n'=0}^{T-1}\sum_{l'=0}^{L}\sum_{n=0}^{T-1}\sum_{l=0}^{L}\sum_{i'=1}^{K}\sum_{i=1}^{K}
E\{h_{i'}(n';l')h_i^{H}(n;l)\}\,E\{b_{i'}(n'-l')b_i^{*}(n-l)\}\,\phi_q(n)\,\phi_{q'}^{*}(n')\,e^{j(\alpha_{mk}n-\alpha_{m'k'}n')} \\
&= \sum_{n=0}^{T-1}\sum_{l=0}^{L}\sum_{i'=1}^{K}\sum_{i=1}^{K}
E\{h_i(n;l)h_i^{H}(n;l)\}\,\sigma_{bi}^2\,\delta(i'-i)\,\phi_q(n)\,\phi_{q'}^{*}(n)\,e^{j(\alpha_{mk}-\alpha_{m'k'})n} \\
&= \sum_{i=1}^{K}(L+1)\sigma_{hi}^2\sigma_{bi}^2 I_N \sum_{n=0}^{T-1}\phi_q(n)\,\phi_{q'}^{*}(n)\,e^{j(\alpha_{mk}-\alpha_{m'k'})n} \\
&= (L+1)I_N\left(\sum_{i=1}^{K}\sigma_{hi}^2\sigma_{bi}^2\right)\delta(m'-m)\,\delta(q'-q)\,\delta(k'-k).
\end{aligned}
\tag{4.27}
\]
By (H4.2.4.3), since the noise $v(n)$ is uncorrelated with $\{b_k(n)\}$, $E\{s_{m'q'k'}w_{mqk}^{H}\} = 0$.
Then by (4.24) and (4.27),
\[
\begin{aligned}
E\{[\hat{d}_{mqk}-d_{mqk}][\hat{d}_{m'q'k'}-d_{m'q'k'}]^{H}\}
&= E\{s_{m'q'k'}s_{mqk}^{H}\} + E\{w_{m'q'k'}w_{mqk}^{H}\} \\
&= \left[(L+1)\sum_{k=1}^{K}\sigma_{hk}^2\sigma_{bk}^2+\sigma_v^2\right] I_N\,\delta(m'-m)\,\delta(q'-q)\,\delta(k'-k).
\end{aligned}
\]
We have
\[
\mathrm{cov}\{\hat{\tilde{D}}_k,\hat{\tilde{D}}_k\} = \left[(L+1)\sum_{k=1}^{K}\sigma_{hk}^2\sigma_{bk}^2+\sigma_v^2\right] I_{N\bar{P}Q}. \tag{4.28}
\]
Substituting (4.28) into (4.26) gives
\[
\mathrm{cov}\{\hat{H}_k,\hat{H}_k\,|\,H_k\} = \left[(L+1)\sum_{k=1}^{K}\sigma_{hk}^2\sigma_{bk}^2+\sigma_v^2\right]\left(\tilde{C}_k^{H}\tilde{C}_k\right)^{-1}.
\]
Using the orthonormality of the basis functions, the channel MSE for the $k$-th user is then given by
\[
\begin{aligned}
\mathrm{MSE}_{1k}
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\left\|\sum_{q=1}^{Q}\big[h_{qk}(l)-\hat{h}_{qk}(l)\big]^{H}\phi_q^{*}(n)\right\|^2\right\} \\
&= \frac{1}{T}\,\mathrm{tr}\left\{\mathrm{cov}\{\hat{H}_k,\hat{H}_k\,|\,H_k\}\right\} \\
&= \frac{1}{T}\left[(L+1)\sum_{k=1}^{K}\sigma_{hk}^2\sigma_{bk}^2+\sigma_v^2\right]\mathrm{tr}\left\{\left(\tilde{C}_k^{H}\tilde{C}_k\right)^{-1}\right\} \\
&= \frac{\left[(L+1)\sum_{k=1}^{K}\sigma_{hk}^2\sigma_{bk}^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(\tilde{V}_k^{H}\,\mathrm{diag}\{|\bar{c}_0|^2,|\bar{c}_1|^2,\ldots,|\bar{c}_{\bar{P}-1}|^2\}\,\tilde{V}_k\right)^{-1}\right\}.
\end{aligned}
\tag{4.29}
\]
Remark 4.2.4.1: If the mean of $v(n)$ is unknown, we have
\[
\mathrm{MSE}_{1k} = \frac{\left[(L+1)\sum_{k=1}^{K}\sigma_{hk}^2\sigma_{bk}^2+\sigma_v^2\right]NQ}{T\sigma_c^2}\,
\mathrm{tr}\left\{\left(V_k^{H}\,\mathrm{diag}\{|\bar{c}_1|^2,|\bar{c}_2|^2,\ldots,|\bar{c}_{\bar{P}-1}|^2\}\,V_k\right)^{-1}\right\}.
\]
Remark 4.2.4.2: Comparing (4.29) with (4.7) and (4.16), we see that in the estimation of a multiple-user channel, multiple-user interference (MUI) is present: the information data from all users act as interference. The MUI term $(L+1)\sum_{k=1}^{K}\sigma_{hk}^2\sigma_{bk}^2$ grows linearly as more users of equal power join the system.
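The linear growth of the MUI noted in Remark 4.2.4.2 can be seen directly from the bracketed factor in (4.28) and (4.29). The following Python sketch evaluates that factor for an increasing number of users. It assumes, purely for illustration, that all users share the same hypothetical tap variance and information power.

```python
def error_power_prefactor(L, sigma_h2_per_user, sigma_b2_per_user, sigma_v2):
    """Evaluate (L+1) * sum_k sigma_hk^2 * sigma_bk^2 + sigma_v^2 from (4.28)."""
    if len(sigma_h2_per_user) != len(sigma_b2_per_user):
        raise ValueError("one tap variance and one data power are needed per user")
    mui = (L + 1) * sum(h * b for h, b in zip(sigma_h2_per_user, sigma_b2_per_user))
    return mui + sigma_v2


# Hypothetical setting: identical users, 3-tap channels, weak noise.
L = 2
sigma_v2 = 0.01
user_counts = (1, 2, 3, 4)
prefactors = [
    error_power_prefactor(L, [1.0] * K, [0.7] * K, sigma_v2) for K in user_counts
]

# Each additional user adds the same amount (L+1)*sigma_h^2*sigma_b^2.
increments = [b - a for a, b in zip(prefactors, prefactors[1:])]
print(prefactors, increments)
```

With unequal user powers the growth is no longer uniform per user, but the factor remains a plain sum over users, so every active user contributes to the estimation error of every other user.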
4.3 Performance Analysis for the First-Order Statistics-Based Estimator: with
Modeling Error
In Section 4.2, we assumed that the channel (SIMO or MIMO) follows a BEM representation, i.e., the modeling error was omitted. In practice, modeling error always exists (see Section 2.6). In approximating a band-limited channel, the CE-, OP-, and DPS-BEM perform differently. If modeling error is considered, (4.1) is revised as
\[
h(n;l) = h_{\mathrm{BEM}}(n;l) + e_{\mathrm{BEM}}(n;l) = \sum_{q=1}^{Q} h_q(l)\,\phi_q(n) + e_{\mathrm{BEM}}(n;l),
\]
where $e_{\mathrm{BEM}}(n;l)$ is the modeling error, which is intrinsic to the BEM representation and has nothing to do with BEM-based channel estimation. Therefore, the MSE of channel estimation consists of two parts: one comes from the estimation, which has been discussed in Section 4.2; the other arises from the modeling error.
Consider a "complete" basis matrix
\[
\Phi_T := \begin{bmatrix}
\phi_1(0) & \phi_2(0) & \cdots & \phi_T(0) \\
\phi_1(1) & \phi_2(1) & \cdots & \phi_T(1) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_1(T-1) & \phi_2(T-1) & \cdots & \phi_T(T-1)
\end{bmatrix},
\]
where $\{\phi_q(n)\}_{q=1}^{T}$ represents $\{e^{j2\pi[q-(T+1)/2]n/T}/\sqrt{T}\}_{q=1}^{T}$ in the CE-BEM, or the modified Legendre polynomials of degree $0$ to $T-1$ in the OP-BEM, or all $T$ eigenvectors of the matrix $C$ defined in (2.17). Note that $\Phi_T$ is unitary, and for an arbitrary channel $\{h(n;l)\}_{n=0}^{T-1}$, the following expansion always holds:
\[
h(n;l) = \sum_{q=1}^{T} h_q(l)\,\phi_q(n),
\]
or equivalently
\[
\begin{bmatrix} h^{T}(0;l) & h^{T}(1;l) & \cdots & h^{T}(T-1;l) \end{bmatrix}^{T}
= (\Phi_T \otimes I_N)\begin{bmatrix} h_1^{T}(l) & h_2^{T}(l) & \cdots & h_T^{T}(l) \end{bmatrix}^{T}, \tag{4.30}
\]
where
\[
h_q(l) = \sum_{n=0}^{T-1} h(n;l)\,\phi_q^{*}(n). \tag{4.31}
\]
Given the above representations, one may view the BEM representation (2.20) as an approximation in which only $Q$ (out of a total of $T$) basis functions are used to describe the channel. In a BEM, (4.30) is approximated by
\[
\begin{bmatrix} h_{\mathrm{BEM}}^{T}(0;l) & h_{\mathrm{BEM}}^{T}(1;l) & \cdots & h_{\mathrm{BEM}}^{T}(T-1;l) \end{bmatrix}^{T}
= (\Phi_Q \otimes I_N)\begin{bmatrix} h_1^{T}(l) & h_2^{T}(l) & \cdots & h_Q^{T}(l) \end{bmatrix}^{T},
\]
where $\Phi_Q$ consists of the first $Q$ columns of $\Phi_T$. The modeling error is given by
\[
\begin{bmatrix} e_{\mathrm{BEM}}^{T}(0;l) & e_{\mathrm{BEM}}^{T}(1;l) & \cdots & e_{\mathrm{BEM}}^{T}(T-1;l) \end{bmatrix}^{T}
= (\Phi_{T-Q} \otimes I_N)\begin{bmatrix} h_{Q+1}^{T}(l) & h_{Q+2}^{T}(l) & \cdots & h_T^{T}(l) \end{bmatrix}^{T}, \tag{4.32}
\]
where $\Phi_{T-Q}$ consists of the last $T-Q$ columns of $\Phi_T$.
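As an illustration of the split in (4.30)–(4.32), the Python sketch below projects a scalar channel tap trajectory ($N = 1$) onto $Q$ complex exponentials of an orthonormal DFT-type basis and forms the modeling error as the remainder. The ordering of the retained exponentials (centered around zero frequency), the test trajectory, and the sizes are assumptions made for this example only.

```python
import cmath
import math


def basis(k, T):
    """Orthonormal complex exponential exp(j*2*pi*k*n/T)/sqrt(T), n = 0..T-1."""
    return [cmath.exp(2j * math.pi * k * n / T) / math.sqrt(T) for n in range(T)]


def project(h, phi):
    """Expansion coefficient h_q = sum_n h(n) * conj(phi_q(n)), as in (4.31)."""
    return sum(hn * pn.conjugate() for hn, pn in zip(h, phi))


def split_channel(h, Q):
    """Return (h_bem, e_bem): the Q-term approximation and the modeling error."""
    T = len(h)
    half = (Q - 1) // 2  # Q is assumed odd; retained frequencies are -half..half
    h_bem = [0j] * T
    for k in range(-half, half + 1):
        phi = basis(k, T)
        coef = project(h, phi)
        h_bem = [acc + coef * p for acc, p in zip(h_bem, phi)]
    e_bem = [a - b for a, b in zip(h, h_bem)]
    return h_bem, e_bem


def energy(x):
    return sum(abs(v) ** 2 for v in x)


T, Q = 16, 3  # hypothetical block length and model order
# Hypothetical slowly varying tap plus a faster component that the BEM cannot capture.
h = [cmath.exp(2j * math.pi * 0.03 * n) + 0.2 * math.cos(0.9 * n) for n in range(T)]

h_bem, e_bem = split_channel(h, Q)
inner_product = sum(a.conjugate() * b for a, b in zip(h_bem, e_bem))
energy_gap = energy(h) - energy(h_bem) - energy(e_bem)
print(abs(inner_product), energy_gap)
```

Because the retained and discarded basis functions are mutually orthogonal, the approximation and the modeling error are orthogonal over the block and their energies add up to the channel energy.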
The MSE in channel estimation is now given by
\[
\mathrm{MSE}_c = \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\left\|h(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\right\|^2\right\}, \tag{4.33}
\]
where $\hat{h}_{\mathrm{BEM}}(n;l)$ follows (4.3). Since $\Phi_Q^{H}\Phi_{T-Q} = 0$, we have
\[
\frac{1}{T}\sum_{n=0}^{T-1}\left[h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\right]^{H} e_{\mathrm{BEM}}(n;l) = 0,
\]
so that
\[
\begin{aligned}
\mathrm{MSE}_c &= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\left\|h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\right\|^2\right\}
+ \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\left\|e_{\mathrm{BEM}}(n;l)\right\|^2\right\} \\
&= \mathrm{MSE}_1 + \mathrm{MSE}_2,
\end{aligned}
\tag{4.34}
\]
where we define
\[
\mathrm{MSE}_2 := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\left\|e_{\mathrm{BEM}}(n;l)\right\|^2\right\}
\]
as the mean square modeling error. It follows from (4.31) and (4.32) that
\[
\begin{aligned}
\mathrm{MSE}_2 &= \frac{1}{T}\sum_{q=Q+1}^{T}\sum_{l=0}^{L} E\left\{\left\|h_q(l)\right\|^2\right\} \\
&= \frac{1}{T}\sum_{q=Q+1}^{T}\sum_{l=0}^{L}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} R_h(n_1,n_2;l)\,\phi_q(n_1)\,\phi_q^{*}(n_2),
\end{aligned}
\]
where
\[
R_h(n_1,n_2;l) := E\{h^{H}(n_1;l)\,h(n_2;l)\}.
\]
For example, we consider the CE-BEM representation for a modified Jakes' channel. We follow all the other assumptions in Section 4.2 except (H4.2.1). Then
\[
R_h(n_1,n_2;l) = N\sigma_h^2\,J_0\!\left(2\pi f_d T_s (n_1-n_2)\right),
\]
where $J_0(\cdot)$ denotes the zeroth-order Bessel function of the first kind. We have
\[
\begin{aligned}
\mathrm{MSE}_2 &= \frac{L+1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} N\sigma_h^2\,J_0\!\left(2\pi f_d T_s (n_1-n_2)\right)
\left[\sum_{q=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}} e^{j\frac{2\pi q (n_1-n_2)}{T}}\right] \\
&= \frac{L+1}{T}\,N\sigma_h^2\left[(T-Q) - 2\sum_{\tau=1}^{T-1}\left(1-\frac{\tau}{T}\right) J_0\!\left(2\pi f_d T_s \tau\right)\frac{\sin\frac{\pi\tau Q}{T}}{\sin\frac{\pi\tau}{T}}\right].
\end{aligned}
\tag{4.35}
\]
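The two lines of (4.35) can be cross-checked numerically. The Python sketch below evaluates both the double-sum form and the single-sum closed form for a small block. It is only a sketch: $J_0$ is approximated by a trapezoidal rule applied to the integral representation $J_0(x) = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta)\,d\theta$, $Q$ is assumed odd, and the block length, Doppler spread, and other parameters are hypothetical.

```python
import math


def bessel_j0(x, steps=200):
    """Approximate J0(x) = (1/pi) * int_0^pi cos(x*sin(theta)) d(theta) (trapezoid)."""
    h = math.pi / steps
    total = 0.5 * (math.cos(x * math.sin(0.0)) + math.cos(x * math.sin(math.pi)))
    total += sum(math.cos(x * math.sin(k * h)) for k in range(1, steps))
    return total * h / math.pi


def mse2_closed_form(T, Q, L, N, sigma_h2, fd_ts):
    """Second line of (4.35): single sum over the lag tau."""
    acc = 0.0
    for tau in range(1, T):
        dirichlet = math.sin(math.pi * tau * Q / T) / math.sin(math.pi * tau / T)
        acc += (1 - tau / T) * bessel_j0(2 * math.pi * fd_ts * tau) * dirichlet
    return (L + 1) * N * sigma_h2 * ((T - Q) - 2 * acc) / T


def mse2_direct(T, Q, L, N, sigma_h2, fd_ts):
    """First line of (4.35): double sum over n1, n2 and the discarded frequencies."""
    lo, hi = (Q + 1) // 2, T - (Q + 1) // 2
    acc = 0.0
    for n1 in range(T):
        for n2 in range(T):
            d = n1 - n2
            # The imaginary parts cancel because the index set is symmetric modulo T.
            kernel = sum(math.cos(2 * math.pi * q * d / T) for q in range(lo, hi + 1))
            acc += bessel_j0(2 * math.pi * fd_ts * d) * kernel
    return (L + 1) * N * sigma_h2 * acc / T ** 2


# Hypothetical parameters: short block, normalized Doppler spread fd*Ts = 0.01.
T, L, N, sigma_h2, fd_ts = 32, 2, 1, 1.0, 0.01
closed_q5 = mse2_closed_form(T, 5, L, N, sigma_h2, fd_ts)
direct_q5 = mse2_direct(T, 5, L, N, sigma_h2, fd_ts)
closed_q7 = mse2_closed_form(T, 7, L, N, sigma_h2, fd_ts)
print(closed_q5, direct_q5, closed_q7)
```

In this sketch the modeling error does not increase when $Q$ grows from 5 to 7, since each retained basis function captures a non-negative share of the channel energy.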
4.4 Training Power Allocation
We address the issue of superimposed training power allocation in this section, i.e., we seek the optimal split of power between training and information under a fixed transmitted power budget. To this end, we consider the more general OP-BEM-based estimator of Section 3.4, which applies readily to other BEM representations. For convenience, we assume that (H4.2.1) holds, i.e., the channel satisfies a BEM so that modeling error is omitted.
We define the training power overhead $\beta$ as
\[
\beta := \frac{\frac{1}{P}\sum_{n=1}^{P}|c(n)|^2}{\frac{1}{P}\sum_{n=1}^{P}E\{|s(n)|^2\}} = \frac{\sigma_c^2}{\sigma_b^2+\sigma_c^2}. \tag{4.36}
\]
For a fixed SNR or transmitted power budget, a higher $\beta$ implies a smaller effective SNR at the receiver, due to decreased power in the information sequence, but higher channel estimation accuracy. Removing the estimated time-varying mean from the received data, define
\[
\tilde{y}(n) := y(n) - \sum_{l=0}^{L}\hat{h}_{\mathrm{BEM}}(n;l)\,c(n-l) - \hat{m}.
\]
Assuming that $\hat{m} = m$, we have
\[
\tilde{y}(n) \approx \sum_{l=0}^{L}\hat{h}_{\mathrm{BEM}}(n;l)\,b(n-l)
+ \sum_{l=0}^{L}\left[h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\right]\left[b(n-l)+c(n-l)\right] + \tilde{v}(n). \tag{4.37}
\]
In (4.37), define the effective signal as
\[
x_s(n) := \sum_{l=0}^{L}\hat{h}_{\mathrm{BEM}}(n;l)\,b(n-l), \tag{4.38}
\]
and the effective noise as
\[
w(n) := \sum_{l=0}^{L}\left[h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\right]\left[b(n-l)+c(n-l)\right] + \tilde{v}(n). \tag{4.39}
\]
When $\hat{h}_{\mathrm{BEM}}(n;l)$ is used for equalization or detection, the variance of the effective noise $w(n)$ contains the channel estimation error variance as a component, which in turn depends on $\beta$. An "optimum" value of $\beta$ for the superimposed training method may be obtained by maximizing the SNR in (4.37) with respect to $\beta$, which is defined as
\[
\mathrm{SNR}_e(\beta,n) = \frac{\sigma_{x_s}^2(n)}{\sigma_w^2(n)} \tag{4.40}
\]
under the constraint of a fixed transmitted power, i.e., $\sigma_b^2+\sigma_c^2 = P_T$. In (4.38), the signal power at time $n$ is given by
\[
\begin{aligned}
\sigma_{x_s}^2(n) &= E\left\{\|x_s(n)\|^2\right\}
= \sigma_b^2\sum_{l=0}^{L} E_H\left\{E\left\{\left\|\hat{h}_{\mathrm{BEM}}(n;l)\right\|^2\,\Big|\,H\right\}\right\} + O\!\left(1/T^2\right) \\
&= \sigma_b^2\sum_{l=0}^{L}\left[E_H\left\{E\left\{\left\|h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\right\|^2\,\Big|\,H\right\}\right\}
+ E\left\{\left\|h_{\mathrm{BEM}}(n;l)\right\|^2\right\}\right] + O\!\left(1/T^2\right),
\end{aligned}
\tag{4.41}
\]
where the $O(1/T^2)$ term accounts for the dependence between $\{\hat{h}_{\mathrm{BEM}}(n;l)\}$ and $\{b(n)\}$ (see the Appendix of [32] for the corresponding details in the time-invariant case). Therefore, the time average of the signal power is given by (omitting $O(1/T^2)$ terms)
\[
\bar{\sigma}_{x_s}^2 := \frac{1}{T}\sum_{n=0}^{T-1}\sigma_{x_s}^2(n) = \sigma_b^2\left[\mathrm{MSE}_1 + (L+1)\sigma_h^2\right]. \tag{4.42}
\]
The noise power at time $n$ is given by
\[
\begin{aligned}
\sigma_w^2(n) = E\left\{\|w(n)\|^2\right\}
&= \sigma_b^2\sum_{l=0}^{L} E_H\left\{E\left\{\left\|h_{\mathrm{BEM}}(n;l)-\hat{h}_{\mathrm{BEM}}(n;l)\right\|^2\,\Big|\,H\right\}\right\} + N\sigma_v^2 \\
&\quad + \sigma_c^2\sum_{l_1=0}^{L}\sum_{l_2=0}^{L} E\left\{\left[\hat{h}_{\mathrm{BEM}}(n;l_1)-h_{\mathrm{BEM}}(n;l_1)\right]^{H}\left[\hat{h}_{\mathrm{BEM}}(n;l_2)-h_{\mathrm{BEM}}(n;l_2)\right]\right\}
\bar{c}^{*}(n-l_1)\,\bar{c}(n-l_2) + O\!\left(1/T^2\right),
\end{aligned}
\tag{4.43}
\]
where in a manner similar to (3.41), theOparenleftbig1/T2parenrightbigterm accounts for the dependence between
{?hBEM (n;l)} and {b(n)}. We define
E (n) :=
bracketleftbigg
?c(n)IN(K+1) ?c(n?1)IN(K+1) ??? ?c(n?L)IN(K+1)
bracketrightbigg
and consider
Lsummationdisplay
l1=0
Lsummationdisplay
l2=0
E
braceleftbiggbracketleftBig
?hBEM (n;l1)?hBEM (n;l1)bracketrightBigHbracketleftBig?hBEM (n;l2)?hBEM (n;l2)bracketrightBig
bracerightbigg
??c(n?l1)? ?c(n?l2)
=
Lsummationdisplay
l1=0
Lsummationdisplay
l2=0
EH
braceleftbigg
E
braceleftbigg
?(n)
parenleftBig?
Hl1 ?Hl1
parenrightBigparenleftBig?
Hl2 ?Hl2
parenrightBigH
?H (n)
vextendsinglevextendsingle
vextendsinglevextendsingleH
bracerightbiggbracerightbigg
?c(n?l1)?c?(n?l2)
= ?(n)EH
braceleftbigg
E
braceleftbigg
E (n)
parenleftBig?
H?H
parenrightBigparenleftBig?
H?H
parenrightBigH
EH (n)
vextendsinglevextendsingle
vextendsinglevextendsingleH
bracerightbiggbracerightbigg
?H (n)
=
bracketleftbig(L+1)?2
h?
2
b +?
2vbracketrightbig
T?2c tr
braceleftBigparenleftBig?
??OP ?IN
parenrightBig
EH (n)?H (n)?(n)E (n)
bracerightBig
.
Therefore (4.43) can be written as
?2w (n) = ?2b
Lsummationdisplay
l=0
EH
braceleftbigg
E
braceleftbiggvextenddoublevextenddouble
vextenddoublehBEM (n;l)??hBEM (n;l)
vextenddoublevextenddouble
vextenddouble
2vextendsinglevextendsinglevextendsingle
vextendsingleH
bracerightbiggbracerightbigg
+N?2v +Oparenleftbig1/T2parenrightbig
+
bracketleftbig(L+1)?2
h?
2
b +?
2vbracketrightbig
T tr
braceleftBigparenleftBig?
??OP ?IN
parenrightBig
EH (n)?H (n)?(n)E (n)
bracerightBig
.
Taking time average of the noise power and omitting Oparenleftbig1/T2parenrightbig terms, we have
??2w := 1T
T?1summationdisplay
n=0
?2w (n) = ?2b MSE1 +N?2v +
bracketleftbig(L+1)?2?2
b +?
2vbracketrightbig
T2 tr
braceleftBigparenleftBig?
??OP ?IN
parenrightBig
K
bracerightBig
, (4.44)
where
K :=
Tsummationdisplay
n=1
EH (n)?H (n)?(n)E (n).
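We define the time-averaged version of (4.40) as
\[
\mathrm{SNR}_d(\beta) = \frac{\bar{\sigma}_{x_s}^2}{\bar{\sigma}_w^2}. \tag{4.45}
\]
Using the constraint $\sigma_b^2+\sigma_c^2 = P_T$, we have $\sigma_c^2 = \beta P_T$ and $\sigma_b^2 = (1-\beta)P_T$. Incorporating these constraint-carrying variables in (4.42), (4.44), and (4.45), we have an unconstrained cost
\[
\mathrm{SNR}_d(\beta) = \frac{f_1\beta^2 + f_2\beta + f_3}{g_1\beta^2 + g_2\beta + g_3},
\]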
where
f1 = (L+1)?2h
parenleftBig
Ntr ???OP ?T2
parenrightBig
,
f2 = ?(L+1)?2h
parenleftBig
2Ntr ???OP ?T2
parenrightBig
? ?
2vNtr ???
OP
PT ,
f3 = (L+1)?2hNtr ???OP + N?
2v tr ???
OP
PT ,
g1 = (L+1)?2h
parenleftBig
Ntr ???OP ?tr
braceleftBigparenleftBig?
??OP ?IN
parenrightBig
K
bracerightBigparenrightBig
,
g2 = ?(L+1)?2h
parenleftBig
2N tr ???OP ?tr
braceleftBigparenleftBig?
??OP ?IN
parenrightBig
K
bracerightBigparenrightBig
+
?2v
parenleftBig
tr
braceleftBigparenleftBig?
??OP ?IN
parenrightBig
K
bracerightBig
?Ntr ???OP +T2
parenrightBig
PT ,
g3 = f3.
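We seek the optimum value of $\beta$ by setting the derivative to zero:
\[
\frac{d\left[\mathrm{SNR}_d(\beta)\right]}{d\beta}
= \frac{(f_1g_2-f_2g_1)\beta^2 + 2(f_1g_3-f_3g_1)\beta + f_2g_3-f_3g_2}{\left(g_1\beta^2+g_2\beta+g_3\right)^2} = 0,
\]
the root of which lying in $[0,1]$ is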
?opt = (f1g2 ?f2g1)?1 (f3g1 ?f1g3
?
radicalBig
?f1f2g2g3 ?2f1f3g1g3 ?f2f3g1g2 +f22g1g3 +f1f3g22 +f21g23 +f23g21
parenrightbigg
(4.46)
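The root selection behind (4.46) can be sketched numerically as follows. The Python code below solves the quadratic in the numerator of the derivative, keeps the real roots inside $[0,1]$, and compares them with the end points. The coefficient values are hypothetical and are not computed from the expressions for $f_i$ and $g_i$; they are chosen only so that the cost has an interior maximum and satisfies $g_3 = f_3$.

```python
import math


def snr_d(beta, f, g):
    """Rational cost (f1*b^2 + f2*b + f3) / (g1*b^2 + g2*b + g3)."""
    f1, f2, f3 = f
    g1, g2, g3 = g
    return (f1 * beta ** 2 + f2 * beta + f3) / (g1 * beta ** 2 + g2 * beta + g3)


def stationary_points(f, g):
    """Real roots of (f1g2-f2g1) b^2 + 2 (f1g3-f3g1) b + (f2g3-f3g2) = 0."""
    f1, f2, f3 = f
    g1, g2, g3 = g
    a = f1 * g2 - f2 * g1
    b = 2 * (f1 * g3 - f3 * g1)
    c = f2 * g3 - f3 * g2
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return [(-b - root) / (2 * a), (-b + root) / (2 * a)]


def optimal_overhead(f, g):
    """Best beta in [0, 1] among the end points and the interior stationary points."""
    candidates = [0.0, 1.0] + [x for x in stationary_points(f, g) if 0 <= x <= 1]
    return max(candidates, key=lambda x: snr_d(x, f, g))


# Hypothetical coefficients with g3 = f3; the denominator stays positive on [0, 1].
f = (-1.0, 0.5, 0.5)
g = (0.2, -0.5, 0.5)
beta_opt = optimal_overhead(f, g)
print(beta_opt, snr_d(beta_opt, f, g))
```

Checking both quadratic roots and the end points avoids committing to a particular sign in front of the square root, which depends on the sign of $f_1g_2 - f_2g_1$ for the coefficients at hand.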
Since the $h(n;l)$ are mutually independent for different $l$, the calculation can be simplified if we suppose that the $\hat{h}_{\mathrm{BEM}}(n;l)$ are also approximately uncorrelated for distinct $l$, i.e.,
\[
E\left\{\left[\hat{h}_{\mathrm{BEM}}(n;l_1)-h(n;l_1)\right]^{H}\left[\hat{h}_{\mathrm{BEM}}(n;l_2)-h(n;l_2)\right]\right\} \approx 0, \quad \text{for } l_1 \neq l_2.
\]
Then (4.43) becomes (omitting $O(1/T^2)$ terms)
\[
\begin{aligned}
\sigma_w^2(n) &\approx \sigma_b^2\sum_{l=0}^{L} E_H\left\{E\left\{\left\|\hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l)\right\|^2\,\Big|\,H\right\}\right\} + N\sigma_v^2 \\
&\quad + \sigma_c^2\sum_{l=0}^{L} E\left\{\left\|\hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l)\right\|^2\right\}\bar{c}^{*}(n-l)\,\bar{c}(n-l).
\end{aligned}
\tag{4.47}
\]
In addition, if $\bar{c}^{*}(n)\bar{c}(n)$ is constant for all $n$ (e.g., $\bar{c}(n) = \pm 1$), (4.47) can be further reduced to
\[
\sigma_w^2(n) \approx \sum_{l=0}^{L} E_H\left\{E\left\{\left\|\hat{h}_{\mathrm{BEM}}(n;l)-h_{\mathrm{BEM}}(n;l)\right\|^2\,\Big|\,H\right\}\right\}\left(\sigma_b^2+\sigma_c^2\right) + \sigma_v^2.
\]
Then
\[
\bar{\sigma}_{x_s}^2 = \sigma_b^2\left[\mathrm{MSE}_1 + (L+1)\sigma_h^2\right] \quad \text{and} \quad \bar{\sigma}_w^2 = \mathrm{MSE}_1\left(\sigma_b^2+\sigma_c^2\right) + \sigma_v^2,
\]
and
\[
\mathrm{SNR}_d(\beta) = \frac{\bar{\sigma}_{x_s}^2}{\bar{\sigma}_w^2} = \frac{f_1\beta^2 + f_2\beta + f_3}{g'_1\beta + g'_2},
\]
where
f1 = (L+1)?2
parenleftBig
Ntr ???OP ?T2
parenrightBig
,
f2 = ?(L+1)?2
parenleftBig
2Ntr ???OP ?T2
parenrightBig
? N?
2v tr ???
OP
PT ,
f3 = (L+1)?2N tr ???OP + N?
2v tr ???
OP
PT ,
g?1 = ?
2vT2
PT ?(L+1)?
2Ntr ???
OP,
g?2 = f3.
Setting the first derivative of $\mathrm{SNR}_d(\beta)$ to zero to obtain the optimum $\beta$, we have
\[
\tilde{\beta}_{\mathrm{opt}} = \frac{g'_2}{g'_1}\left[-1 + \sqrt{1 + \frac{g'_1\left(f_3g'_1 - f_2g'_2\right)}{g'^{\,2}_2 f_1}}\right]. \tag{4.48}
\]
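The closed form (4.48) follows from the numerator of the derivative, $f_1g'_1\beta^2 + 2f_1g'_2\beta + f_2g'_2 - f_3g'_1 = 0$. The Python sketch below evaluates (4.48) and compares it with a brute-force grid search over $[0,1]$. The coefficients are hypothetical values satisfying $g'_2 = f_3$ and are not derived from the expressions for $f_i$ and $g'_i$.

```python
import math


def beta_opt_simplified(f1, f2, f3, g1p, g2p):
    """Closed form (4.48) for the cost (f1*b^2 + f2*b + f3) / (g1p*b + g2p)."""
    inner = 1 + g1p * (f3 * g1p - f2 * g2p) / (g2p ** 2 * f1)
    return (g2p / g1p) * (-1 + math.sqrt(inner))


def snr_simplified(beta, f1, f2, f3, g1p, g2p):
    """Simplified time-averaged SNR with a linear denominator."""
    return (f1 * beta ** 2 + f2 * beta + f3) / (g1p * beta + g2p)


# Hypothetical coefficients (f1, f2, f3, g1', g2') with g2' = f3.
coeffs = (-1.0, 0.5, 0.5, -0.4, 0.5)

beta_star = beta_opt_simplified(*coeffs)

# Brute-force check on a fine grid of candidate overheads in [0, 1].
grid_best = max(
    (k / 10000 for k in range(10001)),
    key=lambda b: snr_simplified(b, *coeffs),
)
print(beta_star, grid_best)
```

For these assumed coefficients the closed form and the grid search agree to within the grid resolution. In a real design the coefficients would first be computed from the channel, noise, and training parameters.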
4.5 Bias-Variance Trade-Off
In Section 4.4, we addressed training power allocation from an equalization viewpoint by maximizing the SNR for data detection. The same method can be applied to select the number of basis functions Q, since SNRd in (4.45) is also a function of Q. In this section, we give the CE-BEM-based analysis as an example to clarify this trade-off. Note that the modeling error of a BEM must be taken into account here.
Consider (4.37)–(4.39), which will be used for equalization with $x_s(n)$ acting as "signal" and $w(n)$ as "noise". Since the modeling error is now considered, by (4.41) the time-average of $\sigma_{x_s}^2(n)$ now becomes (omitting $O(1/T^2)$ terms)
\[
\begin{aligned}
\bar{\sigma}_{x_s}^2 := \frac{1}{T}\sum_{n=0}^{T-1}\sigma_{x_s}^2(n)
&= \sigma_b^2\left[\mathrm{MSE}_1 + \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\|h_{\rm BEM}(n;l)\|^2\right\}\right] \\
&= \sigma_b^2\left[\mathrm{MSE}_1 + \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\|h(n;l)\|^2\right\} - \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{\|e_{\rm BEM}(n;l)\|^2\right\}\right] \\
&= \sigma_b^2\left[\mathrm{MSE}_1 + N(L+1)\sigma_h^2 - \mathrm{MSE}_2\right].
\end{aligned}
\]
Next consider the power of the noise (4.43), the time-average of which (omitting $O(1/T^2)$ terms) is given by
\[
\bar{\sigma}_w^2 := \frac{1}{T}\sum_{n=0}^{T-1}\sigma_w^2(n) = \sigma_b^2\,\mathrm{MSE}_c + N\sigma_v^2 + R
\]
where we define
\[
R := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l_1=0}^{L}\sum_{l_2=0}^{L} E\left\{e_2^H(n;l_1)\,e_2(n;l_2)\right\} c^*(n-l_1)\,c(n-l_2),
\]
\[
e_1(n;l) := \hat{h}_{\rm BEM}(n;l) - h_{\rm BEM}(n;l), \qquad
e_2(n;l) := e_1(n;l) - e_{\rm BEM}(n;l).
\]
It turns out that $R = R_1 + R_2$, where
\[
R_1 := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l_1=0}^{L}\sum_{l_2=0}^{L} E\left\{e_1^H(n;l_1)\,e_1(n;l_2)\right\} c^*(n-l_1)\,c(n-l_2),
\]
\[
R_2 := \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l_1=0}^{L}\sum_{l_2=0}^{L} E\left\{e_{\rm BEM}^H(n;l_1)\,e_{\rm BEM}(n;l_2)\right\} c^*(n-l_1)\,c(n-l_2).
\]
By (H4.2.5), $E\{e_{\rm BEM}^H(n;l_1)\,e_{\rm BEM}(n;l_2)\} = 0$ for $l_1 \neq l_2$. Thus
\[
\begin{aligned}
R_2 &= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{l=0}^{L} E\left\{e_{\rm BEM}^H(n;l)\,e_{\rm BEM}(n;l)\right\}
\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\, e^{j(\alpha_{m_1}-\alpha_{m_2})l}\, e^{-j(\alpha_{m_1}-\alpha_{m_2})n} \\
&= \sum_{l=0}^{L}\;\sum_{q_1=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}\;\sum_{q_2=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}\;\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}
E\left\{h_{q_1}^H(l)\,h_{q_2}(l)\right\}
\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_{q_2}-\omega_{q_1}-\alpha_{m_1}+\alpha_{m_2})n}\right]
c_{m_1}^* c_{m_2}\, e^{j(\alpha_{m_1}-\alpha_{m_2})l}.
\end{aligned}
\]
We consider the correlation between $h_{q_1}(l)$ and $h_{q_2}(l)$,
\[
E\left\{h_{q_1}^H(l)\,h_{q_2}(l)\right\} = \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} E\left\{h^H(n_1;l)\,h(n_2;l)\right\} e^{-j\omega_{q_1}n_1}\, e^{j\omega_{q_2}n_2}.
\]
By defining
\[
R_h(n_1-n_2;l) := E\left\{h^H(n_1;l)\,h(n_2;l)\right\}
\]
and setting $\tau := n_1 - n_2$, we have
\[
E\left\{h_{q_1}^H(l)\,h_{q_2}(l)\right\} = \frac{1}{T}\sum_{\tau=-(T-1)}^{T-1} R_h(\tau;l)\, e^{-j\omega_{q_1}\tau}
\left[\frac{1}{T}\sum_{n_2=\max(0,-\tau)}^{\min(T-1,\,T-1-\tau)} e^{j(\omega_{q_2}-\omega_{q_1})n_2}\right].
\]
Note that
\[
\left|\frac{1}{T}\sum_{n_2=\max(0,-\tau)}^{\min(T-1,\,T-1-\tau)} e^{j(\omega_{q_2}-\omega_{q_1})n_2} - \frac{1}{T}\sum_{n_2=0}^{T-1} e^{j(\omega_{q_2}-\omega_{q_1})n_2}\right| \le \frac{|\tau|}{T}.
\]
Since the coherence time of the channel is limited, we can select a number $T_{\rm coh}$ such that $|R_h(n_1-n_2;l)| \approx 0$ for $|\tau| > T_{\rm coh}$. Therefore,
\[
\frac{1}{T}\sum_{n_2=\max(0,-\tau)}^{\min(T-1,\,T-1-\tau)} e^{j(\omega_{q_2}-\omega_{q_1})n_2} = \delta(q_1-q_2) + O\!\left(\frac{1}{T}\right).
\]
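The bound above can be checked numerically. The following is a minimal sketch, not part of the original analysis; it assumes the BEM frequencies lie on the grid $2\pi q/T$, so that $\omega_{q_2}-\omega_{q_1} = 2\pi(q_2-q_1)/T$, and the function names are hypothetical.

```python
import cmath
import math

def partial_average(T, q1, q2, tau):
    """(1/T) * sum of exp(j*(w_q2 - w_q1)*n2) over n2 = max(0,-tau)..min(T-1, T-1-tau).

    Assumption (not from the source): frequencies on the grid w_q = 2*pi*q/T,
    so that w_q2 - w_q1 = 2*pi*(q2 - q1)/T.
    """
    dw = 2.0 * math.pi * (q2 - q1) / T
    lo, hi = max(0, -tau), min(T - 1, T - 1 - tau)
    return sum(cmath.exp(1j * dw * n) for n in range(lo, hi + 1)) / T

def full_average(T, q1, q2):
    """Same average over the complete range n2 = 0..T-1 (tau = 0)."""
    return partial_average(T, q1, q2, 0)

T = 399
worst_excess = 0.0  # largest value of (gap - |tau|/T); should not exceed 0
for tau in (-40, -3, 0, 5, 25):
    for q1, q2 in ((2, 2), (1, 3), (0, 7)):
        gap = abs(partial_average(T, q1, q2, tau) - full_average(T, q1, q2))
        worst_excess = max(worst_excess, gap - abs(tau) / T)
```

The truncated sum omits exactly $|\tau|$ unit-modulus terms, which is why the gap never exceeds $|\tau|/T$; the full average equals 1 when $q_1 = q_2$ and vanishes otherwise on this grid.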
This fact leads to
\[
E\left\{h_{q_1}^H(l)\,h_{q_2}(l)\right\} \approx 0 \tag{4.49}
\]
for $q_1 \neq q_2$ and "large" $T$. Omitting the $O(1/T)$ term, $R_2$ can be rewritten as
\[
\begin{aligned}
R_2 &= \sum_{l=0}^{L}\;\sum_{q_1=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}\;\sum_{q_2=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}}\;\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}
E\left\{h_{q_1}^H(l)\,h_{q_2}(l)\right\}\delta(q_1-q_2)\,\delta(m_1-m_2)\;
c_{m_1}^* c_{m_2}\, e^{j(\alpha_{m_1}-\alpha_{m_2})l} \\
&= \sum_{l=0}^{L} E\left\{\sum_{q=\frac{Q+1}{2}}^{T-\frac{Q+1}{2}} h_q^H(l)\,h_q(l)\right\}\sum_{m=0}^{P-1}|c_m|^2
= \mathrm{MSE}_2\sum_{m=0}^{P-1}|c_m|^2.
\end{aligned}
\]
For the first part of $R$, by using (4.49),
\[
\begin{aligned}
R_1 &= \sum_{l_1=0}^{L}\sum_{l_2=0}^{L}\sum_{q_1=1}^{Q}\sum_{q_2=1}^{Q}\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}
E\left\{\left[\hat{h}_{q_1}^H(l_1)-h_{q_1}^H(l_1)\right]\left[\hat{h}_{q_2}(l_2)-h_{q_2}(l_2)\right]\right\}
c_{m_1}^* c_{m_2}\, e^{j(\alpha_{m_1}l_1-\alpha_{m_2}l_2)}
\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_{q_2}-\omega_{q_1}-\alpha_{m_1}+\alpha_{m_2})n}\right] \\
&= \sum_{l_1=0}^{L}\sum_{l_2=0}^{L}\;\sum_{q_1=-\frac{Q-1}{2}}^{\frac{Q-1}{2}}\;\sum_{q_2=-\frac{Q-1}{2}}^{\frac{Q-1}{2}}\;\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1}
E\left\{\left[\hat{h}_{q_1}^H(l_1)-h_{q_1}^H(l_1)\right]\left[\hat{h}_{q_2}(l_2)-h_{q_2}(l_2)\right]\right\}
c_{m_1}^* c_{m_2}\, e^{j(\alpha_{m_1}l_1-\alpha_{m_2}l_2)}\,\delta(q_1-q_2)\,\delta(m_1-m_2) \\
&= E\left\{\left[\sum_{l=0}^{L}\left(\hat{H}_l-H_l\right)e^{-j\alpha_m l}\right]^H\left[\sum_{l=0}^{L}\left(\hat{H}_l-H_l\right)e^{-j\alpha_m l}\right]\right\}\sum_{m=0}^{P-1}|c_m|^2.
\end{aligned}
\]
Define
\[
B(m) := \begin{bmatrix} 1 & e^{-j\alpha_m} & \cdots & e^{-j\alpha_m L} \end{bmatrix},
\]
then $R_1$ becomes
\[
\begin{aligned}
R_1 &= E\left\{\left[\left(B(m)\otimes I_{NQ}\right)\left(\hat{H}-H\right)\right]^H\left[\left(B(m)\otimes I_{NQ}\right)\left(\hat{H}-H\right)\right]\right\}\sum_{m=0}^{P-1}|c_m|^2 \\
&= E\left\{\left(\hat{H}-H\right)^H \mathcal{C}^H\mathcal{C}\left(\hat{H}-H\right)\right\}
= \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ.
\end{aligned}
\]

Figure 4.1: Estimation variance: NCMSE vs SNR under fd = 0 Hz (time-invariant). The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
(Plot: normalized channel MSE (dB) vs SNR (dB); L = 2, T = 399, Ts = 25 µs, TIR = 0.3, P = 7, fd = 0 Hz, 500 runs; curves for K = N = 1 and K = N = 2.)

Figure 4.2: Estimation variance: NCMSE vs SNR under fd = 50 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
(Plot: normalized channel MSE (dB) vs SNR (dB); L = 2, T = 399, Ts = 25 µs, TIR = 0.3, P = 7, fd = 50 Hz, 500 runs; curves for K = N = 1 and K = N = 2.)
Thus
\[
R = \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ + \mathrm{MSE}_2\,\sigma_c^2,
\]
and
\[
\bar{\sigma}_w^2 = \sigma_b^2\,\mathrm{MSE}_c + N\sigma_v^2 + \frac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ + \mathrm{MSE}_2\,\sigma_c^2.
\]

Figure 4.3: Estimation variance: NCMSE vs SNR under fd = 100 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
(Plot: normalized channel MSE (dB) vs SNR (dB); L = 2, T = 399, Ts = 25 µs, TIR = 0.3, P = 7, fd = 100 Hz, 500 runs; curves for K = N = 1 and K = N = 2.)
From an equalization viewpoint, based on (4.37)–(4.39), an equivalent SNR for data detection is defined as
\[
\mathrm{SNR}_d(Q) = \frac{\bar{\sigma}_{x_s}^2(Q)}{\bar{\sigma}_w^2(Q)}
= \frac{\sigma_b^2\left[\mathrm{MSE}_1 + (L+1)N\sigma_h^2 - \mathrm{MSE}_2\right]}
{\sigma_b^2\,\mathrm{MSE}_c + N\sigma_v^2 + \dfrac{(L+1)\sigma_h^2\sigma_b^2+\sigma_v^2}{T}\,(L+1)NQ + \mathrm{MSE}_2\,\sigma_c^2}. \tag{4.50}
\]
We pick Q to maximize SNRd(Q), as we expect the detection performance to improve with increasing SNRd(Q).
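As an illustration of this selection rule, the sketch below evaluates (4.50) over a few candidate values of Q and keeps the maximizer. It is only a sketch under stated assumptions: the function and variable names are hypothetical, and the MSE values in the table are made-up placeholders rather than results from this dissertation; in practice they would come from the expressions for MSE1, MSE2 and MSEc derived earlier.

```python
def snr_d(Q, mse1, mse2, mse_c, *, L, N, T, sigma_b2, sigma_c2, sigma_h2, sigma_v2):
    """Equivalent detection SNR of (4.50) for a given number of basis functions Q."""
    signal = sigma_b2 * (mse1 + (L + 1) * N * sigma_h2 - mse2)
    variance_term = ((L + 1) * sigma_h2 * sigma_b2 + sigma_v2) / T * (L + 1) * N * Q
    noise = sigma_b2 * mse_c + N * sigma_v2 + variance_term + mse2 * sigma_c2
    return signal / noise

# Hypothetical system parameters (TIR = sigma_c2 / sigma_b2 = 0.3).
params = dict(L=2, N=1, T=399, sigma_b2=1.0, sigma_c2=0.3,
              sigma_h2=1.0 / 3.0, sigma_v2=0.013)

# Placeholder (MSE1, MSE2, MSEc) triples per candidate Q: illustrative numbers only.
# Small Q gives a large modeling error (MSE2); large Q inflates the estimation variance.
candidates = {
    1: (0.002, 0.300, 0.302),
    3: (0.006, 0.020, 0.026),
    5: (0.010, 0.001, 0.011),
    7: (0.050, 0.0005, 0.0505),
}

scores = {Q: snr_d(Q, *mse, **params) for Q, mse in candidates.items()}
best_q = max(scores, key=scores.get)
```

With these placeholder numbers the maximizer is an intermediate Q, which mirrors the bias-variance trade-off: the denominator term proportional to Q grows with the number of basis functions, while MSE2 shrinks.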
Figure 4.4: Estimation variance: NCMSE vs SNR under fd = 200 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM; MSE1: defined in (4.2); MSE: defined in (4.33).)
(Plot: normalized channel MSE (dB) vs SNR (dB); L = 2, T = 399, Ts = 25 µs, TIR = 0.3, P = 7, fd = 200 Hz, 500 runs; curves for K = N = 1 and K = N = 2.)
4.6 Simulation Examples
4.6.1 Performance Analysis for the First-Order Statistics-Based Estimator
In this example, we explore the variance of channel estimation of the first-order statistics-based estimator. Simulation results are compared with theoretical results to show the validity of our analysis.
We employ the same simulation model as in Section 3.6, i.e., a doubly-selective Rayleigh fading channel with L = 2 that follows the modified Jakes' model. Performance of the channel estimation is investigated for both SISO (K = N = 1) and MIMO (K = N = 2) channels. We emphasize once more that BEMs are used only for processing at the receiver; random channels are generated using Jakes' model, not BEM representations.
Figure 4.5: Training power allocation: BER vs the training power fraction (defined in (4.36)) under fd = 0 Hz (time-invariant). The curves for CE-, OP- and DPS-BEMs completely overlap, since the three basis functions are all constant for time-invariant channels (Q = 1). (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
(Plot: bit error rate vs training power fraction; Kalman filter, K = N = 1, L = 2, T = 399, Ts = 25 µs, P = 7, SNR = 0, 10, 20, 30 dB, fd = 0 Hz, 1000 runs.)
In simulations, we pick a data record length of 399 symbols (a time duration of approximately 10 ms). We consider the system operating under different Doppler spreads. For the Doppler spreads fd = 0, 50, 100, and 200 Hz, we take the number of basis functions Q = 1, 3, 5, and 7 for CE-BEM, and Q = 1, 3, 4, and 6 for OP- and DPS-BEM representations. The average transmitted power in {c(n)} is 0.3 of that in {b(n)}, leading to TIR = 0.3.
In the single-user scenario, the information sequences {b(n)} and the training sequences {c(n)} are all BPSK modulated. The periodic training sequence {c(n)} is generated from the m-sequence of period P = 7, one period of which is given by (3.82).
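To make the training setup concrete, the sketch below generates one period-7 binary m-sequence with a 3-stage linear feedback shift register, maps it to BPSK, and scales it so that the training power is TIR times the information power. This is an assumed construction for illustration: the particular register taps and initial state are hypothetical choices, and the resulting sequence need not coincide with the one given by (3.82).

```python
import math

def m_sequence_period7():
    """One period (7 bits) of a binary m-sequence from a 3-stage LFSR.

    The taps and the all-ones initial state are illustrative assumptions.
    """
    state = [1, 1, 1]
    bits = []
    for _ in range(7):
        bits.append(state[2])                 # output the last register stage
        feedback = state[2] ^ state[0]        # XOR of two register stages
        state = [feedback, state[0], state[1]]
    return bits

def periodic_autocorr(seq, lag):
    """Periodic (circular) autocorrelation of a real sequence at the given lag."""
    n = len(seq)
    return sum(seq[k] * seq[(k + lag) % n] for k in range(n))

def training_sequence(length, tir, sigma_b2=1.0):
    """Periodic BPSK training c(n) with average power tir * sigma_b2."""
    bpsk = [2 * b - 1 for b in m_sequence_period7()]
    amp = math.sqrt(tir * sigma_b2)
    return [amp * bpsk[n % 7] for n in range(length)]

bits = m_sequence_period7()
bpsk = [2 * b - 1 for b in bits]
c = training_sequence(399, tir=0.3)
power_c = sum(x * x for x in c) / len(c)

# Fraction of the total power spent on training, from TIR = sigma_c^2 / sigma_b^2.
fraction = 0.3 / (1.0 + 0.3)
```

The two-valued periodic autocorrelation (7 at lag zero, -1 elsewhere) is the property that makes m-sequences attractive as periodic training; with TIR = 0.3, roughly 23% of the transmitted power goes to training.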
For the MIMO (multiple-user) case with K = N = 2, all the users have the same transmitted power in training and information data. The average transmitted power in {ck(n)} is 0.3 of that in {bk(n)} (k = 1, 2, ..., K). The information sequences {bk(n)} and the training sequences {ck(n)} are again both BPSK modulated. The training sequences are generated from the above m-sequence of period 7 by the procedure introduced in Section 3.5. They are of length P = 14, and the training sequence for the first user is given by (3.84).

Figure 4.6: Training power allocation: BER vs the training power fraction (defined in (4.36)) under fd = 50 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
(Plot: bit error rate vs training power fraction; Kalman filter, K = N = 1, L = 2, T = 399, Ts = 25 µs, P = 7, SNR = 0, 10, 20, 30 dB, fd = 50 Hz, 1000 runs.)
To compare the different estimators under equal conditions, we assume the additive noise {v(n)} is zero-mean (i.e., m = 0), white complex Gaussian, and uncorrelated with {b(n)}, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$, so that no terms are discarded. The (receiver) SNR refers to the energy per bit per user over the one-sided noise spectral density, with both the information and the superimposed training sequences counting toward the bit energy. The results for the SISO and MIMO scenarios are shown in Figures 4.1–4.4 for various Doppler spreads and SNRs, based on 500 Monte Carlo runs.

Figure 4.7: Training power allocation: BER vs the training power fraction (defined in (4.36)) under fd = 100 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
(Plot: bit error rate vs training power fraction; Kalman filter, K = N = 1, L = 2, T = 399, Ts = 25 µs, P = 7, SNR = 0, 10, 20, 30 dB, fd = 100 Hz, 1000 runs.)
In each figure, the performance of the CE-, OP-, and DPS-BEM-based estimators using first-order statistics is shown for the SISO case, together with that of the CE- and DPS-BEM-based MIMO estimators. The normalized channel MSE in simulation is defined as (3.83) for the SISO channel and (3.85) for the MIMO channel. Simulation results are compared with the theoretical analysis of the "pure" estimation error MSE1 (defined in (4.2)) as well as the "entire" error MSEc (defined in (4.33)), which includes the modeling error. We plot MSE1 based on CE- (using (4.7)), DPS- (using (4.16)), and OP-BEM (using (4.20)), for SISO and MIMO respectively, and MSEc based only on CE-BEM (using (4.7), (4.34), and (4.35)).
Figure 4.1 exhibits the normalized channel MSE from simulation and from theory for Doppler spread fd = 0 Hz. Since the channel is time-invariant, the basis sets of CE-, OP-, and DPS-BEM are the same (constant sequences), leading to the same simulated and analytical results. Since no modeling error is present, MSE1 and MSEc are equal. From this figure, we can see that the simulation results and the analytical results agree very well, whether the channel is SISO or MIMO.

Figure 4.8: Training power allocation: BER vs the training power fraction (defined in (4.36)) under fd = 200 Hz. (SI: superimposed training; CE: CE-BEM; OP: OP-BEM; DPS: DPS-BEM.)
(Plot: bit error rate vs training power fraction; Kalman filter, K = N = 1, L = 2, T = 399, Ts = 25 µs, P = 7, SNR = 0, 10, 20, 30 dB, fd = 200 Hz, 1000 runs.)
For time-varying channels, the modeling error cannot be eliminated. In Figures 4.2–4.4, corresponding to fd = 50, 100, and 200 Hz, the different channel models now give distinct estimation errors. The CE-BEM-based estimator has the highest estimation variance, and the DPS-BEM-based estimator has the lowest. For low Doppler spreads (slow fading channels), the OP-BEM-based solution performs similarly to the DPS-BEM-based one (see Figure 4.2), since both modeling errors are tiny. As the Doppler spread increases, however, the OP-BEM-based estimator deteriorates until it performs similarly to the CE-BEM-based estimator at fd = 200 Hz. In these three figures, the theoretically derived MSE1 fits the simulation results of the DPS-BEM-based estimator very well, which confirms that DPS-BEM offers the smallest (usually negligible) modeling error among the three. When the modeling error is counted, the curves for MSEc (based on CE-BEM) also fit the simulation results of the CE-BEM-based estimator well.

Figure 4.9: Training power allocation: optimum training power fraction vs SNR for CE-BEM. ("sim.": simulation results; "analy.": the optimum given by (4.46); "analy. app.": the approximate optimum given by (4.48).)
(Plot: optimum training power fraction vs SNR (dB); Kalman filter (CE-BEM), K = N = 1, L = 2, T = 399, Ts = 25 µs, P = 7; simulated curves for fd = 0, 50, 100, 200 Hz and analytical curves for Q = 1, 3, 5, 7.)
4.6.2 Training Power Allocation
Under the same settings, we now consider training power allocation for the first-order statistics-based estimator. Only the SISO scenario is considered.
Figures 4.5–4.8 show the BER versus the training power fraction (defined in (4.36)), i.e., the ratio of the power assigned to superimposed training to the total transmitted power, for different Doppler spreads. At the receiver, a Kalman filter based on the estimated channel is applied to detect the information sequence. All three BEM representations are studied.

Figure 4.10: Training power allocation: optimum training power fraction vs SNR for OP-BEM. ("sim.": simulation results; "analy.": the optimum given by (4.46); "analy. app.": the approximate optimum given by (4.48).)
(Plot: optimum training power fraction vs SNR (dB); Kalman filter (OP-BEM), K = N = 1, L = 2, T = 399, Ts = 25 µs, P = 7; simulated curves for fd = 0, 50, 100, 200 Hz and analytical curves for Q = 1, 3, 4, 6.)
We choose the optimal training power fraction as the one giving the smallest BER. As expected, the optimal fraction grows with increasing SNR. At low SNRs, assigning more power to training for better channel estimation cannot offset the loss in effective SNR for the information symbols, so more power should be allocated to the information sequence. At higher SNRs, noise is no longer the key factor, so more training power, and hence better channel estimation, achieves a better BER.
Figures 4.9–4.11 compare the analytical results for the optimal training power fraction with simulated results based on a Kalman filter. We consider three cases: the value minimizing the BER (denoted by "sim." in the figures), the theoretical optimum in (4.46) (denoted by "analy."), and its approximation in (4.48) (denoted by "analy. app."). Note that the BER-minimizing value is selected from Figures 4.5–4.8. The analytical results in (4.46) and (4.48) may produce negative solutions if the SNR is extremely low (e.g., 0 dB); in that case we simply set the fraction to 0. In the three figures, the approximate solution (4.48) agrees well with (4.46). It is also seen that, for all the curves, the optimal fraction grows as the SNR increases.

Figure 4.11: Training power allocation: optimum training power fraction vs SNR for DPS-BEM. ("sim.": simulation results; "analy.": the optimum given by (4.46); "analy. app.": the approximate optimum given by (4.48).)
(Plot: optimum training power fraction vs SNR (dB); Kalman filter (DPS-BEM), K = N = 1, L = 2, T = 399, Ts = 25 µs, P = 7; simulated curves for fd = 0, 50, 100, 200 Hz and analytical curves for Q = 1, 3, 4, 6.)
We do not consider the modeling error in (4.46) and (4.48). In simulations, the modeling error acts as a noise term in data reception and decreases the actual SNRd, so that at high SNRs the simulated optimum is smaller than the analytical one. Due to larger modeling errors, the simulated optimum for CE- and OP-BEM is smaller than that for DPS-BEM. The simulation results for DPS-BEM fit the analytical solutions well for SNRs from 5 dB to 20 dB.
Figure 4.12: Bias-variance trade-off: BER vs Q under TIR = 0.3 for different fd's.
(Plot: bit error rate vs Q; Kalman filter (CE-BEM), K = N = 1, L = 2, T = 399, Ts = 25 µs, TIR = 0.3, SNR = 20 dB, 1000 runs; curves for fd = 0, 50, 100, 200 Hz.)

Table 4.1: Selected optimal Q.

TIR   Method       fd = 0 Hz   fd = 50 Hz   fd = 100 Hz   fd = 200 Hz
0.3   Analytical   1           5            3             5
0.3   Simulation   1           5            3             5
0.3   (2.9b)       1           3            3             5
1.0   Analytical   1           7            5             7
1.0   Simulation   1           7            5             7
1.0   (2.9b)       1           3            3             5
4.6.3 Bias-Variance Trade-Off
Under the same settings, we now consider the bias-variance trade-off, i.e., selecting an appropriate number of basis functions. Only CE-BEM and SISO channels are considered.
Figure 4.12 shows the BER for different Q, the number of basis functions employed in CE-BEM, for Doppler spreads fd = 0, 50, 100, and 200 Hz with TIR = 0.3. At the receiver, a Kalman filter is adopted for information symbol detection. Figure 4.14 displays the corresponding results for TIR = 1.0. Figures 4.13 and 4.15 show the detection SNR defined in (4.50) as a function of Q. The agreement between the two sets of figures is very good: the BER is minimized at the same values of Q that maximize SNRd(Q).

Figure 4.13: Bias-variance trade-off: SNRd(Q) (defined in (4.50)) vs Q under TIR = 0.3 for different fd's.
(Plot: SNRd(Q) vs Q; CE-BEM, K = N = 1, L = 2, T = 399, Ts = 25 µs, TIR = 0.3, SNR = 20 dB; curves for fd = 0, 50, 100, 200 Hz.)
In CE-BEM, the number of basis functions is usually given by (2.9b). With superimposed training, the TIR must also be considered in selecting Q: if more power is assigned to superimposed training, more basis functions can be employed to obtain a more accurate estimate, and vice versa. In Table 4.1, we compare the Q selected by simulation, the Q that maximizes SNRd(Q) (denoted by "Analytical"), and the Q given by (2.9b). Our analytical results agree with the simulations better than those given by (2.9b).
Figure 4.14: Bias-variance trade-off: BER vs Q under TIR = 1.0 for different fd's.
(Plot: bit error rate vs Q; Kalman filter (CE-BEM), K = N = 1, L = 2, T = 399, Ts = 25 µs, TIR = 1.0, SNR = 20 dB, 1000 runs; curves for fd = 0, 50, 100, 200 Hz.)
4.7 Conclusions
In this chapter, we analyzed the performance of the first-order statistics-based estimator proposed in the previous chapter under different BEM settings, taking the modeling error into account. We showed that, in this estimator, the major interference when using superimposed training comes from the unknown information sequences. Based on the performance analysis, we also considered power allocation and the bias-variance trade-off for this estimator, casting both optimization problems as maximization of an SNR for equalizer design. Numerical examples illustrated good agreement between our analytical results and the simulations.
Figure 4.15: Bias-variance trade-off: SNRd(Q) (defined in (4.50)) vs Q under TIR = 1.0 for different fd's.
(Plot: SNRd(Q) vs Q; CE-BEM, K = N = 1, L = 2, T = 399, Ts = 25 µs, TIR = 1.0, SNR = 20 dB; curves for fd = 0, 50, 100, 200 Hz.)
Chapter 5
Deterministic Maximum Likelihood (DML) Approach
5.1 Introduction
The performance analysis in Chapter 4 shows clearly that the first-order statistics-based estimator proposed in Chapter 3 treats the information sequence as interference in channel estimation, which leads to a poor received SNR. Since the training and information sequences pass through an identical channel, we exploit this fact to enhance channel estimation. We now consider joint channel and information sequence estimation via an iterative DML approach, assuming that the noise v(n) is complex Gaussian. Convergence to a local extremum is guaranteed; moreover, if the initial superimposed training-based solution is "good", the global extremum (minimum error probability sequence detector) can be achieved by the DML approach.
We discuss the DML approach in this chapter. Section 5.2 deals with the single-user scenario and Section 5.3 considers the multiple-user case. Simulation examples illustrate our approach in Section 5.4, and Section 5.5 concludes this chapter.
5.2 DML Approach Using BEM
Consider the first-order statistics-based channel estimator described in Chapter 3. As in (3.1) and (3.2), the SIMO channel output is given by
\[
x(n) = \sum_{l=0}^{L} h(n;l)\,s(n-l), \tag{5.1}
\]
and its noisy measurement is given by
\[
y(n) = x(n) + v(n). \tag{5.2}
\]
We make the following assumptions:
(H5.2.1) The time-varying channel {h(n;l)} satisfies a BEM representation as in (2.20),
\[
h(n;l) = \sum_{q=1}^{Q} h_q(l)\,\psi_q(n), \tag{5.3}
\]
where the basis functions $\{\psi_q(n)\}_{q=1}^{Q}$ are known at the receiver. Also $N \ge 1$.
(H5.2.2) The complex Gaussian noise {v(n)} may have an unknown mean E{v(n)} = m; it is white, uncorrelated with {b(n)}, and satisfies $E\{[v(n+\tau)-m][v(n)-m]^H\} = \sigma_v^2 I_N \delta(\tau)$.
We collect T − L samples of the observations into the vector
\[
Y := \begin{bmatrix} y^T(T-1) & y^T(T-2) & \cdots & y^T(L) \end{bmatrix}^T. \tag{5.4}
\]
Define
\[
s := \begin{bmatrix} s(T-1) & s(T-2) & \cdots & s(0) \end{bmatrix}^T,
\]
and let
\[
\tilde{v}(n) := v(n) - m.
\]
Given the vectors of the BEM coefficients in (3.12) and (3.15),
\[
H_l := \begin{bmatrix} h_1^T(l), & h_2^T(l), & \ldots, & h_Q^T(l) \end{bmatrix}^T, \qquad
H := \begin{bmatrix} H_0^T, & H_1^T, & \ldots, & H_L^T \end{bmatrix}^T.
\]
Define
\[
\tilde{V} := \begin{bmatrix} \tilde{v}^T(T-1) & \tilde{v}^T(T-2) & \cdots & \tilde{v}^T(L) \end{bmatrix}^T, \qquad
M := \begin{bmatrix} m^T & m^T & \cdots & m^T \end{bmatrix}^T
\]
where $M$ is of the same size as $\tilde{V}$, and $V = \tilde{V} + M$ is a column vector consisting of samples of the noise {v(n)} arranged as in (5.4). Using (5.1)–(5.3), we have the following linear model
\[
Y = \mathcal{T}(s)H + \tilde{V} + M \tag{5.5}
\]
where $\mathcal{T}(s)$ is a block Hankel matrix (a block Hankel matrix has identical block entries on its antidiagonals) given by
\[
\mathcal{T}(s) := \begin{bmatrix}
s(T-1)\Psi_{T-1} & s(T-2)\Psi_{T-1} & \cdots & s(T-L-1)\Psi_{T-1} \\
s(T-2)\Psi_{T-2} & s(T-3)\Psi_{T-2} & \cdots & s(T-L-2)\Psi_{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
s(L)\Psi_{L} & s(L-1)\Psi_{L} & \cdots & s(0)\Psi_{L}
\end{bmatrix},
\]
\[
\Psi_n := \begin{bmatrix} \psi_1(n)I_N & \psi_2(n)I_N & \cdots & \psi_Q(n)I_N \end{bmatrix}.
\]
Also using (5.1) and (5.2), an alternative linear model for $Y$ is given by
\[
Y = \mathcal{F}(H)s + \tilde{V} + M \tag{5.6}
\]
where
\[
\mathcal{F}(H) := \begin{bmatrix}
h(T-1;0) & \cdots & h(T-1;L) & & \\
 & \ddots & & \ddots & \\
 & & h(L;0) & \cdots & h(L;L)
\end{bmatrix}
\]
is a "filtering matrix".
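The two linear models describe the same noise-free output, once as a linear function of the BEM coefficients and once as a linear function of the input sequence. The sketch below checks this numerically for the scalar case N = 1. It is only an illustration: the polynomial basis, the dimensions, and all function names are hypothetical choices, not the bases used in this dissertation.

```python
import random

def channel_tap(hq, psi, n, l):
    """h(n;l) = sum_q h_q(l) * psi_q(n), scalar case N = 1."""
    return sum(hq[l][q] * psi[q][n] for q in range(len(psi)))

def build_T(s, psi, L):
    """Rows n = T-1, ..., L; columns ordered by lag l, then basis index q."""
    T, Q = len(s), len(psi)
    return [[s[n - l] * psi[q][n] for l in range(L + 1) for q in range(Q)]
            for n in range(T - 1, L - 1, -1)]

def build_F(hq, psi, T, L):
    """Banded filtering matrix acting on the vector [s(T-1), ..., s(0)]."""
    rows = []
    for n in range(T - 1, L - 1, -1):
        row = [0.0] * T
        for l in range(L + 1):
            row[(T - 1) - (n - l)] = channel_tap(hq, psi, n, l)
        rows.append(row)
    return rows

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

random.seed(0)
T, L, Q = 12, 2, 3
psi = [[(n / T) ** q for n in range(T)] for q in range(Q)]   # hypothetical polynomial basis
hq = [[random.gauss(0, 1) for _ in range(Q)] for _ in range(L + 1)]
s = [random.choice((-1.0, 1.0)) for _ in range(T)]

H = [hq[l][q] for l in range(L + 1) for q in range(Q)]       # stacked [H_0; ...; H_L]
s_vec = s[::-1]                                              # [s(T-1), ..., s(0)]
y_T = matvec(build_T(s, psi, L), H)                          # T(s) H
y_F = matvec(build_F(hq, psi, T, L), s_vec)                  # F(H) s
max_diff = max(abs(a - b) for a, b in zip(y_T, y_F))
```

Having both forms available is what makes the least-squares problem separable: with s fixed the cost is quadratic in H, and with H fixed it is quadratic in s.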
Consider (5.1), (5.2), and (5.5). Under the assumption of temporally white complex Gaussian measurement noise, consider the joint estimation
\[
\left\{\hat{H}, \hat{s}, \hat{m}\right\} = \arg\min_{H,s,m} \left\|Y - \mathcal{T}(s)H - M\right\|^2, \tag{5.7}
\]
where $\hat{s}$ is the estimate of $s$. We follow a DML approach, assuming no statistical model for the input sequence {s(n)}. Under a white Gaussian noise assumption, the DML estimates are obtained by the nonlinear LS optimization (5.7). Using (5.5) and (5.6), we have a separable nonlinear LS problem that can be solved sequentially as (joint optimization with respect to $H$ and $m$ can be further "separated")
\[
\left\{\hat{H}, \hat{s}, \hat{m}\right\}
= \arg\min_{s}\left\{\min_{H,m}\left\|Y - \mathcal{T}(s)H - M\right\|^2\right\}
= \arg\min_{H,m}\left\{\min_{s}\left\|Y - \mathcal{F}(H)s - M\right\|^2\right\}.
\]
The finite-alphabet properties of the information sequences can also be incorporated into the DML methods. These algorithms, first proposed in [68] and also applied in [50], iterate between estimates of the channel and the input sequences. At iteration $i$, with an initial guess of the channel $H^{(i)}$ and the mean $m^{(i)}$, the algorithm estimates the input sequence $s^{(i)}$, and the channel $H^{(i+1)}$ and mean $m^{(i+1)}$ for the next iteration, by
\[
s^{(i)} = \arg\min_{s\in\mathcal{S}} \left\|Y - \mathcal{F}\big(H^{(i)}\big)s - M^{(i)}\right\|^2, \tag{5.8a}
\]
\[
H^{(i+1)} = \arg\min_{H} \left\|Y - \mathcal{T}\big(s^{(i)}\big)H - M^{(i)}\right\|^2, \tag{5.8b}
\]
\[
m^{(i+1)} = \arg\min_{m} \left\|Y - \mathcal{T}\big(s^{(i)}\big)H^{(i+1)} - M\right\|^2 \tag{5.8c}
\]
where $\mathcal{S}$ is the (discrete) domain of $s$. The optimizations in (5.8b) and (5.8c) are linear LS problems, whereas the optimization in (5.8a) can be carried out using the Viterbi algorithm. Since the above iterative procedure involving (5.8a)–(5.8c) decreases the cost at every iteration, one achieves a local minimum of the nonlinear LS cost (a local maximum of the DML function).
The maximum likelihood estimate of the noise mean in the optimization (5.8c) may be obtained by letting
\[
\left.\frac{\partial \left\|Y - \mathcal{T}\big(s^{(i)}\big)H^{(i+1)} - M\right\|^2}{\partial m}\right|_{m=m^{(i+1)}} = 0,
\]
which yields
\[
m^{(i+1)} = \frac{1}{T-L}\sum_{n=L}^{T-1}\left[y(n) - \sum_{l=0}^{L} h^{(i+1)}(n;l)\,s^{(i)}(n-l)\right].
\]
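The closed form for the noise mean is simply the average residual over the retained samples n = L, ..., T − 1. A minimal sketch for the scalar real-valued case (N = 1) follows; the toy channel taps, the noise level, and the function names are assumptions made purely for illustration.

```python
import random

def residuals(y, h, s, L):
    """y(n) - sum_l h(n;l) s(n-l) for n = L, ..., T-1 (scalar case N = 1)."""
    return [y[n] - sum(h[n][l] * s[n - l] for l in range(L + 1))
            for n in range(L, len(y))]

def noise_mean(y, h, s, L):
    """Average residual, i.e. the minimizer of the LS cost over a constant mean m."""
    r = residuals(y, h, s, L)
    return sum(r) / len(r)

def ls_cost(y, h, s, L, m):
    """Sum of squared residuals after removing the candidate mean m."""
    return sum(abs(r - m) ** 2 for r in residuals(y, h, s, L))

random.seed(1)
T, L, m_true = 200, 2, 0.4
h = [[random.gauss(0, 1) for _ in range(L + 1)] for _ in range(T)]  # toy time-varying taps
s = [random.choice((-1.0, 1.0)) for _ in range(T)]
y = [0.0] * T
for n in range(L, T):
    y[n] = sum(h[n][l] * s[n - l] for l in range(L + 1)) + m_true + random.gauss(0, 0.1)

m_hat = noise_mean(y, h, s, L)
```

Because the cost is quadratic in m, perturbing the estimate in either direction can only increase it, which is an easy sanity check on an implementation.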
We now summarize our DML approach:
1. a. Use the first-order statistics-based approach described in Chapter 3 to estimate the channel. Denote the estimate of the channel coefficients by $\hat{H}^{(1)}$ and $\hat{h}_q^{(1)}(l)$. In this method {c(n)} is known and {b(n)} is regarded as interference. (If m = 0 or m is known, the following steps to estimate m should be skipped, and $\hat{m}$ is simply set to 0 or to the known value.)
   b. Estimate the mean $\hat{m}^{(1)}$ as
\[
\hat{m}^{(1)} := \frac{1}{T}\sum_{n=0}^{T-1}\left[y(n) - \sum_{l=0}^{L}\hat{h}^{(1)}(n;l)\,c(n-l)\right] \tag{5.9}
\]
      where $\hat{h}^{(1)}(n;l) := \sum_{q=1}^{Q}\hat{h}_q^{(1)}(l)\,\psi_q(n)$ is given by (5.3).
   c. Design a Viterbi sequence detector (see Appendix B.1) to estimate {s(n)} as $\{\hat{s}(n)\}$ using the estimated channel $\hat{H}^{(1)}$, the mean $\hat{m}^{(1)}$, and the cost function in (5.8a) with i = 1. Note that knowledge of {c(n)} is used in s(n) = b(n) + c(n); therefore, we are in essence estimating {b(n)}.
2. a. Substitute $\hat{s}(n)$ for s(n) in (5.1) and use the corresponding formulation in (5.5) to estimate the channel $H$ as
\[
\hat{H}^{(2)} = \mathcal{T}^{\dagger}(\hat{s})\left[Y - \hat{M}^{(1)}\right].
\]
      Following (5.9), the mean m is estimated as $\hat{m}^{(2)}$ for i = 1.
   b. Design a Viterbi sequence detector using the estimated channel $\hat{H}^{(2)}$, the mean $\hat{m}^{(2)}$, and cost (5.8a) with i = 2, as in Step 1.c.
3. Step 2 provides one iteration of (5.8a)–(5.8c). Repeat a few times until the (relative) improvement in channel estimation over the previous iteration falls below a pre-specified threshold.
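The alternation between sequence detection and least-squares channel estimation can be sketched on a deliberately simplified toy model: a real-valued, time-invariant SISO channel with two taps (L = 1, Q = 1), BPSK information symbols, a known periodic training sequence, and zero noise mean. Everything here is an assumption for illustration; in particular, the initial channel guess is an arbitrary coarse value standing in for the first-order statistics-based estimate, and the Viterbi search is written out only for this two-state case.

```python
import math
import random

def cost(y, h, s):
    """LS cost sum_{n>=1} (y(n) - h0*s(n) - h1*s(n-1))^2."""
    return sum((y[n] - h[0] * s[n] - h[1] * s[n - 1]) ** 2 for n in range(1, len(y)))

def viterbi(y, h, c, alphabet=(-1.0, 1.0)):
    """Exact minimizer of the LS cost over b(n) in the alphabet, with s = b + c."""
    best_cost = {b: 0.0 for b in alphabet}      # state = b(n-1); b(0) is free
    paths = {b: [b] for b in alphabet}
    for n in range(1, len(y)):
        new_cost, new_paths = {}, {}
        for b in alphabet:                      # candidate b(n)
            options = []
            for prev in alphabet:               # candidate b(n-1)
                e = y[n] - h[0] * (b + c[n]) - h[1] * (prev + c[n - 1])
                options.append((best_cost[prev] + e * e, prev))
            total, prev = min(options)
            new_cost[b] = total
            new_paths[b] = paths[prev] + [b]
        best_cost, paths = new_cost, new_paths
    return paths[min(best_cost, key=best_cost.get)]

def ls_channel(y, s):
    """LS fit of (h0, h1) in y(n) ~ h0*s(n) + h1*s(n-1), via 2x2 normal equations."""
    a11 = a12 = a22 = r1 = r2 = 0.0
    for n in range(1, len(y)):
        a11 += s[n] * s[n]
        a12 += s[n] * s[n - 1]
        a22 += s[n - 1] * s[n - 1]
        r1 += s[n] * y[n]
        r2 += s[n - 1] * y[n]
    det = a11 * a22 - a12 * a12
    return ((r1 * a22 - r2 * a12) / det, (a11 * r2 - a12 * r1) / det)

random.seed(2)
T = 200
pattern = [1, 1, 1, -1, 1, -1, -1]                       # assumed period-7 training pattern
c = [math.sqrt(0.3) * pattern[n % 7] for n in range(T)]  # TIR = 0.3 with unit-power b(n)
b = [random.choice((-1.0, 1.0)) for _ in range(T)]
s_true = [bn + cn for bn, cn in zip(b, c)]
h_true = (1.0, 0.5)
y = [0.0] + [h_true[0] * s_true[n] + h_true[1] * s_true[n - 1] + random.gauss(0, 0.1)
             for n in range(1, T)]

h = (0.8, 0.3)          # hypothetical coarse initial channel estimate
costs = []
for _ in range(3):
    b_hat = viterbi(y, h, c)                             # detection step, cf. (5.8a)
    s_hat = [bn + cn for bn, cn in zip(b_hat, c)]
    costs.append(cost(y, h, s_hat))
    h = ls_channel(y, s_hat)                             # channel step, cf. (5.8b)
```

Each step minimizes the same cost over one block of variables with the other held fixed, so the recorded costs cannot increase from one iteration to the next; this is the monotonicity that guarantees convergence to a local minimum.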
Since the Viterbi detector used in the proposed DML approach is computationally burdensome, we can replace it with a Kalman filter followed by hard decisions to speed up the iterations, at the expense of a small BER loss. This iterative method follows these steps:
1. a. As Step 1.a of the DML approach.
   b. As Step 1.b of the DML approach.
   c. Design a Kalman filter (see Appendix B.2) of delay d to estimate {s(n)} as $\{\hat{s}(n)\}$ using the estimated channel $\hat{H}^{(1)}$ and mean $\hat{m}^{(1)}$. Quantize $\{\hat{s}(n)\}$ to the nearest points of the symbol alphabet (hard decisions). Note that knowledge of {c(n)} is used in s(n) = b(n) + c(n); therefore, we are in essence estimating {b(n)}.
2. a. As Step 2.a of the DML approach.
   b. Design a Kalman filter using the estimated channel $\hat{H}^{(2)}$ and the mean $\hat{m}^{(2)}$, as in Step 1.c.
3. Step 2 provides one iteration of our proposed iterative method. Repeat a few times until the (relative) improvement in channel estimation over the previous iteration falls below a pre-specified threshold.
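The hard-decision step in the Kalman-based variant can be sketched as follows. The sketch assumes real-valued soft estimates and a BPSK alphabet for b(n); the function name and the example numbers are hypothetical.

```python
def hard_decisions(soft_s, c, alphabet=(-1.0, 1.0)):
    """Quantize soft estimates of s(n) = b(n) + c(n) to the nearest alphabet point for b(n).

    The known training sample c(n) is subtracted first, so the decision is on b(n).
    """
    return [min(alphabet, key=lambda a: abs(x - ck - a)) for x, ck in zip(soft_s, c)]

# Example: soft estimates of s(n) and the known training samples c(n).
soft_s = [1.4, -0.2, 0.1]
c = [0.5, 0.5, -0.5]
b_hat = hard_decisions(soft_s, c)
s_hat = [bn + cn for bn, cn in zip(b_hat, c)]   # quantized s(n), reused for channel re-estimation
```

The quantized sequence then replaces s(n) when the channel is re-estimated in Step 2.a.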
5.3 DML Approach: Multiple-User (MIMO) Channels
In this section we extend the DML approach to multiple-user (MIMO) channels, corresponding to the estimator described in Section 3.5.
We collect T − L samples of the observations to form the N(T − L)-column vector as in (5.4),
\[
Y := \begin{bmatrix} y^T(T-1) & y^T(T-2) & \cdots & y^T(L) \end{bmatrix}^T,
\]
and the KT-column vector
\[
s := \begin{bmatrix} s_1(T-1) & \cdots & s_K(T-1) & \cdots & s_1(0) & \cdots & s_K(0) \end{bmatrix}^T. \tag{5.10}
\]
Define the $N \times NQ$ matrix
\[
\Psi_n := \begin{bmatrix} \psi_1(n)I_N & \psi_2(n)I_N & \cdots & \psi_Q(n)I_N \end{bmatrix},
\]
the $N(T-L) \times NQ(L+1)K$ matrix
\[
\mathcal{T}(s) := \begin{bmatrix}
s_1(T-1)\Psi_{T-1} & \cdots & s_1(T-L-1)\Psi_{T-1} & \cdots & s_K(T-L-1)\Psi_{T-1} \\
s_1(T-2)\Psi_{T-2} & \cdots & s_1(T-L-2)\Psi_{T-2} & \cdots & s_K(T-L-2)\Psi_{T-2} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
s_1(L)\Psi_{L} & \cdots & s_1(0)\Psi_{L} & \cdots & s_K(0)\Psi_{L}
\end{bmatrix},
\]
the $NQ(L+1)K$-column vector (by (3.77))
\[
H := \begin{bmatrix} H_1^T & H_2^T & \cdots & H_K^T \end{bmatrix}^T,
\]
and the N(T − L)-column vector
\[
\tilde{V} := \begin{bmatrix} \tilde{v}^T(T-1), & \tilde{v}^T(T-2), & \ldots, & \tilde{v}^T(L) \end{bmatrix}^T
\]
where $\tilde{v}(n) := v(n) - m$. We also have the following linear model
\[
Y = \mathcal{T}(s)H + \tilde{V} + M \tag{5.11}
\]
where $M := \begin{bmatrix} m^T & m^T & \cdots & m^T \end{bmatrix}^T$.
We further define the $N(T-L) \times K(L+1)$ matrix
\[
\mathcal{F}(H) := \begin{bmatrix}
h_1(T-1;0) & \cdots & h_K(T-1;0) & \cdots & h_K(T-1;L) & & \\
 & \ddots & & \ddots & & \ddots & \\
 & & h_1(L;0) & \cdots & h_K(L;0) & \cdots & h_K(L;L)
\end{bmatrix}
\]
and obtain another linear model as follows
\[
Y = \mathcal{F}(H)s + \tilde{V} + M. \tag{5.12}
\]
By (5.11) and (5.12), the DML approach described in Section 5.2 can be followed.
Under the assumption of white complex Gaussian measurement noise, the joint estimators
of the relevant parameters are given by the following nonlinear optimization problem
{ ?H,?s, ?m} = arg
braceleftbigg
min
H,s,m
bardblY?T (s)?Mbardbl2
bracerightbigg
= arg
braceleftbigg
min
H,s,m
bardblY?F (H)?Mbardbl2
bracerightbigg
.
We also follow the DML approach assuming no statistical model for the input sequences
{sk(n)}. We may separate the nonlinear LS problem sequentially as
{ ?H,?s, ?m} = argmins {min
H,m
||Y?T (s)H?M||2}
= argmin
H,m
{mins ||Y?F(H)s?M||2}.
At iterationi, with an initial guess of the channelH(i) and the mean m(i), the algorithm
estimates the input sequence s(i) and the channel H(i+1) and mean m(i+1) for the next
iteration by
s(i) = argmin
s?S
||Y ?C(H(i))s?M(i)||2, (5.13a)
H(i+1) = argmin
H
||Y ?T (s(i))H?M(i)||2, (5.13b)
m(i+1) = argminm ||Y ?T (s(i))H(i+1) ?M||2. (5.13c)
We now summarize our MIMO DML approach:
127
0 5 10 15 20 25 3010
?6
10?5
10?4
10?3
10?2
10?1
100
SNR (dB)
Bit Error Rate
Viterbi detector: K=N=1, L=2, T=420, Ts=25?s, TIR=0.3, P=7, fd=0Hz, 500 runs.
 
 
SI&DPS: step 1
SI&DPS: 1st iter.
SI&DPS: 2nd iter.
SI&DPS: 3rd iter.
SI&CE: step 1
SI&CE: 1st iter.
SI&CE: 2nd iter.
SI&CE: 3rd iter.
TM&DPS
TM&CE
Figure 5.1: DML approach (SISO): BER vs SNR under fd = 0Hz (time-invariant) and
K = N = 1. The curves for CE- and DPS-BEM?s completely overlap, since the two basis
functions are both constant for time-invariant channels (Q = 1). (SI: superimposedtraining;
TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; ?step 1?: the first-order
statistics-based estimator; ?1st iter.?: the first DML iteration; ?2nd iter.?: the second DML
iteration; ?3rd iter.?: the third DML iteration.)
1. a. Use (3.80) and (3.81) to estimate the channel. Denote the channel estimates by
      $\hat{H}_k^{(1)}$ and $\hat{h}_k^{(1)}(n;l)$. In this method $\{c_k(n)\}$ is known and $\{b_k(n)\}$ is regarded as
      interference.
   b. The noise mean $m$ is estimated as
      $$\hat{m}^{(1)} = \frac{1}{T} \sum_{n=1}^{T} \Big[ y(n) - \sum_{k=1}^{K} \sum_{l=0}^{L} \hat{h}_k^{(1)}(n;l)\, c_k(n-l) \Big]. \qquad (5.14)$$
   c. Design a Viterbi sequence detector (see Appendix B.1) to estimate $\{s_k(n)\}$ as
      $\{\hat{s}_k(n)\}$ using the estimated channel $\hat{H}^{(1)}$, mean $\hat{m}^{(1)}$, and cost (5.13a) with $i = 1$.
      Note that knowledge of $\{c_k(n)\}$ is used in $s_k(n) = b_k(n) + c_k(n)$; therefore, we
      are in essence estimating $b_k(n)$ in the Viterbi detector.

[Plot omitted. Axes: bit error rate (log scale) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.2: DML approach (SISO): BER vs SNR under fd = 50 Hz and K = N = 1. (SI:
superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration;
"2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
2. a. Substitute $\hat{s}_k(n)$ for $s_k(n)$ in (5.10) and use the corresponding formulation in
      equation (5.13b) to estimate the time-invariant channel coefficient matrix $H$ as
      $$\hat{H}^{(2)} = T^{\dagger}(\hat{s}) \big[ Y - \hat{M}^{(1)} \big],$$
      and estimate the time-varying channel as $\hat{h}_k^{(2)}(n;l)$ using (3.81). The mean $m$ is
      estimated as $\hat{m}^{(2)}$ using (5.14) with $\hat{h}_k^{(1)}(n;l)$ replaced by $\hat{h}_k^{(2)}(n;l)$.
   b. Design a Viterbi sequence detector using the estimated channel $\hat{H}^{(2)}$, mean $\hat{m}^{(2)}$,
      and cost (5.13a) with $i = 2$, as in Step 1.c.
3. Step 2 provides one iteration of (5.13a)-(5.13c). Repeat it a few times until the
   desired level of convergence is reached.

[Plot omitted. Axes: bit error rate (log scale) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.3: DML approach (SISO): BER vs SNR under fd = 100 Hz and K = N = 1. (SI:
superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration;
"2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
An approximation of the MIMO DML approach, obtained by replacing the Viterbi detector
with a multiple-user Kalman filter, is given by the following steps:
1. a. As Step 1.a of the MIMO DML approach.
   b. As Step 1.b of the MIMO DML approach.
   c. Design a multiple-user Kalman filter (see Appendix B.2) of delay $d$ to estimate
      $\{s_k(n)\}$ as $\{\tilde{s}_k(n)\}$ using the estimated channel $\hat{H}^{(1)}$ and mean $\hat{m}^{(1)}$.
      Quantize $\{\tilde{s}_k(n)\}$ into $\{\hat{s}_k(n)\}$ with the knowledge of the symbol alphabet (hard
      decisions). Note that knowledge of $\{c_k(n)\}$ is used in $s_k(n) = b_k(n) + c_k(n)$;
      therefore, we are in essence estimating $\{b_k(n)\}$.
2. a. As Step 2.a of the MIMO DML approach.
   b. Design a multiple-user Kalman filter using the estimated channel $\hat{H}^{(2)}$ and the mean
      $\hat{m}^{(2)}$, as in Step 1.c.
3. Step 2 provides one iteration of our proposed iterative method. Repeat it until the
   (relative) improvement in channel estimation over the previous iteration falls below
   a pre-specified threshold.

[Plot omitted. Axes: bit error rate (log scale) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.4: DML approach (SISO): BER vs SNR under fd = 200 Hz and K = N = 1. (SI:
superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration;
"2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)

[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.5: DML approach (SISO): NCMSE vs SNR under fd = 0 Hz (time-invariant) and
K = N = 1. The curves for the CE- and DPS-BEMs completely overlap, since the two basis
functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training;
TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order
statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML
iteration; "3rd iter.": the third DML iteration.)
5.4 Simulation Examples
5.4.1 DML Approach: Single User
In this example, we adopt the same simulation conditions as in Section 3.6.1 to compare
the first-order statistics-based estimator of Chapter 3 with our DML approach. We generate
a doubly-selective Rayleigh fading channel following Jakes' model with N = 1 and L = 2.

[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.6: DML approach (SISO): NCMSE vs SNR under fd = 50 Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
In the simulations, we pick a data record length of 420 symbols (a time duration of
approximately 10 ms). We consider the system operating under different Doppler spreads with
different numbers of basis functions Q. For the Doppler spreads fd = 0, 50, 100, and 200 Hz,
we take Q = 1, 3, 5, and 7 for the CE-BEM-based solution, and Q = 1, 3, 4, and 6 for
the DPS-BEM representations. The average transmitted power in {c(n)} is 0.3 of that in
{b(n)}, leading to a TIR of 0.3.
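The basis counts quoted above are consistent with two commonly used sizing rules, Q = 2⌈fd·T·Ts⌉ + 1 for the CE-BEM and Q = ⌈2·fd·T·Ts⌉ + 1 for the DPS-BEM. These rules are an assumption here, since this section does not restate how Q was chosen; the sketch below only checks the arithmetic, with T = 420 and Ts = 25 μs taken from the figure annotations. The function names are hypothetical.

```python
import math

T = 420        # block length in symbols
TS = 25e-6     # symbol duration in seconds (from the figure annotations)

def q_ce_bem(fd):
    """Assumed CE-BEM rule: Q = 2 * ceil(fd * T * Ts) + 1."""
    return 2 * math.ceil(fd * T * TS) + 1

def q_dps_bem(fd):
    """Assumed DPS-BEM rule: Q = ceil(2 * fd * T * Ts) + 1."""
    return math.ceil(2 * fd * T * TS) + 1

dopplers = [0, 50, 100, 200]                 # Doppler spreads in Hz
ce = [q_ce_bem(fd) for fd in dopplers]       # compare with Q = 1, 3, 5, 7
dps = [q_dps_bem(fd) for fd in dopplers]     # compare with Q = 1, 3, 4, 6
record_ms = T * TS * 1e3                     # record duration in milliseconds
```

Under these rules the DPS-BEM needs fewer basis functions than the CE-BEM at the two highest Doppler spreads.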
[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.7: DML approach (SISO): NCMSE vs SNR under fd = 100 Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
We first consider a single-user scenario. The information sequences {b(n)} and the
training sequences {c(n)} are all BPSK modulated. The periodic training sequence {c(n)}
is generated from the m-sequence of period P = 7, one period of which is given by (3.82).
To explore different estimators and their iterative versions under identical conditions,
we assume the additive noise {v(n)} is zero-mean (i.e., m = 0), white complex Gaussian, and
uncorrelated with {b(n)}, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$, so that no terms are
discarded in the first-order statistics-based estimator, the first step of our DML approach. The
(receiver) SNR refers to the energy per bit over the one-sided noise spectral density, with both
the information and the superimposed training sequences counting toward the bit energy. At the
receiver, a Viterbi detector is used for data reception.
[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=1, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.8: DML approach (SISO): NCMSE vs SNR under fd = 200 Hz and K = N = 1.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
The results for a record length of T = 420 symbols are shown in Figures 5.1-5.8 for
various Doppler spreads and SNRs. The results are based on 500 Monte Carlo runs. To
compare with other possible approaches, the CE- and DPS-BEM-based TM training described
in Appendix A is also considered for doubly-selective channel estimation. Training sessions
are periodically inserted in the transmitted symbol frame. We take a training session of
length 2L + 1 = 5 symbols with the training sequence $\{0, 0, \sqrt{2L+1}, 0, 0\}$, and at the
receiver an LS estimation is performed. A data session of 18 symbols is inserted between
two successive training sessions to form a frame of length 23 symbols. Such a frame is
repeated over a record length of 418 symbols. Thus, we have a training-to-information bit
and power ratio of about 0.3.
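The ratio of "about 0.3" can be checked directly from the frame layout. The following sketch assumes unit-power BPSK data symbols (an assumption; the normalization is not restated here) and uses hypothetical variable names.

```python
import math

L = 2                                # channel order used in this example
train_len = 2 * L + 1                # 5 training symbols per session
data_len = 18                        # data symbols per frame
frame_len = train_len + data_len     # symbols per frame

# One training session: a single impulse of amplitude sqrt(2L+1), guarded by zeros.
training = [0.0] * L + [math.sqrt(train_len)] + [0.0] * L

train_energy = sum(x * x for x in training)   # equals 2L+1 up to rounding
data_energy = data_len * 1.0                  # unit-power BPSK (assumption)

symbol_ratio = train_len / data_len           # training-to-information symbols
power_ratio = train_energy / data_energy      # training-to-information power
```

Both ratios equal 5/18, which rounds to the quoted value of 0.3 and matches the TIR used for superimposed training.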
[Plot omitted. Axes: bit error rate (log scale) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.9: DML approach (MIMO): BER vs SNR under fd = 0 Hz (time-invariant) and
K = N = 2. The curves for the CE- and DPS-BEMs completely overlap, since the two basis
functions are both constant for time-invariant channels (Q = 1). (SI: superimposed training;
TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the first-order
statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the second DML
iteration; "3rd iter.": the third DML iteration.)
For comparison, we plot the results of the CE- and DPS-BEM-based superimposed
training schemes (denoted as SI in the figures), including the first-order statistics-based
estimator (denoted as "step 1" in the figures) and the DML approach after one, two, and
three iterations (denoted as "1st iter.", "2nd iter.", and "3rd iter." in the figures), and of the
TM training approaches (denoted as TM in the figures). Figures 5.1-5.4 show the BERs for
fd = 0, 50, 100, and 200 Hz, respectively. Figures 5.5-5.8 show the corresponding normalized
channel MSE, which is defined in (3.83).
From the eight figures, we can see that the superimposed training-based estimation
and detection performance improves substantially after iterations, because the information
data that are viewed as interference by the first-order statistics-based estimator are now
exploited to enhance the channel estimation for the next iteration. Therefore, the
self-interference is efficiently removed after iterations. The DML approach provides error
performance comparable to that of TM training, but at a higher data transmission rate. Valuable
bandwidth resources can thus be saved by using iterative estimation, at the expense of
increased computational complexity.

[Plot omitted. Axes: bit error rate (log scale) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.10: DML approach (MIMO): BER vs SNR under fd = 50 Hz and K = N = 2. (SI:
superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration;
"2nd iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
Since we assume the channel satisfies a BEM, the modeling error of the prescribed
BEM sets a limit for the estimation performance. Except for fd = 0 Hz, the DPS-BEM is
much more accurate in describing a band-limited time-varying channel, so that it provides
much better estimation performance than the CE-BEM. For both the BER and NCMSE curves,
the error floors of the DPS-BEM-based solutions are lower than those of the CE-BEM-based
ones.

[Plot omitted. Axes: bit error rate (log scale) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.11: DML approach (MIMO): BER vs SNR under fd = 100 Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
As the Doppler spread fd increases, the performance of the DML-based superimposed
training deteriorates compared with the TM training. Figures 5.4 and 5.8 for fd = 200 Hz
clearly show this result. This is partially because the first-order statistics-based estimator
performs worse for fast fading channels, since the larger number of basis functions needed to
represent the channel results in higher estimation variance. More iterations are required to
approach the performance of TM training.
[Plot omitted. Axes: bit error rate (log scale) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.12: DML approach (MIMO): BER vs SNR under fd = 200 Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
5.4.2 DML Approach: Multiple Users
In this example, we follow the settings in Section 5.4.1 except that a multiple-user
scenario is considered. It can also be viewed as an extension of Section 3.6.2, for now the
iterative DML approach based on the multiple-user channel estimator using the first-order
statistics is considered.
In the simulations, we assume that all the users have the same transmitted power in training
and information data. The average transmitted power in $\{c_k(n)\}$ is 0.3 of that in $\{b_k(n)\}$
(k = 1, 2, ..., K), leading to the same TIR as in Section 5.4.1. We consider a simple two-user
scenario, i.e., K = 2, each user with two receive antennas, i.e., N = 2. The information
sequences $\{b_k(n)\}$ and the training sequences $\{c_k(n)\}$ are all BPSK modulated. The training
sequence is generated from the m-sequence of period $\tilde{P} = 7$ by the procedure we introduced
in Section 3.5.

[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=0 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.13: DML approach (MIMO): NCMSE vs SNR under fd = 0 Hz (time-invariant)
and K = N = 2. The curves for the CE- and DPS-BEMs completely overlap, since the two
basis functions are both constant for time-invariant channels (Q = 1). (SI: superimposed
training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM; "step 1": the
first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd iter.": the
second DML iteration; "3rd iter.": the third DML iteration.)
The additive noise {v(n)} is also zero-mean, white complex Gaussian, and uncorrelated
with $\{b_k(n)\}$, with $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_2 \delta(\tau)$. The (receiver) SNR refers to the energy
per bit per user over the one-sided noise spectral density, with both the information and the
superimposed training sequences counting toward the bit energy.
At the receiver, a Viterbi detector is used for symbol detection. We consider different
Doppler spreads of fd = 0, 50, 100, and 200 Hz for this communications system. We again
pick Q = 1, 3, 5, 7 for the CE-BEM and Q = 1, 3, 4, 6 for the DPS-BEM.
[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=50 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.14: DML approach (MIMO): NCMSE vs SNR under fd = 50 Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
The results for a record length of T = 420 symbols are shown in Figures 5.9-5.16 for
various Doppler spreads and SNRs. The results are based on 500 Monte Carlo runs. For
comparison, the CE-BEM and DPS-BEM-based periodically placed TM training with zero-padding,
as described in Appendix A, is also considered. We take a training session of length
(K + 1)L + K = 8 symbols with the first user's training
$\{0_{1\times 2}, \sqrt{(K+1)L+K}, 0_{1\times 5}\}$ and the second user's
$\{0_{1\times 5}, \sqrt{(K+1)L+K}, 0_{1\times 2}\}$. A data session of 27 symbols is inserted
between two such training sessions to form a frame of length 35 symbols. Such a frame is
repeated over a record length of 420 bits. Thus, we have a training-to-information bit and
power ratio of about 0.3. At the receiver an LS estimation is performed.
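The two training sessions above place each user's impulse so that the channel responses of different users do not overlap within a session. The sketch below rebuilds both sequences and checks this. The indexing rule `L + k*(L+1)` is an assumption chosen to reproduce the two sequences given in the text, and all names are hypothetical.

```python
import math

K, L = 2, 2
train_len = (K + 1) * L + K          # 8 symbols per training session
data_len = 27
frame_len = train_len + data_len     # symbols per frame
record_len = 420

def impulse_index(k):
    """Position of user k's impulse, k = 0..K-1 (assumed rule)."""
    return L + k * (L + 1)

def user_training(k):
    """All-zero session with one impulse of amplitude sqrt((K+1)L+K)."""
    seq = [0.0] * train_len
    seq[impulse_index(k)] = math.sqrt(train_len)
    return seq

def response_support(k):
    """Session indices reached by user k's impulse through an order-L channel."""
    start = impulse_index(k)
    return set(range(start, start + L + 1))

sequences = [user_training(k) for k in range(K)]
supports = [response_support(k) for k in range(K)]
frames = record_len // frame_len     # whole frames in the record
power_ratio = train_len / data_len   # per-user training-to-information power
```

With these values the impulses sit at indices 2 and 5, the two responses occupy disjoint index sets, and the ratio 8/27 again rounds to 0.3.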
[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=100 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.15: DML approach (MIMO): NCMSE vs SNR under fd = 100 Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
Figures 5.9-5.16 show results similar to those in the SISO case: the DML approach enhances
the channel estimation and data detection performance significantly over the first-order
statistics-based estimator for a multiple-user channel, and the DPS-BEM clearly outperforms
the CE-BEM, so that it appears to be a good choice to approximate the time-varying channel.
5.5 Conclusions
We explored the DML approach in this chapter. By exploiting the fact that training
and information sequences pass through an identical channel, the iterative DML approach
was used to jointly improve the channel and sequence estimation. Beginning with the
first-order statistics-based channel estimator, the detected data symbols from the preceding
iteration are used to reduce the self-interference at the current iteration. Convergence to a
local maximum of the DML function is guaranteed. Symbol detection techniques such as Kalman
filtering can also be adopted instead of the Viterbi algorithm to reduce the computational
complexity of the iterations; this method can be viewed as an approximation of the DML approach.

[Plot omitted. Axes: normalized channel MSE (dB) vs. SNR (dB), 0 to 30 dB. Panel title: Viterbi detector: K=N=2, L=2, T=420, Ts=25 μs, TIR=0.3, P=7, fd=200 Hz, 500 runs. Legend: SI&DPS and SI&CE (step 1, 1st iter., 2nd iter., 3rd iter.); TM&DPS; TM&CE.]
Figure 5.16: DML approach (MIMO): NCMSE vs SNR under fd = 200 Hz and K = N = 2.
(SI: superimposed training; TM: time-multiplexed training; CE: CE-BEM; DPS: DPS-BEM;
"step 1": the first-order statistics-based estimator; "1st iter.": the first DML iteration; "2nd
iter.": the second DML iteration; "3rd iter.": the third DML iteration.)
Chapter 6
Doubly-Selective Channel Estimation Using Data-Dependent Superimposed
Training
6.1 Introduction
For the first-order statistics-based channel estimator proposed in Chapter 3, the
information sequence acts as interference, resulting in a poor training SNR. Simulation results
have shown that noticeable error floors occur in the BER and channel MSE curves for this
estimator. Although we can employ the DML method described in Chapter 5 to reduce
the interference, DML iterations add to the computational complexity and to the delay in symbol
detection at the receiver.
Inspired by the work of [20], where the training sequence is distorted according to
the information data before transmission so as to eliminate the self-interference on
reception, we extend this data-dependent method to time-varying channels with the aid of BEM
representations.
For the first-order statistics-based estimator using CE- or DPS-BEM, (4.11) addresses
the source of the estimation error (see Remark 4.2.2.2 for a detailed discussion):
$$\hat{d}_{mq} = d_{mq} + s_{mq} + w_{mq},$$
where the information sequence's contribution, given by $s_{mq}$, interferes with the estimation
of $d_{mq}$ from $\hat{d}_{mq}$, and hence with channel estimation from the observations. For the
CE-BEM-based estimator,
$$s_{mq} := \frac{1}{T} \sum_{n=0}^{T-1} \Big\{ \sum_{l=0}^{L} h(n;l)\, b(n-l) \Big\} e^{-j(\omega_q + \alpha_m)n}, \qquad (6.1)$$
and for the DPS-BEM-based estimator,
$$s_{mq} := \sum_{n=0}^{T-1} \Big[ \sum_{l=0}^{L} h(n;l)\, b(n-l) \Big] u_q(n)\, e^{-j\alpha_m n} \qquad (6.2)$$
as in (4.12). Our goal of the data-dependent superimposed training is to null out the
influence of $s_{mq}$ on the channel estimation by transmitter-end processing.
In this chapter, we focus on transmitter-end processing techniques to reduce self-
interference of superimposed training. In Section 6.2, we present the data-dependent su-
perimposed training based on CE-BEM; this scheme is extended to channels satisfying
DPS-BEM representation in Section 6.3, where the approach of partially-data-dependent
superimposed training is also proposed. Our approaches are demonstrated by simulation
examples in Section 6.4, and Section 6.5 concludes this chapter.
6.2 Data-Dependent Superimposed Training Using CE-BEM
We assume:
(H6.2.1) The time-varying channel satisfies the CE-BEM (3.3), where the frequencies $\omega_q$ (q =
1, 2, ..., Q) are distinct and known with $\omega_q \in [0, 2\pi)$. Also $N \geq 1$.
(H6.2.2) The information data sequence {b(n)} is zero-mean and white, with the variance
$E\{|b(n)|^2\} = \sigma_b^2$;
(H6.2.3) The measurement noise {v(n)} is zero-mean, white, and uncorrelated with {b(n)},
with the autocorrelation $E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$;
(H6.2.4) The superimposed training sequence c(n) = c(n + P) for all n is a non-random
periodic sequence with period P and average power $\sigma_c^2 := \sum_{n=0}^{P-1} |c(n)|^2 / P$.
6.2.1 Data-Dependent Processing at the Transmitter
Consider the DFT of the information sequence {b(n)} over the block n = 0, 1, ..., T - 1,
$$b_r := \frac{1}{T} \sum_{n=0}^{T-1} b(n)\, e^{-j\omega_r n}, \qquad \omega_r := \frac{2\pi r}{T}, \qquad (6.3)$$
for r = 0, 1, ..., T - 1, and $b(n) = \sum_{r=0}^{T-1} b_r e^{j\omega_r n}$. Then the interference $s_{mq}$ of (6.1) can be
expressed as
$$s_{mq} = \frac{1}{T} \sum_{n=0}^{T-1} \Big\{ \sum_{q_1=1}^{Q} \sum_{l=0}^{L} h_{q_1}(l)\, e^{j\omega_{q_1} n} \sum_{r=0}^{T-1} b_r e^{j\omega_r (n-l)} \Big\} e^{-j(\omega_q + \alpha_m)n}$$
$$= \sum_{q_1=1}^{Q} \sum_{l=0}^{L} \sum_{r=0}^{T-1} \big[ h_{q_1}(l)\, e^{-j\omega_r l}\, b_r \big] \Big[ \frac{1}{T} \sum_{n=0}^{T-1} e^{j(\omega_{q_1} - \omega_q + \omega_r - \alpha_m)n} \Big]$$
$$= \sum_{q_1=1}^{Q} \sum_{l=0}^{L} \sum_{r=0}^{T-1} \big[ h_{q_1}(l)\, e^{-j\omega_r l}\, b_r \big]\, \delta\big((q_1 - q + r - mK) \bmod T\big).$$
Therefore, if we can make $b_r = 0$ for $r = q + mK - q_1$, $1 \leq q, q_1 \leq Q$, m = 0, 1, ..., P - 1,
then $s_{mq} = 0$. We do so by modifying {c(n)} based on {b(n)} (at the transmitter).
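The last step of the derivation rests on the fact that the block average of a complex exponential at a DFT-grid frequency is 1 when the frequency index is a multiple of T and 0 otherwise. A small numerical check is sketched below, with toy values of T, K, and Q chosen only for illustration; the helper names are hypothetical.

```python
import cmath

def block_average(k, T):
    """(1/T) * sum_{n=0}^{T-1} exp(j*2*pi*k*n/T)."""
    return sum(cmath.exp(2j * cmath.pi * k * n / T) for n in range(T)) / T

def interfering_bins(q, m, Q, K, T):
    """DFT bins r = q + m*K - q1 (mod T), q1 = 1..Q, that contribute to s_mq."""
    return sorted({(q + m * K - q1) % T for q1 in range(1, Q + 1)})

T, K, Q = 28, 4, 2                      # toy block: T = K*P with P = 7
on_grid = abs(block_average(T, T))      # index congruent to 0 mod T
off_grid = abs(block_average(3, T))     # index not congruent to 0 mod T
bins = interfering_bins(q=1, m=2, Q=Q, K=K, T=T)
```

For q = 1 and m = 2 only the bins r = 7 and r = 8 survive the sum, which is why nulling a narrow band of bins around each multiple of K removes the interference.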
Define a set
$$\Theta := \{ r : -(Q-1) + mK \leq r \leq (Q-1) + mK, \; m = 0, 1, \ldots, P-1 \}. \qquad (6.4)$$
The frequency components $\{b_r : r \in \Theta\}$ of the information sequence are hence the
"self-interference". Define a "self-interference" sequence
$$b_e(n) := \sum_{r \in \Theta} b_r e^{j 2\pi r n / T} \qquad (6.5)$$
and a data-dependent superimposed training $\tilde{c}(n)$ over the block n = 0, 1, ..., T - 1 such
that
$$\tilde{c}(n) := c(n) - b_e(n). \qquad (6.6)$$
Note that $\{\tilde{c}(n)\}$ is no longer periodic with period P. At the transmitter, we transmit
$$\tilde{c}(n) + b(n) = c(n) + [b(n) - b_e(n)].$$
The model (3.1)-(3.4) holds with c(n) replaced by $\tilde{c}(n)$. By construction, the DFT of
$b(n) - b_e(n)$ over the block n = 0, 1, ..., T - 1 vanishes at frequencies in the set $\Theta$. Also,
the DFT of $b(n-l) - b_e(n-l)$ over the block n = 0, 1, ..., T - 1 vanishes at frequencies
in the set $\Theta$, provided that a cyclic prefix of length $M \geq L$ is used. A cyclic prefix of length
M is added at the transmitter by choosing
$$s(-i) = s(T-i), \qquad i = 1, 2, \ldots, M \quad (M \geq L),$$
where $s(i) = \tilde{c}(i) + b(i)$. This allows the linear convolution in (3.1) to be equal to the circular
convolution (implicit in the DFT operation) over the block n = 0, 1, ..., T - 1.
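The effect of the cyclic prefix can be checked numerically. The sketch below uses a time-invariant FIR channel for brevity (an assumption; the channel in the text is time-varying, but the indexing argument is the same) and compares the output of linear filtering of the prefixed block with circular filtering of the block itself. Function names are hypothetical.

```python
def linear_output(block, h, M):
    """Filter the cyclic-prefixed block with FIR taps h; return the T block samples."""
    if M < len(h) - 1:
        raise ValueError("cyclic prefix must be at least as long as the channel order")
    T = len(block)
    tx = block[T - M:] + block           # s(-i) = s(T-i), i = 1..M
    # tx[M + n] is s(n); the prefix supplies s(n - l) for n - l < 0.
    return [sum(h[l] * tx[M + n - l] for l in range(len(h))) for n in range(T)]

def circular_output(block, h):
    """Circular convolution of the block with taps h."""
    T = len(block)
    return [sum(h[l] * block[(n - l) % T] for l in range(len(h))) for n in range(T)]

block = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
h = [1.0, 0.5, -0.25]                    # channel order L = 2 (illustrative taps)
lin = linear_output(block, h, M=2)
circ = circular_output(block, h)
max_gap = max(abs(a - b) for a, b in zip(lin, circ))
```

The two outputs coincide whenever M is at least the channel order, which is the condition M ≥ L stated above.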
We summarize our data-dependent channel estimation solution as follows:
1. At the transmitter, we are given the information sequence over a block as {b(n)} for
   n = 0, 1, ..., T - 1, with T chosen as T = KP, $K \geq Q$. Calculate the DFT by (6.3).
2. To eliminate interference with channel estimation at the receiver, we need to set the $b_r$'s
   to zero for $r \in \Theta$. Define the self-interference sequence $\{b_e(n)\}$ as in (6.5).
3. Define the data-dependent superimposed training $\tilde{c}(n)$ as in (6.6). Use a cyclic prefix
   of length $M \geq L$ and transmit.
The channel estimation given in (3.18) stays the same for data-dependent superimposed
training, because we still use the periodic {c(n)} at the receiver, and we do not know $b_e(n)$ or
b(n) at the receiver. It is easily established that now there is no contribution of {b(n)} to
$\hat{d}_{mq}$ for $0 \leq m \leq P-1$ and $1 \leq q \leq Q$.
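The three transmitter steps above can be sketched as follows. The block uses toy values (P = 7, K = 12, Q = 2), a placeholder ±1 training period that is not the sequence of (3.82), and a direct O(T²) DFT; all names are hypothetical. Indices in the set of (6.4) are taken modulo T, which is an assumption about how negative indices are handled.

```python
import cmath
import math
import random

def dft_coeffs(x):
    """b_r = (1/T) * sum_n x[n] * exp(-j*2*pi*r*n/T), as in (6.3)."""
    T = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * r * n / T) for n in range(T)) / T
            for r in range(T)]

def interference_set(Q, K, P):
    """The set of (6.4), with indices reduced modulo T = K*P."""
    T = K * P
    return sorted({(m * K + d) % T for m in range(P) for d in range(-(Q - 1), Q)})

def data_dependent_training(b, c, Q, K, P):
    """Return (c_tilde, b_e, theta) following (6.5) and (6.6)."""
    T = K * P
    br = dft_coeffs(b)
    theta = interference_set(Q, K, P)
    be = [sum(br[r] * cmath.exp(2j * cmath.pi * r * n / T) for r in theta)
          for n in range(T)]
    c_tilde = [c[n] - be[n] for n in range(T)]
    return c_tilde, be, theta

random.seed(0)
P, K, Q = 7, 12, 2
T = K * P
pattern = [1, 1, 1, -1, 1, -1, -1]                # placeholder period, not (3.82)
c = [math.sqrt(0.3) * pattern[n % P] for n in range(T)]
b = [random.choice((-1.0, 1.0)) for _ in range(T)]
c_tilde, be, theta = data_dependent_training(b, c, Q, K, P)

# The data-bearing part b - b_e should carry nothing on the bins in theta.
residual = dft_coeffs([b[n] - be[n] for n in range(T)])
leak = max(abs(residual[r]) for r in theta)
```

With these toy values the set contains P(2Q - 1) = 21 of the T = 84 bins, and `leak` is zero up to rounding error.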
6.2.2 Data Detection
Now the "information sequence" is $\{b(n) - b_e(n)\}$, whereas we are interested in {b(n)}.
We will follow an iterative solution, similar to the time-invariant results of [20]. The first
step in our solution is to use the estimated channel to detect {b(n)} via the Viterbi algorithm
(ignoring $b_e(n)$ but accounting for the known {c(n)}). We then use the detected {b(n)} to estimate
$\{b_e(n)\}$, and iterate the detection procedure (but not the channel estimation) with the known
{c(n)} and the estimate $\{\hat{b}_e(n)\}$ from the previous iteration. Note that although iterations
are also employed, as in the DML algorithm, we do not have to re-estimate the channel in the
subsequent iterations.
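The detection loop just described can be sketched for a deliberately simplified setting: a known, flat, unit-gain channel, BPSK symbols, and symbol-by-symbol decisions in place of the Viterbi algorithm. These simplifications, and the names `project_onto_bins` and `iterative_detect`, are assumptions made for illustration only.

```python
import cmath
import random

def project_onto_bins(x, bins):
    """Part of x carried by the given DFT bins (block length T = len(x))."""
    T = len(x)
    coeffs = {r: sum(x[n] * cmath.exp(-2j * cmath.pi * r * n / T)
                     for n in range(T)) / T for r in bins}
    return [sum(coeffs[r] * cmath.exp(2j * cmath.pi * r * n / T) for r in bins)
            for n in range(T)]

def iterative_detect(y, c, bins, iterations=3):
    """Toy detector for y[n] = c[n] + b[n] - b_e[n] + noise, b[n] in {-1, +1}."""
    T = len(y)
    be_hat = [0.0] * T                   # the first pass ignores b_e(n)
    b_hat = []
    for _ in range(iterations):
        # Remove the known training, add back the current estimate of b_e(n).
        b_hat = [1.0 if (y[n] - c[n] + be_hat[n]).real >= 0 else -1.0
                 for n in range(T)]
        # Re-estimate b_e(n) from the detected symbols; the channel is not re-estimated.
        be_hat = project_onto_bins(b_hat, bins)
    return b_hat

random.seed(1)
T = 28
bins = [0, 4, 8, 12, 16, 20, 24]         # toy nulled bins (Q = 1, K = 4, P = 7)
b = [random.choice((-1.0, 1.0)) for _ in range(T)]
c = [0.5] * T                            # placeholder training, known at the receiver
be = project_onto_bins(b, bins)
y = [c[n] + b[n] - be[n] for n in range(T)]      # noiseless for simplicity
b_hat = iterative_detect(y, c, bins)
```

Only the detection step is repeated, which mirrors the remark that the channel estimate is left untouched across iterations.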
6.2.3 Performance Analysis
If the true channel follows the CE-BEM representation (3.3), the MSE in channel
estimation is then given by (4.2). We now relax the assumption (H6.2.3), i.e., the measurement
noise {v(n)} may be nonzero-mean, as in (H3.2.3). We further assume that
(H6.2.5) The time-varying channel {h(n;l)} is zero-mean, complex Gaussian with
variance $\sigma_h^2$, and mutually independent for distinct l's: $E\{h(n;l)h^H(n;l)\} = \sigma_h^2 I_N$ and
$E\{h(n_1;l_1)h^H(n_2;l_2)\} = 0$ for $l_1 \neq l_2$ and all $n_1$, $n_2$; i.e., different channel taps are
independent of each other and are identically distributed zero-mean complex Gaussian.
In the data-dependent superimposed training, the interference from the information
sequence {b(n)} has been canceled out, so $s_{mq} = 0$. Then by (4.11), $\hat{d}_{mq} = d_{mq} + w_{mq}$, so
that
$$E\big\{ [\hat{d}_{m_1 q_1} - d_{m_1 q_1}] [\hat{d}_{m_2 q_2} - d_{m_2 q_2}]^H \big\} = E\big\{ w_{m_1 q_1} w_{m_2 q_2}^H \big\} = \frac{1}{T} \sigma_v^2 I_N\, \delta(m_1 - m_2)\, \delta(q_1 - q_2).$$
By (3.11) and (3.16), it follows that
$$\mathrm{cov}\{\hat{D}, \hat{D}\} = \frac{1}{T} \sigma_v^2 I_{NQ(P-1)}. \qquad (6.7)$$
By (3.18), we also have
$$\mathrm{cov}\{\hat{H}, \hat{H}\} := E\big\{ (\hat{H} - H)(\hat{H} - H)^H \big\} = (C^H C)^{-1} C^H\, \mathrm{cov}\{\hat{D}, \hat{D}\}\, C (C^H C)^{-1}. \qquad (6.8)$$
Substituting (6.7) into (6.8), we have
$$\mathrm{cov}\{\hat{H}, \hat{H}\} = \frac{1}{T} \sigma_v^2 (C^H C)^{-1}.$$
MSE1 = E
?
?
?
Lsummationdisplay
l=0
Qsummationdisplay
q=1
vextenddoublevextenddouble
vextenddoublehq (l)??hq (l)
vextenddoublevextenddouble
vextenddouble
2
?
?
?
= NQ?
2v
T tr
braceleftbiggparenleftBig
VH diag
braceleftBig
|c1|2,|c2|2,...,|cP?1|2
bracerightBig
V
parenrightBig?1bracerightbigg
.
We note that the actual channel over a block does not exactly follow the CE-BEM
representation. Counting the modeling error, the total channel MSE can be expressed, as in
(4.34), by
$$\mathrm{MSE}_c = \mathrm{MSE}_1 + \mathrm{MSE}_2,$$
where $\mathrm{MSE}_1$ comes from the estimation and $\mathrm{MSE}_2$ is the mean square modeling error.
Under the assumption (H4.2.5) and Jakes' model, the modeling error of the CE-BEM is given
by (4.35).
6.3 Data-Dependent Superimposed Training Using DPS-BEM
Exploiting the fact that DPS sequences are also approximately band-limited, in this
section we extend the data-dependent superimposed training to the DPS-BEM, so that the spectral
leakage arising from the CE-BEM is efficiently reduced.
We follow the assumptions:
(H6.3.1) The time-varying channel {h(n;l)} satisfies (3.25) with the DPS sequences $\{u_q(n)\}$
known at the receiver. Also $N \geq 1$.
(H6.3.2) The information sequence {b(n)} is zero-mean and white, with $E\{|b(n)|^2\} = \sigma_b^2$.
(H6.3.3) The measurement noise {v(n)} is zero-mean, white, and uncorrelated with {b(n)}, with
$E\{v(n+\tau)v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$.
(H6.3.4) The superimposed training sequence c(n) = c(n + P) for all n is a non-random periodic
sequence with period P.
Consider the interference $s_{mq}$ in (6.2), which can be expressed by using (6.3) as
$$s_{mq} = \sum_{n=0}^{T-1} \Big[ \sum_{l=0}^{L} h(n;l)\, b(n-l) \Big] u_q(n)\, e^{-j\alpha_m n}$$
$$= \sum_{n=0}^{T-1} \Big\{ \sum_{q_1=1}^{Q} \sum_{l=0}^{L} h_{q_1}(l)\, u_{q_1}(n) \sum_{r=0}^{T-1} b_r e^{j\omega_r (n-l)} \Big\} u_q(n)\, e^{-j\alpha_m n}$$
$$= \sum_{q_1=1}^{Q} \sum_{l=0}^{L} \sum_{r=0}^{T-1} \big[ h_{q_1}(l)\, e^{-j\omega_r l}\, b_r \big] \Big[ \sum_{n=0}^{T-1} u_{q_1}(n)\, u_q(n)\, e^{j 2\pi (r - mK) n / T} \Big].$$
We exploit the approximate band-limitedness of the time-limited DPS sequences, assuming
that
$$\sum_{n=0}^{T-1} u_{q_1}(n)\, u_q(n)\, e^{j 2\pi (r - mK) n / T} \approx 0$$
for $|r - mK| > Q + k$, where k is an integer and $(Q + k)/T > 2 f_d T_s$ ($k \geq -1$). Therefore,
the information-induced interference comes from the frequency components $b_r$ for those
r belonging to a set
$$\Theta := \{ r : -(Q+k) + mK \leq r \leq (Q+k) + mK, \; m = 0, \ldots, P-1 \}. \qquad (6.9)$$
The frequency components $\{b_r : r \in \Theta\}$ of the information sequence are hence the
self-interference. If we set $b_r = 0$ for those $r \in \Theta$, then $s_{mq} \approx 0$. We do so by modifying {c(n)}
based on {b(n)} at the transmitter. Define a self-interference sequence
$$b_e(n) := \sum_{r=0,\, r \in \Theta}^{T-1} b_r e^{j\omega_r n} \qquad (6.10)$$
and a data-dependent superimposed training sequence $\{\tilde{c}(n)\}_{n=0}^{T-1}$ such that
$$\tilde{c}(n) := c(n) - b_e(n).$$
All the other steps for the DPS-BEM-based channel estimator using data-dependent super-
imposed training just follow the steps described in Section 6.2.1. For symbol detection, we
can also follow the iterative approach addressed in Section 6.2.2.
6.3.1 Partially-Data-Dependent (PDD) Superimposed Training
In the data-dependent superimposed training method, by setting $b_r = 0$ for $r \in \Theta$
before transmission, we discard the "frequency components" of the information sequence
corresponding to $\Theta$, so that the information-induced self-interference is eliminated. By (6.9),
however, the information contained at those P(2Q + 2k + 1) (out of T in total) frequencies
is also discarded. Though it may be partially recovered by exploiting other properties (e.g., the
finite alphabet of the information sequence [22]), the loss can cause severe detection errors
when many frequencies are discarded.
For the first-order statistics-based estimator discussed in Section 3.3, the sequence $\{b_e(n)\}$ acts as self-interference in channel estimation, yet it also bears "information", so it should not be totally discarded. Given $\{b_e(n)\}$ as in (6.10), we now transmit
\[ s(n) = c(n) + b(n) - (1-\lambda)\,b_e(n) \tag{6.11} \]
where $0 \le \lambda \le 1$ is the self-interference factor. When $\lambda = 1$, the information sequence $\{b(n)\}$ remains intact, corresponding to the first-order statistics-based estimator in Section 3.3. When $\lambda = 0$, the interference-induced frequency components $b_r$ ($r \in \Theta$) are totally annihilated, corresponding to the data-dependent solution described in Section 6.2 and Section 6.3. If $0 < \lambda \ll 1$, which is the partially-data-dependent (PDD) case, at each $r \in \Theta$ the frequency component $b_r$ is reduced to $\lambda b_r$. Then the self-interference will be "sufficiently" suppressed when conducting channel estimation, while the frequency components at $r \in \Theta$ remain "partially" intact. Note that in this PDD method, the self-interference is not completely removed, so the channel estimation is not as accurate as that of the data-dependent solution. But since no information-bearing frequencies are nulled out, the information contained there can be recovered in data reception.
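The construction of (6.9)-(6.11) can be illustrated with a minimal Python sketch. It assumes the normalized DFT convention $b_r = \frac{1}{T}\sum_n b(n)e^{-j2\pi rn/T}$, takes the indices of (6.9) modulo $T$, and uses small toy values for $T$, $P$, $Q$, $k$ and $\lambda$; the function names are hypothetical and the block is not the simulation code used in this chapter.

```python
import cmath

def dft_coeffs(x):
    """b_r = (1/T) * sum_n x(n) * exp(-j*2*pi*r*n/T), r = 0..T-1."""
    T = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * r * n / T) for n in range(T)) / T
            for r in range(T)]

def interference_set(T, P, Q, k):
    """Indices of (6.9); wrapping modulo T is an assumption of this sketch."""
    K = T // P
    return sorted({(m * K + d) % T
                   for m in range(P)
                   for d in range(-(Q + k), Q + k + 1)})

def pdd_transmit(b, c, P, Q, k, lam):
    """Return s(n) = c(n) + b(n) - (1 - lam) * b_e(n), cf. (6.11)."""
    T = len(b)
    theta = interference_set(T, P, Q, k)
    b_r = dft_coeffs(b)
    # Self-interference sequence b_e(n) of (6.10): only the bins in theta.
    b_e = [sum(b_r[r] * cmath.exp(2j * cmath.pi * r * n / T) for r in theta)
           for n in range(T)]
    s = [c[n] + b[n] - (1 - lam) * b_e[n] for n in range(T)]
    return s, theta

# Toy numbers (hypothetical, chosen only to keep the example small).
T, P, Q, k, lam = 24, 4, 1, 0, 0.2
b = [1.0 if (7 * n + 3) % 5 < 2 else -1.0 for n in range(T)]  # fixed +/-1 pattern
c = [0.0] * T  # training switched off so that only the data part is visible
s, theta = pdd_transmit(b, c, P, Q, k, lam)
b_r, s_r = dft_coeffs(b), dft_coeffs(s)
```

With the training set to zero, the DFT of `s` equals `lam * b_r[r]` on the bins in `theta` and `b_r[r]` elsewhere, which is the scaling described in the text.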
Our PDD superimposed training-based channel estimation follows the data-dependent solution described in Section 6.2, except that we define the PDD superimposed training sequence as
\[ \tilde c(n) := c(n) - (1-\lambda)\,b_e(n) \]
instead of (6.6), with a prescribed interference factor $\lambda$.
For data detection at the receiver, the "information sequence" is $\{b(n) - (1-\lambda)b_e(n)\}$ ($0 \le \lambda \le 1$), while we are interested in $\{b(n)\}$. We can first use the estimated channel to detect $\{b(n)\}$ via the Viterbi algorithm (ignoring $(1-\lambda)b_e(n)$ but accounting for the known $\{c(n)\}$). Since the training and information sequences pass through an identical channel, the DML approach described in Section 5.2 can be exploited to recover the suppressed frequency components $b_r$ ($r \in \Theta$), as well as to enhance the channel estimation, in an iterative way (see Section 6.3.4).
6.3.2 Performance Analysis
We wish to evaluate the MSE in channel estimation, defined in (4.2), using PDD superimposed training when the true channel follows DPS-BEM. We make the following assumption about the channel $h(n;l)$:
(H6.3.5) The time-varying channel $\{h(n;l)\}$ is zero-mean, complex Gaussian with variance $\sigma_h^2$, and mutually independent for distinct $l$'s: $E\{h(n;l)h^H(n;l)\} = \sigma_h^2 I_N$ and $E\{h(n_1;l_1)h^H(n_2;l_2)\} = 0$ for $l_1 \ne l_2$, for all $n_1$, $n_2$, i.e., different channel taps are independent of each other and are identically distributed zero-mean complex Gaussian.
Note that in PDD superimposed training, (4.11) still holds if $s_{mq}$ is revised as
\[ s_{mq} := \sum_{n=0}^{T-1} \left\{ \sum_{l=0}^{L} h(n;l)\,\bigl[\, b(n-l) - (1-\lambda)\,b_e(n-l) \,\bigr] \right\} u_q(n)\, e^{-j\alpha_m n}. \]
By (3.34) and (6.10), we have
\[ E\{ s_{m'q'}\, s_{mq}^H \} = \lambda^2 (L+1)\,\sigma_h^2 \sigma_b^2\, I_N\, \delta(m'-m)\,\delta(q'-q). \tag{6.12} \]
Then from (3.36)
\[ \operatorname{cov}\{\hat H, \hat H\} := E\{[\hat H - H][\hat H - H]^H\} = \bigl(\tilde C^H \tilde C\bigr)^{-1} \tilde C^H \operatorname{cov}\{\hat D, \hat D\}\, \tilde C\, \bigl(\tilde C^H \tilde C\bigr)^{-1}. \tag{6.13} \]
Since
\[ E\bigl\{ [\hat d_{mq} - d_{mq}][\hat d_{m'q'} - d_{m'q'}]^H \bigr\} = E\{ s_{m'q'} s_{mq}^H \} + E\{ w_{m'q'} w_{mq}^H \}, \]
by (4.10) and (6.12) we have
\[ \operatorname{cov}\{\hat D, \hat D\} = \bigl[ \lambda^2 (L+1)\sigma_h^2\sigma_b^2 + \sigma_v^2 \bigr] I_{NPQ}. \tag{6.14} \]
Substituting (6.14) into (6.13) gives
\[ \operatorname{cov}\{\hat H, \hat H\} = \bigl[ \lambda^2 (L+1)\sigma_h^2\sigma_b^2 + \sigma_v^2 \bigr] \bigl(\tilde C^H \tilde C\bigr)^{-1}. \]
Using orthonormality of DPS sequences, the MSE in channel estimation is given by
\[ \mathrm{MSE}_1 = \frac{1}{T} E\left\{ \sum_{l=0}^{L} \sum_{q=1}^{Q} \bigl\| h_q(l) - \hat h_q(l) \bigr\|^2 \right\} = \frac{1}{T} \operatorname{tr}\bigl\{ \operatorname{cov}\{\hat H, \hat H\} \bigr\} = \frac{\lambda^2 (L+1)\sigma_h^2\sigma_b^2 + \sigma_v^2}{T} \operatorname{tr}\bigl\{ \bigl(\tilde C^H \tilde C\bigr)^{-1} \bigr\}. \tag{6.15} \]
The MSE of the first-order statistics-based channel estimator in Section 3.3 is given by (6.15) for $\lambda = 1$, and that of the data-dependent solution in Section 6.3 corresponds to $\lambda = 0$. For a PDD scheme with $0 < \lambda \ll 1$, the interference is significantly suppressed.
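As a rough numerical illustration of (6.15), the following Python sketch evaluates the MSE for three values of the self-interference factor. All parameter values, including the value passed as `trace_inv_gram` (standing in for $\operatorname{tr}\{(\tilde C^H\tilde C)^{-1}\}$), are hypothetical placeholders and not taken from the simulations in Section 6.4.

```python
def channel_mse(lam, L, sigma_h2, sigma_b2, sigma_v2, T, trace_inv_gram):
    """MSE_1 of (6.15); trace_inv_gram stands for tr{(C~^H C~)^{-1}}."""
    interference = lam ** 2 * (L + 1) * sigma_h2 * sigma_b2  # self-interference part
    return (interference + sigma_v2) / T * trace_inv_gram

# Hypothetical parameter values, for illustration only.
params = dict(L=4, sigma_h2=0.2, sigma_b2=1.0, sigma_v2=0.03, T=420,
              trace_inv_gram=1.0)
mse = {lam: channel_mse(lam, **params) for lam in (0.0, 0.2, 1.0)}
for lam, value in mse.items():
    print(f"lambda = {lam:.1f}: MSE_1 = {value:.3e}")
```

For $\lambda = 0$ only the noise term $\sigma_v^2$ remains, while the self-interference contribution grows as $\lambda^2$.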
6.3.3 Power Allocation and Self-Interference Suppression
We consider the issues of superimposed training power allocation and self-interference
suppression under the channel assumption (H6.3.5). Using the channel estimation variance
developed in (6.15), we cast power allocation and self-interference suppression as jointly
optimizing an SNR for equalizer design, as we discussed in Section 4.4. Since the channel
estimate is used for equalizer design, we set up a model for the received signal, in which an
estimation error-related term and the additive noise act as effective noise, while the effective
signal is given by the output of the estimated channel driven by the information sequence.
This SNR is maximized under a transmission power constraint.
Removing the estimated time-varying mean from the received data, define
\[ \tilde y(n) := y(n) - \sum_{l=0}^{L} \hat h(n;l)\, c(n-l). \]
We then have
\[ \tilde y(n) = \underbrace{\sum_{l=0}^{L} \hat h(n;l)\, b(n-l)}_{=:\,x_s(n)} + \underbrace{\sum_{l=0}^{L} \bigl[ h(n;l) - \hat h(n;l) \bigr] \bigl[ b(n-l) + c(n-l) \bigr] - \sum_{l=0}^{L} h(n;l)\,(1-\lambda)\, b_e(n-l) + v(n)}_{=:\,w(n)}. \tag{6.16} \]
The power of the "signal" part $x_s(n)$ at time $n$ is
\[ E\bigl\{ \|x_s(n)\|^2 \bigr\} = \sigma_b^2 \sum_{l=0}^{L} E\bigl\{ \|\hat h(n;l)\|^2 \bigr\} + O\bigl(1/T^2\bigr) \tag{6.17} \]
where the $O(1/T^2)$ term accounts for dependence between $\hat h_q(l)$ and $\{b(n)\}$ (see the Appendix of [32] for details). Furthermore,
\[ \sum_{l=0}^{L} E\bigl\{ \|\hat h(n;l)\|^2 \bigr\} = \sum_{l=0}^{L} \Bigl[ E_H\bigl\{ E\bigl\{ \|\hat h(n;l) - h(n;l)\|^2 \,\big|\, H \bigr\} \bigr\} + E\bigl\{ \|h(n;l)\|^2 \bigr\} \Bigr] = \mathrm{MSE} + N(L+1)\sigma_h^2. \tag{6.18} \]
Taking the time-average of $E\{\|x_s(n)\|^2\}$, by (6.17) and (6.18) we obtain (neglecting the $O(1/T^2)$ term)
\[ \bar\sigma_{x_s}^2 := \frac{1}{T} \sum_{n=0}^{T-1} E\bigl\{ \|x_s(n)\|^2 \bigr\} = \sigma_b^2 \bigl[ \mathrm{MSE}_1 + N(L+1)\sigma_h^2 \bigr]. \tag{6.19} \]
The time-averaged "noise" power in (6.16) is given by
\[ \bar\sigma_w^2 := \frac{1}{T} \sum_{n=0}^{T-1} E\bigl\{ \|w(n)\|^2 \bigr\} = \sigma_b^2\, \mathrm{MSE}_1 + N\sigma_v^2 + B + R - S_1 - S_2 + O\bigl(1/T^2\bigr), \]
where
\[ B = \frac{(1-\lambda)^2}{T} \sum_{n=0}^{T-1} \sum_{l_1=0}^{L} \sum_{l_2=0}^{L} E\bigl\{ h^H(n;l_1)\, h(n;l_2) \bigr\}\, E\bigl\{ b_e^*(n-l_1)\, b_e(n-l_2) \bigr\}; \]
\[ R = \frac{1}{T} \sum_{n=0}^{T-1} \sum_{l_1=0}^{L} \sum_{l_2=0}^{L} E\Bigl\{ \bigl[ \hat h(n;l_1) - h(n;l_1) \bigr]^H \bigl[ \hat h(n;l_2) - h(n;l_2) \bigr] \Bigr\}\, c^*(n-l_1)\, c(n-l_2); \]
\[ S_1 = \frac{1-\lambda}{T} \sum_{n=0}^{T-1} \sum_{l_1=0}^{L} \sum_{l_2=0}^{L} E\Bigl\{ \bigl[ h(n;l_1) - \hat h(n;l_1) \bigr]^H h(n;l_2) \Bigr\}\, E\bigl\{ b^*(n-l_1)\, b_e(n-l_2) \bigr\}; \]
\[ S_2 = S_1^H. \]
Since
\[ E\Bigl\{ \bigl[ h(n;l_1) - \hat h(n;l_1) \bigr]^H h(n;l_2) \Bigr\} = E_H\Bigl\{ E\Bigl\{ \bigl[ h(n;l_1) - \hat h(n;l_1) \bigr]^H h(n;l_2) \,\Big|\, H \Bigr\} \Bigr\} = E_H\Bigl\{ \bigl[ h(n;l_1) - h(n;l_1) \bigr]^H h(n;l_2) \Bigr\} = 0, \]
we have $S_1 = S_2^H = 0$.
Note that $|\Theta| = P(2Q+2k+1)$. Consider
\begin{align*}
E\{ b_e(n-l)\, b_e^*(n-l) \} &= E\Bigl\{ \sum_{r_1 \in \Theta} \sum_{r_2 \in \Theta} b_{r_1} b_{r_2}^*\, e^{j\frac{2\pi}{T}(r_1-r_2)(n-l)} \Bigr\} \\
&= \frac{1}{T^2} E\Bigl\{ \sum_{r_1 \in \Theta} \sum_{r_2 \in \Theta} \sum_{n_1=0}^{T-1} \sum_{n_2=0}^{T-1} b(n_1)\, b^*(n_2)\, e^{-j\frac{2\pi}{T} r_1 n_1} e^{j\frac{2\pi}{T} r_2 n_2} e^{j\frac{2\pi}{T}(r_1-r_2)(n-l)} \Bigr\} \\
&= \frac{\sigma_b^2}{T^2} \sum_{r_1 \in \Theta} \sum_{r_2 \in \Theta} \sum_{n_1=0}^{T-1} e^{-j\frac{2\pi}{T}(r_1-r_2) n_1} e^{j\frac{2\pi}{T}(r_1-r_2)(n-l)} = \frac{P(2Q+2k+1)\,\sigma_b^2}{T}.
\end{align*}
Therefore,
\begin{align*}
B &= \frac{(1-\lambda)^2}{T} \sum_{n=0}^{T-1} \sum_{l_1=0}^{L} \sum_{l_2=0}^{L} N\sigma_h^2\, \delta(l_1-l_2)\, E\{ b_e^*(n-l_1)\, b_e(n-l_2) \} \\
&= \frac{N(L+1)P}{T} (2Q+2k+1)(1-\lambda)^2 \sigma_h^2 \sigma_b^2.
\end{align*}
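The closed form $E\{|b_e(n-l)|^2\} = P(2Q+2k+1)\sigma_b^2/T$ can be checked by evaluating the triple sum in the last step directly. The Python sketch below does so for toy values of $T$, $P$, $Q$ and $k$ (hypothetical, chosen so that the index groups in $\Theta$ do not overlap), takes the indices of (6.9) modulo $T$ as an assumption, and also gives a helper for the resulting expression for $B$.

```python
import cmath

def interference_set(T, P, Q, k):
    """Indices of (6.9), taken modulo T (an assumption of this sketch)."""
    K = T // P
    return sorted({(m * K + d) % T
                   for m in range(P)
                   for d in range(-(Q + k), Q + k + 1)})

def be_power(theta, T, sigma_b2, n, l):
    """Triple sum for i.i.d. symbols with E{b(n1) b*(n2)} = sigma_b2 * delta(n1 - n2)."""
    w = 2j * cmath.pi / T
    total = 0j
    for r1 in theta:
        for r2 in theta:
            for n1 in range(T):
                total += (cmath.exp(-w * (r1 - r2) * n1)
                          * cmath.exp(w * (r1 - r2) * (n - l)))
    return sigma_b2 / T ** 2 * total

def b_term(N, L, P, Q, k, T, lam, sigma_h2, sigma_b2):
    """Closed form of B obtained above."""
    return (N * (L + 1) * P / T * (2 * Q + 2 * k + 1)
            * (1 - lam) ** 2 * sigma_h2 * sigma_b2)

# Toy sizes (hypothetical).
T, P, Q, k = 24, 4, 1, 0
theta = interference_set(T, P, Q, k)
power = be_power(theta, T, sigma_b2=1.0, n=5, l=2)
closed_form = P * (2 * Q + 2 * k + 1) * 1.0 / T
```

The inner sum over $n_1$ vanishes unless $r_1 = r_2$, so only $|\Theta|$ of the $|\Theta|^2$ index pairs contribute, each with weight $T$.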
Now we consider $R$,
\begin{align*}
R &= \frac{1}{T} \sum_{l_1=0}^{L} \sum_{l_2=0}^{L} \sum_{q_1=1}^{Q} \sum_{q_2=1}^{Q} E\Bigl\{ \bigl[ \hat h_{q_1}^H(l_1) - h_{q_1}^H(l_1) \bigr] \bigl[ \hat h_{q_2}(l_2) - h_{q_2}(l_2) \bigr] \Bigr\} \\
&\quad \times \sum_{m_1=0}^{P-1} \sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\, e^{j(\alpha_{m_1} l_1 - \alpha_{m_2} l_2)} \Bigl[ \sum_{n=0}^{T-1} u_{q_1}(n)\, u_{q_2}(n)\, e^{j(\alpha_{m_2} - \alpha_{m_1}) n} \Bigr] \\
&= \frac{1}{T} \sum_{l_1=0}^{L} \sum_{l_2=0}^{L} \sum_{q_1=1}^{Q} \sum_{q_2=1}^{Q} E\Bigl\{ \bigl[ \hat h_{q_1}^H(l_1) - h_{q_1}^H(l_1) \bigr] \bigl[ \hat h_{q_2}(l_2) - h_{q_2}(l_2) \bigr] \Bigr\} \\
&\quad \times \sum_{m_1=0}^{P-1} \sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\, e^{j(\alpha_{m_1} l_1 - \alpha_{m_2} l_2)}\, \delta(q_1-q_2)\, \delta(m_1-m_2) \\
&= \frac{1}{T} E\Bigl\{ \Bigl[ \sum_{l=0}^{L} \bigl( \hat H_l - H_l \bigr) e^{-j\alpha_m l} \Bigr]^H \Bigl[ \sum_{l=0}^{L} \bigl( \hat H_l - H_l \bigr) e^{-j\alpha_m l} \Bigr] \Bigr\} \sum_{m=0}^{P-1} |c_m|^2.
\end{align*}
Define $E(m) := \bigl[\, 1 \;\; e^{-j\alpha_m} \;\; \cdots \;\; e^{-j\alpha_m L} \,\bigr]$; then $R$ becomes
\begin{align*}
R &= \frac{1}{T} E\Bigl\{ \bigl[ E(m) \otimes I_{NQ}\, (\hat H - H) \bigr]^H \bigl[ E(m) \otimes I_{NQ}\, (\hat H - H) \bigr] \Bigr\} \sum_{m=0}^{P-1} |c_m|^2 \\
&= \frac{1}{T} E\bigl\{ (\hat H - H)^H C^H C\, (\hat H - H) \bigr\} \\
&= \frac{1}{T} \operatorname{tr}\bigl\{ \operatorname{cov}\{\hat H, \hat H\}\, C^H C \bigr\} \\
&= \frac{NQ(L+1)}{T} \bigl[ \lambda^2 (L+1)\sigma_h^2\sigma_b^2 + \sigma_v^2 \bigr].
\end{align*}
Using these expressions we obtain (also neglecting the $O(1/T^2)$ term)
\[ \bar\sigma_w^2 = \sigma_b^2\, \mathrm{MSE}_1 + N\sigma_v^2 + \frac{N(L+1)P}{T}(2Q+2k+1)(1-\lambda)^2 \sigma_h^2\sigma_b^2 + \frac{NQ(L+1)}{T} \bigl[ \lambda^2 (L+1)\sigma_h^2\sigma_b^2 + \sigma_v^2 \bigr]. \tag{6.20} \]
We define the training power overhead $\beta$ as
\[ \beta := \frac{\frac{1}{P} \sum_{n=0}^{P-1} |c(n)|^2}{\frac{1}{P} \sum_{n=0}^{P-1} E\bigl\{ |b(n) + c(n)|^2 \bigr\}} = \frac{\sigma_c^2}{\sigma_b^2 + \sigma_c^2}. \tag{6.21} \]
For a fixed SNR or transmitted power budget, higher $\beta$ implies smaller effective SNR at the receiver due to decreased power in the information sequence but higher channel estimation accuracy; similarly, higher $\lambda$ in (6.11) implies less "information" impairment but higher self-interference remaining. A trade-off must be considered in choosing appropriate $\beta$ and $\lambda$ for the PDD superimposed training.
The equalization SNR of (6.16), as a function of $\beta$ and $\lambda$, is (implicitly) obtained as
\[ \mathrm{SNR}_d(\beta,\lambda) := \frac{\bar\sigma_{x_s}^2(\beta,\lambda)}{\bar\sigma_w^2(\beta,\lambda)}. \tag{6.22} \]
Our objective is to maximize $\mathrm{SNR}_d(\beta,\lambda)$ with respect to $\beta$ and $\lambda$ under a fixed-power constraint: $P_T := \sigma_b^2 + \sigma_c^2$ is fixed. Then $\sigma_c^2 = P_T\beta$ and $\sigma_b^2 = P_T(1-\beta)$. Incorporating these constraint-carrying variables in (6.22) via (6.19) and (6.20), we obtain an unconstrained cost
\[ \mathrm{SNR}_d(\beta,\lambda) = \frac{f_1\beta^2 + f_2\beta + f_3}{g_1\beta^2 + g_2\beta + g_3} \]
where, with $\bar C := \sigma_c^{-1}\tilde C$,
\begin{align*}
f_1 &= (L+1)\sigma_h^2 P_T \bigl[ \lambda^2 \operatorname{tr}\{(\bar C^H \bar C)^{-1}\} - NT \bigr], \\
f_2 &= N(L+1)\sigma_h^2 P_T T - \bigl[ 2\lambda^2 (L+1)\sigma_h^2 P_T + \sigma_v^2 \bigr] \operatorname{tr}\{(\bar C^H \bar C)^{-1}\}, \\
f_3 &= \bigl[ \lambda^2 (L+1)\sigma_h^2 P_T + \sigma_v^2 \bigr] \operatorname{tr}\{(\bar C^H \bar C)^{-1}\}, \\
g_1 &= (L+1)\sigma_h^2 P_T \bigl[ \lambda^2 \operatorname{tr}\{(\bar C^H \bar C)^{-1}\} - NP(2Q+2k+1)(1-\lambda)^2 - N(L+1)Q\lambda^2 \bigr], \\
g_2 &= -\bigl[ 2\lambda^2 (L+1)\sigma_h^2 P_T + \sigma_v^2 \bigr] \operatorname{tr}\{(\bar C^H \bar C)^{-1}\} + N(L+1)\sigma_h^2 P_T \bigl[ P(2Q+2k+1)(1-\lambda)^2 + (L+1)Q\lambda^2 \bigr] + N\sigma_v^2 \bigl[ Q(L+1) + T \bigr], \\
g_3 &= f_3.
\end{align*}
Similarly, this unconstrained cost can also be expressed as
\[ \mathrm{SNR}_d(\beta,\lambda) = \frac{f'_1\lambda^2 + f'_2\lambda + f'_3}{g'_1\lambda^2 + g'_2\lambda + g'_3} \]
where
\begin{align*}
f'_1 &= (L+1)\sigma_h^2 P_T (1-\beta)^2 \operatorname{tr}\{(\bar C^H \bar C)^{-1}\}, \qquad f'_2 = 0, \\
f'_3 &= NT(L+1)\sigma_h^2 P_T\, \beta(1-\beta) + \sigma_v^2 (1-\beta) \operatorname{tr}\{(\bar C^H \bar C)^{-1}\}, \\
g'_1 &= (L+1)\sigma_h^2 P_T (1-\beta) \times \bigl[ (1-\beta) \operatorname{tr}\{(\bar C^H \bar C)^{-1}\} + NP(2Q+2k+1)\beta + NQ(L+1)\beta \bigr], \\
g'_2 &= -2N(L+1)\sigma_h^2 P(2Q+2k+1) P_T (1-\beta)\beta, \\
g'_3 &= \sigma_v^2 \operatorname{tr}\{(\bar C^H \bar C)^{-1}\}(1-\beta) + N(L+1)\sigma_h^2 P_T P(2Q+2k+1)(1-\beta)\beta + N\sigma_v^2 \beta \bigl[ Q(L+1) + T \bigr].
\end{align*}
We seek the optimum values of $\beta$ and $\lambda$ by setting the partial derivatives of the unconstrained cost (6.22) to zero:
\[ \frac{\partial [\mathrm{SNR}_d(\beta,\lambda)]}{\partial \beta} = 0, \quad \text{and} \quad \frac{\partial [\mathrm{SNR}_d(\beta,\lambda)]}{\partial \lambda} = 0. \]
The first condition is a quadratic equation in $\beta$ with two roots, of which the root lying in $[0,1]$ is
\[ \beta_o(\lambda) = (f_1 g_2 - f_2 g_1)^{-1} \Bigl( f_3 g_1 - f_1 g_3 - \sqrt{ -f_1 f_2 g_2 g_3 - 2 f_1 f_3 g_1 g_3 - f_2 f_3 g_1 g_2 + f_2^2 g_1 g_3 + f_1 f_3 g_2^2 + f_1^2 g_3^2 + f_3^2 g_1^2 } \Bigr). \tag{6.23} \]
Note that this optimum $\beta$ is a function of $\lambda$. The optimum $\lambda$, which is likewise a function of $\beta$, can be acquired in the same way as in (6.23) with $f_i$ and $g_i$ ($i = 1, 2, 3$) replaced with $f'_i$ and $g'_i$:
\[ \lambda_o(\beta) = (f'_1 g'_2 - f'_2 g'_1)^{-1} \Bigl( f'_3 g'_1 - f'_1 g'_3 - \sqrt{ -f'_1 f'_2 g'_2 g'_3 - 2 f'_1 f'_3 g'_1 g'_3 - f'_2 f'_3 g'_1 g'_2 + f'^2_2 g'_1 g'_3 + f'_1 f'_3 g'^2_2 + f'^2_1 g'^2_3 + f'^2_3 g'^2_1 } \Bigr). \tag{6.24} \]
The joint optimization of $\beta$ and $\lambda$ can be achieved by applying (6.23) and (6.24) iteratively: the result of (6.23) is substituted in (6.24) and vice versa; there is no guarantee of reaching the global optimum. An alternative is to do a two-dimensional grid search for joint optimization of (6.22) with respect to $\beta$ and $\lambda$, both restricted to $[0,1]$, to obtain a "coarse" optimization, and then follow up with a "fine" optimization via iterative computation of (6.23) and (6.24).
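The "coarse" grid-search step can be sketched in Python as follows. The sketch evaluates $\mathrm{SNR}_d$ directly from (6.15), (6.19) and (6.20) and, as a consistency check, from the rational form in $\beta$ with the coefficients $f_i$, $g_i$. The dictionary keys, the grid resolution and all numerical values (in particular `t`, which stands for $\operatorname{tr}\{(\bar C^H\bar C)^{-1}\}$) are hypothetical choices made for illustration.

```python
def snr_d(beta, lam, p):
    """SNR_d = sigma_xs^2 / sigma_w^2 from (6.15), (6.19) and (6.20)."""
    sigma_c2 = p["PT"] * beta
    sigma_b2 = p["PT"] * (1.0 - beta)
    nl = p["N"] * (p["L"] + 1)
    core = lam ** 2 * (p["L"] + 1) * p["sigma_h2"] * sigma_b2 + p["sigma_v2"]
    mse1 = core * p["t"] / (sigma_c2 * p["T"])       # (6.15) with C~ = sigma_c * C-bar
    signal = sigma_b2 * (mse1 + nl * p["sigma_h2"])  # (6.19)
    width = 2 * p["Q"] + 2 * p["k"] + 1
    b_term = (nl * p["P"] / p["T"] * width * (1 - lam) ** 2
              * p["sigma_h2"] * sigma_b2)
    r_term = nl * p["Q"] / p["T"] * core
    noise = sigma_b2 * mse1 + p["N"] * p["sigma_v2"] + b_term + r_term  # (6.20)
    return signal / noise

def snr_d_rational(beta, lam, p):
    """Same quantity via the quadratic-over-quadratic form in beta."""
    N, L, T, P, Q, t, sv = (p[key] for key in ("N", "L", "T", "P", "Q", "t", "sigma_v2"))
    A = (L + 1) * p["sigma_h2"] * p["PT"]
    w = P * (2 * Q + 2 * p["k"] + 1) * (1 - lam) ** 2 + (L + 1) * Q * lam ** 2
    f1 = A * (lam ** 2 * t - N * T)
    f2 = N * A * T - (2 * lam ** 2 * A + sv) * t
    f3 = (lam ** 2 * A + sv) * t
    g1 = A * (lam ** 2 * t - N * w)
    g2 = -(2 * lam ** 2 * A + sv) * t + N * A * w + N * sv * (Q * (L + 1) + T)
    g3 = f3
    return (f1 * beta ** 2 + f2 * beta + f3) / (g1 * beta ** 2 + g2 * beta + g3)

def grid_search(p, steps=50):
    """Coarse search over beta in (0, 1) and lambda in [0, 1]."""
    best = (-1.0, None, None)
    for i in range(1, steps):            # skip beta = 0 and beta = 1
        beta = i / steps
        for j in range(steps + 1):
            lam = j / steps
            value = snr_d(beta, lam, p)
            if value > best[0]:
                best = (value, beta, lam)
    return best

# Hypothetical parameter values, for illustration only.
params = dict(N=1, L=4, T=420, P=5, Q=4, k=-1, sigma_h2=0.2,
              sigma_v2=0.05, PT=1.0, t=1.0)
best_snr, best_beta, best_lam = grid_search(params)
```

The grid point returned here would serve only as a starting value; the "fine" step would then alternate (6.23) and (6.24) from that point.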
6.3.4 Recovery of Suppressed Frequencies via DML Approach
We now consider joint channel and information sequence estimation via an iterative DML approach, assuming that the noise $v(n)$ is complex Gaussian. As noted in Section 6.3.1, for data detection at the receiver, the "information sequence" is now $\{b(n) - (1-\lambda)b_e(n)\}$ ($0 \le \lambda \le 1$), while we are interested in $\{b(n)\}$. We can first use the estimated channel to detect $\{b(n)\}$ via the Viterbi algorithm (ignoring $(1-\lambda)b_e(n)$ but accounting for the known $\{c(n)\}$). Since the training and information sequences pass through an identical channel, this fact can be exploited to recover the suppressed frequency components $b_r$ ($r \in \Theta$), as well as to enhance the channel estimation, in an iterative way. In subsequent iterations, we regenerate $b_e(n)$ as $\hat b_e(n)$ using the detected $b(n)$ from the previous iteration, and so on.
To consider a general case, we assume the measurement noise $v(n)$ is complex Gaussian, but may be nonzero-mean as in Section 5.2 ($E\{v(n)\} = m$, with $m$ unknown). We define
\[ Y := \bigl[\, y^T(T-1) \;\; y^T(T-2) \;\; \cdots \;\; y^T(L) \,\bigr]^T, \]
\[ s := \bigl[\, s(T-1) \;\; s(T-2) \;\; \cdots \;\; s(0) \,\bigr]^T, \tag{6.25} \]
and similarly define $b$, $b_e$, and $c$ from $b(n)$, $b_e(n)$, and $c(n)$, respectively, following (6.25). Therefore, by (6.11),
\[ s := b + c - (1-\lambda)\, b_e. \]
Let $\tilde v(n) := v(n) - m$. We then have the linear model
\[ Y = \mathcal{T}(s)\, H + \underbrace{\begin{bmatrix} \tilde v(T-1) \\ \vdots \\ \tilde v(L) \end{bmatrix}}_{=:\,\tilde V} + \underbrace{\begin{bmatrix} m \\ \vdots \\ m \end{bmatrix}}_{=:\,M} \tag{6.26} \]
where $\mathcal{T}(s)$ is a block Hankel matrix given by
\[ \mathcal{T}(s) := \begin{bmatrix} s(T-1)\Psi_{T-1} & \cdots & s(T-L-1)\Psi_{T-1} \\ \vdots & \ddots & \vdots \\ s(L)\Psi_L & \cdots & s(0)\Psi_L \end{bmatrix}, \qquad \Psi_n := \bigl[\, u_1(n) I_N \;\; u_2(n) I_N \;\; \cdots \;\; u_Q(n) I_N \,\bigr]. \]
Also, an alternative linear model for $Y$ is given by
\[ Y = \mathcal{F}(H)\, s + \tilde V + M \tag{6.27} \]
where
\[ \mathcal{F}(H) := \begin{bmatrix} h(T-1;0) & \cdots & h(T-1;L) & & \\ & \ddots & & \ddots & \\ & & h(L;0) & \cdots & h(L;L) \end{bmatrix} \]
is a "filtering matrix".
By (6.26), consider the joint estimation
\[ \{\hat H, \hat s, \hat m\} = \arg\min_{H,\,s,\,m} \| Y - \mathcal{T}(s) H - M \|^2, \tag{6.28} \]
where, when we estimate $s$, we are estimating $b$ with known or estimated values of $c$ and $b_e$. Under the white Gaussian noise assumption, the DML estimates are obtained by the nonlinear least-squares optimization (6.28). Using (6.26) and (6.27), we have a separable nonlinear LS problem that can be solved sequentially as
\[ \{\hat H, \hat s, \hat m\} = \arg\min_{s} \Bigl\{ \min_{H,\,m} \| Y - \mathcal{T}(s) H - M \|^2 \Bigr\} = \arg\min_{H,\,m} \Bigl\{ \min_{s} \| Y - \mathcal{F}(H) s - M \|^2 \Bigr\}. \]
The finite alphabet properties of the information sequences can also be incorporated into the DML methods. These algorithms iterate between estimates of the channel and the input sequences. At iteration $i$, with an initial guess of the channel $H^{(i)}$, the self-interference $b_e^{(i)}$ and the mean $m^{(i)}$, the algorithm estimates the input sequence $s^{(i)}$, and the channel $H^{(i+1)}$ and mean $m^{(i+1)}$ for the next iteration, by
\[ s^{(i)} = \arg\min_{b \in \mathcal{B}} \bigl\| Y - \mathcal{F}\bigl(H^{(i)}\bigr) s - M^{(i)} \bigr\|^2, \quad \text{with } s := b + c - (1-\lambda)\, b_e^{(i)}, \tag{6.29} \]
\[ H^{(i+1)} = \arg\min_{H} \bigl\| Y - \mathcal{T}\bigl(s^{(i)}\bigr) H - M^{(i)} \bigr\|^2, \tag{6.30} \]
\[ m^{(i+1)} = \arg\min_{m} \bigl\| Y - \mathcal{T}\bigl(s^{(i)}\bigr) H^{(i+1)} - M \bigr\|^2 \tag{6.31} \]
where $\mathcal{B}$ is the (discrete) domain of $b$. The optimizations in (6.30) and (6.31) are linear LS problems having the solutions
\[ m^{(i+1)} = \frac{1}{T-L} \sum_{n=L}^{T-1} \Bigl[ y(n) - \sum_{l=0}^{L} h^{(i+1)}(n;l)\, s^{(i)}(n-l) \Bigr], \]
\[ \hat H^{(i+1)} = \mathcal{T}^{\dagger}\bigl(s^{(i)}\bigr) \bigl[ Y - M^{(i)} \bigr], \]
whereas the optimization in (6.29) can be achieved by using the Viterbi algorithm (see Appendix B.1 for details).
During the "start-up" (Step 1), $b_e^{(1)} = 0$ and $\hat H^{(1)}$ is obtained from the end of Section 6.3.1. Since the above iterative procedure involving (6.29)-(6.31) decreases the cost at every iteration, one achieves a local minimum of the nonlinear least-squares cost (local maximum of the DML function).
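Of the three updates, the mean update has the simplest closed form. The following Python sketch implements it for a single receiver ($N = 1$) with the channel stored as a list of per-sample tap lists; the data layout, the function name and the toy numbers are assumptions made for this illustration only.

```python
def update_mean(y, h, s, L):
    """m = (1/(T-L)) * sum_{n=L}^{T-1} [ y(n) - sum_l h(n;l) s(n-l) ]."""
    T = len(y)
    residual = 0j
    for n in range(L, T):
        residual += y[n] - sum(h[n][l] * s[n - l] for l in range(L + 1))
    return residual / (T - L)

# Toy noise-free data (hypothetical): a known channel, input and mean.
T, L, true_mean = 12, 1, 0.3 - 0.1j
s = [1.0 if n % 3 else -1.0 for n in range(T)]
h = [[1.0 + 0.01 * n, 0.5j] for n in range(T)]   # h[n][l] plays the role of h(n;l)
y = [0j] * T
for n in range(L, T):
    y[n] = sum(h[n][l] * s[n - l] for l in range(L + 1)) + true_mean
m_hat = update_mean(y, h, s, L)
```

In this noise-free toy case the estimate coincides with the true mean; with noise present it is the sample average of the residual over $n = L, \ldots, T-1$.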
[Figure 6.1 plot: Bit Error Rate vs. SNR (dB). Panel title: Viterbi detector: K=N=1, L=5, T=840, Ts=25 µs, TIR=0.1, P=7, 500 runs. Curves: SI with $\lambda$=1, SI with $\lambda$=0, and TM, each for fd=0 Hz and fd=50 Hz.]
Figure 6.1: Data-dependent superimposed training (CE-BEM): BER vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under $f_d$ = 0 and 50 Hz. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.)
6.4 Simulation Examples
6.4.1 Data-Dependent Superimposed Training Using CE-BEM
We first consider a doubly-selective Rayleigh fading channel following the Jakes' model with $N = 1$, 2, or 3, and $L = 5$ (6 taps). We scale $\{h(n;l)\}$ to achieve an exponential power delay profile given by $E\{|h(n;l)|^2\} = e^{-0.2l}/(L+1)$. We employ the communications system described in Section 2.6.3 with $T_s = 25\,\mu$s. We consider the system operating under different Doppler spreads with different numbers of basis functions $Q$. For the Doppler spreads $f_d = 0$, 50, 100, and 200 Hz (corresponding to the normalized Doppler spreads $f_dT_s = 0$, 0.00125, 0.0025, and 0.005), we take $Q = 1$, 5, 7, and 11.
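The quoted values of $Q$ agree with the rule $Q = 2\lceil f_d T T_s \rceil + 1$ that appears in the title of Figure 6.7, evaluated for the record length $T = 840$ used in these examples. A short Python check follows; it uses integer arithmetic with $T_s$ expressed in microseconds to avoid floating-point rounding inside the ceiling, and the function name is hypothetical.

```python
def ce_bem_order(fd_hz, T, ts_us):
    """Q = 2*ceil(fd*T*Ts) + 1, with Ts given in microseconds."""
    scaled = fd_hz * T * ts_us                  # equals fd*T*Ts times 1e6
    return 2 * (-(-scaled // 1_000_000)) + 1    # -(-a // b) is an integer ceiling

orders = [ce_bem_order(fd, 840, 25) for fd in (0, 50, 100, 200)]
print(orders)
```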
[Figure 6.2 plot: Bit Error Rate vs. SNR (dB). Panel title: Viterbi detector: K=N=1, L=5, T=840, Ts=25 µs, TIR=0.1, P=7, 500 runs. Curves: SI with $\lambda$=1, SI with $\lambda$=0, and TM, each for fd=100 Hz and fd=200 Hz; TM with fd=200 Hz, TIR=0.167.]
Figure 6.2: Data-dependent superimposed training (CE-BEM): BER vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under $f_d$ = 100 and 200 Hz. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.)
Additive noise in each example is zero-mean complex white Gaussian. The (receiver) SNR refers to the energy per symbol over one-sided noise spectral density, with both the information and superimposed training sequences counting toward the symbol energy. Information sequences are BPSK. We take the superimposed training sequence period $P = 7$; it is given by $c(n) = \sigma_c e^{j\pi n(n+\nu)/P}$, where $\nu = 1$ if $P$ is odd and $\nu = 2$ if $P$ is even, as in [59]. For superimposed training, the average transmitted power in $\{c(n)\}$ is 0.1 of that in $\{b(n)\}$, leading to a TIR of 0.1. All the simulation results are based on 500 Monte Carlo runs.
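The training sequence above can be generated with a few lines of Python. The sketch assumes unit-power BPSK data, so that $\sigma_c^2 = 0.1$ gives a TIR of 0.1; the function name and the sequence length are hypothetical.

```python
import cmath

def training_sequence(P, length, sigma_c=1.0):
    """c(n) = sigma_c * exp(j*pi*n*(n+nu)/P), nu = 1 for odd P and 2 for even P."""
    nu = 1 if P % 2 == 1 else 2
    return [sigma_c * cmath.exp(1j * cmath.pi * n * (n + nu) / P)
            for n in range(length)]

sigma_c = 0.1 ** 0.5          # TIR = sigma_c^2 / sigma_b^2 = 0.1 for sigma_b^2 = 1
c = training_sequence(P=7, length=21, sigma_c=sigma_c)
```

The choice of $\nu$ makes $P + \nu$ even, so that $c(n+P) = c(n)\,e^{j\pi(2n+P+\nu)} = c(n)$; the sequence is therefore periodic with period $P$ and has constant modulus $\sigma_c$.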
The results for a data record length of $T = 840$ symbols are shown in Figures 6.1–6.4 for various Doppler spreads and SNRs. We use the formulation suggested in Section 3.2 by omitting terms corresponding to $\alpha_0$ in (3.5c). For comparison, CE-BEM-based TM training described in Appendix A is also considered, where training sessions are periodically
[Figure 6.3 plot: Normalized Channel MSE (dB) vs. SNR (dB). Panel title: K=N=1, L=5, T=840, Ts=25 µs, TIR=0.1, P=7, 500 runs. Curves: SI with $\lambda$=1, SI with $\lambda$=0, and TM, each for fd=0 Hz and fd=50 Hz.]
Figure 6.3: Data-dependent superimposed training (CE-BEM): NCMSE vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under $f_d$ = 0 and 50 Hz. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.)
inserted between information data sessions. We take a training session of length $2L+1 = 11$ symbols with the training sequence $\{0_{1\times5}, \sqrt{2L+1}, 0_{1\times5}\}$, and at the receiver an LS estimation is performed. A data session of 110 symbols is inserted between two successive training sessions to form a frame of length 121 symbols. Such a frame is repeated over a record length of 847 symbols, i.e., a block consists of $n_f = 7$ such frames. Therefore, we have a training-to-information bit and power ratio of about 0.1. The normalized channel MSE (NCMSE) in channel estimation shown in the figures is defined in (3.83). The corresponding detection results are based on the Viterbi algorithm utilizing the estimated channel.
It is seen that the data-dependent superimposed training ($\lambda = 0$) yields superior results compared with the common (non-data-dependent) superimposed training ($\lambda = 1$), and furthermore it is competitive with TM training without incurring the 10% training overhead
[Figure 6.4 plot: Normalized Channel MSE (dB) vs. SNR (dB). Panel title: K=N=1, L=5, T=840, Ts=25 µs, TIR=0.1, P=7, 500 runs. Curves: SI with $\lambda$=1, SI with $\lambda$=0, and TM, each for fd=100 Hz and fd=200 Hz; TM with fd=200 Hz, TIR=0.167.]
Figure 6.4: Data-dependent superimposed training (CE-BEM): NCMSE vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under $f_d$ = 100 and 200 Hz. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.)
penalty resulting in a data transmission rate loss. Compared with TM training, the proposed data-dependent superimposed training yields slightly better BER for SNR $\ge$ 15 dB when $f_d \ne 0$, and (slightly) worse BER for SNR $\le$ 10 dB. There is an error floor with increasing SNR for all nonzero $f_d$'s due to modeling errors of CE-BEM. Notice the lack of error floor for $f_d = 0$ (no modeling error) in Figures 6.1 and 6.3. When $Q = 11$ (corresponding to $f_d = 200$ Hz), for TM training we cannot satisfy a training-to-information bit and power ratio of 0.1, since the number of basis functions is larger than the number of frames in one block, i.e., $Q > n_f$. We had to settle for $n_f = 7$, leading to loss of parameter identifiability (more unknowns than equations). In Figures 6.2 and 6.4 we also show the results for $n_f = 11$, leading to a shorter data session with a training-to-information bit and power ratio of 0.167; it is so labeled in Figures 6.2 and 6.4. The performance clearly improves and it is better
[Figure 6.5 plot: Bit Error Rate vs. SNR (dB). Panel title: Viterbi detector: K=1, L=5, T=840, fd=100 Hz, Ts=25 µs, TIR=0.1, P=7, 500 runs. Curves: SI with $\lambda$=0, TM, and SI with $\lambda$=1, each for N=1, 2, and 3.]
Figure 6.5: Data-dependent superimposed training (CE-BEM): BER vs SNR for non-data-dependent, data-dependent, and time-multiplexed training, under $f_d$ = 100 Hz and $N$ = 1, 2, and 3. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.)
than that of data-dependent superimposed training, but at the cost of a 16.7% reduction in transmission rate.
Figure 6.5 shows the detection results (based on the estimated channel and the Viterbi algorithm) for multiple receivers when $f_d = 100$ Hz and $N = 1$, 2, and 3. Again we see that data-dependent superimposed training is better than TM training without incurring the 10% training overhead penalty (which results in a transmission rate penalty).
Now we consider a fast fading channel: $N = 1$, $L = 1$ (2 taps), a uniform power delay profile and $T_s = 200\,\mu$s; the rest is as for the above channel. Therefore, for $f_d = 100$ and 250 Hz, the normalized Doppler spreads are $f_dT_s = 0.02$ and 0.05 (the corresponding values of $Q$ are 35 and 85, respectively). Here we also keep $T = 840$.
[Figure 6.6 plot: Bit Error Rate vs. SNR (dB). Panel title: Viterbi detector: K=N=1, L=1, T=840, Ts=25 µs, TIR=0.1, P=2, 500 runs. Curves: SI with $\lambda$=1, SI with $\lambda$=0, and TM, each for fd=100 Hz and fd=250 Hz; TM with fd=100 Hz, TIR=0.143; TM with fd=250 Hz, TIR=0.429.]
Figure 6.6: Data-dependent superimposed training (fast fading): BER vs SNR under $f_d$ = 100 and 250 Hz. (SI: superimposed training; TM: time-multiplexed training; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference.)
Now the performance of all schemes is worse because of the large number of unknowns to be estimated. However, data-dependent superimposed training still outperforms TM training when we enforce the constraint of a training-to-information bit and power ratio of 0.1, because we cannot get $n_f \ge Q$. With a training session of length $2L+1 = 3$ bits, we show in Figure 6.6 the results for $n_f = 35$ and 85 for $f_d = 100$ and 250 Hz, respectively, leading to reduced data sessions with a training-to-information bit and power ratio of 0.143 or 0.429. The performance clearly improves and it is better than that of data-dependent superimposed training, but at the cost of a 14.3% or 42.9% reduction in the transmission rate. It is also seen from Figure 6.6 that superimposed training does not perform well for $f_d = 250$ Hz because of the loss of "information" due to nulling of the contribution from the information sequence at a "large" number of frequencies (related to $Q$).
[Figure 6.7 plot: Normalized Channel MSE (dB) vs. fd (Hz). Panel title: K=N=1, L=2, T=400, Ts=25 µs, TIR=0.1, SNR=25 dB, $Q = 2\lceil f_d T T_s \rceil + 1$, P=4, 500 runs. Legend entries as extracted: simulated, $\lambda$=1; analytical, $\lambda$=1; simulated, $\lambda$=0; theoretical, $\lambda$=1; simulated $\pm\sigma$, $\lambda$=1 (two curves).]
Figure 6.7: Estimation variance: NCMSE vs $f_d$ under SNR = 25 dB for comparison between analytical and simulation-based results of non-data-dependent and data-dependent superimposed training. ($\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\sigma$: standard deviation.)
We next consider the performance analysis of our data-dependent superimposed training scheme. We revise the channel to have $N = 1$, $L = 2$ (3 taps), a uniform power delay profile and $T_s = 25\,\mu$s. We also omit $\alpha_0$ in the receiver-end processing. In Figure 6.7, we show the channel MSE versus Doppler spread, where we compare our theoretical expressions with simulation-based MSE results and $\pm\sigma$ bounds. The channel MSE for non-data-dependent superimposed training is given by (4.8) and (4.35). The agreement between the theoretical and simulation-based results is good. Note the discontinuities in the theoretical curves as a function of $f_d$, which arise because we picked $Q$ values based on $f_d$ per (2.9b).
[Figure 6.8 plot: Normalized Channel MSE (dB) vs. SNR (dB). Panel title: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=100 Hz, P=5, 500 runs. Curves: CE-BEM with $\lambda$=1 and $\lambda$=0; DPS-BEM with $\lambda$=1, $\lambda$=0, and $\lambda$=0.2.]
Figure 6.8: PDD superimposed training: NCMSE vs SNR for CE- and DPS-BEM-based estimators, under $f_d$ = 100 Hz. ($\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.)
6.4.2 (Partially) Data-Dependent Superimposed Training Using DPS-BEM
We consider a random doubly-selective Rayleigh fading channel. We take $N = 1$ and $L = 4$ (5 taps) with $h(n;l)$ as in (H6.3.5) satisfying the Jakes' model.
We consider a communication system described in Section 2.6.3 with symbol interval $T_s = 25\,\mu$s, record length $T = 420$ symbols, and varying Doppler spreads $f_d$ in the range of 0 Hz to 200 Hz. For $f_d = 100$ Hz, the normalized Doppler spread is $f_dT_s = 0.0025$, and for $f_d = 200$ Hz, $f_dT_s = 0.005$. We emphasize that in the simulations the DPS-BEM is used only for processing at the receiver; the random channels are generated by the Jakes' model, not the DPS-BEM. In channel estimation, we select $Q$, the number of basis functions, as $Q = \lceil 2f_dT_sT \rceil + 1$. Using the estimated channel, a Viterbi detector is used for data symbol detection at the receiver. For the DPS-BEM-based estimator using
[Figure 6.9 plot: Bit Error Rate vs. SNR (dB). Panel title: Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, TIR=0.15, fd=100 Hz, P=5, 500 runs. Curves: CE-BEM with $\lambda$=1 and $\lambda$=0; DPS-BEM with $\lambda$=1, $\lambda$=0, and $\lambda$=0.2.]
Figure 6.9: PDD superimposed training: BER vs SNR for CE- and DPS-BEM-based estimators, under $f_d$ = 100 Hz. ($\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.)
PDD superimposed training, we took $k = -1$ (the minimum allowed value) in (6.9), so that the information loss is comparatively mild.
The additive noise is zero-mean complex white Gaussian. The (receiver) SNR refers to the energy per bit over one-sided noise spectral density, with both the information and superimposed training sequences counting toward the bit energy. Information sequences are BPSK. We take the superimposed training sequence of period $P = L + 1 = 5$ as $c(n) = \sigma_c e^{j\pi n(n+\nu)/P}$, where $\nu = 1$ if $P$ is odd and $\nu = 2$ if $P$ is even, as in [59]. All simulated results were based on 500 Monte Carlo runs.
To show the advantage of DPS-BEM over the Fourier-based CE-BEM, we compare the performance of our proposed channel estimators using the two BEMs. Figure 6.8 shows the comparison for normalized channel MSEs (NCMSE). In an environment of $f_d = 100$ Hz,
[Figure 6.10 plot: surface of Bit Error Rate over $\beta$ and $\lambda$, both ranging from 0 to 1. Panel title: Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, fd=100 Hz, SNR=15 dB, Q=4, 500 runs.]
Figure 6.10: PDD superimposed training: BER vs $(\beta, \lambda)$ under SNR = 15 dB and $f_d$ = 100 Hz.
we had $Q = 5$ for CE-BEM following (2.9b) and $Q = 4$ for DPS-BEM. The channel estimators have identical design parameters except for using different basis models. In the simulation, the average transmitted power $\sigma_c^2$ in $c(n)$ is 0.15 of the power in $b(n)$, leading to a training-to-information power ratio TIR $:= \sigma_c^2/\sigma_b^2 = \beta/(1-\beta) = 0.15$ (or $\beta = 0.13$). We consider the CE-BEM-based estimator using superimposed training with $\lambda = 1$ (corresponding to the first-order statistics-based estimator in Chapter 3) and $\lambda = 0$ (corresponding to the "fully" data-dependent scheme), and the DPS-BEM-based estimator using PDD superimposed training with $\lambda = 0$, 0.2, and 1. At the receiver, we follow the data detection scheme described in Section 6.2.2.
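The numbers quoted in this paragraph can be reproduced with a short Python sketch. It inverts TIR $= \beta/(1-\beta)$ to obtain $\beta$ and evaluates the DPS-BEM rule $Q = \lceil 2f_dT_sT \rceil + 1$ stated in this section next to the CE-BEM rule $Q = 2\lceil f_dTT_s \rceil + 1$ from the title of Figure 6.7. The function names are hypothetical, and $T_s$ is passed in microseconds so that the ceilings use integer arithmetic.

```python
def overhead_from_tir(tir):
    """Invert TIR = beta / (1 - beta) for the training power overhead beta."""
    return tir / (1.0 + tir)

def dps_bem_order(fd_hz, T, ts_us):
    """Q = ceil(2*fd*Ts*T) + 1, with Ts given in microseconds."""
    return -(-(2 * fd_hz * T * ts_us) // 1_000_000) + 1

def ce_bem_order(fd_hz, T, ts_us):
    """Q = 2*ceil(fd*T*Ts) + 1, with Ts given in microseconds."""
    return 2 * (-(-(fd_hz * T * ts_us) // 1_000_000)) + 1

beta = overhead_from_tir(0.15)
q_dps = dps_bem_order(100, 420, 25)
q_ce = ce_bem_order(100, 420, 25)
print(f"beta = {beta:.4f}, Q (DPS-BEM) = {q_dps}, Q (CE-BEM) = {q_ce}")
```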
It is seen from Figure 6.8 that the data-dependent ($\lambda = 0$) training offers much lower estimation variance than the "non-data-dependent" ($\lambda = 1$) training, whether using CE- or DPS-BEM, since the self-interference is eliminated or greatly reduced with $\lambda = 0$.
[Figure 6.11 plot: Optimum Value vs. SNR (dB). Panel title: Viterbi detector: K=N=1, L=4, T=420, Ts=25 µs, fd=100 Hz, Q=4, 500 runs. Curves: analytical and simulated optimum values of $\beta$ and $\lambda$.]
Figure 6.11: PDD superimposed training: optimum $(\beta, \lambda)$ vs SNR under $f_d$ = 100 Hz.
For the same values of $\lambda$, the estimator exploiting DPS-BEM provides better estimation compared with the CE-BEM-based one, because DPS-BEM has a smaller modeling error than CE-BEM. These conclusions are also confirmed by the BER results in Figure 6.9. Note that for the DPS-BEM-based estimator with $\lambda = 0.2$, although the estimation cannot be as accurate as in the "completely-data-dependent" ($\lambda = 0$) case, the BER result is still better since the "information" lost in the "completely-data-dependent" training is now partially retained.
Figure 6.10 depicts the BER surface as a function of $\beta$ (defined in (6.21)) and $\lambda$, for SNR = 15 dB and $f_d = 100$ Hz. The BER performance varies along the $\beta$- and $\lambda$-axes. We selected the coordinate point $(\hat\beta_o, \hat\lambda_o)$ corresponding to the minimum value on the BER surface as the simulation-based optimum value for the given SNR. In Figure 6.11 we compare the simulation-based optimum value with the "analytical" optimum value $(\beta_o, \lambda_o)$ derived recursively (with the initial value $\lambda = 0$) by (6.23) and (6.24). In Figure 6.11, the analytical
[Figure 6.12: PDD superimposed training: BER vs SNR under $f_d$ = 0 Hz (time-invariant). (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) Axes: SNR (dB) vs bit error rate. Viterbi detector: K=N=1, L=4, T=420, Ts=25 $\mu$s, TIR=0.15, fd=0 Hz, P=5, 500 runs.]
and the simulation-based results follow the same trend, and the agreement between them
is good. It is also seen in Figure 6.11 that as the received signal SNR increases, the
optimum $\beta$ and $\lambda$ increase too. Higher $\beta$ implies that a higher fraction of the transmitted
power is allocated to training, leading to more accurate channel estimates (with smaller
estimation variance). Intuitively, for higher SNRs, it pays to achieve more accurate channel
estimates in order to achieve a lower effective noise power $\tilde\sigma_w^2$. On the other hand, when the
SNR is low, improving channel estimation does not have much effect on the effective noise
power $\tilde\sigma_w^2$. Similar comments apply to changes in optimum $\lambda$ with SNR. Higher $\lambda$ implies
lower power in the effective noise component $(1-\lambda)b_e(n-l)$ but higher self-interference, hence
higher channel estimation variance; these two effects need to be balanced. Finally,
[Figure 6.13: PDD superimposed training: BER vs SNR under $f_d$ = 100 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) Axes: SNR (dB) vs bit error rate. Viterbi detector: K=N=1, L=4, T=420, Ts=25 $\mu$s, TIR=0.15, fd=100 Hz, P=5, 500 runs.]
observe from Figure 6.10 that the "bottom" (the neighborhood of the
minimum BER point) of the BER surface is rather "flat": the BER performance is not
sensitive to changes in $\beta$ and $\lambda$ over a rather "wide" area around the minimum, so that the
analysis described in Section 6.3.3 provides us an effective means for power allocation and
interference suppression.
The DML approach is now investigated to enhance the channel estimation and data
detection performance. We considered channels with Doppler spread $f_d$ = 0, 100, and
200 Hz, with the corresponding number of basis functions $Q$ = 1, 4, and 6, respectively.
Note that the channel with $f_d$ = 0 Hz is time-invariant; this case is considered because here
there are no modeling errors. The "non-data-dependent" ($\lambda = 1$) superimposed training
[Figure 6.14: PDD superimposed training: BER vs SNR under $f_d$ = 200 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) Axes: SNR (dB) vs bit error rate. Viterbi detector: K=N=1, L=4, T=420, Ts=25 $\mu$s, TIR=0.15, fd=200 Hz, P=5, 500 runs.]
was compared with the "completely" ($\lambda = 0$) and "partially" ($\lambda = 0.2$) data-dependent
schemes. We again set TIR = 0.15 for the superimposed training. At the receiver, DML
iterations follow the data detection scheme described in Section 6.2.2 (denoted by "step
1" in the figures). We show BER and NCMSE in Figures 6.12-6.17, where only the results of Step
1 and the third iteration are depicted. It is seen that the DML algorithm significantly
improves the performance.
For the purpose of comparison, the TM training approach described in Appendix A,
originally proposed for CE-BEM as an optimal scheme, is applied to the DPS-BEM case.
We take a training block of $2L + 1 = 9$ symbols as $\{0,0,0,0,\sqrt{2L+1},0,0,0,0\}$, which
follows an information data block of length 60, leading to a frame of 69 symbols. This
[Figure 6.15: PDD superimposed training: NCMSE vs SNR under $f_d$ = 0 Hz (time-invariant). (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) Axes: SNR (dB) vs normalized channel MSE (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 $\mu$s, TIR=0.15, fd=0 Hz, P=5, 500 runs.]
subblock was repeated over a record length of 414 symbols with a total of 6 subblocks. The
information data are also BPSK and have unit power. Thus, the training-to-information bit
and power ratios are both 0.15 (the amplitude of the single nonzero training bit was picked
to achieve this power ratio). Using the training sequence, we can uniquely determine the $h_q(l)$'s
via an LS approach.
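The frame bookkeeping for the TM benchmark can be reproduced with a few lines of arithmetic. This is a sketch under the stated setup (L = 4, one nonzero training symbol per block, unit-power BPSK data); the variable names are illustrative only.

```python
import math

L = 4                                  # channel order used in the Chapter 6 simulations
train_len = 2 * L + 1                  # 9 training symbols per frame
data_len = 60                          # information symbols per frame
frame_len = train_len + data_len       # 69 symbols
n_frames = 6
record_len = n_frames * frame_len      # 414 symbols

amp = math.sqrt(train_len)             # amplitude of the single nonzero training symbol
train_block = [0.0] * L + [amp] + [0.0] * L

bit_ratio = train_len / data_len                            # training-to-information bit ratio
power_ratio = sum(x * x for x in train_block) / data_len    # unit-power data symbols

print(record_len, bit_ratio, power_ratio)
```

With these numbers both ratios come out to 0.15, which is why the nonzero training amplitude is $\sqrt{2L+1} = 3$.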
By (6.9), note that the set defined there has cardinality $P(2Q+2k+1)$ ($k = -1$ in the simulations). For the time-
invariant channel ($f_d = 0$), $Q = 1$, so that the information contained in the self-interference
part of the information sequence is comparatively small. Therefore, total suppression ($\lambda =
0$) of the self-interference does not have a significant deleterious effect on the BER, while the
improvement in channel estimation is significant; it is the scenario described by [20] and
[Figure 6.16: PDD superimposed training: NCMSE vs SNR under $f_d$ = 100 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) Axes: SNR (dB) vs normalized channel MSE (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 $\mu$s, TIR=0.15, fd=100 Hz, P=5, 500 runs.]
depicted in Figures 6.12 and 6.15. Exploiting the PDD scheme ($\lambda = 0.2$) in this case does
not have much impact. All three schemes ($\lambda = 0$, 0.2, and 1) after three iterations of
the DML scheme have BER performance in Figure 6.12 similar to that of the TM training.
As the Doppler spread $f_d$ increases, we have to employ a larger $Q$ to describe the
channel and the self-interference thus grows. Now total suppression ($\lambda = 0$) is no longer a
wise option, whereas the PDD scheme still performs well. In Figures 6.13 and 6.14, with
$\lambda = 0.2$ the PDD scheme is superior to the other two superimposed training-based schemes in
data detection, whether before or after DML iterations, since the self-interference has been
greatly suppressed while the information loss has been effectively reduced. In Figures 6.16
and 6.17, although the "completely" data-dependent ($\lambda = 0$) training has the best channel
[Figure 6.17: PDD superimposed training: NCMSE vs SNR under $f_d$ = 200 Hz. (TM: time-multiplexed training; "step 1": the data detection scheme in Section 6.2.2; "3rd iteration": the third DML iteration; $\lambda = 1$: non-data-dependent training; $\lambda = 0$: total elimination of self-interference; $\lambda = 0.2$: partial elimination of self-interference at the channel estimation stage.) Axes: SNR (dB) vs normalized channel MSE (dB). Viterbi detector: K=N=1, L=4, T=420, Ts=25 $\mu$s, TIR=0.15, fd=200 Hz, P=5, 500 runs.]
estimation before iterations, the PDD scheme with $\lambda = 0.2$ yields better performance after
three iterations, due to the lower BER in the DML approach. The BER performance of
the PDD scheme after several DML iterations is competitive with the TM training, without
incurring any training overhead penalty.
6.5 Conclusions
In this chapter, we presented a data-dependent superimposed training scheme to reduce
the self-interference in channel estimation. Inspired by the work of [20], we observed
that over a channel satisfying a band-limited BEM such as CE- or DPS-BEM, the periodic
superimposed training components within the received signal occur only at certain
frequencies. By designing a data-dependent superimposed training sequence that suppresses
the information sequence at those frequencies, the self-interference adversely affecting the
channel estimation at the receiver is greatly reduced. However, the suppressed frequency
components of the information sequence carry "information" as well. A PDD superimposed
training method was proposed to strike a trade-off between self-interference cancellation and
information integrity. Performance analysis and related optimization of parameters were
also discussed. Computer simulation examples demonstrated that by using PDD
superimposed training, performance competitive with TM training can be achieved with no
training overhead penalty.
Chapter 7
Direct FIR Linear Equalization of Doubly-Selective Channels Based on
Superimposed Training
7.1 Introduction
Two powerful tools have been exploited so far in our previous chapters on channel
estimation: superimposed training, providing us a means to track the temporal variation
of the channel, and BEMs, reducing the problem of estimation of a time-varying channel
over a period of time to estimation of time-invariant parameters.
In wireless channels, signal distortion due to multipath propagation or band-limited
transmission may cause ISI at reception. An equalizer is a device that compensates for ISI at
the receiver. The noise-free received signal is the convolution of the transmitted
symbols with the impulse response of the channel. Therefore, the equalizer, whose task
is to recover the transmitted symbols, is a deconvolution device. Since a doubly-selective
channel can be well described by BEMs, we expect that an equalizer, as an inverse of
the channel, can also be well represented by BEMs. Given the knowledge of the time-
varying channel described by CE-BEM, design of serial time-varying FIR equalizers has
been discussed in [2]. Direct design of time-invariant FIR equalizers based on superimposed
training, for time-invariant channels, has been investigated in [58]. In this chapter, we
investigate direct design of time-varying FIR linear equalizers for doubly-selective channels
using superimposed training and without first estimating the underlying channel response.
We exploit the prior results of [2,58].
In Section 7.2, the direct FIR linear equalization using superimposed training and
CE-BEM is discussed. By utilizing user-specific training sequences, this direct equalizer is
extended to a multiple-user wireless ad hoc network in Section 7.3.
7.2 Direct FIR Linear Equalization Using CE-BEM
Consider a time-varying SIMO FIR linear channel with $N$ outputs. Let $\{s(n)\}$ denote a
scalar sequence which is input to the SIMO time-varying channel with discrete-time impulse
response $\{h(n;l)\}$ ($N$-vector channel response at time $n$ to a unit input at time $n-l$). Then,
as in (3.1), the symbol-rate channel output vector is given by
$$x(n) := \sum_{l=0}^{L} h(n;l)\,s(n-l). \tag{7.1}$$
The noisy measurements of $x(n)$ are given by
$$y(n) = x(n) + v(n).$$
In a CE-BEM representation it is assumed that the channel follows (2.9)
$$h(n;l) = \sum_{q=1}^{\bar Q} h_q(l)\,e^{j\omega_q n} \tag{7.2}$$
where the $N$-column vectors $h_q(l)$ are invariant for the whole block $n = 0,1,\ldots,T-1$, and
$$\bar Q := 2\lceil f_d T T_s\rceil + 1,$$
$$L := \lfloor \tau_d/T_s \rfloor,$$
$$\omega_q := \frac{2\pi}{T}\left(q - \frac{1}{2} - \frac{Q}{2}\right), \quad q = 1,2,\ldots,Q.$$
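To make the basis dimensions concrete, the sketch below evaluates the number of complex exponentials $2\lceil f_d T T_s\rceil + 1$ and the centered frequency grid $\frac{2\pi}{T}(q - \frac12 - \frac{Q}{2})$ for the block length and symbol period used later in Section 7.4.1 (T = 405, Ts = 25 $\mu$s). The function names are hypothetical helpers, not part of the dissertation.

```python
import math


def ce_bem_size(fd_hz: float, T: int, Ts: float) -> int:
    """Number of CE-BEM basis functions, 2 * ceil(fd * T * Ts) + 1."""
    return 2 * math.ceil(fd_hz * T * Ts) + 1


def ce_bem_frequencies(Q: int, T: int) -> list:
    """Basis frequencies (2 pi / T) * (q - 1/2 - Q/2) for q = 1..Q."""
    return [2 * math.pi / T * (q - 0.5 - Q / 2) for q in range(1, Q + 1)]


for fd in (0, 100, 200):
    print(fd, ce_bem_size(fd, T=405, Ts=25e-6))

omega = ce_bem_frequencies(Q=5, T=405)
```

The grid is symmetric about zero, so for odd $Q$ the middle basis function is the constant (zero-frequency) term, which is the only one needed when the channel is time-invariant.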
In superimposed training one takes
s(n) = b(n)+c(n),
where {b(n)} is the information sequence and {c(n)} is the training sequence added (super-
imposed) at low power to the information sequence at the transmitter before modulation
and transmission.
Given this channel model, the main problem considered here is how to design an
equalizer to estimate {b(n)} when one knows only {c(n)} but not (obviously) {b(n)}, and
one also does not have (frame) synchronization with {c(n)} at the receiver. We will design
an equalizer to estimate {c(n)} with a delay d. We will then show that this equalizer is a
scaled version of the corresponding equalizer designed to estimate {b(n)} with a delay d.
7.2.1 Time-Varying FIR Equalizers
We will restrict ourselves to serial linear equalizers instead of block linear equalizers,
since, as shown in [2], the latter are computationally prohibitive (compared with the former).
We look for a time-varying linear equalizer $g(n;l)$ ($l = 0,1,\ldots,L_e$) over the same time block
as the received data with channel model (7.2). We note that for an arbitrary time-varying
impulse response $\tilde g(n;l)$, the following is always true:
$$\tilde g(n;l) = \sum_{q=1}^{T} \tilde g_q(l)\,e^{j\omega_q n}, \quad n = 0,1,\ldots,T-1.$$
We would like to use a more parsimonious (but approximate) representation for $\tilde g(n;l)$,
denoted by $g(n;l)$, given by
$$g(n;l) = \sum_{q=1}^{Q} g_q(l)\,e^{j\omega_q n}, \quad n = 0,1,\ldots,T-1,$$
where $Q \le T-1$. In order to estimate the input sequence $\{s(n)\}$ (see (7.1)), we may seek
a linear time-varying FIR estimator using CE-BEM to yield an estimate with equalization
delay $d$ ($0 \le d \le L_e$)
$$\hat s(n-d) = \sum_{i=0}^{L_e} g^H(n;i)\,y(n-i).$$
Existence of a zero-forcing linear equalizer has been discussed in [2]. Their conclusion is that
if $N$ is at least two, then with probability one, one has a zero-forcing solution for sufficiently
large $L_e$ and $Q$. For a linear MMSE solution, existence is not an issue, although MMSE
equalizer performance can be expected to be "good" if zero-forcing equalizers exist [2]. Here
we will seek a least squares solution $g(n;l)$ to minimize a cost such as
$$J = \frac{1}{T}\sum_{n=0}^{T-1} \left|s(n-d) - \hat s(n-d)\right|^2.$$
7.2.2 Linear LS Equalizers Based on CE-BEM
Our algorithm is based on the following model assumptions:
(H7.2.1) The information sequence $\{b(n)\}$ is zero-mean, i.i.d. (independent and identically
distributed), with $E\{|b(n)|^2\} = \sigma_b^2$.
(H7.2.2) The measurement noise $\{v(n)\}$ is zero-mean ($E\{v(n)\} = 0$), white, independent of
$\{b(n)\}$, with $E\{[v(n+\tau)][v(n)]^H\} = \sigma_v^2 I_N \delta(\tau)$.
(H7.2.3) The superimposed training sequence $c(n) = c(n+P)$ for all $n$ is a non-random periodic
sequence with period $P$. Let $\sigma_c^2 := (1/P)\sum_{n=1}^{P}|c(n)|^2$.
(H7.2.4) Record length $T$ and period $P$ satisfy $TP^{-1} > \bar Q$. Moreover, $P > L + L_e - d$.
As in (3.5), the periodic training sequence of period $P$ can be written as
$$c(n) = \sum_{m=0}^{P-1} c_m\,e^{j\alpha_m n}$$
where $\alpha_m := 2\pi m/P$. To design the time-varying linear equalizer to estimate a delayed
version of the training sequence $c(n-d)$ ($0 \le d \le L_e$), we have
$$\hat c(n-d) = \sum_{i=0}^{L_e} g_d^H(n;i)\,y(n-i)$$
where we assume that
$$g_d(n;i) = \sum_{q=1}^{Q} g_q(i)\,e^{j\omega_q n}.$$
Choose the $g_q(i)$'s to minimize the time-averaged cost
$$J_c := \frac{1}{T}\sum_{n=0}^{T-1}\left|c(n-d) - \hat c(n-d)\right|^2
= \frac{1}{T}\sum_{n=0}^{T-1}\left|c(n-d) - \sum_{i=0}^{L_e}\sum_{q=1}^{Q} g_q^H(i)\,e^{-j\omega_q n}\,y(n-i)\right|^2.$$
By taking the derivative and setting it to zero, we have
$$\frac{\partial J_c}{\partial g_{q_1}^*(i_1)} = -\frac{1}{T}\sum_{n=0}^{T-1} e^{-j\omega_{q_1} n}\,y(n-i_1)\left[c^*(n-d) - \sum_{i=0}^{L_e}\sum_{q=1}^{Q} e^{j\omega_q n}\,y^H(n-i)\,g_q(i)\right] = 0$$
for $i_1 = 0,1,\ldots,L_e$ and $q_1 = 1,2,\ldots,Q$. This leads to
$$\sum_{i=0}^{L_e}\sum_{q=1}^{Q}\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\,y(n-i_1)\,y^H(n-i)\right] g_q(i)
= \frac{1}{T}\sum_{n=0}^{T-1} c^*(n-d)\,e^{-j\omega_{q_1} n}\,y(n-i_1) =: R_c(q_1,i_1). \tag{7.3}$$
To design the time-varying linear equalizer to estimate the information sequence $b(n-d)$
($0 \le d \le L_e$), we have
$$\hat b(n-d) = \sum_{i=0}^{L_e} \bar g_d^H(n;i)\,y(n-i)$$
where we assume that
$$\bar g_d(n;i) = \sum_{q=1}^{Q} \bar g_q(i)\,e^{j\omega_q n}.$$
Choose the $\bar g_q(i)$'s to minimize
$$J_b := \frac{1}{T}\sum_{n=0}^{T-1}\left|b(n-d) - \hat b(n-d)\right|^2.$$
Mimicking the results for the superimposed training sequence-based equalization,
$$\sum_{i=0}^{L_e}\sum_{q=1}^{Q}\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\,y(n-i_1)\,y^H(n-i)\right] \bar g_q(i)
= \frac{1}{T}\sum_{n=0}^{T-1} b^*(n-d)\,e^{-j\omega_{q_1} n}\,y(n-i_1) =: R_b(q_1,i_1). \tag{7.4}$$
Comparing (7.3) and (7.4), we see that (ignoring the equalizer coefficients) the left sides
of the two are identical. We now seek to establish that for large $T$, $R_c(q_1,i_1) = \gamma R_b(q_1,i_1)$
for all $q_1,i_1$, for some scalar $\gamma$, so that $g_q(i) = \gamma\,\bar g_q(i)$ for all $i$.
We first consider
$$\begin{aligned}
R_c(q_1,i_1)
&= \frac{1}{T}\sum_{n=0}^{T-1} c^*(n-d)\,e^{-j\omega_{q_1} n}\left\{\sum_{l=0}^{L} h(n-i_1;l)\,[b(n-i_1-l) + c(n-i_1-l)] + v(n-i_1)\right\}\\
&= \frac{1}{T}\sum_{n=0}^{T-1}\sum_{m_1=0}^{P-1} c_{m_1}^*\,e^{-j\alpha_{m_1}(n-d)}\,e^{-j\omega_{q_1} n}
\left\{\sum_{l=0}^{L}\sum_{q=1}^{\bar Q} h_q(l)\,e^{j\omega_q(n-i_1)}\left[\sum_{m_2=0}^{P-1} c_{m_2}\,e^{j\alpha_{m_2}(n-i_1-l)} + b(n-i_1-l)\right] + v(n-i_1)\right\}\\
&= \sum_{q=1}^{\bar Q}\sum_{l=0}^{L}\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\,e^{j\alpha_{m_1}d}\,e^{-j\alpha_{m_2}(i_1+l)}\,e^{-j\omega_q i_1}\,h_q(l)\,A_0\\
&\quad + \sum_{q=1}^{\bar Q}\sum_{l=0}^{L}\sum_{m_1=0}^{P-1} c_{m_1}^*\,e^{j\alpha_{m_1}d}\,e^{-j\omega_q i_1}\,h_q(l)\,A_1 + \sum_{m_1=0}^{P-1} c_{m_1}^*\,e^{j\alpha_{m_1}d}\,A_2
\end{aligned}$$
where
$$A_0 := \frac{1}{T}\sum_{n=0}^{T-1} e^{j(-\alpha_{m_1}+\alpha_{m_2}-\omega_{q_1}+\omega_q)n},$$
$$A_1 := \frac{1}{T}\sum_{n=0}^{T-1} e^{j(-\alpha_{m_1}-\omega_{q_1}+\omega_q)n}\,b(n-i_1-l),$$
$$A_2 := \frac{1}{T}\sum_{n=0}^{T-1} e^{-j(\alpha_{m_1}+\omega_{q_1})n}\,v(n-i_1).$$
Under the condition $TP^{-1} > \bar Q$ (then $(\alpha_{m_1}+\omega_{q_1}) = (\alpha_{m_2}+\omega_{q_2})$ if and only if $m_1 = m_2$
and $q_1 = q_2$), we have
$$A_0 = \delta(m_1-m_2)\,\delta(q_1-q).$$
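The orthogonality behind this result can be checked numerically. The sketch below evaluates the time average for every index tuple $(m_1, m_2, q_1, q)$ and compares it with the product of Kronecker deltas. It assumes that T/P is an integer (as required explicitly in (H7.3.4)) and uses T = 405, P = 15, Q = 5, the values of the single-user simulation in Section 7.4.1; the function name is illustrative.

```python
import numpy as np


def a0_table(T: int, P: int, Q: int) -> np.ndarray:
    """Time average of exp(j(-alpha_m1 + alpha_m2 - omega_q1 + omega_q) n) over n = 0..T-1.

    Returns an array of shape (P, P, Q, Q) indexed by (m1, m2, q1, q).
    """
    n = np.arange(T)
    alpha = 2 * np.pi * np.arange(P) / P                          # training harmonics
    omega = 2 * np.pi * (np.arange(1, Q + 1) - 0.5 - Q / 2) / T   # CE-BEM frequencies
    freq = (-alpha[:, None, None, None] + alpha[None, :, None, None]
            - omega[None, None, :, None] + omega[None, None, None, :])
    return np.exp(1j * freq[..., None] * n).mean(axis=-1)


T, P, Q = 405, 15, 5          # T / P = 27 > Q
A0 = a0_table(T, P, Q)
expected = np.eye(P)[:, :, None, None] * np.eye(Q)[None, None, :, :]
print(np.allclose(A0, expected, atol=1e-9))
```

The combined frequency is an integer multiple of $2\pi/T$ when T/P is an integer, and that multiple is zero only for $m_1 = m_2$ and $q_1 = q$ as long as $Q - 1 < T/P$, which is the role of the record-length condition.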
Furthermore, we have
$$\begin{aligned}
E\{|A_1|^2\} &= \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} e^{j(-\alpha_{m_1}-\omega_{q_1}+\omega_q)(n_1-n_2)}\,E\{b(n_1-i_1-l)\,b^*(n_2-i_1-l)\}\\
&= \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} e^{j(-\alpha_{m_1}-\omega_{q_1}+\omega_q)(n_1-n_2)}\,\sigma_b^2\,\delta(n_1-n_2) = \frac{\sigma_b^2}{T}.
\end{aligned}$$
Similarly, it follows that
$$\begin{aligned}
E\{\|A_2\|^2\} &= \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} e^{-j(\alpha_{m_1}+\omega_{q_1})(n_1-n_2)}\,E\{v^H(n_1-i_1)\,v(n_2-i_1)\}\\
&= \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} e^{-j(\alpha_{m_1}+\omega_{q_1})(n_1-n_2)}\,N\sigma_v^2\,\delta(n_1-n_2) = \frac{N\sigma_v^2}{T}.
\end{aligned}$$
In the mean square sense (and thus in probability), we then have the following two limits:
$$\lim_{T\to\infty} A_1 \overset{\text{m.s.}}{=} 0, \quad\text{and}\quad \lim_{T\to\infty} A_2 \overset{\text{m.s.}}{=} 0.$$
Thus for "large" $T$, we have
$$\begin{aligned}
\lim_{T\to\infty} R_c(q_1,i_1)
&\overset{\text{m.s.}}{=} \sum_{q=1}^{\bar Q}\sum_{l=0}^{L}\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\,e^{j\alpha_{m_1}d}\,e^{-j\alpha_{m_2}(i_1+l)}\,e^{-j\omega_q i_1}\,h_q(l)\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(-\alpha_{m_1}+\alpha_{m_2}-\omega_{q_1}+\omega_q)n}\right]\\
&= \sum_{q=1}^{\bar Q}\sum_{l=0}^{L}\sum_{m_1=0}^{P-1}\sum_{m_2=0}^{P-1} c_{m_1}^* c_{m_2}\,e^{j\alpha_{m_1}d}\,e^{-j\alpha_{m_2}(i_1+l)}\,e^{-j\omega_q i_1}\,h_q(l)\,\delta(m_1-m_2)\,\delta(q_1-q)\\
&= \sum_{q=1}^{\bar Q}\sum_{l=0}^{L}\sum_{m=0}^{P-1} |c_m|^2\,e^{j\alpha_m(d-i_1-l)}\,e^{-j\omega_q i_1}\,h_q(l)\,\delta(q_1-q).
\end{aligned}$$
If the training sequence $\{c(n)\}$ is periodic white, i.e.,
$$P^{-1}\sum_{n=0}^{P-1} c(n)\,c^*(n-l) = \sigma_c^2\,\delta(l \bmod P),$$
then
$$\sum_{m=0}^{P-1} |c_m|^2\,e^{j\alpha_m(d-i_1-l)} = \sigma_c^2\,\delta\big((d-i_1-l) \bmod P\big).$$
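The periodic-white property is easy to verify numerically for the chirp sequence adopted later in Section 7.4.1, $c(n) = \sigma_c e^{j\pi n(n+\nu)/P}$ with $\nu = 1$ for odd $P$ and $\nu = 2$ for even $P$ (the symbol $\nu$ for the parity-dependent offset is this sketch's notation). The code computes the periodic autocorrelation and the harmonic coefficients $c_m$; the helper names are assumptions made for illustration.

```python
import numpy as np


def chirp_training(P: int, sigma_c: float = 1.0) -> np.ndarray:
    """One period of c(n) = sigma_c * exp(j pi n (n + nu) / P), nu = 1 (P odd) or 2 (P even)."""
    n = np.arange(P)
    nu = 1 if P % 2 else 2
    return sigma_c * np.exp(1j * np.pi * n * (n + nu) / P)


def periodic_autocorr(c: np.ndarray) -> np.ndarray:
    """(1/P) sum_n c(n) c*(n - l) for l = 0..P-1, using the periodic extension of c."""
    P = len(c)
    return np.array([np.mean(c * np.conj(np.roll(c, l))) for l in range(P)])


P, sigma_c = 15, 0.5
c = chirp_training(P, sigma_c)
r = periodic_autocorr(c)        # expected: sigma_c**2 at lag 0, zero elsewhere
c_m = np.fft.fft(c) / P         # coefficients in c(n) = sum_m c_m exp(j 2 pi m n / P)
print(np.round(np.abs(r), 6))
print(np.round(np.abs(c_m) ** 2 * P, 6))
```

For a periodic-white sequence every harmonic carries the same power, $|c_m|^2 = \sigma_c^2/P$, which is what collapses the sum over $m$ into a single delta.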
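This fact then leads to
$$\lim_{T\to\infty} R_c(q_1,i_1) \overset{\text{m.s.}}{=}
\begin{cases}
\sigma_c^2\,e^{-j\omega_{q_1} i_1}\,h_{q_1}\big((d-i_1) \bmod P\big) & \text{if } 1 \le q_1 \le \bar Q\\
0 & \text{otherwise}
\end{cases} \tag{7.5}$$
for $i_1 = 0,1,\ldots,L_e$ and $q_1 = 1,2,\ldots,Q$.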
Turning to (7.4), we have
$$R_b(q_1,i_1) = \sum_{q=1}^{\bar Q}\sum_{l=0}^{L}\sum_{m=0}^{P-1} c_m\,h_q(l)\,e^{-j\alpha_m(i_1+l)}\,e^{-j\omega_q i_1}\,A_3 + \sum_{q=1}^{\bar Q}\sum_{l=0}^{L} h_q(l)\,e^{-j\omega_q i_1}\,A_4 + A_5$$
where
$$A_3 := \frac{1}{T}\sum_{n=0}^{T-1} e^{j(\alpha_m-\omega_{q_1}+\omega_q)n}\,b^*(n-d),$$
$$A_4 := \frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\,b(n-i_1-l)\,b^*(n-d),$$
$$A_5 := \frac{1}{T}\sum_{n=0}^{T-1} e^{-j\omega_{q_1} n}\,v(n-i_1)\,b^*(n-d).$$
We can show (as before) that
$$\lim_{T\to\infty} A_3 \overset{\text{m.s.}}{=} 0, \qquad \lim_{T\to\infty} A_5 \overset{\text{m.s.}}{=} 0.$$
Consider
$$A_6 := \frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\left[b(n-i_1-l)\,b^*(n-d) - \sigma_b^2\,\delta(d-i_1-l)\right].$$
It then follows that
$$\begin{aligned}
E\{|A_6|^2\}
&= \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} e^{j(\omega_q-\omega_{q_1})(n_1-n_2)}\,E\left\{\left|b(n-i_1-l)\,b^*(n-d) - \sigma_b^2\,\delta(d-i_1-l)\right|^2\right\}\\
&= \frac{1}{T^2}\sum_{n_1=0}^{T-1}\sum_{n_2=0}^{T-1} e^{j(\omega_q-\omega_{q_1})(n_1-n_2)}\left[E\{|b(n)|^4\} - \sigma_b^4\right]\delta(n_1-n_2)\,\delta(d-i_1-l)\\
&= \frac{1}{T}\left[E\{|b(n)|^4\} - \sigma_b^4\right]\delta(d-i_1-l).
\end{aligned}$$
Therefore, we have $\lim_{T\to\infty} A_6 \overset{\text{m.s.}}{=} 0$, and consequently
$$\lim_{T\to\infty} A_4 \overset{\text{m.s.}}{=} \frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\,\sigma_b^2\,\delta(d-i_1-l)
= \sigma_b^2\,\delta(d-i_1-l)\,\delta(q_1-q).$$
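Hence, for "large" $T$, we have
$$\lim_{T\to\infty} R_b(q_1,i_1) \overset{\text{m.s.}}{=} \sum_{q=1}^{\bar Q}\sum_{l=0}^{L} h_q(l)\,e^{-j\omega_q i_1}\,\sigma_b^2\,\delta(d-i_1-l)\,\delta(q_1-q).$$
For $i_1 = 0,1,\ldots,L_e$ and $q_1 = 1,2,\ldots,Q$ but $q_1 \le \bar Q$, we therefore have
$$\lim_{T\to\infty} R_b(q_1,i_1) \overset{\text{m.s.}}{=} h_{q_1}(d-i_1)\,e^{-j\omega_{q_1} i_1}\,\sigma_b^2. \tag{7.6}$$
If $P > L + L_e - d$, then (7.5) equals (7.6) (within a scale factor). Therefore, for "large"
$T$, $R_c(q_1,i_1) = \gamma R_b(q_1,i_1)$ for all $q_1,i_1$ with $\gamma = \sigma_c^2/\sigma_b^2$; hence $g_q(i) = \gamma\,\bar g_q(i)$ for all $i$.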
For the desired linear time-varying equalizer, we execute the following steps:
1. Pick $L_e$ and $d$ ($= L_e/2$ in the following simulations). Pick $Q \ge \bar Q$ and $P > L + L_e - d$.
2. Solve (7.3), given data $y(n)$, for $g_q(i)$ where $0 \le i \le L_e$ and $1 \le q \le Q$. Then
$$g_d(n;i) = \sum_{q=1}^{Q} g_q(i)\,e^{j\omega_q n}.$$
3. The equalized output is then given by
$$e_1(n) = \sum_{i=0}^{L_e} g_d^H(n;i)\,y(n-i) \approx \gamma_1 c(n-d) + \gamma_2 b(n-d) + \tilde v(n)$$
where $\tilde v(n)$ is the equalized noise. Estimate $\gamma_1$ as
$$\hat\gamma_1 = \frac{\frac{1}{T}\sum_{n=0}^{T-1} e_1(n)\,c^*(n-d)}{\frac{1}{T}\sum_{n=0}^{T-1} |c(n-d)|^2} = \frac{\frac{1}{T}\sum_{n=0}^{T-1} e_1(n)\,c^*(n-d)}{\sigma_c^2}.$$
4. Define
$$e_2(n) = e_1(n) - \hat\gamma_1 c(n-d) \approx \gamma_2 b(n-d) + \tilde v(n).$$
Then we hard-quantize $e_2(n)$ to estimate $b(n-d)$.
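The four steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the dissertation's implementation: it solves the least squares problem behind (7.3) with `numpy.linalg.lstsq` on the rows $n \ge L_e$ (so that all delayed samples exist), it assumes the receiver's time index is aligned with one stored period of the training sequence, and the demo uses a made-up noise-free, time-invariant two-antenna channel with BPSK data. All function and variable names are hypothetical.

```python
import numpy as np


def bem_regressors(y, Le, omegas):
    """Rows z(n)^T, where z(n) stacks exp(-j w_q n) y(n - i) over q and i.

    y has shape (T, N); samples before the block start are taken as zero.
    """
    T = y.shape[0]
    n = np.arange(T)
    cols = []
    for w in omegas:
        phase = np.exp(-1j * w * n)[:, None]
        for i in range(Le + 1):
            delayed = np.zeros_like(y, dtype=complex)
            delayed[i:] = y[:T - i]
            cols.append(phase * delayed)
    return np.hstack(cols)


def direct_equalizer(y, c_period, d, Le, omegas):
    """Steps 2-4: LS fit to c(n - d), remove the training term, slice BPSK."""
    T, P = y.shape[0], len(c_period)
    Z = bem_regressors(y, Le, omegas)
    c_d = c_period[(np.arange(T) - d) % P]        # c(n - d), periodic extension
    rows = np.arange(Le, T)                       # rows with complete history
    # w holds conj(g_q(i)); rcond discards numerically null directions
    w, *_ = np.linalg.lstsq(Z[rows], c_d[rows], rcond=1e-10)
    e1 = Z @ w                                    # equalized output (step 3)
    gamma1 = (np.mean(e1[rows] * np.conj(c_d[rows]))
              / np.mean(np.abs(c_d[rows]) ** 2))
    e2 = e1 - gamma1 * c_d                        # training removed (step 4)
    b_hat = np.where(e2.real >= 0, 1.0, -1.0)     # hard decision for BPSK
    return b_hat, e1, w


# Illustrative use: time-invariant channel (Q = 1, omega = 0), N = 2, no noise.
rng = np.random.default_rng(0)
P, T, d, Le = 15, 405, 1, 2
k = np.arange(P)
c_period = np.exp(1j * np.pi * k * (k + 1) / P)   # periodic-white chirp, TIR = 1
h = np.array([[1.0, 0.5], [0.3, -0.8]])           # h[l] is the N-vector tap l
b = rng.choice([-1.0, 1.0], size=T + 1)           # index 0 is time n = -1
s = b + c_period[np.arange(-1, T) % P]
y = s[1:, None] * h[0] + s[:-1, None] * h[1]      # y(n) = h(0) s(n) + h(1) s(n-1)

b_hat, e1, w = direct_equalizer(y, c_period, d, Le, [0.0])
truth = b[np.arange(T) - d + 1]                   # b(n - d)
ber = np.mean(b_hat[Le:] != truth[Le:])
print(ber)
```

Solving the least squares problem directly is numerically equivalent to forming and solving the normal equations, and avoids building the correlation matrix explicitly; a time-varying channel would only change the list of basis frequencies passed in.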
7.3 Direct FIR Linear Equalization: Multiple Users
Multiple access schemes allow multiple users to share a common channel. Random
access methods provide each user with a flexible way of gaining access to the channel
whenever the user has information (packets) to be sent. In random access, typically when
two packets collide, they are discarded and then have to be retransmitted. In wireless ad
hoc networks (also known as mobile ad hoc networks, or MANETs), the absence of base stations
limits the use of traditional media access control (MAC) protocols [34]. In ad hoc networks
one needs some sort of distributed MAC requiring some form of random access, which makes
avoiding collisions difficult. Collisions arising from uncoordinated users decrease system
throughput and worsen delay performance. Multiple packet reception (MPR) capability (or
signal separation) is one way to resolve packet collisions and thereby enhance throughput,
by using signal processing to separate multiple received signals [75]. Recently, wireless ad
hoc networks with asynchronous transmissions have been considered in [15,58,60]. The
approaches of [15,58] use user-specific modulation-induced cyclostationarity coupled with a
receive antenna array to achieve MPR for frequency-selective time-invariant channels. In
[60] user-specific superimposed training signals (also called hidden pilots or implicit training)
have been used for MPR for frequency-selective time-invariant channels. The objective of
this section is to investigate approaches using user-specific superimposed training signals
for MPR in MANETs for transmissions over doubly-selective channels, with emphasis on
asynchronous networks.
Consider a time-varying MIMO FIR linear channel with $K$ inputs (users) and $N$ outputs
(receiver array with $N$ elements at the destination node). Let $\{s_k(n)\}$ denote the $k$-th
user's transmitted sequence which is input to the MIMO doubly-selective channel with the
$k$-th user's discrete-time impulse response $\{h_k(n;l)\}$ ($N$-vector channel response at time $n$
to a unit input at time $n-l$). Consider a typical (one-hop) MANET structure in an
asynchronous mode. Assume $K$ active users with a packet length of $S$ symbols, in the coverage
area of the node under evaluation. Each node is equipped with $N$ ($\ge 1$) receive antennas
and the receiver node processes a data record block of size $T$ ($\ge S$) symbols. Various packets
can be located anywhere within this observation block. Using a sliding block approach (as
in [15,60]), we assume that the packet of interest is totally within the observation block.
(An energy detector or related approaches can be used to ensure this [15,60].) The noisy
received (baseband-equivalent, symbol-rate) signal at the node-of-interest at time $n$ is an
$N$-column vector $y(n)$, $n_1 \le n \le n_1 + T - 1$, given by ($n_1$ is the "initial" time of the
observation block)
$$y(n) = \sum_{k=1}^{K}\sum_{l=0}^{L} h_k(n;l)\,s_k(n-l) + v(n). \tag{7.7}$$
In a CE-BEM representation it is assumed that the channel for each user follows (2.9)
$$h_k(n;l) = \sum_{q=1}^{\bar Q} h_{qk}(l)\,e^{j\omega_q n}, \quad k = 1,2,\ldots,K \tag{7.8}$$
where the $N$-column vectors $h_{qk}(l)$ are invariant for the whole block $n = 0,1,\ldots,T-1$, and
$$\bar Q := 2\lceil f_d T T_s\rceil + 1,$$
$$L := \lfloor \tau_d/T_s \rfloor,$$
$$\omega_q := \frac{2\pi}{T}\left(q - \frac{1}{2} - \frac{Q}{2}\right), \quad q = 1,2,\ldots,Q.$$
In superimposed training-based approaches, for the $k$-th user, one takes
$$s_k(n) = b_k(n) + c_k(n)$$
where $\{b_k(n)\}$ is the information sequence and $\{c_k(n)\}$ is a user-specific non-random periodic
training sequence.
The main problem considered here is how to design an equalizer to estimate $\{b_1(n)\}$,
the information sequence of user 1 (the desired user), when one knows only $\{c_1(n)\}$ but
not (obviously) $\{b_1(n)\}$, and one also does not have (frame) synchronization with $\{c_1(n)\}$
at the receiver. We will design an equalizer to estimate $\{c_1(n)\}$ with a delay $d$. In a
manner similar to Section 7.2, we will then show that this equalizer is a scaled version
of the corresponding equalizer designed to estimate $\{b_1(n)\}$ with a delay $d$, provided that
$\{c_k(n)\}$ satisfies certain properties.
7.3.1 User-Specific Training Sequences
Each user is assigned (or selects) a user-specific training sequence. The sequences
$$c_k(n) := \sigma_{ck}\exp\left[j2\pi\left(\frac{n^2}{\bar P} + \epsilon_k n\right)\right], \tag{7.9a}$$
$$\epsilon_k := \frac{m-1}{D}, \quad m = 1,2,\ldots,D \ge K \tag{7.9b}$$
have been used in [58,60], which are periodic with period $P = D\bar P$ where $D$, $\bar P$, and $\sigma_{ck}$
are design parameters ($D$ and $\bar P$ are coprime). Different users are characterized by different
$\epsilon_k$'s, and distinct sequences are mutually orthogonal and individually periodic white. (There
is a common code book at each node of size $D$ containing the possible values of $\epsilon_k$. During
the "initial contact" period, a given node searches for all possible $D$ signals.) In a different
context, as in Section 3.5, we have proposed the user-specific sequences in (3.63)-(3.64) by
using an m-sequence of period $\bar P$ with $P = D\bar P$. These sequences are periodic with period
$P$, mutually orthogonal and individually "nearly" periodic-white with period $\bar P$.
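The orthogonality and whiteness claims for the chirp family (7.9) can be checked numerically. The sketch below builds one period of the sequence $\sigma\exp[j2\pi(n^2/\bar P + (m-1)n/D)]$ for each code-book entry $m$ and evaluates the periodic cross-correlation at every lag. It assumes $D = 3$ and $\bar P = 5$ (coprime, with $\bar P$ odd, an assumption of this sketch that the whiteness check relies on); the function names are illustrative.

```python
import numpy as np


def user_sequence(m, D, P_bar, sigma=1.0):
    """One period (P = D * P_bar) of the chirp in (7.9) with frequency offset (m - 1) / D."""
    n = np.arange(D * P_bar)
    return sigma * np.exp(2j * np.pi * (n ** 2 / P_bar + (m - 1) * n / D))


def periodic_crosscorr(ck, cm, tau):
    """(1/P) sum_n c_k(n) c_m*(n - tau), using the periodic extension."""
    return np.mean(ck * np.conj(np.roll(cm, tau)))


D, P_bar = 3, 5                 # coprime; P_bar odd (assumption of this sketch)
P = D * P_bar
seqs = {m: user_sequence(m, D, P_bar) for m in (1, 2, 3)}

for m1, c1 in seqs.items():
    for m2, c2 in seqs.items():
        lags = [tau for tau in range(P)
                if abs(periodic_crosscorr(c1, c2, tau)) > 1e-9]
        print(m1, m2, lags)     # nonzero lags only for m1 == m2, at multiples of P_bar
```

For matching users the correlation at lag $\tau = r\bar P$ has unit magnitude and phase $2\pi(m-1)\tau/D$, so each sequence is periodic white with period $\bar P$ up to a known phase factor, while sequences with different offsets are orthogonal at every lag.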
Given the knowledge of the time-varying channel described by CE-BEM, we investi-
gate direct design of time-varying FIR linear equalizers for doubly-selective channels using
superimposed training and without first estimating the underlying channel response.
7.3.2 Linear LS Equalizers for the Desired User
We look for a time-varying linear equalizer $g(n;l)$ ($l = 0,1,\ldots,L_e$) over the same time
block as the received data with channel model (7.8). Following the discussions in Section
7.2.2, we assume
$$g(n;l) = \sum_{q=1}^{Q} g_q(l)\,e^{j\omega_q n}, \quad n = 0,1,\ldots,T-1.$$
In order to estimate the input sequence of the desired user (user 1, with no loss of generality)
$\{s_1(n)\}$ (see (7.7)), we may seek a linear time-varying FIR estimator using CE-BEM to yield
an estimate with equalization delay $d$
$$\hat s_1(n-d) = \sum_{i=0}^{L_e} g^H(n;i)\,y(n-i).$$
Similar to Section 7.2, we also seek an LS solution $g(n;l)$ to minimize a cost such as
$$J = \frac{1}{T}\sum_{n=0}^{T-1}\left|s_1(n-d) - \hat s_1(n-d)\right|^2.$$
We first state the underlying model assumptions.
(H7.3.1) The information sequence $\{b_k(n)\}$ is zero-mean, i.i.d. (independent and identically
distributed), with $E\{|b_k(n)|^2\} = \sigma_{bk}^2$. The sequences are also independent across users ($k =
1,2,\ldots,K$).
(H7.3.2) The measurement noise $\{v(n)\}$ is zero-mean ($E\{v(n)\} = 0$), white, independent of
$\{b_k(n)\}$, with $E\{[v(n+\tau)][v(n)]^H\} = \sigma_v^2 I_N \delta(\tau)$.
(H7.3.3) The superimposed training sequence $c_k(n) = c_k(n+P)$ for all $n$ is a non-random
periodic sequence with period $P$. Let $\sigma_{ck}^2 := (1/P)\sum_{n=1}^{P}|c_k(n)|^2$. The sequences are
chosen as in (7.9), or in Section 3.5.
(H7.3.4) Record length $T$ and period $P$ satisfy $TP^{-1} > \bar Q$ and $TP^{-1}$ is an integer. Moreover,
$\bar P > L + L_e - d$ where $d$ ($\ge 0$) is the desired equalization delay.
It then follows that [58,77]
$$\frac{1}{P}\sum_{n=0}^{P-1} c_k(n)\,c_m^*(n-\tau) = \rho_k(\tau)\,\delta\big(\tau \bmod \bar P\big)\,\delta(k-m)$$
where
$$\rho_k(\tau) =
\begin{cases}
\sigma_{ck}^2\,e^{j2\pi k\tau/D} & \text{for (7.9)}\\
\sigma_{ck}^2\,e^{j2\pi(k-1)\tau/P} & \text{for the sequences in Section 3.5.}
\end{cases}$$
The periodic training sequence can be written as
$$c_k(n) = \sum_{m=0}^{P-1} c_{km}\,e^{j\alpha_m n}$$
where $\alpha_m := 2\pi m/P$. To design the time-varying linear equalizer to estimate a delayed
version of the desired user's training sequence $c_1(n-d)$ ($0 \le d \le L_e$), we have
$$\hat c_1(n-d) = \sum_{i=0}^{L_e} g_d^H(n;i)\,y(n-i)$$
where we assume that
$$g_d(n;i) = \sum_{q=1}^{Q} g_q(i)\,e^{j\omega_q n}.$$
Choose the $g_q(i)$'s to minimize the time-averaged cost
$$J_c := \frac{1}{T}\sum_{n=0}^{T-1}\left|c_1(n-d) - \hat c_1(n-d)\right|^2,$$
and by taking the derivative of $J_c$ and setting it to zero, we have
$$0 = \frac{\partial J_c}{\partial g_{q_1}^*(i_1)} = -\frac{1}{T}\sum_{n=0}^{T-1} e^{-j\omega_{q_1} n}\,y(n-i_1)\left[c_1^*(n-d) - \sum_{i=0}^{L_e}\sum_{q=1}^{Q} e^{j\omega_q n}\,y^H(n-i)\,g_q(i)\right]$$
for $i_1 = 0,1,\ldots,L_e$ and $q_1 = 1,2,\ldots,Q$. This leads to
$$\sum_{i=0}^{L_e}\sum_{q=1}^{Q}\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\,y(n-i_1)\,y^H(n-i)\right] g_q(i)
= \frac{1}{T}\sum_{n=0}^{T-1} c_1^*(n-d)\,e^{-j\omega_{q_1} n}\,y(n-i_1) =: R_c(q_1,i_1). \tag{7.10}$$
To design the time-varying linear equalizer to estimate the desired user's information
sequence $b_1(n-d)$ ($0 \le d \le L_e$), we have
$$\hat b_1(n-d) = \sum_{i=0}^{L_e} \bar g_d^H(n;i)\,y(n-i)$$
where we assume that
$$\bar g_d(n;i) = \sum_{q=1}^{Q} \bar g_q(i)\,e^{j\omega_q n}.$$
Choose the $\bar g_q(i)$'s to minimize
$$J_b := \frac{1}{T}\sum_{n=0}^{T-1}\left|b_1(n-d) - \hat b_1(n-d)\right|^2.$$
Mimicking the results for the superimposed training sequence-based equalization,
$$\sum_{i=0}^{L_e}\sum_{q=1}^{Q}\left[\frac{1}{T}\sum_{n=0}^{T-1} e^{j(\omega_q-\omega_{q_1})n}\,y(n-i_1)\,y^H(n-i)\right] \bar g_q(i)
= \frac{1}{T}\sum_{n=0}^{T-1} b_1^*(n-d)\,e^{-j\omega_{q_1} n}\,y(n-i_1) =: R_b(q_1,i_1). \tag{7.11}$$
Comparing (7.10) and (7.11), we see that (ignoring the equalizer coefficients) the left
sides of the two are identical. We now seek to establish that for large $T$, $R_c(q_1,i_1) =
\gamma R_b(q_1,i_1)$ for all $q_1,i_1$, for some scalar $\gamma$, so that $g_q(i) = \gamma\,\bar g_q(i)$ for all $i$.
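Following similar derivations as in Section 7.2.2, by defining
$$\bar\rho_1(i) := \sum_{l=0}^{L} \rho_1^*(d-i-l)\,\delta\big((d-i-l) \bmod \bar P\big)$$
we have
$$\lim_{T\to\infty} R_c(q_1,i_1) \overset{\text{m.s.}}{=}
\begin{cases}
\bar\rho_1^*(i_1)\,e^{-j\omega_{q_1} i_1}\,h_{q_1}\big((d-i_1) \bmod \bar P\big) & \text{if } 1 \le q_1 \le \bar Q\\
0 & \text{otherwise}
\end{cases} \tag{7.12}$$
for $i_1 = 0,1,\ldots,L_e$ and $q_1 = 1,2,\ldots,Q$. It is also shown that for $i_1 = 0,1,\ldots,L_e$ and
$q_1 = 1,2,\ldots,Q$ but $1 \le q_1 \le \bar Q$
$$\lim_{T\to\infty} R_b(q_1,i_1) \overset{\text{m.s.}}{=} h_{1q_1}(d-i_1)\,e^{-j\omega_{q_1} i_1}\,\sigma_{b1}^2. \tag{7.13}$$
If $\bar P > L + L_e - d$, then (7.12) equals (7.13) (within a scale factor) and $\bar\rho_1(i_1) = \sigma_{c1}^2$.
Therefore, for "large" $T$, $R_c(q_1,i_1) = \gamma R_b(q_1,i_1)$ for all $q_1,i_1$ with $\gamma = \sigma_{c1}^2/\sigma_{b1}^2$; hence
$g_q(i) = \gamma\,\bar g_q(i)$ for all $i$.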
For the desired equalizer design, we execute the following steps:
1. Pick $L_e$ and $d$ ($= L_e/2$ in the following simulations). Pick $Q \ge \tilde{Q}$, $P > L + L_e - d$.
2. Solve (7.10), given data $y(n)$, for $g_q(i)$ where $0 \le i \le L_e$ and $1 \le q \le Q$. Then
\[ g_d(n;i) = \sum_{q=1}^{Q} g_q(i) \, e^{j\omega_q n}. \]
3. The equalized output is then given by
\[ e_1(n) = \sum_{i=0}^{L_e} g_d^H(n;i) \, y(n-i) \approx \alpha_1 c_1(n-d) + \alpha_2 b_1(n-d) + \tilde{v}(n) \]
where $\tilde{v}(n)$ is the equalized noise plus multiple-user interference. Estimate $\alpha_1$ as
\[ \hat{\alpha}_1 = \frac{\frac{1}{T} \sum_{n=0}^{T-1} e_1(n) \, c_1^*(n-d)}{\frac{1}{T} \sum_{n=0}^{T-1} |c_1(n-d)|^2} = \frac{\frac{1}{T} \sum_{n=0}^{T-1} e_1(n) \, c_1^*(n-d)}{\sigma_{c1}^2}. \]
4. Define
\[ e_2(n) = e_1(n) - \hat{\alpha}_1 c_1(n-d) \approx \alpha_2 b_1(n-d) + \tilde{v}(n). \]
Then we hard-quantize $e_2(n)$ to estimate $b_1(n-d)$.
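The four steps above can be sketched in NumPy as follows. This is only a minimal illustration under stated assumptions, not the implementation used for the simulations: the CE-BEM frequencies are assumed to be $\omega_q = 2\pi(q - (Q+1)/2)/T$ (they are not restated in this section), (7.10) is solved as the least-squares problem whose normal equations it represents, a single user with a toy static two-antenna channel is simulated, and all function names are hypothetical.

```python
import numpy as np

def training_sequence(P, sigma_c, length):
    """Periodic chirp c(n) = sigma_c*exp(j*pi*n*(n+v)/P), v = 1 (P odd) or 2 (P even)."""
    v = 1 if P % 2 else 2
    n = np.arange(P)
    one_period = sigma_c * np.exp(1j * np.pi * n * (n + v) / P)
    return one_period[np.arange(length) % P]

def design_direct_equalizer(y, c, Le, d, Q):
    """Step 2: least-squares solution of (7.10); also returns e1(n) of step 3."""
    T, N = y.shape
    # Assumed CE-BEM frequencies (hypothetical convention, see lead-in).
    omega = 2 * np.pi * (np.arange(1, Q + 1) - (Q + 1) / 2) / T
    n = np.arange(Le, T)  # skip samples that would need y(n-i) with n-i < 0
    # Row n holds z(n)^H, where z(n) stacks exp(-j*omega_q*n) * y(n-i) over (q, i).
    cols = [np.exp(1j * omega[q] * n)[:, None] * y[n - i].conj()
            for q in range(Q) for i in range(Le + 1)]
    Zh = np.hstack(cols)
    target = c[n - d].conj()  # g^H z(n) ~ c(n-d)  <=>  z(n)^H g ~ c*(n-d)
    g, *_ = np.linalg.lstsq(Zh, target, rcond=None)
    e1 = (Zh @ g).conj()      # equalized output e1(n) = g^H z(n)
    return g, e1, n

def detect_bpsk(e1, c_delayed):
    """Steps 3 and 4: estimate alpha_1, remove the training part, hard-quantize."""
    alpha1_hat = np.sum(e1 * c_delayed.conj()) / np.sum(np.abs(c_delayed) ** 2)
    e2 = e1 - alpha1_hat * c_delayed
    return np.sign(e2.real), alpha1_hat

# Toy experiment: static SIMO channel (L = 2, N = 2), TIR = 1.0, mild noise.
rng = np.random.default_rng(0)
T, P, Le, Q, N = 405, 15, 6, 3, 2
d = Le // 2                                    # step 1
b = rng.choice([-1.0, 1.0], size=T)            # BPSK information sequence
c = training_sequence(P, 1.0, T)               # superimposed training
s = b + c
h = np.array([[1.0, 0.5, 0.2], [0.8, -0.4, 0.3]])
y = np.stack([np.convolve(s, h[k])[:T] for k in range(N)], axis=1)
y += 0.1 * (rng.standard_normal((T, N)) + 1j * rng.standard_normal((T, N))) / np.sqrt(2)

g, e1, n = design_direct_equalizer(y, c, Le, d, Q)
b_hat, alpha1_hat = detect_bpsk(e1, c[n - d])
ber = np.mean(b_hat != b[n - d])
print("alpha1_hat =", alpha1_hat, " BER =", ber)
```

Stacking the regressors as rows of `Zh` avoids forming the correlation matrices of (7.10) explicitly; `lstsq` then also copes with the rank deficiency that occurs when the noise is very weak.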
7.4 Simulation Examples
7.4.1 Direct FIR Equalization: Single User
In this example, we consider direct FIR linear equalization using superimposed training and CE-BEM. We generate a doubly-selective SIMO Rayleigh fading channel following Jakes' model with $N = 1$, 2, and 3, and $L = 2$ (3 taps) in (7.1).

Figure 7.1: Single-user direct FIR equalization: BER vs. SNR under $f_d = 0$ Hz and length of equalizer $L_e = 6$ with different TIR and number of receivers. [Plot: bit error rate vs. SNR (dB), 0 to 30 dB; curves for TIR = 0.3, 1.0, 2.0 and $N$ = 1, 2, 3. Plot title: Direct equalizer: $f_d = 0$ Hz, $T_s = 25\,\mu$s, $L = 2$, $L_e = 6$, $Q = 5$, $P = 15$, $T = 405$, 500 runs.]

In the simulation, we pick a data record length of $T = 405$ symbols (time duration of approximately 10 ms). The communications system described in Section 2.6.3 with $T_s = 25\,\mu$s is employed. We consider the system operating under different Doppler spreads with the number of basis functions $Q = 5$. We choose TIR = 0.3, 1.0, and 2.0, so that the average transmitted power in $\{c(n)\}$ can be less than, equal to, or larger than the power in $\{b(n)\}$. The information sequence $\{b(n)\}$ is BPSK modulated. We take the superimposed training sequence of period $P = 15$ as $c(n) = \sigma_c e^{j\pi n(n+\upsilon)/P}$ where $\upsilon = 1$ if $P$ is odd and $\upsilon = 2$ if $P$ is even, as in [59]; this sequence is periodic white.
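The "periodic white" property of the chirp training sequence can be checked numerically. The sketch below is illustrative only (the function names are hypothetical): it builds one period of $c(n)$ and verifies that its periodic autocorrelation vanishes at every nonzero lag, for both an odd and an even period.

```python
import numpy as np

def chirp_training(P, sigma_c=1.0):
    """One period of c(n) = sigma_c * exp(j*pi*n*(n+v)/P), v = 1 (P odd) or 2 (P even)."""
    v = 1 if P % 2 else 2
    n = np.arange(P)
    return sigma_c * np.exp(1j * np.pi * n * (n + v) / P)

def periodic_autocorr(c):
    """r(k) = (1/P) * sum_n c(n) c*(n-k), with indices taken modulo P."""
    P = len(c)
    return np.array([np.mean(c * np.conj(np.roll(c, k))) for k in range(P)])

r_odd = periodic_autocorr(chirp_training(15))    # period used in Section 7.4.1
r_even = periodic_autocorr(chirp_training(16))   # an even period, for comparison
print(np.abs(r_odd).round(12))
print(np.abs(r_even).round(12))
```

Equivalently, the $P$-point DFT of one period has constant magnitude, so the training power is spread evenly over its $P$ cycle frequencies.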
Figure 7.2: Single-user direct FIR equalization: BER vs. SNR under $f_d = 50$ Hz and length of equalizer $L_e = 6$ with different TIR and number of receivers. [Plot: bit error rate vs. SNR (dB), 0 to 30 dB; curves for TIR = 0.3, 1.0, 2.0 and $N$ = 1, 2, 3. Plot title: Direct equalizer: $f_d = 50$ Hz, $T_s = 25\,\mu$s, $L = 2$, $L_e = 6$, $Q = 5$, $P = 15$, $T = 405$, 500 runs.]

We first consider a single-user scenario. We assume the additive noise $\{v(n)\}$ is zero-mean, white complex-Gaussian, uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau) v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$. The (receiver) SNR refers to the energy per bit over the one-sided noise spectral density, with both the information and the superimposed training sequences counting toward the bit energy. All simulation results are based on 500 Monte Carlo runs.
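The bookkeeping behind the TIR and SNR conventions can be sketched as follows. This is a hedged illustration, not the simulation code: it takes TIR as the training-to-information power ratio $\sigma_c^2/\sigma_b^2$, and it assumes a unit-gain channel per receive antenna, one bit per BPSK symbol, and noise spectral density equal to $\sigma_v^2$; the function names are hypothetical.

```python
import numpy as np

def power_split(total_power, tir):
    """Split the transmit power so that sigma_c^2 / sigma_b^2 = TIR."""
    sigma_b2 = total_power / (1.0 + tir)
    sigma_c2 = total_power - sigma_b2
    return sigma_b2, sigma_c2

def noise_variance(total_power, snr_db):
    """sigma_v^2 for a given SNR when information AND training count toward the bit energy.

    Assumes a unit-gain channel and one bit per symbol (BPSK).
    """
    return total_power / 10 ** (snr_db / 10)

for tir in (0.3, 1.0, 2.0):
    sigma_b2, sigma_c2 = power_split(1.0, tir)
    # SNR seen by the information sequence alone is lower than the nominal SNR.
    info_snr_db = 10 * np.log10(sigma_b2 / noise_variance(1.0, 20.0))
    print(f"TIR={tir}: sigma_b^2={sigma_b2:.3f}, sigma_c^2={sigma_c2:.3f}, "
          f"information-only SNR at 20 dB nominal: {info_snr_db:.2f} dB")
```

The last column makes the trade-off concrete: a larger TIR improves the training-based equalizer design but lowers the effective SNR left for the information sequence.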
The BER results for various SNRs and $f_d$ = 0, 50, and 100 Hz are shown in Figures 7.1–7.3, respectively. More receive antennas (larger $N$) clearly improve the reception, since space diversity can be exploited. However, increasing TIR does not necessarily benefit the performance, since a higher TIR leads to more accurate equalizer taps but also to a lower effective SNR, because less power is assigned to the information sequence. Therefore, a trade-off has to be made in selecting TIR. Generally speaking, a higher SNR or more receive antennas allows more power to be allocated to superimposed training, because less interference is present. We also note the error floors in the BER curves, which are partially attributed to the modeling error of the CE-BEM in approximating the inverse of the channel.
Figure 7.3: Single-user direct FIR equalization: BER vs. SNR under $f_d = 100$ Hz and length of equalizer $L_e = 6$ with different TIR and number of receivers. [Plot: bit error rate vs. SNR (dB), 0 to 30 dB; curves for TIR = 0.3, 1.0, 2.0 and $N$ = 1, 2, 3. Plot title: Direct equalizer: $f_d = 100$ Hz, $T_s = 25\,\mu$s, $L = 2$, $L_e = 6$, $Q = 5$, $P = 15$, $T = 405$, 500 runs.]
7.4.2 Direct FIR Equalization: Multiple Users
In this example, we consider a multiple-user scenario in a wireless ad hoc network. We set the number of users $K = 3$. For each user, the channel follows that described in Section 7.4.1, i.e., a doubly-selective SIMO Rayleigh fading channel following Jakes' model with $L = 2$ (3 taps). Now we take the number of receive antennas $N$ = 1, 2, 3, and 4. For different users, the channels $h_k(n;l)$ are mutually independent. We also employ the communications system described in Section 2.6.3 with $T_s = 25\,\mu$s. In the simulation, we pick a data record length of $T = 832$ symbols. We assume the additive noise $\{v(n)\}$ is zero-mean, white complex-Gaussian, uncorrelated with $\{b(n)\}$, with $E\{v(n+\tau) v^H(n)\} = \sigma_v^2 I_N \delta(\tau)$. The (receiver) SNR refers to the energy per bit over the one-sided noise spectral density, with both the information and the superimposed training sequences counting toward the bit energy. All simulation results are based on 500 Monte Carlo runs.

Figure 7.4: Multiple-user direct FIR equalization (ad hoc): BER vs. SNR under $f_d = 100$ Hz and length of equalizer $L_e = 4$ with different number of receivers. [Plot: bit error rate vs. SNR (dB), 0 to 30 dB; curves for $N$ = 1, 2, 3, 4. Plot title: Direct equalizer (ad hoc): $K = 3$, $f_d = 100$ Hz, $T_s = 25\,\mu$s, $L = 2$, $L_e = 4$, $Q = 7$, $P = 52$, TIR = 1.0, $T = 832$, 500 runs.]
Information sequences for each user are BPSK. We take the superimposed training sequences' period $P = 52$ with $D = 4$ and $\tilde{P} = 13$ in (7.9). The average transmitted power in $\{c_k(n)\}$ is equal to the power in $\{b_k(n)\}$, leading to a TIR of 1.0.
In the simulation, we consider an asynchronous case where the observation window fully contains the desired user's signal and the other two interfering signals ($k = 2, 3$) occupy the window $[t_k, t_k + T - 1]$, where $t_k$ is uniformly distributed in $[-T+1, T-1]$; $t_k$ changes from run to run.
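The asynchronous arrival of the interfering users can be sketched as below. This is an illustration under assumptions: the offset $t_k$ is drawn as an integer number of symbols, the observation window is indexed $0, \ldots, T-1$, each interferer is a burst of $T$ samples that is zero outside its own window, and the function name is hypothetical.

```python
import numpy as np

def place_in_window(x, t_k):
    """Return the part of burst x, occupying [t_k, t_k+T-1], that falls inside [0, T-1]."""
    T = len(x)
    out = np.zeros(T, dtype=x.dtype)
    start, stop = max(t_k, 0), min(t_k + T, T)   # overlap with the observation window
    out[start:stop] = x[start - t_k: stop - t_k]
    return out

rng = np.random.default_rng(3)
T = 832
for k in (2, 3):
    t_k = int(rng.integers(-T + 1, T))           # uniform on {-T+1, ..., T-1}
    burst = rng.choice([-1.0, 1.0], size=T)      # stand-in for user k's received burst
    seen = place_in_window(burst, t_k)
    print(f"user {k}: t_k = {t_k}, overlapping samples = {np.count_nonzero(seen)}")
```

With this offset range every interferer overlaps the observation window in at least one sample and at most $T$ samples.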
Figure 7.4 shows the BER results versus SNR for a channel with $f_d = 100$ Hz. We take the equalizer length $L_e = 4$ and the number of basis functions $Q = 7$. More receive antennas (larger $N$) enhance the reception significantly. However, due to the presence of MUI and the modeling error of CE-BEM, noticeable error floors can be observed in each curve.

Figure 7.5: Multiple-user direct FIR equalization (ad hoc): BER vs. Doppler spread $f_d$ under SNR = 25 dB and length of equalizer $L_e = 4$ with different number of receivers. [Plot: bit error rate vs. $f_d$ (Hz), 0 to 200 Hz; curves for $N$ = 1, 2, 3, 4. Plot title: Direct equalizer (ad hoc): $K = 3$, SNR = 25 dB, $T_s = 25\,\mu$s, $L = 2$, $L_e = 4$, $Q = 2\lceil f_d T T_s \rceil + 1$, $P = 52$, TIR = 1.0, $T = 832$, 500 runs.]

Figure 7.5 exhibits the BERs for various Doppler spreads. We again take $L_e = 4$, but $Q = \tilde{Q} = 2\lceil f_d T T_s \rceil + 1$, as a function of $f_d$. We can see that the BERs gradually deteriorate with increasing Doppler spread $f_d$. The curves of BER versus the equalizer length $L_e$ for $f_d = 50$ Hz are displayed in Figure 7.6. A longer $L_e$ may equalize the received signal better, but more taps add to the estimation variance. We can see that $L_e = 4$ is in the neighborhood of the optimal length. The BERs in all these figures are rather high, due to MUI. This can be alleviated by error-correction coding.
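The rule $Q = 2\lceil f_d T T_s \rceil + 1$ for the number of basis functions is easy to tabulate. The short sketch below (hypothetical function name) evaluates it for the record length and symbol duration of this example; for $f_d = 100$ Hz it returns $Q = 7$, the value used for Figure 7.4.

```python
import math

def num_basis_functions(fd_hz, T, Ts):
    """Q = 2*ceil(fd*T*Ts) + 1 for a block of T symbols, each of duration Ts seconds."""
    return 2 * math.ceil(fd_hz * T * Ts) + 1

T, Ts = 832, 25e-6                     # record length and symbol duration of Section 7.4.2
for fd in (0, 50, 100, 150, 200):
    print(f"fd = {fd:3d} Hz -> Q = {num_basis_functions(fd, T, Ts)}")
```

Since each basis function adds $N(L_e+1)$ unknown equalizer coefficients, a larger Doppler spread means more parameters to estimate from the same $T$ samples, which is one reason the BER degrades as $f_d$ grows.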
Figure 7.6: Multiple-user direct FIR equalization (ad hoc): BER vs. length of equalizer $L_e$ under $f_d = 50$ Hz and SNR = 25 dB with different number of receivers. [Plot: bit error rate vs. $L_e$, 2 to 10; curves for $N$ = 1, 2, 3, 4. Plot title: Direct equalizer (ad hoc): $K = 3$, $f_d = 50$ Hz, SNR = 25 dB, $T_s = 25\,\mu$s, $L = 2$, $Q = 4$, $P = 52$, TIR = 1.0, $T = 832$, 500 runs.]
7.5 Conclusions
In this chapter, design of doubly-selective linear equalizers for single- and multiple-user
frequency-selective time-varying channels was considered, using superimposed training and
without first estimating the underlying channel response. Assuming that both the time-
varying channel and the linear equalizers can be described by a CE-BEM representation,
we showed that if periodic white superimposed training sequences are used, the optimal
linear equalizer designed to extract the known training sequence was also a scaled version
of the optimal equalizer for the information sequence. Based on this fact, a single-user
direct equalizer was designed. By employing user-specific training sequences, this equalizer
was extended to a multiple-user scenario, which can be used in a wireless ad hoc network.
Chapter 8
Concluding Remarks and Future Work
This dissertation considered channel estimation, equalization, and data detection, using superimposed training and BEMs, for wireless systems in doubly-selective channels. Building on a detailed analysis of the interference from the information sequence in superimposed training-based methods, our most important contribution is a set of approaches for suppressing this self-interference.
Typical wireless channels are characterized by time- and frequency-selectivity: multipath propagation and limited bandwidth result in frequency selectivity, leading to ISI; temporal variation of the channel is attributed to the relative motion between the transmitter and the receiver, as well as oscillator drifts and phase noise.
In superimposed training-based estimation and equalization, a periodic (non-random) training sequence is superimposed (at low power) on the information sequence at the transmitter, before modulation and transmission. Compared with conventional TM training, superimposed training incurs no loss in data transmission rate, but some useful power has to be allocated to the training sequence.
We described the doubly-selective channel, over a block of symbol intervals, by various BEMs, including CE-, OP-, and DPS-BEMs (whereas we employed Jakes' model, which is independent of the BEMs, to generate the "true" channel in simulations), so that estimating the time-varying channel was reduced to estimating fewer time-invariant parameters.
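The reduction from a time-varying tap to a handful of time-invariant coefficients can be illustrated with a least-squares CE-BEM fit. The sketch below is not taken from the dissertation: the basis frequencies $\omega_q = 2\pi(q - (Q+1)/2)/T$ are an assumed convention, the "true" tap is a toy sum of sinusoids with Doppler shifts in $[-f_d, f_d]$ rather than a full Jakes simulator, and the function names are hypothetical.

```python
import numpy as np

def ce_bem_basis(T, Q):
    """T x Q matrix whose columns are exp(j*omega_q*n), n = 0..T-1 (assumed omega_q)."""
    n = np.arange(T)[:, None]
    q = np.arange(1, Q + 1)[None, :]
    omega = 2 * np.pi * (q - (Q + 1) / 2) / T
    return np.exp(1j * omega * n)

def fit_ce_bem(h, Q):
    """Least-squares BEM coefficients of a time-varying tap h(n), and the fitted tap."""
    B = ce_bem_basis(len(h), Q)
    coeffs, *_ = np.linalg.lstsq(B, h, rcond=None)
    return coeffs, B @ coeffs

def nmse(h, h_fit):
    return np.sum(np.abs(h - h_fit) ** 2) / np.sum(np.abs(h) ** 2)

rng = np.random.default_rng(1)
T, Ts, fd, M = 405, 25e-6, 100.0, 16
n = np.arange(T)
theta = rng.uniform(0, 2 * np.pi, M)              # arrival angles of M toy paths
phi = rng.uniform(0, 2 * np.pi, M)                # random initial phases
doppler = 2 * np.pi * fd * np.cos(theta)[:, None] * n[None, :] * Ts
h = np.sum(np.exp(1j * (doppler + phi[:, None])), axis=0) / np.sqrt(M)

Q = 2 * int(np.ceil(fd * T * Ts)) + 1             # number of basis functions
coeffs, h_fit = fit_ce_bem(h, Q)
_, h_const = fit_ce_bem(h, 1)                     # time-invariant model for comparison
print(f"T = {T} samples described by Q = {Q} coefficients")
print(f"NMSE with Q = {Q}: {nmse(h, h_fit):.3e}, with Q = 1: {nmse(h, h_const):.3e}")
```

The residual error of the fit reflects the spectral leakage of the CE-BEM, which is the modeling error that the DPS- and OP-BEMs are meant to reduce.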
8.1 Summary of Original Work
In Chapter 3, beginning with the first-order statistics-based channel estimator proposed in [81] that exploits superimposed training and CE-BEM, we extended this estimator to DPS- and OP-BEMs so that the spectral leakage induced by CE-BEM can be reduced. The OP-BEM-based estimator, moreover, can be applied to any BEM representation, and thus has a more general structure. By utilizing user-specific superimposed training sequences, we assigned distinct cycle frequencies of the periodic training sequences to distinct users so that channel estimation across different users is decoupled and the single-user estimator can be applied to this multiple-user scenario.
Performance analysis of the first-order statistics-based estimator proposed in the previous chapter was discussed in Chapter 4. Although the modeling error of the BEMs and the noise contribute to the channel estimation mismatch, the performance analysis clearly shows that the major interference when using superimposed training comes from the unknown information sequences. Power allocation and the bias-variance trade-off of the first-order statistics-based estimator were also considered in that chapter, based on the results of the performance analysis. We cast these optimization issues as maximization of an SNR for equalizer design.
How to reduce the information-induced self-interference was the topic of the following two chapters. In Chapter 5, by exploiting the fact that training and information sequences pass through an identical channel, an iterative DML approach was proposed to jointly improve the channel and sequence estimation. Beginning with the first-order statistics-based channel estimator, we used the detected data symbols from the preceding iteration to reduce the self-interference at the current iteration. Convergence to a local maximum of the DML function is guaranteed by this method. To reduce the computational complexity of the ML detection, symbol detection techniques such as Kalman filtering can also be adopted instead of the Viterbi algorithm used in the DML approach.
In contrast to the receiver-end DML approach, a data-dependent superimposed training scheme, which is a transmitter-end processing technique, was proposed in Chapter 6. Inspired by the work of [20], we observed that over a channel satisfying a band-limited BEM such as CE- or DPS-BEM, the periodic superimposed training components within the received signal occur only at certain frequencies. By designing a data-dependent superimposed training sequence that suppresses the information sequence at those frequencies, the self-interference adversely affecting the channel estimation at the receiver is greatly reduced. However, the suppressed frequency components of the information sequence carry "information" as well. Therefore, a PDD superimposed training method was proposed to strike a trade-off between self-interference cancellation and information integrity. Performance analysis and related optimization of parameters were also discussed.
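The idea of suppressing the information sequence at the training frequencies can be sketched with a DFT mask. This is a simplified illustration and not the scheme of Chapter 6 itself: it assumes the block length $T$ is a multiple of the training period $P$, so that the training occupies DFT bins $kT/P$, and it nulls the information on those bins plus a guard of `half_width` bins on each side, where `half_width` is a hypothetical parameter standing in for the Doppler spreading of the BEM. All function names are hypothetical.

```python
import numpy as np

def training_bins(T, P, half_width):
    """DFT bins k*T/P (k = 0..P-1) plus `half_width` neighbours on each side, modulo T."""
    centers = np.arange(P) * (T // P)
    offsets = np.arange(-half_width, half_width + 1)
    return np.unique((centers[:, None] + offsets[None, :]) % T)

def component_on_bins(b, bins):
    """Part of b whose DFT lives on `bins`; subtracting it nulls b on those bins."""
    B = np.fft.fft(b)
    mask = np.zeros(len(b), dtype=bool)
    mask[bins] = True
    return np.fft.ifft(np.where(mask, B, 0))

rng = np.random.default_rng(4)
T, P = 405, 15                                   # T = 27 * P
n = np.arange(T)
b = rng.choice([-1.0, 1.0], size=T)              # BPSK information block
c = np.exp(1j * np.pi * n * (n + 1) / P)         # periodic chirp training (P odd)
bins = training_bins(T, P, half_width=2)

e = component_on_bins(b, bins)                   # data-dependent part of the training
s = b + c - e                                    # transmitted block

leak_before = np.max(np.abs(np.fft.fft(b)[bins]))
leak_after = np.max(np.abs(np.fft.fft(b - e)[bins]))
print(f"information on training bins: before {leak_before:.2f}, after {leak_after:.2e}")
print(f"fraction of bins removed from the information: {len(bins) / T:.3f}")
```

The second printed line quantifies the price noted above: the nulled bins also carried information, which is what motivates a partially data-dependent compromise.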
In Chapter 7, we considered direct equalization, without first estimating the doubly-selective channel, using superimposed training and CE-BEM. By exploiting periodic white training sequences, we showed that the optimal linear equalizer designed to extract the known training sequence is also a scaled version of the optimal equalizer for the information sequence. A direct equalizer was designed based on this fact. By employing user-specific training sequences, this direct equalizer was extended to a multiple-user scenario, which can be used in a wireless ad hoc network.
Computer simulation examples illustrated the performance of our approaches and compared them with conventional TM training schemes. The performance of the first-order statistics-based estimator is inferior to that of the TM training-based estimator, due to the existence of information-induced self-interference. However, when the self-interference is sufficiently suppressed by the DML approach or the PDD training, superimposed training can offer performance competitive with TM training, without incurring any data rate loss, and thus provides a promising training-based technique at a higher transmission rate.
8.2 Possible Future Directions
So far we have discussed channel estimation and equalization using superimposed training and various BEM representations. Future work may include the following areas.
First, the capacity of systems employing superimposed training should be investigated thoroughly. In other words, a fundamental question should be answered: how can superimposed training help achieve more capacity than TM training? As we discussed in Chapter 1, several researchers have explored this area (e.g., [4,5,8,73], among others) and obtained important results; however, conclusive statements for more general situations (e.g., over time- or frequency-selective channels or doubly-selective channels) are still open.
Another potential direction may lie in incorporating superimposed training into other widely used techniques. For example, in orthogonal frequency division multiplexing (OFDM) systems, frequency-multiplexed training is used, where equally spaced pilot (training) tones enable the receiver to achieve an MMSE estimate of the channel [54]. This frequency-multiplexed training scheme, however, can be viewed as superimposed training in the time domain. Therefore, how to utilize our results on superimposed training in OFDM systems becomes an interesting topic.
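The remark that frequency-multiplexed pilots amount to superimposed training in the time domain follows from the linearity of the inverse DFT, and can be checked with a few lines. The sketch is illustrative only; the FFT size, pilot spacing, and pilot values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fft, spacing = 64, 8
pilot_idx = np.arange(0, n_fft, spacing)                 # equally spaced pilot tones
data_idx = np.setdiff1d(np.arange(n_fft), pilot_idx)

X_data = np.zeros(n_fft, dtype=complex)
X_data[data_idx] = rng.choice([-1.0, 1.0], size=len(data_idx))
X_pilot = np.zeros(n_fft, dtype=complex)
X_pilot[pilot_idx] = np.exp(1j * rng.uniform(0, 2 * np.pi, len(pilot_idx)))

x = np.fft.ifft(X_data + X_pilot)      # transmitted OFDM symbol (time domain)
b_t = np.fft.ifft(X_data)              # time-domain "information" component
c_t = np.fft.ifft(X_pilot)             # time-domain "training" component

period = n_fft // spacing              # a comb with this spacing is periodic in time
print("x = b + c in time domain:", np.allclose(x, b_t + c_t))
print("training component has period", period, ":", np.allclose(c_t, np.roll(c_t, period)))
```

In the time domain the transmitted symbol is therefore the sum of an information sequence and a periodic training sequence, with the extra property that the information has no energy on the pilot tones.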
We should also consider a more general training structure, affine precoding with training [89], which treats the transmitted data block as
\[ s = Fb + c, \]
where $s = [\, s(0) \;\; s(1) \;\; \cdots \;\; s(T-1) \,]^T$ is a transmitted block, and $c$ is the training sequence of the same size; $b = [\, b(0) \;\; b(1) \;\; \cdots \;\; b(T_b-1) \,]^T$ is the information sequence of length $T_b \le T$, and the $T \times T_b$ matrix $F$ is the affine precoder. We can clearly see that TM and superimposed training can both be viewed as special cases of affine precoding. In our data-dependent training scheme, our work is equivalent to designing a precoder $F$ that assigns training and information sequences to different dimensions so as to eliminate self-interference. The information loss of this scheme is due to $F$ not being full-rank. We hence design a full-rank $F$ that corresponds to the PDD superimposed training. Therefore, affine precoding can offer us more freedom in designing the communications system, so that better performance than superimposed or TM training can be expected; it is also a promising area.
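The two special cases of $s = Fb + c$ mentioned above can be written out explicitly. The following sketch is illustrative only (block length, training positions, and function names are assumptions): superimposed training uses $F = I$ with a training sequence that is nonzero everywhere, while TM training uses a tall selection matrix $F$ that places the information symbols in the slots not reserved for training.

```python
import numpy as np

def superimposed_precoder(T):
    """F = I_T: every transmitted sample carries information plus training."""
    return np.eye(T)

def tm_precoder(T, train_pos):
    """T x T_b selection matrix mapping b onto the non-training slots."""
    data_pos = np.setdiff1d(np.arange(T), train_pos)
    F = np.zeros((T, len(data_pos)))
    F[data_pos, np.arange(len(data_pos))] = 1.0
    return F

rng = np.random.default_rng(5)
T = 12
train_pos = np.arange(4)                          # first 4 slots reserved for training

# Superimposed training: T_b = T, low-power training added to every sample.
b_si = rng.choice([-1.0, 1.0], size=T)
c_si = 0.5 * np.ones(T)
s_si = superimposed_precoder(T) @ b_si + c_si

# Time-multiplexed training: T_b = T - 4, training and data in disjoint slots.
F_tm = tm_precoder(T, train_pos)
b_tm = rng.choice([-1.0, 1.0], size=F_tm.shape[1])
c_tm = np.zeros(T)
c_tm[train_pos] = 1.0
s_tm = F_tm @ b_tm + c_tm

print("superimposed block:", s_si)
print("TM block:          ", s_tm)
print("TM: F^T c =", F_tm.T @ c_tm)               # zero: training is orthogonal to the data
```

In the TM case the columns of $F$ are orthogonal to $c$, so there is no self-interference, but $T_b < T$; in the superimposed case $T_b = T$ at the cost of the information and training sharing the same dimensions.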
Bibliography
[1] E. Alameda-Hernández, D. C. McLernon, A. G. Orozco-Lugo, M. Lara, and M. Ghogho, "Synchronisation for superimposed training based channel estimation," Electron. Lett., vol. 41, no. 9, pp. 565–566, Apr. 2005.
[2] I. Barhumi, G. Leus, and M. Moonen, "Time-varying FIR equalization for doubly selective channels," IEEE Trans. Wireless Commun., vol. 4, no. 1, pp. 202–214, Jan. 2005.
[3] E. Biglieri, J. Proakis, and S. Shamai, "Fading channels: information-theoretic and communications aspects," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2619–2692, Oct. 1998.
[4] P. Bohlin and M. Tapio, "Optimized data aided training in MIMO systems," in Proc. IEEE VTC'04-Spring, Milan, Italy, May 17–19, 2004, pp. 679–683.
[5] P. Bohlin and M. Tapio, "Performance evaluation of MIMO communication systems based on superimposed pilots," in Proc. IEEE ICASSP'04, vol. 4, Montreal, Canada, May 17–21, 2004, pp. 425–428.
[6] D. K. Borah and B. D. Hart, "Frequency-selective fading channel estimation with a polynomial time-varying channel model," IEEE Trans. Commun., vol. 47, no. 6, pp. 862–873, June 1999.
[7] R. Bosisio, M. Nicoli, and U. Spagnolini, "Kalman filter of channel modes in time-varying wireless systems," in Proc. IEEE ICASSP'05, vol. 3, Philadelphia, PA, Mar. 18–23, 2005, pp. 785–788.
[8] C. Budianu and L. Tong, "Channel estimation for space-time orthogonal block codes," IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2515–2528, Oct. 2002.
[9] N. Chen and G. T. Zhou, "What is the price paid for superimposed training in OFDM," in Proc. IEEE ICASSP'04, vol. 4, Montreal, Canada, May 17–21, 2004, pp. 421–424.
[10] N. Chen and G. T. Zhou, "Superimposed training for OFDM: a peak-to-average power ratio analysis," IEEE Trans. Signal Process., vol. 54, no. 6, pp. 2277–2287, June 2006.
[11] W. Chen and R. Zhang, "Estimation of time and frequency selective channels in OFDM systems: a Kalman filter structure," in Proc. IEEE GLOBECOM'04, vol. 2, Dallas, TX, Nov. 29–Dec. 3, 2004, pp. 800–803.
[12] R. H. Clarke, "A statistical theory of mobile-radio reception," Bell Syst. Tech. J., vol. 47, no. 6, pp. 957–1000, July-Aug. 1968.
[13] T. Cui and C. Tellambura, "Superimposed pilot symbols for channel estimation in OFDM systems," in Proc. IEEE GLOBECOM'05, San Francisco, CA, Nov. 28–Dec. 2, 2005, pp. 2229–2233.
[14] A. V. Dandawaté and G. B. Giannakis, "Asymptotic theory of mixed time averages and kth-order cyclic-moment and cumulant statistics," IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 216–232, Jan. 1995.
[15] R. Djapic, A.-J. van der Veen, and L. Tong, "Synchronization and packet separation in wireless ad hoc networks by known modulus algorithm," IEEE J. Sel. Areas Commun., vol. 23, no. 1, pp. 51–64, Jan. 2005.
[16] M. Dong, L. Tong, and B. M. Sadler, "Optimal insertion of pilot symbols for transmissions over time-varying flat fading channels," IEEE Trans. Signal Process., vol. 52, no. 5, pp. 1403–1418, May 2004.
[17] B. Farhang-Boroujeny, "Pilot-based channel identification: proposal for semi-blind identification of communication channels," Electron. Lett., vol. 31, no. 13, pp. 1044–1046, June 1995.
[18] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Commun., vol. 6, no. 3, pp. 311–335, Mar. 1998.
[19] M. Ghogho, "Channel and DC-offset estimation using data-dependent superimposed training," Electron. Lett., vol. 41, no. 22, pp. 1250–1251, Oct. 2005.
[20] M. Ghogho, D. McLernon, E. Alameda-Hernandez, and A. Swami, "Channel estimation and symbol detection for block transmission using data-dependent superimposed training," IEEE Signal Process. Lett., vol. 12, no. 3, pp. 226–229, Mar. 2005.
[21] M. Ghogho, D. McLernon, E. Alameda-Hernandez, and A. Swami, "SISO and MIMO channel estimation and symbol detection using data-dependent superimposed training," in Proc. IEEE ICASSP'05, vol. 3, Philadelphia, PA, Mar. 18–23, 2005, pp. 461–464.
[22] M. Ghogho and A. Swami, "Optimal training for affine-precoded and cyclic-prefixed block transmissions," in Proc. IEEE Workshop Statistical Signal Processing, Bordeaux, France, July 17–20, 2005, pp. 1358–1363.
[23] G. B. Giannakis, Y. Hua, P. Stoica, and L. Tong, Eds., Signal Processing Advances in Wireless and Mobile Communications Volume 1: Trends in Channel Estimation and Equalization. Englewood Cliffs, NJ: Prentice Hall, 2001.
[24] G. B. Giannakis and C. Tepedelenlioğlu, "Basis expansion models and diversity techniques for blind identification and equalization of time-varying channels," Proc. IEEE, vol. 86, no. 10, pp. 1969–1986, Oct. 1998.
[25] A. Gorokhov and P. Loubaton, "Semi-blind second order identification of convolutive channels," in Proc. IEEE ICASSP'97, vol. 5, Munich, Germany, Apr. 21–24, 1997, pp. 3905–3908.
[26] F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," Proc. IEEE, vol. 66, no. 1, pp. 51–83, Jan. 1978.
[27] S. He and J. K. Tugnait, "Direct equalization of multiuser doubly selective channels based on superimposed training," in Proc. EUSIPCO'06, Florence, Italy, Sept. 4–8, 2006.
[28] S. He and J. K. Tugnait, "Doubly-selective channel estimation using superimposed training and discrete prolate spheroidal basis models," in Proc. IEEE GLOBECOM'06, San Francisco, CA, Nov. 27–Dec. 1, 2006.
[29] S. He and J. K. Tugnait, "On bias-variance trade-off in superimposed training-based doubly selective channel estimation," in Proc. 2006 Conf. Inform. Sciences & Syst., Princeton University, NJ, Mar. 22–24, 2006, pp. 1308–1313.
[30] S. He and J. K. Tugnait, "Doubly-selective multiuser channel estimation using superimposed training and discrete prolate spheroidal basis expansion models," in Proc. IEEE ICASSP'07, vol. 2, Honolulu, HI, Apr. 15–20, 2007, pp. 861–864.
[31] S. He and J. K. Tugnait, "Self-interference suppression in doubly-selective channel estimation using superimposed training," in Proc. IEEE ICC'07, Glasgow, UK, June 24–28, 2007.
[32] S. He, J. K. Tugnait, and X. Meng, "On superimposed training for MIMO channel estimation and symbol detection," IEEE Trans. Signal Process., vol. 55, no. 6, June 2007.
[33] P. Hoeher and F. Tufvesson, "Channel estimation with superimposed pilot sequence," in Proc. IEEE GLOBECOM'99, Rio de Janeiro, Dec. 5–9, 1999, pp. 2162–2166.
[34] IEEE J. Sel. Areas Commun., Special Issue on Wireless Ad Hoc Networks, vol. 23, Jan. 2005.
[35] W. C. Jakes, Microwave Mobile Communications. New York, NY: Wiley, 1974.
[36] C. E. Kastenholz and W. P. Birkemeier, "A simultaneous information transfer and channel sounding modulation technique for wide-band channels," IEEE Trans. Commun., vol. 13, no. 2, pp. 162–165, June 1965.
[37] X. Li and T. F. Wong, "Turbo equalization with nonlinear Kalman filtering for time-varying frequency-selective fading channels," IEEE Trans. Wireless Commun., vol. 6, no. 2, pp. 691–700, Feb. 2007.
[38] Z. Liu, X. Ma, and G. B. Giannakis, "Space-time coding and Kalman filtering for time-selective fading channels," IEEE Trans. Commun., vol. 50, no. 2, pp. 183–186, Feb. 2002.
[39] X. Ma and G. B. Giannakis, "Maximum-diversity transmissions over doubly selective wireless channels," IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1832–1840, July 2003.
[40] X. Ma, G. B. Giannakis, and S. Ohno, "Optimal training for block transmissions over doubly selective channels," IEEE Trans. Signal Process., vol. 51, no. 5, pp. 1351–1366, May 2003.
[41] X. Ma, L. Yang, and G. B. Giannakis, "Optimal training for MIMO frequency-selective fading channels," IEEE Trans. Wireless Commun., vol. 4, no. 2, pp. 453–466, Mar. 2005.
[42] J. H. Manton, I. Y. Mareels, and Y. Hua, "Affine precoders for reliable communications," in Proc. IEEE ICASSP'00, Istanbul, Turkey, June 5–9, 2000, pp. 2749–2752.
[43] M. Martone, "Wavelet-based separating kernels for array processing of cellular DS/CDMA signals in fast fading," IEEE Trans. Commun., vol. 48, no. 6, pp. 979–995, June 2000.
[44] F. Mazzenga, "Channel estimation and equalization for m-QAM transmission with a hidden pilot sequence," IEEE Trans. Broadcast., vol. 46, no. 2, pp. 170–176, June 2000.
[45] D. C. McLernon, A. G. Orozco-Lugo, and M. M. Lara, "On the structural equivalence of two recent algorithms for implicitly trained channel estimation," in Proc. IEEE Int. Symposium Signal Processing and Inform. Technol., Rome, Italy, Dec. 18–21, 2004, pp. 132–135.
[46] X. Meng, "Estimation of wireless communications channels using superimposed training: Approaches, analysis and applications," Ph.D. dissertation, Auburn University, Auburn, AL, May 2005.
[47] X. Meng and J. K. Tugnait, "Doubly-selective MIMO channel estimation using superimposed training," in Proc. IEEE Sensor Array and Multichannel Signal Processing Workshop, Barcelona, Spain, July 18–21, 2004, pp. 407–411.
[48] X. Meng and J. K. Tugnait, "MIMO channel estimation using superimposed training," in Proc. IEEE ICC'04, Paris, France, June 20–24, 2004, pp. 2663–2667.
[49] X. Meng and J. K. Tugnait, "Semi-blind channel estimation and detection using superimposed training," in Proc. IEEE ICASSP'04, vol. 4, Montreal, Canada, May 17–21, 2004, pp. 417–420.
[50] X. Meng and J. K. Tugnait, "Semi-blind time-varying channel estimation using superimposed training," in Proc. IEEE ICASSP'04, vol. 3, Montreal, Canada, May 17–21, 2004, pp. 797–800.
[51] X. Meng and J. K. Tugnait, "Superimposed training-based doubly-selective channel estimation using exponential and polynomial bases models," in Proc. 2004 Conf. Inform. Sciences & Syst., Princeton University, NJ, Mar. 17–19, 2004.
[52] X. Meng, J. K. Tugnait, and S. He, "Iterative joint channel estimation and data detection using superimposed training: algorithms and performance analysis," IEEE Trans. Veh. Technol., to be published.
[53] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing. Upper Saddle River, NJ: Prentice Hall, 2000.
[54] R. Negi and J. Cioffi, "Pilot tone selection for channel estimation in a mobile OFDM system," IEEE Trans. Consum. Electron., vol. 44, no. 3, pp. 1122–1128, Aug. 1998.
[55] M. Niedźwiecki, Identification of Time-Varying Processes. New York, NY: Wiley, 2000.
[56] S. Ohno and G. B. Giannakis, "Optimal training and redundant precoding for block transmissions with application to wireless OFDM," IEEE Trans. Commun., vol. 50, no. 12, pp. 2113–2123, Dec. 2002.
[57] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1999.
[58] A. G. Orozco-Lugo, G. M. Galvan-Tejada, M. M. Lara, and D. C. McLernon, "A new approach to achieve multiple packet reception for ad hoc networks," in Proc. IEEE ICASSP'04, vol. 4, Montreal, Canada, May 17–21, 2004, pp. 429–432.
[59] A. G. Orozco-Lugo, M. M. Lara, and D. C. McLernon, "Channel estimation using implicit training," IEEE Trans. Signal Process., vol. 52, no. 1, pp. 240–254, Jan. 2004.
[60] A. G. Orozco-Lugo, M. M. Lara, D. C. McLernon, and H. J. Muro-Lemus, "Multiple packet reception in wireless ad hoc networks using polynomial phase-modulating sequences," IEEE Trans. Signal Process., vol. 51, no. 8, pp. 2093–2110, Aug. 2003.
[61] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge, UK: Cambridge University Press, 2003.
[62] H. V. Poor and S. Verdú, "Probability of error in MMSE multiuser detection," IEEE Trans. Inf. Theory, vol. 43, no. 3, pp. 858–871, May 1997.
[63] M. F. Pop and N. C. Beaulieu, "Limitations of sum-of-sinusoids fading channel simulators," IEEE Trans. Commun., vol. 49, no. 4, pp. 699–708, Apr. 2001.
[64] J. G. Proakis, Digital Communications, 4th ed. New York, NY: McGraw-Hill, 2001.
[65] J. G. Proakis and D. G. Manolakis, Digital Signal Processing, 3rd ed. Upper Saddle River, NJ: Prentice Hall, 1996.
[66] T. S. Rappaport, Wireless Communications: Principles and Practice, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2002.
[67] A. M. Sayeed and B. Aazhang, "Joint multipath-Doppler diversity in mobile wireless communications," IEEE Trans. Commun., vol. 47, no. 1, pp. 123–132, Jan. 1999.
[68] N. Seshadri, "Joint data and channel estimation using blind trellis search techniques," IEEE Trans. Commun., vol. 42, no. 2-4, pp. 1000–1011, Feb.–Apr. 1994.
[69] D. Slepian, "Prolate spheroidal wave functions, Fourier analysis, and uncertainty–V: the discrete case," Bell Syst. Tech. J., vol. 57, no. 5, pp. 1371–1430, May-June 1978.
[70] M. D. Srinath, P. K. Rajasekaran, and R. Viswanathan, Introduction to Statistical Signal Processing with Applications. Upper Saddle River, NJ: Prentice Hall, 1996.
[71] G. L. Stüber, Principles of Mobile Communication, 2nd ed. Boston, MA: Kluwer, 2002.
[72] Z. Tang, R. C. Cannizzaro, G. Leus, and P. Banelli, "Pilot-assisted time-varying channel estimation for OFDM systems," IEEE Trans. Signal Process., vol. 55, no. 5, pp. 2226–2238, May 2007.
[73] M. Tapio and P. Bohlin, "A capacity comparison between time-multiplexed and superimposed pilots," in Proc. 38th Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 7–10, 2004, pp. 1049–1053.
[74] L. Tong, B. M. Sadler, and M. Dong, "Pilot-assisted wireless transmissions: general model, design criteria, and signal processing," IEEE Signal Process. Mag., vol. 21, no. 6, pp. 12–25, Nov. 2004.
[75] L. Tong, Q. Zhao, and G. Mergen, "Multipacket reception in random access wireless networks: from signal processing to optimal medium access control," IEEE Commun. Mag., vol. 39, no. 11, pp. 108–112, Nov. 2001.
[76] F. Tsuzuki and T. Ohtsuki, "Channel estimation with selective superimposed pilot sequences under fast fading environments," in Proc. IEEE VTC'04-Fall, Los Angeles, CA, Sept. 26–29, 2004, pp. 62–66.
[77] J. K. Tugnait and S. He, "Performance analysis of an MIMO channel estimator based on superimposed training and first-order statistics," in Proc. IEEE Workshop Statistical Signal Processing, Bordeaux, France, July 17–20, 2005, pp. 1336–1341.
[78] J. K. Tugnait and S. He, "Direct FIR linear equalization of doubly selective channels based on superimposed training," in Proc. IEEE ICASSP'06, vol. 4, Toulouse, France, May 14–19, 2006, pp. 589–592.
[79] J. K. Tugnait and S. He, "Doubly-selective channel estimation using data-dependent superimposed training and exponential bases models," in Proc. 2006 Conf. Inform. Sciences & Syst., Princeton University, NJ, Mar. 22–24, 2006, pp. 375–380.
[80] J. K. Tugnait, S. He, and X. Meng, "On superimposed-training power allocation for time-varying channel estimation," in Proc. IEEE Workshop Statistical Signal Processing, Bordeaux, France, July 17–20, 2005, pp. 1330–1335.
[81] J. K. Tugnait and W. Luo, "On channel estimation using superimposed training and first-order statistics," in Proc. IEEE ICASSP'03, vol. 4, Hong Kong, Apr. 6–10, 2003, pp. 624–627.
[82] J. K. Tugnait and W. Luo, "On channel estimation using superimposed training and first-order statistics," IEEE Commun. Lett., vol. 7, no. 9, pp. 413–415, Sept. 2003.
[83] J. K. Tugnait and X. Meng, "Synchronization of superimposed training for channel estimation," in Proc. IEEE ICASSP'04, vol. 4, Montreal, Canada, May 17–21, 2004, pp. 853–856.
[84] J. K. Tugnait and X. Meng, "Performance analysis and training power allocation for channel estimation using superimposed training," in Proc. IEEE ICASSP'05, vol. 3, Philadelphia, PA, Mar. 18–23, 2005, pp. 457–460.
[85] J. K. Tugnait and X. Meng, "On superimposed training for channel estimation: performance analysis, training power allocation, and frame synchronization," IEEE Trans. Signal Process., vol. 54, no. 2, pp. 752–765, Feb. 2006.
[86] J. K. Tugnait, X. Meng, and S. He, "Doubly-selective channel estimation using superimposed training and exponential bases models," EURASIP J. Applied Signal Processing (Special Issue on Reliable Communications over Rapidly Time-Varying Channels), vol. 2006, Article ID 85303, 11 pages, 2006.
[87] J. K. Tugnait, L. Tong, and Z. Ding, "Single-user channel estimation and equalization," IEEE Signal Process. Mag., vol. 17, no. 3, pp. 17–28, May 2000.
[88] S. Verdú, "Minimum probability of error for asynchronous Gaussian multiple-access channels," IEEE Trans. Inf. Theory, vol. 32, no. 1, pp. 85–96, Jan. 1986.
[89] A. Vosoughi and A. Scaglione, "Everything you always wanted to know about training: guidelines derived using the affine precoding framework and the CRB," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 940–954, Mar. 2006.
[90] H. S. Wang and P.-C. Chang, "On verifying the first-order Markovian assumption for
a Rayleigh fading channel model," IEEE Trans. Veh. Technol., vol. 45, no. 2,
pp. 353–357, May 1996.
[91] X. Wang and H. V. Poor, Wireless Communication Systems: Advanced Techniques for
Signal Reception. Upper Saddle River, NJ: Prentice Hall, 2004.
[92] J. J. Werner, J. Yang, D. D. Harman, and G. A. Dumont, "Blind equalization for
broadband access," IEEE Commun. Mag., vol. 37, no. 4, pp. 87–93, Apr. 1999.
[93] Q. Yang and K. S. Kwak, "Superimposed-pilot-aided channel estimation for mobile
OFDM," Electron. Lett., vol. 42, no. 12, pp. 73–74, June 2006.
[94] T. Zemen and C. F. Mecklenbräuker, "Time-variant channel estimation using discrete
prolate spheroidal sequences," IEEE Trans. Signal Process., vol. 53, no. 9,
pp. 3597–3607, Sept. 2005.
[95] Q. Zhao and L. Tong, "Semi-blind equalization by least squares smoothing," in Proc.
32nd Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 1–4, 1998,
pp. 645–649.
[96] Y. R. Zheng and C. Xiao, "Simulation models with correct statistical properties for
Rayleigh fading channels," IEEE Trans. Commun., vol. 51, no. 6, pp. 920–928,
June 2003.
[97] G. T. Zhou and N. Chen, "Superimposed training for doubly selective channels," in
Proc. IEEE Workshop Statistical Signal Processing, St. Louis, MO, Sept. 28–Oct. 1,
2003, pp. 82–85.
[98] G. T. Zhou, M. Viberg, and T. McKelvey, "A first-order statistical method for channel
estimation," IEEE Signal Process. Lett., vol. 10, no. 3, pp. 57–60, Mar. 2003.
Appendices
Appendix A
Optimal Time-Multiplexed Training for Block Transmissions over
Doubly-Selective Channels [40]
Here we summarize the optimal TM training proposed in [40], based on CE-BEM
representations, which acts as a "reference" training scheme in evaluating our superimposed
training-based approaches.
In [40], the authors made the following model assumptions:
(HA.1) The channel satisfies the CE-BEM, i.e., (2.9).
(HA.2) The delay spread $\tau_d$ and the Doppler spread $f_d$ are bounded, known (or at least
their bounds are known), and satisfy $2 f_d \tau_d < 1$.
(HA.3) The coefficients $\{h_q(l)\}$ are zero-mean complex Gaussian random variables,
independent of one another, that remain invariant within a block but are allowed to
change from block to block.
Under the above assumptions, the authors sought to design a TM training scheme that
optimizes channel MSE and ergodic (average) capacity bounds, so as to account jointly for
channel estimation performance and transmission rate. The optimal block structure
$\mathbf{s}$ consists of sub-blocks of training and sub-blocks of information, which are
transmitted alternately:
\[
\mathbf{s} = \begin{bmatrix} \mathbf{b}_0^T & \mathbf{c}_0^T & \cdots & \mathbf{b}_{P-1}^T & \mathbf{c}_{P-1}^T \end{bmatrix}^T
\]
where $\mathbf{b}_p$ and $\mathbf{c}_p$ denote the p-th information and training sub-blocks
respectively ($p = 0, 1, \ldots, P-1$). The optimal training sequence contains an impulse
guarded by zeros (silent period). The details are shown in Table A.1, where $\bar{T}_b$
denotes the length of the information sub-block, $c$ is a scalar denoting the training
impulse, and $\beta$ denotes the training-to-information power ratio (TIR).

Parameters                         Optimal Training
Placement of information symbols   Equally long information sub-blocks (length $\bar{T}_b$)
Placement of training symbols      Equally long training sub-blocks
Structure of training sub-blocks   $\mathbf{c}_p = [\mathbf{0}_L^T \;\; c \;\; \mathbf{0}_L^T]^T$, $\forall p$
Number of training symbols         $2L+1$ per sub-block
Number of sub-blocks               $Q$ training and $Q$ information sub-blocks
Power allocation                   $\beta = 1/\bigl(1 + \sqrt{(L+1)/\bar{T}_b}\bigr)$
Table A.1: Optimal TM training.
We also apply this TM training structure to the OP- and DPS-BEMs. It is not known
whether it is optimal for these BEMs, but it can still serve as a useful reference when
studying our superimposed training schemes.
For a channel satisfying a BEM (2.20), the noisy channel output is given by
\[
y(n) = \sum_{l=0}^{L} \sum_{q=1}^{Q} h_q(l)\, \psi_q(n)\, s(n-l) + v(n),
\]
where $v(n)$ is additive white Gaussian noise (AWGN) with zero mean and variance
$\sigma_v^2$. At the time slots $n_{p,l} := \bar{T}_b + L + p\,(\bar{T}_b + 2L + 1) + l$
($l = 0, 1, \ldots, L$), the received signal depends only on the channel and the training
impulse:
\[
y(n_{p,l}) = c \sum_{q=1}^{Q} h_q(l)\, \psi_q(n_{p,l}) + v(n_{p,l}). \tag{A.1}
\]
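We define
\[
\mathbf{y}_c(l) = \begin{bmatrix} y(n_{0,l}) & y(n_{1,l}) & \cdots & y(n_{P-1,l}) \end{bmatrix}^T, \tag{A.2a}
\]
\[
\mathbf{v}_c(l) = \begin{bmatrix} v(n_{0,l}) & v(n_{1,l}) & \cdots & v(n_{P-1,l}) \end{bmatrix}^T, \tag{A.2b}
\]
\[
\boldsymbol{\Psi}_c(l) = \begin{bmatrix}
\psi_1(n_{0,l}) & \cdots & \psi_Q(n_{0,l}) \\
\psi_1(n_{1,l}) & \cdots & \psi_Q(n_{1,l}) \\
\vdots & \ddots & \vdots \\
\psi_1(n_{P-1,l}) & \cdots & \psi_Q(n_{P-1,l})
\end{bmatrix}, \tag{A.2c}
\]
\[
\mathbf{h}(l) = \begin{bmatrix} h_1(l) & h_2(l) & \cdots & h_Q(l) \end{bmatrix}^T,
\]
then by (A.1)
\[
\mathbf{y}_c(l) = c\, \boldsymbol{\Psi}_c(l)\, \mathbf{h}(l) + \mathbf{v}_c(l).
\]
The LS estimator of $\mathbf{h}(l)$ is given by
\[
\hat{\mathbf{h}}_{LS}(l) = \frac{1}{c}\, \boldsymbol{\Psi}_c^{\dagger}(l)\, \mathbf{y}_c(l), \tag{A.3}
\]
and the linear MMSE estimator is given by
\[
\hat{\mathbf{h}}_{MMSE}(l) = \frac{c}{\sigma_v^2} \left[ \mathbf{R}_h^{-1}(l) + \frac{c^2}{\sigma_v^2}\, \boldsymbol{\Psi}_c^H(l)\, \boldsymbol{\Psi}_c(l) \right]^{-1} \boldsymbol{\Psi}_c^H(l)\, \mathbf{y}_c(l), \tag{A.4}
\]
which requires that $\mathbf{R}_h(l) := E\{\mathbf{h}(l)\mathbf{h}^H(l)\}$ be known at the receiver.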
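The chain from the block structure to (A.1)–(A.4) can be checked numerically. The following is a minimal sketch only, not the implementation used in this dissertation. It assumes a complex exponential basis of the form exp(j2π(q − (Q+1)/2)n/T), BPSK information symbols, a real-valued impulse c, an identity prior covariance for the MMSE estimator, and small illustrative sizes. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L, Q, P, Tb, c = 2, 3, 8, 10, 1.0   # illustrative sizes, not taken from the text
B = Tb + 2 * L + 1                  # length of one information + training pair
T = P * B                           # total block length

def psi(q, n):
    # Assumed complex exponential basis function, q = 1..Q
    return np.exp(2j * np.pi * (q - (Q + 1) / 2) * n / T)

# Block s = [b_0, c_0, ..., b_{P-1}, c_{P-1}] with c_p = [0_L, c, 0_L]
s = np.concatenate([
    np.concatenate([rng.choice([-1.0, 1.0], Tb), np.zeros(L), [c], np.zeros(L)])
    for _ in range(P)
])

# BEM coefficients h[q-1, l]; noise-free output y(n) = sum_l sum_q h_q(l) psi_q(n) s(n-l)
h = (rng.standard_normal((Q, L + 1)) + 1j * rng.standard_normal((Q, L + 1))) / np.sqrt(2)
y = np.zeros(T, dtype=complex)
for n in range(T):
    for l in range(min(L, n) + 1):
        y[n] += sum(h[q - 1, l] * psi(q, n) for q in range(1, Q + 1)) * s[n - l]

def slots(l):
    # n_{p,l} = Tb + L + p (Tb + 2L + 1) + l, p = 0..P-1
    return np.array([Tb + L + p * B + l for p in range(P)])

def ls_estimate(Psi_c, y_c):
    # (A.3): pseudo-inverse of Psi_c applied to y_c, scaled by 1/c
    return np.linalg.pinv(Psi_c) @ y_c / c

def mmse_estimate(Psi_c, y_c, R_h, sigma_v2):
    # (A.4), assuming a real-valued impulse c
    A = np.linalg.inv(R_h) + (c ** 2 / sigma_v2) * (Psi_c.conj().T @ Psi_c)
    return (c / sigma_v2) * np.linalg.solve(A, Psi_c.conj().T @ y_c)

h_ls = np.zeros_like(h)
h_mmse = np.zeros_like(h)
for l in range(L + 1):
    n_pl = slots(l)
    # P x Q matrix of basis samples, laid out as in (A.2c)
    Psi_c = np.array([[psi(q, n) for q in range(1, Q + 1)] for n in n_pl])
    h_ls[:, l] = ls_estimate(Psi_c, y[n_pl])
    h_mmse[:, l] = mmse_estimate(Psi_c, y[n_pl], np.eye(Q), 1e-6)
```

With noise-free data the LS estimate should reproduce the coefficients up to rounding, and the MMSE estimate should approach it as the assumed noise variance shrinks.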
This optimal TM training can be easily extended to an SIMO channel, since the above
procedures can be carried out for each output independently. For multiple-user
time-invariant channels, an extension of this scheme was suggested in [41] for MIMO
frequency-selective fading channels. We extend this scheme to a doubly-selective MIMO
channel in a way similar to [40], although its optimality in this setting has not been
established.
For a multiple-user channel with $K$ users,
\[
y(n) = \sum_{k=1}^{K} \sum_{l=0}^{L} h_k(n;l)\, s_k(n-l) + v(n),
\]
where $\{s_k(n)\}$ denotes the k-th user's information sequence and the corresponding channel
is denoted by $\{h_k(n;l)\}$. We assume that the channels satisfy a BEM representation, i.e.,
\[
h_k(n;l) = \sum_{q=1}^{Q} h_{qk}(l)\, \psi_q(n).
\]
The channel output can be expressed as
\[
y(n) = \sum_{k=1}^{K} \sum_{l=0}^{L} \sum_{q=1}^{Q} h_{qk}(l)\, \psi_q(n)\, s_k(n-l) + v(n).
\]
We design the training sub-block of the k-th user ($k = 1, 2, \ldots, K$) as
\[
\mathbf{c}_{k,p} = \begin{bmatrix} \mathbf{0}^T_{(k-1)(L+1)+L} & c & \mathbf{0}^T_{(K-k)(L+1)+L} \end{bmatrix}^T
\]
with a length of $K(L+1) + L$ symbols. At the time slots
\[
n_{k,p,l} := \bar{T}_b + (k-1)(L+1) + L + p\,[\bar{T}_b + K(L+1) + L] + l,
\]
the received signal depends only on the training impulse and the k-th user's channel. The
channels for different users can thus be decoupled. We have
\[
\mathbf{y}_c(k,l) = c\, \boldsymbol{\Psi}_c(k,l)\, \mathbf{h}_k(l) + \mathbf{v}_c(k,l),
\]
where $\mathbf{y}_c(k,l)$, $\boldsymbol{\Psi}_c(k,l)$, and $\mathbf{v}_c(k,l)$ are defined as in (A.2), only
with $n_{p,l}$ replaced with $n_{k,p,l}$, and
\[
\mathbf{h}_k(l) = \begin{bmatrix} h_{1k}(l) & h_{2k}(l) & \cdots & h_{Qk}(l) \end{bmatrix}^T.
\]
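Then similar to (A.3) and (A.4), we have the LS estimator
\[
\hat{\mathbf{h}}_{k,LS}(l) = \frac{1}{c}\, \boldsymbol{\Psi}_c^{\dagger}(k,l)\, \mathbf{y}_c(k,l), \tag{A.5}
\]
and the linear MMSE estimator
\[
\hat{\mathbf{h}}_{k,MMSE}(l) = \frac{c}{\sigma_v^2} \left[ \mathbf{R}_h^{-1}(k,l) + \frac{c^2}{\sigma_v^2}\, \boldsymbol{\Psi}_c^H(k,l)\, \boldsymbol{\Psi}_c(k,l) \right]^{-1} \boldsymbol{\Psi}_c^H(k,l)\, \mathbf{y}_c(k,l) \tag{A.6}
\]
if $\mathbf{R}_h(k,l) := E\{\mathbf{h}_k(l)\mathbf{h}_k^H(l)\}$ is known at the receiver.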
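The decoupling property of the staggered training impulses can be verified with a short simulation. This is a sketch under stated assumptions: the per-user taps are arbitrary time-varying values rather than a specific BEM, the information symbols are BPSK, the sizes are illustrative, and all names are hypothetical. It checks that the noise-free output at slot n_{k,p,l} equals c times the l-th tap of user k alone.

```python
import random

random.seed(1)
K, L, P, Tb, c = 3, 2, 4, 6, 1.0        # illustrative sizes only
Tc = K * (L + 1) + L                     # training sub-block length
B = Tb + Tc                              # one information + training pair
T = P * B

def training_subblock(k):
    # c_{k,p} = [0_{(k-1)(L+1)+L}, c, 0_{(K-k)(L+1)+L}], k = 1..K
    return [0.0] * ((k - 1) * (L + 1) + L) + [c] + [0.0] * ((K - k) * (L + 1) + L)

def user_block(k):
    s = []
    for _ in range(P):
        s += [random.choice([-1.0, 1.0]) for _ in range(Tb)]  # information sub-block
        s += training_subblock(k)                             # training sub-block
    return s

def cgauss():
    return complex(random.gauss(0, 1), random.gauss(0, 1))

s = {k: user_block(k) for k in range(1, K + 1)}
# Arbitrary time-varying taps h[k][n][l]; a BEM would impose more structure
h = {k: [[cgauss() for _ in range(L + 1)] for _ in range(T)] for k in range(1, K + 1)}

def y(n):
    # Noise-free multi-user output, with s_k(n) = 0 for n < 0
    return sum(h[k][n][l] * s[k][n - l]
               for k in range(1, K + 1) for l in range(L + 1) if n - l >= 0)

def slot(k, p, l):
    # n_{k,p,l} = Tb + (k-1)(L+1) + L + p [Tb + K(L+1) + L] + l
    return Tb + (k - 1) * (L + 1) + L + p * B + l

max_err = max(abs(y(slot(k, p, l)) - c * h[k][slot(k, p, l)][l])
              for k in range(1, K + 1) for p in range(P) for l in range(L + 1))
```

The impulses of different users are spaced L+1 symbols apart, so the L+1 samples that follow one user's impulse never overlap another user's impulse or any information symbol.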
Appendix B
Symbol Detection
The role of channel estimation is to aid in extracting the desired information data from
the distorted received symbols. Two symbol detection techniques are reviewed here: the
Viterbi detector and the Kalman filter.
B.1 Maximum Likelihood Sequence Detector (Viterbi Detector) [64]
Consider an SIMO FIR linear channel with $N$ outputs and discrete-time impulse
response $\{\mathbf{h}(n;l)\}$. Let $\{s(n)\}$ denote the input sequence to the SIMO channel.
The channel output is given by
\[
\mathbf{x}(n) = \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l), \tag{B.1}
\]
and the noisy measurement is given by
\[
\mathbf{y}(n) = \mathbf{x}(n) + \mathbf{v}(n) \tag{B.2}
\]
where $\mathbf{v}(n)$ is the AWGN. We assume:
(HB.1) The noise $\{\mathbf{v}(n)\}$ is uncorrelated with $\{s(n)\}$, with a possibly unknown
mean $E\{\mathbf{v}(n)\} = \mathbf{m}$ and
$E\{[\mathbf{v}(n+\tau) - \mathbf{m}][\mathbf{v}(n) - \mathbf{m}]^H\} = \sigma_v^2 \mathbf{I}_N \delta(\tau)$.
Given $\{s(n)\}$, $\{\mathbf{y}(n)\}$ is a sequence of $N$-dimensional Gaussian random vectors
with mean $\sum_{l=0}^{L} \mathbf{h}(n;l)s(n-l) + \mathbf{m}$ and covariance $\sigma_v^2 \mathbf{I}_N$.
The pdf of $\mathbf{y}(n)$ given $\{s(n), s(n-1), \ldots, s(n-L)\}$ is
\[
p(\mathbf{y}(n) \mid s(n), \ldots, s(n-L)) = \frac{1}{(\pi\sigma_v^2)^N}
\exp\left\{ -\frac{1}{\sigma_v^2} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2 \right\}
\]
where $s(n) = 0$ for $n < 0$. Since the noise is white, the joint pdf of the random vectors
$\mathbf{y}(0), \mathbf{y}(1), \ldots, \mathbf{y}(T-1)$ given the transmitted sequence
$s(0), s(1), \ldots, s(T-1)$ is
\[
p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1)) = \frac{1}{(\pi\sigma_v^2)^{NT}}
\exp\left\{ -\frac{1}{\sigma_v^2} \sum_{n=0}^{T-1} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2 \right\}.
\]
Taking the logarithm on both sides of the equation above, we have
\[
\log p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1)) = -NT \log(\pi\sigma_v^2)
- \frac{1}{\sigma_v^2} \sum_{n=0}^{T-1} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2.
\]
The maximum likelihood (ML) estimate of the input sequence $\{s(0), \ldots, s(T-1)\}$ is the
one that maximizes
\[
p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1)),
\]
or equivalently maximizes
\[
\log p(\mathbf{y}(0), \ldots, \mathbf{y}(T-1) \mid s(0), \ldots, s(T-1)),
\]
or minimizes the Euclidean distance
\[
\sum_{n=0}^{T-1} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2.
\]
This ML sequence estimation (MLSE) criterion is equivalent to the problem of estimating
the state of a discrete-time "finite-state machine". In this case, the finite-state machine is
the discrete-time channel with coefficients $\{\mathbf{h}(n;l)\}$, and its state at any time
instant $n$ is represented by the $L$ most recent input symbols
\[
\mathrm{state}(n) = (s(n), s(n-1), \ldots, s(n-L+1)) \tag{B.3}
\]
where $s(n) = 0$ for $n < 0$. If the input symbols are $M$-ary, the finite-state machine has
$M^L$ states. Consequently, the channel is described by an $M^L$-state trellis and the Viterbi
algorithm may be used to determine the most probable path through the trellis. In brief,
we describe the Viterbi algorithm in the following three steps:
Step 1. We begin with $\mathbf{y}(L)$, from which we compute the $M^{L+1}$ metrics
\[
\sum_{n=0}^{L} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2. \tag{B.4}
\]
The $M^{L+1}$ possible sequences are divided into $M^L$ groups according to the $M^L$ states.
From each group, we pick the one with the minimum metric, i.e., the most probable
sequence, and assign to the surviving sequence the metric
\[
PM_0(s(L), \ldots, s(1)) = \min_{s(0)} \left\{ \sum_{n=0}^{L} \left\| \mathbf{y}(n) - \sum_{l=0}^{L} \mathbf{h}(n;l)\, s(n-l) - \mathbf{m} \right\|^2 \right\}. \tag{B.5}
\]
The $M-1$ remaining sequences from each of the $M^L$ groups are discarded.
Step 2. Upon reception of $\mathbf{y}(L+n)$, $n \ge 1$, compute the $M^{L+1}$ metrics
\[
\left\| \mathbf{y}(L+n) - \sum_{l=0}^{L} \mathbf{h}(L+n;l)\, s(L+n-l) - \mathbf{m} \right\|^2 + PM_{n-1}(s(L+n-1), \ldots, s(n)). \tag{B.6}
\]
Again, the $M^{L+1}$ sequences are divided into $M^L$ groups corresponding to the $M^L$
possible states $(s(L+n), s(L+n-1), \ldots, s(n+1))$, and the most probable sequence
from each group is selected while the other $M-1$ sequences are discarded. The
surviving metrics are
\[
PM_n(s(L+n), \ldots, s(n+1)) = \min_{s(n)} \left\{ \left\| \mathbf{y}(L+n) - \sum_{l=0}^{L} \mathbf{h}(L+n;l)\, s(L+n-l) - \mathbf{m} \right\|^2 + PM_{n-1}(s(L+n-1), \ldots, s(n)) \right\}. \tag{B.7}
\]
Step 3. If $\mathbf{y}(L+n)$ is the last received sample, pick from the $M^L$ survivor
sequences the one with the minimum metric as the ML sequence estimate; otherwise, set
$n = n+1$ and go to Step 2.
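The three steps above can be illustrated with a small add-compare-select recursion. This is a sketch only, under the following assumptions: a single output (N = 1), zero noise mean (m = 0), a known time-varying channel, and a recursion that starts at n = 0 from the all-zero pre-history instead of first enumerating all M^{L+1} sequences as in Step 1 (the two are equivalent since s(n) = 0 for n < 0). The names `viterbi`, `brute_force`, and `branch` are hypothetical. An exhaustive search is included only to check the result on a short block.

```python
import itertools
import random

def branch(y_n, h_n, window):
    # |y(n) - sum_l h(n;l) s(n-l)|^2 with window = (s(n), s(n-1), ..., s(n-L))
    return abs(y_n - sum(hl * sl for hl, sl in zip(h_n, window))) ** 2

def viterbi(y, h, alphabet, L):
    """MLSE for y(n) = sum_{l=0}^{L} h[n][l] s(n-l) + v(n), with s(n) = 0 for n < 0."""
    # survivors: state (s(n), ..., s(n-L+1)) -> (path metric, symbols decided so far)
    survivors = {(0,) * L: (0.0, [])}
    for n in range(len(y)):
        new = {}
        for state, (metric, path) in survivors.items():
            for sym in alphabet:
                m = metric + branch(y[n], h[n], (sym,) + state)
                nxt = ((sym,) + state)[:L]          # drop the oldest symbol
                if nxt not in new or m < new[nxt][0]:
                    new[nxt] = (m, path + [sym])    # keep the best path into nxt
        survivors = new
    return min(survivors.values(), key=lambda t: t[0])

def brute_force(y, h, alphabet, L):
    # Exhaustive minimization of the Euclidean distance, for checking only
    best = None
    for seq in itertools.product(alphabet, repeat=len(y)):
        padded = (0,) * L + seq
        m = sum(branch(y[n], h[n], padded[n:n + L + 1][::-1]) for n in range(len(y)))
        if best is None or m < best[0]:
            best = (m, list(seq))
    return best

random.seed(2)
L, T, alphabet = 2, 8, (-1, 1)
h = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(L + 1)]
     for _ in range(T)]
s_true = [random.choice(alphabet) for _ in range(T)]
padded = (0,) * L + tuple(s_true)
y = [sum(h[n][l] * padded[n + L - l] for l in range(L + 1))
     + 0.05 * complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(T)]

v_metric, v_path = viterbi(y, h, alphabet, L)
b_metric, b_path = brute_force(y, h, alphabet, L)
```

The recursion keeps at most M^L survivors at any time, whereas the exhaustive search visits all M^T sequences.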
In a multiple-user context with $K$ users, the noisy channel input-output relation is given by
\[
\mathbf{y}(n) = \sum_{k=1}^{K} \sum_{l=0}^{L} \mathbf{h}_k(n;l)\, s_k(n-l) + \mathbf{v}(n), \tag{B.8}
\]
where $\{s_k(n)\}$ denotes the k-th user's information sequence and $\{\mathbf{h}_k(n;l)\}$ denotes
the k-th user's channel impulse response. The state at time $n$ is now represented by the
$L$ most recent input symbols of all the $K$ users, i.e.,
\[
\mathrm{state}(n) = (s_1(n), \ldots, s_1(n-L+1), \ldots, s_2(n-L+1), \ldots, s_K(n-L+1)) \tag{B.9}
\]
where $s_k(n) = 0$ for $n < 0$. The channel is now described by an $M^{KL}$-state trellis.
To adapt the Viterbi algorithm for multiple users, we simply replace the state (B.3)
with (B.9), use $M' = M^K$ instead of $M$ in all three steps, and substitute the sum
$\sum_{k=1}^{K} \sum_{l=0}^{L} \mathbf{h}_k(n;l)s_k(n-l)$ for $\sum_{l=0}^{L} \mathbf{h}(n;l)s(n-l)$
in (B.4)–(B.7).
B.2 Kalman Filtering
The Viterbi detector (or MLSD) is the optimal receiver in the sense that it minimizes the
probability of sequence error. Its computational complexity, however, depends on the
number of states. By (B.9), the Viterbi detector has $M^{KL}$ states, given $M$-ary input
symbols, $K$ users, and an $(L+1)$-tap MIMO channel. The number of states grows
exponentially with the channel length and the number of users, and polynomially with the
constellation size. The Viterbi detector may therefore be extremely expensive to
implement [64].
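To give a feel for these numbers, the short sketch below evaluates the state count M^{KL} and the number of metrics M^{K(L+1)} computed at each step (each state is entered by M^K branches). The helper names and the example sizes are illustrative assumptions, not values used elsewhere in this dissertation.

```python
def viterbi_states(M, K, L):
    # M-ary symbols, K users, (L+1)-tap channel: M**(K*L) trellis states
    return M ** (K * L)

def metrics_per_step(M, K, L):
    # M**K branches enter each of the M**(K*L) states
    return M ** (K * (L + 1))

# Example: QPSK (M = 4), two users, four-tap channels (L = 3)
states = viterbi_states(4, 2, 3)       # 4096
metrics = metrics_per_step(4, 2, 3)    # 65536
```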
The Kalman filter, based on the MMSE criterion, offers an alternative symbol detection
technique with much lower computational complexity than that of the optimal detector,
at the expense of a slight loss in error performance.
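For the SIMO FIR linear channel with $N$ receivers described by (B.1) and (B.2), we
define a $(d+1)$-dimensional column vector (with delay $d \ge L$) as the state vector of the
Kalman filter:
\[
\mathbf{S}(n) := \begin{bmatrix} s(n) & s(n-1) & \cdots & s(n-d) \end{bmatrix}^T.
\]
We also define the state transition matrix
\[
\boldsymbol{\Phi} := \begin{bmatrix} \mathbf{0}_{1 \times d} & 0 \\ \mathbf{I}_d & \mathbf{0}_{d \times 1} \end{bmatrix},
\]
the control-input matrix
\[
\boldsymbol{\Gamma} := \begin{bmatrix} 1 & \mathbf{0}_{1 \times d} \end{bmatrix}^T,
\]
the observation matrix
\[
\mathbf{H}(n) := \begin{bmatrix} \mathbf{h}(n;0) & \mathbf{h}(n;1) & \cdots & \mathbf{h}(n;L) & \mathbf{0}_{N \times (d-L)} \end{bmatrix},
\]
and the input
\[
w(n) := s(n+1).
\]
Then we have the time-invariant state equation:
\[
\mathbf{S}(n+1) = \boldsymbol{\Phi}\, \mathbf{S}(n) + \boldsymbol{\Gamma}\, w(n). \tag{B.10}
\]
By (B.1) and (B.2), the time-varying observation equation is given by
\[
\mathbf{y}(n) = \mathbf{H}(n)\, \mathbf{S}(n) + \mathbf{v}(n). \tag{B.11}
\]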
We assume the AWGN $\mathbf{v}(n)$ is zero-mean. The prior statistics of the above parameters
are given by
\[
\begin{aligned}
& E\{w(n)\} = 0, \quad E\{\mathbf{v}(n)\} = \mathbf{0}, \quad E\{\mathbf{S}(0)\} = \bar{\mathbf{s}}(0), \\
& E\{w(m)w^*(n)\} = V_w(n)\,\delta(m-n), \quad E\{\mathbf{v}(m)\mathbf{v}^H(n)\} = \mathbf{V}_v(n)\,\delta(m-n), \\
& E\{w(m)\mathbf{v}^H(n)\} = \mathbf{0}, \quad E\{w(m)\mathbf{S}^H(n)\} = \mathbf{0}, \quad E\{\mathbf{v}(m)\mathbf{S}^H(n)\} = \mathbf{0}, \\
& E\{[\mathbf{S}(0) - E\{\mathbf{S}(0)\}][\mathbf{S}(0) - E\{\mathbf{S}(0)\}]^H\} = \mathbf{V}_s(0).
\end{aligned}
\]
Given the state equation (B.10), the observation equation (B.11), and the above prior
statistics, the Kalman filter algorithm is as follows [70]:
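Initialization: For the time $n = 0$, $\hat{\mathbf{S}}(1 \mid 0) = \bar{\mathbf{s}}(0)$ and
$\mathbf{V}_{\tilde{s}}(1 \mid 0) = \mathbf{V}_s(0)$.
Filtering: For $n = 1, 2, \ldots$
\[
\begin{aligned}
\mathbf{V}_{\nu}(n) &= \mathbf{H}(n)\, \mathbf{V}_{\tilde{s}}(n \mid n-1)\, \mathbf{H}^H(n) + \mathbf{V}_v(n); \\
\mathbf{K}(n) &= \mathbf{V}_{\tilde{s}}(n \mid n-1)\, \mathbf{H}^H(n)\, \mathbf{V}_{\nu}^{-1}(n); \\
\boldsymbol{\nu}(n) &= \mathbf{y}(n) - \mathbf{H}(n)\, \hat{\mathbf{S}}(n \mid n-1); \\
\hat{\mathbf{S}}(n \mid n) &= \hat{\mathbf{S}}(n \mid n-1) + \mathbf{K}(n)\, \boldsymbol{\nu}(n); \\
\hat{\mathbf{S}}(n+1 \mid n) &= \boldsymbol{\Phi}\, \hat{\mathbf{S}}(n \mid n); \\
\mathbf{V}_{\tilde{s}}(n \mid n) &= [\mathbf{I} - \mathbf{K}(n)\mathbf{H}(n)]\, \mathbf{V}_{\tilde{s}}(n \mid n-1); \\
\mathbf{V}_{\tilde{s}}(n+1 \mid n) &= \boldsymbol{\Phi}\, \mathbf{V}_{\tilde{s}}(n \mid n)\, \boldsymbol{\Phi}^H + \boldsymbol{\Gamma}\, V_w(n)\, \boldsymbol{\Gamma}^H.
\end{aligned}
\]
Since $\hat{\mathbf{S}}(n \mid n) = \begin{bmatrix} \hat{s}(n \mid n) & \hat{s}(n-1 \mid n) & \cdots & \hat{s}(n-d \mid n) \end{bmatrix}^T$,
we extract its last entry $\hat{s}(n-d \mid n)$ as the desired equalized output. We then
hard-quantize $\hat{s}(n-d \mid n)$ to obtain the detected symbol $\hat{s}(n-d)$.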
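The recursion above can be sketched as a fixed-lag equalizer. This is an illustrative sketch under stated assumptions, not the implementation used for the results in this dissertation: the filter is started at the first received sample with a zero prior mean and a scaled identity prior covariance, the noise covariance is sigma_v^2 I_N, the symbols are BPSK with unit variance, and the two-output channel taps (`base`) with a slow common phase rotation are hypothetical values chosen only for the demonstration.

```python
import numpy as np

def kalman_equalizer(y, H_seq, d, sigma_v2, sigma_s2=1.0):
    """Run the recursion above; returns soft estimates of s(n-d | n)."""
    N = y.shape[1]
    Phi = np.zeros((d + 1, d + 1))
    Phi[1:, :d] = np.eye(d)                                   # shift matrix
    Gam = np.zeros((d + 1, 1))
    Gam[0, 0] = 1.0                                           # new symbol enters on top
    S_pred = np.zeros(d + 1, dtype=complex)                   # prior mean (assumed zero)
    V_pred = sigma_s2 * np.eye(d + 1, dtype=complex)          # prior covariance (assumed)
    out = []
    for n in range(len(y)):
        H = H_seq[n]                                          # N x (d+1)
        V_nu = H @ V_pred @ H.conj().T + sigma_v2 * np.eye(N) # innovation covariance
        K = V_pred @ H.conj().T @ np.linalg.inv(V_nu)         # Kalman gain
        nu = y[n] - H @ S_pred                                # innovation
        S_filt = S_pred + K @ nu
        V_filt = (np.eye(d + 1) - K @ H) @ V_pred
        out.append(S_filt[-1])                                # last entry: s(n-d | n)
        S_pred = Phi @ S_filt
        V_pred = Phi @ V_filt @ Phi.conj().T + sigma_s2 * (Gam @ Gam.conj().T)
    return np.array(out)

rng = np.random.default_rng(3)
T, N, L, d, sigma_v2 = 200, 2, 2, 4, 1e-2
s = rng.choice([-1.0, 1.0], T)
base = np.array([[1.0, 0.5, 0.2], [0.8, -0.4, 0.3]])          # hypothetical taps, N x (L+1)
H_seq = np.zeros((T, N, d + 1), dtype=complex)
y = np.zeros((T, N), dtype=complex)
for n in range(T):
    H_seq[n, :, :L + 1] = base * np.exp(2j * np.pi * 0.002 * n)   # slow phase rotation
    S_n = np.array([s[n - i] if n - i >= 0 else 0.0 for i in range(d + 1)])
    noise = np.sqrt(sigma_v2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    y[n] = H_seq[n] @ S_n + noise

soft = kalman_equalizer(y, H_seq, d, sigma_v2)
detected = np.sign(soft[d:].real)              # hard decisions for s(0), ..., s(T-1-d)
error_rate = np.mean(detected != s[:T - d])
```

Each step inverts only an N x N innovation covariance and multiplies (d+1)-dimensional matrices, so the cost per symbol does not depend on the constellation size.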
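For a multiple-user (MIMO) channel of (B.8) with a total of $K$ users, the state vector for
the k-th user is
\[
\mathbf{S}_k(n) = \begin{bmatrix} s_k(n) & s_k(n-1) & \cdots & s_k(n-d) \end{bmatrix}^T
\]
where we also have $d \ge L$. Then the augmented $K(d+1)$-dimensional state vector is given by
\[
\mathbf{S}(n) = \begin{bmatrix} \mathbf{S}_1^T(n) & \mathbf{S}_2^T(n) & \cdots & \mathbf{S}_K^T(n) \end{bmatrix}^T.
\]
In order to apply the Kalman filter, we revise the state transition matrix as
\[
\boldsymbol{\Phi} := \mathbf{I}_K \otimes \begin{bmatrix} \mathbf{0}_{1 \times d} & 0 \\ \mathbf{I}_d & \mathbf{0}_{d \times 1} \end{bmatrix},
\]
the control-input matrix as
\[
\boldsymbol{\Gamma} := \mathbf{I}_K \otimes \begin{bmatrix} 1 & \mathbf{0}_{1 \times d} \end{bmatrix}^T,
\]
the observation matrix as
\[
\mathbf{H}_k(n) := \begin{bmatrix} \mathbf{h}_k(n;0) & \mathbf{h}_k(n;1) & \cdots & \mathbf{h}_k(n;L) & \mathbf{0}_{N \times (d-L)} \end{bmatrix}
\]
and
\[
\mathbf{H}(n) := \begin{bmatrix} \mathbf{H}_1(n) & \mathbf{H}_2(n) & \cdots & \mathbf{H}_K(n) \end{bmatrix},
\]
and the input as
\[
\mathbf{w}(n) := \begin{bmatrix} s_1(n+1) & s_2(n+1) & \cdots & s_K(n+1) \end{bmatrix}^T.
\]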
Then the state equation (B.10) and the observation equation (B.11) still hold. Applying
Kalman filtering, we obtain
\[
\hat{\mathbf{S}}(n \mid n) = \begin{bmatrix} \hat{\mathbf{S}}_1^T(n \mid n) & \hat{\mathbf{S}}_2^T(n \mid n) & \cdots & \hat{\mathbf{S}}_K^T(n \mid n) \end{bmatrix}^T,
\]
where $\hat{\mathbf{S}}_k(n \mid n) = \begin{bmatrix} \hat{s}_k(n \mid n) & \hat{s}_k(n-1 \mid n) & \cdots & \hat{s}_k(n-d \mid n) \end{bmatrix}^T$
for $k = 1, 2, \ldots, K$. Finally, we take $\hat{s}_k(n-d \mid n)$ as the equalized output for
the k-th user and hard-quantize it to obtain the detected symbol.
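The claim that (B.10) and (B.11) continue to hold with the Kronecker-structured matrices can be checked numerically. The sketch below uses small illustrative sizes, random BPSK symbols, and random channel taps. The helper names (`sym`, `state`, `obs_matrix`) are hypothetical.

```python
import numpy as np

K, N, L, d = 2, 2, 1, 3                                   # illustrative sizes only
rng = np.random.default_rng(4)

Phi0 = np.zeros((d + 1, d + 1))
Phi0[1:, :d] = np.eye(d)                                  # single-user shift matrix
Gam0 = np.zeros((d + 1, 1))
Gam0[0, 0] = 1.0
Phi = np.kron(np.eye(K), Phi0)                            # K(d+1) x K(d+1)
Gam = np.kron(np.eye(K), Gam0)                            # K(d+1) x K

T = 12
s = rng.choice([-1.0, 1.0], (K, T))                       # s[k, n]

def sym(k, n):
    # s_k(n), with s_k(n) = 0 for n < 0
    return s[k, n] if n >= 0 else 0.0

def state(n):
    # S(n) = [S_1(n)^T ... S_K(n)^T]^T with S_k(n) = [s_k(n), ..., s_k(n-d)]^T
    return np.array([sym(k, n - i) for k in range(K) for i in range(d + 1)])

def obs_matrix(h_n):
    # h_n[k] is the N x (L+1) tap matrix of user k; pad each with N x (d-L) zeros
    return np.hstack([np.hstack([h_n[k], np.zeros((N, d - L))]) for k in range(K)])

n = 5
w = s[:, n + 1]                                           # w(n) = [s_1(n+1), ..., s_K(n+1)]^T
state_ok = np.allclose(state(n + 1), Phi @ state(n) + Gam @ w)

h_n = rng.standard_normal((K, N, L + 1)) + 1j * rng.standard_normal((K, N, L + 1))
direct = sum(h_n[k][:, l] * sym(k, n - l) for k in range(K) for l in range(L + 1))
obs_ok = np.allclose(obs_matrix(h_n) @ state(n), direct)
```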
Appendix C
Mathematical Notations
$\approx$                      approximately equal to
$\otimes$                      Kronecker product
$\mathbf{0}_{M \times N}$      $M \times N$ all zeros matrix
$a$                            lower case letters for scalars
$\lceil a \rceil$              integer ceiling of $a$
$\lfloor a \rfloor$            integer floor of $a$
$|a|$                          magnitude of $a$
$\mathbf{a}$                   lower case letters in bold face for column vectors
$\|\mathbf{a}\|$               Euclidean norm of $\mathbf{a}$
$\mathbf{A}$                   upper case letters in bold face for matrices
$\mathbf{A}^*$                 complex conjugate of $\mathbf{A}$
$\mathbf{A}^{\dagger}$         Moore-Penrose pseudo-inverse of $\mathbf{A}$
$\mathbf{A}^H$                 complex conjugate transpose of $\mathbf{A}$
$\mathbf{A}^T$                 transpose of $\mathbf{A}$
$[\mathbf{A}]_{n,m}$           $(n,m)$-th entry of $\mathbf{A}$
$\mathcal{A}$                  upper case calligraphic letters for matrices
$\arg\max_x f(x)$              value of $x$ for which $f(x)$ attains its maximum
$\arg\min_x f(x)$              value of $x$ for which $f(x)$ attains its minimum
$\mathrm{cov}\{\cdot\}$        covariance operator
$\delta(\cdot)$                Kronecker delta function, defined as
                               $\delta(n) = 1$ if $n = 0$ and $\delta(n) = 0$ if $n \neq 0$, $n \in \mathbb{Z}$
$\mathrm{diag}\{a_1,\ldots,a_N\}$   $N \times N$ diagonal matrix with $[\mathrm{diag}\{a_1,\ldots,a_N\}]_{n,n} = a_n$
$E\{\cdot\}$                   expectation operator
$E_H\{\cdot\}$                 expectation operator with respect to $H$
$\mathbf{I}_N$                 $N \times N$ identity matrix
$\max(\cdot)$                  maximum value operator
$\min(\cdot)$                  minimum value operator
$O(\cdot)$                     big O notation: $f(x) = O(g(x))$ as $x \to a$ ($a \in \mathbb{R} \cup \{\pm\infty\}$),
                               iff $|f(x)| \le M|g(x)|$ as $x \to a$ for some constant $M > 0$
$\mathbb{R}$                   real field
$\mathrm{tr}\{\mathbf{A}\}$    trace of a square matrix $\mathbf{A}$
$\mathbb{Z}$                   set of integers
Appendix D
Abbreviations
AM amplitude modulation
AR auto-regressive
AWGN additive white Gaussian noise
BEM basis expansion model
BER bit error rate
BPSK binary phase-shift keying
CE-BEM complex exponential basis expansion model
CRLB Cramér-Rao lower bound
CSI channel state information
DC direct current
DFT discrete Fourier transform
DKL-BEM discrete Karhunen-Loève basis expansion model
DML deterministic maximum likelihood
DPS discrete prolate spheroidal
DPS-BEM discrete prolate spheroidal basis expansion model
FIR finite impulse response
FM frequency modulation
ISI inter-symbol interference
LMS least mean squares
LS least squares
MAC media access control
MANET mobile ad hoc network
MIMO multiple-input multiple-output
ML maximum likelihood
MLSE maximum likelihood sequence estimation
MSE mean square error
MMSE minimum mean square error
MPR multiple packet reception
m.s. mean-square
MUI multiple-user interference
NCMSE normalized channel mean square error
OFDM orthogonal frequency division multiplexing
OP-BEM orthogonal polynomial basis expansion model
PDD partially-data-dependent
pdf probability density function
PN pseudo-noise
QAM quadrature amplitude modulation
SIMO single-input multiple-output
SISO single-input single-output
SNR signal-to-noise ratio
TIR training-to-information power ratio
TM time-multiplexed