Supervised Learning Models of fMRI Data for Inferring Brain Function and Predicting 
Behavior 
 
by 
 
Yunzhi Wang 
 
 
 
 
A thesis submitted to the Graduate Faculty of 
Auburn University 
in partial fulfillment of the 
requirements for the Degree of 
Master of Science 
 
Auburn, Alabama 
May 4, 2014 
 
 
 
 
Keywords: fMRI, Granger causality, Effective connectivity, Brain networks, Support Vector 
Machine, Neuroeconomics 
 
 
Copyright 2014 by Yunzhi Wang 
 
 
Approved by 
 
Gopikrishna Deshpande, Chair, Assistant Professor, Electrical & Computer Engineering 
Thomas S Denney, Director, Auburn University MRI Research Center 
Veena Chattaraman, Associate Professor, Department of Consumer & Design Sciences 
 
 
 
 
 
 
 
 
 
Abstract 
 
 
   The development of fMRI has revolutionized cognitive neuroscience. Two related areas are 
attracting increasing interest: 1) investigating the directional interactions between different 
brain regions, and 2) predicting human behavior from brain activity. In this thesis, supervised 
learning models were applied to fMRI data to address these problems. First, dynamic Granger 
causality, a regression-based supervised learning model, was experimentally demonstrated to be 
capable of inferring stimulus-evoked sub-100 ms timing differences in fMRI responses, providing 
a reliable data-driven method for effective connectivity analysis of fMRI data. Second, Patel's 
τ, a method which performed best for inferring directional interactions in a previous simulation, 
was investigated using experimental fMRI data, highlighting the necessity of experimental 
validation of simulation results. Lastly, a recursive cluster elimination based support vector 
machine, a classification-based supervised learning model, was used to predict purchase 
decisions from spatio-temporal fMRI features, providing a reliable framework for using fMRI 
data to predict purchase-related decisions.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Acknowledgments 
 
 
   First, I would like to thank my advisor, Dr. Gopikrishna Deshpande. He led me into the 
interesting world of fMRI, taught me how to be a qualified researcher and spent a great deal of 
time reading and revising my papers. I learned a lot from him. I am deeply indebted to him for 
his great patience and wonderful advice throughout the work on this thesis.  
   I would like to thank Dr. Thomas Denney and Dr. Veena Chattaraman, who took the time to serve 
on my graduate committee and review my work. I would like to thank my friends in the AU MRI 
center, especially Hao and Karthik, who helped me a lot at the beginning of my graduate study. 
Without them it would have been more difficult to finish the thesis.  
   I would like to thank Mengdi, who always accompanies and encourages me whether I am 
happy or sad. 
   Finally, I would like to express my deepest gratitude to my parents for their unconditional love 
and support. They are the force that keeps me going forward.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Table of Contents 
 
 
Abstract
Acknowledgments
List of Tables
List of Figures
Chapter 1: Introduction
1.1 MRI
1.2 Functional MRI
1.3 Functional and Effective connectivity
1.4 Supervised Learning
1.4.1 Granger causality and Effective connectivity
1.4.2 Support Vector Machine and Brain state classification
1.5 Motivation and Organization
Chapter 2: Experimental Validation of Dynamic Granger Causality for Inferring Stimulus-evoked Sub-100ms Timing Differences from fMRI
2.1 Introduction
2.2 Methods
2.2.1 Data acquisition
2.2.2 Dynamic Granger Causality Analysis
2.2.3 Covariance of Connectivity with Experimental Paradigm
2.2.4 Comparison with cross-correlation function
2.3 Results
2.4 Discussion
2.5 Conclusion
Chapter 3: Experimental evidence demonstrating the inability of Patel's τ for estimating directionality of brain networks from fMRI
3.1 Introduction
3.2 Methods
3.2.1 Animal model selection
3.2.2 FMRI/EEG data acquisition and processing
3.2.3 IEEG experiments and data analysis
3.2.4 Patel's conditional dependence measures
3.3 Results
3.4 Discussion
Chapter 4: Predicting Purchase Decisions based on Spatio-temporal Functional MRI Features using Machine Learning
4.1 Introduction
4.2 Method
4.2.1 Experiment design and data acquisition
4.2.2 ROI selection and feature extraction
4.2.3 Recursive Cluster Elimination Based Support Vector Machine Classifier
4.3 Results
4.3.1 Prediction accuracy
4.3.2 Important spatio-temporal features for classification
4.4 Discussion
4.5 Conclusion
Chapter 5: Conclusion
Bibliography
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
List of Tables 
 
 
Table 2.1
Table 2.2
Table 4.1
Table 4.2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
List of Figures 
 
 
Figure 1.1
Figure 1.2
Figure 2.1
Figure 2.2
Figure 2.3
Figure 2.4
Figure 3.1
Figure 3.2
Figure 3.3
Figure 3.4
Figure 4.1
Figure 4.2
Figure 4.3
Figure 4.4
Figure 4.5
Figure 4.6
 
 
 
 
Chapter 1: Introduction 
 
1.1 MRI 
   Magnetic Resonance Imaging (MRI) is a noninvasive medical imaging technique that uses 
strong magnetic fields and radio-frequency (RF) pulses, instead of ionizing radiation or 
radioactive tracers, to image structures inside the body. When a patient is positioned 
inside the MRI scanner, which generates a strong magnetic field, the randomly oriented 
nuclear spins tend to align with the direction of the field. Three gradient coils are then 
used to select the orientation of the slices along the three directions; nuclei at different 
locations precess at different frequencies because of the spatial variation of the magnetic 
field. When RF energy is applied at the appropriate resonant frequency (the Larmor 
frequency), the hydrogen nuclei are excited and subsequently emit an RF signal. The MR 
signals detected at the receiver are a mixture of RF signals with different amplitudes, 
frequencies and phases, which encode spatial information. An inverse Fourier transform is 
then applied to recover the spatial information and reconstruct the image of the scanned 
area [1].  
 
 
    
Figure 1.1 MRI scanner 
   Compared with other medical imaging techniques, MRI has three primary advantages 
[2]: 1) It can produce images with very high spatial resolution for both bone and soft 
tissue. 2) Unlike X-ray or CT imaging, it does not require ionizing radiation. 3) It can 
acquire images in any plane through the body. As a result, MRI has become one of the most 
popular diagnostic imaging techniques over the past two decades. 
 
1.2 Functional MRI 
   Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that uses a 
standard MRI scanner to investigate changes in brain function over time [2]. The 
measurement of brain activity is mainly based on blood oxygenation level dependent 
(BOLD) contrast, which relies on the fact that cerebral blood flow (CBF) and neuronal 
activation are normally coupled. Whenever a brain region is activated, either 
spontaneously or driven by a task, it demands more oxygen. This oxygen is carried to 
neurons by hemoglobin in capillary red blood cells, leading to an increase of blood flow 
in that region which can be detected by MRI. The change in MR signal caused by neuronal 
activation is called the hemodynamic response (HDR). The HDR typically lags the neuronal 
events triggering it by 1 to 2 seconds, and takes about another 5 seconds to rise to its 
peak.  
   The spatial resolution of an fMRI image is determined by its voxel dimensions, which 
are specified by three parameters [2]: field of view, matrix size, and slice thickness. 
Full-brain experiments typically use larger voxels, while those focusing on changes in 
specific regions of interest (ROIs) use smaller ones. The spatial resolution of fMRI can 
be on the order of a millimeter. The temporal resolution of an fMRI scan is usually 
between 1 and 2 seconds. fMRI therefore has poor temporal resolution compared with other 
neuroimaging techniques such as electroencephalography (EEG) and 
magnetoencephalography (MEG), but its high spatial resolution is attractive, and it has 
been used extensively in both research and clinical applications. 
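
   The relation between the three parameters and the voxel dimensions can be illustrated 
with a short calculation. This is only a sketch: the function name and the example 
acquisition values are hypothetical, and it assumes the in-plane voxel size is simply the 
field of view divided by the matrix size. 

```python
def voxel_size_mm(fov_mm, matrix_size, slice_thickness_mm):
    """Return the (x, y, z) voxel dimensions in millimetres.

    In-plane size is field of view divided by matrix size; the
    through-plane size is the slice thickness.
    """
    fov_x, fov_y = fov_mm
    n_x, n_y = matrix_size
    return (fov_x / n_x, fov_y / n_y, slice_thickness_mm)


# Hypothetical whole-brain protocol: large field of view, coarse matrix.
whole_brain = voxel_size_mm((240.0, 240.0), (64, 64), 4.0)

# Hypothetical ROI-focused protocol: small field of view, fine matrix.
roi_focused = voxel_size_mm((128.0, 128.0), (128, 128), 2.0)

print(whole_brain)   # (3.75, 3.75, 4.0)
print(roi_focused)   # (1.0, 1.0, 2.0)
```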
 
1.3 Functional and Effective connectivity 
   One area of rapidly increasing interest in neuroimaging is the mapping of brain networks 
[3]. Different brain regions are assumed to perform different brain functions, but many 
neuronal processes cannot be localized to a single region and hence are presumed to be 
encoded by a network formed by several brain regions. Such "mapping" usually starts by 
defining a set of functional nodes [3]. In the context of fMRI, nodes are defined as specific 
regions of interest (ROIs). Once nodes are identified, various approaches are taken to 
estimate the edges (connections) between the nodes, using the experimental time courses 
in the ROIs. The most straightforward method is to look at the correlation between the 
time courses of a node pair. However, correlation cannot indicate the directionality of 
the connection, or whether the connection between the node pair is direct or indirect [3]. 
Generally, the approaches for estimating the interaction between brain regions can be 
classified into two categories: functional connectivity and effective connectivity. 
Functional connectivity is defined as "temporal correlations between spatially remote 
neurophysiological events" while effective connectivity is defined as "the influence one 
neuronal system exerts over another" [4]. Estimating the directionality of an influence is 
typically harder than estimating whether a connection exists, but it is of greater 
interest because it provides a mechanistic characterization of the underlying neuronal 
processes in terms of information flow. Both functional and effective connectivity can be 
estimated by data-driven or model-driven approaches.  
 
1.4 Supervised Learning 
   A supervised learning model reasons from externally supplied instances with known 
outputs to build a general hypothesis, which is then used to make predictions about future 
instances [5]. Supervised learning is the process of learning the inherent rules from the 
training data, creating a classifier or regressor that generalizes to future instances 
and predicts their outputs. The supervised learning process has several steps. The first 
step is to collect training data and select features that may be informative for 
prediction. Feature subset selection is usually the second step; it removes irrelevant 
features and reduces the dimensionality of the data. The next step is algorithm 
selection. Many approaches have been proposed for supervised learning; the most commonly 
used algorithms include artificial neural networks (ANN) [6], support vector machines 
(SVM) [7], regression models, etc. A particular algorithm is chosen and trained on the 
training data. After that, cross-validation is often used to estimate the performance of 
the predictor, by dividing the available data into two mutually exclusive subsets, one 
for training and the other for testing. Supervised learning models have broad 
applications in many areas. In the context of fMRI, they can be used for inferring brain 
function and predicting behavior.  
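
   The steps above can be sketched in a few lines of code. The example below is purely 
illustrative and is not the pipeline used in this thesis: the data are synthetic, a 
nearest-centroid classifier stands in for the learning algorithm, and the function names 
and the choice of k-fold cross-validation (repeated train/test splits) are assumptions. 

```python
import random


def nearest_centroid_fit(samples, labels):
    """Learn one mean feature vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class with the closest centroid (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


def k_fold_accuracy(samples, labels, k=5, seed=0):
    """Estimate prediction accuracy with k-fold cross-validation."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint test sets
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        model = nearest_centroid_fit([samples[i] for i in train_idx],
                                     [labels[i] for i in train_idx])
        correct += sum(nearest_centroid_predict(model, samples[i]) == labels[i]
                       for i in test_idx)
    return correct / len(samples)


# Synthetic two-class data with two features per sample.
rng = random.Random(1)
samples = ([[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)] +
           [[rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)] for _ in range(50)])
labels = [0] * 50 + [1] * 50

accuracy = k_fold_accuracy(samples, labels, k=5)
print(accuracy)
```

   Each sample is used for testing exactly once and is never part of the model that 
predicts it, which is what keeps the accuracy estimate honest. 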
 
 
 
1.4.1 Granger causality and Effective connectivity 
   Granger causality [8] is based on autoregressive (AR) models and was first proposed for 
assessing the "causality" between time series in the context of economics. Given two time 
series Xt and Yt, if the past values of one time series help predict the current and future 
values of the other, then the former is said to Granger-cause the latter. AR models are 
used for the estimation of Granger causality. Individually,  


Then bivariate AR models are used to account for cross-correlation. 


where a and d represent the autocorrelation of each time series, and b and c represent the 
cross-correlation between the time series. E represents the estimation errors (or noise). P 
is the order of the model, which may be determined by the Bayesian Information Criterion 
(BIC) [9]. Granger causality can then be calculated from the estimation errors. 
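
   For reference, a standard way of writing the models and the measure described above is 
sketched below. The symbols a, b, c, d, E and P follow the definitions in the text; the 
primes on the univariate coefficients, the subscripts on the error terms and the symbol F 
are notational assumptions introduced here. 

```latex
% Univariate (individual) AR models
X_t = \sum_{i=1}^{P} a'_i X_{t-i} + E_{1t}, \qquad
Y_t = \sum_{i=1}^{P} d'_i Y_{t-i} + E_{2t}

% Bivariate AR models
X_t = \sum_{i=1}^{P} a_i X_{t-i} + \sum_{i=1}^{P} b_i Y_{t-i} + E_{3t}, \qquad
Y_t = \sum_{i=1}^{P} c_i X_{t-i} + \sum_{i=1}^{P} d_i Y_{t-i} + E_{4t}

% Granger causality as the log ratio of error variances
F_{Y \to X} = \ln \frac{\operatorname{var}(E_{1t})}{\operatorname{var}(E_{3t})}, \qquad
F_{X \to Y} = \ln \frac{\operatorname{var}(E_{2t})}{\operatorname{var}(E_{4t})}
```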
 
  
 
The variance ratio cannot be less than 1, because introducing additional parameters into 
the model cannot increase the estimation error. Therefore, Granger causality takes values 
on the interval [0, ∞), representing the degree to which one time series helps predict 
the other. 
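
   A minimal numerical sketch of this idea is given below. It assumes the common 
log-variance-ratio form of Granger causality, zero-mean series, and ordinary least squares 
for fitting the AR models; the function names and the simulated data are hypothetical and 
are not taken from the analyses in this thesis. 

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def residual_variance(target, rows):
    """Least-squares fit of target on the predictor rows; mean squared residual."""
    m = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    Atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(m)]
    coef = solve(AtA, Atb)
    res = [t - sum(c * v for c, v in zip(coef, r)) for r, t in zip(rows, target)]
    return sum(e * e for e in res) / len(res)


def granger_causality(source, target, order=1):
    """ln(var of univariate errors / var of bivariate errors) for source -> target."""
    own_rows, full_rows, current = [], [], []
    for t in range(order, len(target)):
        own = [target[t - n] for n in range(1, order + 1)]
        other = [source[t - n] for n in range(1, order + 1)]
        own_rows.append(own)           # univariate model: own past only
        full_rows.append(own + other)  # bivariate model: own past + other's past
        current.append(target[t])
    return math.log(residual_variance(current, own_rows) /
                    residual_variance(current, full_rows))


# Simulated example in which x drives y with a one-sample lag.
rng = random.Random(0)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0.0, 0.5) for t in range(1, n)]

gc_x_to_y = granger_causality(x, y)
gc_y_to_x = granger_causality(y, x)
print(gc_x_to_y, gc_y_to_x)
```

   In this simulation the estimate for x to y should be clearly larger than the estimate 
for y to x, and neither can fall below zero apart from rounding error, in line with the 
argument above. 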
   Previous studies have demonstrated that, when applied to electrophysiological data, 
Granger causality yields interpretable results in terms of both the directionality and 
the magnitude of synaptic transmission [10, 11]. However, in the context of fMRI, the 
application of Granger causality for estimating effective connectivity is still debated. 
Three factors could potentially confound the results of Granger causality: (i) 
hemodynamic variability across different brain regions, (ii) low temporal resolution, 
and (iii) low signal-to-noise ratio (SNR) [12].  
   Despite these concerns, highly interpretable results from applying Granger causality to 
fMRI have been reported in both simulation studies [13, 14] and experimental studies 
[15, 16]. The extent to which Granger causality can be applied to fMRI therefore still 
requires further investigation.  
 
1.4.2 Support Vector Machine and Brain state classification 
   The support vector machine (SVM) is a supervised learning algorithm developed by Vapnik 
[7] to solve classification problems. The goal in a classification problem is to separate 
different classes by a function that learns the implicit rules from the training data, such 
that it can assign a novel unlabeled input to the correct class. The fundamental 
assumption is that samples of the same class have similar values in feature space, and 
thus lie near each other. Therefore, a decision hyperplane can be used to separate the 
different classes in feature space, as Fig. 1.2 shows.  
 
 
   The goal of a linear SVM classifier is to find the optimal linear hyperplane in the 
feature space with the largest margin, since a larger margin generally means better 
generalization of the classifier. Given a linearly separable training data set (xi,yi), 
where xi is the input feature vector of the ith sample and yi is a binary value (either 0 
or 1) indicating the class label of the ith sample, a pair (w,b) exists such that 
 
 
 
Figure 1.2 An illustration of a decision plane in a three-dimensional feature space. 
This figure is adapted from [22]. 
 
The decision hyperplane is then given by $w \cdot x + b = 0$, where w is the weight vector 
and b is the bias. The optimal hyperplane, which maximizes the distance between the 
classes, can be found by solving a convex quadratic programming problem [5]: 
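
   In the standard hard-margin formulation, the separability condition and the optimization 
problem can be written as below. This is a sketch of the usual textbook form rather than 
a transcription of the original equations; for compactness it assumes that the two class 
labels are recoded as $y_i \in \{-1, +1\}$, and N denotes the number of training samples. 

```latex
% Separability: every sample lies on the correct side of the margin
w \cdot x_i + b \ge +1 \ \text{ if } y_i = +1, \qquad
w \cdot x_i + b \le -1 \ \text{ if } y_i = -1

% The margin between the two classes is 2 / \lVert w \rVert, so maximizing it is
% equivalent to the convex quadratic program
\min_{w,\,b} \ \tfrac{1}{2} \lVert w \rVert^{2}
\quad \text{subject to} \quad y_i \,( w \cdot x_i + b ) \ge 1, \qquad i = 1, \ldots, N
```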
 
 
 
 
   Among machine learning and data mining methods, SVM has been an active area of research 
and has been applied in a broad range of fields. At the same time, classification problems 
in the fMRI literature have been gaining more and more attention. The implication of these 
problems is that brain states can be predicted from fMRI data, which can enhance the 
understanding of cognitive processes and brain systems [17]. SVMs have been extensively 
applied to classification problems in fMRI because they have properties well suited to 
this context. In particular, SVM is capable of dealing with small sample sizes and 
high-dimensional features, which matches the situation of fMRI data [17]. Previous 
studies have demonstrated the feasibility and potential of applying SVM to fMRI 
[17, 18], providing a reliable framework for brain state classification. 
 
1.5 Motivation and Organization 
   fMRI is a non-invasive technique that allows researchers to obtain indirect estimates 
of neural activity at a spatial resolution of millimeters within a matter of seconds. 
Consequently, mining fMRI data provides a powerful tool for understanding human 
cognitive processes. Among all the applications of fMRI in neuroscience, two areas are 
attracting particular interest: 1) mapping the functional networks of the brain and 
investigating the directional interactions between different regions, and 2) decoding and 
predicting human behavior from brain activity. 
   The goal of this thesis is to apply supervised learning models to infer brain function and 
predict behavior from fMRI data. There are two categories of supervised learning 
models: regression models and classification models. Both were used for the analysis of 
fMRI data in different chapters. In Chapter 2, dynamic Granger causality [19], a 
regression-based supervised learning model, is experimentally demonstrated to be 
capable of inferring stimulus-evoked sub-100 ms timing differences from fMRI, providing 
a reliable data-driven method for effective connectivity analysis of fMRI data. Chapter 3 
continues the theme of Chapter 2: Patel's τ [20], a higher-order statistics method which 
performed best for inferring directional interactions from fMRI in a previous simulation 
study [3], is tested using experimental fMRI data, highlighting the necessity of 
experimental validation of simulation results. In Chapter 4, a recursive cluster 
elimination based support vector machine [21], a classification-based supervised learning 
model, is used to predict purchase decisions from spatio-temporal fMRI features. This 
provides a reliable framework for using fMRI data to predict purchase-related 
decision-making as well as to infer its neural correlates. Chapter 5 presents the 
conclusions of the thesis as a whole.  
 
 
 
 
Chapter 2: Experimental Validation of Dynamic Granger Causality for Inferring 
Stimulus-evoked Sub-100ms Timing Differences from fMRI 
 
Abstract 
   Decoding the sequential flow of events in the human brain non-invasively is critical for 
gaining a mechanistic understanding of brain function. In this study, we propose a 
method based on dynamic Granger causality analysis to measure timing differences in 
brain responses from fMRI. We experimentally validate this method by detecting 
sub-100 ms timing differences in fMRI responses obtained from bilateral visual cortex 
using fast sampling, ultra-high field imaging and an event-related visual hemifield 
paradigm with known timing differences between the hemifields. Classical Granger 
causality was previously shown to be able to detect sub-100 ms timing differences in the 
visual cortex. Since classical Granger causality does not differentiate between 
spontaneous and stimulus-evoked responses, dynamic Granger causality has been proposed 
as an alternative, which necessitates its experimental validation. In addition to 
detecting timing differences as low as 28 ms using dynamic Granger causality, we found 
that the significance of the inference from our method increased with increasing delay. 
The method therefore provides a data-driven methodology for understanding mental 
chronometry from fMRI. 
 
 
 
 
 
2.1 Introduction 
   Accurate measurement of small temporal differences in brain activity plays a critical 
role in fully understanding the neural connectivity underlying brain processes. 
Functional MRI (fMRI) is an indirect measure of neuronal activity based on the blood 
oxygenation level-dependent (BOLD) hemodynamic response. Typically the 
hemodynamic response takes 5-8 seconds to reach its peak and 15-30 seconds to return to 
baseline. On the other hand, neural latencies are typically of the order of tens to hundreds 
of milliseconds. Therefore, accurate detection of timing differences in neuronal 
activity using fMRI is challenging. However, using innovative experimental designs, 
previous studies have shown that fMRI is sensitive to latency differences of the order of 
hundreds of milliseconds in the human brain, notwithstanding its poor temporal 
resolution and hemodynamic smoothing [23,24]. A study by Katwal et al. [16] suggested 
that recent advances in ultrahigh field image acquisition, fast temporal sampling, and 
techniques for increasing the available signal-to-noise ratio (SNR) may improve the 
ability to detect shorter timing differences. Using these strategies, Katwal et al. 
attempted to detect small timing differences in BOLD signals by introducing known timing 
differences between left and right visual cortices. They showed that Granger causality 
(GC) works well for detecting small temporal precedence in BOLD responses in the visual 
cortex [16]. GC is a widely applied method for mapping effective connectivity in the 
brain, and is based on a statistical measure of how well one time series predicts the 
future values of another [8,14,25,26]. However, conventional GC is sensitive to both 
spontaneous and stimulus-evoked responses [27]. Previous studies have therefore proposed 
that, by estimating time-varying coefficients using a dynamic Granger causality (dGC) 
model, temporal precedence due to stimulus-evoked responses can be separated from that 
due to spontaneous activity, using both EEG [27,28] and BOLD fMRI data [30-32]. In this 
study, we validate this method by demonstrating the ability of the dGC model to detect 
sub-100 ms timing differences between BOLD fMRI time series from left and right visual 
cortices. Note that dGC was used to detect temporal precedence and not to infer causal 
influences in this study. 
 
2.2 Methods 
2.2.1 Data acquisition 
   Gradient-echo EPI data (TR=250 ms, TE=25 ms, flip angle=30°, FOV=128 mm×128 
mm and voxel size=1 mm×1 mm×2 mm) were acquired on a 7T Philips Achieva 
scanner from 5 healthy subjects in two coronal slices (with no slice gap) around the 
calcarine fissure. An event-related visual hemifield paradigm with known timing 
differences between the hemifields was used. Each visual stimulus comprised a 2-s 
flashing checkerboard followed by a 16-s fixation cross, for a total trial duration of 18 s. 
Each run included 17 trials and the total run time was 306 seconds. For each subject, five 
runs were executed by introducing known delays (0, 28, 56, 84 and 112 ms) 
between the right and left hemifield stimuli. Fig. 2.1 shows the stimulus paradigm. fMRI 
data consisting of average time series from two activated visual cortical regions 
(denoted as X and Y for the right and left hemisphere, respectively) were obtained using 
voxels selected by a novel graph-based visualization of self-organizing maps [16,33]. 
These fMRI data were used for the current analyses. 
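
   The timing figures quoted above can be checked with a short calculation, which also 
makes clear how small the introduced delays are relative to the sampling interval. The 
variable names are illustrative, and the number of volumes per run is derived here under 
the assumption of continuous sampling at the stated TR; it is not a value reported in the 
text. 

```python
TR_S = 0.250                    # repetition time in seconds (TR = 250 ms)
STIM_S, FIXATION_S = 2.0, 16.0  # checkerboard and fixation durations per trial
N_TRIALS = 17
DELAYS_MS = [0, 28, 56, 84, 112]

trial_s = STIM_S + FIXATION_S          # 18 s per trial
run_s = N_TRIALS * trial_s             # 306 s per run
volumes_per_run = round(run_s / TR_S)  # assumes one volume every TR

# Each introduced delay expressed as a fraction of one sampling interval.
delay_in_tr = [d / (TR_S * 1000.0) for d in DELAYS_MS]

print(trial_s, run_s, volumes_per_run)
print(delay_in_tr)
```

   Every introduced delay is shorter than half a TR, so the timing differences of interest 
lie well below the sampling interval of the acquisition. 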

Figure 2.1 The stimulus paradigm. Adapted from Katwal et al. 2012 with permission. 

2.2.2 Dynamic Granger Causality Analysis 
   As mentioned above, conventional GC cannot separate temporal precedence due to 
spontaneous brain activity from that due to stimulus-evoked activity. One possible 
approach for inferring temporal precedence only from stimulus-evoked brain activity is to 
show that Granger causal estimates covary with the experimental paradigm. Such modulation 
can confirm temporal precedence due to stimulus-evoked activity and rule out temporal 
precedence from spontaneous activity. However, conventional GC can only provide one 
connectivity measure for the entire experiment, because it assumes that the model 
coefficients are stationary and invariant across time, as shown below. Let k fMRI time 
series be represented as X(t) = [x1(t) x2(t) … xk(t)]. The fMRI time series can be input 
into a multivariate autoregressive (MVAR) model [4] as follows: 

$$
\begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_k(t) \end{bmatrix}
=
\begin{bmatrix}
0 & a_{12}(0) & \cdots & a_{1k}(0) \\
a_{21}(0) & 0 & \cdots & a_{2k}(0) \\
\vdots & \vdots & \ddots & \vdots \\
a_{k1}(0) & a_{k2}(0) & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_k(t) \end{bmatrix}
+ \sum_{n=1}^{p}
\begin{bmatrix}
a_{11}(n) & a_{12}(n) & \cdots & a_{1k}(n) \\
a_{21}(n) & a_{22}(n) & \cdots & a_{2k}(n) \\
\vdots & \vdots & \ddots & \vdots \\
a_{k1}(n) & a_{k2}(n) & \cdots & a_{kk}(n)
\end{bmatrix}
\begin{bmatrix} x_1(t-n) \\ x_2(t-n) \\ \vdots \\ x_k(t-n) \end{bmatrix}
+
\begin{bmatrix} e_1(t) \\ e_2(t) \\ \vdots \\ e_k(t) \end{bmatrix}
$$

where p is the order of the model determined by the Akaike or Bayesian information 
criterion [25,34], a are the model coefficients and e is the model error. Note that a(0) 
represents the instantaneous influences between time series while a(n), n = 1, …, p, 
represent the causal influences between time series. The effect of instantaneous 
correlation on causality can be minimized by modeling both instantaneous and causal terms 
in a single model, as shown before [35]. The model coefficients were allowed to vary as a 
function of time in order to make the MVAR model dynamic, as given below.  

$$
\begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_k(t) \end{bmatrix}
=
\begin{bmatrix}
0 & a_{12}(0,t) & \cdots & a_{1k}(0,t) \\
a_{21}(0,t) & 0 & \cdots & a_{2k}(0,t) \\
\vdots & \vdots & \ddots & \vdots \\
a_{k1}(0,t) & a_{k2}(0,t) & \cdots & 0
\end{bmatrix}
\begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_k(t) \end{bmatrix}
+ \sum_{n=1}^{p}
\begin{bmatrix}
a_{11}(n,t) & a_{12}(n,t) & \cdots & a_{1k}(n,t) \\
a_{21}(n,t) & a_{22}(n,t) & \cdots & a_{2k}(n,t) \\
\vdots & \vdots & \ddots & \vdots \\
a_{k1}(n,t) & a_{k2}(n,t) & \cdots & a_{kk}(n,t)
\end{bmatrix}
\begin{bmatrix} x_1(t-n) \\ x_2(t-n) \\ \vdots \\ x_k(t-n) \end{bmatrix}
+
\begin{bmatrix} e_1(t) \\ e_2(t) \\ \vdots \\ e_k(t) \end{bmatrix}
$$

Considering the model coefficients aij(n,t) as the state vector of a Kalman filter, they 
were adaptively estimated using the algorithm proposed by Arnold et al. [36]. Dynamic 
Granger causality (dGC) was then obtained as follows: 

$$
dGC_{ij}(t) = \sum_{n=1}^{p} a_{ij}(n,t), \qquad i = 1, \ldots, k, \quad j = 1, \ldots, k
$$

A similar metric has been previously used in the static case [37]. Since we had only one 
time series each from right and left visual cortices for every subject, we used a bivariate 
model with k=2 in this study. We denote the fMRI time series obtained from right and 
left visual cortices as X and Y, respectively. The dGC model was used to get X→Y and 
Y→X connectivity time series for all subjects and delays using a model order of one as 
determined by the Bayesian Information Criterion [9,14]. The forgetting factor for the 
Kalman filter was estimated based on minimization of relative error variance [32]. 
Subsequently, we calculated dynamic Granger causality difference (dGCD) time series, 
i.e. (X→Y) − (Y→X), to infer the difference in timing between X and Y. If dGCD is larger 
than zero, it means that the precedence is from X to Y, and vice versa if dGCD is 
negative.  
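
The bivariate, order-one case described above can be illustrated with a short sketch. It is not the implementation used in this study: it assumes recursive least squares with a fixed forgetting factor (a simple special case of Kalman filtering) in place of the algorithm of Arnold et al. [36], it omits the zero-lag term a(0), and the function names, the forgetting factor of 0.98 and the simulated data are all hypothetical.

```python
import random


def rls_time_varying(target, regressors, forgetting=0.98, init_var=100.0):
    """Track two time-varying regression coefficients with recursive least squares."""
    theta = [0.0, 0.0]                      # current coefficient estimates
    P = [[init_var, 0.0], [0.0, init_var]]  # scaled covariance of the estimates
    history = []
    for y, h in zip(target, regressors):
        Ph = [P[0][0] * h[0] + P[0][1] * h[1], P[1][0] * h[0] + P[1][1] * h[1]]
        denom = forgetting + h[0] * Ph[0] + h[1] * Ph[1]
        gain = [Ph[0] / denom, Ph[1] / denom]
        err = y - (theta[0] * h[0] + theta[1] * h[1])   # one-step prediction error
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        hP = [h[0] * P[0][0] + h[1] * P[1][0], h[0] * P[0][1] + h[1] * P[1][1]]
        P = [[(P[i][j] - gain[i] * hP[j]) / forgetting for j in range(2)]
             for i in range(2)]
        history.append(tuple(theta))
    return history


def dgcd_series(x, y, forgetting=0.98):
    """Return dGCD(t) = (X->Y)(t) - (Y->X)(t) for a bivariate, order-1 model."""
    lagged = list(zip(x[:-1], y[:-1]))                    # (x(t-1), y(t-1))
    coef_x = rls_time_varying(x[1:], lagged, forgetting)  # equation for x(t)
    coef_y = rls_time_varying(y[1:], lagged, forgetting)  # equation for y(t)
    # X->Y is the weight of x(t-1) in y(t); Y->X is the weight of y(t-1) in x(t).
    return [cy[0] - cx[1] for cx, cy in zip(coef_x, coef_y)]


# Simulated (hypothetical) data in which X drives Y with a one-sample lag.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(600)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.1) for t in range(1, 600)]
dgcd = dgcd_series(x, y)
mean_tail = sum(dgcd[-100:]) / 100   # positive when X precedes Y
```

In this toy setting the late dGCD values are positive because X precedes Y by construction; swapping the two series flips the sign.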
 
2.2.3 Covariance of Connectivity with Experimental Paradigm 
   A time series representing the experimental paradigm was generated by the convolution 
of the stimulus boxcar function with the canonical HRF of Statistical Parametric Mapping 
(SPM). Fig.2.2 shows the stimulus function and the time series representing the 
experimental paradigm. In order to evaluate how dGCD covaried with the experimental 
paradigm, a general linear model (GLM) was used, considering the dGCD time series as 
the response variable and the experimental paradigm as the predictor variable [30,31]. 
The t-value obtained from the GLM represents the strength of covariance between 
dGCD time series and the experimental paradigm. For each delay, we obtained 5 t-values 
corresponding to the five subjects. Subsequently, a one-sided z-test was performed to 
examine whether the sample represented by the 5 t-values had a mean significantly larger 
than zero. The null hypothesis of the z-test was that the sample belonged to a normal 
distribution with zero mean and standard deviation σ. For all z-tests, we set σ equal to 
the standard deviation of the t-values obtained at the 0 ms delay.  
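
The two statistical steps above can be sketched as follows. This is a minimal illustration, assuming a GLM with a single predictor plus an intercept, and assuming the standard one-sample form of the z-test, z = mean / (σ/√n), with an upper-tail p-value. The function names are hypothetical and the numbers are made up for demonstration only.

```python
import math


def glm_t_value(response, predictor):
    """t-value of the slope in the model: response = intercept + beta * predictor."""
    n = len(response)
    mean_x = sum(predictor) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, response))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(predictor, response))
    std_err = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    return beta / std_err


def one_sided_z_test(sample, sigma):
    """Upper-tail z-test of H0: the sample mean is zero, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z) for standard normal Z
    return z, p_value


# Made-up demonstration data (not taken from the study).
paradigm = [0, 1, 2, 3, 4, 5]
dgcd_demo = [0.1, 1.9, 4.1, 5.9, 8.1, 9.9]
t_demo = glm_t_value(dgcd_demo, paradigm)
z_demo, p_demo = one_sided_z_test([1.0, 2.0, 1.5, 2.5, 3.0], sigma=1.0)
```

Fixing σ from the 0 ms condition, as described above, means that every delay is tested against the same null spread rather than against its own sample variance.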
 
2.2.4 Comparison with cross-correlation function 
   The conventionally used metric for estimating delays between time series is the 
cross-correlation function, which computes Pearson's correlation coefficient between two 
time series at various delays and takes the delay corresponding to the highest correlation 
coefficient as the time delay between the time series. We compared the efficacy of dGCD 
with that of the conventionally used cross-correlation function for inferring neural 
latencies. The minimum latency that can be inferred using the cross-correlation method is 
equal to the sampling period. The TR of the fMRI time series we used was 250 ms. Since 
we were interested in inferring sub-100 ms delays, we upsampled the data 25 times such 
that the resampled data had a sampling period of 10 ms. For each delay and subject, we 
obtained the cross-correlation function between the upsampled data from bilateral visual 
cortices. The delay corresponding to the maximum cross-correlation value was found in 
each case. A one-sided z-test was performed to test whether the timing differences 
obtained from the cross-correlation function were significantly greater than zero, similar 
to the procedure adopted in the case of dGCD.  
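
The cross-correlation procedure can be sketched as below. The interpolation method used for upsampling is not specified above, so linear interpolation is assumed here. The function names, the lag search range of ±500 ms and the random test signal are likewise hypothetical.

```python
import math
import random


def upsample_linear(series, factor):
    """Linearly interpolate `factor` points per original sampling interval."""
    out = []
    for m in range(len(series) - 1):
        for step in range(factor):
            frac = step / factor
            out.append(series[m] * (1 - frac) + series[m + 1] * frac)
    out.append(series[-1])
    return out


def pearson(u, v):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def estimate_delay_ms(x, y, tr_ms=250, factor=25, max_lag_ms=500):
    """Delay (ms) at the cross-correlation peak; positive means y lags x."""
    xu, yu = upsample_linear(x, factor), upsample_linear(y, factor)
    step_ms = tr_ms / factor                 # 10 ms for TR = 250 ms, factor = 25
    max_lag = int(max_lag_ms / step_ms)
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            r = pearson(xu[:len(xu) - lag], yu[lag:])
        else:
            r = pearson(xu[-lag:], yu[:len(yu) + lag])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * step_ms


# Hypothetical check: y is x delayed by exactly one TR (250 ms).
random.seed(1)
x = [random.gauss(0, 1) for _ in range(40)]
y = [x[0]] + x[:-1]
delay = estimate_delay_ms(x, y)
```

The check uses a delay of one full TR, for which linear interpolation reproduces the shift exactly. Sub-TR delays are a harder case, because the upsampled points carry no information beyond the original samples.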
 
 
 
 
 
 
 
 
 
 
 
 
2.3 Results 
   Fig.2.3 shows the t-values of the GLM fit between the experimental paradigm and 
dGCD time series. Table 2.1 shows the p-values of a one-sided z-test used to test whether 
the t-value sample was significantly greater than zero. It is notable that no causality was 
detected for a delay of zero, while dGCD significantly covaried with the experimental 
paradigm for all other delays. Also, the significance of causality generally increased with 
increasing delay. This indicates that dGCD derived from fMRI data was sensitive to even 
28 ms latency and that the sensitivity increased with increasing delay time.  
 
 
Figure 2.2 Stimulus boxcar function and experimental paradigm 

Figure 2.3 t-values for the GLM fit between dGCD and experimental 
paradigm versus delay times 

Table 2.1 p-values of a z-test with the null hypothesis that the distribution of 
t-values obtained from all subjects has zero mean 

Delay (ms)    p-value 
0             0.1120 
28            0.0034 
56            0.0141 
84            5.18 × 10^-7 
112           3.81 × 10^-12 

   Fig.2.4 shows the delays inferred from the cross-correlation function on the y-axis and 
the true delays on the x-axis. The p-values of the one-sided z-test used to test whether the 
delays inferred from the cross-correlation function were significantly greater than zero 
are shown in Table 2.2. It is apparent that the cross-correlation function infers a delay 
when there is no true delay and does not infer a delay when there is one.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Table 2.2 p-values of a z-test with the null hypothesis that the distribution of 
delays inferred from the cross-correlation function has zero mean 

Delay (ms)    p-value 
0             0.0079 
28            0.9939 
56            0.6852 
84            1 
112           0.8326 
 
 
 
 
 
Figure 2.4 Delays inferred from the cross-correlation function on the y-axis 
and the true delays on the x-axis 

2.4 Discussion  
   There has been intense debate in the past 2 years regarding methods which are suitable 
to infer directional connectivity information from fMRI [38-42]. Simulations by Smith et 
al [3] showed that Patel's τ [20] was more suitable than lag-based methods such as 
Granger causality for detecting directional connectivity. However, studies conducted by 
different groups have shown that under certain conditions, such as fast sampling and 
hemodynamic variability being within a range typically observed in healthy individuals, 
Granger causality can faithfully capture directionality information from fMRI based on 
neuronal latencies [12,15,43,44]. The most recent and compelling experimental evidence 
in favor of Granger causality comes from the study by Katwal et al., which showed that 
using Granger causality for relative timing measurement and self-organizing maps for 
voxel selection, timing differences as low as 28 ms can be inferred from fMRI time series 
in bilateral visual cortices, which had experimentally controlled timing differences 
induced by time-lagged hemi-field stimulation [16]. This makes the data obtained from 
the study by Katwal et al. ideal for testing and validating potential approaches for 
inferring latencies from fMRI.  
   One outstanding issue with conventional GC is that it is sensitive to both spontaneous 
and stimulus evoked responses [27]. Previous studies using both EEG [28,29] and BOLD 
fMRI data [30-32] have proposed that by estimating time-varying coefficients using a 
dynamic Granger causality model (dGC), temporal precedence due to stimulus-evoked 
BOLD responses can be separated from that due to spontaneous activity. Therefore, in 
this study, we have reused the data from the study conducted by Katwal et al [16] to 
demonstrate and validate the use of dynamic Granger causality to infer tens of 
milliseconds of stimulus-evoked timing differences from BOLD fMRI. In order to be 
consistent with the study by Katwal et al., we used dynamic Granger causality difference 
between bilateral visual cortices as our metric. 
   We tested three primary hypotheses. First, we hypothesized that the covariance of 
dynamic Granger causality difference with the experimental paradigm would be 
non-significant for a delay of 0 ms. This was indeed the case, as shown in Table 2.1, 
wherein the null result indicates that there was no underlying timing difference. Second, 
we hypothesized that the amount of covariance of dynamic Granger causality difference 
with the experimental paradigm would increase with increasing latency. The increasing 
t-value of the GLM in Fig.2.3 (and the generally decreasing p-value in Table 2.1) supports 
this hypothesis. Consequently, the t-value can be interpreted in terms of the amount of 
latency between the time series. Third, we hypothesized that even for a 28 ms delay, we 
would find significant (p<0.05) covariance between dynamic Granger causality difference 
and the experimental paradigm. This was confirmed by the results in Table 2.1. Results 
obtained from the conventionally used cross-correlation function demonstrated its 
inability to infer neuronal latencies from fMRI data. 
   Finally, we provide a few cautionary notes for interpreting the results presented in this 
report. First, given the confounding effect of the variability of the hemodynamic response 
[45,46] on Granger causal estimates obtained from BOLD fMRI [15,47], it is noteworthy 
that hemodynamic variability was probably not a factor influencing the results of either 
the study by Katwal et al. [16] or the current study, since left and right visual cortices 
are likely to have the same hemodynamics as they are fed by a common hemodynamic 
source. However, if the proposed dGC technique is applied to other situations where this 
may not be the case, we recommend that the dGC model be applied on deconvolved 
fMRI data [48,49]. Second, the performance of the dGC model was aided by the high 
SNR obtained from the 7T magnet as well as the high temporal resolution provided by a TR 
of 250 ms. More studies are required to ascertain the applicability of these results at 
lower field strengths and longer TRs. Third, our results should be strictly interpreted 
within the framework of detecting neuronal delays and not directional connectivity in 
general. Neuronal delays are an established electrophysiological signature of directional 
connectivity; however, the activity of region A may directionally influence (or predict) the 
activity of region B regardless of an explicit delay between the activities obtained from 
the two regions. 
 
2.5 Conclusion 
   In this study, dynamic Granger causality analysis was performed to detect sub-100ms 
timing differences in BOLD responses from the visual cortex. While Katwal et al. [16] 
demonstrated this possibility using conventional Granger causality, our proposed 
dynamic Granger causality metric relies on experimental modulation of causality with 
time. Consequently, the proposed model was able to infer only stimulus-evoked (and not 
spontaneous) neural timing differences. In summary, our experimental validation of 
dynamic Granger causality to detect sub-100ms (as small as 28 ms) timing differences 
provides a reliable data-driven method for effective connectivity analysis of fMRI data. 
 
 
 
 
 
 
 
 
 
 
 
Chapter 3: Experimental evidence demonstrating the inability of Patel's τ for 
estimating directionality of brain networks from fMRI 
 
Abstract 
  Investigating the directional interactions between brain regions plays a critical role in 
fully understanding brain function. Consequently, multiple methods have been developed 
for non-invasively inferring directional connectivity from the human brain using 
functional magnetic resonance imaging (fMRI). Recent simulations by Smith et al 
showed that Patel's τ, a method based on higher order statistics, was the best approach for 
inferring directional interactions from fMRI. Since simulations make restrictive 
assumptions about reality, we set out to verify this finding using experimental fMRI data 
obtained from a three-region network in a rat model with electrophysiological validation. 
Our hypothesis was that Patel's τ obtained from fMRI data should correctly estimate the 
directionality of neuronal influences obtained from intra-cerebral EEG in this network. 
However, our results indicate that the accuracy of network directionality estimated using 
Patel's τ was not better than chance. First, our results highlight the necessity of 
experimental validation of simulation results. Second, it is unclear which brain 
mechanism is modeled by a directionality inferred from Patel's τ. Third, this study shows 
that a directional connection ascertained by different methods may mean different things 
and more experimental studies are needed for investigating the neuronal mechanisms 
underlying the direction of a connection in the brain ascertained by fMRI using different 
methods.   
 
3.1 Introduction 
   Functional magnetic resonance imaging (fMRI) has primarily been used to explore the 
spatial localization of brain function [89], wherein different brain regions are assumed to 
perform different brain functions. However, many neuronal processes cannot be localized 
in a single region and hence are presumed to be encoded by a network formed by several 
brain regions. Therefore, it is increasingly being recognized that investigating the 
interactions between brain regions plays a critical role in fully understanding brain 
function. There are many different methods that have been proposed to characterize the 
interactions between brain regions. These can be broadly classified into two categories: 
functional connectivity and effective connectivity. Functional connectivity is defined as 
"temporal correlations between spatially remote neurophysiological events" while 
effective connectivity is defined as "the influence one neuronal system exerts over 
another" [4]. The estimation of directionality of influence is often of great interest as it 
provides a mechanistic characterization of the underlying neuronal processes in terms of 
information flow. However, it is much harder to estimate the direction of influence than 
to estimate whether a connection exists or not. There are three general classes of 
methods to accomplish this. The first class comprises "lag-based" methods such as Granger 
causality [8]. The assumption of these methods is that if one time course is similar to a 
time-shifted version of the other, then the one with temporal precedence may cause the 
other. The second class utilizes the concept of conditional independence, as in Bayes 
net methods [50]. The last class is based on higher order statistics such as Patel's τ [20], 
wherein asymmetries in the probability of activation of brain region A given the 
activation of another brain region B versus the probability of activation of region B given 
the activation of region A are used to infer the directionality of the influences between 
regions A and B. 
   While some neuroimaging techniques such as electroencephalography (EEG) and 
magnetoencephalography (MEG) have attractive temporal resolution that may favor the 
application of the above approaches, they are limited by their poor spatial resolution. In 
contrast, functional magnetic resonance imaging (fMRI) can provide excellent spatial 
resolution of millimeters and thus has become a popular choice for network estimation. 
However, fMRI is an indirect measure of neural activity that suffers from hemodynamic 
smoothing and poor temporal resolution [23]. Some approaches whose application to 
EEG has been established remain debated when applied to fMRI data (e.g. Granger 
Causality) [12]. Therefore, careful validation is necessary for the application of network 
estimation methods, especially for the estimation of directionality of influence. Recently, 
Smith et al. performed extensive analyses of simulated fMRI data to evaluate the validity 
of various network estimation methods [3], observing that Patel's τ performed best in the 
estimation of connection directionality compared to other methods such as Granger 
causality. However, any simulation is limited by the underlying assumptions, and Smith 
et al. used a generative biophysical model without an explicit delay, which may have 
favored Patel's τ over lag-based methods because the former is not based on a delay 
assumption while the latter are [26,40]. Meanwhile, Roebroeck et al. [14] and Luo et al. 
[13] reported excellent results with Granger causality while explicitly including delays in 
their simulated data. It is notable that final conclusions can never be drawn from 
simulations alone, since they often make restrictive assumptions about reality which might 
not hold true. Therefore, experimental validation of simulations is required. Two recent 
studies on experimental validation of effective connectivity methods for fMRI are 
noteworthy. In the first one, Katwal et al. showed that Granger Causality was capable of 
inferring sub-100 ms timing differences between right and left visual hemi-field stimuli 
[16]. In the second work, David et al. performed simultaneous EEG and fMRI 
measurements followed by intra-cerebral EEG (iEEG) recordings in rats [15]. Effective 
connectivity obtained from both raw and deconvolved fMRI data [8] using Granger 
Causality and that obtained from Dynamic Causal Modeling (DCM) [51] were compared 
with directed functional coupling estimated from iEEG recordings for validation. The 
results showed that Granger causality applied to deconvolved fMRI data as well as DCM 
were able to estimate network directionality which was consistent with that obtained from 
iEEG.  
   In this study, we aimed to use experimental data from the study by David et al to test 
the validity of the simulation results obtained by Smith et al., specifically with reference 
to the superiority of Patel's τ for obtaining the directionality of brain connectivity. Patel's τ 
was applied to these data to estimate the directionality of the three-region network; the 
results were compared to the network estimated by iEEG for validation. Our results 
indicate that Patel's τ cannot correctly estimate the directionality of this brain network. 
 
3.2 Methods 
3.2.1 Animal model selection 
   Genetic Absence Epilepsy Rats from Strasbourg (GAERS) [52] were used in this 
experiment. GAERS result from genetic selection over more than 80 generations. The rats 
show spontaneous spike and wave discharges (SWDs), lasting 20 seconds on average and 
repeating every minute when they are at rest. Previous studies using genetic models of 
absence epilepsy have shown that SWDs originate from the perioral regions of the first 
somatosensory cortex [52,53], thus providing a reference for validating directionality 
estimation results using fMRI.  
 
3.2.2 FMRI/EEG data acquisition and processing 
   Six male adult GAERS were included in the fMRI/EEG study. Spontaneous SWDs 
during MR experiments were measured using EEG. Three carbon electrodes were used, 
located on the skull near the midline (frontal, parietal and occipital). Two additional 
carbon electrodes were introduced for measurement of cardiac activity 
(electrocardiography [ECG]). MR experiments were performed in a horizontal-bore 2.35 
T magnet. FMRI data were acquired using a gradient-echo echo-planar imaging (EPI) 
sequence with the following parameters: two shots, data matrix = 48 × 48, FOV = 35 × 
35 mm², 15 contiguous 1.5-mm-thick slices covering the whole brain, alpha = 90°, TE = 
20 ms, TR = 3 s. T1-weighted anatomical scans were also obtained using a 3D-MDEFT 
sequence [54] with the following parameters: voxel size = 0.33 × 0.33 × 0.33 mm³, TI = 
605 ms, quot = 0.45, alpha = 22°, TR/TE = 15/5 ms, and BW = 20 kHz.  
   SPM 5 (http://www.fil.ion.ucl.ac.uk/spm/software/spm5/) was used for data processing 
and analysis [55]. Standard spatial preprocessing was performed including realigning, 
normalizing and smoothing. A SWD regressor was then obtained by convolving the 
down-sampled EEG signal with a canonical HRF. This regressor was then used to obtain 
SWD-related t-statistic maps and identify ROIs. Several significantly activated and 
deactivated regions were found at the group level. Three of them were identified as ROIs: 
primary somatosensory cortex, barrel field (S1BF), Thalamus and Striatum (caudate-
putamen; CPu). Activations were found in S1BF and Thalamus while deactivations were 
found in CPu. There were several reasons for selecting these 3 regions: (1) they were 
most consistently activated over different sessions and rats; (2) they exhibit different 
hemodynamics, which can provide a rigorous validation since HRF variability is a vital 
concern for many approaches; (3) our current understanding of SWDs can easily integrate 
them. The time courses from these three regions were used in the following analysis. 
Please refer to David et al [15] for complete details of data acquisition and activation 
analysis. 
 
3.2.3 IEEG experiments and data analysis 
   Five adult GAERS (two males, three females) were used in the iEEG experiment. 
GAERS were implanted with intra-cerebral electrodes located in the three ROIs (S1BF, 
ventrobasal thalamus and striatum). Another two electrodes were fixed in the nasal and 
occipital bones for reference. EEG data were obtained in awake and freely moving rats. 
Please refer to David et al [15] for details of the iEEG experiments. IEEG connectivity 
was obtained by spike averaging and generalized synchronization. Significant 
directionality was found from S1BF to the striatum (p < 10^-9) and from S1BF to the 
thalamus (p < 0.02), while the connectivity from striatum to thalamus was not significant 
(p > 0.3). These results were also in accordance with some previous studies [52,53], and 
thus were used as ground truth to validate the results obtained from fMRI connectivity 
analysis. 
 
3.2.4 Patel's conditional dependence measures 
   A data-driven and hypothesis-unconstrained Bayesian approach was proposed by Patel 
et al. to examine both functional connectivity and effective connectivity [20]. It assesses 
the connectivity between two voxel or ROI time series by comparing joint and marginal 
probabilities of evoked activity of voxel/ROI pairs using a Bayesian method. Note that 
the inferences are made using time series, which could either be extracted from a single 
voxel or be obtained by averaging time series from multiple voxels within an ROI. In this 
study, we employed mean time series extracted from the ROIs S1BF, Thalamus and 
Striatum. Three steps were taken for the measurement of directionality: determining 
voxel activation, Patel's kappa for measurement of functional connectivity, and Patel's 
tau for measurement of ascendancy (or directionality). 
   Determining voxel activation. Patel's method derives inferences based on a binary time 
series indicating whether a ROI is evoked or not at each time point (or TR). Here, we use 
the word "evoked" in order to refer to both activations and deactivations. In order to 
determine the evocation of ROIs at each time point, we first normalized the time series to 
the range [0,1], setting values above the 90th percentile to 1 and values below the 10th 
percentile to 0, and linearly mapping the remaining values to the interval between 0 and 1. 
A pre-chosen threshold T (0 < T < 1) was used for binarization. Since S1BF and thalamus 
were shown to be activated by David et al [15], normalized values larger than T were 
binarized to 1 and others were binarized to 0. However, David et al showed that the 
striatum (CPu) was deactivated [15], and hence normalized values smaller than 1 − T were 
binarized to 1 and others were binarized to 0 [8]. Given two ROIs a and b, joint evocation 
probabilities were calculated from the binary time series extracted from the corresponding 
mean ROI time series. θ1, θ2, θ3 and θ4 denote the probability of ROI a evoked and ROI b 
evoked, the probability of ROI a evoked and ROI b rest, the probability of ROI a rest and 
ROI b evoked, and the probability of ROI a rest and ROI b rest, respectively. 
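
The normalization and binarization step can be sketched as follows. This is only an illustration: the percentile estimator is not specified above, so the "inclusive" method of Python's `statistics.quantiles` is assumed, and the function name and test series are hypothetical.

```python
import statistics


def binarize_roi(series, threshold, deactivated=False):
    """Clip at the 10th/90th percentiles, rescale to [0, 1] and threshold."""
    deciles = statistics.quantiles(series, n=10, method="inclusive")
    low, high = deciles[0], deciles[-1]      # 10th and 90th percentiles
    if high == low:
        raise ValueError("no spread between the 10th and 90th percentiles")
    normalized = [min(1.0, max(0.0, (v - low) / (high - low))) for v in series]
    if deactivated:
        # Deactivated ROI: low normalized values count as evoked.
        return [1 if v < 1 - threshold else 0 for v in normalized]
    # Activated ROI: high normalized values count as evoked.
    return [1 if v > threshold else 0 for v in normalized]


# Hypothetical example: a ramp from 0 to 100 with threshold T = 0.75.
ramp = list(range(101))
activated_flags = binarize_roi(ramp, 0.75)
deactivated_flags = binarize_roi(ramp, 0.75, deactivated=True)
```

With T = 0.75, an activated ROI is marked as evoked only near the top of its range and a deactivated ROI only near the bottom, so the two conventions are mirror images of each other.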
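   Patel's κ is a measure of functional connectivity based on the posterior distributions. κ 
is defined as follows [20]: 

$$
\kappa = \frac{\theta_1 - E}{D\,(\max(\theta_1) - E) + (1 - D)\,(E - \min(\theta_1))}
$$

where 

$$
\begin{aligned}
E &= (\theta_1 + \theta_2)(\theta_1 + \theta_3), \\
\max(\theta_1) &= \min(\theta_1 + \theta_2,\; \theta_1 + \theta_3), \\
\min(\theta_1) &= \max(0,\; 2\theta_1 + \theta_2 + \theta_3 - 1), \\
D &= \begin{cases}
\dfrac{\theta_1 - E}{2(\max(\theta_1) - E)} + 0.5, & \theta_1 \ge E \\[2ex]
0.5 - \dfrac{\theta_1 - E}{2(E - \min(\theta_1))}, & \text{otherwise}
\end{cases}
\end{aligned}
$$

Here the numerator of κ is the difference between the joint probability of both ROIs 
being evoked and the expected joint probability under the case of independence, while 
the denominator is a normalizing term restricting κ to the range from -1 to 1. Specifically, 
if ROI a and ROI b are statistically independent, κ will be 0. When ROI a and ROI b are 
evoked simultaneously (θ2 = θ3 = 0), κ will be 1. A larger value of κ indicates a stronger 
dependence relationship, i.e. stronger functional connectivity, between the two ROIs. 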
   Patel's τ is a measure of ascendancy (or directional influence) between any given ROIs 
a and b. The assumption is that if ROI b is ascendant to ROI a, the time period when ROI 
a is evoked should be a subset of the period when ROI b is evoked, provided that ROIs a 
and b are functionally connected. τ is defined as follows [20]: 

$$
\tau_{ab} = \begin{cases}
1 - \dfrac{\theta_1 + \theta_3}{\theta_1 + \theta_2}, & \theta_2 \ge \theta_3 \\[2ex]
\dfrac{\theta_1 + \theta_2}{\theta_1 + \theta_3} - 1, & \text{otherwise}
\end{cases}
$$

τab ranges from -1 to +1, with a positive value indicating that the influence is from ROI a 
to b, and a negative value indicating directional influence from ROI b to a, given κ ≠ 0. 
 
3.3 Results 
   Functional connectivity between S1BF, Thalamus and Striatum was determined using 
Patel's κ with a threshold of 0.75. The choice of this value for the threshold was guided 
by the recommendations of Smith et al [3]. Fig.3.1 shows Patel's κ values obtained from 
each of the six rats between the three ROIs. It is notable that a negative κ indicates that 
the ROI under consideration tends to be evoked during the rest period of the other. In our 
study, since the three ROIs have been previously demonstrated to be involved in SWDs and 
were identified using an fMRI regressor, negative κ values were regarded as errors in the 
estimation of κ. Fig.3.1 shows that most of the κ values were positive, indicating that the 
three ROIs were functionally connected.    
 
 
   Subsequently, the directionality of connectivity in the 3-ROI network was estimated by 
computing Patel's τ with a threshold of 0.75. The results were compared with the 
ground truth network obtained from iEEG data for validation (please refer to David et al 
[15] for this result). Fig.3.2 shows the network estimated using Patel's τ for each rat.  
 
Figure 3.1 Patel's κ for all ROI pairs for each rat showing that the ROIs 
are functionally connected 
 
 
   At the group level, the correct estimation rate for each ROI pair was obtained over the 
six rats. In order to get the statistical significance of the correct estimation rate over 
chance level, a binomial null distribution B(n, p) was formed, where n is the number of 
rats (i.e. 6) and p is the success probability at chance level (i.e. 0.5). The correct 
estimation rates based on Patel's τ were compared with the null distribution to get 
p-values, as shown in Fig.3.3. All the p-values were greater than 0.05, indicating that 
Patel's τ did not estimate the directionality of the network better than chance.  
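
The comparison against the binomial null distribution can be sketched as below, assuming that the p-value is the upper-tail probability of observing at least the given number of correct estimates. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance=0.5):
    """P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# p-values for 5 and 6 correct estimates out of 6 rats at chance level 0.5.
p_five_of_six = binomial_upper_tail(5, 6)
p_six_of_six = binomial_upper_tail(6, 6)
```

Under this one-sided reading, with six rats only a correct estimate in all six (p = 1/64) falls below 0.05, while five out of six gives p = 7/64.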
 
Figure 3.2 Estimated directionality of the network using Patel's τ. Blue arrows indicate 
estimated directions which agree with those obtained from iEEG while red arrows indicate 
estimated directions which disagree with those obtained from iEEG [15] 
 
 
 
 
   In order to consider the effect of different thresholds, Patel's τ was also computed with 
two other thresholds (0.5 and 0.25) as well as without binarization to get 
estimates of the 3-ROI network. Fig.3.4 shows the correct estimation rates and 
corresponding p-values in these three cases. 

Figure 3.3 Correct group level estimation rates and corresponding p-values for 
each ROI pair. 
  
 
 
 
Figure 3.4 Correct group level estimation rates and corresponding 
p-values for each ROI pair for different thresholds.  

3.4 Discussion 
   In this study, we experimentally demonstrate that Patel's τ is unable to correctly 
estimate the directionality of neuronal influences from fMRI data. This finding does not 
support previous simulations that suggested Patel's τ as an effective measure of 
directional brain connectivity [3]. Our experimental conditions were fairly similar to 
those used in the simulations by Smith et al [3]. Both used a TR of 3 s. We had data 
lengths up to 3 times longer than those used by Smith et al (note that different rats had 
different data lengths [15]), which should favor all methods including Patel's τ. HRF 
variability was simulated by Smith et al [3] and was reported by David et al [15] in the 
dataset used in the current work. Further, Smith et al showed that the accuracy of the 
estimation of network directionality using Patel's τ showed a strong positive dependence 
on the length of the time series and strength of connections. As mentioned earlier, the fact 
that we used a longer time series than that used by Smith et al should favor Patel's τ. 
However, Rat-2 had the shortest time series of all rats and yet we were able to correctly 
estimate the directionality of all the three paths in this rat. Also, David et al showed based 
on directional connectivity estimated from iEEG data that S1BF had very strong 
directional influences on Thalamus and the Striatum. This should also favor Patel's τ. 
Taken together, our results demonstrate that even though our experimental conditions were 
reasonably similar to the simulations by Smith et al [3], with certain 
parameters which were favorable to Patel's τ, we were not able to correctly estimate the 
directionality of connections in the rat brain. This opens up the possibility that certain 
assumptions made by Smith et al [3] may not hold true in reality. However, we are unable to 
speculate on specific factors which led to this negative result, and this is an aspect that 
may be probed in future studies. It is noteworthy that simulations are required to model 
experimental conditions and not vice versa. Hence, our results could be viewed 
independently from the simulations of Smith et al as well. Generally speaking, our results 
demonstrate the need for experimental validation of simulations, since the latter often 
make restrictive assumptions about reality which might not hold true. 
   Another issue that should be highlighted is the lack of clarity in the neuroimaging 
community regarding the neuronal mechanisms underlying the direction of a connection 
in the brain ascertained by fMRI using various methods. On the one hand, lag-based 
methods have a clear neuroscientific connotation as it is linked to the concept of mental 
chronometry. Also, electrophysiological experiments such as the ones employing 
 38 
syntactic event-related potentials have observed latencies of 100 ms post stimulus in the 
primary visual cortex and 500-600 ms post stimulus in the parietal cortex [57]. These 
"causal" chains of events in the brain make the interpretation of directionality in 
lag-based methods fairly straightforward. On the other hand, other methods which assign 
directionality to connections, specifically Patel's τ derived from higher order statistics, 
may not capture "directional influence" in a temporal sense, the way it is intuitively 
construed by many people. Rather, they rely on other concepts, such as the asymmetry 
between the probability of activation of a region A given the activation of another region B 
and the probability of activation of region B given the activation of region A, as in 
Patel's τ. Our results indicate that such concepts may not be capable of correctly 
estimating the directionality of neuronal influence. Therefore, more research is needed to 
establish the neuronal mechanistic underpinnings of methods claiming to ascertain 
directionality of neuronal influence from fMRI data.  
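
   The asymmetry between the two conditional activation probabilities can be illustrated 
with a minimal sketch. It assumes two time series that have already been binarized 
(1 = active, 0 = inactive); the function name and the toy series are hypothetical, and the 
sketch computes only the two conditional probabilities being compared, not Patel's τ itself. 

```python
def conditional_activation(a, b):
    """Return (P(A active | B active), P(B active | A active)) for two binary series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n_a = sum(a)  # number of samples where A is active
    n_b = sum(b)  # number of samples where B is active
    if n_a == 0 or n_b == 0:
        raise ValueError("each region must be active at least once")
    return both / n_b, both / n_a


# Toy example (made-up data): B is active only when A is active, but not vice versa.
region_a = [1, 1, 1, 1, 0, 0, 1, 0]
region_b = [1, 0, 1, 0, 0, 0, 0, 0]
p_a_given_b, p_b_given_a = conditional_activation(region_a, region_b)
```

   In this toy case the two probabilities differ although neither series has been shifted 
in time, which is why an asymmetry of this kind need not reflect temporal precedence. 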
 
 
 
 
 
 
 
 
 
 
Chapter 4: Predicting Purchase Decisions based on Spatio-temporal Functional 
MRI Features using Machine Learning 
 
Abstract 
   Machine learning algorithms allow us to directly predict brain states based on 
functional magnetic resonance imaging (fMRI) data. In this study, we demonstrate the 
application of this framework to neuromarketing by predicting purchase decisions from 
spatio-temporal fMRI data. Twenty-four subjects were shown product images and 
asked to decide whether or not to buy them while undergoing fMRI scanning. 
Eight brain regions which were significantly activated during decision-making were 
identified using a general linear model. Time series were extracted from these regions 
and input into a recursive cluster elimination based support vector machine (RCE-SVM) 
for predicting purchase decisions. This method iteratively eliminates features which are 
unimportant until only the most discriminative features giving maximum accuracy are 
obtained. We were able to predict purchase decisions with 71% accuracy, which is higher 
than previously reported. In addition, we found that the most discriminative features were 
in signals from medial and superior frontal cortices, both before and after the decision 
point. Therefore, this approach provides a reliable framework for using fMRI data to 
predict purchase-related decision-making as well as infer its neural correlates. 
 
 
 
 
4.1 Introduction 
   The development of functional MRI (fMRI) has greatly advanced our understanding of 
human brain function. One fundamental question in neuroimaging is whether human 
behavior can be predicted from brain activation. fMRI is a non-invasive 
technique that allows researchers to obtain indirect estimates of neural activity at a 
spatial resolution of millimeters within a matter of seconds [58]. Consequently, mining 
fMRI data provides a powerful tool for understanding human cognitive processes.  
   Recently, multivariate pattern recognition (MPR) methods have been extensively 
applied to fMRI data for decoding behaviors and cognitive processes 
[18,59,60,61]. In these approaches, fMRI data are used to detect differences between the 
activation patterns of cognitive states (state 1 vs. state 2) and to discriminate one from the 
other. Many earlier studies have shown that MPR methods can successfully predict 
behaviors from brain activations. For example, Haxby et al. distinguished the category of 
perceived visual stimuli [62], Kamitani et al. decoded the direction of movement [63], 
and Mitchell et al. predicted whether the subjects were looking at a picture or a sentence 
[64]. 
   Methodologically, the framework of MPR methods applied in neuroimaging usually 
consists of three parts: feature extraction, feature selection and a particular pattern 
recognition algorithm [65]. Feature extraction obtains specific characteristics 
from fMRI data, in the hope that they have the power to discriminate between 
classes. The most commonly used features are voxel intensities from specific brain 
regions of interest (ROIs) [60]. While traditional fMRI studies focused on spatial 
features to identify the relevant brain regions, Mourão-Miranda et al. proposed a 
spatio-temporal classifier that considers both spatial and temporal features [66]. This 
approach is valuable for finding out not only "where" but also "when" the brain activation 
predicts behavior. After feature extraction, feature selection is used to select subsets of 
features that possess the most discriminatory power. Feature selection is essential for fMRI 
studies, not only because it can improve the performance of classifiers in terms of 
prediction accuracy, but also because it can identify the relevant regions and time points 
that are most useful for classifying different cognitive states. There are two main 
categories of existing feature selection approaches: "filter methods" such as the t-test [64] 
and "wrapper methods" such as recursive feature elimination (RFE) [67]. Generally, 
wrapper methods perform better than filter methods for fMRI studies [59]. In the last 
part, the selected features are input to a specific pattern recognition algorithm (e.g. a 
support vector machine) to separate the different classes and correctly predict the class of 
a novel pattern. Any combination of choices for the individual parts (i.e. feature 
extraction, feature selection and pattern recognition algorithm) can be used for brain state 
classification. 
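
   The three-part framework can be sketched as follows. This is a minimal illustration under 
stated assumptions, not the method used in this study: the filter step ranks features by a 
Welch t statistic, and a nearest-centroid rule stands in for the pattern recognition 
algorithm (an SVM would take its place in practice). All function names and the toy trials 
are hypothetical. 

```python
from statistics import mean, variance


def extract_features(trial):
    """Feature extraction: flatten one trial (a list of ROI time series) into a vector."""
    return [value for roi_series in trial for value in roi_series]


def t_score(xs, ys):
    """Welch t statistic between two samples, used here as the filter criterion."""
    se = (variance(xs) / len(xs) + variance(ys) / len(ys)) ** 0.5
    return (mean(xs) - mean(ys)) / se if se > 0 else 0.0


def select_features(X, y, k):
    """Feature selection (filter method): indices of the k features with largest |t|."""
    scores = []
    for j in range(len(X[0])):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(t_score(class0, class1)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, keep):
    """Pattern recognition (stand-in for an SVM): one centroid per class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [[row[j] for j in keep] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, x, keep):
    """Assign x to the class whose centroid is closest in the selected features."""
    v = [x[j] for j in keep]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))


# Toy trials (made-up numbers): 2 ROIs x 2 time points; label 1 = state 1, 0 = state 2.
trials = [
    ([[1.0, 0.2], [0.5, 0.4]], 1),
    ([[1.2, 0.1], [0.4, 0.5]], 1),
    ([[0.9, 0.3], [0.6, 0.4]], 1),
    ([[0.1, 0.2], [0.5, 0.5]], 0),
    ([[0.0, 0.1], [0.4, 0.4]], 0),
    ([[0.2, 0.3], [0.6, 0.5]], 0),
]
X = [extract_features(trial) for trial, _ in trials]
y = [label for _, label in trials]
keep = select_features(X, y, k=1)
model = fit_centroids(X, y, keep)
predicted = predict(model, extract_features([[1.1, 0.2], [0.5, 0.4]]), keep)
```

   Because each part has its own function, any one of them can be exchanged (for example, a 
wrapper method in place of the t-score filter) without touching the other two. 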
   While most previous studies of brain state classification focused on predicting the 
sensory stimulus perceived by human beings, the possibility of using fMRI 
data to directly predict human decisions is attractive in many applications. One 
such example is neuromarketing, the application of neuroimaging techniques to 
objectively characterize the effect of product marketing on human brains, which has 
gained increasing popularity [68]. A mechanistic insight into the neurocognitive 
processes underlying an individual's decision on whether to buy a product or not is the 
most fundamental quest in marketing analysis. There are a couple of previous studies 
using fMRI data and MPR methods to predict purchase decisions. Knutson et al. 
extracted fMRI data from the bilateral nucleus accumbens 
(NAcc), bilateral medial prefrontal cortex (MPFC) and right insula, and used a simple 
logistic regression model to predict purchase decisions with a prediction accuracy of 60% 
[69]. In order to increase prediction accuracy and yield interpretable coefficients for 
gaining insight into the neural mechanisms underlying the decision process, Grosenick et 
al. reused the data from Knutson et al. and applied six different classification models to 
predict purchase behavior [70]. They showed that the PDA-ENET (penalized discriminant 
analysis with elastic net) classifier [71, 72] performs best, with an across-subjects 
classification rate of 66% and within-subjects classification rates of 67.05% and 63.15% for 
the two presentation datasets respectively [70]. Although Grosenick et al. significantly 
increased the prediction accuracy as well as temporal and spatial interpretability compared 
to Knutson et al., we highlight the following outstanding issues. First, three periods were 
included in the experiment designed by Knutson et al.: product period, price period and 
choice period [69]. Although prices are an essential element of marketing analysis, the 
introduction of a price period may make it difficult to disentangle the effects of product 
design features and price on the purchase decision. Therefore, in this study, we asked 
participants to make purchasing decisions solely based on product design features 
without price considerations, i.e. prices were not displayed. Second, Grosenick et al. 
employed three different regression models (i.e. Least Absolute Shrinkage and Selection 
Operator (LASSO), Elastic Net (ENET) and Univariate Soft Thresholding (UST)) for 
Penalized Discriminant Analysis (PDA) classifiers. All three models have a 
mechanism of automatic variable selection, i.e. removing less discriminative features 
from the model. However, the linear Support Vector Machine (SVM) used by the authors 
does not have such a "feature selection" part embedded in it [70]. Thus, the finding 
that PDA classifiers gave better prediction accuracy and spatio-temporal 
interpretability than the linear SVM is less convincing. Therefore, we employed a method 
based on RFE for selecting features, which were then input into the SVM classifier, so 
that the efficacy of SVMs for brain state classification in the context of predicting 
purchase decisions using fMRI data can be established. 
   Specifically, we used spatio-temporal fMRI features with Recursive Cluster Elimination 
based Support Vector Machine (RCE-SVM) [21] for predicting purchase decisions. The 
signal features were extracted from time series obtained from 8 different ROIs activated 
during the task. We adopted RCE as the feature selection method for two reasons: 
wrapper methods have been shown to be advantageous over filter methods for feature 
selection [59], and RCE considers feature clusters rather than individual features 
(assuming that features are usually correlated with each other), which makes it faster than 
the RFE method [21]. Also, RCE-SVM has been reported to be a very reliable 
classification method in some earlier fMRI studies [65, 73]. Using this method, we were 
able to achieve an average classification accuracy of 71%, which is better than that 
obtained by previous purchase decision prediction studies. We also ranked the features 
based on their discriminability in order to infer where and when brain activation can 
best predict purchase decisions. 
 
 
4.2 Method 
4.2.1 Experiment design and data acquisition 
   Twenty-four healthy subjects (17 female and 7 male; mean age = 23.6; age range = 19-59 
years) who were recruited from Auburn University participated in this study. The 
study protocol was approved by the Institutional Review Board at the university and 
informed consent was obtained from each subject prior to their participation.  
   While being scanned, the subjects participated in an event-related task. There were 64 
actual product images, with an equal number of complex and simple product designs (32 
each). There was also an equal number of products from the hedonic and 
utilitarian product categories (32 each). Please refer to our other publication for details 
regarding how the product images were chosen based on behavioral testing [74]. For each 
trial, the subjects were shown one of the 64 product images for 5 seconds, and allowed 
another 5 seconds to make a purchase decision (buy or not buy). The 64 stimuli were 
shown in pseudo-random order, using the E-prime software 
(http://www.pstnet.com/software.cfm?ID=101). Inter-trial intervals were also randomly 
chosen using optseq software (http://surfer.nmr.mgh.harvard.edu/optseq/). The schematic 
of this event-related design is shown in Fig.4.1. 
 
Figure 4.1 The schematic of the event-related experimental design 
   Functional MRI images were acquired with a 3 Tesla Siemens Verio scanner. The 64 visual 
stimuli were presented to the subjects using an MR-compatible projection system while 
they lay in the scanner. An MR-compatible button box was used to record the subjects' 
response to each stimulus. Once the subjects had made a choice, they pressed one of two 
buttons to indicate a buy or not-buy decision. fMRI data were obtained using an echo-planar 
imaging (EPI) sequence [75] with a 32-channel head coil and the following parameters: TR = 
1000 ms, TE = 30 ms, FOV = 24 cm, matrix = 64 × 64, 3 × 3 mm² in-plane resolution 
and contiguous slices of 5 mm thickness with whole brain coverage. High-resolution 
anatomical scans were also obtained for anatomical reference using the 3D 
magnetization-prepared rapid gradient echo (MPRAGE) [76] sequence (TE/TR = 5/35 
ms, matrix = 256 × 208 × 196, FOV = 256 × 208 × 192 mm³, and a 1 mm isotropic 
resolution). fMRI data were subjected to standard pre-processing using statistical 
parametric mapping (SPM) software (www.fil.ion.ucl.ac.uk/spm/).  
 
4.2.2 ROI selection and feature extraction 
   Using a general linear model [77], we identified brain regions that were activated more 
when a product was bought than when it was not, and vice versa. We employed a 
stringent threshold of p < 0.01, FWE-corrected for multiple comparisons, so that only the 
most discriminative activations were used for further analysis.  
   One fundamental assumption in MPR methods is that all the training trials in the same 
class (e.g. state 1) have the same properties and thus can be exchanged with each 
other. If only spatial features are extracted for prediction, the stationarity and 
exchangeability assumption cannot hold [66]. Simple temporal embedding can solve this 
problem (i.e. MPR methods would then use both spatial and temporal features for 
prediction). The feature selection part can then produce a discriminating score for each 
voxel and time point, providing insight into dynamic changes in the discriminating power of 
voxels/ROIs. 
   Eight activated ROIs were selected; their names and coordinates are shown in Table 
4.1. Ten time points (5 from the viewing window and 5 from the decision window) were 
extracted for each subject and each trial, and aligned with respect to the exact time point 
at which the subject made the purchase decision (indicated by pressing the button). Most 
subjects made the decision between time points 6 and 8, so the aligned time 
series was 8 time points long, with the decision point as the 6th, as shown in Fig.4.2. All the 
aligned time series were arranged such that the input space covered both voxels and time 
points [66, 70]. Specifically, the data were arranged as a three-dimensional (N × F × S) 
matrix X, with N corresponding to the number of trials per subject (64), F corresponding 
to the input features of the classifier (64), and S corresponding to the number of subjects. 
For each trial, the extracted features were the 8-timepoint aligned time series in the 8 
ROIs, so F (input features) was 8 × 8 = 64. In this study, we focused on classification 
within individual subjects. For each subject, the 64 input features were used in a classifier 
to obtain the prediction accuracy.  
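
   The alignment and arrangement of features can be sketched as follows. The constants follow 
the description above (8 ROIs, an 8-point window, decision point as the 6th sample); the 
function names, the ROI-major ordering of the feature vector and the toy values are 
assumptions made for illustration. 

```python
N_ROIS = 8
WINDOW = 8        # aligned time points kept per ROI
DECISION_POS = 5  # 0-based position of the decision point (the 6th sample)


def align_trial(roi_series, decision_index):
    """Cut an 8-sample window from each ROI so the decision point is the 6th sample."""
    start = decision_index - DECISION_POS
    stop = start + WINDOW
    if start < 0 or stop > len(roi_series[0]):
        raise ValueError("decision point too close to the edge of the trial")
    return [series[start:stop] for series in roi_series]


def flatten(aligned):
    """Concatenate the ROI windows into one feature vector (ROI-major order, assumed)."""
    return [value for window in aligned for value in window]


def feature_index(roi, t):
    """Position of (ROI, aligned time point) in the flattened feature vector."""
    return roi * WINDOW + t


# Toy trial (made-up values): 8 ROIs x 10 time points, value = 100 * ROI + time index.
trial = [[100 * roi + t for t in range(10)] for roi in range(N_ROIS)]
aligned = align_trial(trial, decision_index=6)  # decision at the 7th time point
features = flatten(aligned)
```

   Stacking one such 64-element vector per trial gives the N × F slice for a single subject. 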
 
 
 
 
 
 
 
Name                                    Peak MNI coordinate 
Inferior Temporal Gyrus (ITG)           -52, -36, -24 
Medial Frontal Gyrus (MFG)              1, 3, 46 
Angular Gyrus (AG)                      -50, -74, 16 
Superior Frontal Gyrus (SFG)            -14, 32, 62 
Middle Frontal Gyrus (MiFG)             52, 24, 38 
Left Middle Temporal Gyrus (L MTG)      -46, -40, -6 
Right Middle Temporal Gyrus (R MTG)     56, -44, -12 
Mid Orbitofrontal Cortex (MOFC)         32, 58, -14 
 
 
 
 
 
 
 
Table 4.1 ROI names and peak MNI coordinates 
Figure 4.2 An illustration of the process of alignment of ROI time series with 
respect to the decision time point (red point) 
4.2.3 Recursive Cluster Elimination Based Support Vector Machine Classifier 
   Support Vector Machine (SVM), which was initially proposed by Vapnik [7], is a 
widely used machine learning method for classification in many different fields of 
research [78]. Earlier studies have shown that using discriminatory input features 
enhances the performance of an SVM classifier [67]. Filter methods and wrapper methods 
are two commonly used approaches for feature selection [79,80]. In filter methods, 
statistical tests such as the t-test are performed to select the features that are statistically 
different between classes [81]. The limitation of these methods is that the features are 
selected independently of the classification process, and the measures are univariate 
and do not consider the relationship between features [82]. Wrapper methods 
address these problems by embedding the feature selection into the 
classification process: features are iteratively eliminated to minimize 
prediction error [59, 79, 83]. RCE-SVM is a wrapper-based SVM. It was first 
proposed for gene classification to enhance both classification accuracy and 
computational efficiency [21], and was then successfully applied in some previous fMRI 
studies [65, 73]. In this study, we propose a method that takes advantage of both filter 
and wrapper methods. Selection of spatio-temporal features using GLM analysis 
represents a "filtering" of the input space for dimensionality reduction using mass 
univariate models. By using these selected features in an RCE-SVM wrapper model, our 
approach represents a fusion of filter and wrapper methods.  
   There are three main steps in the RCE-SVM algorithm: the cluster step, the SVM scoring 
step and the RCE step, as shown in Fig.4.3 [65, 21]. First, the 64 trials for each 
subject were divided equally into two sets, one for training and the other for testing. In 
the cluster step, the K-means algorithm [84], an unsupervised learning method, was used 
to group correlated features in the training set into n clusters, 
where n was the initial number of clusters and set to a pre-chosen number (i.e. 35 in this 
study). In the next step, the SVM score of each cluster was defined as its ability to 
discriminate the two classes. The scores were obtained through cross-validation with a 
linear SVM. The training data were randomly and equally partitioned into 5 
non-overlapping folds; a linear SVM using the features in one particular cluster was trained 
on 4 folds and its performance was calculated on the remaining fold. The procedure was 
repeated 50 times in order to take into account different partitions and ensure the 
reliability of the performance estimate. The mean classification accuracy over all the folds 
and repetitions was assigned as the SVM score of each feature in that cluster. In the RCE 
step, the 20% of features with the lowest SVM scores were removed. The remaining features 
were merged and n was set to n − 0.2n. All three steps were repeated until n was equal to 2. 
The testing set was used to evaluate the prediction performance at the end of each iteration. 
There is no bias in the performance accuracy using this procedure because of the total 
separation of training and testing data [85]. The accuracy at each RCE-SVM loop was obtained 
as the average accuracy of all 50 repetitions using the feature clusters of testing data at the 
corresponding loop.  
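
   The elimination loop itself can be sketched as follows. This is only an outline under 
stated assumptions: clustering and scoring are passed in as functions (K-means and the 
repeated 5-fold linear SVM accuracy would be used in practice), the two stand-ins below are 
hypothetical placeholders, and rounding n and the number of retained features down to an 
integer is an assumption, since the rounding rule is not specified above. 

```python
def rce_loop(features, cluster_fn, score_fn, n_clusters=35):
    """Recursive cluster elimination over a list of feature ids.

    cluster_fn(features, n) -> list of n clusters (lists of feature ids)
    score_fn(cluster)       -> score of one cluster (e.g. cross-validated accuracy)
    Returns [(n, surviving_features), ...], one entry per loop, so that test
    accuracy can be evaluated on the surviving features of every loop.
    """
    features = list(features)
    n = n_clusters
    history = []
    while True:
        # Cluster step: group the current features into (at most) n clusters.
        clusters = cluster_fn(features, min(n, len(features)))
        history.append((n, list(features)))
        if n <= 2:
            return history
        # Scoring step: every feature inherits the score of its cluster.
        scored = [(score_fn(cluster), f) for cluster in clusters for f in cluster]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # RCE step: drop the lowest-scoring ~20% of features, then shrink n by 20%.
        n_keep = max(2, (4 * len(features)) // 5)
        features = [f for _, f in scored[:n_keep]]
        n = max(2, (4 * n) // 5)


def round_robin_clusters(features, n):
    """Stand-in for K-means: deal the features into n groups in turn."""
    clusters = [[] for _ in range(n)]
    for i, f in enumerate(features):
        clusters[i % n].append(f)
    return clusters


def toy_score(cluster):
    """Stand-in for SVM accuracy: pretend higher feature ids are more informative."""
    return sum(cluster) / len(cluster)


history = rce_loop(range(64), round_robin_clusters, toy_score)
cluster_counts = [n for n, _ in history]
```

   With these rounding assumptions, starting from n = 35 the loop visits 
n = 35, 28, 22, 17, 13, 10, 8, 6, 4, 3 and 2 before stopping. 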
   The within-subject prediction performance was calculated as the mean of the 
individual-subject RCE-SVM prediction accuracies. In order to calculate the 
statistical significance of prediction accuracy over chance level, a binomial null 
distribution B(n, p) [61] was formed, where n is the number of trials (i.e. 64) and p is the 
success probability at chance level (i.e. 0.5). The prediction accuracies were compared 
with the null distribution to obtain p-values, and only accuracies whose p-values were less 
than 0.05 were considered significantly higher than chance level. 
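
   The comparison against the binomial null distribution can be sketched as follows. The 
function name is hypothetical, and two details are assumptions because they are not stated 
above: the test is taken to be one-sided, and a fractional number of correct predictions 
(which arises when accuracies are averaged) is rounded up to the next integer. 

```python
from math import ceil, comb


def binomial_p_value(accuracy, n_trials=64, chance=0.5):
    """One-sided P(X >= k) for X ~ B(n_trials, chance), k = number of correct trials."""
    # Round a fractional count up (assumption); the small offset guards against
    # floating-point noise when accuracy * n_trials is already an integer.
    k = ceil(accuracy * n_trials - 1e-9)
    return sum(comb(n_trials, i) * chance ** i * (1 - chance) ** (n_trials - i)
               for i in range(k, n_trials + 1))


p_full_set = binomial_p_value(0.5570)   # accuracy with all 64 features
p_reduced = binomial_p_value(0.5939)    # accuracy after the first elimination
```

   Under these assumptions, an accuracy of 55.70% gives a p-value of about 0.19 and an 
accuracy of 59.39% gives about 0.0517, so neither would pass the 0.05 threshold. 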
 
 
 
 
Figure 4.3 Flowchart of RCE-SVM algorithm 
4.3 Results 
4.3.1 Prediction accuracy 
   The average prediction accuracies at each RCE-SVM step (i.e. using a particular number 
of feature clusters) are shown in Fig.4.4. The figure illustrates that the average 
prediction performance increased with the removal of non-discriminative features and 
reached a maximum accuracy of 70.73% using 2 clusters and 4 features. The p-values 
corresponding to the accuracies at each step are shown in Table 4.2. For each individual 
subject, the prediction accuracy was defined as the maximum accuracy in the accuracy 
curve. When prediction accuracy was calculated separately for male and female subjects, 
no significant differences in accuracy were observed (p>0.05). Fig.4.5 shows the 
histogram and statistics of the individual prediction accuracies for all the subjects. The 
mean and median accuracies are 70.98% and 70.56% respectively. Eleven subjects had 
accuracies above 70%, with a maximum individual accuracy of 83.35%. 
 
 
 Figure 4.4 The evolving prediction accuracy of RCE-SVM with 
decreasing number of features 
 
 
Number of features    Prediction accuracy (%)    P-value 
64                    55.70                      0.1919 
51                    59.39                      0.0517 
40                    61.50                      0.0300 
31                    63.35                      0.0164 
24                    65.01                      0.0084 
19                    66.28                      0.0041 
15                    67.13                      0.0041 
11                    68.19                      0.0018 
8                     69.07                      0.0008 
6                     69.76                      0.0008 
4                     70.73                      0.0003 
Table 4.2 Prediction accuracies and corresponding p-values at each step of RCE-SVM 
classifier 
Figure 4.5 Statistics and histogram of RCE-SVM within-subject prediction 
accuracies 
 
4.3.2 Important spatio-temporal features for classification    
   The SVM scores indicating the discriminatory power of individual features were averaged 
over all the RCE-SVM classifiers and ranked in descending order, i.e. the feature with 
rank 1 was the most discriminative. Fig.4.6 shows the dynamic changes of discriminatory 
power in the 8 ROIs across the entire trial. The top 4 ranked features are indicated in red 
since they gave the best prediction accuracy.  
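
   The averaging and ranking of feature scores can be sketched as follows. The ROI 
abbreviations are those of Table 4.1; the function name, the ROI-major ordering of the 64 
features and the toy scores are assumptions made for illustration and are not the scores 
obtained in this study. 

```python
ROI_NAMES = ["ITG", "MFG", "AG", "SFG", "MiFG", "L MTG", "R MTG", "MOFC"]
WINDOW = 8        # aligned time points per ROI
DECISION_POS = 5  # 0-based position of the decision point in the window


def rank_features(scores_per_classifier, top_k=4):
    """Average scores over classifiers and return the top_k features as
    (ROI name, time offset from the decision point in samples, mean score)."""
    n_classifiers = len(scores_per_classifier)
    n_features = len(scores_per_classifier[0])
    mean_scores = [sum(s[j] for s in scores_per_classifier) / n_classifiers
                   for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: mean_scores[j], reverse=True)
    ranked = []
    for j in order[:top_k]:
        roi, t = divmod(j, WINDOW)  # assumes ROI-major feature ordering
        ranked.append((ROI_NAMES[roi], t - DECISION_POS, mean_scores[j]))
    return ranked


# Toy scores (made up): two classifiers, 64 features, two features made to stand out.
scores_a = [0.5] * 64
scores_b = [0.5] * 64
scores_a[3 * WINDOW + 2], scores_b[3 * WINDOW + 2] = 0.8, 0.7  # SFG, 3 samples before
scores_a[1 * WINDOW + 3], scores_b[1 * WINDOW + 3] = 0.7, 0.7  # MFG, 2 samples before
top = rank_features([scores_a, scores_b], top_k=2)
```

   Since TR = 1000 ms in this study, an offset of one sample corresponds to one second 
relative to the decision point. 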
 
 
 
 
Figure 4.6 Dynamic changes of discriminatory power in 8 ROIs 
(Red-labeled points are top-4-ranked features; green-labeled 
points are decision points) 
 
4.4 Discussion 
   The goals of this study were: (1) to predict purchase decisions by utilizing machine 
learning methods, (2) to find the spatio-temporal features which were most important for 
prediction, and (3) to interpret them. We employed a recursive cluster elimination based 
support vector machine for prediction of purchase decisions based on spatio-temporal 
fMRI features and obtained those features which have the highest predictive power. We 
obtained a prediction accuracy above 70%, which was significantly higher than chance level. 
Further, the most discriminative features were in medial and superior frontal cortices, both 
before and after the decision point. We elaborate on these themes below. 
   Recently, SVMs have been extensively applied in fMRI data analysis [59, 65, 66]. 
There are two main motivations for utilizing machine learning methods for fMRI 
analysis: (1) Classifiers can be seen as a pattern recognition method used to predict 
cognitive behaviors from brain activity. The optimization of prediction accuracy is 
crucial for this case. (2) The classification procedure can also provide an insight into the 
neuronal mechanisms underlying the cognitive process [66]. The two motivations also 
correspond to the two goals in our study.  
   The average prediction accuracy curve in Fig.4.4 shows that without the feature 
selection part (i.e. RCE part), the average accuracy was 55.70%, which was not significantly 
above chance level (p = 0.1919, Table 4.2). In contrast, the average individual accuracy 
obtained from RCE-SVM after eliminating uninformative features was much higher. The results 
demonstrate that the feature selection part is of great importance for advancing the utility 
of machine learning algorithms for brain state classification. Therefore, the results 
obtained by Grosenick et al. 
which claimed that the PDA classifier performs better than the linear support vector machine 
for purchase prediction need to be viewed in the context that they did not employ feature 
selection before using the SVM [70].  
   The dynamic changes of SVM scores in ROIs shown in Fig.4.6 provide 
spatio-temporal information about "when" and "where" human brain activations are most 
important for purchase decision prediction. They show that the most important features for 
purchase prediction included signal amplitudes in SFG and MFG before the decision 
point, and in SFG after the decision point. Therefore, understanding the role of SFG and 
MFG in the decision-making process is essential for insight into the underlying neural 
mechanism. SFG has been shown to be less important when a single action is selected, 
but necessary when the decision rules change dynamically [86,87]. In our experiment, 
products with different design features (i.e. hedonic or utilitarian products with either 
simple or complex design) were presented to the participants in a random order. The 
rules for deciding whether to buy a hedonic product are likely to differ from the 
rules for deciding whether to buy a utilitarian product. SFG was activated and generated 
discriminatory power soon after participants viewed the products, but 3 s before they 
made the purchase decision. Medial frontal gyrus (MFG, also referred to as medial 
prefrontal cortex) plays an important role in integrating gains and losses [74]. Previous 
studies have shown that MFG is activated in economic decisions [69, 74]. After viewing 
a product and before deciding whether to buy it or not, considering the gains 
and losses of the decision activated MFG. This probably explains the predictive power of 
the amplitude of the MFG signal 2 s before the decision point. After the decision point, 
participants hold a short-term memory of the decision they made. SFG has been 
implicated in working memory (WM) [88], indicating a probable reason for its 
discriminatory power after participants made a purchase decision.  
 
4.5 Conclusion 
   In this study, we adopted a recursive cluster elimination based support vector machine to 
predict purchase decisions (i.e. whether an individual decides to buy a product or not) 
using spatio-temporal fMRI features with more than 70% accuracy. We combined filter 
methods (i.e. GLM) with wrapper methods (i.e. RCE) for feature selection. This enabled 
us to identify the signal values in medial and superior frontal gyrus, both before and after 
the decision point, as the spatio-temporal features possessing the most discriminatory power 
for predicting purchase decisions. Our approach provides a reliable multivariate pattern 
recognition framework for brain state classification using neuroimaging data, in terms of 
both improving prediction accuracy and generating interpretable spatio-temporal 
information. 
 
 
 
 
 
 
 
 
 
Chapter 5: Conclusion 
    
   In this thesis, supervised learning models were applied for estimating effective 
connectivity and predicting purchase decisions from fMRI data. Our proposed dynamic 
Granger causality relies on experimental modulation of causality with time, and therefore 
was able to infer only stimulus-evoked (and not spontaneous) neural timing differences. 
Subsequently, we experimentally demonstrated that Patel's τ was unable to correctly 
estimate the directionality of neuronal influence of spontaneous spike and wave 
discharges (SWDs) in Genetic Absence Epilepsy Rats from Strasbourg (GAERS). These 
findings do not support previous simulations that suggested Patel's τ as the most effective 
measure of directional connectivity, and demonstrate the need for experimental validation 
of simulations since the latter often make restrictive assumptions about reality which 
might not hold true. Last but not least, we adopted a recursive cluster elimination based 
support vector machine to predict purchase decisions, using spatio-temporal fMRI 
features with more than 70% accuracy. The combination of filter methods with wrapper 
methods for feature selection enabled us to identify the signal values in medial and 
superior frontal gyrus, both before and after the decision point, as the spatio-temporal 
features possessing the most discriminatory power for predicting purchase decisions. In 
conclusion, this thesis provides some reliable validation and methodology for the 
application of supervised learning models in the context of fMRI.  
 
 
 
 
 
 
 
Bibliography 
 
 
[1] S. C. Bushong, Magnetic resonance imaging, Elsevier Health Sciences, 2003. 
[2] S. A. Huettel, A. W. Song and G. McCarthy, Functional magnetic resonance imaging, 
Sunderland, MA: Sinauer Associates, 2004. 
[3] S. M. Smith, K. L. Miller, G. Salimi-Khorshidi, M. Webster, C. F. Beckmann, T. E. 
Nichols, J. D. Ramsey and M. W. Woolrich, "Network modelling methods for 
FMRI," Neuroimage, vol. 54, no. 2, pp. 875-891, 2011. 
[4] K. J. Friston, "Functional and effective connectivity in neuroimaging: a synthesis," 
Human brain mapping, vol. 2, no. 1-2, pp. 56-78, 1994. 
[5] S. B. Kotsiantis, "Supervised machine learning: a review of classification 
techniques," Informatica (03505596), vol. 31, no. 3, 2007. 
[6] B. Yegnanarayana, Artificial neural networks, PHI Learning Pvt. Ltd., 2009. 
[7] V. Vapnik, The nature of statistical learning theory, springer, 2000. 
[8] C. W. Granger, "Investigating causal relations by econometric models and cross-
spectral methods," Econometrica: Journal of the Econometric Society, pp. 424-438, 
1969. 
[9] G. Schwarz, "Estimating the dimension of a model," The annals of statistics, vol. 6, 
no. 2, pp. 461-464, 1978. 
[10] A. Bollimunta, Y. Chen, C. E. Schroeder and M. Ding, "Neuronal mechanisms of 
cortical alpha oscillations in awake-behaving macaques," The Journal of 
neuroscience, vol. 28, no. 40, pp. 9976-9988, 2008. 
[11] A. Brovelli, M. Ding, A. Ledberg, Y. Chen, R. Nakamura and S. L. Bressler, "Beta 
oscillations in a large-scale sensorimotor cortical network: directional influences 
revealed by Granger causality," Proceedings of the National Academy of Sciences of 
the United States of America, vol. 101, no. 26, pp. 9849-9854, 2004. 
[12] X. Wen, G. Rangarajan and M. Ding, "Is Granger causality a viable technique for 
analyzing fMRI data?," PloS one, vol. 8, no. 7, p. e67428, 2013. 
[13] Q. Luo, W. Lu, W. Cheng, P. A. Valdes-Sosa, X. Wen, M. Ding and J. Feng, "Spatio-
temporal Granger causality: A new framework," NeuroImage, vol. 79, pp. 241-263, 
2013. 
[14] A. Roebroeck, E. Formisano and R. Goebel, "Mapping directed influence over the 
brain using Granger causality and fMRI," Neuroimage, vol. 25, no. 1, pp. 230-242, 
2005. 
[15] O. David, I. Guillemain, S. Saillet, S. Reyt, C. Deransart, C. Segebarth and A. 
Depaulis, "Identifying neural drivers with functional MRI: an electrophysiological 
validation," PLoS biology, vol. 6, no. 2, p. e315, 2008. 
[16] S. B. Katwal, J. C. Gore, J. C. Gatenby and B. P. Rogers, "Measuring relative 
timings of brain activities using fMRI," NeuroImage, vol. 66, pp. 436-448, 2013. 
[17] S. LaConte, S. Strother, V. Cherkassky, J. Anderson and X. Hu, "Support vector 
machines for temporal classification of block design fMRI data," NeuroImage, vol. 
26, no. 2, pp. 317-329, 2005. 
[18] D. D. Cox and R. L. Savoy, "Functional magnetic resonance imaging (fMRI) 'brain 
reading': detecting and classifying distributed patterns of fMRI activity in human 
visual cortex," Neuroimage, vol. 19, no. 2, pp. 261-270, 2003. 
[19] M. Arnold, W. H. R. Miltner, H. Witte, R. Bauer and C. Braun, "Adaptive AR 
modeling of nonstationary time series by means of Kalman filtering," Biomedical 
Engineering, IEEE Transactions on, vol. 45, no. 5, pp. 553-562, 1998. 
[20] R. S. Patel, F. D. Bowman and J. K. Rilling, "A Bayesian approach to determining 
connectivity of the human brain," Human brain mapping, vol. 27, no. 3, pp. 267-
276, 2006. 
[21] M. Yousef, S. Jung, L. C. Showe and M. K. Showe, "Recursive Cluster Elimination 
(RCE) for classification and feature selection from gene expression data," BMC 
bioinformatics, vol. 8, no. 1, p. 144, 2007. 
[22] T. C. Clancy, A. Khawar and T. R. Newman, "Robust signal classification using 
unsupervised learning," Wireless Communications, IEEE Transactions on, vol. 10, 
no. 4, pp. 1289-1299, 2011. 
[23] R. Menon, & S. G. Kim, ?Spatial and temporal limits in cognitive neuroimaging 
with fMRI,? Trends Cogn. Sci., vol. 3, pp. 207?216, Jun. 1999. 
[24] R. Menon, D. Luknowsky, & J. Gati, ?Mental chronometry using latency-resolved 
functional MR,? Proc. Natl. Acad. Sci. U. S. A., vol. 95, pp. 10902-10907, Sept. 
1998. 
[25] G. Deshpande, S. LaConte, G. James, S. Peltier, & X. Hu, ?Multivariate Granger 
causality analysis of brain networks,? Human Brain Mapping, vol. 30, no. 4, pp. 
1361-1373, Apr. 2009 
[26] M. Dhamala, G. Rangarajan, & M. Ding, ?Estimating Granger causality from fourier 
and wavelet transforms of time series data,? Physical Review Letters, vol. 100, no. 1, 
p. 018701, Jan. 2008 
[27] S. Ryali, K. Supekar, T. Chen, & V. Menon, ?Multivariate dynamical systems 
models for estimating causal interactions in fMRI,? NeuroImage, vol. 54, no. 2, pp. 
807-23, Jan 2011 
[28] W. Hesse, E. Moller, M. Arnold, & B. Schack, ?The use of time-variant EEG 
Granger causality for inspecting directed interdependencies of neural assemblies,? 
Journal of Neuroscience Methods, vol. 124, no. 1, pp. 27-44, Jan. 2003 
[29] L. Astolfi, F. Cincotti, D. Mattia, F. De Vico Fallani, A. Tocci, A. Colosimo, . . . F. 
Babiloni, "Tracking the time-varying cortical connectivity patterns by adaptive 
multivariate estimators," IEEE Transactions on Biomedical Engineering, vol. 55, pp. 
902-913, Mar. 2008. 
[30] S. Lacey, H. Hagtvedt, V. Patrick, A. Anderson, R. Stilla, G. Deshpande, . . . & K. 
Sathian, "Art for reward's sake: Visual art recruits the ventral striatum," NeuroImage, 
vol. 55, pp. 420-433, Mar. 2011. 
[31] D. Kapogiannis, G. Deshpande, F. Krueger, M. Thornburg, & J. Grafman, "Brain 
Networks Shaping Religious Belief," Brain Connectivity, 2013 (in press). 
[32] M. Havlicek, J. Jan, M. Brazdil, & V. Calhoun, "Dynamic Granger causality based 
on Kalman filter for evaluation of functional network connectivity in fMRI data," 
NeuroImage, vol. 53, pp. 65-77, Oct. 2010. 
[33] S. Katwal, J. Gore, & B. Rogers, "Unsupervised spatiotemporal analysis of fMRI 
data using graph-based visualizations of self-organizing maps," IEEE Transactions on 
Biomedical Engineering, vol. 60, pp. 2472-2483, Sept. 2013. 
[34] H. Akaike, "A new look at the statistical model identification," IEEE Transactions 
on Automatic Control, vol. 19, pp. 716-723, Dec. 1974. 
[35] G. Deshpande, K. Sathian, & X. Hu, "Assessing and Compensating for Zero-lag 
Correlation Effects in Time-lagged Granger Causality Analysis of fMRI," IEEE 
Transactions on Biomedical Engineering, vol. 57, pp. 1446-1456, Jun. 2010. 
[36] M. Arnold, W. Miltner, H. Witte, R. Bauer, & C. Braun, "Adaptive AR Modeling of 
Nonstationary Time Series by Means of Kalman Filtering," IEEE Transactions on 
Biomedical Engineering, vol. 45, pp. 553-562, May 1998. 
[37] M. Kaminski, M. Ding, W. Truccolo, & S. Bressler, "Evaluating causal relations in 
neural systems: Granger causality, directed transfer function and statistical 
assessment of significance," Biological Cybernetics, vol. 85, pp. 145-157, Aug. 
2001. 
[38] K. Friston, "Causal modelling and brain connectivity in functional magnetic 
resonance imaging," PLoS Biology, vol. 7, p. e33, Feb. 2009. 
[39] G. Deshpande, K. Sathian, X. Hu, & K. Buckhalt, "A rigorous approach for testing 
the constructionist hypotheses of brain function," Behavioral and Brain Sciences, 
vol. 35, pp. 148-149, Jun. 2012. 
[40] G. Deshpande, & X. Hu, "Investigating effective brain connectivity from fMRI data: 
past findings and current issues with reference to Granger causality analysis," Brain 
Connectivity, vol. 2, pp. 235-245, Oct. 2012. 
[41] A. Roebroeck, E. Formisano, & R. Goebel, "The identification of interacting 
networks in the brain using fMRI: Model selection, causality and deconvolution," 
NeuroImage, vol. 58, pp. 296-302, Sept. 2011. 
[42] S. Bressler, & A. Seth, "Wiener-Granger Causality: A well established 
methodology," NeuroImage, vol. 58, pp. 323-329, Sept. 2011. 
[43] A. Seth, P. Chorley, & L. Barnett, "Granger causality analysis of fMRI BOLD 
signals is invariant to hemodynamic convolution but not downsampling," 
NeuroImage, vol. 65, pp. 540-555, Jan. 2013. 
[44] M. Schippers, R. Renken, & C. Keysers, "The effect of intra- and inter-subject 
variability of hemodynamic responses on group level Granger causality analyses," 
NeuroImage, vol. 57, pp. 22-36, Jul. 2011. 
[45] G. Aguirre, E. Zarahn, & M. D'Esposito, "The variability of human, BOLD 
hemodynamic responses," NeuroImage, vol. 8, pp. 360-369, Nov. 1998. 
[46] D. Handwerker, J. Ollinger, & M. D'Esposito, "Variation of BOLD hemodynamic 
responses across subjects and brain regions and their effects on statistical analyses," 
NeuroImage, vol. 21, pp. 1639-1651, Apr. 2004. 
[47] G. Deshpande, K. Sathian, & X. Hu, "Effect of hemodynamic variability on Granger 
causality analysis of fMRI," NeuroImage, vol. 52, pp. 884-896, Sept. 2010. 
[48] M. Havlicek, K. Friston, J. Jan, M. Brazdil, & V. Calhoun, "Dynamic modeling of 
neuronal responses in fMRI using cubature Kalman filtering," NeuroImage, vol. 56, 
pp. 2109-2128, Jun. 2011. 
[49] G. Wu, W. Liao, S. Stramaglia, J. Ding, H. Chen, & D. Marinazzo, "A blind 
deconvolution approach to recover effective connectivity brain networks from 
resting state fMRI data," Med. Image Anal., vol. 17, pp. 365-374, Apr. 2013. 
[50] J. D. Ramsey, S. J. Hanson, C. Hanson, Y. O. Halchenko, R. A. Poldrack and C. 
Glymour, "Six problems for causal inference from fMRI," Neuroimage, vol. 49, no. 
2, pp. 1545-1558, 2010. 
[51] K. J. Friston, L. Harrison and W. Penny, "Dynamic causal modelling," Neuroimage, 
vol. 19, no. 4, pp. 1273-1302, 2003. 
[52] L. Danober, C. Deransart, A. Depaulis, M. Vergnes and C. Marescaux, 
"Pathophysiological mechanisms of genetic absence epilepsy in the rat," Progress in 
neurobiology, vol. 55, no. 1, pp. 27-57, 1998. 
[53] H. Meeren, G. van Luijtelaar, F. L. da Silva and A. Coenen, "Evolving concepts on 
the pathophysiology of absence seizures: the cortical focus theory," Archives of 
neurology, vol. 62, no. 3, pp. 371-376, 2005. 
[54] P. O. Polack, I. Guillemain, E. Hu, C. Deransart, A. Depaulis and S. Charpier, 
"Deep layer somatosensory cortical neurons initiate spike-and-wave discharges in a 
genetic model of absence seizures," The Journal of neuroscience, vol. 27, no. 24, pp. 
6590-6599, 2007. 
[55] R. Deichmann, C. Schwarzbauer and R. Turner, "Optimisation of the 3D MDEFT 
sequence for anatomical brain imaging: technical implications at 1.5 and 3 T," 
Neuroimage, vol. 21, no. 2, pp. 757-767, 2004. 
[56] P. Schweinhardt, P. Fransson, L. Olson, C. Spenger and J. L. Andersson, "A template 
for spatial normalisation of MR images of the rat brain," Journal of neuroscience 
methods, vol. 129, no. 2, pp. 105-113, 2003. 
[57] A. C. Gouvea, C. Phillips, N. Kazanina and D. Poeppel, "The linguistic processes 
underlying the P600," Language and Cognitive Processes, vol. 25, no. 2, pp. 149-
188, 2010. 
[58] K. A. Norman, S. M. Polyn, G. J. Detre and J. V. Haxby, "Beyond mind-reading: 
multi-voxel pattern analysis of fMRI data," Trends in cognitive sciences, vol. 10, no. 
9, pp. 424-430, 2006. 
[59] F. De Martino, G. Valente, N. Staeren, J. Ashburner, R. Goebel and E. Formisano, 
"Combining multivariate voxel selection and support vector machines for mapping 
and classification of fMRI spatial patterns," Neuroimage, vol. 43, pp. 44-58, 2008. 
[60] J. D. Haynes, K. Sakai, G. Rees, S. Gilbert, C. Frith and R. E. Passingham, "Reading 
hidden intentions in the human brain," Curr. Biol., vol. 17, pp. 323-328, 2007. 
[61] F. Pereira, T. Mitchell and M. Botvinick, "Machine learning classifiers and fMRI: a 
tutorial overview," Neuroimage, vol. 45, pp. S199-S209, 2009. 
[62] J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. L. Schouten and P. Pietrini, 
"Distributed and overlapping representations of faces and objects in ventral temporal 
cortex," Science, vol. 293, pp. 2425-2429, 2001. 
[63] Y. Kamitani and F. Tong, "Decoding seen and attended motion directions from 
activity in the human visual cortex," Current Biology, vol. 16, no. 11, pp. 1096-1102, 
2006. 
[64] T. M. Mitchell, R. Hutchinson, R. S. Niculescu, F. Pereira, X. Wang, M. Just and S. 
Newman, "Learning to decode cognitive states from brain images," Machine 
Learning, vol. 57, no. 1-2, pp. 145-175, 2004. 
[65] G. Deshpande, Z. Li, P. Santhanam, C. D. Coles, M. E. Lynch, S. Hamann and X. 
Hu, "Recursive cluster elimination based support vector machine for disease state 
prediction using resting state functional and effective brain connectivity," PloS one, 
vol. 5, no. 12, p. e14277, 2010. 
[66] J. Mourao-Miranda, K. J. Friston and M. Brammer, "Dynamic discrimination 
analysis: a spatial-temporal SVM," Neuroimage, vol. 36, no. 1, pp. 88-99, 2007. 
[67] R. C. Craddock, P. E. Holtzheimer, X. Hu and H. S. Mayberg, "Disease state 
prediction from resting state functional connectivity," Magnetic resonance in 
Medicine, vol. 62, no. 6, pp. 1619-1628, 2009. 
[68] D. Ariely and G. S. Berns, "Neuromarketing: the hope and hype of neuroimaging in 
business," Nature Reviews Neuroscience, vol. 11, no. 4, pp. 284-292, 2010. 
[69] B. Knutson, S. Rick, G. E. Wimmer, D. Prelec and G. Loewenstein, "Neural 
predictors of purchases," Neuron, vol. 53, no. 1, pp. 147-156, 2007. 
[70] L. Grosenick, S. Greer and B. Knutson, "Interpretable classifiers for FMRI improve 
prediction of purchases," Neural Systems and Rehabilitation Engineering, IEEE 
Transactions on, vol. 16, no. 6, pp. 539-548, 2008. 
[71] T. Hastie, A. Buja and R. Tibshirani, "Penalized discriminant analysis," The Annals 
of Statistics, vol. 23, no. 1, pp. 73-102, 1995. 
[72] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," 
Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, 
no. 2, pp. 301-320, 2005. 
[73] G. Deshpande, L. E. Libero, K. R. Sreenivasan, H. D. Deshpande and R. K. Kana, 
"Identification of neural connectivity signatures of autism using machine learning," 
Frontiers in human neuroscience, vol. 7, 2013. 
[74] V. Chattaraman, G. Deshpande, H. Kim and K. Sreenivasan, "Form 'Defines' 
Function: Converging Evidence from Functional MRI and Behavioral Studies on the 
Predictive Influence of Product Beauty on Purchase," Journal of Marketing 
Research, 2014, under review. 
[75] M. Poustchi-Amin, S. A. Mirowitz, J. J. Brown, R. C. McKinstry and T. Li, 
"Principles and Applications of Echo-planar Imaging: A Review for the General 
Radiologist," Radiographics, vol. 21, no. 3, pp. 767-779, 2001. 
[76] J. P. Mugler and J. R. Brookeman, "Three-dimensional magnetization-prepared rapid 
gradient-echo imaging (3D MP RAGE)," Magnetic Resonance in Medicine, vol. 15, 
no. 1, pp. 152-157, 1990. 
[77] K. J. Friston, A. P. Holmes, K. J. Worsley, J. P. Poline, C. D. Frith and R. S. 
Frackowiak, "Statistical parametric maps in functional imaging: a general linear 
approach," Human brain mapping, vol. 2, no. 4, pp. 189-210, 1994. 
[78] L. Wang, Support Vector Machines: Theory and Applications, New York: Springer, 
2005. 
[79] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," The 
Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003. 
[80] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial 
intelligence, vol. 97, no. 1, pp. 273-324, 1997. 
[81] J. Mourao-Miranda, E. Reynaud, F. McGlone, G. Calvert and M. Brammer, "The 
impact of temporal compression and space selection on SVM analysis of single-
subject and multi-subject fMRI data," Neuroimage, vol. 33, no. 4, pp. 1055-1065, 
2006. 
[82] S. Ryali, K. Supekar, D. A. Abrams and V. Menon, "Sparse logistic regression for 
whole-brain classification of fMRI data," NeuroImage, vol. 51, no. 2, pp. 752-764, 
2010. 
[83] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, "Gene selection for cancer 
classification using support vector machines," Machine learning, vol. 46, no. 1-3, 
pp. 389-422, 2002. 
[84] X. Yang, D. Lin, Z. Hao, Y. Liang, G. Liu and X. Han, "A fast SVM training 
algorithm based on the set segmentation and k-means clustering," Progress in 
Natural Science, vol. 13, no. 10, pp. 750-755, 2003. 
[85] N. Kriegeskorte, W. Simmons, P. Bellgowan and C. Baker, "Circular analysis in 
systems neuroscience: the dangers of double dipping," Nature Neuroscience, vol. 12, 
no. 5, pp. 535-540, 2009. 
[86] M. S. Rushworth, K. A. Hadland, T. Paus and P. K. Sipila, "Role of the human 
medial frontal cortex in task switching: a combined fMRI and TMS study," Journal 
of Neurophysiology, vol. 87, no. 5, pp. 2577-2592, 2002. 
[87] M. S. Rushworth, M. E. Walton, S. W. Kennerley and D. M. Bannerman, "Action 
sets and decisions in the medial frontal cortex," Trends in cognitive sciences, vol. 8, 
no. 9, pp. 410-417, 2004. 
[88] F. du Boisgueheneuc, R. Levy, E. Volle, M. Seassau, H. Duffau, S. Kinkingnehun, Y. 
Samson, S. Zhang and B. Dubois, "Functions of the left superior frontal gyrus in 
humans: a lesion study," Brain, vol. 129, no. 12, pp. 3315-3328, 2006. 
[89] K. Friston, P. Jezzard and R. Turner, "Analysis of functional MRI time series," 
Human Brain Mapping, vol. 2, pp. 69-78, 1994.