This Is Auburn
Electronic Theses and Dissertations

The Application of Data Mining and Machine Learning in the Diagnosis of Mental Disorders




Syed, Mohammed

Type of Degree

PhD Dissertation


Computer Science and Software Engineering


Autism is a developmental disorder that is currently diagnosed using behavioral tests, which can be subjective. Consequently, objective, non-invasive imaging biomarkers of Autism are being actively researched. The common theme emerging from previous functional magnetic resonance imaging (fMRI) studies is that Autism is characterized by alterations in fMRI-derived functional connections within certain brain networks, which may provide a biomarker for objective diagnosis. However, identifying individuals with Autism solely on the basis of these measures has not been reliable, especially in larger samples. More broadly, there is a lack of objective biomarkers that can accurately identify the underlying etiology and related pathophysiology of disparate brain-based disorders that are difficult to distinguish clinically. Brain networks derived from resting-state functional magnetic resonance imaging (R-fMRI) have been a popular tool for discovering candidate biomarkers. Specifically, independent component analysis (ICA) of R-fMRI data is a powerful multivariate technique for investigating brain networks. However, ICA-derived brain networks that are not highly reproducible within heterogeneous clinical populations may separate groups in terms of group-mean statistics and yet discriminate poorly at the individual-subject level. We hypothesize that functional brain networks that are most reproducible among subjects within the clinical and control groups separately, but not when the two groups are merged, may be able to discriminate effectively between the groups.
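The hypothesis can be made concrete with a simplified score. The sketch below is an illustration only and is not the gRAICAR algorithm: it assumes each subject contributes one spatial map for a given component (flattened to a list of voxel values), measures reproducibility as the mean absolute Pearson correlation over all subject pairs, and scores the component by how much more reproducible it is within the weaker of the two groups than in the merged pool. The functions `reproducibility` and `discriminative_score` are hypothetical names introduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two flattened spatial maps (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reproducibility(maps):
    """Mean absolute pairwise correlation of one component's maps across subjects.

    Requires at least two maps. The absolute value is used because ICA
    recovers components only up to an arbitrary sign.
    """
    pairs = list(combinations(maps, 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


def discriminative_score(group_a, group_b):
    """Within-group reproducibility (weaker group) minus merged-pool reproducibility.

    A large positive value marks a component that is consistent inside each
    group but not across the combined sample, the pattern the hypothesis favors.
    """
    within = min(reproducibility(group_a), reproducibility(group_b))
    merged = reproducibility(group_a + group_b)
    return within - merged


# Toy maps over five voxels: group A peaks at voxel 0, group B at voxel 4.
group_a = [[1, 0, 0, 0, 0], [2, 0, 0, 0, 0]]
group_b = [[0, 0, 0, 0, 1], [0, 0, 0, 0, 3]]
print(round(discriminative_score(group_a, group_b), 3))  # 0.5
```

In the toy data each group is perfectly self-consistent (reproducibility 1.0), while cross-group pairs correlate only weakly, so the merged pool drops to 0.5. A component shared identically by both groups would instead score close to zero.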
In this study, we propose a “discover-confirm” scheme in which the reproducibility of independent components (representing brain networks) obtained from R-fMRI is assessed using the gRAICAR (generalized Ranking and Averaging Independent Component Analysis by Reproducibility) algorithm (discover phase), and these components are then subjected to unsupervised clustering analysis to evaluate their ability to discriminate between groups (confirm phase). Furthermore, we present gMedICA, a software package that implements the methodology in support of our hypothesis. A distinguishing feature of the package is its ability to interface seamlessly with other software packages such as FSL, so that all related analyses that rely on features of other software can be performed within the same package; it thus provides a one-stop solution from raw DICOM images to the final results. Using the proposed methodology, we obtained a cluster purity of up to 0.971 (i.e., 97.1% accuracy) on a multi-site data set of 799 subjects comprising individuals with Autism Spectrum Disorders (ASD) and healthy controls. In addition, we showcase the software using R-fMRI data acquired from US Army soldiers returning from the wars in Iraq and Afghanistan, who were clinically classified into three groups: PTSD (posttraumatic stress disorder), comorbid PCS (post-concussion syndrome) + PTSD, and matched healthy combat controls. On these data, our methodology achieved a cluster purity of up to 1.0; that is, all groups were identified with 100% accuracy.
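Cluster purity, the figure of merit reported above, is commonly defined as the fraction of subjects that belong to the majority clinical group of the cluster they were assigned to: purity = (1/N) · Σ_k max_j |ω_k ∩ c_j|, where ω_k are the clusters and c_j the true groups. The minimal sketch below assumes this standard definition (the formula is not spelled out here) and that cluster assignments and clinical labels arrive as parallel lists; `cluster_purity` is a hypothetical helper, not part of gMedICA.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Fraction of subjects whose cluster's majority label equals their own true label.

    cluster_ids and true_labels are parallel sequences: one cluster assignment
    and one clinical label (e.g. "ASD" or "control") per subject.
    """
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("need two non-empty sequences of equal length")
    # Count true labels separately for each cluster.
    counts_per_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        counts_per_cluster.setdefault(cid, Counter())[label] += 1
    # Each cluster contributes the size of its largest (majority) label group.
    majority_total = sum(c.most_common(1)[0][1] for c in counts_per_cluster.values())
    return majority_total / len(true_labels)


# Toy example: cluster 0 holds one misassigned control, so 5 of 6 subjects match.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["ASD", "ASD", "control", "control", "control", "control"]
print(cluster_purity(clusters, labels))  # 0.8333333333333334
```

Because a cluster containing a single subject is trivially pure, purity rises as the number of clusters grows; it is therefore most informative when the number of clusters matches the number of clinical groups, as in the two-group ASD and three-group PTSD/PCS settings described above.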