Deep neural networks are increasingly used in neuroimaging research to diagnose brain disorders and to understand the human brain. They are complex, data-driven systems that operate as black boxes once trained. Despite their impressive performance, their usefulness in medical applications will remain limited unless there is more transparency about how they arrive at their decisions. Interpretability algorithms help bring transparency to the decision-making of deep neural networks.
In this work, we combined the understanding generated by multiple deep learning interpretability algorithms to explain our resting-state functional connectivity classifiers. Convolutional neural network classifiers were trained to discriminate between patients and healthy subjects; we worked with post-traumatic stress disorder (PTSD), autism spectrum disorder (ASD) and Alzheimer’s disease. We used variants of gradient-based and relevance-based interpretability algorithms. Permutation testing and cluster mass thresholding were used to identify the functional connectivity paths that significantly discriminated between patients and control subjects.
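As a rough illustration of the permutation-testing step, the sketch below computes a two-sided permutation p-value for the group difference in the relevance assigned to a single connectivity path. It is not the authors' pipeline: the function name, the use of a difference in means as the test statistic, and the number of permutations are all assumptions made for illustration, and cluster mass thresholding is omitted entirely.

```python
import random


def permutation_p_value(patients, controls, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    `patients` and `controls` are hypothetical per-subject relevance
    scores for one connectivity path.
    """
    rng = random.Random(seed)
    n_pat = len(patients)
    n_con = len(controls)
    observed = abs(sum(patients) / n_pat - sum(controls) / n_con)

    pooled = list(patients) + list(controls)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_pat]) / n_pat - sum(pooled[n_pat:]) / n_con)
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

In practice the test would be repeated for every connectivity path, with a correction for multiple comparisons such as the cluster mass thresholding named above.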
For PTSD, the classifier achieved >90% accuracy, and different interpretability algorithms gave slightly different results, most likely because they make different assumptions about the model and data. Taking a consensus across methods made the interpretation more robust, and the consensus was in general agreement with prior literature on connectivity alterations underlying PTSD. For ASD, classification performance, and hence interpretability, varied widely across data acquisition sites (56% to 94%). Harmonizing data across sites improved accuracy incrementally, but not enough to make interpretability largely consistent. We found that interpretability is meaningful only for the sites where accuracy is high enough. Our results demonstrate that robust interpretability across methods and models requires substantially higher accuracy than is currently achievable in many neuroimaging datasets. This should serve as a cautionary tale for researchers who want to use interpretability of artificial neural network classifiers in neuroimaging.
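The idea of taking a consensus across interpretability methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each method has already produced a set of significant connectivity paths, and it keeps the paths flagged by at least `min_methods` of them. The function name, the `min_methods` parameter, and the method and region labels in the usage example are hypothetical.

```python
from collections import Counter


def consensus_paths(significant_by_method, min_methods=None):
    """Return the paths flagged as significant by at least `min_methods` methods.

    `significant_by_method` maps a method name to an iterable of paths,
    where each path is a hashable label such as a (region, region) tuple.
    By default a path must be flagged by every method.
    """
    if min_methods is None:
        min_methods = len(significant_by_method)

    counts = Counter(
        path
        for paths in significant_by_method.values()
        for path in set(paths)  # count each path once per method
    )
    return {path for path, n in counts.items() if n >= min_methods}


# Hypothetical output of three interpretability methods.
example = {
    "method_a": [("roi_1", "roi_2"), ("roi_3", "roi_4")],
    "method_b": [("roi_1", "roi_2"), ("roi_5", "roi_6")],
    "method_c": [("roi_1", "roi_2"), ("roi_3", "roi_4")],
}
strict = consensus_paths(example)                  # flagged by all three
majority = consensus_paths(example, min_methods=2)  # flagged by at least two
```

A strict intersection is the most conservative choice; a majority vote retains more paths at the cost of relying on fewer methods agreeing.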