Proteins are essential components of organisms and participate in virtually every process within cells. The function of a protein is more closely related to its structure than to its amino acid sequence. Hence, studying a protein’s structure can yield valuable information about its function. Because experimental techniques for structure determination are complex and expensive, computational methods are often the only way to obtain structural information about a protein. Major advances in protein structure prediction have made it possible to generate a large number of models for a given protein in a short amount of time. Consequently, evaluating the similarity between predicted protein models and the experimentally determined native structure is one of the most important tasks in assessing the accuracy of any computational protein structure prediction method. Existing approaches to model quality assessment face two key challenges: (1) the difficulty of efficiently ranking and selecting optimal models from a large number of protein structures, and (2) the lack of a similarity measure that considers side-chain orientation along with main-chain alpha carbon (Cα) and side-chain (SC) atoms when comparing two protein structures.
This thesis addresses these challenges by (1) developing a rapid protein decoy clustering algorithm, called clustQ, which employs a multi-model pairwise comparison approach to model quality assessment based on weighted internal distance comparisons, and (2) developing the Superposition-based Protein Embedded Cα-SC (SPECS) score, which integrates the high-accuracy version of the Global Distance Test (GDT-HA) metric with side-chain distance and orientation in a single framework for protein structure comparison. We show that our methods outperform many traditional and state-of-the-art model quality assessment approaches and similarity measures in terms of accuracy, speed, and robustness. In particular, the clustQ method was ranked 6th among the model quality estimators in the 13th edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP13). All of these methods are freely available to the scientific community as software and web servers.
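To make the GDT-HA component mentioned above concrete, the following is a minimal sketch and not the SPECS implementation. It assumes the two structures are already superposed and residue-aligned, and it uses the standard GDT-HA distance cutoffs of 0.5, 1, 2, and 4 Å. A full GDT calculation searches over many superpositions to maximize the number of residues under each cutoff, so this single-superposition version should be read as a lower-bound approximation. The function name `gdt_ha` and its parameters are hypothetical, chosen for illustration only.

```python
import math

# Standard GDT-HA distance cutoffs in angstroms.
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt_ha(model_ca, native_ca, cutoffs=GDT_HA_CUTOFFS):
    """Approximate GDT-HA for one fixed superposition (illustrative only).

    model_ca, native_ca: equal-length sequences of (x, y, z) C-alpha
    coordinates, assumed to be already superposed and residue-aligned.
    Returns a score between 0 and 100.
    """
    if len(model_ca) != len(native_ca) or len(model_ca) == 0:
        raise ValueError("expected two non-empty coordinate lists of equal length")

    # Per-residue C-alpha deviation between model and native.
    distances = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]

    # Fraction of residues within each cutoff, averaged over the cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: four residues displaced by 0, 0.75, 1.5, and 3.0 angstroms.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (4.55, 0.0, 0.0), (9.1, 0.0, 0.0), (14.4, 0.0, 0.0)]
score = gdt_ha(model, native)
```

In the toy example one residue falls within 0.5 Å, two within 1 Å, three within 2 Å, and all four within 4 Å, so the score is the average of 25%, 50%, 75%, and 100%. A score that also accounts for side chains, as SPECS does, would need additional terms for side-chain distance and orientation that are not modelled here.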