New computational and data-driven methods for protein homology modeling
Date
2021-07-14
Type of Degree
PhD Dissertation
Department
Computer Science and Software Engineering
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging. In light of recent advances in residue-residue contact prediction powered by sequence co-evolution and deep learning, we propose a new contact-assisted threading method. The method integrates residue-residue contact information with various sequential and structural features to improve the threading scoring function and, in turn, template selection. Large-scale benchmarking on 500 targets demonstrates that our contact-assisted threading method attains statistically significantly better threading performance than a baseline contact-free threading method acting as a control. Our study further reveals that contact-assisted threading using high-quality contacts, with a Matthews Correlation Coefficient (MCC) ≥ 0.5, improves threading performance in nearly 30% of the cases, while low-quality contacts, with an MCC < 0.35, degrade performance in 50% of the cases. Moreover, instead of relying on binary contacts, we go one step further and develop a new distance- and orientation-based covariational threading method called DisCovER, which integrates information from inter-residue distances and orientations along with the topological network neighborhood of a query-template alignment. Multiple large-scale benchmarks on query proteins classified as weakly homologous, drawn from the Continuous Automated Model Evaluation (CAMEO) experiment and from the current literature, show that our method outperforms several existing state-of-the-art threading approaches. They also show that integrating the neighborhood effect with the inter-residue distance and orientation information contributes synergistically to the improved performance of DisCovER.
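The abstract grades predicted contacts by their Matthews Correlation Coefficient against the native contact map. As a minimal sketch of how such a score could be computed, the Python below applies the standard MCC formula to two binary contact vectors defined over the same residue pairs. The function names `contact_mcc` and `quality_bucket`, the label "intermediate", and the choice of which residue pairs to include are assumptions made for illustration and are not taken from the dissertation. Only the 0.5 and 0.35 thresholds come from the abstract.

```python
from math import sqrt

def contact_mcc(predicted, native):
    """Standard MCC between two binary contact vectors.

    Both arguments list 0/1 contact flags for the same residue pairs,
    in the same order. How pairs are chosen (for example, a minimum
    sequence separation) is an assumption left to the caller.
    """
    if len(predicted) != len(native):
        raise ValueError("contact vectors must have equal length")
    tp = sum(1 for p, n in zip(predicted, native) if p and n)
    tn = sum(1 for p, n in zip(predicted, native) if not p and not n)
    fp = sum(1 for p, n in zip(predicted, native) if p and not n)
    fn = sum(1 for p, n in zip(predicted, native) if not p and n)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def quality_bucket(mcc):
    """Hypothetical labels built on the thresholds quoted in the abstract."""
    if mcc >= 0.5:
        return "high"
    if mcc < 0.35:
        return "low"
    return "intermediate"

if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    native = [1, 1, 0, 0, 0, 0, 0, 1]
    score = contact_mcc(predicted, native)
    print(round(score, 4), quality_bucket(score))
```

In this toy case there are 2 true positives, 4 true negatives, 1 false positive and 1 false negative, so the score is 7/15, which falls between the two thresholds.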