Geometric Representation Learning on Molecular Graphs

Tian, Xia

Metadata Field	Value	Language
dc.contributor.advisor	Ku, Wei-Shinn
dc.contributor.author	Tian, Xia
dc.date.accessioned	2024-07-28T20:47:22Z
dc.date.available	2024-07-28T20:47:22Z
dc.date.issued	2024-07-28
dc.identifier.uri	https://etd.auburn.edu//handle/10415/9385
dc.description.abstract	Graphs, as a type of data structure, have recently attracted significant attention. Representation learning of geometric graphs has achieved great success in many fields, including molecular, social, and financial networks. It is natural to represent proteins as graphs in which nodes represent the residues and edges represent the pairwise interactions between residues. However, 3D protein structures have rarely been studied as graphs directly. The challenges include: 1) Proteins are complex macromolecules composed of thousands of atoms, making them much harder to model than micro-molecules. 2) Capturing the long-range pairwise relations for protein structure modeling remains under-explored. 3) Few studies have focused on learning the different attributes of proteins together. 4) Existing graph neural networks (GNNs) have limitations in capturing complex multi-level structural information and handling variable sizes of molecular structures. In this dissertation, we propose four geometric representation learning frameworks to address the above challenges under different scenarios. First, we introduce the Protein Graph-GNN (PG-GNN) architecture for protein backbone structure modeling, which utilizes geometric graph convolution blocks to generate distance geometric graph representations and can handle variable sizes of protein graphs dynamically. This gives a significant advantage because this network opens a new path from sequence to structure. Second, we develop the Attention-based Protein-drug Interaction Prediction (APIP) framework for interpretable protein-ligand interface prediction, which handles different input types separately and models long-range dependencies in protein sequences. Third, we present the Explainable framework for drug-target Interaction prediction (EIR), which incorporates both intrinsic and extrinsic information to enhance interpretability and accuracy in drug screening.
Finally, we propose the Subgraph Aggregation Module Network (SAMNet), a provably geometric lossless encoding and rotation-equivariant network for molecular representation learning, which captures complex geometry across spatial dimensions using a subgraph sampling policy and a drop-in geometric Subgraph Aggregation Module (SAM). We conducted extensive experiments on benchmark datasets, demonstrated the effectiveness of the proposed methods for in silico structural biology and rational drug discovery, and showcased their ability to address the limitations of existing GNNs in molecular representation learning. By developing these novel approaches, we contribute to advancing the field of life science and pave the way for more accurate, interpretable, and generalizable machine learning models in protein structure prediction, drug discovery, and molecular representation learning.	en_US
dc.rights	EMBARGO_GLOBAL	en_US
dc.subject	Computer Science and Software Engineering	en_US
dc.title	Geometric Representation Learning on Molecular Graphs	en_US
dc.type	PhD Dissertation	en_US
dc.embargo.length	MONTHS_WITHHELD:36	en_US
dc.embargo.status	EMBARGOED	en_US
dc.embargo.enddate	2027-07-28	en_US
dc.creator.orcid	0000-0002-0826-8031	en_US

Files in this item

Name: Ph_D__Dissertation___Tian_Xia-2.pdf
Size: 15.07 MB
