This Is Auburn: Electronic Theses and Dissertations


Geometric Representation Learning on Molecular Graphs


Metadata field | Value | Language
dc.contributor.advisor | Ku, Wei-Shinn |
dc.contributor.author | Tian, Xia |
dc.date.accessioned | 2024-07-28T20:47:22Z |
dc.date.available | 2024-07-28T20:47:22Z |
dc.date.issued | 2024-07-28 |
dc.identifier.uri | https://etd.auburn.edu//handle/10415/9385 |
dc.description.abstract | Graphs, as a data structure, have recently attracted significant attention. Representation learning on geometric graphs has achieved great success in many domains, including molecular, social, and financial networks. It is natural to represent proteins as graphs in which nodes represent residues and edges represent pairwise interactions between residues. However, 3D protein structures have rarely been studied directly as graphs. The challenges include: 1) proteins are complex macromolecules composed of thousands of atoms, which makes them much harder to model than small molecules; 2) capturing long-range pairwise relations for protein structure modeling remains underexplored; 3) few studies have focused on learning the different attributes of proteins jointly; and 4) existing graph neural networks (GNNs) are limited in capturing complex multi-level structural information and in handling molecular structures of variable size. In this dissertation, we propose four geometric representation learning frameworks to address these challenges in different scenarios. First, we introduce the Protein Graph-GNN (PG-GNN) architecture for protein backbone structure modeling, which uses geometric graph convolution blocks to generate distance geometric graph representations and dynamically handles protein graphs of variable size. This is a significant advantage because the network opens a new path from sequence to structure. Second, we develop the Attention-based Protein-drug Interaction Prediction (APIP) framework for interpretable protein-ligand interface prediction, which processes different input types separately and models long-range dependencies in protein sequences. Third, we present the Explainable framework for drug-target Interaction prediction (EIR), which incorporates both intrinsic and extrinsic information to improve interpretability and accuracy in drug screening.
Finally, we propose the Subgraph Aggregation Module Network (SAMNet), a rotation-equivariant network with provably lossless geometric encoding for molecular representation learning, which captures complex geometry across spatial dimensions using a subgraph sampling policy and a drop-in geometric Subgraph Aggregation Module (SAM). We conducted extensive experiments on benchmark datasets that demonstrate the effectiveness of the proposed methods for in silico structural biology and rational drug discovery and show their ability to address the limitations of existing GNNs in molecular representation learning. By developing these novel approaches, we contribute to advancing the life sciences and pave the way for more accurate, interpretable, and generalizable machine learning models in protein structure prediction, drug discovery, and molecular representation learning. | en_US
dc.rights | EMBARGO_GLOBAL | en_US
dc.subject | Computer Science and Software Engineering | en_US
dc.title | Geometric Representation Learning on Molecular Graphs | en_US
dc.type | PhD Dissertation | en_US
dc.embargo.length | MONTHS_WITHHELD:36 | en_US
dc.embargo.status | EMBARGOED | en_US
dc.embargo.enddate | 2027-07-28 | en_US
dc.creator.orcid | 0000-0002-0826-8031 | en_US
