This Is Auburn: Electronic Theses and Dissertations


Machine Learning Meta-analysis of Proteolytic Cleavage Specificity


Metadata Field | Value | Language
dc.contributor.advisor | Kieslich, Christopher |
dc.contributor.author | Kim, Suhyeon |
dc.date.accessioned | 2024-07-30T20:59:40Z |
dc.date.available | 2024-07-30T20:59:40Z |
dc.date.issued | 2024-07-30 |
dc.identifier.uri | https://etd.auburn.edu//handle/10415/9410 |
dc.description.abstract | Proteolytic enzymes, such as cathepsins and matrix metalloproteinases (MMPs), play crucial roles in various physiological processes, including metabolism, cell signaling, and apoptosis. Identifying their cleavage sites is a complex challenge due to the diverse substrate specificities and regulatory mechanisms of these enzymes. This thesis investigates the use of machine learning models, particularly Support Vector Machines (SVMs) and One-Class Support Vector Machines (OCSVMs), to predict proteolytic cleavage specificity. The study introduces a novel approach utilizing Fourier Transform-based encoding of peptide sequences to capture essential biochemical properties and structural characteristics, which are used as inputs into SVM algorithms. The research encompasses a comprehensive meta-analysis using SVM-based feature selection techniques to compare and contrast the substrate specificity of different proteases. This analysis aims to uncover distinct patterns in substrate interaction, offering valuable insights for therapeutic strategies and biomarker discovery. The datasets used in this study were sourced from the MEROPS database and included both positive data points (cleaved sequences) and synthetic negative data points (non-cleaved sequences) to ensure robustness and diversity. Through rigorous cross-validation and hyper-parameter optimization, the SVM models demonstrated high predictive accuracy, achieving Area Under the Receiver Operating Characteristic Curve (AUC-ROC) scores close to 1.00 for several proteases. The study also explores the performance of OCSVM models, both with and without negative class data, revealing that tailored feature selection and weighting strategies significantly enhance model performance. The findings of this research underscore the potential of machine learning techniques in advancing bioinformatics and protease research. The developed models not only improve the precision of proteomic analyses but also support the broader field of precision medicine by providing deeper insights into protease functions in health and disease. | en_US
dc.subject | Chemical Engineering | en_US
dc.title | Machine Learning Meta-analysis of Proteolytic Cleavage Specificity | en_US
dc.type | Master's Thesis | en_US
dc.embargo.status | NOT_EMBARGOED | en_US
dc.embargo.enddate | 2024-07-30 | en_US
dc.contributor.committee | Cremaschi, Selen |
dc.contributor.committee | He, Peter |
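
The abstract mentions Fourier Transform-based encoding of peptide sequences but does not give the details. The following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes each residue is mapped to a single physicochemical value (here the Kyte-Doolittle hydropathy index, chosen purely for illustration) and that the magnitudes of the discrete Fourier transform of that numeric signal serve as a fixed-length feature vector. The function names and the example peptide are hypothetical, and the SVM training step is omitted.

```python
import cmath

# Kyte-Doolittle hydropathy index. Using this particular scale is an
# assumption for illustration; the thesis may use other descriptors.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_property(peptide, scale=KYTE_DOOLITTLE):
    """Map each residue of the peptide to its numeric property value."""
    return [scale[residue] for residue in peptide.upper()]


def dft_magnitudes(signal):
    """Return the magnitude of each discrete Fourier coefficient of a real signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def fourier_features(peptide):
    """Hypothetical helper: property encoding followed by DFT magnitudes."""
    return dft_magnitudes(encode_property(peptide))


if __name__ == "__main__":
    # Hypothetical 8-residue window around a candidate cleavage site.
    features = fourier_features("PLGLAGRK")
    print([round(value, 3) for value in features])
```

Under these assumptions, every peptide window of the same length yields a feature vector of the same length, which is the shape a standard SVM implementation expects as input.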
