dc.description.abstract | Membrane-active peptides, particularly those with antimicrobial, anticancer, and other therapeutic properties, offer a promising alternative to traditional drug treatments. However, accurate
predictive models are essential to maximize their effectiveness while minimizing undesirable effects
such as hemolysis, and to improve solubility. This dissertation focuses on developing data-driven
models for activity prediction of membrane-active peptides (MAPs), with the ultimate goal of
designing therapeutics with enhanced specificity. A key innovation in this work is the use of fast Fourier
transform (FFT)-based features, which capture the periodicities and order of amino acids in peptide
sequences without requiring a sequence alignment, leading to a more detailed understanding of
sequence properties that enable these peptides to interact with biological membranes. These FFT-based features are not specific to MAP activity and potentially have broad utility for predicting peptide
properties, such as solubility and hemolytic potential, since these inherent structural periodicities
could contribute to various types of protein structures and activities.
To keep our models interpretable, we have incorporated a feature selection framework
that identifies the most contributive features and restricts the models to them, allowing for
high predictive accuracy while minimizing model complexity. By focusing on a small number
of critical features, these models offer valuable insights into the sequence characteristics that are
most influential in determining MAP activities, making them highly interpretable and practical for
therapeutic peptide design. Support vector machines (SVMs) were employed due to their ability to
handle complex, non-linear relationships, and the models developed in this work demonstrate high
performance, robustness, and reliability. Extensive cross-validation and blind test evaluations reveal
that the models achieve competitive performance when compared to state-of-the-art approaches,
while also being simple and interpretable. This enhanced interpretability sets them apart from
more complex models, offering a clear advantage in therapeutic peptide design. These models stand out for using a minimal number of features while still outperforming or
matching more complex state-of-the-art models. This is particularly relevant in drug discovery,
where identifying meaningful predictive features directly influences both the speed and accuracy
of therapeutic development.
Although the ultimate goal of this research is to facilitate the design of MAPs with high specificity, the primary contribution lies in the development of powerful and computationally efficient
predictive models. These models offer a practical and effective solution for advancing peptide-based drug discovery, enabling the identification of MAPs with optimal therapeutic potential while
minimizing undesired properties such as hemolytic activity and increasing solubility. The design
aspect of this dissertation is positioned as a long-term outcome, supported by the high performance
and reliability of the predictive models developed here. | en_US |