This Is Auburn: Electronic Theses and Dissertations

Semi-Supervised Multiclass Classification with Novelty Detection Using Support Vector Machines and Linear Discriminant Analysis

Date

2024-08-19

Author

Dove, Ryan

Type of Degree

Master's Thesis

Department

Aerospace Engineering

Abstract

Semi-supervised multiclass classification with novelty detection combines two tasks in one tool: accurately assigning each instance to one of a fixed set of known classes while simultaneously detecting instances that do not belong to any of those classes. This field of machine learning has been studied for a decade or more, with many different algorithms and solutions tested. In this thesis, an algorithm built using One-Class Support Vector Machines (OCSVM) and Linear Discriminant Analysis (LDA) is explored and demonstrated on data from simulated missile trajectories. The algorithm is novel not only to the aerospace industry but also to the computer science and machine learning fields. Its strengths and limitations are explored, along with techniques to increase accuracy, ideas for future work, and other uses for the tool. The data is produced by the Auburn University Solid Rocket Code (AUSRC), a validated code that simulates the flight of a missile given its design parameters. The output data from the AUSRC is preprocessed using common techniques: standardization, to eliminate the influence of units of measurement, and principal component analysis (PCA), as a method of data reduction. The data is then fed into an iterative OCSVM, during which the information needed to extract any outlier points is recorded. An objective threshold is used to differentiate between instances that are outliers of a known clean class and instances that are true novelties from an unknown class. The instances deemed 'true' novelties by the threshold are added to the training data on which the LDA is fit. The data is then run through the LDA, which provides a higher classification accuracy than the OCSVM, correcting inter-class misclassifications and tracing the outliers back to their known classes. Because the LDA is trained on the 'true' novelty class data, it also acts as a novelty detector, extracting any novel points the OCSVM missed.
This method achieves greater than 90% overall accuracy on simulated missile trajectories across three different data sets and in robustness tests. The data set features comprise performance parameters, trajectory data, or derived features. The robustness tests consist of reduced time frames and randomly missing data. The OCSVM provides high-accuracy novelty detection while the LDA provides high-accuracy classification, exploiting the strengths of each machine learning algorithm. Further work should be done to improve the hybrid method and to investigate ways to eliminate the LDA's underlying assumption that the data set follows a multivariate Gaussian distribution.
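
The pipeline described in the abstract (standardization, PCA, OCSVM novelty screening, then LDA trained with the flagged novelties as an extra class) can be illustrated with a minimal Python sketch using scikit-learn. This is not the thesis implementation. Several details are assumptions made only for illustration: the OCSVM is run in a single pass rather than iteratively, the "objective threshold" is replaced by a placeholder rule (an instance counts as a 'true' novelty if every class model scores it below the lowest score that model gave its own training data), and the data is a synthetic stand-in, not AUSRC output. The names `hybrid_classify`, `sample`, and `NOVEL` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

NOVEL = -1  # hypothetical label for the unknown class


def hybrid_classify(X_known, y_known, X_new, nu=0.05):
    """Label each row of X_new with a known class or NOVEL (single-pass sketch)."""
    # Standardize, then reduce with PCA; both are fit on the known-class data only.
    scaler = StandardScaler().fit(X_known)
    pca = PCA(n_components=0.95, svd_solver="full")  # keep 95% of the variance
    Z_known = pca.fit_transform(scaler.transform(X_known))
    Z_new = pca.transform(scaler.transform(X_new))

    # One OCSVM per known class. Placeholder threshold (an assumption, not the
    # thesis rule): flag an instance only if every class model scores it below
    # the lowest score that model assigned to its own training instances.
    true_novel = np.ones(len(Z_new), dtype=bool)
    for c in np.unique(y_known):
        Z_c = Z_known[y_known == c]
        ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(Z_c)
        floor_c = ocsvm.decision_function(Z_c).min()
        true_novel &= ocsvm.decision_function(Z_new) < floor_c

    # Add the flagged instances to the training set as an extra class and fit the LDA.
    Z_fit = np.vstack([Z_known, Z_new[true_novel]])
    y_fit = np.concatenate([y_known, np.full(true_novel.sum(), NOVEL)])
    lda = LinearDiscriminantAnalysis().fit(Z_fit, y_fit)

    # The LDA makes the final call for every new instance.
    return lda.predict(Z_new)


# Synthetic stand-in data: two latent factors mapped to four correlated features.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 1.0, 0.5],
              [0.0, 1.0, 1.0, -0.5]])


def sample(center, n):
    latent = rng.normal(center, 1.0, size=(n, 2))
    return latent @ A + rng.normal(0.0, 0.1, size=(n, 4))


centers = [(0, 0), (6, 0), (0, 6)]  # three known classes
X_known = np.vstack([sample(c, 100) for c in centers])
y_known = np.repeat([0, 1, 2], 100)

# New data: 30 instances per known class plus 30 from an unseen class.
X_new = np.vstack([sample(c, 30) for c in centers] + [sample((6, 6), 30)])
truth = np.repeat([0, 1, 2, NOVEL], 30)

pred = hybrid_classify(X_known, y_known, X_new)
accuracy = (pred == truth).mean()
novel_recall = (pred[truth == NOVEL] == NOVEL).mean()
print(f"overall accuracy: {accuracy:.2f}, novelty recall: {novel_recall:.2f}")
```

In this sketch the OCSVM stage only needs to find enough novel instances to seed the extra class; the LDA then reassigns known-class outliers that were flagged by mistake and picks up novelties the OCSVM stage missed, mirroring the division of labor the abstract describes. The synthetic classes are Gaussian with a shared covariance, which is the setting the LDA assumes, so results on this toy data say nothing about performance on real trajectory data.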