Electronic Theses and Dissertations

Robust Partial Least Squares for Regression and Classification

Date

2008-08-15

Author

Turkmen, Asuman

Type of Degree

Dissertation

Department

Mathematics and Statistics

Abstract

Partial Least Squares (PLS) is a class of methods for modeling relations between sets of observed variables by means of latent variables, in settings where the explanatory variables are highly collinear and outnumber the observations. In general, PLS methods aim to derive orthogonal components using the cross-covariance matrix between the response variable(s) and the explanatory variables, a quantity that is known to be affected by unusual observations (outliers) in the data set. In this study, robustified versions of PLS methods are introduced for both regression and classification. For regression with a quantitative response, a robust PLS regression method (RoPLS), based on weights calculated by the BACON or PCOUT algorithm, is proposed. A robust criterion is suggested for determining the optimal number of PLS components, an important issue in building a PLS regression model. In addition, diagnostic plots are constructed to visualize and classify outliers. The robustness of the proposed method, RoPLS, is studied in detail: the influence function of the RoPLS estimator is derived for low-dimensional data, and empirical robustness properties are provided for high-dimensional data. PLS was originally designed for regression problems with a quantitative response; however, it is also used as a classification technique when the response variable is qualitative. Although several robust PLS methods have been proposed for regression problems, to our knowledge there has been no study on the robustness of PLS classification methods. In this study, the effect of outliers on existing PLS classification methods is investigated.
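
The central idea described in the abstract, deriving PLS components from a cross-covariance that is made resistant to outliers by downweighting suspicious observations, can be illustrated with a small sketch. The Python code below is a minimal, hypothetical illustration of observation-weighted PLS1 (NIPALS-style); it is not the author's RoPLS algorithm, the BACON and PCOUT weighting schemes are not implemented, and the weights in the toy example are set by hand.

import numpy as np

def weighted_pls1(X, y, w, n_components=2):
    # Weighted centering so downweighted rows pull the centers less.
    w = np.asarray(w, dtype=float)
    mx = (w[:, None] * X).sum(axis=0) / w.sum()
    my = (w * y).sum() / w.sum()
    Xc, yc = X - mx, y - my
    dirs, loadings, qs = [], [], []
    for _ in range(n_components):
        a = Xc.T @ (w * yc)              # weighted cross-covariance direction
        a /= np.linalg.norm(a)
        t = Xc @ a                       # component scores
        tw = w * t
        p = Xc.T @ tw / (t @ tw)         # X loadings
        q = (yc @ tw) / (t @ tw)         # y loading
        Xc = Xc - np.outer(t, p)         # deflation, as in ordinary PLS1
        yc = yc - q * t
        dirs.append(a); loadings.append(p); qs.append(q)
    W, P, q = np.array(dirs).T, np.array(loadings).T, np.array(qs)
    beta = W @ np.linalg.solve(P.T @ W, q)   # implied regression coefficients
    return beta, mx, my

# Toy usage: collinear predictors with a few gross outliers given tiny weights.
rng = np.random.default_rng(0)
n, p = 100, 6
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = latent[:, 0] + 0.1 * rng.normal(size=n)
X[:5] += 10; y[:5] -= 10                 # contaminate the first five rows
w = np.ones(n); w[:5] = 0.01             # hypothetical robustness weights
beta, mx, my = weighted_pls1(X, y, w)
resid = (X[5:] - mx) @ beta + my - y[5:]
print("RMSE on clean rows:", np.sqrt(np.mean(resid**2)))

Replacing the equal-weight cross-products of ordinary PLS1 with weighted ones is the only change in this sketch, which is why a data-driven robust weighting rule (such as the BACON- or PCOUT-based weights discussed in the abstract) could be plugged in without altering the rest of the algorithm.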