This Is Auburn Electronic Theses and Dissertations

Advanced Statistical Learning Approaches to Healthcare




Simsek, Serhat

Type of Degree

PhD Dissertation


Department

Mathematics and Statistics

Restriction Status


Restriction Type

Auburn University Users

Date Available



In this dissertation, we propose a novel statistical modeling methodology that effectively analyzes and extracts information from large datasets. Two real datasets were employed to validate the proposed methodology: a breast cancer patient dataset and a no-show patient dataset. The objective of this dissertation is to extract novel and useful information from these large and complex datasets by applying the proposed methodology, which combines statistical and machine learning, heuristic optimization, and advanced data resampling techniques. Specifically, in the first study (breast cancer), we developed prediction models for 1-, 5-, and 10-year breast cancer survivability and studied the variables whose importance for survival changes over time. The results revealed that certain variables lose importance over time, while others gain importance. This information can help medical practitioners identify the specific subsets of variables to focus on in different periods, which in turn will lead to more effective and efficient cancer care. In the second study, we employed several statistical and machine learning algorithms to accurately predict patient no-shows and to obtain patient-specific risk scores. We also developed a prediction tool that enables practitioners to use the proposed model without requiring any knowledge of statistics, optimization, or programming. The overarching goal of this study is thus to develop a parsimonious model, embedded within the prediction tool, that enables healthcare professionals to improve clinic utilization and patient outcomes while decreasing the costs that originate from patient no-shows.