This Is Auburn: Electronic Theses and Dissertations

A Clustering Rule-based Approach for Classification Problems

Date

2010-05-24

Author

Williams, Philicity

Type of Degree

dissertation

Department

Computer Science

Abstract

Today’s data storage and collection capabilities have allowed the accumulation of enormous amounts of data. Data mining can be a useful tool for transforming these large amounts of raw data into useful information. Predictive modeling is a very popular area of data mining. The results of such tasks can contain helpful information for decision making. Problems arise when the data sets used to build these models are not as complete (e.g., erroneous or missing values) as the data used to evaluate the model. Rule-based classifiers are a widely used and accepted type of predictive model. We present a method that uses divisive data clustering to reduce the severity of the effects of missing data on the performance of rule-based classifiers. The Clustering Rule-based Approach (CRA) clusters the original training data and builds a separate rule-based model on each cluster's data. The individual models are combined into a larger model and evaluated against test data. We evaluate the effects of missing attribute information for ordered and unordered rule sets. We experimentally show that the collective model is less affected by missing attribute information when the test data has missing attribute values.
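
The cluster-then-learn idea described in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's implementation, and every name in it (`bisect`, `learn_rule`, `fit`, `predict`) is hypothetical. It assumes numeric attributes, complete training data, a single two-way divisive split, and a simple one-attribute threshold rule standing in for a full rule learner. The abstract does not say how the individual models are combined; routing each test instance to the nearest cluster centroid, skipping missing attributes, is an assumption made here for illustration only.

```python
# Hypothetical sketch of the cluster-then-learn idea; NOT the dissertation's code.
from collections import Counter

def centroid(rows):
    """Column-wise mean of complete numeric feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def distance(x, c):
    """Squared Euclidean distance that skips missing (None) attributes of x."""
    return sum((a - b) ** 2 for a, b in zip(x, c) if a is not None)

def bisect(X, y, iters=10):
    """One divisive step: split the training data in two with a small 2-means loop."""
    # Assumption: seed with the first row and the row farthest from it.
    c0 = X[0]
    c1 = max(X, key=lambda r: distance(r, c0))
    for _ in range(iters):
        groups = ([], [])
        for row, label in zip(X, y):
            groups[distance(row, c0) > distance(row, c1)].append((row, label))
        if not groups[0] or not groups[1]:
            break
        c0 = centroid([r for r, _ in groups[0]])
        c1 = centroid([r for r, _ in groups[1]])
    # Keep only non-empty clusters, each paired with its centroid.
    return [(c, g) for c, g in zip((c0, c1), groups) if g]

def learn_rule(pairs):
    """Pick the single threshold rule with the best training accuracy.

    Returns ((attribute, cut, label_if_le, label_if_gt), default_class).
    This stands in for a real rule learner (assumption).
    """
    default = Counter(lbl for _, lbl in pairs).most_common(1)[0][0]
    best = (-1, None)
    for j in range(len(pairs[0][0])):
        for t in sorted({r[j] for r, _ in pairs}):
            lo = Counter(l for r, l in pairs if r[j] <= t)
            hi = Counter(l for r, l in pairs if r[j] > t)
            lo_lbl = lo.most_common(1)[0][0]
            hi_lbl = hi.most_common(1)[0][0] if hi else default
            correct = lo[lo_lbl] + hi[hi_lbl]
            if correct > best[0]:
                best = (correct, (j, t, lo_lbl, hi_lbl))
    return best[1], default

def fit(X, y):
    """Cluster the training data, then learn one rule model per cluster."""
    return [(c, learn_rule(g)) for c, g in bisect(X, y)]

def predict(model, x):
    """Route x to the nearest cluster, then apply that cluster's rule."""
    _, (rule, default) = min(model, key=lambda m: distance(x, m[0]))
    j, t, lo_lbl, hi_lbl = rule
    if x[j] is None:  # the rule cannot fire on a missing value
        return default
    return lo_lbl if x[j] <= t else hi_lbl

# Toy data (invented for illustration): two well-separated groups.
X = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]]
y = ["a", "a", "b", "c", "c", "d"]
model = fit(X, y)
print(predict(model, [None, 10]))  # test instance with a missing first attribute
```

In this sketch, an instance with a missing value still reaches a cluster-specific model, so the fallback default is the majority class of that cluster and not of the whole training set. That is one plausible reading of why a collective model might degrade less under missing test values, but the dissertation itself should be consulted for the actual mechanism.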