Methods for Improving Generalization and Convergence in Artificial Neural Classifiers
Type of Degree: Dissertation
Artificial neural networks have proven to be quite powerful for solving nonlinear classification problems. However, the complex error surfaces encountered in such problems often contain local minima in which gradient-based algorithms may become trapped, causing improper classification of the training data. As a result, the success of the training process depends largely on the initial weight set, which is generated at random. Furthermore, analytically determining a set of initial weights that will achieve convergence is not feasible, since the shape of the error surface is generally unknown. Another challenge with neural classifiers is poor generalization once additional data points are introduced. This can be especially problematic when the training data is poorly distributed, or when the number of data points in each class is unbalanced. In such cases, proper classification may still be achieved, but the orientation of the separating plane and its corresponding margin of separation may be less than optimal. In this dissertation, a set of methods designed to improve both the generalization and the convergence rate of neural classifiers is presented. To improve generalization, a single-neuron pseudo-inversion technique is presented that guarantees optimal separation and orientation of the separating plane with respect to the training data. This is done by iteratively reducing the size of the training set until a minimal set is reached. The final set represents those points that lie on the boundaries of the data classes. A quadratic program formulation of the margin of separation is then defined for the reduced data set, and an optimal separating plane is obtained. A method is then described by which the presented technique may be applied to nonlinear classification by systematically optimizing each of the neurons in the network individually.
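As a rough illustration of the pseudo-inversion step only, the sketch below computes the weights of a single linear neuron in the least-squares sense, w = (XᵀX)⁻¹Xᵀd, which coincides with the pseudo-inverse solution when the augmented input matrix X has full column rank. It is not the dissertation's procedure: the iterative reduction of the training set and the quadratic program for the margin are omitted, and plain least squares does not by itself guarantee separation or an optimal margin. The function names, the ±1 target coding, and the AND data set are assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] so that row operations act on both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pseudo_inverse_weights(patterns, targets):
    """Least-squares weights for one linear neuron; the bias is the last weight."""
    # Append a constant 1.0 to every pattern so the bias is learned as a weight.
    x = [list(p) + [1.0] for p in patterns]
    n = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(n)] for i in range(n)]
    xtd = [sum(row[i] * d for row, d in zip(x, targets)) for i in range(n)]
    return solve_linear_system(xtx, xtd)


def classify(weights, pattern):
    """Hard-threshold the neuron's net input into the classes -1 and +1."""
    net = sum(w * v for w, v in zip(weights, list(pattern) + [1.0]))
    return 1 if net >= 0 else -1


# Assumed toy data: the linearly separable AND problem with -1/+1 targets.
patterns = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [-1, -1, -1, 1]

weights = pseudo_inverse_weights(patterns, targets)
predictions = [classify(weights, p) for p in patterns]
```

In practice a library routine such as `numpy.linalg.pinv` would normally replace the hand-written solver; the standard-library version is used here only to keep the sketch self-contained.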
Next, a modified training technique is discussed that significantly improves the success rate of gradient-based searches. The proposed method monitors the state of the gradient search to determine whether the algorithm has become trapped in a false minimum. Once entrapment is detected, a set of desired outputs is defined using the current outputs of the hidden layer neurons. The desired values of the remaining misclassified patterns are then inverted in an attempt to reconfigure the hidden layer mapping, and the hidden layer neurons are retrained one at a time. Linear separation is then attempted on the updated mapping using pseudo-inversion of the output neuron. The process is repeated until separation is achieved. This second method is compared with other popular algorithms on a set of eight nonlinear classification benchmarks, and it is shown to produce the highest success rate for all of the tested problems. The proposed method therefore achieves its goal, which is to improve the rate of convergence of the gradient search by overcoming the challenge presented by local minima. Furthermore, the resulting improvement is shown to have a relatively low cost in terms of the number of required iterations.
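The abstract does not state how entrapment is detected, so the sketch below shows one common heuristic as an assumption, not the dissertation's criterion: the search is flagged as trapped when the training error is still above the target yet has barely improved over a recent window of iterations. The function name `is_trapped` and the parameters `window`, `min_improvement`, and `target_error` are hypothetical.

```python
def is_trapped(error_history, window=5, min_improvement=1e-4, target_error=1e-2):
    """Return True if the error has stalled above the target (assumed heuristic)."""
    # Not enough iterations yet to compare against an earlier error value.
    if len(error_history) <= window:
        return False
    current = error_history[-1]
    # A search that has already reached the target error is converged, not trapped.
    if current <= target_error:
        return False
    # Compare with the error recorded `window` iterations earlier.
    earlier = error_history[-1 - window]
    return (earlier - current) < min_improvement


# Assumed example error traces, one value per training iteration.
stalled = [1.0, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
converging = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05]
converged = [0.5, 0.1, 0.005, 0.005, 0.005, 0.005, 0.005, 0.005]

stalled_flag = is_trapped(stalled)
converging_flag = is_trapped(converging)
converged_flag = is_trapped(converged)
```

A check of this kind would be evaluated once per iteration of the gradient search; a positive result would be the point at which the hidden layer remapping described above is triggered.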