Activation Functions as Parameters for Improved Deep Learning
Type of Degree: PhD Dissertation
Systems and Technology
Neural networks have evolved into powerful and dependable machine learning systems, and practitioners and researchers have many tuning levers at their disposal for achieving successful learning. A key task is selecting optimal activation functions (AFs) for the hidden and output layers. While the choice of output-layer activation function is dictated by the type of data, it is the hidden-layer AFs that largely determine learning success. Practitioners, being human, approach problems with an inference bias driven by goals, problem interpretation, and previous experience. Activation functions are conventionally hyperparameters: they cannot be adjusted during learning. This leads to repeated trials as users search for the optimal combination of activation functions to assign to the hidden layers. Swapping activation functions during training offers a solution to both the user-bias problem and the risk of suboptimal learning. The key idea is for the loss function to sense declining learning and, at an appropriate point, discard the underperforming activation function in favor of another. This process repeats each time the low-learning threshold is reached and continues until convergence. In this study I show that designing a neural network that manages activation functions as parameters (the system may change them in response to a training-success threshold) improves performance and efficiency while mitigating user bias.
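The swap-on-stall idea described above can be sketched as a small controller that watches the epoch loss and cycles to the next candidate activation function when improvement stalls. This is a minimal illustrative sketch, not the dissertation's actual implementation: the class name `ActivationScheduler`, the candidate pool, and the `min_delta`/`patience` thresholds are all assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical candidate pool of hidden-layer activation functions.
ACTIVATIONS = {
    "relu": lambda x: np.maximum(0.0, x),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
}

class ActivationScheduler:
    """Cycle to the next candidate activation when loss stops improving.

    Illustrative sketch only; names and thresholds are assumptions,
    not taken from the dissertation.
    """
    def __init__(self, candidates, min_delta=1e-3, patience=3):
        self.candidates = list(candidates)
        self.idx = 0
        self.min_delta = min_delta  # smallest loss drop that counts as progress
        self.patience = patience    # stalled epochs tolerated before a swap
        self.stalled = 0
        self.best = float("inf")

    @property
    def current(self):
        return self.candidates[self.idx]

    def step(self, loss):
        """Report an epoch loss; returns True if the activation was swapped."""
        if self.best - loss > self.min_delta:
            self.best = loss        # learning is still progressing
            self.stalled = 0
            return False
        self.stalled += 1
        if self.stalled >= self.patience and self.idx + 1 < len(self.candidates):
            self.idx += 1           # discard the underperforming AF for the next one
            self.stalled = 0
            self.best = float("inf")  # reset the threshold for the new AF
            return True
        return False

sched = ActivationScheduler(["relu", "tanh", "sigmoid"])
# Simulated loss trace: improves for three epochs, then plateaus,
# triggering one swap (relu -> tanh) after `patience` stalled epochs.
swaps = [sched.step(loss) for loss in [1.0, 0.8, 0.6, 0.6, 0.6, 0.6]]
```

In a full training loop, `ACTIVATIONS[sched.current]` would be applied in the hidden layers' forward pass, so a swap takes effect at the next epoch without restarting training.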