A Neural Network Implementation on Embedded Systems by Nicholas Jay Cotton A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Auburn, Alabama August 9, 2010 Keywords: Neural Network Implementation, Microcontroller, Neural Network Training Copyright 2010 by Nicholas Jay Cotton Approved by Bogdan Wilamowski, Chair, Professor of Electrical and Computer Engineering Thaddeus Roppel, Associate Professor of Electrical and Computer Engineering Victor Nelson, Professor of Electrical and Computer Engineering Vitaly J. Vodyanoy, Professor of Physiology ii Abstract This dissertation presents a solution for embedded neural networks across many types of hardware and for many applications. The software package presented here allows the user to develop a neural network for a desired application, train the network, embed it on most platforms, and verify its functionality. This software supports advanced and very powerful types of neural networks including cascade, fully, and arbitrarily connected networks. It also supports several different training algorithms both first and second order. This system automates the process of transforming the trained neural network to an embedded neural network on most microcontrollers with a C compiler. There is also an assembly language neural network highly optimized for speed based on an inexpensive 8-bit PIC microcontroller. Software for testing and verifying functionality of the embedded neural networks is also included. Several neural network examples are also shown being calculated on the embedded system. iii Acknowledgments The author would like to thank Auburn University's Department of Electrical and Computer Engineering at the Samuel Ginn College of Engineering for their support during this research. The author would also like to recognize his advisor and mentor Dr. Bogdan Wilamowski for his professional and personal support over the years. Without Dr. Wilamowski's confidence and commitment to the author, graduate school would not have been possible. A graduate student simply could not have a better advisor. The author would also like to thank the members of his committee; Dr. Thaddeus Roppel, Dr. Victor Nelson, and Dr. Vitaly Vodyanoy. The author also recognize Dr. Roppel for his extra guidance and support through ought the author's undergraduate and graduate curriculums at Auburn. He has been a source of great support several times through the years. The author would lastly, but most importantly, thank his wife Erin for her love, patience, and encouragement that provided the perseverance to continue in the most difficult of times. iv Table of Contents Abstract ............................................................................................................................... ii Acknowledgments.............................................................................................................. iii List of Tables .......................................................................................................................v List of Figures .................................................................................................................... vi List of Abbreviations ........................................................................................................ vii 1. Review of Embedded Neural Networks ...................................................................... 1 1.1. 
Neural Network Critical Components .................................................................. 2 1.2. MLP Versus Arbitrarily Connected Networks .................................................... 3 1.3. Activation Functions ............................................................................................ 6 1.4. Analog implementation ...................................................................................... 11 1.5. Microcontroller Implementations ....................................................................... 13 1.5.1. Embedded Neural Network for Fire Classification Using an Array of Gas Sensors ................................................................................................ 14 1.5.2. Microcontroller Based Neural Network Controlled Low Cost Autonomous Vehicle .................................................................................. 17 1.5.3. Control Sensor Linearization Using a Microcontroller-Based Neural Network ...................................................................................................... 20 1.5.4. A Solar-powered Battery Charger with Neural Network Maximum Power Point Tracking Implemented on a Low-Cost PIC-microcontroller . 21 1.6. Current Research Summary ............................................................................... 23 2. Neural Network Training .......................................................................................... 25 2.1. Neural Network Trainer ..................................................................................... 26 v 2.1.1. Training Data .............................................................................................. 27 2.1.2. Input File ..................................................................................................... 28 2.1.3. Training Parameters .................................................................................... 30 2.2. NNT Adaptations ............................................................................................... 34 2.2.1. Neural Network Weight Scaling ................................................................. 35 2.2.2. 8-Bit Neural Network Simulator ................................................................. 36 2.2.3. Generated Files ........................................................................................... 40 2.3. PIC Simulator Software (PicSim) ...................................................................... 43 3. Hardware Implementation ......................................................................................... 46 3.1. Pseudo Floating Point ........................................................................................ 46 3.2. Multiplication ..................................................................................................... 48 3.3. Addition and Subtraction ................................................................................... 49 3.4. Activation Function ............................................................................................ 50 3.5. Memory Structures ............................................................................................. 56 3.6. Neuron By Neuron Computation Process .......................................................... 58 3.6.1. Forward Calculations .................................................................................. 58 3.6.2. Individual Neuron Calculations .................................................................. 
62 4. Application ................................................................................................................ 65 4.1. Simple Surface ................................................................................................... 67 4.2. Matlab's Peaks Surface ....................................................................................... 76 4.3. Two Arm Planar Manipulator ............................................................................ 79 4.4. Matlab's Peaks Surface Example In C ............................................................... 87 4.5. Experimental Data Summary ............................................................................. 90 vi 5. Conclusion ................................................................................................................. 92 References ......................................................................................................................... 95 vii List of Tables Table 1: Training data used by Bashyal et al. in "Embedded Neural Network for Fire Classification Using an Array of Gas Sensors". The authors labels the six sensor readings TGS but do not number each sensor. ....................................... 16 Table 2: Neural network performance comparison. ........................................................ 90 viii List Of Figures Figure 1: A single neuron neural network with two inputs and one output. ...................... 3 Figure 2: Ten hidden neuron MLP network for solving parity-9. ..................................... 5 Figure 3: Fully connected cascade neural network with ten hidden neurons for calculating parity-1023. .................................................................................... 6 Figure 4: Tangent Hyperbolic function used for neural network activation function. ...... 8 Figure 5: Linear activation function with saturation. ......................................................... 9 Figure 6: Piecewise linear activation function. .................................................................. 9 Figure 7: Sigmoid function non-linear tanh approximation ............................................ 10 Figure 8: Elliott function non-linear tanh approximation. ............................................... 10 Figure 9: Sigmoid activation function with an input voltage and output current. ........... 12 Figure 10: Analog sigmoid circuit output. ........................................................................ 12 Figure 11: Network used by Bashyal et al. in "Embedded Neural Network for Fire Classification Using an Array of Gas Sensors". ............................................. 15 Figure 12: Neural network from Farooq's" Microcontroller based Neural Network Controlled Low Cost Autonomous Vehicle". ................................................. 19 Figure 13: Neural network for "Control Sensor Linearization Using a Microcontroller- Based Neural Network" by Dempsey et al. .................................................... 20 Figure 14: Neural Network implemented in Petchjatuporn's work on "A Solar- powered Battery Charger with Neural Network Maximum Power Point Tracking Implemented on a Low-Cost PIC-microntroller". ........................... 22 Figure 15: Front end of Neural Network Trainer (NNT) ................................................. 26 Figure 16: Three Neuron architecture for parity-3 problem. ............................................ 
29 Figure 17: PicSim software for simulating and verify embedded neural networks. ........ 43 ix Figure 18: Implementation of 16-bit fixed point multiplication using 8-bit hardware multiplier. Steps 1-4 are summed with place holders to give the final product on the result line. Abbreviations: Integer (I) Fractional (F) Product (P) Lower- Byte (L) Higher-Byte (H). .......................................................................... 48 Figure 19: Logical block diagram of the activation function. .......................................... 52 Figure 20: Example of linear approximations (red) and parabolas between 0 and 4 (magenta). Tanh (green) and the approximation (blue) are also shown on the graph. Only 4 divisions were used for demonstration purposes. .................. 54 Figure 21: Error from tanh approximation using 16 divisions from -5 to +5. ................. 56 Figure 22: Memory Allocation table for Pic18F45J10. ................................................... 58 Figure 23: Block diagram of Neural Network forward calculations using the nested loop structure for cross layer connected networks. ......................................... 60 Figure 24: PF stands for Pseudo Floating point number. The Numbers in brackets refer to the number of bits that represent that particular value. ...................... 62 Figure 25: Pre Activation Function Routine. The transformation between a pseudo floating point number to a fixed point number that the activation function can use. .................................................................................................................. 63 Figure 26: Simple surface training data. .......................................................................... 68 Figure 27: Four neuron cascade architecture for solving the simple surface. The inputs are the circles on the left and the output is the last neuron on the right side. ................................................................................................................. 69 Figure 28: Ideal neural network output. ............................................................................ 69 Figure 29: Output of the PIC. .......................................................................................... 70 Figure 30: Error surface showing the difference of the training data and the ideal neural network. ............................................................................................... 70 Figure 31: Error surface showing the difference of the PIC output and the ideal neural network. .......................................................................................................... 71 Figure 32: Error surface showing the difference of the PIC output and the training data. ................................................................................................................. 71 Figure 33: Histogram of errors between the PIC and the training data. .......................... 72 x Figure 34: Output of PIC with 196 test patterns. ............................................................. 73 Figure 35: Two neuron architecture for solving simple surface problem. ....................... 74 Figure 36: PIC output with 196 points on small two neuron architecture. ...................... 74 Figure 37: Histogram of errors between the ideal neural network and the PIC. ............. 75 Figure 38: Error of the PIC and the Training data compared. ......................................... 75 Figure 39: Training data used for Matlab's peaks surface. 
.............................................. 76 Figure 40: Eight neuron network used for solving the Matlab peaks surface. ................. 77 Figure 41: Pic output for Matlab's peaks surface. ............................................................ 78 Figure 42: Histogram of errors between the PIC output and the training data. ............... 78 Figure 43: Histogram of errors between the ideal neural network and the training data. 79 Figure 44: Two arm planar manipulator with variables shown. ...................................... 80 Figure 45: Ten neuron network for solving forward kinematics problem. ...................... 82 Figure 46: Training data output x of the two output system. ........................................... 83 Figure 47: Output x of two output system generated by embedded neural network. ...... 83 Figure 48: Training data output y of the two output system. ........................................... 84 Figure 49: Output y of two output system generated by embedded neural network. ...... 84 Figure 50: Error between the embedded neural network and training data of output x. . 85 Figure 51: Error between the embedded neural network and training data of output y. 85 Figure 52: Histogram of Errors between training data and PIC for output x. ................. 86 Figure 53: Histogram of Errors between training data and PIC for output y. ................. 86 Figure 54: Output of the PIC using the C version of the embedded neural network software. .......................................................................................................... 88 Figure 55: Error between the ideal neural network and the PIC output using C. ............ 88 Figure 56: Histogram of errors between Ideal neural network and training data. ........... 89 xi Figure 57: Histogram of errors between ideal neural network and PIC implemented neural network. ............................................................................................... 89 xii List of Abbreviations NNT Neural Network Trainer EBP Error Back Propagation FPGA Field Programmable Gate Array MLP Multi Layer Perceptron PIC Microcontroller By Microchip Technology Inc. ACN Arbitrarily Connected Networks FCC Fully Connected Networks 1 1. Review of Embedded Neural Networks Neural networks have become a growing area of research over the last few decades and have affected many branches of industry. The concept of neural networks and a few types of their applications in industrial electronics are summarized in [1]. In the field of industrial electronics alone there are several applications for neural networks, some include motor drives [2-9] and power distribution problems dealing with harmonic distortion [10-23]. These papers show how valuable neural networks are becoming in industry. Due to the nonlinear nature of neural networks, they have become an integral part of the field of controls [24-26]. On a parallel note, embedded applications are also becoming exponentially more prevalent [27-33]. However, even though these two independent topics are continually growing there is not a significant amount of research being done on embedded neural networks. This dissertation proposes a solution for implementing neural networks on microcontrollers for many embedded applications. Many people have a predisposition about neural networks, one being that they require significant computing power. One researcher stated, "most embedded microprocessor cores lack the performance for running neural networks" [34]. 
Many researchers have implemented neural networks on sophisticated hardware, for example by creating dedicated Application-Specific Integrated Circuits (ASICs) as in [34-39]. Others have used Field Programmable Gate Arrays (FPGAs) to perform the neural network calculations for embedded tasks [40-46]. High-end digital signal processors (DSPs) are also commonly used to implement neural networks because they are typically designed with floating point hardware. They also have multiply-and-accumulate registers, which are helpful when processing neural network applications. A few examples of DSP implementations are described in [47-54]. Because DSPs and FPGAs have more computing power, they tend to be very expensive. However, this large amount of power is not necessary for implementing neural network tasks that can easily be done on an inexpensive microcontroller. There is no current solution for implementing neural networks in an embedded environment other than for a few specific applications. The above-mentioned articles validate the utility of neural networks. Technology as a whole is becoming more portable, which leaves a need for a portable neural network solution. This dissertation discusses a solution for embedding any neural network on a microcontroller. It also offers methods for implementing Multi Layer Perceptron (MLP) and arbitrarily connected networks (ACN), discussed in the next section, on a microcontroller.

1.1. Neural Network Critical Components

Neural networks are made up of several critical components. The largest component is the neuron itself, which is the triangle component in Figure 1. A neural network is made up of one or more neurons connected in any configuration. The connecting lines represent weights. Selecting these weights determines how the neural network will respond to particular input patterns. Training a neural network is the process of selecting weights to give the desired output for a given set of inputs. Neural networks gain their nonlinear properties from their activation function, which is represented by the signal passing through the neuron. Neural networks can take on many shapes and sizes and be arranged in an infinite number of ways. Some of the most common networks will be discussed in this dissertation.

Figure 1: A single neuron neural network with two inputs and one output. (The figure labels the two inputs, the +1 bias, the three weights, the weighted sum Net = sum of w(n)*in(n) over all inputs, and Output = Activation(Net).)

1.2. MLP Versus Arbitrarily Connected Networks

Much of embedded neural network research has involved Multi Layer Perceptron (MLP) networks. Little research, however, addresses arbitrarily connected networks at the embedded system level. This is unfortunate because these networks are superior to traditional MLP networks in several ways. They are faster, more reliable to train, more efficient because fewer neurons are needed to solve similar problems, and they can solve more difficult problems that are nearly impossible for MLP networks to solve [55-62]. One common benchmark for training neural networks is the parity-N problem. For example, if an MLP network is used with one hidden layer of ten neurons, then the largest parity problem that can be solved is parity-9. However, if the same ten neurons are used in a fully connected cascade architecture, then the network is capable of solving parity-1023, since a cascade of n neurons can handle parity problems up to parity-(2^n - 1) [63].
The MLP and fully connected cascade architectures can be seen in Figure 2 and Figure 3, respectively. Despite the drawbacks of using MLP networks, neural network research is still done almost exclusively with them. As stated earlier, arbitrarily connected networks are superior. Unfortunately, people seldom utilize them because they do not have the software to train them. This dissertation offers a solution for training these networks.

Figure 2: Ten hidden neuron MLP network for solving parity-9.

Figure 3: Fully connected cascade neural network with ten hidden neurons for calculating parity-1023.

1.3. Activation Functions

The most common neural network activation function is the hyperbolic tangent (tanh), which is defined in Equation 1 and shown in Figure 4. Much research on embedded neural networks uses some approximation of tanh, and most training software trains neural networks based on the tanh activation function. One of the simplest approximations is a linear approximation with saturation points, as shown in Figure 5 [64]. A piecewise linear approximation of tanh is shown in Figure 6 [65]. All of these linear approximations are based on the tanh activation function and allow for quick, simple linear calculations, though at the expense of accuracy. Tanh is defined in Equation 1.

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))   (1)

There are other nonlinear approximations that are easier to calculate than tanh but more accurate than the linear versions shown here. One very common activation function is the sigmoid function [66-67]. The sigmoid function is shown in Equation 2 and Figure 7. The sigmoid has a shape similar to tanh, but it ranges from zero to one rather than from negative one to positive one. This needs to be taken into consideration when the neural network is trained, because commercially available neural network training software may not include the sigmoid activation function, making it difficult for the user to train the network with the same activation function the hardware is going to use.

y = 1 / (1 + e^(-x))   (2)

The Elliott function is another nonlinear activation function, used by [68]. This function ranges between negative one and positive one, but its shape is not as steep as tanh, again requiring the network to be trained using the Elliott function. This function can be seen in Equation 3 and Figure 8.

y = x / (1 + |x|)   (3)

The final common activation function is a simple lookup table. The lookup table can offer a variety of levels of accuracy, but this accuracy is exponentially proportional to the number of data points that must be stored. This approach was used in [69] and stored 256 values, which is not enough resolution. The problem with this method is that a vast amount of memory must be dedicated to storing the values.

Figure 4: Tangent hyperbolic function used for the neural network activation function.

Figure 5: Linear activation function with saturation.

Figure 6: Piecewise linear activation function.

Figure 7: Sigmoid function, a nonlinear tanh approximation, y = 1/(1+exp(-x)).

Figure 8: Elliott function, a nonlinear tanh approximation, y = x/(1+|x|).
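For a concrete comparison of these alternatives, the short C sketch below evaluates tanh, the saturating linear approximation, the sigmoid, and the Elliott function side by side. It is only an illustration: it uses double-precision math on a PC rather than the fixed point arithmetic described later, the function names are invented for this example, and the saturation limits of plus and minus one for the linear approximation are an assumption made for demonstration.

#include <math.h>
#include <stdio.h>

/* Reference activation: hyperbolic tangent (Equation 1). */
static double act_tanh(double x)    { return tanh(x); }

/* Linear approximation with saturation (cf. Figure 5); limits of +/-1 assumed. */
static double act_linsat(double x)  { return x > 1.0 ? 1.0 : (x < -1.0 ? -1.0 : x); }

/* Sigmoid (Equation 2); note the 0..1 output range. */
static double act_sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* Elliott function (Equation 3); -1..+1 range but a gentler slope than tanh. */
static double act_elliott(double x) { return x / (1.0 + fabs(x)); }

int main(void)
{
    /* Print the four activations over the same range used in Figures 4 through 8. */
    for (double x = -5.0; x <= 5.0; x += 1.0)
        printf("%5.1f  tanh=%7.4f  linsat=%7.4f  sigmoid=%7.4f  elliott=%7.4f\n",
               x, act_tanh(x), act_linsat(x), act_sigmoid(x), act_elliott(x));
    return 0;
}

Running the comparison makes the range difference obvious: the sigmoid stays between zero and one while the other three are bipolar, which is one reason the training software must use the same activation function as the target hardware.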
1.4. Analog Implementation

The analog implementation of neural networks covers a broad category of methodologies. Most published designs work for their specific cases but are not general solutions for analog neural networks. The ways to implement neural networks in an analog system are as diverse as the applications for neural networks. Every researcher has their own method and approach, whichever is most appropriate for their needs and resources. Some researchers build every network at the ASIC level, while others have created an ASIC for a single neuron that they can reuse in different applications [37, 70-73]. Many different activation functions are used, from simple threshold functions that are essentially a digital 1 or 0 output to full tanh approximations. An activation function that can easily be implemented in VLSI without the need for resistors is demonstrated in Figure 9 [72]. This activation function is made up of two CMOS coupled differential pairs. These circuits are biased by IH and IL, which also become the upper and lower limits of the sigmoid function. The right half of the circuit is a voltage reference that is biased at ground. This centers the sigmoid around zero. The output of the circuit can be seen in Figure 10.

Figure 9: Sigmoid activation function with an input voltage and output current.

Figure 10: Analog sigmoid circuit output.

Analog neural network implementations offer incredible calculation speed, but not without a price. Probably the single largest deterrent for analog networks is cost. Application-Specific Integrated Circuits (ASICs) are very expensive and time consuming to produce, making them impractical for most neural network applications. Analog neural networks also have other pitfalls, such as the accuracy of the fabrication process affecting the accuracy of the network output. It is hard to produce very accurate resistive material, which is typically used for the weights. It is also difficult to account for circuit variation from temperature-dependent components. Moreover, it is very difficult to create accurate analog neural networks. Finally, with integrated analog circuits it is not possible to modify the network once it is finished. This prevents the network from being retrained or even slightly tuned to be more accurate. Overall, analog neural networks simply are not a feasible solution for the majority of neural network applications.

1.5. Microcontroller Implementations

There have been few papers published [66, 74-79] on microcontroller-based neural networks on a level comparable to the one used in this dissertation. These papers will be discussed in the following section. However, they are lacking in several categories, which is the main motivation for this work. These categories are neural network architectures, activation functions, and training algorithms. Neural networks are well suited to approximating systems that are trained from a limited number of data points but exposed to a large variety of input patterns. Neural networks do a great job of predicting the outputs between the data points. In a large portion of the research, however, neural networks are used to solve digital-type problems with a limited number of inputs and outputs. These problems are not well suited for neural networks and could be solved simply using logic gates. In these applications all possible scenarios are specifically trained.
This situation does not utilize the full power of the neural network. 1.5.1. Embedded Neural Network for Fire Classification Using an Array of Gas Sensors Bashyal et al. at Missouri University of Science and Technology created an embedded neural network for fire classification [66]. This application does not require the system to be able to continuously process data in real-time since once a fire is detected and classified, its work is finished. Even if the calculations required several seconds, this is acceptable for this application. This network is very large relative to the simplicity of the problem to be solved. It has seven inputs and three outputs with the configuration shown in Figure 11. 15 Figure 11: Network used by Bashyal et al. in "Embedded Neural Network for Fire Classification Using an Array of Gas Sensors". Each of the seven inputs is one individual sensor and the three outputs represent the three types of fires to be classified: No Fire, Class A, and Class B. This work appears 16 to questionable due to the data used for training the network. The table containing the training data displays the seven sensor readings for different types of materials burning. Of the seven sensors, however, only one sensor is needed to classify the difference between Class A and Class B fires. The other six inputs become irrelevant. Based on the this data, a neural network of this size is not needed to analyze the data. The sensor in the far right column accurately distinguishes the differences between the items burnt once the temperature level is elevated. The training set here is shown in Table 1. Table 1: Training data used by Bashyal et al. in "Embedded Neural Network for Fire Classification Using an Array of Gas Sensors". The authors labels the six sensor readings TGS but do not number each sensor. Looking at the training data of Bashyal et al., it can be assumed that most of the neurons in the network shown in Figure 11 are not actually contributing to the correct classification of the fires. It seems that the designer is unaware of their system, so it can 17 be very difficult to detect whether the entire network is working or a small number of neurons are carrying the load. The network configuration [66] is inefficient for this dataset because a neural network will train to the simplest distinguishing characteristic; in this case it was the last sensor and the temperature sensor and the other data need not be used. Bashyal et al. also only used integers and not fractional math which is a limiting factor of the work. The authors used Error Back Propagation (EBP) for training, which they admit is another limitation of the system. The authors also mention that the embedded network was manually converted from the computer based network to an embedded network and that it would be optimal to have an automated system for this implementation. This dissertation addresses all unsolved issues of [66]: arbitrary architectures, training algorithms, and completely automated embedded implementation. 1.5.2. Microcontroller Based Neural Network Controlled Low Cost Autonomous Vehicle Farooq et al. from the University of The PunJAb Lahore in Pakistan developed a neural network to control a model remote control car [79]. The neural network had three sensor inputs and four outputs for the motor control. The inputs have only two bits of precision for a sonar sensor. The network is shown in Figure 12 and again this network is much larger than needed for this problem. 
This network should not require as many neurons in the hidden layer. Since the author trained with EBP, more neurons were required to solve the same problem because of the limitations of the first order gradient approach for minimizing the error. If the authors had used arbitrarily connected networks 18 and trained with a second order algorithm, this network could have been greatly reduced in size possibly to a point of completely removing the hidden layer neurons entirely. The authors of [79] are not using the neural network to its full potential as discussed previously because all possible input and output patterns are trained. In this case, the model car can only do one of four possible directions: Back, Forward, Right, or Left. This system could be redesigned so that there are only two output neurons, one for each motor. This way, the car?s output can be analog, based on simple inputs. Ideally, the input sensors should be analog or at least high resolution digital inputs; for example 8-bits instead of two. The network could then be trained with sample patterns like the digital ones used for their example and allow the network to extrapolate the data between points. This would create a smooth turning car opposed to very rigid all-or-nothing movement. 19 Figure 12: Neural network from Farooq's" Microcontroller based Neural Network Controlled Low Cost Autonomous Vehicle". Farooq et al. also used a piecewise linear approximation of tanh for his activation function. A similar activation function can be seen in Figure 6. This activation function is very simple to calculate but it degrades the accuracy of the neural network, especially with more layers because each error is amplified by the next layer. Overall their work did not implement a reasonable application of a neural network that is used to its fullest potential. The outputs are rounded to such a broad 4-bit result that it would not be difficult to achieve with many types of control systems. 20 1.5.3. Control Sensor Linearization Using a Microcontroller-Based Neural Network Dempsey et al. from Bradley University in Peoria, IL created an embedded neural network used for sensor linearization [77]. This application is a much more practical use of neural networks because it utilizes the nonlinear properties of the network. However, it is a very simple MLP network trained with EBP. The author optimized the architecture and eliminated weights that were approximately zero to decrease the number of calculations for the processor. The network is shown in Figure 13. The authors of [77] even specify that the entire network is simplified to five multiplications and three additions. The activation function used is a lookup table so the calculations on the microcontroller are very limited and are acceptable for this specific network and application. However, this work cannot be translated to general applications because the network is customized and simplified for this application. This work is also done on an 8-bit microcontroller, the Intel 87C51, which is slightly more sophisticated than the one used in this dissertation because it contains a hardware divider. Figure 13: Neural network for "Control Sensor Linearization Using a Microcontroller- Based Neural Network" by Dempsey et al. 21 Dempsey et al. discusses how the neural network degrades some of the characteristics of the controller they were using. They attributes these errors to several things. 
First, the errors are rounded to one decimal place which makes the network a crude approximation of a potentially very accurate control device. Secondly, they contribute to the error by using a lookup table for the activation function. The final significant loss of data is the truncation of intermediate calculations of the neural network. All of the errors that Dempsey et al. note in [77] are addressed in this dissertation. The second order tanh activation approximation is more accurate than the lookup table. The use of 16-bit and 32-bit pseudo floating point numbers to prevent truncation at intermediate calculations increase accuracy. Also, the storing of 16-bit pseudo floating point weights drastically reduces the round-off error created by their design. This will come at a cost of slightly more calculations, but in a very similar calculation time based on a slightly faster clock speed. 1.5.4. A Solar-powered Battery Charger with Neural Network Maximum Power Point Tracking Implemented on a Low-Cost PIC-microcontroller Petchjatuporn et al. from the University of Technology in Thailand used a small neural network to control the power point on a solar charger[64]. This is a great application of an embedded neural network because it is a simple but very nonlinear control problem. However, the author?s approach can be improved in a few ways. First, the activation function is a piecewise function with two saturation points at the limits but 22 then in the "linear" region, as the author describes, the output is simply the input. The activation function does not require calculations between 0 and 1. This heavily limits the neural network because the network gets its nonlinearity from the activation function, and, in this case, that nonlinearity is removed. The next problem with this network is the architecture and training of the network. The network can be seen in Figure 14. Figure 14: Neural Network implemented in Petchjatuporn's work on "A Solar-powered Battery Charger with Neural Network Maximum Power Point Tracking Implemented on a Low-Cost PIC-microntroller". The input neuron really is not a neuron at all; it is a specific power calculation based on duty cycle shown in Equation 4. (4) The weights for the entire network are designed and actually remove the nonlinear properties of the network. In the first layer, they are +1 and -1. In the second layer, the weights are chosen to make the middle layer linear between -1 and +1 without the use of an activation function. This network could be more powerful if it were trained. The purpose of neural networks is to use nonlinearities to the designer?s advantage and train them to solve the application needed. In this case, the author simply linearized the 23 network and designed it to do what was desired. However, in doing so, this defeats the purpose of using a neural network. This application could also benefit from using an arbitrarily connected network and removing at least one neuron if not more while obtaining better results. The application warrants a neural network but the implementation used prevents the network from operating at its full potential. 1.6. Current Research Summary The current research on embedded neural networks in low end microcontrollers does not fully utilize the neural networks to their potential. Most of the work modifies the neural networks to simplify calculations to make them easier to implement, or simplifies the data to make their training work easier. 
The networks are all converted manually from a PC based network to the embedded network. None of the current work uses arbitrarily connected neural networks but instead only MLP networks. All of the previously published networks that are trained are trained using EBP which greatly limits the ability to train the networks. These neural network implementations used integer math or single digit fixed point math to achieve their results. This dissertation addresses most of these shortcomings of the listed work. It first offers a wide variety of training algorithms for training neural networks, including first and second order algorithms. An automated system was created to convert the newly trained network to an embedded network. The embedded networks are all configured using any feed-forward architecture. The embedded neural network also uses a pseudo floating point algorithm for exceptional accuracy on a limited system. In addition to the 24 pseudo floating point mathematics is a second order approximation of tanh. This is a very accurate and nonlinear approximation to maintain the integrity of the nonlinear neural network. In addition to creating this specialized neural network for the PIC 18F series microcontroller, the software also generates a C version of the network that can be easily transferred to any microcontroller with a C compiler. This will only require minor changes to such things as headers and initial addressing. 25 2. Neural Network Training The Neural Network Trainer (NNT) was originally developed as a tool for training neural networks for use on a PC or comparable computing machine. NNT originally produced for the user an array of weights that corresponded to the weights in a neural network architecture designed by that user. From this point, it is was the user's responsibility to create a neural network that could utilize these weights [80]. This dissertation transforms this original tool into a complete neural network implementation package for microcontrollers. This software package includes the trainer, an assembly language based generic neural network for the PIC 18 series microcontroller, 8-bit neural network simulator, a microcontroller communication interface for testing embedded neural networks, and a C implemented neural network for any microcontroller with a C compiler. These features work together to create a total package that can be used not only to implement a neural network for one case but a very wide variety of neural network applications. This software is the only published embedded neural network software that uses arbitrarily connected neural networks. In addition, this software allows the user to create, train, test, and implement a neural network on a microcontroller for his purpose in an automated process. The tools, steps, and details of the process will be discussed in the sections following. 26 2.1. Neural Network Trainer The user interface for the Neural Network Trainer is shown in Figure 15. The user will first notice there is an empty plot on the left side of the trainer where the iterations versus means squared error are displayed as well as training parameters on the right hand side. The user must follow a few simple steps before training a network. He must prepare an input file that contains the training data and an architecture file that describes the network connections, and then set the training parameters. Figure 15: Front end of Neural Network Trainer (NNT) 27 2.1.1. 
Training Data

The user must create a training file with all the data sets required to train the neural network. This data may be created in various ways, such as by hand, by spreadsheet, or directly through Matlab. A simple parity-3 problem will be used for demonstration purposes. This demonstration uses bipolar neurons, so the extremes for the data are +1 and -1. The training data for parity-3 is represented by the following matrix:

In1 In2 In3 Out1
-1  -1  -1  -1
-1  -1   1   1
-1   1  -1   1
-1   1   1  -1
 1  -1  -1   1
 1  -1   1  -1
 1   1  -1  -1
 1   1   1   1

As with any parity-N problem there are 2^N possible outcomes, eight in this case. As the top row indicates, the first three columns are the inputs and the last column is the output for that row. The top row of the matrix is shown for demonstration purposes only and is not needed in the actual data file. This data is then copied to a text file and saved with the file extension .dat. Delimiters other than white space are not required. Once the data file is finished it can be referenced by numerous architecture input files.

2.1.2. Input File

The input file contains the network architecture, neuron models, data file reference, and optional initial weights. Each input file is unique to a specific architecture but not necessarily to each data set. In other words, the same data set can be used for several different architectures simply by creating a new input file. The input file contains three sections: the architecture, the model parameters, and the data file definition. The following is an example of an input file for the parity-3 problem discussed in the previous section.

\\ Parity-3 input file (parity3.in)
n 4 mbip 1 2 3
n 5 mbip 1 2 3
n 6 mbip 1 2 3 4 5
W 5.17 20.08 -10.01 -4.23
W 1.0 10.81 2.20 19.84
.model mbip fun=bip, der=0.01
.model mu fun=uni, der=0.01
.model mlin fun=lin, der=0.05
datafile=parity3.dat

The first line is a comment. Either a double backslash, as in C, or a percent sign, as in Matlab, is acceptable as a comment delimiter. After the comment comes the network architecture for a 3-neuron fully-connected network as shown in Figure 16.

Figure 16: Three neuron architecture for parity-3 problem.

The neurons are listed in a netlist type of layout that is very similar to a SPICE program. This way of listing the layout is node based. The first nodes are reserved for the input nodes. The first character of the line is an n to signify that this line describes a neuron. The n is followed by the neuron output node number. Looking at Figure 16, the first neuron is neuron 4 because 4 is the first available number after the three inputs, and it is connected to nodes 1, 2, and 3, which are inputs. The same is true for neuron 5, the second neuron, which is also connected to all three inputs. The output neuron is slightly different but follows the same concept: it is connected to all three inputs as well as to the outputs of the first two neurons. Based on this it should be straightforward to see the connection between the input file listed above and Figure 16. Also listed on the line of each neuron is the model of the neuron, which allows the user to specify a unique model for each neuron. This network is designed to solve a parity-3 problem using three bipolar neurons. This is not the minimal architecture for this problem, but it serves as a good demonstration of the tool. Following the architecture of the network are the optional starting weights. If no starting weights are given the trainer will choose random weights.
The weights need to be listed in the same format as the architecture. Each line of weights starts with the capital letter W. The biasing weight goes in place of the output node of the neuron. In other words, the first weight listed for a particular neuron is the biasing weight, followed by the remaining input weights in their respective order. See the input file for an example. The user specifies a model for each neuron, and these models are defined on a single line. The user has the ability to specify the activation function and neuron type (unipolar, bipolar, or linear) for each model. The user may include neurons with different activation functions in the same network. The final line of the input file includes a reference to the data file. This line simply needs to read datafile followed by the file name. In this example it is parity3.dat, which can be seen on the last line of the example input file.

2.1.3. Training Parameters

Once the network architecture has been decided and the input files created, the next step is to select the training algorithm and parameters. When NNT is loaded there is an orange panel full of adjustable parameters on the right side of the window. These parameters change for each algorithm, so they will be addressed accordingly in the following section. There are several independent algorithms that can be used for training neural networks. The algorithms themselves are explained in more detail in [80], but first the user should select the input file just created.

Implemented Algorithms

The algorithm is chosen from the pull-down menu in the training parameters. Four of the parameters are the same for all algorithms: Print Scale, Max. Iterations, Max. Error, and Gain. The Print Scale refers to how often the mean squared error is printed to the Matlab command window. This can be important because in certain situations the longest calculation time is that of displaying the data, so increasing this number can significantly decrease training time. Max. Iterations is the number of times the algorithm will attempt to solve the problem before it is considered a failure. An iteration is defined as one adjustment of the weights, which includes calculating the error for every training pattern and adjusting at the end. The Max. Error is the mean squared error that the user considers to be an acceptable value. When this number is reached the algorithm stops calculating and displays the final weights.

Error Back Propagation (EBP)

This algorithm is the traditional EBP with the ability to handle fully connected neural networks. The Alpha parameter is the learning constant. This value is a multiplier that acts as the numerical value of the step size in the direction of the gradient. If alpha is too large the algorithm can oscillate instead of reducing the error. However, if alpha is too small the algorithm can move toward the solution too slowly and prematurely level off. This parameter should be adjusted by the user until an optimal value is found, one with some oscillation that diminishes while the error continues to decrease.

Neuron By Neuron (NBN)

NBN is a modified Levenberg-Marquardt (LM) algorithm [81] for arbitrarily connected neural networks. The NBN algorithm is briefly described in [58]. It has two training parameters, μ and Scale. The learning parameter of the LM algorithm is μ. Its use can be seen in Equation 5, where w describes the weights, J is the Jacobian matrix, I is the identity matrix, and e is the error vector.
w(k+1) = w(k) - (J^T J + μI)^(-1) J^T e   (5)

If μ = 0, the algorithm becomes the Gauss-Newton method. For very large values of μ, the algorithm becomes the steepest descent method, or EBP. The parameter μ is automatically adjusted at each iteration to ensure convergence. The amount by which it is adjusted each time is Scale, which is the last parameter for the NBN algorithm.

Self Aware (SA)

The SA algorithm is a modification of NBN. It evaluates the progression of the algorithm's training and determines if the algorithm is failing to converge. If the algorithm begins to fail, the weights are reset and another trial is attempted. In this situation the program displays its progress to the user as a dotted red line on the display and begins again. The algorithm continues to attempt to solve the problem until either it is successful or the user cancels the process. The SA algorithm uses the same training parameters as NBN.

Enhanced Self Aware algorithm (ESA)

ESA is also a modification of the NBN algorithm and is used to increase the chances of convergence. The modification was made to the Jacobian matrix in order to allow the algorithm to be much more successful in solving very difficult problems with deep local minima. The algorithm is also aware of its current solving status and will reset when necessary. The ESA algorithm uses a fixed value of 10 for the Scale parameter and allows the user to adjust the LM parameter. The LM parameter is essentially a scale factor applied to the Jacobian matrix before it is used in calculating the weight adjustment. This scale factor is typically a positive number between 1 and 10, or possibly greater. The more local minima the problem has, the larger the LM factor should be.

Forward-Enhanced Self Aware (F-ESA)

F-ESA is another modification of the NBN algorithm, developed by J. Hewlett [57], in which an alternative method for calculating the Jacobian matrix is used. The calculation of the Jacobian is unique in the sense that only feed-forward calculations are needed. This approach is then paired with the Enhanced Self Aware LM algorithm. The F-ESA algorithm uses the same training parameters as the ESA algorithm.

Evolutionary Gradient

Evolutionary Gradient is a newly developed algorithm which evaluates gradients from randomly generated weight sets and uses the gradient information to generate a new population of weights. This is a hybrid algorithm which combines the use of random populations with an approximated gradient approach. Like standard methods of evolutionary computation, the algorithm is better suited for avoiding local minima than common gradient methods such as EBP. What sets the method apart is the use of an approximated gradient which is calculated with each population. By generating successive populations in the gradient direction, the algorithm is able to converge much faster than other forms of evolutionary computation. This combination of gradient and evolutionary methods essentially offers the best of both worlds. The training parameters are very different from those of the LM-based algorithms previously discussed. They include Alpha, Beta, Min. Radius, Max Radius, and Population. This algorithm was written by Joel Hewlett, and details regarding these parameters may be found in [82].

Training

Once the user selects the appropriate training algorithm, the parameter boxes will change to the corresponding parameters and default values will fill the boxes. After the user sets the parameters, there are two other boxes that can be selected.
The Clear Plot box, when checked, will overwrite any existing plot with the new one; if it is left unchecked, subsequent plots will be drawn on the same axis with all of the previous drawings. The last option is the external plot, which draws the plots both inside NNT and in a separate figure, allowing for easy printing or modification of the plot. The Train button begins the training process, which prints the error to the Matlab Command Window as it is training. At any time the process can be halted and the results plotted by pressing the Cancel button.

2.2. NNT Adaptations

The trainer was adapted to aid in the process of creating neural networks on the embedded level. NNT trains the neural network as it would any neural network, and then the embedded network verification begins. The trainer then makes a forward calculation on the network using the 8-bit neural network simulator, which will be described in more detail in Section 2.2.2. It essentially does all of the arithmetic that the neural network would do for one pattern. At every step of the way it rounds all of the digits in the same way the 8-bit microcontroller does. This calculation is performed as a sanity check and debugging step for the system. Step-by-step results from beginning to end of the network calculation are stored in hex and decimal format in an organized text file for the user. After the training process, the trainer generates the weights, the architecture, and other parameters into assembly and C files for microcontroller implementation. These files can be directly copied and pasted into the microcontroller IDE and then immediately assembled or compiled, respectively. These files will be discussed in Section 2.2.3. The trained and verified network can then be further tested on the embedded level using the neural network communication software. This software is used to communicate via a serial port with the microcontroller. This allows the user to simulate data that the neural network would receive from an external source, like an analog to digital converter. The microcontroller then performs the network forward calculation and sends the data back through the serial port for verification, simulating a network output such as a value for a pulse width modulation module. At this step the user can test as many test patterns as necessary to validate proper performance with hardware-in-the-loop simulation. This software's features will be discussed in Section 2.2.2.

2.2.1. Neural Network Weight Scaling

The assembly language version of the neural network implementation uses a custom pseudo floating point algorithm. This algorithm requires a weight scaling process to be performed off chip. This process allows the largest number of significant digits to be used for each neuron: it scales all the weights for a particular neuron to use the maximum number of digits possible. The scale factor is then saved as an attribute of the particular neuron. This process is completed as another automated step before generating the assembly file. The details of the scaling process are shown in the following code.

scale = ones(1,nn);                 % one scale factor per neuron
for i = 1:nn
    w = ww(iw(i):iw(i+1)-1);        % weights for neuron i
    max_value = max(w);
    if max_value >= 128
        while max(w) > 127          % scale down until the largest weight fits
            w = w/2;
            scale(i) = scale(i)/2;
        end
    else
        while max(w) < 63.5         % scale up to use as many bits as possible
            w = w*2;
            scale(i) = scale(i)*2;
        end
    end
end

The previous code shows that the largest weight is scaled to be as close to, but not exceeding, 127, which is the largest positive number that can be represented using this protocol.
As a consequence of the scaling, the largest weight uses almost all of the 16 bits of the mantissa. These scaled weights are then the weights used for generating the assembly file.

2.2.2. 8-Bit Neural Network Simulator

The simulator is written in Matlab to operate in the same fashion as an 8-bit microcontroller. The simulator introduces rounding errors in the appropriate places so that it behaves in the same manner as the microcontroller. This was accomplished by creating a set of functions that operate identically to the PIC microcontroller. These include functions that round, multiply, add, subtract, and perform the tanh approximation, all using the pseudo floating point arithmetic. There are also special functions to detect any overflows. One example, the rnd8bit function, takes any decimal number and rounds it to 8 bits of fractional data. The Matlab code can be seen below.

function y = rnd8bit(x)
% Truncate x to 8 bits of fractional data (1/256 resolution), mimicking
% the rounding performed by the 8-bit microcontroller.
y = fix(256*x)/256;
return

The rnd8bit function operates by shifting the fractional portion of the number into the integer portion by 8 bits, truncating the remaining fractional part, and then shifting the bits back into place. This step is done any time a Matlab command is used that could possibly generate more decimal digits. Another example of how the simulator works is a routine set up to multiply in the same fashion as the microcontroller. This function is called mul and is shown below.

function p = mul(x,y)
% Emulate the PIC's fixed point multiplication, including the
% two's-complement handling and the truncation of intermediate results.
x1 = x;
y1 = y;
if x >= 128 || y >= 128
    disp(x); disp(y);
    error('x=%d and y=%d Overflow!!!', x, y);
end
if x < 0 && x > -128
    x1 = 256 - abs(x);          % two's-complement representation of x
end
if y < 0 && y > -128
    y1 = 256 - abs(y);          % two's-complement representation of y
end
x1 = rnd8bit(x1);
y1 = rnd8bit(y1);
a = floor(x1);                  % whole number portion of x
b = (x1 - a)*256;               % fractional portion of x
c = floor(y1);                  % whole number portion of y
d = (y1 - c)*256;               % fractional portion of y
% Combine the four 8-bit partial products as the hardware multiplier does.
p = (a*c*256^2 + 256*(a*d + b*c) + b*d)/256^2;
p = floor(p*256^2)/256^2;       % truncate the intermediate result
if x < 0
    p = p - c*256 - d;          % correct the product for a negative x
end
if y < 0
    p = p - a*256 - b;          % correct the product for a negative y
end
if p < -16384
    p = p + 65536;              % wrap back into the two's-complement range
end
p = floor(256^2*p)/256^2;       % final truncation to 16 fractional bits
return

This function first checks that neither of the two parameters to be multiplied is outside the range of valid numbers. This check is redundant and probably not necessary, but the extra layer of protection against overflow errors was desired. If either number is outside the range, the function displays the values and halts the process. Next, the 8-bit multiplication is completed in the same manner as on the microcontroller. This process operates only on positive numbers and then converts a negative result back to two's-complement at the end. The final and intermediate results are rounded so that error is introduced exactly as it is on the microcontroller. Several other custom functions were necessary for the simulator as well as for creating the assembly and C files. Matlab does not have a built-in way of converting fractional decimal numbers to the fixed point hexadecimal format used here. In order to work with both hex and base ten numbers, functions were required to convert between them. One of these functions is the frac2hex function, which can be seen below.
function out=frac2hex(x)
% Convert a decimal number to a four-character hex string (16-bit value).
if x==0; out='0000'; return; end
if x>0
    conv=Fr_dec2bin(x);              % helper that returns a binary string with a '.' separator
    conv=num2str(conv);
    [whole,frac]=strtok(conv,'.');
    whole=dec2hex(bin2dec(whole));
    frac=strcat(frac,'00000000');
    frac=dec2hex(bin2dec(frac(2:9)));
else
    x=abs(x);
    x=256-x;                         % two's-complement form for negative numbers
    conv=Fr_dec2bin(x);
    conv=num2str(conv);
    [whole,frac]=strtok(conv,'.');
    whole=dec2hex(bin2dec(whole));
    frac=strcat(frac,'00000000');
    frac=dec2hex(bin2dec(frac(2:9)));
end
if size(whole,2)==1
    whole(2)=whole;
    whole(1)='0';
end
if size(frac,2)==1
    frac(2)=frac;
    frac(1)='0';
end
out=strcat(whole,frac);

This process requires several steps because the input is a decimal number that is first split into an integer part and a fractional part. The two pieces are then operated on separately using some of Matlab's built-in functions; it was effective to convert to binary first and then from binary to hex. The function has to take into account positive as well as negative numbers. The output is a four-character string of hexadecimal digits representing the 16-bit number. These custom functions are just a few of many required to make the simulator function properly.

These functions were written and verified before the assembly code for the microcontroller was started. The assembly code was then written step by step to follow each process of the Matlab code, and the two implementations were tested and debugged until their results matched for all test cases. This was the only feasible way to construct a system of this size; the assembly code is approximately 1500 lines of code, or 30 pages. The code had to be written in sections that could be tested individually, so that each piece could be verified independently of the rest of the system.

2.2.3. Generated Files

The trainer was adapted to automatically generate an assembly and a C file for the user to use when implementing the system. This was motivated by two main factors: automating the process removes the chance of human error when converting the weights to hexadecimal numbers, and the conversion simply takes too long to do by hand. An example of the output file for the network previously discussed in Figure 16 can be seen below.

;IO data: #Inputs, #Outputs
IO       Data 0x0003, 0x0001
Weights  Data 0x000B
         Data 0x000A, 0x9E6A, 0x6196, 0x9E6A, 0xFFEF, 0x9F4B, 0x60B5, 0x9F4B, 0x001A, 0xC969, 0x5B66
;Inputs  Data 0x03, 0x4000, 0xC000, 0xC000, 0x06,
;The output for this particular input should be 1 and 0x0100
Neurons  Data 0x03
         Data 0x05, 0x03, 0x04, 0x01, 0x02, 0x03
         Data 0x08, 0x03, 0x05, 0x01, 0x02, 0x03
         Data 0x03, 0x02, 0x06, 0x04, 0x05

The assembly file is made up of three parts. The first part contains a few general network parameters and the scaled weights. The first line is a comment to clarify for the user what is shown. The second line is the number of inputs followed by the number of outputs for the network. The next line is the number of weights, followed by the weights themselves. The weights are stored in their scaled form; that is, all the weights for a particular neuron are scaled up or down appropriately to use the maximum number of bits possible, and the scale factor is stored in the topography section for each neuron. The next two lines are comments and serve simply as a sanity check for the user: they show random inputs to the system and the correct output for those particular inputs. This way the user always has a valid network input and output pair available.
The last section of the file is the neural network architecture, which is read from program memory by the system as needed. These values act as indexes indicating which location in the architecture array to use next. This file is specifically designed to be placed at the end of the microcontroller code and is automatically read using indirect addressing. The file changes with the network being used, but it always follows this basic format, and it is all that is needed to change neural network architectures or weights. A similar file is generated in C and can be seen below.

rom far float const ww[11] = {
    5.0967435373381891e+000,  4.1913570939474143e+000, -3.3389689751411735e+000,
    3.3500760230887412e+000, -1.3919747593993528e+000, -3.1527033887270779e+000,
    3.0337819562494110e+000, -3.1133953328418693e+000,  1.7645099960474910e+000,
   -3.2695061817728570e+000, -2.0549280802763907e+000 };
unsigned char const ni = 3;
unsigned char const topo[14] = { 3, 4, 1, 2, 3, 3, 5, 1, 2, 3, 2, 6, 4, 5 };
float nodes[7];
unsigned char nn = 3;

The C file contains data very similar to that of the assembly file, including the network inputs, outputs, weights, and architecture. The C file goes at the top of the code and serves as the initialization for the neural network: it declares the arrays and sets all of the indexes so that everything can be read as needed. This C code is mostly platform independent except for the very first line. Depending on the amount of available RAM, the weights will most likely need to be stored in program memory and accessed one at a time, and this memory initialization line may vary from platform to platform.

2.3. PIC Simulator Software (PicSim)

The simulator and verification tool was just as important to this overall project as the microcontroller implementation itself. The simulator engine was described previously; this section discusses how that engine is interfaced with the user as well as with the microcontroller. The PicSim software was written before the assembly version to verify that it was possible to use the 8-bit math and obtain usable results. The software user interface is shown in Figure 17.

Figure 17: PicSim software for simulating and verifying embedded neural networks.

The software is designed to plot a two-input system with one or more outputs. PicSim has two main input requirements: a network architecture and a weight array text file. To simplify the process for the user, PicSim reads the same input file used by NNT for the training process and the weight file generated by NNT; the user simply has to point the software to these files. There are four graphs displayed in the user interface. The figure in the top left corner is always the network being used as the reference, and the top right is the network being calculated. The bottom two figures are the differences between the top two surfaces: the one on the left is on the same scale as the top two, and the one on the right uses a tighter axis to show the specific location of the error.

The user has a few other options as far as what type of network to simulate. The first option is the ideal neural network, which is a neural network running on a PC using standard IEEE 754 floating point precision. This allows the user to compare the quality of the trained network to that of the training patterns before any error from the microcontroller is introduced. The user can simulate the error produced by the microcontroller by selecting the simulation button.
This then compares the simulated network to the ideal network. The simulator engine discussed previously uses a configurable number of patterns for testing. At this point any possible overflows or other errors should be caught, before hardware is introduced. The last option is the hardware-in-the-loop setup. This option is the final stage of testing for the embedded neural network: it allows the user to program the microcontroller and test it while still using the features of Matlab for verifying the data.

Matlab still produces the test patterns and then sends them via the serial port to the microcontroller. This simulates data the microcontroller would receive from another source, such as an analog-to-digital converter. Once the embedded network has all of the inputs it needs, it performs the neural network forward calculations and produces one or more outputs. These outputs would typically drive an external device such as a pulse width modulator, but in this instance they are transmitted back to the PC via the serial port. This allows the user to send and receive data from the microcontroller in real time. In addition, the user can verify the hardware calculations and the amount of time required, and the data can easily be checked using Matlab's graphing tools. This mode can be used for embedded networks written in assembly or in C; the difference is the format of the test data being sent and received, but both operate under the same principle. This is the final step before the microcontroller is configured to operate in its embedded application with real inputs and outputs, and at this point the neural network operation has been thoroughly verified.

The tools created to build the neural network on the microcontroller were a project just as challenging as the embedded network itself. However, creating and debugging the assembly version of the neural network would never have been possible without these tools. Now, with the automated system, almost any trained network can be implemented on the microcontroller in a matter of seconds.

3. Hardware Implementation

Implementing neural networks on an 8-bit microcontroller with limited computing power presents several programming challenges. In order for the network to perform as quickly as possible, the software was written at the assembly level; writing the software in assembly allows a level of customization that cannot be achieved with C. However, the need for hardware portability was also a motivating factor, and a more generic C implementation was created as well. It was also very important to manually manage the very limited amount of data memory, and several assembly routines were created with this purpose in mind. A pseudo floating point arithmetic protocol was created exclusively for neural network calculations, along with a multiplication routine for multiplying large numbers. A tanh-compatible activation function was also needed. The final procedure is capable of implementing any neural network architecture on a single operating platform. This robust base removes the need to modify the structure of the software to make network architecture changes.

3.1. Pseudo Floating Point

The first design decision was to use 16 bits to represent the weights, nodes, and inputs of the neural network. All 16 bits are significant digits in this pseudo floating point protocol; they consist of an 8-bit signed integer part and an 8-bit fractional part. The nonconventional part of this floating point routine is the way the exponent and mantissa are stored.
Essentially, all sixteen bits are the mantissa, and the exponent for the neuron is stored elsewhere. This has several advantages: it allows more significant digits for every weight while using less memory, and it tailors the protocol directly to the needs of the neural network forward calculations. This solution requires analyzing the weights of each neuron, scaling them accordingly, and assigning a single exponent to the entire neuron. A similar process is used for the inputs, so the entire input range shares a single scale factor. This scaling is done off chip, before programming, in order to save valuable processing time on each and every forward calculation.

Scaling does two things: first, it prevents overflow by keeping the numbers within operating regions; second, it automatically filters out inactive weights. For example, if a neuron has weights that are several orders of magnitude larger than the others, the smallest weights are automatically rounded to zero. These zero weights allow the calculations to be optimized, unlike traditional floating point arithmetic. However, if all of the weights are of similar magnitude, they are all scaled to values that preserve the maximum precision and number of significant digits. In other words, the weights are stored in a manner that minimizes error on a system with limited accuracy. All of these decisions for scaling the weights are made before the network is programmed on the microcontroller, and the process has been automated for ease of use: the Neural Network Trainer [8] was modified to automatically scale the weights and inputs after it trains the network. This is done in Matlab, and an example of the scaling process was shown in Section 2.2.1.

3.2. Multiplication

The PIC18F45J10 microcontroller has an 8-bit by 8-bit unsigned hardware multiplier. Considering that the hardware multiplier cannot handle floating point values or negative numbers, a routine was needed to allow fast multiplication of fractional values. The multiply routine is passed two 16-bit numbers, each consisting of an 8-bit integer part and an 8-bit fractional part, and it returns a 32-bit fixed point product, as shown in Figure 18.

Figure 18: Implementation of 16-bit fixed point multiplication using the 8-bit hardware multiplier. Steps 1-4 are summed with place holders to give the final product on the result line. Abbreviations: Integer (I), Fractional (F), Product (P), Lower-Byte (L), Higher-Byte (H).

Equation 6 shows the method of using a single 8-bit multiplier to implement 16-bit fixed point multiplication. The hardware multiplier performs the multiplication between bytes in a single instruction. The 16-bit product of I1 and I2 is placed in IPH and IPL; see Figure 18. Next, I1 and F2 are multiplied; the lower byte of the product is placed in FPH and the higher byte is added to IPL. F1 and I2 are then multiplied and the product is added to IPL and FPH in the same way. Finally, F1 and F2 are multiplied and the 16-bit product is summed with the current contents of FPH and FPL. At each step of this process, whenever any of these 8-bit values are summed, the carry bit must be added to the next most significant byte to maintain accuracy. This method does not require any shifts or division.
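To make the byte-level accumulation concrete, below is a minimal C sketch of the same idea for unsigned operands. The function name mul_8p8 and the use of 32-bit C arithmetic are illustrative assumptions; the shifts here only model placing each 8-bit by 8-bit partial product at the correct byte offset, which on the PIC is achieved simply by adding the product bytes into the appropriate result registers with carries, and the two's-complement sign corrections performed by the real routine are omitted.

#include <stdint.h>

/* Sketch of 8.8 x 8.8 fixed point multiplication built from 8-bit products.
 * High byte = integer part, low byte = fraction; result is 16.16 fixed point. */
uint32_t mul_8p8(uint16_t a, uint16_t b)
{
    uint8_t I1 = a >> 8, F1 = a & 0xFF;   /* integer and fraction of operand 1 */
    uint8_t I2 = b >> 8, F2 = b & 0xFF;   /* integer and fraction of operand 2 */

    uint32_t result = 0;
    result += (uint32_t)(I1 * I2) << 16;  /* step 1: lands in IPH:IPL          */
    result += (uint32_t)(I1 * F2) << 8;   /* step 2: spans IPL:FPH             */
    result += (uint32_t)(F1 * I2) << 8;   /* step 3: spans IPL:FPH             */
    result += (uint32_t)(F1 * F2);        /* step 4: lands in FPH:FPL          */
    return result;                        /* 32-bit product IPH.IPL.FPH.FPL    */
}

For example, multiplying 2.5 (0x0280) by 1.25 (0x0140) returns 0x00032000, which is 3.125 in 16.16 fixed point.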
This simple process allows each neuron to quickly multiply the weights by the inputs and then use the 32-bit result as an accumulator for all inputs of the neuron. Once the net value is calculated, only IPL and FPH are required for the activation function. If IPH is not zero or all ones (all ones signifying a negative number), then the neuron is in saturation and the activation function immediately outputs a one or a negative one.

$\left(A + \frac{B}{256}\right)\left(C + \frac{D}{256}\right) = \frac{A\,C \cdot 256^{2} + (A\,D + B\,C)\cdot 256 + B\,D}{256^{2}}$   (6)

where the two 16-bit operands have integer bytes $A$, $C$ and fractional bytes $B$, $D$.

3.3. Addition and Subtraction

Another issue arises when two numbers with different exponents need to be added: they must first be converted to a common scale. This, however, is not necessary when using the proposed pseudo floating point protocol. The only summation that is required is for the calculation of the net value of each neuron. Even though there are two independent exponents, these values do not change within one neuron, therefore allowing all of the products to be summed in a single step. After the summation process is complete, the reverse scaling can be done at the final stage. The details of this process are discussed further in Section 3.6.

3.4. Activation Function

A soft activation function was needed for the neural network. The most common activation function is tanh, whose definition was shown in Equation 1. The exact definition of tanh was not a reasonable solution for several reasons. Specifically, the exponentials would be very difficult to calculate accurately and quickly with the limited hardware, and the floating point division would have been far too time consuming without dedicated divide hardware. The next possible choice for an activation function was Elliott's function, shown in Equation 3. This activation function was also rejected. The Elliott function does approach one hyperbolically, but not at the same rate as tanh, and therefore the two are not interchangeable; networks using the Elliott function are less powerful than those using tanh. This means the networks would have to be trained using the Elliott function, which was not desirable. The other pitfall of the Elliott function is that it requires division, and without dedicated hardware, division would be too slow a process for the final solution.

A second order approximation of tanh was therefore chosen for its accuracy as well as its simple arithmetic. Several features were added to the activation function besides simply calculating a second order approximation of tanh. One of these features analyzes the input to the activation function and converts negative numbers to positive numbers, which makes the internal calculations faster and reduces the number of values that must be stored in the lookup table; the sign is restored at the end of the activation function. Another feature is a check to see whether the neuron is in saturation, in other words, whether the net value is outside a given range. In that case the second order approximation is skipped and the neuron output is set to the saturated value. These features of the second order approximation can be seen in more detail in Figure 19.
Figure 19: Logical block diagram of the activation function.

The routine requires that 30 values be stored in program memory. This is not simply a lookup table for tanh, because a much more precise value is required. The tanh values of 25 numbers between zero and four are stored; these numbers, which are the end points of the linear approximation segments, are rounded off to 16 bits of accuracy. Then a point between each pair of end points is stored. These points are the peaks of the second-order polynomials that cross at the same points as the linear approximations. Based on the four most significant bits of the number that is input into the activation function, a linear approximation of tangent hyperbolic is selected; the remaining bits of the number are used in the second-order polynomial. The coefficients for this polynomial were indexed by the integer value in the first step. The approximation of tanh is calculated by reading the values of $y_A$, $y_B$, and $\Delta y$ from memory; the linear approximation is first calculated using $y_A$ and $y_B$:

$y_1(x) = y_A + \frac{(y_B - y_A)\,x}{\Delta x}$   (7)

The next step is the second-order function that corrects most of the error introduced by the linearization of the tangent hyperbolic function:

$y_2(x) = y_1(x) + \frac{\Delta y \left(x\,\Delta x - x^{2}\right)}{\Delta x^{2}}$   (8)

or

$y_2(x) = y_1(x) + \frac{\Delta y\,x\left(\Delta x - x\right)}{\Delta x^{2}}$   (9)

In order to utilize the 8-bit hardware multiplication, the size of $\Delta x$ was selected as 128. This way the division operation in both equations can be replaced by a right shift. Calculation of $y_1$ requires one subtraction, one 8-bit multiplication, one shift right by 7 bits, and one addition. Calculation of $y_2$ requires one 8-bit subtraction, two 8-bit multiplications, and a shift right by 14 bits.

Figure 20: Example of linear approximations (red) and parabolas between 0 and 4 (magenta). Tanh (green) and the approximation (blue) are also shown on the graph. Only 4 divisions were used for demonstration purposes.

Ideally this activation function would work without any modification, but when the neurons are operating in the linear region (when the net values are between -1 and 1) the activation function does not make full use of the available bits for calculating the outputs, which generates significant error. Similarly to the weights and the inputs, a work-around based on the pseudo floating point arithmetic is used for the activation function. When the numbers are stored in the lookup table they are scaled by 32, because the largest number stored is 4. The net value is also scaled by 32, and if its magnitude is greater than 4 the activation function is skipped and a 1 or -1 is output. After multiplying two numbers that have been scaled, the product is shifted to remove the square of the scale. Once the activation function is finished, the numbers are scaled back to the same factor that was used to scale the inputs.
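The following C sketch illustrates the structure of this approximation in ordinary floating point, so the two steps of Equations 7-9 are easy to follow. The segment count, the names build_tables, tanh_approx, yEnd, and dy, the float arithmetic, and the normalization of the stored correction term (the factor of four folded into dy) are assumptions made for illustration; the PIC version performs the same steps in scaled integer arithmetic, with the divisions replaced by shifts because the segment width is a power of two.

#include <math.h>

/* Piecewise second-order approximation of tanh on [0, 4], sign folded. */
#define SEGMENTS  16
#define SEG_WIDTH (4.0f / SEGMENTS)          /* plays the role of Delta x   */

static float yEnd[SEGMENTS + 1];             /* tanh at segment end points  */
static float dy[SEGMENTS];                   /* per-segment correction term */

static void build_tables(void)
{
    for (int i = 0; i <= SEGMENTS; i++)
        yEnd[i] = tanhf(i * SEG_WIDTH);
    for (int i = 0; i < SEGMENTS; i++) {
        /* deviation of tanh from the chord at the segment midpoint; the
         * factor of 4 is folded in so the correction below matches Eq. 9 */
        float chordMid = 0.5f * (yEnd[i] + yEnd[i + 1]);
        dy[i] = 4.0f * (tanhf((i + 0.5f) * SEG_WIDTH) - chordMid);
    }
}

static float tanh_approx(float net)
{
    int neg = net < 0.0f;                         /* fold sign, restore later */
    if (neg) net = -net;
    if (net >= 4.0f) return neg ? -1.0f : 1.0f;   /* saturation               */

    int   i  = (int)(net / SEG_WIDTH);            /* segment from high bits   */
    float x  = net - i * SEG_WIDTH;               /* offset inside segment    */
    float y1 = yEnd[i] + (yEnd[i + 1] - yEnd[i]) * x / SEG_WIDTH;          /* Eq. 7 */
    float y2 = y1 + dy[i] * x * (SEG_WIDTH - x) / (SEG_WIDTH * SEG_WIDTH); /* Eq. 9 */
    return neg ? -y2 : y2;
}

As a usage note, build_tables would be called once before tanh_approx is used; on the microcontroller the equivalent table is simply stored in program memory.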
The activation function was tested in hardware by sending a set of numbers from -5 to +5 through it and comparing the results to the output of the tanh function. The difference between the two sets of numbers can be seen in Figure 21.

Figure 21: Error from the tanh approximation using 16 divisions, from -5 to +5.

3.5. Memory Structures

The Microchip PIC18F45J10 microcontroller was used to implement the neural network. The microcontroller has only one true working register that can be used for holding data, passing data, and ALU calculations. It has 1 kbyte of RAM, and when the neural network has 255 weights this memory is nearly all utilized. The memory is divided into four 256-byte banks. Only one of these banks can be accessed directly without the use of extra addressing instructions; this bank has 128 bytes of general purpose memory and 128 bytes of processor configuration memory. The general purpose memory is used for global and temporary variables in the calculations. The other three banks are used for the weights and the individual nodes of the neural network.

The weights are stored as 16-bit numbers, which consist of an 8-bit integer and an 8-bit fractional part. Two banks are used to store the high and low byte of each weight, which allows 255 weights to be stored; the zero location is not used, for indexing reasons. Figure 22 shows the memory mapping. As the output of the neural network is calculated, the output of each neuron and the inputs need to be stored throughout the entire calculation to allow connections across layers. These node values are also 16-bit values. This poses a problem because there is only one RAM bank left and two banks are needed. The problem is solved by splitting this bank into two smaller banks; the low bank and the high bank hold the low byte and high byte of each node, respectively. Notice that this adds an additional limitation to the neural network size: the network may only have 127 total inputs and nodes. This limitation will most likely not be the dominant factor in many cases; typically the weight limitation would be met prior to approaching the node limit. This memory limitation is only relevant to this microcontroller. The concept could be extended to other microcontrollers or systems with more RAM, which would easily allow for even larger networks with greater numbers of neurons and weights. The C version of the software stores all weights and architecture values in program memory, not in RAM; there simply is not enough RAM for the C version to function if these values are kept there.

Figure 22: Memory allocation table for the PIC18F45J10.

3.6. Neuron By Neuron Computation Process

3.6.1. Forward Calculations

The forward calculation is a unique method compared to most neural network implementations because it uses the Neuron By Neuron method described in [57]. This method requires special modifications because assembly language is used with very limited memory resources. The process is written so that each neuron is calculated individually in a series of nested loops; see Figure 23.
The number of calculations for each loop and the values for each node are all stored in two simple arrays in memory. The assembly language code does not require any modification to change the network's architecture; the only change required is to update these two arrays, which are loaded into program memory. These arrays contain the architecture and the weights of the network and are generated by NNT.

//Weights
Number of Inputs; Number of Outputs; (8-bit)
Number of Weights; (8-bit)
Weight(1), Weight(2), Weight(3) ... Weight(N); (16-bit)

//Architecture (8-bit)
Number of Neurons;
//Neuron 1
Neuron Scale, Number of Inputs, Output Node, Inputs(1-N)
//Neuron 2
Neuron Scale, Number of Inputs, Output Node, Inputs(1-N)
...
//Neuron N
Neuron Scale, Number of Inputs, Output Node, Inputs(1-N)

The arrays are automatically generated by NNT, as described in Section 2.2.3. The forward calculation steps through each node of the network without regard for the complexity of the network. Similar to a netlist in SPICE, the topology array holds the running list of connections and allows the user to make as many cross-layer connections as desired, limited only by the total number of weights.

Figure 23: Block diagram of the neural network forward calculations using the nested loop structure for cross-layer connected networks.

As seen in Figure 23, the network starts with an initialization block that configures the microcontroller by setting up the hardware for inputs and outputs. Next the tables for the network are initialized: the weights are stored in ROM or off chip and are loaded into RAM for faster calculations. Finally, numerous constants are configured, such as scale values and saturated neuron values. After the initialization block, the Main Loop begins. This is an infinite loop that keeps the network sampling new inputs and then starting the forward calculations. With the next input sampled, the network resets its pointers and index values and enters the Network Loop.

The Network Loop is essentially a for loop that executes once for each neuron in the network. It is responsible for the architecture of the network as well as the output of the network: it reads the scale factors and neuron connections and sets the corresponding values for the Neuron Loop. The Neuron Loop then begins with all of its indexes and pointers correctly initialized and simply performs the calculations. This loop is only responsible for calculating the output of a single neuron, without information about the rest of the network. It begins by checking whether the current connection is the bias connection or a standard input connection. Once the net value is calculated, it passes the result to the activation function.
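The listing below is a hypothetical C sketch of this nested-loop forward calculation, written against the generated C arrays shown in Section 2.2.3 (ww, topo, nodes, ni, nn). The per-neuron layout assumed here follows the format above, minus the scale factor that the C version does not need (number of inputs, output node index, then input node indexes), with a bias weight stored first in ww for each neuron; tanhf stands in for the activation function. The actual generated C code and its weight ordering may differ in detail.

#include <math.h>

/* Neuron-by-neuron forward calculation driven by the topology array.
 * Node 0 is unused, matching the indexing convention described above. */
void forward(const float *ww, const unsigned char *topo, float *nodes,
             unsigned char nn)
{
    unsigned char t = 0;   /* position in topo */
    unsigned char w = 0;   /* position in ww   */

    for (unsigned char n = 0; n < nn; n++) {
        unsigned char numInputs = topo[t++];
        unsigned char outNode   = topo[t++];

        float net = ww[w++];                       /* bias weight first      */
        for (unsigned char i = 0; i < numInputs; i++)
            net += ww[w++] * nodes[topo[t++]];     /* weight x connected node */

        nodes[outNode] = tanhf(net);               /* store neuron output    */
    }
}

For the example network, the caller would place the three inputs in nodes[1] through nodes[3], call forward(ww, topo, nodes, nn), and read the network output from nodes[6].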
The individual neuron calculations are presented in more detail in Section 3.6.2, and the activation function details in Section 3.6.3. After the activation function is finished, the Network Loop determines whether all neurons have been calculated. The final step is to remove the scale factor and send the output. The process is then repeated indefinitely.

3.6.2. Individual Neuron Calculations

The neuron calculations go through several steps in order to process the pseudo floating point arithmetic. The first step is the net value calculation, which is shown in Figure 24.

Figure 24: Net value calculation. PF stands for pseudo floating point number; the numbers in brackets refer to the number of bits that represent that particular value.

The inputs are multiplied by the corresponding weights and the result is stored in the 32-bit Net register. This is essentially a multiply-and-accumulate register designed for this particular stage. It is very important to keep all 32 bits at this stage for adding and subtracting; without the 32 bits of precision it would be very easy for an overflow to occur during the summing process that would not be reflected in the final net value. The next stage is to turn the pseudo floating point number into a fixed point number. This process can be seen in Figure 25.

Figure 25: Pre-activation function routine: the transformation from a pseudo floating point number to a fixed point number that the activation function can use.

The next step is to convert the pseudo floating point number into a fixed point number that the activation function can correctly handle. First, the weight scale and input scale are summed. If the two factors exactly cancel then no scaling is needed; if not, the formula shown in the figure is used. Because the scale factors are calculated as powers of two, as described in Section 3.3, raising to the N-th power is always the same as shifting by N bits, which makes the scaling process very fast compared with actually executing multiplication instructions. Next, the sign of the net value is stored and the absolute value of the net is used for the following steps. The net value is then examined and a decision is made: if the net value is too large, then the tanh is approximately saturated and the appropriate output is assigned; however, if the now fixed point number is within the operating range, it is clipped to 16 bits and sent to the activation function. The activation function is detailed in Section 3.4.

4. Application

In order to demonstrate that the microcontroller neural network is performing correctly, several example control problems were tested. Neural networks have the unique ability to solve multi-dimensional problems with many inputs and many outputs; however, these types of problems are not easy to test and verify visually. For this reason the network was tested mainly with two-input, one-output problems in order to plot the output as a function of the input on a three dimensional surface. This is not the only type of problem that can be solved; these problems were chosen simply for demonstration.
A two-input and two-output system is also shown, by graphing the outputs separately, to demonstrate that other types of networks work as well. The process is tested with the microcontroller hardware in the loop. In other words, the sensor data is transmitted via the serial port from Matlab to the microcontroller; the microcontroller then calculates the results and transmits the data via the serial port back to Matlab. The reason for this setup is to isolate the errors in the system to those produced by the microcontroller calculations. In this test system any inaccuracy of the sensors is avoided, and any possibility of errors entering the system from external measurement tools is removed as well.

To demonstrate the quality of the approximation, several figures have been produced. The following examples include some or all of the image types described here:

Training Data -- The training data is the data used to train the neural network. The number of points varies with the application.

Ideal Neural Network -- This refers to a neural network running on a computer or a system using the IEEE floating point standard. The word ideal reflects that, for most practical applications, there is no significant data loss due to the precision of the calculations. However, this is still a neural network approximation of the training data and not an identical representation.

PIC Based Neural Network -- This is the output of the neural network running on the PIC hardware. This approximation will not be identical to the ideal neural network because of the approximations that are made on the microcontroller.

Error Surfaces -- The error surfaces are differences between two of the previously shown surfaces. These surfaces give a visual description of the differences, shown on the same scale as the original surface. This comparison separates the error of using an ideal neural network from the error of running the neural network on the PIC.

Error Surfaces Tight -- These surfaces are the same as the error surfaces except on a much narrower scale, to show what shape the errors have taken. This allows the user to identify problem areas or to confirm that the error is evenly distributed.

Histograms -- The histograms show the errors between different surfaces in a numerical manner, in order to identify the distribution of the errors. The X-axis is the error and the Y-axis is the number of data points within the corresponding error range.

4.1. Simple Surface

The following example is a simple three dimensional control surface. This surface is used in a few examples to demonstrate multiple aspects of implementing the neural network on the microcontroller. The training data can be seen in Figure 26. The surface consists of 16 data points from a smooth surface, and the neural network will learn to produce a better, smoother surface than the data given. This shows one of the fundamental advantages of neural networks as opposed to other control methods: it is not necessary to have perfect training data to obtain very good results, because the neural network inherently approximates the points in between the data points in a very smooth fashion.

Figure 26: Simple surface training data.

This surface can be solved very effectively using a network with four neurons, which is shown in Figure 27. This architecture approximates the surface very well with minimal error. The output of the ideal network and the PIC can be seen in Figure 28 and Figure 29, respectively.
To the naked eye there is no visual difference between the surfaces. Following the surfaces are the tight error surfaces in Figure 30 and Figure 31; these surfaces have a much smaller scale and they show the shapes and offsets of the errors.

Figure 27: Four neuron cascade architecture for solving the simple surface. The inputs are the circles on the left and the output is the last neuron on the right side.

Figure 28: Ideal neural network output.

Figure 29: Output of the PIC.

Figure 30: Error surface showing the difference of the training data and the ideal neural network.

Figure 31: Error surface showing the difference of the PIC output and the ideal neural network.

Figure 32: Error surface showing the difference of the PIC output and the training data.

These images show the error introduced by using the neural network of Figure 27, and they show the variation from the ideal neural network to the network calculated on the PIC. Figure 32 shows that the output of the PIC is a very close approximation of the original training data: the majority of the points are centered around zero and have less than 1% error. This is also verified in the histogram shown in Figure 33.

Figure 33: Histogram of errors between the PIC and the training data. The X-axis is the error and the Y-axis is the number of samples.

The same neural network was also tested with patterns that lie in between the training patterns. A total of 196 test patterns were used to generate the image shown in Figure 34. This shows the neural network's ability to approximate points for which it was never trained; from the figure it is obvious that the network produces a very reasonable nonlinear approximation between the training points. The error surfaces with more points were omitted because they did not show any significant differences from the previous figures.

Figure 34: Output of the PIC with 196 test patterns.

This same surface was tested once again but with a much smaller network, to show that the quality of the surface being produced is dependent on the number of neurons. The ideal neural network is not as close an approximation of the training surface, but the neural network on the PIC is still very close to the ideal neural network. This network has only two neurons and produces errors typically under 10% for the entire system, as shown in Figure 36. The surface is not quite as nonlinear due to the reduced number of neurons. The histograms in Figure 37 and Figure 38 verify that the total error is larger and is mostly introduced by the network and not by the PIC calculations.

Figure 35: Two neuron architecture for solving the simple surface problem.

Figure 36: PIC output with 196 points on the small two neuron architecture.
Figure 37: Histogram of errors between the ideal neural network and the PIC. The X-axis is the error and the Y-axis is the number of samples.

Figure 38: Errors of the PIC and the training data compared. The X-axis is the error and the Y-axis is the number of samples.

4.2. Matlab's Peaks Surface

The next example is generated by the common Matlab function peaks. The surface has several peaks and valleys and is a rather complicated nonlinear control surface. This complicated surface requires significantly more neurons to solve to a comparable accuracy. The training surface is shown in Figure 39 and the architecture in Figure 40. The network architecture is somewhat of a hybrid between the common MLP networks and the cascade network shown in the last example: the architecture has two hidden layers, but all neurons are connected directly to the inputs and to all preceding layers.

Figure 39: Training data used for Matlab's peaks surface.

Figure 40: Eight neuron network used for solving the Matlab peaks surface.

This network was able to solve the peaks problem quite well and was implemented on the PIC with reasonable error. The PIC output, shown in Figure 41, has some rippling artifacts that are most likely attributable to rounding errors, given the large number of calculations for this network. However, the error is still very tolerable, with no single point having more than 8% error and a very large percentage of points within 2% error, centered around 0%. This can be seen in detail in the histogram in Figure 42. Looking at Figure 43, the error analysis of the ideal neural network, it can be seen that about half of the large outlier errors are produced by the neural network itself and not by the microcontroller calculations.

Figure 41: PIC output for Matlab's peaks surface.

Figure 42: Histogram of errors between the PIC output and the training data. The X-axis is the error and the Y-axis is the number of samples.

Figure 43: Histogram of errors between the ideal neural network and the training data. The X-axis is the error and the Y-axis is the number of samples.

4.3. Two Arm Planar Manipulator

A two-link planar manipulator was used as a practical application for this embedded neural network. The particular aspect demonstrated is sensing the position of a robotic arm, given sensor data from the joints. In this example the embedded neural network calculates the x and y position of the arm based on the data read from sensors at the joints; this is known as forward kinematics. It is assumed that the sensors are linear potentiometers. The x and y position of the arm is a very nonlinear function of the joint angles and can be calculated by Equation 10. In other words, this is a two-input and two-output nonlinear system. For this experiment we will assume that R1 and R2 are
fixed-length arms. However, this same procedure could be adapted for varying-length arms by simply retraining the neural network with four inputs rather than two. The simulated robotic arm can be seen in Figure 44.

Figure 44: Two arm planar manipulator with variables shown.

The first step of the process was to generate neural network training data. The following equations were used to calculate the x and y position based on alpha and beta, where alpha and beta are the angles shown in Figure 44:

$x = R_1\cos(\alpha) + R_2\cos(\alpha + \beta)$
$y = R_1\sin(\alpha) + R_2\sin(\alpha + \beta)$   (10)

The neural network was then trained using this data. The trained network was tested in Matlab to confirm that it functioned correctly; the network can be seen in Figure 45. Matlab generates a set of test patterns of a user-selectable size, transmits these values to the microcontroller via the serial port, and reads the results. Matlab is then used to test the output patterns and calculate the errors. This process introduces errors in two places: first, the error created by using a neural network approximation rather than the original equations, and second, the error introduced between the ideal neural network and the network on the microcontroller.

The training data for outputs x and y can be seen in Figure 46 and Figure 48, respectively. Figure 47 and Figure 49 show the corresponding outputs of the microcontroller neural network. The error for each output is the difference between the training data and the microcontroller output; these errors can be seen in Figure 50 and Figure 51. Histograms of these errors were also generated and can be seen in Figure 52 and Figure 53. These results are very reasonable, and the errors are expected to be smaller than those of the physical system; in other words, the error generated by the potentiometers, or by measuring the position of the arm manually, would be comparable to the error generated by the neural network.

Figure 45: Ten neuron network for solving the forward kinematics problem.

Figure 46: Training data for output x of the two output system.

Figure 47: Output x of the two output system generated by the embedded neural network.

Figure 48: Training data for output y of the two output system.

Figure 49: Output y of the two output system generated by the embedded neural network.

Figure 50: Error between the embedded neural network and the training data for output x.

Figure 51: Error between the embedded neural network and the training data for output y.
Figure 52: Histogram of errors between the training data and the PIC for output x. The X-axis is the error and the Y-axis is the number of samples.

Figure 53: Histogram of errors between the training data and the PIC for output y. The X-axis is the error and the Y-axis is the number of samples.

4.4. Matlab's Peaks Surface Example in C

The example shown previously was repeated, except this time using the generated C file instead of the assembly. The output of the PIC using C introduces an insignificant amount of error when compared with the error introduced by the neural network. The same training data and ideal network were used as in the previous example. The original training data is shown in Figure 39, and Figure 54 shows the output of the network written in C and implemented on the PIC. The difference between the ideal network and the PIC-implemented network is insignificant when compared to the error added by the neural network itself; the error from the microcontroller is two orders of magnitude smaller than that generated by the neural network approximation. More details can be seen in the histograms describing the errors in Figure 56 and Figure 57.

Figure 54: Output of the PIC using the C version of the embedded neural network software.

Figure 55: Error between the ideal neural network and the PIC output using C.

Figure 56: Histogram of errors between the ideal neural network and the training data. The X-axis is the error and the Y-axis is the number of samples.

Figure 57: Histogram of errors between the ideal neural network and the PIC-implemented neural network. The X-axis is the error and the Y-axis is the number of samples.
4.5. Experimental Data Summary

After comparing the results of the two different implementations of the neural network, it was obvious that the C version is much more accurate; however, this accuracy comes with a decrease in performance. The C version was significantly slower due to the complexity of its calculations and the necessity of storing all weights and nodes in program memory, because they are too large to fit in RAM. The accuracy and speed of each implementation can be seen in Table 2.

Surface    Neurons   Training Error   Ideal & PIC RMS   Ideal & Training RMS   Training & PIC RMS   Time (ms)
Peaks      19        0.01             0.015005          0.010594               0.018257             1.6
Peaks      8         0.01             0.007292          0.0253                 0.026012             0.752
Simple     4         0.001            0.005151          0.0044663              0.0068827            0.317
Simple     2         0.001            0.0080934         0.07516                0.076336             0.163
Peaks C    19        0.01             0.00012772        0.0089456              0.0089523            2.95
Peaks C    8         0.01             0.00025928        0.0253                 0.025301             5.87
Simple C   4         0.001            0.000041934       0.0044663              0.0044666            2.2

Table 2: Neural network performance comparison.

Table 2 shows the performance of the neural network in several categories. The first column is the surface name as discussed in the previous sections. The Neurons column is the number of neurons used in the network for the given example. The training error is the total error summed over all points; this parameter is used to decide when the training has been completed. The root mean squared (RMS) errors are taken for the difference between two of the three surfaces, as labeled in each column. The time in milliseconds is the time required for one forward calculation of the neural network; it does not include the time for acquiring a sample input or using the output.

The C implementation time requirements are very difficult to analyze and virtually impossible to predict because of the sophisticated C compiler that is used. A few preliminary tests were done on the C program, such as measuring how long it takes to manipulate the floating point numbers: times were collected for memory reads and writes, multiplication, and the tanh calculation. Based on these individual component times, estimates were made to predict how long the system would take to process a given number of neurons based on the number and type of calculations. However, these estimates are a reasonable approximation only for small networks with few calculations, such as the simple surface shown in the last line of Table 2, where the C version took approximately ten times longer than the assembly version, as predicted from the number of calculations. Due to the optimization of the compiler, this does not hold true for larger networks such as the peaks surface; there the original estimate was significantly slower than the measured time. This allows the C version to operate faster than anticipated while remaining very accurate, which makes it valuable even on such a low-end microcontroller.

5. Conclusion

This dissertation presents a solution for embedded neural networks across many types of hardware and for many applications. The software package presented here allows the user to develop a neural network for a desired application, train the network, embed it in any platform, and verify its functionality. This software package is a complete embedded neural network solution. The package offers the user the ability to use more powerful neural network architectures than most other training software supports, and the user has the freedom to customize his network for his application.
He can use traditional multilayer perceptron networks or the more powerful arbitrarily connected networks, including fully connected and cascade networks. Most other software and research trains only with error back propagation or other first order algorithms; this dissertation gives the user the choice of traditional EBP as well as the faster and more efficient second order algorithms such as the Neuron by Neuron algorithm and the Enhanced Self Aware algorithm.

The software offers the user the option of installing the network on Microchip's 18Fxxxx series microcontrollers using custom neural network software written in assembly language and optimized for both the microcontroller and the neural network application. This version offers a very fast and accurate solution on a very inexpensive microcontroller. If the user prefers a different platform, then the generated C code can be used to implement the trained network on any C-capable platform, including other microcontrollers as well as PC-based neural networks.

This accomplishment demonstrates that neural networks can be used to solve problems that in the past would require custom software programs to be written for each problem. In other words, if three separate microcontrollers were needed to control three different processes for a single project, then three unique programs would need to be written. This solution offers one standard approach for controlling all three: the user simply needs to train three separate networks, which is an automated process, and then has solutions for the unique problems without having to write code for the mathematics. The neural network may not be the absolute best solution for every problem, but it is a very acceptable and easy-to-implement solution for an extremely large variety of problems. Many times neural networks are not used because of a lack of software for training and implementation. This dissertation removes that burden and allows neural networks to be used in more mainstream applications by letting users implement them in their applications with ease.

The same concepts presented here could be used to produce similar custom optimized assembly language implementations for other hardware; in microcontrollers with greater computing power this becomes easier. The neuron by neuron approach using the arrays for weights and nodes can be taken to any platform and implemented in the same manner. The C output file could also be used to run the neural network on a computer as well as on microcontrollers. The only modification required is to the two type qualifiers specifying that the arrays should be stored in program memory; these do not apply to a personal computer based network and would need to be removed. Otherwise the files are platform independent.

This dissertation cannot stress enough that the proof of concept shown here opens the door for neural networks to be used on any platform for problems of virtually any kind. The complexity of the problems can range from a simple one-neuron, one-input, one-output system to dozens of neurons with many inputs and outputs; the range of problems this solution is capable of solving is essentially endless. The next step in this research is to extend the C version to other microcontrollers and compare calculation times. The final step would be to train the network on data collected from a physical system, which would demonstrate and verify the speed of the neural network in a hardware application.

References

[1] B. K.
Bose, "Neural Network Applications in Power Electronics and Motor Drives—An Introduction and Perspective," IEEE Transactions on Industrial Electronics, vol. 54, pp. 14-33, 2007. [2] M. A. El-Sharkawi, "Neural network application to high performance electric drives systems," in Proc. IEEE IECON 21st Int Industrial Electronics, Control, and Instrumentation Conf, 1995, pp. 44-49. [3] L. M. Grzesiak and B. Ufnalski, "Neural stator flux estimator with dynamical signal preprocessing," in Proc. 7th AFRICON Conf AFRICON in Africa, 2004, pp. 1137-1142. [4] Y. Yusof and A. H. M. Yatim, "Simulation and modeling of stator flux estimator for induction motor using artificial neural network technique," in Proc. National Power Engineering Conf. PECon 2003, 2003, pp. 11-15. [5] A. Ba-Razzouk, A. Cheriti, G. Olivier, and P. Sicard, "Field-oriented control of induction motors using neural-network decouplers," Proc. IEEE IECON 21st Int Industrial Electronics, Control, and Instrumentation Conf, vol. 12, pp. 752-763, 1997. [6] S. M. Gadoue, D. Giaouris, and J. W. Finch, "Sensorless Control of Induction Motor Drives at Very Low and Zero Speeds Using Neural Network Flux Observers," IEEE Transactions on Industrial Electronics, vol. 56, pp. 3029-3039, 2009. [7] C. Hudson, N. S. Lobo, and R. Krishnan, "Sensorless control of single switch based switched reluctance motor drive using neural network," in Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE, 2004, pp. 2349-2354 Vol. 3. [8] J. F. Martins, P. J. Santos, A. J. Pires, L. E. B. da Silva, and R. V. Mendes, "Entropy-Based Choice of a Neural Network Drive Model," IEEE Transactions on Industrial Electronics, vol. 54, pp. 110-116, 2007. [9] H. Zhuang, K.-S. Low, and W.-Y. Yau, "A Pulsed Neural Network With On-Chip Learning and Its Practical Applications," IEEE Transactions on Industrial Electronics, vol. 54, pp. 34-42, 2007. 96 [10] J. Mazumdar and R. G. Harley, "Recurrent Neural Networks Trained With Backpropagation Through Time Algorithm to Estimate Nonlinear Load Harmonic Currents," Industrial Electronics, IEEE Transactions on, vol. 55, pp. 3484-3491, 2008. [11] N. Pecharanin, H. Mitsui, and M. Sone, "Harmonic detection by using neural network," in Proc. Conf. IEEE Int Neural Networks, 1995, pp. 923-926. [12] N. Pecharanin, M. Sone, and H. Mitsui, "An application of neural network for harmonic detection in active filter," in Proc. IEEE Int Neural Networks IEEE World Congress Computational Intelligence. Conf, 1994, pp. 3756-3760. [13] M. Rukonuzzaman, A. A. M. Zin, H. Shaibon, and K. L. Lo, "An application of neural network in power system harmonic detection," in Proc. IEEE Int. Joint Conf. IEEE World Congress Computational Intelligence Neural Networks, 1998, pp. 74-78. [14] A. A. M. Zin, M. Rukonuzzaman, H. Shaibon, and K. I. Lo, "Neural network approach of harmonics detection," in Proc. Int. Conf. Energy Management and Power Delivery EMPD '98, 1998, pp. 467-472. [15] H. C. Lin, "Dynamic power system harmonic detection using neural network," in Proc. IEEE Conf. Cybernetics and Intelligent Systems, 2004, pp. 757-762. [16] S. Osowski, "Neural network for estimation of harmonic components in a power system," IEE Proceedings C Generation, Transmission and Distribution, vol. 139, pp. 129-135, 1992. [17] Z. Jin and B. K. Bose, "Neural-network-based waveform Processing and Delayless filtering in power electronics and AC drives," Industrial Electronics, IEEE Transactions on, vol. 51, pp. 981-991, 2004. [18] M. J. Embrechts and S. 
Benedek, "Hybrid identification of nuclear power plant transients with artificial neural networks," Industrial Electronics, IEEE Transactions on, vol. 51, pp. 686-693, 2004. [19] L. Hsiung Cheng, "Intelligent Neural Network-Based Fast Power System Harmonic Detection," Industrial Electronics, IEEE Transactions on, vol. 54, pp. 43-52, 2007. [20] S. Chakraborty, M. D. Weiss, and M. G. Simoes, "Distributed Intelligent Energy Management System for a Single-Phase High-Frequency AC Microgrid," IEEE Transactions on Industrial Electronics, vol. 54, pp. 97-109, 2007. 97 [21] H. C. Lin, "Intelligent Neural Network-Based Fast Power System Harmonic Detection," IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, vol. 54, pp. 43-52, 2007. [22] W. Qiao and R. G. Harley, "Indirect Adaptive External Neuro-Control for a Series Capacitive Reactance Compensator Based on a Voltage Source PWM Converter in Damping Power Oscillations," IEEE Transactions on Industrial Electronics, vol. 54, pp. 77-85, 2007. [23] B. Singh, V. Verma, and J. Solanki, "Neural Network-Based Selective Compensation of Current Quality Problems in Distribution System," IEEE Transactions on Industrial Electronics, vol. 54, pp. 53-60, 2007. [24] S. S. Ge and W. Cong, "Adaptive neural control of uncertain MIMO nonlinear systems," Neural Networks, IEEE Transactions on, vol. 15, pp. 674-692, 2004. [25] E. B. Kosmatopoulos, M. M. Polycarpou, M. A. Christodoulou, and P. A. Ioannou, "High-order neural network structures for identification of dynamical systems," Neural Networks, IEEE Transactions on, vol. 6, pp. 422-431, 1995. [26] F. L. Lewis, A. Yegildirek, and L. Kai, "Multilayer neural-net robot controller with guaranteed tracking performance," Neural Networks, IEEE Transactions on, vol. 7, pp. 388-399, 1996. [27] U. D. Deep, B. R. Petersen, and J. Meng, "A Smart Microcontroller-Based Iridium Satellite-Communication Architecture for a Remote Renewable Energy Source," IEEE Transactions on Power Delivery, vol. 24, pp. 1869-1875, 2009. [28] F. Ferreyre, R. Goyet, G. Clerc, and T. Bouscasse, "Sensorless Slowdown Detection Method for Single-Phase Induction Motors," IEEE Transactions on Energy Conversion, vol. 24, pp. 60-67, 2009. [29] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65 nm Sub- V Microcontroller With Integrated SRAM and Switched Capacitor DC-DC Converter," IEEE Journal of Solid-State Circuits, vol. 44, pp. 115-126, 2009. [30] C. Labussiere-Dorgan, S. Bendhia, E. Sicard, J. Tao, H. J. Quaresma, C. Lochot, and B. Vrignon, "Modeling the Electromagnetic Emission of a Microcontroller Using a Single Model," IEEE Transactions on Electromagnetic Compatibility, vol. 50, pp. 22-34, 2008. [31] D. G. Lamar, A. Fernandez, M. Arias, M. Rodriguez, J. Sebastian, and M. M. Hernando, "A Unity Power Factor Correction Preregulator With Fast Dynamic Response Based on a Low-Cost Microcontroller," Proc. IEEE IECON 21st Int Industrial Electronics, Control, and Instrumentation Conf, vol. 23, pp. 635-642, 2008. 98 [32] J. H. Lee, H. S. Bae, and B. H. Cho, "Resistive Control for a Photovoltaic Battery Charging System Using a Microcontroller," IEEE Transactions on Industrial Electronics vol. 55, pp. 2767-2775, 2008. [33] S.-Y. Oh, Y.-G. Jung, S.-H. Yang, and Y.-C. Lim, "Harmonic-Spectrum Spreading Effects of Two-Phase Random Centered Distribution PWM (DZRCD) Scheme With Dual Zero Vectors," IEEE Transactions on Industrial Electronics, vol. 56, pp. 3013-3020, 2009. [34] H. Esmaeilzadeh, P. Saeedi, B. N. Araabi, C. Lucas, and S. M. 
Fakhraie, "Neural network stream processing core (NnSP) for embedded systems," in Proc. IEEE Int. Symp. Circuits and Systems ISCAS 2006, 2006. [35] G. Dundar and K. Rose, "Analog neural network circuits for ASIC fabrication," in Proc. Fifth Annual IEEE Int. ASIC Conf. and Exhibit, 1992, pp. 419-422. [36] L. Gatet, H. Tap-Beteille, and M. Lescure, "Analog Neural Network Implementation for a Real-Time Surface Classification Application," Sensors Journal, IEEE, vol. 8, pp. 1413-1421, 2008. [37] S. Xiao, M. H. L. Chow, F. H. F. Leung, X. Dehong, W. Yousheng, and L. Yim- Shu, "Analogue implementation of a neural network controller for UPS inverter applications," IEEE Transactions on Power Electronics, vol. 17, pp. 305-313, 2002. [38] A. Rajah and M. Khalil Hani, "ASIC design of a Kohonen neural network microchip," in Proc. IEEE Int. Conf. Semiconductor Electronics ICSE 2004, 2004. [39] A. Konig, P. Windirsch, M. Gasteier, and M. Glesner, "Visual inspection in industrial manufacturing," IEEE Micro, vol. 15, pp. 26-31, 1995. [40] S. Himavathi, D. Anitha, and A. Muthuramalingam, "Feedforward Neural Network Implementation in FPGA Using Layer Multiplexing for Effective Resource Utilization," IEEE Transactions on Neural Networks, vol. 18, pp. 880- 888, 2007. [41] F.-J. Lin, S.-Y. Chen, and Y.-C. Hung, "Field-programmable gate array-based recurrent wavelet neural network control system for linear ultrasonic motor," IET Electric Power Applications, vol. 3, pp. 298-312, 2009. [42] F.-J. Lin, Y.-C. Hung, and S.-Y. Chen, "FPGA-Based Computed Force Control System Using Elman Neural Network for Linear Ultrasonic Motor," IEEE Transactions on Industrial Electronics, vol. 56, pp. 1238-1253, 2009. 99 [43] F.-J. Lin and Y.-C. Hung, "FPGA-based elman neural network control system for linear ultrasonic motor," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 56, pp. 101-113, 2009. [44] N. Funabiki, M. Yoda, J. Kitamichi, and S. Nishikawa, "A gradual neural network approach for FPGA segmented channel routing problems," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, pp. 481-489, 1999. [45] Y. Maeda and M. Wakamura, "Simultaneous perturbation learning rule for recurrent neural networks and its FPGA implementation," IEEE Transactions on Neural Networks, vol. 16, pp. 1664-1672, 2005. [46] D. Zhang and H. Li, "A Stochastic-Based FPGA Controller for an Induction Motor Drive With Integrated Neural Network Algorithms," IEEE Transactions on Industrial Electronics, vol. 55, pp. 551-561, 2008. [47] M. Mohamadian, E. P. Nowicki, A. Chu, F. Ashrafzadeh, and J. C. Salmon, "DSP implementation of an artificial neural network for induction motor control," in Proc. IEEE 1997 Canadian Conf. Electrical and Computer Engineering, 1997, pp. 435-437. [48] D. Dong, W. N. White, and H. Luo, "Investigation of kinematics and inverse dynamics algorithm with a DSP implementation of a neural network," in Proc. American Control Conf, 1994, pp. 2460-2464. [49] F. Ashrafzadeh, R. Sachdeva, and A. Chu, "A novel neural network controller and its efficient DSP implementation for vector controlled induction motor drives," in Proc. 37th IAS Annual Meeting Industry Applications Conf. Conf. Record of the, 2002, pp. 1455-1462. [50] M. Mohamadian, E. Nowicki, F. Ashrafzadeh, A. Chu, R. Sachdeva, and E. Evanik, "A novel neural network controller and its efficient DSP implementation for vector-controlled induction motor drives," #IEEE_J_IA#, vol. 39, pp. 1622- 1629, 2003. [51] T. Xu and G. 
Ni, "Research on Algorithms of Neural Network Ensemble with Multi-dsp Mixture Structure," in Proc. Third Int. Conf. Natural Computation ICNC 2007, 2007, pp. 162-166. [52] F. F. M. El-Sousy, "Robust adaptive H∞ position control via a wavelet- neural-network for a DSP-based permanent-magnet synchronous motor servo drive system," IET Electric Power Applications, vol. 4, pp. 333-347, 2010. [53] F.-J. Lin, P.-H. Chou, and Y.-S. Kung, "Robust fuzzy neural network controller with nonlinear disturbance observer for two-axis motion control system," IET Control Theory & Applications, vol. 2, pp. 151-167, 2008. 100 [54] E. Echenique, J. Dixon, R. Cardenas, and R. Pena, "Sensorless Control for a Switched Reluctance Wind Generator, Based on Current Slopes and Neural Networks," IEEE Transactions on Industrial Electronics, vol. 56, pp. 817-825, 2009. [55] B. M. Wilamowski, "Neural network architectures and learning algorithms," IEEE Industrial Electronics Magazine, vol. 3, pp. 56-63, 2009. [56] B. M. Wilamowski, "Special neural network architectures for easy electronic implementations," in Proc. Int. Conf. Power Engineering, Energy and Electrical Drives POWERENG '09, 2009, pp. 17-22. [57] B. M. Wilamowski, N. Cotton, J. Hewlett, and O. Kaynak, "Neural Network Trainer with Second Order Learning Algorithms," in Proc. 11th Int. Conf. Intelligent Engineering Systems INES 2007, 2007, pp. 127-132. [58] B. M. Wilamowski, N. J. Cotton, O. Kaynak, and G. Dundar, "Method of computing gradient vector and Jacobean matrix in arbitrarily connected neural networks," in Proc. IEEE Int. Symp. Industrial Electronics ISIE 2007, 2007, pp. 3298-3303. [59] B. M. Wilamowski, N. J. Cotton, O. Kaynak, and G. Dundar, "Computing Gradient Vector and Jacobian Matrix in Arbitrarily Connected Neural Networks," IEEE Transactions on Industrial Electronics, vol. 55, pp. 3784-3790, 2008. [60] B. M. Wilmowski, "New Trends in Neural and Fuzzy Systems," in Proc. Int. Conf. Intelligent Engineering Systems INES 2008, 2008, p. 9. [61] H. Yu and B. M. Wilamowski, "C++ implementation of neural networks trainer," in Proc. Int. Conf. Intelligent Engineering Systems INES 2009, 2009, pp. 257-262. [62] H. Yu and B. M. Wilamowski, "Efficient and reliable training of neural networks," in Proc. 2nd Conf. Human System Interactions HSI '09, 2009, pp. 109- 115. [63] B. M. Wilamowski, D. Hunter, and A. Malinowski, "Solving parity-N problems with feedforward neural networks," in Proc. Int Neural Networks Joint Conf, 2003, pp. 2546-2551. [64] P. Petchjatuporn, W. Ngamkham, N. Khaehintung, P. Sirisuk, W. Kiranon, and A. Kunakorn, "A Solar-powered Battery Charger with Neural Network Maximum Power Point Tracking Implemented on a Low-Cost PIC-microcontroller," in TENCON 2005 2005 IEEE Region 10, 2005, pp. 1-4. 101 [65] U. Farooq, M. Amar, K. M. Hasan, M. Khalil Akhtar, M. U. Asad, and A. Iqbal, "A low cost microcontroller implementation of neural network based hurdle avoidance controller for a car-like robot," in Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on, 2010, pp. 592- 597. [66] S. Bashyal, G. K. Venayagamoorthy, and B. Paudel, "Embedded neural network for fire classification using an array of gas sensors," in Sensors Applications Symposium, 2008. SAS 2008. IEEE, 2008, pp. 146-148. [67] M. Xiaoying, T. Yuening, and H. Jia, "A Fuzzy Neural Network Control System Based on Embedded System," in Mechatronics and Automation, 2007. ICMA 2007. International Conference on, 2007, pp. 2394-2398. [68] J. Binfet and B. M. 
Wilamowski, "Microprocessor implementation of fuzzy systems and neural networks," in Neural Networks, 2001. Proceedings. IJCNN '01. International Joint Conference on, 2001, pp. 234-239 vol.1. [69] G. L. Dempsey, N. L. Alt, B. A. Olson, and J. S. Alig, "Control sensor linearization using a microcontroller-based neural network," in Systems, Man, and Cybernetics, 1997. 'Computational Cybernetics and Simulation'., 1997 IEEE International Conference on, 1997, pp. 3078-3083 vol.4. [70] M. Mestari, "An analog neural network implementation in fixed time of adjustable-order statistic filters and applications," Neural Networks, IEEE Transactions on, vol. 15, pp. 766-785, 2004. [71] J. R. Stack, G. J. Dobeck, X. Liao, and L. Carin, "Kernel-Matching Pursuits With Arbitrary Loss Functions," IEEE Transactions on Neural Networks, vol. 20, pp. 395-405, 2009. [72] S. Tabarce, V. G. Tavares, and P. G. de Oliveira, "Programmable analogue VLSI implementation for asymmetric sigmoid neural activation function and its derivative," Electronics Letters, vol. 41, pp. 863-864, 2005. [73] N. Zhang and D. C. Wunsch II, "A Switched-Resistor Approach to Hardware Implementation of Neural Networks," in Proc. 14th IEEE Int. Conf. Fuzzy Systems FUZZ '05, 2005, pp. 336-340. [74] B. M. Wilamowski and J. Binfet, "Do Fuzzy Controllers Have Advantages over Neural Controllers in Microprocessor Implementation," presented at the Proc of.2-nd International Conference on Recent Advances in Mechatronics, Istanbul, Turkey, 1999. 102 [75] J. Binfet and B. M. Wilamowski, "Microprocessor implementation of fuzzy systems and neural networks," in Proc. Int. Joint Conf. Neural Networks IJCNN '01, 2001, pp. 234-239. [76] N. J. Cotton, B. M. Wilamowski, and G. Dundar, "A Neural Network Implementation on an Inexpensive Eight Bit Microcontroller," in Proc. Int. Conf. Intelligent Engineering Systems INES 2008, 2008, pp. 109-114. [77] G. L. Dempsey, N. L. Alt, B. A. Olson, and J. S. Alig, "Control sensor linearization using a microcontroller-based neural network," in Proc. IEEE Int Systems, Man, and Cybernetics 'Computational Cybernetics and Simulation'. Conf, 1997, pp. 3078-3083. [78] U. Farooq, M. Amar, K. M. Hasan, M. Khalil Akhtar, M. U. Asad, and A. Iqbal, "A low cost microcontroller implementation of neural network based hurdle avoidance controller for a car-like robot," in Proc. 2nd Int Computer and Automation Engineering (ICCAE) Conf, 2010, pp. 592-597. [79] U. Farooq, M. Amar, E. ul Haq, M. U. Asad, and H. M. Atiq, "Microcontroller Based Neural Network Controlled Low Cost Autonomous Vehicle," in Proc. Second Int Machine Learning and Computing (ICMLC) Conf, 2010, pp. 96-100. [80] N. Cotton, "Training Arbitrarily Connected Neural Networks with Second Order Algorithms," Masters of Science, Electrical and Computer Engineering, Auburn University, Auburn, 2008. [81] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, pp. 989- 993, 1994. [82] J. Hewlett, B. Wilamowski, and G. Dundar, "Merge of Evolutionary Computation with Gradient Based Method for Optimization Problems," in Proc. IEEE Int. Symp. Industrial Electronics ISIE 2007, 2007, pp. 3304-3309.