Improved Nelder Mead's Simplex Method and Applications
by
Nam Dinh Pham
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
May 7, 2012
Keywords: Nelder Mead's simplex method, quasi gradient method,
lossy filter, Error Back Propagation, Levenberg Marquardt, neural networks
Copyright 2012 by Nam Dinh Pham
Approved by
Bogdan Wilamowski, Chair, Professor of Electrical and Computer Engineering
Lloyd Stephen Riggs, Professor of Electrical and Computer Engineering
Thaddeus Roppel, Associate Professor of Electrical and Computer Engineering
Michael Baginski, Associate Professor of Electrical and Computer Engineering
Wei-Shinn Ku, Assistant Professor of Computer Science and Software Engineering
Abstract
Derivative-free optimization algorithms are often used when function derivatives are
difficult or time-consuming to obtain. Nelder Mead's simplex method is one of the most popular
derivative-free optimization algorithms in engineering, statistics, and the sciences. It is favored
and widely used because of its fast convergence and simplicity. The simplex method converges
very well on small-scale problems with a few variables, but it has little success on large-scale
problems with many variables, and this shortcoming has significantly reduced its popularity in
optimization. Two quasi-gradient solutions are introduced to improve its convergence rate and
convergence speed. The key contribution of this work is an improved algorithm with a higher
success rate and faster convergence that still maintains the simplicity of the original method.
The improved algorithm is compared on several benchmark functions with the original simplex
method and with other popular optimization algorithms such as the genetic algorithm, the
differential evolution algorithm, and the particle swarm algorithm, and the comparison results
are reported and discussed.
Acknowledgments
I would like to express my deepest appreciation to my advisor, Professor B. M.
Wilamowski. Without his patience and guidance I would not be in the position I am today. From
him I have learned a multitude of things, of which engineering is only the tip of the iceberg.
Table of Contents
Abstract .......................................................................................................................................... ii
Acknowledgements ........................................................................................................................ iii
List of Tables ............................................................................................................................... vii
List of Figures ............................................................................................................................. viii
List of Abbreviations ......................................................................................................................x
1 Chapter 1: Introduction .............................................................................................................1
1.1 Genetic Algorithm ...............................................................................................................5
1.2 Differential Evolution Algorithm .......................................................................................6
1.3 Particle Swarm Optimization ..............................................................................................7
1.4 Nelder Mead's Simplex Algorithm ......................................................................8
2 Chapter 2: Improved Simplex Method with Quasi Gradient Methods ...................................13
2.1 Deficiency of Nelder Mead's Simplex Method ................................................................13
2.2 Quasi Gradient Methods ...................................................................................................15
2.2.1 Quasi Gradient Method Using an Extra vertex ......................................................16
2.2.2 Quasi Gradient Method Using a Hyper Plane Equation .......................................17
2.3 Testing Functions ...............................................................................................................20
2.4 Experimental Results ........................................................................................................24
3 Chapter 3: Synthesize Lossy Ladder Filters with Improved Simplex Method .......................35
3.1 Filter Synthesis Algorithms ..............................................................................................35
3.1.1 Butterworth Low-pass Filter ................................................................................35
3.1.2 Chebyshev Low-pass Filter...................................................................................37
3.1.3 Inverse Chebyshev Low-pass Filter .....................................................................38
3.1.4 Cauer Elliptic Low-pass Filter .............................................................................39
3.2 Ladder Prototype Synthesis Algorithms ...........................................................................41
3.2.1 Design Ladder Low-pass Prototype without Zeros ..............................................42
3.2.2 Design Ladder Low-pass Prototype with Zeros ....................................................44
3.3 Transformation from Low-pass Filters to Other Type Filters ..........................................46
3.4 Lossy Filter Synthesis with Improved Simplex Method ...................................................47
4 Chapter 4: Training Neural Networks with Improved Simplex Method ................................53
4.1 Artificial Neural Networks ...............................................................................................53
4.1.1 Neural Network Architectures ..............................................................................54
4.1.2 Error Back Propagation Algorithm .......................................................................58
4.1.3 Levenberg Marquardt Algorithm ..........................................................................63
4.2 Training Neural Networks with Improved Simplex Method ............................................67
4.3 Control Robot Arm Kinematics with Improved Simplex Method ....................................71
5 Chapter 5: Conclusions ...........................................................................................................77
References: ....................................................................................................................................79
Appendix 1: Nelder Mead's simplex method ...............................................................................86
Appendix 2: Improved simplex method with quasi-gradient method using an extra point .........90
Appendix 3: Improved simplex method with quasi-gradient method using a hyper plane ..........94
Appendix 4: Test function .............................................................................................................98
List of Tables
Table 2.1: Coefficients of Kowalik and Osborne function ...........................................................22
Table 2.2: Coefficients of Bard Functions ....................................................................................23
Table 2.3: Comparison of derivative free optimization algorithms of 20-dimensional function ..24
Table 2.4: Comparison of derivative free optimization algorithms of 10-dimensional function ..31
Table 2.5: Evaluation of success rate and computing time of 15-dimensional functions .............31
Table 2.6: Evaluation of success rate and computing time of 20-dimensional functions ..................32
Table 2.7: Evaluation of success rate and computing time of 40-dimensional functions .............32
Table 3.1: Transformation of the low-pass immittances L and C to ladder arms for high-pass,
band-pass, band-reject, and multiple pass-band filters .....................................................46
Table 4.1: Number of neurons/weights required for different parity problems using neural
network architectures ........................................................................................................58
Table 4.2: Comparison of training algorithms with MLP architecture .........................................69
Table 4.3: Comparison of training algorithms with BMLP architecture ......................................70
Table 4.4: Comparison of training algorithms with fully connected cascade architecture ...........71
List of Figures
Fig. 1.1: Triangular simplex △BGW with midpoint M, reflected point R and extended point E ..10
Fig. 1.2: Contracted points C1 and C2, shrinking points S and M toward B ................................11
Fig. 2.1: The triangular simplex △BGW with similar function values at W and G (case (a)) and the
triangular simplex △BGW with similar function values at B and G (case (b)) ......................15
Fig. 2.2: The simplex △BGW with extra point E .............................................................................17
Fig. 2.3: Simulated error curves of De Jong function 1 (a) and De Jong function 1 with moved
axis (b) ..............................................................................................................................27
Fig. 2.4: Simulated error curves of Quadruple function (a) and Powell function (b) ...................27
Fig. 2.5: Simulated error curves of Parallel hyperellipsoid function (a) and Zakarov function (b)
......................................................................................................................................28
Fig. 2.6: Simulated error curves of Schwefel function (a) and sum of different power function (b)
......................................................................................................................................28
Fig. 2.7: Simulated error curves of Step function (a) and Box function (b) .................................29
Fig. 2.8: Simulated error curves of Rosenbrock function (a) and Biggs Exp6 function (b) .........29
Fig. 2.9: Simulated error curves of Kowalik Osborne function (a) and Colville function (b) ......30
Fig. 2.10: Simulated error curves of Wood function (a) and Bard function (b) ..........................30
Fig. 3.1: Butterworth filter response ...............................................................................................36
Fig. 3.2: Chebyshev filter response ...............................................................................................37
Fig. 3.3: Inverse Chebyshev filter response ..................................................................................38
Fig. 3.4: Cauer Elliptic filter response ..........................................................................................39
Fig. 3.5: Doubly terminated ladder network without zeros ..........................................................42
Fig. 3.6: General ladder circuit with presence of zeros ................................................................44
Fig. 3.7: Models of real capacitors and real inductors ..................................................................47
Fig. 3.8: Chebyshev filter circuit with ideal elements ..................................................................48
Fig. 3.9: Chebyshev filter circuit with lossy elements .....................................................................48
Fig. 3.10: Magnitude and phase responses (a) and pole locations (b) of filters ................................49
Fig. 3.11: 4th Chebyshev filter circuit with ideal elements ...........................................................51
Fig. 3.12: 4th Chebyshev filter circuit with lossy elements ...........................................................51
Fig. 3.13: Magnitude and phase responses (a) and pole locations (b) of filters ...........................52
Fig. 4.1: Multilayer perceptron architecture 3-3-4-1 (MLP) ........................................................56
Fig. 4.2: Bridged multilayer perceptron architecture 3-3-4-1 (BMLP) ........................................56
Fig. 4.3: Fully connected cascade architecture 3-1-1-1 (FCC) .....................................................57
Fig. 4.4: Neural network with one input layer and one output layer ............................................59
Fig. 4.5: Neural network with one hidden layer ...........................................................................60
Fig. 4.6: Multilayer perceptron neural network to train parity-N problems .................................68
Fig. 4.7: Bridged multilayer perceptron neural network to train parity-N problems ....................69
Fig. 4.8: Fully connected cascade neural network to train parity-N problems .............................70
Fig. 4.9: Two-link planar manipulator ..........................................................................................72
Fig. 4.10: Neural network architecture to control robot arm kinematics ......................................73
Fig. 4.11: Desired output (a) and actual output (b) from the neural network in x direction .........73
Fig. 4.12: Desired output (a) and actual output (b) from the neural network in y direction .........74
Fig. 4.13: Desired output of a function f .......................................................................................75
Fig. 4.14: Output from fuzzy system (a), output from neural network (b) ...................................76
List of Abbreviations
SIM: Nelder Mead's Simplex Method
EBP: Error Back Propagation
LM: Levenberg Marquardt
LMS: Least Mean Square
MLP: Multilayer Perceptron
BMLP: Bridged Multilayer Perceptron
FCC: Fully Connected Cascade
RBF: Radial Basis Function
LVQ: Learning Vector Quantization
Chapter 1
Introduction
The desire for optimality is inherent in human nature: a manufacturer wants to produce
its products at the lowest cost, and a delivery company wants to route deliveries to all its
distributors over the shortest distance to save fuel, time, etc. These are typical examples to
which optimization theory can be applied to give optimal solutions. Since the advent of
computers, mathematical theories of optimization have been developed and applied widely.
With its computing power, the computer can implement optimization methods very efficiently
in terms of time and cost. The goal of optimization theory is the creation of reliable methods to
optimize models through an intelligent process. Applications of these theories play increasingly
important roles in modern engineering, planning, etc.
In real life, scientists, engineers, and managers often collect a great deal of data and then
face the difficult question of how to trade off different factors to obtain the desired results.
Optimization is the process of finding the best solution by evaluating combinations of these
factors. Many engineering problems can be formulated as optimization problems, such as
process design, logistics, process synthesis and analysis, telecommunication networks, finding
an optimal trajectory for a robot arm, the optimal thickness of steel in pressure vessels, etc. [1].
In practice, optimization algorithms are able to solve these problems, but finding the best
solution is often neither easy nor straightforward because the problems involve large search
spaces. The challenge is even greater in real-life systems, which require optimal solutions in an
acceptable amount of time.
Optimization is a useful and important tool in decision science and in the analysis of
physical systems. To use this tool, an objective function has to be defined; it can be cost, profit,
time, etc. Normally, an objective function is modeled with unknown variables that describe its
characteristics, and optimization algorithms determine values of these variables that meet the
requirements of the objective function. If the model is too simplistic, the solution will not
reflect useful insights into the practical system. If the model is too complex, optimization
algorithms may fail to give a solution. Therefore, models usually have to be rich enough to be
useful yet simple enough to be handled by the computer. There are numerous optimization
algorithms. Each is developed to solve a particular class of problems, and each has its own
strengths and weaknesses. Users usually have to evaluate a model and decide which algorithm
is suited to it [2].
Discrete and continuous optimization: discrete optimization problems are also known as
integer programming problems. In a discrete optimization problem, a solution makes sense only
if its variables are integers. A common strategy for meeting this constraint is to solve the
problem with real variables and then round them to the closest integers, but this approach is by
no means guaranteed to give optimal solutions. In contrast, continuous optimization problems
are easier to solve because of the smoothness of continuous functions: the objective is defined
over an infinite set of real-valued points, so information at any one point can be used to infer
the function's behavior nearby. The same reasoning cannot be applied to discrete optimization
problems with a finite set of solutions, where points that are close together may have very
different function values.
Constrained and unconstrained optimization: constrained optimization problems arise
from models that place constraints on the variables. These can be bounds on the input variables
or constraints that reflect relationships among variables, etc. Unconstrained optimization
problems can be considered particular cases of constrained optimization in which the
constraints can be ignored without affecting the solution. Alternatively, the constraints can be
folded into the objective function of an unconstrained problem as penalty terms.
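The penalty idea just mentioned can be sketched in a few lines of Python (an illustration, not taken from the dissertation; the objective, the constraint, the grid optimizer, and the penalty weights are all invented): minimize x² subject to x ≥ 1 by adding a quadratic penalty for constraint violation and increasing its weight μ.

```python
def penalized(x, mu):
    # objective x^2 plus a quadratic penalty for violating x >= 1
    violation = max(0.0, 1.0 - x)
    return x * x + mu * violation ** 2

def minimize_1d(f, lo=-5.0, hi=5.0, steps=100001):
    # a crude grid search stands in for any unconstrained optimizer
    return min((lo + i * (hi - lo) / (steps - 1) for i in range(steps)), key=f)

# As the penalty weight mu grows, the unconstrained minimizer mu/(1+mu)
# of the penalized objective approaches the constrained optimum x* = 1.
solutions = {mu: minimize_1d(lambda x: penalized(x, mu))
             for mu in (1.0, 10.0, 1000.0)}
```

With μ = 1 the unconstrained minimizer sits at 0.5, well inside the infeasible region; by μ = 1000 it is within about 0.001 of the constrained optimum.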
Global and local optimization: local optimization algorithms converge much faster than
global ones. However, a local solution is only the minimum within its neighborhood and is not
guaranteed to be the global solution, i.e., the best of all minima.
Stochastic and deterministic optimization: in some optimization problems the model
cannot be fully defined because it depends on quantities that are unknown at the time of
formulation. Normally, a modeler can predict the unknown quantities with some degree of
confidence. Stochastic optimization algorithms use these quantifications of the uncertainty to
produce solutions that optimize the expected performance of the model. Deterministic
optimization algorithms, in contrast, assume that the model is fully specified.
Each optimization algorithm uses different techniques to converge iteratively to an
optimal solution. Some use first derivatives, second derivatives, or function values, etc.; some
accumulate information from previous iterations to guide subsequent steps toward the target.
The optimization technique is the key that differentiates one algorithm from another. A good
optimization algorithm should possess the following properties:
• Robustness: the algorithm has the ability to converge on a wide range of problems in its
category.
• Efficiency: the algorithm can converge without excessive computing cost, understood as
both computing time and storage.
• Accuracy: the algorithm can give solutions with precision and is not overly sensitive to
errors when implemented on computers.
The Nelder Mead's simplex method [3] is a popular derivative-free optimization
algorithm and a method of choice for many practitioners; it is, for example, the algorithm
behind MATLAB's fminsearch function. It converges relatively fast and can be implemented
relatively easily compared with classical algorithms relying on gradients or on evolutionary
computation. Unlike gradient methods [4], [5], the simplex method can optimize a function
without calculating its derivatives, which usually require considerable computing power and
become expensive in high-dimensional problems. This property gives it an advantage over such
methods.
Although the simplex method is simple and robust in small-scale optimization, it easily
fails in large-scale optimization. To become a reliable optimization tool, it has to overcome this
shortcoming by improving its convergence rate and convergence speed. This dissertation gives
some new insight into why the simplex method may become inefficient in high-dimensional
optimization, namely its lack of gradient information. This view explains the low convergence
rate without relying on the descent property that holds when the objective function is uniformly
convex, as presented in other literature [6], [7]. The dissertation then presents how to improve
the simplex method by combining it with two different quasi-gradient methods. The improved
algorithm, which requires no complex mathematical computation, can optimize
multi-dimensional problems with a higher success rate and faster convergence speed.
The genetic algorithm [8], the differential evolution algorithm [9], [10], and the particle
swarm algorithm [11] are other popular derivative-free optimization tools that are widely
applied and familiar to researchers and practitioners [12]-[14]. These algorithms perform well
in both global and local search and are able to find the optimum without getting trapped in local
minima, a capability that local search algorithms such as calculus-based methods or the simplex
method largely lack. The main drawback of global search algorithms is their computational
cost, which often makes their convergence much slower than that of local search algorithms.
Particle swarm optimization is a global, probabilistic search technique, different from
deterministic and stochastic techniques. Compared with the genetic algorithm and the
differential evolution algorithm, particle swarm optimization is computationally simpler
because its crossover-like and mutation-like updates are performed simultaneously on each
particle, whereas in the genetic algorithm and the differential evolution algorithm crossover and
mutation are performed pairwise across the whole population. With the improvements
contributed in this work, the simplex method can be considered another optional optimization
algorithm, one that can work much more efficiently than other well-known derivative-free
optimization algorithms in many different fields of engineering and science.
1.1 Genetic Algorithm
The genetic algorithm (GA) was invented to mimic natural evolution according to the
Darwinian principle of survival and reproduction [15]. Unlike calculus-based methods, GA
does not require derivatives, and it searches many points of the solution space in parallel; it is
therefore less likely to get trapped in local minima. Like the particle swarm algorithm and the
differential evolution algorithm, GA starts from an initial population in which each individual,
called a chromosome, represents a solution. During each generation, chromosomes are
evaluated according to their fitness values and evolved to create new chromosomes for the next
generation. New child chromosomes can be produced in two ways: by merging two parent
chromosomes from the current generation with the crossover operator, or by modifying
chromosomes with the mutation operator. To maintain the population size, all chromosomes
must go through a natural selection process: chromosomes with better genes, i.e., better fitness,
have a higher probability of surviving into the next generation, while those with worse genes
are more likely to be rejected. This procedure is repeated until a best chromosome close to the
optimum solution is obtained. Another big advantage of GA is that it can be applied in many
domains, not only in optimization problems. However, it still suffers from premature
convergence and a low local convergence speed, so GA continues to be improved by
researchers [16], [17].
1.2 Differential Evolution Algorithm
The differential evolution algorithm (DE) was introduced by R. Storn and K. Price in
1997 [9], [10]. It has since become one of the most robust function minimizers, with a
relatively simple self-adapting mutation, and is able to solve a wide range of optimization
problems. The central idea of DE is a new scheme for computing trial parameter vectors: a trial
vector is formed by adding the weighted difference between two population members to a third
member. If the resulting vector has a lower objective function value than a predefined
population member, the newly generated vector replaces the vector with which it was
compared. Over time, the algorithm has been adapted to increase its efficiency. In 2007, the
concept of multiple trial vectors [18] was introduced; this approach aims to make DE converge
for a broader range of problems, because a single scheme for calculating trial vectors may work
well on certain types of problems but not on others. Another approach, in which the choice of
learning strategy and the two control parameters F (weighting factor) and CR (crossover
constant) are dynamically adjusted, also brought a significant improvement [19]. Recently, an
adaptive differential evolution algorithm with multiple trial vectors has trained artificial neural
networks successfully, showing results competitive with the error back propagation algorithm
and the Levenberg-Marquardt algorithm [20].
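The trial-vector scheme described above, a weighted difference of two members added to a third, followed by crossover and greedy replacement, can be sketched as follows (a minimal DE/rand/1/bin in Python; the function name and all parameter values are illustrative assumptions):

```python
import random

def differential_evolution(f, dim, pop_size=20, F=0.8, CR=0.9,
                           generations=150, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # pick three distinct members, all different from member i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # mutation: weighted difference of two members added to a third
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # binomial crossover with the current member
            jrand = rng.randrange(dim)   # guarantees at least one mutant gene
            trial = [mutant[k] if (rng.random() < CR or k == jrand) else pop[i][k]
                     for k in range(dim)]
            # greedy selection: the trial replaces the member it was compared with
            ft = f(trial)
            if ft < fit[i]:
                pop[i], fit[i] = trial, ft
    return pop[fit.index(min(fit))]

sphere = lambda x: sum(v * v for v in x)
best = differential_evolution(sphere, dim=5)
```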
1.3 Particle Swarm Optimization
Particle swarm optimization (PSO) simulates the social swarming behavior of a flock of
birds or a school of fish searching for food [21]. The main idea is that each particle uses
communication with the best particle to update its own position and velocity. The algorithm
operates on a randomly created population of potential solutions and searches for the optimum
by creating successive populations of solutions. PSO may sound similar to the differential
evolution algorithm or the genetic algorithm in its strategy of selecting the best individual, but
it is quite different. In PSO, the potential solutions, called particles, move toward the current
(dynamically changing) optimum in the solution space. Each particle has its own position,
best-known position, velocity, and fitness. In each generation, each particle consults the best
particle in the swarm and follows it when updating its own information. During its motion, if a
particle finds a better position than its own best-known one, it automatically updates itself; and
if a particle finds a new overall best position, that position immediately becomes the current
best. Because of its global search ability and fast convergence compared with other global
search algorithms, PSO is widely applied in optimization.
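The position and velocity updates described above can be sketched as follows (a minimal PSO in Python using the common inertia-weight formulation; the coefficients w, c1, c2 and all other values are illustrative assumptions):

```python
import random

def pso_minimize(f, dim, swarm_size=25, iterations=150,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = pbest_val.index(min(pbest_val))
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # the swarm's best position

    for _ in range(iterations):
        for i in range(swarm_size):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward own best + pull toward swarm best
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            val = f(pos[i])
            if val < pbest_val[i]:                # particle improved its own best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:               # and possibly the swarm's best
                    gbest, gbest_val = pos[i][:], val
    return gbest

sphere = lambda x: sum(v * v for v in x)
best = pso_minimize(sphere, dim=5)
```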
1.4 Nelder Mead's Simplex Algorithm
The simplex method [3] is a direct downhill search method. It is a simple algorithm for
finding local minima and is applicable to multidimensional optimization problems. Unlike
classical gradient methods, this algorithm does not calculate derivatives; instead it creates a
geometric simplex and uses the simplex's movement to guide its convergence. A simplex is a
geometrical figure formed by (n+1) vertices, where n is the number of variables of the
optimization function and the vertices are points selected to form the simplex. In each iteration,
the simplex method calculates the reflection of the worst vertex through the centroid of the
remaining vertices. Depending on the function value at this new vertex, the algorithm performs
a reflection, expansion, contraction, or shrink operation to form a new simplex. In other words,
the function values at the vertices are evaluated iteratively, and the worst vertex, the one with
the highest function value, is replaced by a newly found better vertex; otherwise the simplex is
shrunk around the best vertex. This process continues until a desired minimum is met.
Moreover, the convergence speed of the algorithm is influenced by three parameters α, β, γ:
α is the reflection coefficient, which defines how far the reflected point lies from the centroid;
β is the contraction coefficient, which defines where the contracted point lies between the
centroid and the worst point (or the reflected point, when the reflected point's function value is
smaller than the worst point's); γ is the expansion coefficient, which defines how far to expand
beyond the reflected point when the simplex is moving in the right direction. Depending on
these coefficients α, β, γ, the volume of the simplex is changed by the reflection, contraction,
and expansion operations respectively. Nelder Mead's simplex method can be summarized as
follows; more details can be found in the original paper [3].
• Step 1: choose α, β, γ; select an initial simplex with random vertices x0, x1, …, xn and
calculate their function values.
• Step 2: sort the vertices x0, x1, …, xn of the current simplex so that the function values
f0, f1, …, fn are in ascending order.
• Step 3: calculate the reflected point xr and its function value fr.
• Step 4: if fr < f0:
(a) calculate the extended point xe and its function value fe
(b) if fe < f0, replace the worst point by the extended point: xn = xe, fn = fe
(c) if fe ≥ f0, replace the worst point by the reflected point: xn = xr, fn = fr
• Step 5: if fr ≥ f0:
(a) if fr < fi for at least one i ≠ n, replace the worst point by the reflected point:
xn = xr, fn = fr
(b) if fr ≥ fi for all i ≠ n:
(b1) if fr > fn, calculate the contracted point xc and its function value fc
(c1) if fc > fn, shrink the simplex toward the best point
(c2) if fc ≤ fn, replace the worst point by the contracted point: xn = xc, fn = fc
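For comparison with the step list above, the method can be sketched compactly in Python (an illustration, not the dissertation's MATLAB listing; the shrink coefficient 0.5, the initial simplex built with a 0.1 offset along each axis, and the stopping rule are extra assumptions):

```python
def nelder_mead(f, x0, alpha=1.0, beta=0.5, gamma=2.0,
                max_iter=2000, tol=1e-8):
    n = len(x0)
    # initial simplex: x0 plus a small step along each coordinate axis
    simplex = [list(x0)] + [
        [x0[j] + (0.1 if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)                       # Step 2: ascending order
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        # Step 3: reflect the worst vertex through the centroid
        xr = [centroid[j] + alpha * (centroid[j] - worst[j]) for j in range(n)]
        if f(xr) < f(best):                       # Step 4: try expansion
            xe = [centroid[j] + gamma * (xr[j] - centroid[j]) for j in range(n)]
            simplex[-1] = xe if f(xe) < f(best) else xr
        elif any(f(xr) < f(v) for v in simplex[:-1]):   # Step 5(a)
            simplex[-1] = xr
        else:                                     # Step 5(b): contraction
            xh = worst if f(worst) < f(xr) else xr
            xc = [centroid[j] + beta * (xh[j] - centroid[j]) for j in range(n)]
            if f(xc) < f(worst):                  # (c2): accept the contraction
                simplex[-1] = xc
            else:                                 # (c1): shrink toward the best
                simplex = [best] + [
                    [(v[j] + best[j]) / 2 for j in range(n)] for v in simplex[1:]
                ]
    return min(simplex, key=f)

rosenbrock = lambda x: 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2
best = nelder_mead(rosenbrock, [-1.2, 1.0])
```

From the classic starting point (-1.2, 1.0) the simplex works its way along the Rosenbrock valley toward the minimum at (1, 1).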
Appendix 1: Nelder Mead's simplex method

check=0; % reconstructed: the loop header is garbled in the source
for i=1:n
if f_Ps>f(i) % f_P*>f_i and i~=h
check=1;
break;
end
end
if check==0
SIMPLEX(end,:)=Ps; %replace Ph by P*
f(end)=f_Ps;
else
if f_Ps>f(end) %f_P*>f_h
Pss=c*SIMPLEX(end,:)+(1-c)*Pb; %calculate P** by contraction toward the centroid Pb
f_Pss=feval(obj,Pss);
if f_Pss>f(end) %f(P**)>f(h)
for i=1:n+1
SIMPLEX(i,:)=(SIMPLEX(i,:)+SIMPLEX(1,:))/2; %replace all Pi' by (Pi+Pl)/2
f(i)=feval(obj,SIMPLEX(i,:));
end
else
SIMPLEX(end,:)=Pss; %replace Ph by P**
f(end)=f_Pss;
end
else
SIMPLEX(end,:)=Ps; %replace Ph by P*
f(end)=f_Ps;
end
end
end
% reorder and display iteration output
SIMPLEX(:,end+1)=f';
SIMPLEX=sortrows(SIMPLEX,n+1);
f=SIMPLEX(:,end)';
SIMPLEX(:,end)=[];
%calculate error
error(ite)=f(1);
t(ite)=ite;
% termination condition for neural network training (the end of this listing is garbled in the source)
% if f(1) < ...

Appendix 2: Improved simplex method with quasi-gradient method using an extra point
%% using composite point
for i=1:n,
ord(i)=SIMPLEX(i,i);
end
f_ord=feval(obj,ord);
for i=1:n,
if mod(i,2)==0
grad(i)=(f(i-1)-f_ord)/(SIMPLEX(i-1,i)-ord(i));
else
grad(i)=(f(i+1)-f_ord)/(SIMPLEX(i+1,i)-ord(i));
end
end
Gs=SIMPLEX(1,:)-alpha*grad/sqrt(sum(grad.^2)); % step from the best vertex along the normalized quasi gradient
% calculate the reflected point and project it onto the line through the best vertex and Gs
P3=(1+a)*SIMPLEX(1,:)-SIMPLEX(end,:);
P1=SIMPLEX(1,:);
P2=Gs;
PP=(P3-P1).*(P2-P1);
u=sum(PP)/sum((P2-P1).^2);
Gs=P1+u*(P2-P1);
f_Gs=feval(obj,Gs);
% if f_Gs < ... (garbled in the source: the acceptance test for the quasi-gradient
% point and the lines computing the reflected point P* are missing here)
check=0; % reconstructed loop header
for i=1:n
if f_Ps>f(i) % f_P*>f_i and i~=h
check=1;
break;
end
end
if check==0
SIMPLEX(end,:)=Ps; %replace Ph by P*
f(end)=f_Ps;
else
if f_Ps>f(end) %f_P*>f_h
Pss=c*SIMPLEX(end,:)+(1-c)*Pb; %calculate P** by contraction toward the centroid Pb
f_Pss=feval(obj,Pss);
if f_Pss>f(end) %f(P**)>f(h)
for i=1:n+1
SIMPLEX(i,:)=(SIMPLEX(i,:)+SIMPLEX(1,:))/2; %replace all Pi' by (Pi+Pl)/2
f(i)=feval(obj,SIMPLEX(i,:));
end
else
SIMPLEX(end,:)=Pss; %replace Ph by P**
f(end)=f_Pss;
end
else
SIMPLEX(end,:)=Ps; %replace Ph by P*
f(end)=f_Ps;
end
end
end
% reorder and display iteration output
SIMPLEX(:,end+1)=f';
SIMPLEX=sortrows(SIMPLEX,n+1);
f=SIMPLEX(:,end)';
SIMPLEX(:,end)=[];
error(ite)=f(1);
t(ite)=ite;
% termination condition for neural network training (the end of this listing is garbled in the source)
% if f(1) < ...

Appendix 3: Improved simplex method with quasi-gradient method using a hyper plane
% using hyper plane equation
I=SIMPLEX(:,1:n);
A=ones(1,n+1)'; % first column of ones for the plane's constant term
A(:,2:n+1)=I; % remaining columns hold the vertex coordinates
B=f';
P=pinv(A)*B; % least-squares fit of the function values over the simplex vertices
grad=P'; % the fitted plane's slope coefficients approximate the gradient
Gs=SIMPLEX(1,:)-alpha*grad(1,2:n+1); % step from the best vertex against the fitted gradient
% calculate the reflected point and project it onto the line through the best vertex and Gs
P3=(1+a)*SIMPLEX(1,:)-SIMPLEX(end,:);
P1=SIMPLEX(1,:);
P2=Gs;
PP=(P3-P1).*(P2-P1);
u=sum(PP)/sum((P2-P1).^2);
Gs=P1+u*(P2-P1);
f_Gs=feval(obj,Gs);
% if f_Gs < ... (garbled in the source: the acceptance test for the quasi-gradient
% point and the lines computing the reflected point P* are missing here)
check=0; % reconstructed loop header
for i=1:n
if f_Ps>f(i) % f_P*>f_i and i~=h
check=1;
break;
end
end
if check==0
SIMPLEX(end,:)=Ps; %replace Ph by P*
f(end)=f_Ps;
else
if f_Ps>f(end) %f_P*>f_h
Pss=c*SIMPLEX(end,:)+(1-c)*Pb; %calculate P** by contraction toward the centroid Pb
f_Pss=feval(obj,Pss);
if f_Pss>f(end) %f(P**)>f(h)
for i=1:n+1
SIMPLEX(i,:)=(SIMPLEX(i,:)+SIMPLEX(1,:))/2; %replace all Pi' by (Pi+Pl)/2
f(i)=feval(obj,SIMPLEX(i,:));
end
else
SIMPLEX(end,:)=Pss; %replace Ph by P**
f(end)=f_Pss;
end
else
SIMPLEX(end,:)=Ps; %replace Ph by P*
f(end)=f_Ps;
end
end
end
% reorder and display iteration output
SIMPLEX(:,end+1)=f';
SIMPLEX=sortrows(SIMPLEX,n+1);
f=SIMPLEX(:,end)';
SIMPLEX(:,end)=[];
error(ite)=f(1);
t(ite)=ite;
% termination condition for neural network training (the listing is truncated in the source)
% if f(1) < ...