Neural Enhancement for Multiobjective Optimization

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Aaron Garrett

Certificate of Approval:

John Hamilton
Associate Professor
Computer Science and Software Engineering

Gerry Dozier, Chair
Associate Professor
Computer Science and Software Engineering

Cheryl Seals
Associate Professor
Computer Science and Software Engineering

Joe F. Pittman
Interim Dean
Graduate School

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 10, 2008

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author

Date of Graduation

Vita

Aaron Lee Garrett, son of Jerry Wayne Garrett and Sheila (Reece) Garrett, was born August 11, 1977, in Rome, Georgia. He graduated from Spring Garden High School as salutatorian in 1995. He attended Jacksonville State University in Jacksonville, Alabama, and graduated summa cum laude with a Bachelor of Science degree in Mathematics in December, 1999. He immediately entered into graduate school at Jacksonville State University and received a Master of Science degree in Computer Science in August, 2002. He was then hired by the Computer Science department at Jacksonville State University, where he currently holds a faculty position. He married Ashley Lynn Tawbush, daughter of John and Patricia (Carr) Tawbush, on May 31, 2003. In September of 2004, he was admitted into the Ph.D.
program in Computer Science at Auburn University. On January 17, 2008, he and his wife Ashley became the proud parents of their son Ian Bishop Garrett.

Dissertation Abstract

Neural Enhancement for Multiobjective Optimization

Aaron Garrett
Doctor of Philosophy, May 10, 2008
(M.S., Jacksonville State University, 2002)
(B.S., Jacksonville State University, 1999)

219 Typed Pages

Directed by Gerry Dozier

In this work, a neural network approach is applied to multiobjective optimization problems in order to expand the set of optimal solutions. The network is trained using results obtained from existing evolutionary multiobjective optimization approaches. The network is then evaluated based on its performance against those same approaches when given more processing time. Results are collected on a set of well-known benchmark multiobjective problems, and the network's performance is evaluated using various indicators from the multiobjective optimization literature.

Preliminary experiments reveal the viability of this approach for expanding the set of solutions to multiobjective problems. Further experiments show that it is possible to train the neural network in a reasonable time using heuristic methods. The results of this training approach are shown to be very competitive with the underlying evolutionary multiobjective optimization approach that was used to produce the training set. Additional experiments reveal the applicability of this approach across existing multiobjective optimization approaches.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "auphd"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style file auphd.sty.

Table of Contents

List of Figures
List of Tables

1 Introduction
1.1 Multiobjective Optimization
1.1.1 An Example
1.1.2 Aggregation-based Approach
1.1.3 Criterion-based Approach
1.1.4 Pareto-based Approach
1.2 Evolutionary Computation
1.2.1 Genetic Algorithms
1.2.2 Evolutionary Programming
1.2.3 Evolution Strategies
1.2.4 Particle Swarm Optimization
1.2.5 Application to Multiobjective Optimization
1.3 Neural Computation
1.3.1 Unsupervised Learning
1.3.2 Supervised Learning
1.3.3 Neural Network Basics
1.4 Neural Enhancement
2 Literature Review
2.1 Evolutionary Multiobjective Optimization
2.1.1 Vector Evaluated Genetic Algorithm
2.1.2 Multiobjective Genetic Algorithm
2.1.3 Niched Pareto Genetic Algorithm
2.1.4 Nondominated Sorted Genetic Algorithm
2.1.5 Pareto Archived Evolution Strategy
2.1.6 Strength Pareto Evolutionary Algorithm
2.1.7 Pareto Envelope-based Selection Algorithm
2.1.8 Micro-Genetic Algorithm for Multiobjective Optimization
2.1.9 Multiobjective Particle Swarm Optimization
2.1.10 ParEGO
2.2 Neural Computation
2.2.1 Perceptron
2.2.2 Backpropagation Feed-forward Networks
2.2.3 Radial Basis Function Networks
2.2.4 General Regression Neural Networks
2.3 Applications of Learning to Multiobjective Optimization
3 Neural Enhancement for Multiobjective Optimization
3.1 Test Suite
3.1.1 OKA2
3.1.2 DTLZ1a
3.1.3 DTLZ2a
3.1.4 DTLZ4a
3.1.5 SDFLP
3.2 Performance Indicator
3.3 NEMO - The Neural Enhancer
3.4 Results
3.4.1 DTLZ1a
3.4.2 DTLZ2a
3.4.3 DTLZ4a
3.4.4 OKA2
3.4.5 SDFLP
3.5 Conclusions
4 Neural Enhancement Training Algorithm
4.1 Test Suite
4.1.1 KNO1
4.1.2 OKA1
4.1.3 OKA2
4.1.4 VLMOP2
4.1.5 VLMOP3
4.1.6 DTLZ1a
4.1.7 DTLZ2a
4.1.8 DTLZ4a
4.1.9 DTLZ7a
4.1.10 SDFLP
4.2 Performance Assessment
4.2.1 Training Time
4.2.2 Yield Ratio
4.2.3 Spacing
4.2.4 Hypervolume Indicator
4.2.5 Binary ε+ Indicator
4.3 Experimental Setup
4.3.1 Evolve I/O and Evolve Single Sigma
4.3.2 Heuristic I/O and Evolve Single Sigma
4.3.3 Heuristic I/O and Evolve Multiple Sigmas
4.3.4 Heuristic I/O and Heuristic Single Sigma
4.3.5 Heuristic I/O and Heuristic Multiple Sigmas
4.4 Results
4.4.1 Evolve I/O and Evolve Single Sigma
4.4.2 Heuristic I/O and Evolve Single Sigma
4.4.3 Heuristic I/O and Evolve Multiple Sigmas
4.4.4 Heuristic I/O and Heuristic Single Sigma
4.4.5 Heuristic I/O and Heuristic Multiple Sigmas
4.5 Conclusions
5 Neural Enhancement Using Other Foundational Optimization Algorithms
5.1 Test Suite
5.2 Performance Assessment
5.3 Experimental Setup
5.3.1 NSGA-II
5.3.2 PAES
5.3.3 MOPSO
5.4 Results
5.4.1 NSGA-II versus NEMO_NSGA-II
5.4.2 NSGA-II versus NEMO_MOPSO
5.4.3 NSGA-II versus NEMO_PAES
5.4.4 MOPSO versus NEMO_NSGA-II
5.4.5 MOPSO versus NEMO_MOPSO
5.4.6 MOPSO versus NEMO_PAES
5.4.7 PAES versus NEMO_NSGA-II
5.4.8 PAES versus NEMO_MOPSO
5.4.9 PAES versus NEMO_PAES
5.5 Conclusions
6 Conclusions and Future Work
Bibliography
Appendices
A Results from Training Algorithm Experiments

List of Figures

1.1 Example Multiobjective Problem
1.2 Example Minimization via Evolutionary Computation
2.1 Illustration of a Basic Perceptron
2.2 Illustration of the Exclusive-Or Function
2.3 Illustration of a Backpropagation Neural Network
3.1 Overlay of Pareto optimal fronts for DTLZ1a from NSGA-II and GRNN
3.2 Overlay of Pareto optimal fronts for DTLZ2a from NSGA-II and GRNN
3.3 Overlay of Pareto optimal fronts for DTLZ4a from NSGA-II and GRNN
3.4 Overlay of Pareto optimal fronts for DTLZ4a from NSGA-II and GRNN using small value
3.5 Overlay of Pareto optimal fronts for OKA2 from NSGA-II and GRNN
3.6 Overlay of Pareto optimal fronts for SDFLP from NSGA-II and GRNN
4.1 Comparison of Training Method Impact on Training Time with Standard Error Bars
4.2 Comparison of Training Method Impact on Yield Ratio with Standard Error Bars
4.3 Comparison of Training Method Impact on NEMO Spacing with Standard Error Bars
4.4 Comparison of Training Method Impact on NEMO Hypervolume with Standard Error Bars
4.5 Comparison of Training Method Impact on NSGA-NEMO Binary ε+ Indicator with Standard Error Bars
4.6 Comparison of Training Method Impact on NEMO-NSGA Binary ε+ Indicator with Standard Error Bars
4.7 EIO-ESS Results
4.8 EIO-ESS Efficient Set
4.9 HIO-ESS Results
4.10 HIO-ESS Efficient Set
4.11 HIO-EMS Results
4.12 HIO-EMS Efficient Set
4.13 HIO-HSS Results
4.14 HIO-HSS Efficient Set
4.15 HIO-HMS Results
4.16 HIO-HMS Efficient Set
5.1 Pareto Size Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.2 Spacing Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.3 Hypervolume Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.4 ε+ Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.5 Pareto Size Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.6 Spacing Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.7 Hypervolume Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.8 ε+ Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars
5.9 Pareto Size Indicators for NSGA-II and NEMO_PAES with Standard Error Bars
5.10 Spacing Indicators for NSGA-II and NEMO_PAES with Standard Error Bars
5.11 Hypervolume Indicators for NSGA-II and NEMO_PAES with Standard Error Bars
5.12 ε+ Indicators for NSGA-II and NEMO_PAES with Standard Error Bars
5.13 Pareto Size Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars
5.14 Spacing Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars
5.15 Hypervolume Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars
5.16 ε+ Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars
5.17 Pareto Size Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars
5.18 Spacing Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars
5.19 Hypervolume Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars
5.20 ε+ Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars
5.21 Pareto Size Indicators for MOPSO and NEMO_PAES with Standard Error Bars
5.22 Spacing Indicators for MOPSO and NEMO_PAES with Standard Error Bars
5.23 Hypervolume Indicators for MOPSO and NEMO_PAES with Standard Error Bars
5.24 ε+ Indicators for MOPSO and NEMO_PAES with Standard Error Bars
5.25 Pareto Size Indicators for PAES and NEMO_NSGA-II with Standard Error Bars
5.26 Spacing Indicators for PAES and NEMO_NSGA-II with Standard Error Bars
5.27 Hypervolume Indicators for PAES and NEMO_NSGA-II with Standard Error Bars
5.28 ε+ Indicators for PAES and NEMO_NSGA-II with Standard Error Bars
5.29 Pareto Size Indicators for PAES and NEMO_MOPSO with Standard Error Bars
5.30 Spacing Indicators for PAES and NEMO_MOPSO with Standard Error Bars
5.31 Hypervolume Indicators for PAES and NEMO_MOPSO with Standard Error Bars
5.32 ε+ Indicators for PAES and NEMO_MOPSO with Standard Error Bars
5.33 Pareto Size Indicators for PAES and NEMO_PAES with Standard Error Bars
5.34 Spacing Indicators for PAES and NEMO_PAES with Standard Error Bars
5.35 Hypervolume Indicators for PAES and NEMO_PAES with Standard Error Bars
5.36 ε+ Indicators for PAES and NEMO_PAES with Standard Error Bars

List of Tables

3.1 Parameters for Genetic Algorithm for Training NEMO
3.2 Summary of NEMO Results
4.1 NEMO Solutions for SDFLP Compared with NSGA-II (half)
5.1 Parameters for NSGA-II
5.2 Parameters for PAES
5.3 Pareto Size Indicators for NSGA-II and NEMO_NSGA-II
5.4 Spacing Indicators for NSGA-II and NEMO_NSGA-II
5.5 Hypervolume Indicators for NSGA-II and NEMO_NSGA-II
5.6 ε+ Indicators for NSGA-II and NEMO_NSGA-II
5.7 Pareto Size Indicators for NSGA-II and NEMO_MOPSO
5.8 Spacing Indicators for NSGA-II and NEMO_MOPSO
5.9 Hypervolume Indicators for NSGA-II and NEMO_MOPSO
5.10 ε+ Indicators for NSGA-II and NEMO_MOPSO
5.11 Pareto Size Indicators for NSGA-II and NEMO_PAES
5.12 Spacing Indicators for NSGA-II and NEMO_PAES
5.13 Hypervolume Indicators for NSGA-II and NEMO_PAES
5.14 ε+ Indicators for NSGA-II and NEMO_PAES
5.15 Pareto Size Indicators for MOPSO and NEMO_NSGA-II
5.16 Spacing Indicators for MOPSO and NEMO_NSGA-II
5.17 Hypervolume Indicators for MOPSO and NEMO_NSGA-II
5.18 ε+ Indicators for MOPSO and NEMO_NSGA-II
5.19 Pareto Size Indicators for MOPSO and NEMO_MOPSO
5.20 Spacing Indicators for MOPSO and NEMO_MOPSO
5.21 Hypervolume Indicators for MOPSO and NEMO_MOPSO
5.22 ε+ Indicators for MOPSO and NEMO_MOPSO
5.23 Pareto Size Indicators for MOPSO and NEMO_PAES
5.24 Spacing Indicators for MOPSO and NEMO_PAES
5.25 Hypervolume Indicators for MOPSO and NEMO_PAES
5.26 ε+ Indicators for MOPSO and NEMO_PAES
5.27 Pareto Size Indicators for PAES and NEMO_NSGA-II
5.28 Spacing Indicators for PAES and NEMO_NSGA-II
5.29 Hypervolume Indicators for PAES and NEMO_NSGA-II
5.30 ε+ Indicators for PAES and NEMO_NSGA-II
5.31 Pareto Size Indicators for PAES and NEMO_MOPSO
5.32 Spacing Indicators for PAES and NEMO_MOPSO
5.33 Hypervolume Indicators for PAES and NEMO_MOPSO
5.34 ε+ Indicators for PAES and NEMO_MOPSO
5.35 Pareto Size Indicators for PAES and NEMO_PAES
5.36 Spacing Indicators for PAES and NEMO_PAES
5.37 Hypervolume Indicators for PAES and NEMO_PAES
5.38 ε+ Indicators for PAES and NEMO_PAES
A.1 Evolve I/O and Single Sigma Indicators
A.2 Heuristic I/O and Evolve Single Sigma Indicators
A.3 Heuristic I/O and Evolve Multiple Sigmas Indicators
A.4 Heuristic I/O and Heuristic Single Sigma Indicators
A.5 Heuristic I/O and Heuristic Multiple Sigmas Indicators

Chapter 1
Introduction

Optimization problems are commonly found in real-world applications. However, many optimization problems involve multiple, often conflicting, objectives [1]. These multiobjective optimization problems are often much more difficult to solve than single-objective problems. Typically, solutions to such problems are actually sets of solutions, each of which represents a particular trade-off among the objectives in question [2]. The more elements in such a set, the more options are available as possible solutions [2]. Many multiobjective optimization techniques have been developed over the years [3], and, most recently, evolutionary computation approaches have been applied with great success [1].
Approaches like the vector evaluated genetic algorithm [1], the multiobjective particle swarm optimizer [4], the nondominated sorting genetic algorithm [5,6], and ParEGO [7] have all been shown to be effective. However, each of these approaches produces only a small number of nondominated solutions in a given number of function evaluations. Increasing the size of this solution set has proven to be an extremely difficult problem [8]. In this work, a neural network system is described that can, in essence, learn the areas where nondominated solutions are found. In this way, many such solutions can be generated without the need for a large number of function evaluations.

1.1 Multiobjective Optimization

A multiobjective optimization problem is defined as follows [1]:

    Minimize [f_1(x), f_2(x), ..., f_k(x)]

subject to the m inequality constraints

    g_i(x) <= 0,  i = 1, 2, ..., m

and the p equality constraints

    h_i(x) = 0,  i = 1, 2, ..., p

where k is the number of objective functions and f_i : R^n -> R. The vector x = [x_1, x_2, ..., x_n] is referred to as the vector of decision variables [1]. We wish to find the values x_1*, x_2*, ..., x_n* that yield the optimum values for all the objective functions.

Even with such a clean, mathematical definition, the difficulties that arise in multiobjective optimization are obvious. First and foremost, it is extremely unlikely (except in a few handpicked problems) that a single assignment of values for the decision variables will yield an optimum value for all the objective functions. Instead, decision variable assignments will represent "trade-offs" in the objective function values. The goal in multiobjective optimization is to present the decision maker with a wide range of possible trade-offs so that the best choice can be made for the situation at hand [2].

1.1.1 An Example

In order to understand the complexities of multiobjective optimization, an example is in order.
Suppose the following functions are to be minimized:

    f_1 = x^2
    f_2 = (x - 2)^2
    x in [0, 2]

These objective functions are depicted in Figure 1.1. The best trade-off appears to occur at x = 1.0, even though each objective function, if considered individually, would produce minimum values at x = 0.0 and x = 2.0, respectively. In this case, intuition may be somewhat misleading, depending on our definition of optimality in the context of multiple objectives. When there are multiple possible trade-offs, it is important to clearly define what it means for a solution to be optimal. The following subsections discuss three different approaches for such a definition.

[Figure 1.1: Example Multiobjective Problem - plots of f_1 and f_2 against the decision variable x]

1.1.2 Aggregation-based Approach

Single-objective optimization methods have existed for hundreds of years [3]. In order to leverage those methods, one approach for multiobjective optimization is to convert such a problem into a single objective. This is called an aggregation-based approach [2] or an aggregating function [1]. Typically, a weighted sum of the objective values is used as the new scalar optimization function. For example, given objective functions f_i(x) for i = 1, 2, ..., k, the aggregating function F(x) would be

    F(x) = sum_{i=1}^{k} w_i f_i(x)   where   sum_{i=1}^{k} w_i = 1.

Such an approach produces only one solution for a given choice of weights. Therefore, to produce multiple solutions (as is typically required for multiobjective optimization problems), the process must be repeated many times. Likewise, since good choices for the weight values are often unknown, those values are usually varied before each new run [1].
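To make the mechanics concrete, the weighted-sum aggregation described above can be sketched in a few lines of Python. This sketch is illustrative rather than code from this dissertation: the grid search standing in for a single-objective optimizer and the particular weight vectors are choices made here for the example.

```python
# Weighted-sum aggregation for the example problem of Section 1.1.1:
#   f1(x) = x^2,  f2(x) = (x - 2)^2,  x in [0, 2].
# Each weight vector collapses the two objectives into one scalar function
# F(x) = w1*f1(x) + w2*f2(x), which a single-objective optimizer can minimize.

def f1(x):
    return x ** 2

def f2(x):
    return (x - 2) ** 2

def aggregate(x, weights):
    """Scalar objective F(x) = sum_i w_i * f_i(x), with the w_i summing to 1."""
    return sum(w * f for w, f in zip(weights, (f1(x), f2(x))))

# A crude grid search stands in for the single-objective optimizer; each run
# (one weight vector) yields exactly one trade-off solution.
grid = [i / 1000 for i in range(2001)]  # x = 0.000, 0.001, ..., 2.000
for w1 in (0.25, 0.5, 0.75):
    weights = (w1, 1.0 - w1)
    best_x = min(grid, key=lambda x: aggregate(x, weights))
    print(weights, round(best_x, 3))
```

For this convex problem the minimizer of F works out to x = 2*w2, so equal weights recover the x = 1.0 trade-off noted in Section 1.1.1, and the whole procedure must be rerun with new weights for every additional trade-off solution.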
One of the greatest benefits of using aggregating functions, as mentioned above, is that there are many different single-objective optimization methods that can be applied to the transformed problem. However, there are drawbacks. One is, of course, the difficulty of choosing appropriate weights for each objective. An even greater difficulty is that these weighted sums can only generate portions of the Pareto optimal front that are convex [9]. If the Pareto optimal front is non-convex, then a weighted-sum approach will fail to find the points that lie on the concave portion of the front [9]. Additionally, an even spread of weights (for multiple runs of the optimizer) will often fail to produce an even spread of points on the Pareto optimal front [9]. In fact, only curves fitting a specific shape will produce an even spread of solutions given an even spread of weights [9]. There are also non-linear aggregation-based approaches that have been used for multiobjective optimization. These approaches are typically not limited by the convexity of the Pareto optimal front but have been criticized by the multiobjective community [10], possibly because of the current emphasis on Pareto preference.

1.1.3 Criterion-based Approach

Another approach to turn multiobjective problems into single-objective problems is to consider only one objective at a time. This is referred to as a criterion-based approach [2] or lexicographic ordering [10]. In the simplest case, the objectives are ranked in order of importance. The one that is considered most important is optimized first as a single-objective problem. Then, the next objective is optimized without decreasing the quality of the first. This process is repeated until all objectives have been considered. In a more complex evolutionary approach, criterion-based methods can be used to vary which objective function will determine the selection of an individual [2].
In such cases, the choice of objectives can be random or probabilistic according to a user-defined set of probabilities, or those probabilities can be varied during the run of the algorithm [2].

1.1.4 Pareto-based Approach

Since solutions to multiobjective optimization problems actually represent trade-offs between some objectives in favor of others, there is no clear way to define what is meant by a "good" solution. To overcome this difficulty, another type of metric is used, known as Pareto preference [10]. Using this metric, a solution s_1 is said to dominate another solution s_2 if s_1 is no worse than s_2 in any objective and if s_1 is strictly better than s_2 in at least one objective. In mathematical terms, solution s_1 is said to dominate solution s_2 if and only if

    for all i, 1 <= i <= k: f_i(s_1) <= f_i(s_2)
    and there exists p, 1 <= p <= k: f_p(s_1) < f_p(s_2).

The set of all globally nondominated solutions for a particular multiobjective problem is referred to as the Pareto optimal set or the efficient set [10]. The image of the Pareto optimal set under the objective functions is called the Pareto optimal frontier (or often just the Pareto optimal front) [10]. Finally, since the global Pareto optimal set is often unknown, most multiobjective optimization approaches actually produce a Pareto set approximation [2]. In this work, the term "Pareto optimal front" is used to refer both to the Pareto optimal frontier and to its approximation, where the particular meaning will be clear from the context.

1.2 Evolutionary Computation

The beginnings of evolutionary computation (EC) can be traced back as far as the late 1950s, but it was the last two decades that saw truly exponential increases in the amount of attention given to the field [11]. EC can be thought of more as a problem-solving strategy than a one-size-fits-all tool [11,12]. At its core, an EC attempts to mimic the biological process of evolution in order to solve a given problem [13].
This, consequently, suggests that there should be a simulated analog to the biological process of reproduction and some measure of or mechanism for survivability. Indeed, these must be defined for any EC. However, before these processes can be defined, it is necessary first to define what makes up an individual. An individual is simply a candidate solution. As such, it can be represented in any way that the designer chooses, and most representations are problem-specific. The manner in which a candidate solution is represented is known as the genotype [11] (in keeping with the biological analogy). Since two or more solutions must be compared with one another in order to determine which is "better", it is necessary to also have some measure of "goodness", which in the EC literature is known as the fitness of an individual [11]. The fitness is a measure of how well one candidate solution solves a problem, sometimes in comparison to the performance of other candidate solutions. Usually, candidate solutions cannot be compared (and thus their fitness values cannot be assessed) by simply looking at the genotype values. Instead, the candidate solution undergoes a transformation to convert its genotype into the corresponding phenotype [11], which can be thought of as its representation in the fitness space.

To make this discussion more clear, an example is in order. Suppose we wish to find the real value -10 <= x <= 10 that minimizes the function

    y = f(x) = x^2.

(Clearly, the minimum value will occur when x = 0.) In the simplest case, we would say that the x value represents the genotype of the candidate solution, and the set of all candidate solutions is simply the line from -10 to 10. The phenotype for a given x value, in this case, is also that same x value. (In other words, the mapping function between genotype space and phenotype space is simply the identity function.)
The phenotype space is simply all of the possible values for x, which in this case is the curve representing f(x) = x². The fitness for a candidate solution x is simply the value of y at that point, y = x². These concepts are illustrated in Figure 1.2. Once the representation is chosen, the evolutionary operators must be specified. These operators define the mechanisms of variation and selection [11]. Variation determines how new solutions are created from existing solutions (i.e., reproduction) and how existing solutions are modified to explore new areas of the search space (i.e., mutation). Selection determines how solutions are chosen to remain viable candidates or to participate in the creation of new solutions. In some cases, one or more of these operators are simply identity functions, which means that they have no actual function. However, when viewed in this way, all of the EC approaches are simply variations on this basic theme [11].

Figure 1.2: Example Minimization via Evolutionary Computation

The basic EC is presented in Listing 1.1. Here, Pt is a set of candidate solutions (the population) at time t. Typically, the population is of a fixed size. The Evaluate() function performs the mapping from genotype to phenotype space and assigns appropriate fitness values to each individual in the population. The Variation() function takes the current population and produces a set of new individuals, P′t. This function need not make use of all of the individuals in the current population. Often, only a subset are chosen (internal to the function) and used for variation. Finally, the Select() function performs the selection of individuals (chosen from the union of the original and modified individuals) which will make up the population in the next generation.

ALGORITHM EvolutionaryComputation()
  t ← 0
  Initialize(Pt)                // Pt represents the population at time t
  Evaluate(Pt)
  functionEvaluations ← Size(Pt)
  WHILE functionEvaluations < MAX_FUNCTION_EVALUATIONS LOOP
    P′t ← Variation(Pt)         // P′t represents the offspring
                                // of the population at time t
    Evaluate(P′t)
    functionEvaluations ← functionEvaluations + Size(P′t)
    Pt+1 ← Select(Pt ∪ P′t)
    t ← t + 1
  END LOOP
Listing 1.1: Basic Evolutionary Computation

An important point of which to take note here is that the number of function evaluations is critical when comparing two evolutionary computation algorithms. A function evaluation is simply one mapping from genotype to fitness. (The mapping is often called the evaluation function or fitness function, so the term "function evaluations" is appropriate.) That is the typical processing unit used in the EC literature when referring to the efficiency of a particular algorithm. According to Bäck et al. [11], the majority of current evolutionary computation implementations come from three different, but related, areas: genetic algorithms [14-17], evolutionary programming [11,18,19], and evolution strategies [19,20]. However, as stated earlier, each area is defined by its choice of representation and/or operators. These choices are discussed in the following sections.

1.2.1 Genetic Algorithms

Genetic algorithms were first proposed by John Holland in the 1960s [14] in an attempt to leverage the power of natural selection to solve difficult optimization problems. In the canonical genetic algorithm, sometimes referred to as the simple genetic algorithm (SGA) [15,17], the genotype for a candidate solution is represented as a binary string. The evaluation function is then a mapping from a binary string to some real-valued measure of fitness (as determined from the phenotype). As in all EC approaches, the fitness function is entirely problem-dependent. The selection operator used in an SGA is fitness proportionate selection [15], sometimes called "roulette wheel" selection.
In this type of selection, the probability of selecting a particular individual increases proportional to that individual's fitness. In some sense, the size of the individual's "roulette wheel section" increases with its fitness. In this way, the more fit an individual, the more likely it is to survive. However, this operator is stochastic, so it is possible that even the strongest individual will not survive (and, likewise, the weakest individual may survive). A similar yet slightly more complex selection procedure is remainder stochastic selection [15]. In this type of selection, any individual with an above average fitness is guaranteed to participate in creating the next generation of individuals, and the remaining slots are randomly assigned to individuals in a manner proportional to how well their fitness values compare to the average fitness. In this setup, the objective fitness for an individual (as determined by the fitness function) is modified to produce a relative fitness value as follows:

fri = fi / f̄

where fri is the relative fitness of individual i, fi is the objective fitness of individual i, and f̄ is the average fitness of the current population. Another popular selection method is tournament selection [15]. In this type of selection, a subset of candidate solutions is chosen from the existing population against each of which the individual in question is compared (as if that individual is competing in a single-elimination tournament against all individuals in the subset). If the individual's fitness is better than all candidate solutions in the subset, then that individual is selected. The size of the subset (often called the tournament size [15]) can be used to control the selection pressure [15]. For instance, suppose that a particular individual has a relative fitness of 2.36. Then that individual automatically gets 2 copies (from the whole part of the relative fitness) that are allowed to participate in the reproduction phase.
Additionally, the individual has a 36% chance of being selected again to fill any remaining slot. In contrast, an individual with a relative fitness of 0.75 would have no guaranteed copies in the reproduction phase, but it would have a 75% chance to fill any remaining slots. After selection is performed, the participating individuals undergo variation through crossover and mutation. Each of these operators is performed according to some probability of occurrence (typically denoted pc and pm, respectively) that must be specified as parameters to the system. The variation operators used in an SGA are single-point crossover [15] and bit-flip mutation [15]. In single-point crossover, two individuals (i.e., binary strings) are chosen, along with a single recombination point that determines the position in each string that will be "cut". The individuals are then recombined at that point to form two new individuals. This can be understood more clearly in the following example (where the vertical bar represents the recombination point):

Parent A: XXXXXXX | XX
Parent B: YYYYYYY | YY
Child 1:  XXXXXXXYY
Child 2:  YYYYYYYXX

This operation is applied to randomly selected parents with probability pc, which is typically set to be a fairly high (e.g., 0.75) value. Bit-flip mutation simply means that each bit in a newly created binary string is changed to the opposite value with probability pm, which is typically set to be a very low (e.g., 0.01) value. The resultant population is made up entirely of the newly-created offspring. This is known as generational replacement [11], which means that no individuals from the previous generation are allowed to survive to the succeeding generations. This type of selection strategy (where here we mean survivor selection instead of parent selection) can be augmented with elitism [11], which would allow some proportion (as determined by system parameters) of the most fit individuals to survive into the next generation.
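The two SGA variation operators described above can be sketched as follows. This is a minimal illustration; the binary strings are represented as lists of 0/1 integers, and the parent values and cut point mirror the crossover example in the text.

```python
import random

# Single-point crossover and bit-flip mutation on binary strings,
# as used in the simple genetic algorithm (SGA).

def single_point_crossover(a, b, point):
    """Cut both parents at `point` and swap the tails."""
    return a[:point] + b[point:], b[:point] + a[point:]

def bit_flip_mutation(bits, pm=0.01):
    """Flip each bit independently with probability pm."""
    return [(1 - b) if random.random() < pm else b for b in bits]

parent_a = [1] * 9   # plays the role of XXXXXXXXX
parent_b = [0] * 9   # plays the role of YYYYYYYYY
child1, child2 = single_point_crossover(parent_a, parent_b, 7)
print(child1)  # [1, 1, 1, 1, 1, 1, 1, 0, 0]
print(child2)  # [0, 0, 0, 0, 0, 0, 0, 1, 1]
```

In a full SGA, crossover itself would be applied only with probability pc before mutation is applied to the resulting children.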
Additionally, some genetic algorithms make use of steady-state replacement [11], in which only one offspring is created in a given generation, and this offspring always replaces the least-fit individual in the current population.

1.2.2 Evolutionary Programming

In the early 1960s, Lawrence Fogel attempted to use simulated evolution, which he called Evolutionary Programming (EP), to create artificial intelligence [18,19]. In this seminal work, finite state machines (FSMs) were evolved to predict future symbols from a given input stream [19]. Using a FSM representation of the individuals in the population required novel variation operators. The following operators were used in the work: changing an output symbol, changing a state transition, adding a state, deleting a state, and changing a state. The fitness of a given FSM was determined by how accurate its predictions were, given the sequence of input symbols. More recently, EP approaches have been applied to real-valued, continuous optimization problems, but these approaches are similar to the approaches used in Evolution Strategies [19], which are discussed below.

1.2.3 Evolution Strategies

At the same time that Holland was developing the genetic algorithm, Rechenberg was independently discovering a technique for using natural selection for optimization problems, which he termed evolution strategies [20]. The simplest version of an evolution strategy (ES) is what is known as a two-membered ES [20] or, more commonly, a (1+1)-ES. In this scenario, a single individual, represented as a vector of real values, comprises the population. At each generation, that individual is mutated (the variation operator) along each dimension using a Gaussian distribution with zero mean and a given variance (provided as a parameter to the system) to produce an offspring.
The fitness values for both the parent and the offspring are compared, and the better of the two individuals is allowed to become the parent in the next generation. It was discovered [20] that online adjustment of the mutation rate (i.e., the variance of the normal distribution) could provide better performance. This online adjustment is known as the 1/5 success rule [20], which states that around 1/5 of the mutations should be successful. If the actual number of successful mutations is greater than 1/5, increase the variance. If the number is less than 1/5, decrease the variance. In addition to online adjustment of the variance, more sophisticated versions of evolution strategies can also include the particular variance as a part of the genotype to be evolved [20]. It is also possible to evolve and use a different variance along each dimension of the problem [20], thus allowing the search for a solution to conform more appropriately to the topology of the search space. When variances are included in the genotype, an additional parameter is needed to serve as the variance used to mutate the evolved variances. The (1+1)-ES did not truly make use of the idea of a population of individuals, so this concept was generalized and extended to yield the (μ + 1)-ES [20]. In this system, a population of μ individuals is maintained in each generation. Additionally, a reproduction operator is included that selects two (or more) individuals from this population and recombines them to form a new individual. This recombination is simply a random selection of each component from the parents. Once the new individual is created, it undergoes mutation as mentioned above. Finally, each offspring is added to the population if it is better than the least fit individual, producing the new population for the next generation. This approach can be and has been [20], of course, extended to a (μ + λ)-ES, in which μ individuals produce λ offspring. The best μ individuals of the μ + λ
individuals are then chosen to survive. It is also possible to provide somewhat of an analog to the generational replacement of a GA within an ES. This approach is known as a (μ, λ)-ES (where λ must be greater than or equal to μ) [20]. In such a scenario, the μ individuals are used to create λ offspring, and from those offspring only, μ individuals are chosen to comprise the population in the next generation.

1.2.4 Particle Swarm Optimization

In addition to the evolutionary computation techniques described above, another nature-inspired optimization algorithm has also been applied to multiobjective optimization problems. A technique called Particle Swarm Optimization (PSO) was developed by Kennedy and Eberhart in 1995 [21]. Inspired by the movement of bird flocks and insect swarms, they attempted to develop a model of swarm behavior that could be used to solve optimization problems. To create the analogy, they imagined a flock of birds flying around in search of a corn field. Each bird was capable of remembering the best location it had found, and each was capable of knowing the best location that any of the birds had found. The birds were allowed to fly around while being pulled toward both their individual best locations and the flock's best location. Kennedy and Eberhart found that their simulation of this analogy produced very realistic-looking behavior in their virtual flocks [21]. In the PSO model presented in [21] and expanded in [22], each particle is composed of three vectors: x, p, and v. These represent the particle's current location, best location found, and velocity, respectively. These vectors are each of the same dimensionality as the search space. Additionally, each particle maintains two values, one corresponding to the fitness of the x vector and the other to the fitness of the p vector.
As the particles in the swarm move around the search space, their velocities are first updated according to the following equation:

vid = vid + φ1 R1 (pid − xid) + φ2 R2 (gid − xid)

In this equation, vid is the velocity of the ith particle along the dth dimension. The g vector represents the best location found by the flock, and R1 and R2 are uniform random values chosen from the interval [0, 1]. Finally, φ1 and φ2 are two constants that control the influence of the personal best and the global best locations, respectively, on the particle's velocity. These values are often referred to as cognitive and social learning rates, respectively [22]. After the velocity vector is updated, the particle's location is updated by applying the following equation:

xid = xid + vid

At this point, the new location's fitness is evaluated and compared to the fitness of the particle's personal best location. If the new location is better, then it also becomes the new personal best location for the particle. The topology for a swarm refers to the structure of the neighborhood for each particle. In a star topology, all the particles exist in the same neighborhood, so the global best vector represents the best location found by any particle in the swarm. In contrast, a ring topology arranges the particles into overlapping neighborhoods of a fixed size. The global best vector in this type of topology represents the best location found by any particle in that particle's neighborhood. In 1999, Maurice Clerc introduced an improvement to the equation for updating the velocity of a particle [23]. He introduced a constant to be multiplied to the new velocity before updating the location of the particle. He called this constant the constriction coefficient [23]. The calculation of this coefficient is as follows:

K = 2 / |2 − φ − √(φ² − 4φ)|

In this equation, φ = φ1 + φ2 and φ > 4. The constriction coefficient is used to restrain the velocity vector of each particle so that it does not grow unbounded.
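The two update equations above can be sketched as a single PSO iteration. This is a minimal illustration assuming a star topology without constriction; the sphere objective, the search bounds, and the parameter values (φ1 = φ2 = 2.0) are assumptions for the sketch, not prescriptions from this work.

```python
import random

def sphere(x):
    return sum(c * c for c in x)  # fitness to minimize

def pso_step(xs, vs, ps, g, phi1=2.0, phi2=2.0):
    """Apply the velocity and position updates, then refresh personal bests."""
    for i in range(len(xs)):
        for d in range(len(xs[i])):
            r1, r2 = random.random(), random.random()
            vs[i][d] += phi1 * r1 * (ps[i][d] - xs[i][d]) \
                      + phi2 * r2 * (g[d] - xs[i][d])
            xs[i][d] += vs[i][d]
        if sphere(xs[i]) < sphere(ps[i]):  # better location found
            ps[i] = list(xs[i])
    return min(ps, key=sphere)             # star topology: swarm-wide best

xs = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
vs = [[0.0, 0.0] for _ in range(10)]
ps = [list(x) for x in xs]
g = min(ps, key=sphere)
start = sphere(g)
for _ in range(50):
    g = pso_step(xs, vs, ps, g)
print(sphere(g) <= start)  # True: personal bests can only improve
```

Since personal bests are only replaced by strictly better locations, the global best fitness is monotonically non-increasing even when individual velocities grow large, which is exactly the instability the constriction coefficient is meant to curb.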
Finally, various other models have been proposed as alternatives to the so-called full model presented above [24]. The cognitive-only model sets φ2 to 0, while the social-only model sets φ1 to 0. A selfless model was also developed which was identical to the social-only model except that a particle's personal best was not included in the search for that particle's neighborhood's global best [24].

1.2.5 Application to Multiobjective Optimization

The first application of evolutionary computation to the problem of multiobjective optimization was Schaffer's Vector Evaluated Genetic Algorithm (VEGA) [25] in 1985 [26]. Since then, evolutionary multiobjective optimization has been an incredibly active area of research [26]. This is due to the natural union of evolutionary computation and multiobjective optimization, which stems from the fact that EC algorithms are generally very good optimizers and they work simultaneously on a set of candidate solutions (i.e., the population) [26]. Since multiobjective optimization problems typically require a set of solutions, rather than just a single solution, the final population from an EC algorithm provides just such a set. The next chapter will discuss in detail several of the most popular evolutionary multiobjective (EMO) algorithms.

1.3 Neural Computation

Also inspired from natural phenomena, neural computation attempts to model the connectionist learning that occurs within the brain in order to produce a computational learning/prediction system [27]. Examples of such systems include simple perceptrons [28,29], backpropagation feed-forward artificial neural networks [30], radial basis function networks [31], and general regression neural networks [32]. Neural network approaches can accomplish unsupervised tasks, such as clustering and data mining, as well as supervised tasks, such as function approximation and prediction.
1.3.1 Unsupervised Learning

Unsupervised learning [27] simply refers to learning in which the system is not given the "right answer." For example, it is not trained on pairs consisting of an input and the desired output. Instead, the system is given a set of input patterns and is left to find interesting patterns, regularities, or clusterings among them. In these systems, the inputs are clustered together into categories according to how similar they are to the other inputs in the dataset. Inputs that are similar enough (this similarity threshold is generally controlled by parameters to the system) to one another are grouped into the same category. The output from an unsupervised system for a given input is generally the category into which the input was assigned. For this reason, an unsupervised system can be thought of as a mapping from inputs to categories. There are a number of neural network systems that perform unsupervised learning. Two examples include self-organizing maps [33] and Adaptive Resonance Theory neural networks [34].

1.3.2 Supervised Learning

Supervised learning [27] occurs when the learning algorithm is provided with a set of inputs along with the corresponding correct outputs. The learning in this case involves an algorithm that compares the system's output with the correct (supervised) output. Hence, the system is provided with a measure of error. Via many different learning approaches, the system can be modified so as to minimize this error in order to produce correct outputs for given inputs. Many neural network systems, such as backpropagation networks [30] and general regression neural networks [32], make use of supervised learning.

1.3.3 Neural Network Basics

There are millions of neurons within the human brain [35], and each of these neurons is connected to countless others. Thus, among the millions of neurons there are billions of connections.
With this arrangement, humans have the ability to learn enormous amounts of information without forgetting things previously learned. In light of the current state of artificial intelligence, this is a truly remarkable feat of evolution and biology. In an effort to simulate such a learning mechanism, computer scientists have developed what are referred to as neural networks, which are simply mathematical models, in the hopes that software agents that embed this behavior can perform similarly to their biological counterparts. To more adequately understand the model, however, it is important first to understand the biological mechanism of learning.

Biological Neurons

Each neuron has three basic parts: the axon, the soma, and the dendrites [35]. The axon of a neuron is a long fibrous connector that carries the output of the neuron. The soma is the cell body of the neuron. The dendrites are the receivers of the input from other neurons. Axons are connected to the dendrites of many other neurons. Each axon is separated from its connecting dendrite by a small gap known as a synapse [35]. Within the axon, chemicals called neurotransmitters are stored [35]. When a signal is to be passed from one neuron to another, a signal is sent from the soma down the axon, which triggers the release of these neurotransmitters. They then flow across the synapse where they bind to receptors on the dendrites of other neurons. If sufficient neurotransmitters are collected by the synapse, a signal is sent to the soma of the receiving neuron [35]. It is important to note that the firing pattern of neurons is "all or none", which means that their output can be considered binary. When novel inputs trigger learning within neurons, those neurons undergo a change. The efficiency of transmission of signals between neurons is increased or decreased in order to minimize the error [35].
This modulation of the transmission efficiency is accomplished by increasing or decreasing the amount of neurotransmitter found in the synapse between neurons. To illustrate this idea, imagine that each neuron, rather than conducting electrochemical signals, actually conducted water. Each neuron would receive water through its dendrites, and it would transmit water through its axon. (Imagine that there are tiny valves at the end of the axon that connect the neuron to the dendrites of its neighbors.) When learning occurs within a neuron, the flow of water from that neuron to its neighbors is modulated by changing the state of those valves. The more open the valve is, the more water will be passed on to the neighboring neurons. In this manner, learning may occur within the brain by changing the strength of signal transmission between neurons.

Artificial Neurons

If each neuron is considered to be an information-processing unit, then the human brain can be thought of as a massively parallel computing machine [35]. Neural networks are software implementations of the neural dynamics and connectivity within the human brain [35]. A neural network is made up of neurons and the weighted connections between them. These weighted connections provide the capability of modulating the signal transmission between neurons. If the weight is high (e.g., the valve is completely open), then the signal transmission from that neuron will be very strong. On the other hand, if the weight is low (e.g., the valve is nearly closed), then the signal transmission from that neuron will be very weak. Thus, by modifying these weights, the network is able to learn to perform a given task. Neural networks have two major components [35]: a function that determines at which point a neuron should fire, known as an activation (or transfer) function, and a method for updating the weights, often called the learning rule for the network.
The activation function defines how the neuron responds to the net input, which is simply the weighted sum of all inputs into the neuron. For instance, it determines whether or not the neuron should fire given a particular input. The learning rule takes the inputs, the output, and the expected output as parameters. If the expected output is not known, as it would be for unsupervised networks, it is estimated in some way. The function then updates the weights in such a way as to associate the given input with the expected (or estimated) output.

1.4 Neural Enhancement

To perform the neural enhancement [8], a neural network is created using some of the decision variables in the Pareto optimal set to predict the values for the remaining decision variables in the set. In essence, the neural network is used to learn the mapping, in the Pareto optimal set, from some subset of the decision variables to the remaining decision variables. To more clearly understand the approach, consider the following analogy. In the deserts of the southwestern United States, there are many plateaus and canyons. In the deepest spots of these canyons lay hidden wealth that needed only to be mined. Once news of these riches was released, thousands of people began to flock into the desert in search of their fortunes. Each one knew that wealth was only found in the deepest spots, so they quickly and carefully made their way down from the plateaus into the canyons. When they were done exhausting each mine in an area, they would begin the process again, moving downward if possible and moving towards their neighbors if not. This, as we will see, is somewhat similar to how the EMO approaches operate. Multiple people are searching for wealth in parallel, but each is unable to truly take advantage of the terrain information. One day, word of the new gold rush reached a young man named Nemo, who also was interested in making his fortune.
He studied the reports, paying attention to the locations in which mines had previously been found. After a little consideration, he realized that, perhaps, all of the mines lay along some natural formation. More importantly, the reports had given him enough information to make a crude map of the formation that he could follow. So he set off for the desert and began to search. However, rather than begin on the plateau and work his way down, he began in the canyons (the formation he had discovered) and followed his own map. Since the map was not completely accurate, he found himself climbing hills from time to time, but overall he was able to search the canyons more efficiently because he knew how to follow them. This is similar to the manner in which the neural enhancement system discussed in this work searches for solutions to multiobjective problems.

Chapter 2
Literature Review

This chapter provides the relevant background material from the evolutionary multiobjective optimization (EMO) and neural computation literature.

2.1 Evolutionary Multiobjective Optimization

There have been many different evolutionary approaches to multiobjective optimization proposed in the past two decades [36]. A small subset of these approaches has been chosen for inclusion here. The algorithms presented below either have historical significance or are commonly used at present as EMO techniques. For a more comprehensive survey of EMO approaches, the reader should refer to [1,2,10,26,36-38].

2.1.1 Vector Evaluated Genetic Algorithm

The first application of evolutionary computation to multiobjective optimization was Schaffer's Vector Evaluated Genetic Algorithm (VEGA) [25]. His approach was to modify the simple GA to allow for multiple objective values. To do this, the selection procedure was modified so that, given q objective functions, a population of N individuals would generate q subpopulations of size N/q, each using a particular objective function as its fitness measure.
Then, these subpopulations would be shuffled together to form the new population for the next generation. This approach has been shown [39] to be equivalent to a linear aggregation approach where the weighting coefficients depend on the current population [37]. Additionally, the shuffling of the subpopulations tends to encourage what Schaffer called "speciation," which means that the algorithm prefers individuals who are particularly good at one or more of the objectives, rather than individuals that represent a balanced trade-off.

2.1.2 Multiobjective Genetic Algorithm

Inspired by the original VEGA concept, Fonseca and Fleming [37] developed a Multiobjective Genetic Algorithm (MOGA) that performs selection based on "rank," where the rank of an individual is defined to be one more than the number of individuals that dominate it. (Thus, all nondominated individuals have rank 1.) Additionally, the authors introduce an application of fitness sharing [40] for multiobjective optimization in order to maintain diversity. Fitness sharing in this context means that individuals who are "near" one another must share the resources (i.e., the fitness value) for that area. This implies that highly fit individuals in less crowded areas will be preferred. In order to determine "nearness", a distance metric is employed (usually Euclidean) and a user-defined parameter, σ_share, is used to determine at what distance two individuals are considered to be "near enough" to share the resources.

2.1.3 Niched Pareto Genetic Algorithm

In 1994, Horn et al. [41] developed a genetic algorithm for multiobjective optimization that they termed the Niched Pareto Genetic Algorithm (NPGA). It uses a tournament-type selection where two individuals are compared based on whether or not each dominates or is dominated by a set of other individuals in the population. If there is a tie, fitness sharing is used to determine the winner.
Increasing or decreasing the size of this comparison set allows the selection pressure to be tuned similar to the tournament size in regular tournament selection. The niching in this algorithm is done through fitness sharing, where a distance function is used along with a neighborhood kernel (much like that used in a General Regression Neural Network [32]). The fitness of an individual is inversely proportional to the number of individuals "near" it. The algorithm was compared against VEGA [25] on Schaffer's F2 function [25], and it was also applied to a real-world problem of optimum placement of well locations to detect groundwater contamination.

2.1.4 Nondominated Sorting Genetic Algorithm

The nondominated sorting genetic algorithm (NSGA) [5] was first introduced in 1994 by Srinivas and Deb as an evolutionary multiobjective optimizer. The NSGA algorithm operated like a simple genetic algorithm except for the selection mechanism. There, the individuals were ranked according to Pareto preference using multiple passes (i.e., the first pass ranks all truly nondominated points, the second pass ranks all nondominated points from the remainder, etc.). In each pass, the nondominated points are assigned a dummy fitness value and points with the same fitness value undergo fitness sharing. In 2002, Deb et al. introduced an improvement to NSGA that they termed NSGA-II [6]. NSGA-II uses a sorting routine that runs in O(MN²) time, rather than O(MN³) (where M is the number of objectives and N is the population size). This routine is made possible by the use of a new domination operator that relies on the idea of crowding distance instead of using fitness sharing. This crowding distance ensures that the solutions adequately fill the objective space. Finally, NSGA-II uses elitism, which allows good individuals to continue to survive and reproduce.
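The multi-pass Pareto ranking that NSGA performs can be sketched directly from the description above: each pass peels off the currently nondominated points as the next front. This simple version is O(N³)-ish per the naive pairwise checks; the sample points are illustrative, and minimization is assumed.

```python
# Multi-pass Pareto ranking: front 0 holds the nondominated points,
# front 1 the points nondominated among the remainder, and so on.

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(points):
    """Return a list of fronts, each a list of objective vectors."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q != p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

pts = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
print(pareto_ranks(pts))
# [[(1, 5), (2, 3), (4, 1)], [(3, 4)], [(5, 5)]]
```

NSGA-II's faster routine avoids re-scanning the whole remainder each pass by bookkeeping, for every point, its domination count and the set of points it dominates.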
2.1.5 Pareto Archived Evolution Strategy

In [42], the authors present the Pareto Archived Evolution Strategy (PAES) algorithm for multiobjective optimization. The PAES is a (1+1)-ES that uses an archive of Pareto optimal solutions to aid in selection, just like the tournament selection of the NPGA. The archive maintains a list of all current nondominated solutions and stores them in the form of a d-dimensional adaptive grid, where d is the number of objectives. The grid is subdivided recursively and is encoded by a tree structure. In this way, less crowded areas of the grid are favored in the event of a tie (to maintain diversity). The algorithm is compared with NPGA on five test functions [42], and is shown to compare well, even though it is arguably the simplest algorithm for multiobjective optimization. The results given are primarily qualitative (in the form of graphs for visual comparison), but PAES tends to show less bias than NPGA towards any particular area of the Pareto optimal front, though it is more noisy. The authors describe the conception of the idea as an extension of local search approaches, such as tabu search [43] and simulated annealing [44].

2.1.6 Strength Pareto Evolutionary Algorithm

In 1999, Zitzler and Thiele [45] developed the Strength Pareto Evolutionary Algorithm (SPEA). Their approach differs from those previously discussed in two major ways. First, they use an external archive, much like in PAES, to determine the fitness of individuals in the population. They define the strength of a nondominated solution i in the archive to be the number of solutions in the population that are covered (i.e., dominated) by i.
Using this definition of the strength of an archive solution, they then define the fitness of all archive solutions to simply be their strengths, and the fitness values of solutions in the population are calculated by the following function:

\[ f_j = 1 + \sum_{i \,:\, i \succ j} s_i \]

Here, f_j represents the fitness of individual j in the population, the operator \succ represents Pareto dominance (so i \succ j means that i dominates j), and s_i represents the strength of archive solution i. Note that in Zitzler's and Thiele's work, low fitness values correspond to high reproductive probabilities. This definition of fitness also inherently creates a form of fitness sharing that does not require a niching parameter (like \sigma_{share} in MOGA). Since solutions that are dominated by only a few points are preferred over those that are dominated by many, the population tends to converge to the Pareto optimal front with a more uniform spread [45]. Zitzler and Thiele also used a clustering approach to keep the size of the external archive bounded and to maintain a uniform distribution of points on the Pareto optimal front (which, as mentioned above, causes the population to produce more uniformly distributed solutions). This was necessary given that the archive is used as part of the selection mechanism. If it were allowed to grow unbounded, then the selection pressure might suffer. The clustering algorithm used in their work was the average linkage method [46]. Using this approach, initially each point in the archive represents a cluster. Then, the closest pair of clusters is combined to form a new cluster, thus reducing the number by one. This process is repeated until the desired number of clusters is reached. Afterwards, the individual in each cluster that is nearest the centroid is selected to be retained as that cluster's representative.
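The strength and fitness calculations can be read directly from the equation above. The following is an illustrative sketch (assuming minimization and plain lists of objective vectors), not Zitzler and Thiele's implementation:

```python
# Sketch of SPEA strength-based fitness as described in the text:
# an archive member's strength is the number of population members it
# dominates; a population member's fitness is 1 plus the strengths of
# its archive dominators (lower fitness is better).
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def spea_fitness(archive, population):
    strength = [sum(dominates(a, p) for p in population) for a in archive]
    pop_fitness = [1 + sum(s for a, s in zip(archive, strength)
                           if dominates(a, p))
                   for p in population]
    return strength, pop_fitness
```

Because a solution dominated by many strong archive members accumulates a large (poor) fitness, crowded regions are implicitly penalized, which is the niching effect noted above.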
The authors compare the performance of the SPEA with that of VEGA, NPGA, NSGA, and Hajela's and Lin's genetic algorithm (HLGA) [47], which uses weighted-sum aggregation where the weights are encoded into the genotype of the individual. Zitzler and Thiele show that NSGA outperforms the remaining three algorithms (VEGA, NPGA, and HLGA) on the 0/1-knapsack problem, but that SPEA outperforms all approaches, covering 100% of the solutions found by the other approaches [45]. In 2002, Zitzler et al. produced an enhanced version of this algorithm, which they called SPEA2 [48]. In this algorithm, the strength of a solution is calculated as in SPEA, except that it is calculated for all members of both the population and the archive. Additionally, the raw fitness of an individual is calculated based on the sum of the strengths of its dominators in both the population and the archive (as opposed to SPEA, where only the strengths of the archive members were used). The raw fitness for an individual is summed with the density information about that individual (i.e., how crowded the area around that individual is), which is obtained using a variant of the k-nearest neighbor method [49]. This sum represents the individual's fitness. The updating of the archive is also modified in SPEA2. First, all nondominated individuals in the current generation are placed in the archive for the next generation. If the archive is not yet full, then the dominated individuals in the current generation are added to the archive in a best-fitness-first manner until it is full. If the archive contains too many individuals after the nondominated solutions are included, then a truncation procedure is used (instead of the clustering approach used in SPEA). This procedure essentially iteratively removes the individual that has the minimum distance to some other individual (as defined using the k-nearest neighbor metric) at each stage.

2.1.7 Pareto Envelope-based Selection Algorithm

Corne et al.
developed the Pareto Envelope-based Selection Algorithm (PESA) [50] in 2000. This approach also maintains an external archive of nondominated solutions. This archive is used to select individuals that will become parents as a part of the variation operator. With probability p_c (a user-defined parameter), two parents are selected from the archive to undergo crossover to produce one offspring, which is then mutated. With probability (1 - p_c), one parent is selected to undergo mutation to create an offspring. The offspring make up the new population, which is incorporated into the external archive at the beginning of each generation. Binary tournament selection is used to select the participating parents according to the lowest squeeze factor [50]. A grid system is used to break the objective space into hypercubes, and the squeeze factor of an individual is defined to be the number of individuals in the same hypercube. This provides a method for maintaining diversity in the population, favoring less crowded areas of the Pareto optimal front. The squeeze factor is also used to maintain a bound on the size of the external archive. Whenever this bound is exceeded, old individuals are removed from the archive according to a highest-squeeze-factor-first strategy. The new nondominated individuals from the population always replace existing members in the archive in the case of an excess. If the squeeze factors for two individuals are the same, ties are broken randomly. The authors compared their PESA algorithm against SPEA and PAES on six test functions taken from [51]. The results show that PESA was able to outperform the other algorithms on most of the test functions at 1000, 5000, and 20000 function evaluations. While not conclusive, these results suggest that PESA has the capability to compete with some of the best EMO approaches.

2.1.8 Micro-Genetic Algorithm for Multiobjective Optimization

Coello Coello and Pulido [52] developed the first micro-GA for multiobjective optimization.
A micro-GA is defined by a small population size (which was between 2 and 5 individuals in [52]) and small numbers of generations per run. After each run, the micro-GA is started over again with a seeded population from some replaceable memory. This process is then repeated as many times as desired. The micro-GA system in [52] uses an external archive of nondominated points so that no results from any of the runs are lost. They also use a population memory, which represents a smaller archive that has both a non-replaceable and a replaceable portion (with sizes that are user defined). The population memory is used to seed the initial population for each run of the GA. The GA itself also uses elitism such that one nondominated solution in each generation is allowed to survive into the next generation. When a run of the GA is completed, at most two nondominated solutions from the GA's final population are moved into the external archive. If this archive is full, the new solutions replace older solutions using a grid technique similar to that used in PAES [42]. Finally, two (possibly the same two) nondominated solutions are moved into the replaceable portion of the population memory. Coello Coello and Pulido have shown good results [52] using very small population sizes (approximately 5) and function evaluations (150 to 1500) on five different test problems from the literature. For each problem, they generated the actual Pareto optimal front using exhaustive enumeration and compared it to the results of their micro-GA. Their results were all qualitative (in the form of graphs of the generated fronts), but the micro-GA appeared to perform very well.

2.1.9 Multiobjective Particle Swarm Optimization

Several approaches to multiobjective optimization have made use of particle swarm optimization approaches.
In 2002, Coello Coello and Lechuga [4] developed a PSO for multiobjective optimization that operates normally except that the global (or neighborhood) best particle is chosen from a repository of nondominated solutions. The repository is divided into hypercubes (using the values of the objective functions for a particular solution as its coordinates, much like in PAES [4]), and the probability of choosing a particular hypercube (using roulette wheel selection) is inversely proportional to the number of nondominated solutions located there (to help maintain diversity). After a hypercube is selected, a solution from that area is selected randomly from those available in the hypercube. The performance of the MOPSO was compared with PAES and NSGA-II on three test problems from the EMO literature. The authors state that the multiobjective PSO (MOPSO) remained competitive with the other approaches, even though it was not necessarily better in all cases, and it required lower computational time to converge to a set of solutions. In 2004, Coello Coello et al. [53] developed a modified version of their original MOPSO that included a constraint-handling mechanism and a mutation operator to aid in search space exploration. Their approach also used the same adaptive grid technique that was used for the micro-GA [52]. In order to deal with constrained optimization problems, the concept of dominance was modified just as with the micro-GA so that a feasible solution dominates an infeasible solution, and, if both are infeasible, then the solution with the fewest constraint violations dominates. The additional mutation operator essentially performs random mutation on each dimension of the particle according to a specified mutation rate. However, similar to simulated annealing approaches [44], this operator is applied with much less frequency as the swarm develops.
In this way, the initial swarm is able to more effectively explore the search space while the later swarm is allowed to converge. Also in 2004, Mostaghim and Teich [54] used a particle swarm approach similar to that used by Coello Coello et al. [4] to determine an initial Pareto optimal front. Once the initial front is generated, each point in the archive is used as the local guide (or neighborhood best) in a subswarm generated around that point. The subswarms are used to fill out the Pareto optimal front to cover any gaps left by the initial swarm. This approach makes use of a post-processing step that uses subswarms to provide a more fully defined Pareto optimal front. The subswarm approach was compared to a Hybrid MOEA [55] on a real-world antenna design problem, and the subswarm was shown to cover the Pareto optimal front much faster than the Hybrid MOEA.

2.1.10 ParEGO

In 2005, Knowles presented a new EMO technique called ParEGO [7], which uses efficient global optimization (EGO) [56] as a learning system to produce a model of the fitness landscape. This model is necessary when dealing with expensive multiobjective functions (such as very time-consuming experiments or experiments that must be performed using specialized machinery that requires trained operators). The EGO system attempts to learn the model of the multiobjective function given some points on the Pareto optimal front. Then, this model can be used in conjunction with an evolutionary algorithm to find minima, which can then be checked using the expensive actual multiobjective optimization function. Any multiobjective evolutionary algorithm can be used in this system, but Knowles described the particular algorithm used to be a population-based steady-state EA using both crossover and mutation. He compared the results with those produced by NSGA-II on eight different test functions.
Only 250 function evaluations were used for each, but ParEGO was shown to perform very well, outperforming NSGA-II in all cases.

2.2 Neural Computation

The following subsections describe in greater detail several modern neural network techniques. Note that this list is in no way exhaustive. For a more comprehensive list of neural network and machine learning approaches, the reader should refer to [49,57].

2.2.1 Perceptron

Many different types of neural networks have been designed and implemented by researchers over the last 50 years [58]. Each has its particular strengths and weaknesses. Perhaps the easiest feed-forward network to understand is the perceptron neural network [28]. The defining characteristic of a perceptron is that it contains only two layers of neurons: the input layer and the output layer. The simplest form of a perceptron is one that has some finite number of input neurons and only one output neuron. In this neural network, there exist connections from each input to the output. Figure 2.1 displays a perceptron with three input neurons and one output neuron. The output of the jth output neuron of a perceptron is defined as follows [28]:

\[ O_j = A\!\left( \sum_i I_i w_{ij} \right) \]

where O_j is the output of the jth output neuron, A(\cdot) is the activation function, I_i is the ith input to the network, and w_{ij} is the weight between the ith input and the jth output.

Figure 2.1: Illustration of a Basic Perceptron

The activation function for the perceptron is a simple threshold function [27]. Such a function takes the weighted input to the neuron (i.e., the dot product of the input and the weight vectors) and determines whether or not this magnitude lies above some threshold value. If so, the neuron fires. Otherwise, the neuron generates no output. While the threshold value could be set by the network designer, in actual neural networks, the value is determined by what is known as a bias neuron.
A bias neuron is a neuron in each layer of a network that always has a constant input of 1.0. When a bias is used, each neuron is treated as if its threshold value is 0.0. In this way, the actual threshold value for any neuron connected to the bias will be determined by the weight from the bias to that neuron. For instance, in Figure 2.1, I_1 could represent the bias neuron, which would mean that it would have a constant input of 1.0. (Note that if such a system is used, the perceptron in Figure 2.1 now only has two genuine input neurons.) In this case, the value of weight w_{11} would represent the threshold value, and this weight would be allowed to change during the learning phase. To carry out the learning phase, a learning rule must be specified. The learning rule for a perceptron (often called the Perceptron Learning Rule) is defined as follows [28]:

\[ w_{ij} = w_{ij} + I_i (T_j - O_j) \]

where w_{ij} represents the weight from input i to output j, I_i represents the ith input, T_j represents the correct output (teaching signal) of output node j, and O_j represents the actual output of output node j. Clearly, this learning rule will move the weight vector closer to the desired output vector given the appropriate inputs. While the simple perceptron is capable of learning many different types of inputs, its effectiveness has been mathematically proven to be extremely limited [29]. It can only represent vectors that are linearly separable, which means that it must be possible to separate different output classes with straight lines (or hyperplanes in higher dimensions). An example of a function that is not linearly separable is the exclusive-or (XOR). This function is displayed in Figure 2.2. In order for this function to be linearly separable, it must be possible to separate the points for which the output is 0 from those where the output is 1 by using only a straight line. The graphical representation in Figure 2.2 makes it clear that no such straight line exists.
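The learning rule can be demonstrated on a linearly separable function such as logical AND (on XOR, by the argument above, the same loop would never converge). The following is a minimal sketch, with the bias treated as a constant input of 1.0 whose weight plays the role of the threshold:

```python
# Sketch of the perceptron learning rule w_ij <- w_ij + I_i (T - O),
# trained on the linearly separable AND function. The bias is a
# constant input of 1.0 prepended to each pattern.
def train_perceptron(samples, epochs=20):
    w = [0.0, 0.0, 0.0]               # weights for [bias, x1, x2]
    for _ in range(epochs):
        for x, target in samples:
            inputs = [1.0] + list(x)  # prepend the bias input
            out = 1 if sum(i * wi for i, wi in zip(inputs, w)) > 0 else 0
            # Apply the learning rule to every weight.
            w = [wi + i * (target - out) for wi, i in zip(w, inputs)]
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
predict = lambda x: 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else 0
```

After training, `predict` classifies all four AND patterns correctly; substituting the XOR truth table for `AND` would leave the weights cycling indefinitely, as the separability argument predicts.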
2.2.2 Backpropagation Feed-forward Networks

It was discovered [30] that the limitations of the perceptron could be resolved by the introduction of at least one additional layer, a so-called hidden layer, between the input and output layers. Figure 2.3 illustrates such a neural network with three inputs, two neurons in the hidden layer (typically referred to as hidden neurons), and one output neuron.

Figure 2.2: Illustration of the Exclusive-Or Function

Because of this additional layer, these types of networks are often referred to as Multi-Layer Perceptrons (MLPs) [49]. However, the introduction of this layer posed problems of its own because the creation of a learning rule for such a network is a non-trivial matter. A learning rule had to be found that could distribute weight changes appropriately across all those connections that led to the error. The most popular such network in use today is known as the backpropagation network [59,60], because the learning rule it applies in essence propagates the blame back through the network to each of the contributing weights as a function of their contribution to the resulting error. The backpropagation learning rule relies on gradient information in order to manipulate the weight values so as to minimize the prediction error of the network. Since gradient information is required, it is important that a differentiable activation function be used. In many cases, the sigmoid function is used. This function is defined as [60]

\[ F(x) = \frac{1}{1 + e^{-x}} \]

and its first derivative is easily discovered to be [60]

\[ F'(x) = F(x)(1 - F(x)) \]

For the output layer, the learning rule is [60]

\[ w_{ji} = w_{ji} + \alpha \, a_j \, g'(in_i) (T_i - O_i) \]

where w_{ji} is the weight from hidden node j to output node i, \alpha is the learning rate, a_j is the output of the jth hidden neuron, g(\cdot)
represents the activation function, in_i represents the net input to the ith output neuron, T_i represents the correct output (teaching signal) of node i, and O_i represents the actual output of node i. For the hidden layer, the learning rule is [60]

\[ w_{kj} = w_{kj} + \alpha \, I_k \, g'(in_j) \sum_i w_{ji} (T_i - O_i) \, g'(in_i) \]

where w_{kj} is the weight from the kth input neuron to the jth hidden neuron, \alpha is the learning rate, I_k is the input to the kth input node, g(\cdot) represents the activation function, in_i represents the net input to the ith output node, \sum_i denotes summation across all neurons i that make up the layer above the hidden layer (i.e., the output layer), T_i represents the correct output (teaching signal) of node i, and O_i represents the actual output of node i.

Figure 2.3: Illustration of a Backpropagation Neural Network

2.2.3 Radial Basis Function Networks

Radial basis function networks (RBFNs) [31,49,61] are often used as alternatives to backpropagation neural networks [61]. A radial basis function network has three layers: an input layer, a hidden layer, and an output layer. The input layer behaves just like the input layer in an MLP. There are no weights connecting the input and hidden layers in an RBFN. The hidden layer, however, is formed by clustering the input data into some user-defined number of clusters, where each node of the hidden layer represents one cluster's centroid. A localized receptive field [61] is centered on each of the cluster centroids, which simply means that a significant response is generated from a cluster if the input falls "near" its center in the input space. Finally, the output layer performs a linear combination of this hidden layer activation and the weights between the hidden and output layers. To make this discussion more precise, consider the following notation. Let H_i(\cdot) denote the receptive field of the ith hidden node, which is centered on c_i. Let w_{ij} denote the weight from hidden node i to output node j.
The jth output of the RBFN on a given input x is then

\[ y_j = \sum_i H_i(x) \, w_{ij} \]

Note that, just as with the perceptron and MLP, a bias node H_0(\cdot) is used which always produces an output of 1 for all inputs. The most commonly used hidden layer function (i.e., the localized receptive field) is the Gaussian kernel function:

\[ H_i(x) = e^{-\frac{\|x - c_i\|^2}{2 \sigma_i^2}} \]

Here, \|\cdot\| represents Euclidean distance, and \sigma_i is a smoothing parameter that specifies the "width" of this receptive field. This function is radially symmetric, meaning that the function produces an identical output for each input that lies the same distance from the kernel center. This radial symmetry gave rise to the name "Radial Basis Function Network". (The kernel functions are sometimes called "basis" functions.) Training an RBFN occurs in two stages [49]. First, the cluster centroids for the hidden layer must be found. This is typically accomplished through an unsupervised clustering of the input data using a system such as k-means clustering [61]. Once the hidden layer centroids are found, the \sigma_i value for each cluster is often set to the average distance between the center and the training patterns in the cluster [61]. The second stage of the learning process involves finding the values of the hidden-output weights that minimize the error. This is often done using the Least Mean Squares (LMS) algorithm [61], of which the backpropagation learning rule is an extension.

2.2.4 General Regression Neural Networks

First introduced by Specht in 1991 [32], general regression neural networks (GRNNs) can be viewed as an extension of the k-nearest neighbor approximation using a distance-weighted average and a global neighborhood. A GRNN can also be viewed as a Parzen window method [61]. GRNNs are lazy learners [57] since they do not form a model of the training set, choosing instead to store all training instances for future reference. The general form for the output of a GRNN given n training inputs (x_i, y_i), 1 \le i \le
n, is

\[ F(x) = \frac{\sum_{i=1}^{n} y_i \, d(x, x_i)}{\sum_{i=1}^{n} d(x, x_i)} \]

where d(\cdot,\cdot) is a weighting function, x_i is a training input, and y_i is the desired output for input i. Commonly, a Gaussian weighting function is used, just as with RBFNs:

\[ d(x, y) = e^{-\frac{\|x - y\|^2}{2 \sigma^2}} \]

Once again, \|\cdot\| represents Euclidean distance, and \sigma is a smoothing parameter that specifies the neighborhood size. For the GRNN, it is possible for a different smoothing parameter to be used for each training input, but this approach often leads to overfitting (as well as an explosion of parameter values that must be determined). Therefore, typically a single value of \sigma is used for all training inputs. Training a GRNN consists of merely finding the value of the smoothing parameter that provides the minimum error.

2.3 Applications of Learning to Multiobjective Optimization

The author is unaware of any research from the literature that attempts to predict the values of a subset of decision variables in the Pareto optimal set given the remaining decision variables. However, Hatzakis and Wallace [62] used prediction in order to determine the next location to search on the Pareto optimal set in order to find new optimal solutions for time-dependent (i.e., dynamic) multiobjective optimization problems. They used a predictor based on an autoregressive model to predict the next "likely" area to find optimal solutions once the fitness landscape shifts. Their predictor determined two anchor points on the Pareto optimal front using the time-series information of the previous anchor points. The authors mention that "[o]ne might also fit an analytic curve which describes the Pareto optimal set (the locus of the Pareto optimal front in variable space) and subsequently forecast changes in the parameter values of this curve, instead of using specific points". In similar fashion, Knowles developed ParEGO [7], which uses efficient global optimization (EGO) as a learning system to produce a model of the fitness landscape.
In this work, the system attempted to learn the model of an expensive multiobjective function given some points on the Pareto optimal front. This allowed the use of the model to find minima, which could then be checked using the expensive actual multiobjective function. The results were compared against those produced by NSGA-II on eight different test functions, and ParEGO was shown to perform extremely well, even though only 250 function evaluations were used. Finally, Gaspar-Cunha and Vieira [63] use a neural network to perform an inverse mapping from objective values to decision variables. In their work, they use a feedforward artificial neural network (using backpropagation) to learn the relationship between the values of the objective functions and the decision variables. Their network is trained using the Pareto optimal front found in the preceding generation, and new points are selected (in the objective space) using a user-defined "offset" from the existing points in the Pareto optimal front. These new points are passed through the neural network to produce new values for the decision variables, which are then tested using the actual multiobjective function. Their results on five test functions from the EMO literature show that this approach can produce a Pareto optimal front using up to 40% fewer function evaluations than a typical EMO approach. (However, in this work, they use their own EMO algorithm, called the Reduced Pareto Set Genetic Algorithm (RPSGA) [64], rather than any of the more popular EMO approaches.) The work presented here is an extension of Yapicioglu et al.'s work from [65]. It differs from the previously mentioned approaches in that prediction is used to model the Pareto optimal set using a subset of the decision variables. In this way, the most promising regions of the optimal set can be explored in order to more quickly discover nondominated solutions.
Rather than being concerned solely with dynamic or expensive multiobjective problems, this approach is applicable to all multiobjective problems and can be used with any pre-existing set of nondominated solutions, no matter how they are generated.

Chapter 3

Neural Enhancement for Multiobjective Optimization

The neural enhancement approach developed in this work, called Neural Enhancement for Multiobjective Optimization (NEMO), seeks to learn the function that maps a given subset of decision variables in the Pareto optimal set to the remaining decision variables. The basic operation of the NEMO approach is first to allow an existing multiobjective optimization approach to produce a set of solutions to a given problem. Then, that set of solutions is used to train NEMO to learn the mapping among the decision variables. Finally, NEMO can then be used to relieve the burden of producing additional solutions from the underlying multiobjective optimization technique, which is often computationally expensive. The experiments conducted in this chapter are preliminary experiments to assess the viability of the approach using a well-known evolutionary multiobjective optimization approach to produce the initial set of optimal solutions. For this work, NSGA-II [6] was used to produce a preliminary set of optimal solutions for four benchmark problems from the multiobjective optimization literature, as well as one real-world multiobjective optimization problem. NSGA-II was used because it has been shown [6,7,53] to be a very effective evolutionary multiobjective optimization approach. NEMO was then trained using these sets, and the results were compared against the performance of NSGA-II on the same problems using the same number of function evaluations. The following sections describe the experimental setup in more detail.

3.1 Test Suite

In order to provide a thorough test of NEMO, a subset of four multiobjective benchmark problems was taken from [7].
These problems were OKA2, DTLZ1a, DTLZ2a, and DTLZ4a. Finally, the semi-desirable facility location problem (SDFLP) was included from [65] (where it was cited as test case 1.1).

3.1.1 OKA2

The OKA2 test function uses three decision variables to minimize two objective functions and is defined as follows:

\[ f_1 = x_1 \]
\[ f_2 = 1 - \frac{1}{4\pi^2}(x_1 + \pi)^2 + |x_2 - 5\cos(x_1)|^{1/3} + |x_3 - 5\sin(x_1)|^{1/3} \]
\[ x_1 \in [-\pi, \pi], \quad x_2, x_3 \in [-5, 5] \]

The Pareto optimal set lies on a 3D spiral curve.

3.1.2 DTLZ1a

The DTLZ1a test function makes use of six decision variables to minimize two objective functions and is defined as follows:

\[ f_1 = \tfrac{1}{2} x_1 (1 + g) \]
\[ f_2 = \tfrac{1}{2} (1 - x_1)(1 + g) \]
\[ g = 100 \left[ 5 + \sum_{i=2}^{6} \left( (x_i - 0.5)^2 - \cos(2\pi(x_i - 0.5)) \right) \right] \]
\[ x_i \in [0, 1], \quad i \in \{1, \ldots, 6\} \]

The Pareto optimal set consists of all decision variables set to 0.5 except the first value, which should come from [0, 1].

3.1.3 DTLZ2a

The DTLZ2a test function uses eight decision variables to minimize three objective functions and is defined as follows:

\[ f_1 = (1 + g) \cos(x_1 \pi / 2) \cos(x_2 \pi / 2) \]
\[ f_2 = (1 + g) \cos(x_1 \pi / 2) \sin(x_2 \pi / 2) \]
\[ f_3 = \sin(x_1 \pi / 2) \]
\[ g = \sum_{i=3}^{8} (x_i - 0.5)^2 \]
\[ x_i \in [0, 1], \quad i \in \{1, \ldots, 8\} \]

The Pareto optimal front for this function is one-eighth of a sphere centered at the origin with a radius of 1. The Pareto optimal set consists of all decision variables set to 0.5 except the first value, which should come from [0, 1].

3.1.4 DTLZ4a

The DTLZ4a test function uses eight decision variables to minimize three objective functions and is defined as follows:

\[ f_1 = (1 + g) \cos(x_1^{100} \pi / 2) \cos(x_2^{100} \pi / 2) \]
\[ f_2 = (1 + g) \cos(x_1^{100} \pi / 2) \sin(x_2^{100} \pi / 2) \]
\[ f_3 = \sin(x_1^{100} \pi / 2) \]
\[ g = \sum_{i=3}^{8} (x_i - 0.5)^2 \]
\[ x_i \in [0, 1], \quad i \in \{1, \ldots, 8\} \]

As with the previous problem, the Pareto optimal front for this function is one-eighth of a sphere centered at the origin with a radius of 1. The Pareto optimal set consists of all decision variables set to 0.5 except the first value, which should come from [0, 1].
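As a concrete illustration, the DTLZ2a function can be coded directly. This is a sketch following the definitions as transcribed above; note that when all of x_3 through x_8 equal 0.5, g = 0 and the objective vector lands on the unit sphere.

```python
import math

# Sketch of the DTLZ2a test function as defined above: eight decision
# variables in [0, 1], three objectives to minimize.
def dtlz2a(x):
    assert len(x) == 8
    g = sum((xi - 0.5) ** 2 for xi in x[2:])   # g over x_3 .. x_8
    f1 = (1 + g) * math.cos(x[0] * math.pi / 2) * math.cos(x[1] * math.pi / 2)
    f2 = (1 + g) * math.cos(x[0] * math.pi / 2) * math.sin(x[1] * math.pi / 2)
    f3 = math.sin(x[0] * math.pi / 2)          # f3 as given in the text
    return (f1, f2, f3)
```

Evaluating `dtlz2a([0.5] * 8)` gives g = 0 and an objective vector of unit length, consistent with the spherical Pareto optimal front described above.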
3.1.5 SDFLP

The semi-desirable facility location problem deals with positioning a facility (such as an airport or landfill) that is necessary for members of the community. However, the facility itself produces an undesirable by-product (such as traffic or pollution) that the community does not desire. The balance of desirable and undesirable effects leads to the multiobjective problem.

\[ f_1 = \sum_{i=1}^{7} w_{1i} \, d(\vec{x}, \vec{a}_i) \]
\[ f_2 = \sum_{i=1}^{7} v_i(\vec{x}, \vec{a}_i) \]
\[ v_i(\vec{x}, \vec{a}_i) = \begin{cases} 200 & \text{if } w_{2i} \, d(\vec{x}, \vec{a}_i) < 10 \\ 200 - w_{2i} \, d(\vec{x}, \vec{a}_i) & \text{if } 10 \le w_{2i} \, d(\vec{x}, \vec{a}_i) < 30 \\ 0 & \text{if } 30 \le w_{2i} \, d(\vec{x}, \vec{a}_i) \end{cases} \]
\[ x_1, x_2 \in [-20, 40] \]
\[ \vec{a} = ((5,20), (18,8), (22,16), (14,17), (7,2), (5,15), (12,4)) \]
\[ \vec{w}_1 = (5, 7, 2, 3, 6, 1, 5) \]
\[ \vec{w}_2 = (1, 1, 1, 1, 1, 1, 1) \]

Since the SDFLP is a real-world problem, its Pareto optimal set is unknown.

3.2 Performance Indicator

To evaluate the performance of NEMO, an indicator was developed that represents the relative sizes of the resultant optimal sets. The yield ratio is simply the ratio of NEMO's solutions to NSGA-II's solutions. It is a single unitless value that represents the proportion of NEMO solutions found compared to those of NSGA-II.

3.3 NEMO: The Neural Enhancer

To perform the neural enhancement for each test problem, a general regression neural network (GRNN) [32] was created using a subset of the decision variables in the Pareto set approximation to predict the values for the remaining decision variables in the set. As discussed previously, the GRNN is used to learn the mapping, in the Pareto set approximation, from some subset of the decision variables to the remaining decision variables. For instance, given a multiobjective optimization problem with 5 decision variables, d_1, ..., d_5, NEMO first partitions the decision variables into two sets, I and O. For instance, suppose that I contains d_1 and d_3, while O contains d_2, d_4, and d_5. Then, a GRNN is created using elements of I as inputs and producing elements of O as outputs.
The GRNN is trained using the elements from the original Pareto set approximation (as found by NSGA-II). Then, values, call them i_1 and i_3, are generated uniformly from the range of each of the elements of I. These values are then passed into the GRNN to produce the corresponding values o_2, o_4, and o_5 in set O. Finally, the generated solution <i_1, o_2, i_3, o_4, o_5> is passed to the multiobjective function to discover the objective values associated with it. These values determine whether the generated solution is nondominated. A steady-state genetic algorithm (SSGA) [15] was used to evolve the subset of decision variables to be used as inputs to the GRNN, as well as the value of the GRNN's \sigma parameter. Each individual in the population was represented by an array of binary values, one for each decision variable representing its inclusion/exclusion as an input to the learning system, and a single real-number value, representing the smoothing parameter \sigma for the GRNN. The values available for the \sigma component of the individual were taken from the interval [0.01, 20.0]. The parameters chosen for the SSGA were shown experimentally to provide decent performance and are detailed in Table 3.1.

Parameter                  Value
Population Size            100
Recombination Operator     Uniform Crossover
Crossover Usage Rate       1.0
Mutation Operator          Bit-flip (binary) and Gaussian (real)
Mutation Usage Rate        0.1 for both types
Gaussian Mutation Range    1.0
Function Evaluations       1500

Table 3.1: Parameters for Genetic Algorithm for Training NEMO

To evaluate each candidate solution, a GRNN was constructed using the appropriate decision variables and \sigma value. The GRNN was then trained using the Pareto optimal set for the given test problem. Then, the neural network was presented with 10 randomly generated "inputs" and was asked to predict the corresponding "outputs", which together constituted a possible point in the Pareto optimal set.
Each point was then evaluated using the given test problem to produce values for the objective functions. Those values were compared to the values in the Pareto-optimal front to determine whether the point should be added and, if so, whether it replaced any existing points in the Pareto-optimal front. The fitness for the candidate solution (i.e., the GRNN parameters) was then taken to be

fitness = 2 × replaced + added.

Here, replaced represents the number of solutions in the original Pareto-optimal front that were dominated by the 10 solutions generated by the neural network. Likewise, added represents the number of solutions generated by the neural network that were nondominated in the original Pareto-optimal front. The intention of this fitness measure was to more heavily favor neural networks capable of producing solutions that are nondominated in the existing Pareto-optimal front.

The GA was given a maximum of 1500 function evaluations in order to find the best parameters for the GRNN. After the appropriate parameters were found, the GRNN was used to predict 10000 values in the Pareto optimal set. To accomplish this, equally spaced points were chosen along the entire range of the GRNN's input dimensions and were passed to the neural enhancer to determine the values along the remaining dimensions. (The spacing between points was chosen so that no more than 10000 values would be created.) Each value in the predicted Pareto optimal set was then evaluated using the multiobjective test function. The results are described below.

The implementation used for NSGA-II was taken from the EMOO repository [66] and was implemented in C by Kalyanmoy Deb et al. It was used without modification except to add the problems from the test suite. It was successfully compiled under Borland version 5.5 and was executed on a Pentium 4 system running Windows XP.
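The added/replaced bookkeeping behind the fitness measure above can be sketched as follows (minimization assumed throughout; fronts are lists of objective tuples, and the function names are illustrative):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def grnn_fitness(front, generated):
    """fitness = 2 * replaced + added.

    replaced: front members dominated by at least one generated solution.
    added:    generated solutions not dominated by any front member.
    """
    replaced = sum(1 for f in front if any(dominates(g, f) for g in generated))
    added = sum(1 for g in generated if not any(dominates(f, g) for f in front))
    return 2 * replaced + added
```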
For each problem, NSGA-II was initialized with the default parameters as detailed in the software: a population size of 256, a crossover probability of 0.9, a mutation probability of 1/n_d (where n_d is the number of decision variables), a distribution index for crossover of 10, and a distribution index for polynomial mutation of 50. NSGA-II was allowed to run for 100 generations for each problem in the test suite.

3.4 Results

The following subsections describe the results of applying NEMO to the test suite. These results are summarized in Table 3.2. In this table, the "NEMO Added" column displays the number of solutions that were added by NEMO to the original Pareto optimal front (produced by NSGA-II). Similarly, the "NEMO Replaced" column holds the number of NEMO solutions that replaced at least one solution from the original Pareto-optimal front. Finally, the "Yield Ratio" column reflects the relative effectiveness of NEMO versus NSGA-II, as described previously.

Problem  NSGA-II Front  NEMO Added  NEMO Replaced  Merged Front  Yield Ratio  Input-Output Assignment  σ
DTLZ1    200            6926        2592           9638          47.6         IOOOOO                   8.04
DTLZ2    481            7849        1486           9572          19.4         IIOOOOOO                 1.41
DTLZ4    631            9991        9              10626         15.9         OIOOOOOO                 3.78
OKA2     15             84          99             39            12.2         OOI                      0.01
SDFLP    1259           616         279            2015          0.71         OI                       0.61

Table 3.2: Summary of NEMO Results

3.4.1 DTLZ1a

For the DTLZ1a problem, the Pareto optimal front found using NSGA-II, consisting of 200 points, is shown in red in Figure 3.1. A NEMO was created using the first decision variable as input to predict the remaining 5 variables. The σ value was set to 8.04. The results of generating 10000 points using the NEMO are shown in green in Figure 3.1. The GRNN created 9999 non-dominated solutions in the predicted Pareto optimal front (which means that only one of the predicted points was dominated by other predicted points).
These predicted values were merged with the original values found by NSGA-II, allowing the predicted values to dominate or be dominated by the original values, which yielded a Pareto optimal front consisting of 9638 points.

3.4.2 DTLZ2a

For the DTLZ2a problem, the Pareto optimal front found using NSGA-II, consisting of 481 points, is shown in red in Figure 3.2. A NEMO was created using the first and second decision variables as input to predict the remaining 6 variables, using a σ of 1.41. The results of generating 10000 points using the NEMO are shown in green in Figure 3.2. There were 10000 non-dominated points in the final predicted Pareto optimal front. When these points were merged with the original Pareto optimal front, the final Pareto optimal front consisted of 9572 points.

3.4.3 DTLZ4a

For the DTLZ4a problem, the Pareto optimal front found using NSGA-II, consisting of 631 points, is shown in red in Figure 3.3. A NEMO was created using the second decision variable as input to predict the remaining 7 variables, using a σ of 3.78. The results of generating 10000 points using the NEMO are shown in green in Figure 3.3. There were 9999 non-dominated points in the final predicted Pareto optimal front. Notice that the predicted values for the third objective all lie in the range [0.02715, 0.02745]. When the predicted and original values were merged, the final Pareto optimal front consisted of 10626 points. Although the NEMO was able to generate over 15 times the number of nondominated points that NSGA-II produced, the spread of those points was not sufficient.
[Figure 3.1: Overlay of Pareto optimal fronts for DTLZ1a from NSGA-II and GRNN]

[Figure 3.2: Overlay of Pareto optimal fronts for DTLZ2a from NSGA-II and GRNN]

[Figure 3.3: Overlay of Pareto optimal fronts for DTLZ4a from NSGA-II and GRNN]

It was believed that the failure on this problem was that the smoothing parameter (σ) for the GRNN was too large. A further experiment was conducted in which the smoothing parameter was constrained to the interval [0.00000001, 0.001]. The result of this approach, using a σ of 0.000081, is shown in Figure 3.4.
Here, it is clear that the coverage produced is much better than in Figure 3.3.

3.4.4 OKA2

For the OKA2 problem, the Pareto optimal front found using NSGA-II, consisting of 15 points, is shown in red in Figure 3.5. The neural enhancer was created using the third decision variable as input to predict the first 2 variables, using a σ of 0.01, which was the lower bound used by the GA. The results of generating 10000 points using the neural enhancer are shown in green in Figure 3.5. There were 39 non-dominated points in the final predicted Pareto optimal front. After the predicted and original values were merged, the final Pareto optimal front also consisted of 39 points.

[Figure 3.4: Overlay of Pareto optimal fronts for DTLZ4a from NSGA-II and GRNN using small σ value]

3.4.5 SDFLP

For the SDFLP problem, the Pareto optimal front found using NSGA-II, consisting of 1259 points, is shown in red in Figure 3.6. The neural enhancer was created using the second decision variable as input to predict the first, using a σ of 0.61. The results of generating 10000 points using the neural enhancer are shown in green in Figure 3.6. There were 3753 non-dominated points in the final predicted Pareto optimal front. After merging the predicted and original values, the Pareto optimal front consisted of 2015 points.
[Figure 3.5: Overlay of Pareto optimal fronts for OKA2 from NSGA-II and GRNN]

[Figure 3.6: Overlay of Pareto optimal fronts for SDFLP from NSGA-II and GRNN]

3.5 Conclusions

The neural enhancement approach presented here has been shown to be quite effective at expanding the Pareto optimal frontier for several benchmark multiobjective optimization problems. For each benchmark problem, the NEMO approach produced over 10 times more solutions than NSGA-II. However, the real-world problem tested proved more difficult. NEMO was only able to produce about 70% of the solutions that NSGA-II produced. Even so, about 20% of those solutions replaced solutions produced by NSGA-II.

Given the very promising results presented here, further work must be done to determine their statistical significance. An extension of this study can also be made for problems having more than three objectives. Additionally, work should be done to find a more appropriate training approach for the GRNN, either through a better-tuned evolutionary computation system or through a different training method such as leave-one-out training. In this way, an acceptable smoothing parameter can be found more quickly, which will provide greater performance for the neural enhancer. In addition to a more efficient training algorithm, a computational time comparison should be made between NEMO and NSGA-II.

Chapter 4
Neural Enhancement Training Algorithm

The results achieved in the previous chapter are promising. However, one of the most important criticisms of the NEMO approach is the training time required to achieve success.
(This was a criticism raised in the peer-reviewed comments for [8].) In order to improve on this shortcoming, several experiments were conducted to determine whether a training approach exists that could generate good results in an acceptable time.

To more accurately evaluate NEMO's performance using the various training approaches, several modifications were made to the experimental setup described in the previous chapter. First, five additional multiobjective optimization problems were added to the test suite to provide a more thorough testbed for the NEMO approach. Second, the comparison between NEMO and NSGA-II was modified so as to be as fair as possible to NSGA-II. Previously, the results achieved by NEMO were compared against the original optimal solutions found by NSGA-II. This seemed unfair because NEMO was essentially able to benefit from the work already done by NSGA-II, giving it a "head start" toward producing optimal solutions. To remedy this, NSGA-II was executed for a given number of function evaluations, at which point the current optimal set found was stored and used to train NEMO. NSGA-II was then allowed to execute for an additional set of function evaluations (the same number given to NEMO). In this way, NEMO and NSGA-II could be compared from the same baseline set of optimal solutions. Finally, four additional indicators were used to compare the results of NEMO and NSGA-II to more clearly understand the strengths of their performances.

4.1 Test Suite

In order to test the neural enhancement system and the training approaches, all nine multiobjective benchmark problems were taken from [7]. These problems were KNO1, OKA1 [67], OKA2 [67], VLMOP2 [68], VLMOP3 [68], DTLZ1a, DTLZ2a, DTLZ4a, and DTLZ7a. Finally, the semi-desirable facility location problem (SDFLP) was once again included from [65] (where it was cited as test case 1.1).
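As a concrete illustration of how such benchmarks are evaluated in code, the following sketch implements VLMOP2 in its usual two-variable form (matching the definition given in the next section); the function name is illustrative, and both objectives are minimized over x1, x2 ∈ [−2, 2].

```python
import math

def vlmop2(x):
    """VLMOP2: two objectives over n decision variables (here n = 2)."""
    n = len(x)
    s1 = sum((xi - 1.0 / math.sqrt(n)) ** 2 for xi in x)
    s2 = sum((xi + 1.0 / math.sqrt(n)) ** 2 for xi in x)
    return (1.0 - math.exp(-s1), 1.0 - math.exp(-s2))
```

At the Pareto-set endpoint x = (1/√2, 1/√2), the first objective vanishes, and by symmetry the two objectives are equal at the origin.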
4.1.1 KNO1

The KNO1 test function makes use of two decision variables to minimize two objective functions and is defined as follows:

f1 = 20 − r cos(φ)
f2 = 20 − r sin(φ)
r = 9 − [3 sin((5/2)(x1 + x2)²) + 3 sin(4(x1 + x2)) + 5 sin(2(x1 + x2) + 2)]
φ = (π/12)(x1 − x2 + 3)
x1, x2 ∈ [0, 3]

The Pareto optimal set consists of all pairs of decision variable values that sum to 4.4116.

4.1.2 OKA1

The OKA1 test function makes use of two decision variables to minimize two objective functions and is defined as follows:

f1 = x1′
f2 = √(2π) − √|x1′| + 2|x2′ − 3 cos(x1′) − 3|^(1/3)
x1′ = cos(π/12) x1 − sin(π/12) x2
x2′ = sin(π/12) x1 + cos(π/12) x2
x1 ∈ [6 sin(π/12), 6 sin(π/12) + 2π cos(π/12)]
x2 ∈ [−2π sin(π/12), 6 cos(π/12)]

The Pareto optima (i.e., the values in the efficient set) lie on the curve

x2′ = 3 cos(x1′) + 3,  x1′ ∈ [0, 2π].

4.1.3 OKA2

The OKA2 test function uses three decision variables to minimize two objective functions and is defined as follows:

f1 = x1
f2 = 1 − (1/(4π²))(x1 + π)² + |x2 − 5 cos(x1)|^(1/3) + |x3 − 5 sin(x1)|^(1/3)
x1 ∈ [−π, π]
x2, x3 ∈ [−5, 5]

The Pareto optimal set lies on a 3D spiral curve.

4.1.4 VLMOP2

The VLMOP2 test function uses two decision variables to minimize two objective functions and is defined as follows:

f1 = 1 − exp(−Σ_{i=1}^{2} (x_i − 1/√2)²)
f2 = 1 − exp(−Σ_{i=1}^{2} (x_i + 1/√2)²)
x1, x2 ∈ [−2, 2]

The Pareto optima lie on the diagonal passing from (−1/√n, −1/√n) to (1/√n, 1/√n), where n = 2 is the number of decision variables.

4.1.5 VLMOP3

The VLMOP3 test function uses two decision variables to minimize three objective functions and is defined as follows:

f1 = 0.5(x1² + x2²) + sin(x1² + x2²)
f2 = (3x1 − 2x2 + 4)²/8 + (x1 − x2 + 1)²/27 + 15
f3 = 1/(x1² + x2² + 1) − 1.1 exp(−x1² − x2²)
x1, x2 ∈ [−3, 3]

This test function has a disconnected Pareto optimal set.

4.1.6 DTLZ1a

The DTLZ1a test function makes use of six decision variables to minimize two objective functions and is defined as follows:

f1 = (1/2) x1 (1 + g)
f2 = (1/2)(1 − x1)(1 + g)
g = 100 [5 + Σ_{i=2}^{6} ((x_i − 0.5)² − cos(2π(x_i − 0.5)))]
x_i ∈ [0, 1],  i ∈ {1, ..., 6}

The Pareto optimal front for this function is the line f1 + f2 = 1/2. The Pareto optimal set consists of all decision variables set to 0.5 except the first value, which may come from anywhere in [0, 1].

4.1.7 DTLZ2a

The DTLZ2a test function uses eight decision variables to minimize three objective functions and is defined as follows:

f1 = (1 + g) cos(x1 π/2) cos(x2 π/2)
f2 = (1 + g) cos(x1 π/2) sin(x2 π/2)
f3 = (1 + g) sin(x1 π/2)
g = Σ_{i=3}^{8} (x_i − 0.5)²
x_i ∈ [0, 1],  i ∈ {1, ..., 8}

The Pareto optimal front for this function is one-eighth of a sphere centered at the origin with a radius of 1. The Pareto optimal set consists of all decision variables set to 0.5 except the first two values, which may come from anywhere in [0, 1].

4.1.8 DTLZ4a

The DTLZ4a test function uses eight decision variables to minimize three objective functions and is defined as follows:

f1 = (1 + g) cos(x1^100 π/2) cos(x2^100 π/2)
f2 = (1 + g) cos(x1^100 π/2) sin(x2^100 π/2)
f3 = (1 + g) sin(x1^100 π/2)
g = Σ_{i=3}^{8} (x_i − 0.5)²
x_i ∈ [0, 1],  i ∈ {1, ..., 8}

The Pareto optimal front for this function is also one-eighth of a sphere centered at the origin with a radius of 1. The Pareto optimal set consists of all decision variables set to 0.5 except the first two values, which may come from anywhere in [0, 1].

4.1.9 DTLZ7a

The DTLZ7a test function uses eight decision variables to minimize three objective functions and is defined as follows:

f1 = x1
f2 = x2
f3 = (1 + g) h
g = 1 + (9/6) Σ_{i=3}^{8} x_i
h = 3 − Σ_{i=1}^{2} [(f_i / (1 + g)) (1 + sin(3π f_i))]
x_i ∈ [0, 1],  i ∈ {1, ..., 8}

4.1.10 SDFLP

The semi-desirable facility location problem deals with positioning a facility (such as an airport or landfill) that is necessary for members of the community. However, the facility itself produces an undesirable by-product (such as traffic or pollution) that the community does not desire. The balance of desirable and undesirable effects leads to the multiobjective problem.

f1 = Σ_{i=1}^{7} w1_i d(x⃗, a⃗_i)
f2 = Σ_{i=1}^{7} v_i(x⃗, a⃗_i)
v_i(x⃗, a⃗_i) = { 200 if w2_i d(x⃗, a⃗_i) < 10;  200 − w2_i d(x⃗, a⃗_i) if 10 ≤ w2_i d(x⃗, a⃗_i) < 30;  0 if 30 ≤ w2_i d(x⃗, a⃗_i) }
x1, x2 ∈ [−20, 40]
a⃗ = ((5, 20), (18, 8), (22, 16), (14, 17), (7, 2), (5, 15), (12, 4))
w⃗1 = (5, 7, 2, 3, 6, 1, 5)
w⃗2 = (1, 1, 1, 1, 1, 1, 1)

Since the SDFLP is a real-world problem, its Pareto optimal set is unknown.

4.2 Performance Assessment

In the previous chapter, the only performance indicator used was the relative yield, or yield ratio. However, the yield ratio is only an indicator of the relative sizes of the final Pareto optimal sets. This is not sufficient to adequately describe or compare the resultant Pareto optimal sets and fronts.

To overcome this limitation, five indicators were chosen as measures of success and points of comparison between NEMO and NSGA-II: training time, yield ratio, spacing [69], hypervolume, and binary ε+ [70]. These indicators are described in more detail in the following sections. The work presented in [70-73] provides a more thorough discussion of assessment indicators for evolutionary multiobjective optimization.

4.2.1 Training Time

The training time measures the total time required to train NEMO with the given algorithm. This indicator does not take into account the time required to add solutions to the existing Pareto front approximation, because doing so would unnecessarily punish NEMO for generating many solutions. Therefore, only the actual training time is taken into account. Additionally, the training time for NSGA-II was not collected; however, all NSGA-II runs were completed in, at most, several seconds. Therefore, NEMO must also be trainable in seconds if it is to be a legitimate alternative. As will be seen, training the GRNN with the best training approach found in these experiments has a complexity of O(n), where n is the size of the training set.

4.2.2 Yield Ratio

As in the previous chapter, the yield ratio was calculated, though its definition was expanded to allow for comparison of algorithms that use different numbers of function evaluations.
The yield of a multiobjective optimization (MO) algorithm is defined as the ratio of the number of solutions in the Pareto optimal front approximation to the number of function evaluations used by the algorithm. The yield of an MO algorithm is expressed in units of solutions per function evaluation. The yield ratio is a binary indicator that is simply the yield of one MO algorithm divided by the yield of another. For instance, if there are two MO algorithms, A and B, then the yield ratio would be YR(A, B) = yield(A)/yield(B). In this way, the yield ratio is a unit-less measure of the number of solutions algorithm A produces relative to algorithm B in the same number of function evaluations. For example, a yield ratio of 2.0 means that algorithm A produces twice as many solutions as algorithm B.

4.2.3 Spacing

Van Veldhuizen and Lamont [69] make use of the spacing indicator to measure the "spread" or distribution of solutions throughout the Pareto optimal front. The indicator S is calculated as follows:

S = sqrt( (1/(n − 1)) Σ_{i=1}^{n} (d̄ − d_i)² )

where d_i = min_{j≠i} Σ_{k=1}^{m} |f_k^i(x⃗) − f_k^j(x⃗)|, for i, j = 1, ..., n, m is the number of objectives, d̄ is the mean of all d_i, and n is the number of vectors in the Pareto optimal front approximation.

Clearly, each d_i represents the L1 distance between solution i and the solution j that is closest to i. Likewise, d̄ represents the mean of the closest distances for all solutions i. This means that the spacing indicator represents the average deviation from this mean across all solutions in the Pareto optimal set. With this indicator, a value of 0 would imply that the solutions found are all equally spaced. The larger the value, the less uniformly distributed the solutions tend to be.

4.2.4 Hypervolume Indicator

In [70] and [7], the hypervolume indicator is described. This indicator is used to measure the volume subsumed by a given Pareto optimal front.
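The spacing indicator S defined in the previous section can be sketched directly from its formula (a hypothetical helper; a front is a list of objective tuples):

```python
import math

def spacing(front):
    """Spacing S: standard deviation of nearest-neighbour L1 distances in
    objective space (0 means perfectly equal spacing)."""
    n = len(front)
    d = [min(sum(abs(a - b) for a, b in zip(fi, fj))
             for j, fj in enumerate(front) if j != i)
         for i, fi in enumerate(front)]
    mean = sum(d) / n
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (n - 1))
```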
In essence, every solution on a Pareto optimal front approximation can be seen as the lower corner of a hypervolume into which no other Pareto optimal solution may fall. (Any other solution in that region would not be Pareto optimal, given that the first solution dominates it.) While this hypervolume is, in fact, infinite in extent, it can be bounded by choosing any point dominated by it. In fact, it is possible [7] to find a single suitable bounding point that can be applied to two different Pareto optimal fronts in order to make their comparison possible.

To find a suitable bounding point, the minimum and maximum values must be found along all dimensions (i.e., objectives) and across all given Pareto optimal front approximations. Then, given the minimum and maximum points, the bounding point b⃗ is determined as follows:

b_i = max_i + Δ(max_i − min_i),  i = 1, ..., k

where k is the number of objectives. For these experiments, Δ was set to 0.01. This is presented in algorithmic form in Listing 4.1.

ALGORITHM BoundingPoint(front, Δ)
// front is a set of Pareto optimal front approximations to be compared.
// Each front[f] is a Pareto optimal front approximation.
// numFronts is the number of fronts in the front set.
// numSolutions[f] is the number of solutions in front[f].
// Δ is a user parameter that defaults to 0.01.
min ← CreateArray(numObjectives, 0)
max ← CreateArray(numObjectives, 0)
FOR o ← 0 TO numObjectives - 1 LOOP
    FOR f ← 0 TO numFronts - 1 LOOP
        FOR s ← 0 TO numSolutions[f] - 1 LOOP
            soln ← front[f][s]
            IF f = 0 AND s = 0 THEN
                min[o] ← soln.output[o]
                max[o] ← soln.output[o]
            ELSE
                IF soln.output[o] < min[o] THEN
                    min[o] ← soln.output[o]
                ELSE IF soln.output[o] > max[o] THEN
                    max[o] ← soln.output[o]
                END IF
            END IF
        END LOOP
    END LOOP
END LOOP
boundingPoint ← CreatePoint(numObjectives, 0)
FOR i ← 0 TO numObjectives - 1 LOOP
    boundingPoint[i] ← max[i] + Δ * (max[i] - min[i])
END LOOP
return boundingPoint

Listing 4.1: Calculation of Bounding Point for Hypervolume Indicator

4.2.5 Binary ε+ Indicator

The final indicator used in this work is the binary ε+ indicator [7, 70]. This indicator is needed to support the hypervolume indicator [7], because the hypervolume values do not immediately determine which Pareto set is better, and the bounding point chosen to calculate the hypervolumes is, in some sense, arbitrary. In order to define this indicator, it is necessary to make use of a comparison known as ε-dominance [7]. Given two solutions a and b, each with k objectives,

a ⪯_{ε+} b  ⟺  a_i ≤ ε + b_i  ∀ i = 1, ..., k

The binary ε+ indicator is actually a pair of numbers (I_A, I_B) calculated as follows:

I_A = inf_{ε ∈ ℝ} { ∀ b ∈ B ∃ a ∈ A : a ⪯_{ε+} b }
I_B = inf_{ε ∈ ℝ} { ∀ a ∈ A ∃ b ∈ B : b ⪯_{ε+} a }

In essence, I_A is the smallest value of ε that can be added to every dimension of every solution in B such that some solution in A ε-dominates it. It may be considered something like a "fudge factor" that ensures that A dominates B. I_B can be interpreted in a similar fashion. (This definition assumes that all objectives are to be minimized. Maximization problems can be handled similarly.)

The results of the binary ε+ indicator are conclusive in the event that either I_A or I_B is positive and the other is not. In such a case, the set corresponding to the nonpositive value (for instance, A when I_A ≤ 0) clearly dominates the other. In the event that both I_A and I_B are positive, the indicator is inconclusive. However, the smaller of the two values can imply a very weak form of dominance, given that it takes a smaller "fudge factor" to create a Pareto-dominance situation.

4.3 Experimental Setup

Five different experiments were conducted to determine the effectiveness of various training methods for NEMO. NSGA-II was run five times for each test problem, yielding 50 different Pareto optimal sets/fronts.
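The binary ε+ indicator of the previous section reduces, for minimization, to a max-min-max expression over the two fronts, which can be sketched as follows (names illustrative; fronts are lists of objective tuples):

```python
def eps_indicator(A, B):
    """Smallest additive epsilon such that every b in B is epsilon-dominated
    by some a in A (all objectives minimized)."""
    return max(min(max(a_k - b_k for a_k, b_k in zip(a, b)) for a in A)
               for b in B)

def binary_eps(A, B):
    """The pair (I_A, I_B) described above."""
    return eps_indicator(A, B), eps_indicator(B, A)
```

A nonpositive I_A with a positive I_B indicates that A dominates B; two positive values leave the comparison inconclusive.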
NSGA-II was initialized with a population size of 256 individuals for each run, and it was given 100 generations to execute (for a total of 25600 function evaluations). At that point, the current Pareto optimal set/front was saved (to be used for training NEMO), and NSGA-II was allowed to execute for an additional 100 generations (for a total of 51200 function evaluations). In this way, it was possible to accurately compare NEMO against NSGA-II because both would be judged on their performance after being given the results of the first 25600 function evaluations from NSGA-II.

As in the previous chapter, the implementation used for NSGA-II was taken from the EMOO repository [66] and was implemented in C by Kalyanmoy Deb et al. It was used without modification except to add the problems from the test suite. It was successfully compiled under Borland version 5.5 and was executed on a Pentium 4 system running Windows XP. For each problem, NSGA-II was initialized with the default parameters of the software: a population size of 256, a crossover probability of 0.9, a mutation probability of 1/n_d (where n_d is the number of decision variables), a distribution index for crossover of 10, and a distribution index for polynomial mutation of 50.

4.3.1 Evolve I/O and Evolve Single Sigma

A steady-state genetic algorithm (GA) was used to evolve the subset of decision variables to be used as inputs to the GRNN, as well as the value of the GRNN's sigma (σ) parameter. Each individual in the population was represented by an array of binary values, one for each decision variable, representing its inclusion/exclusion as an input to the learning system, and a single real-number value, representing the smoothing parameter for the GRNN. The values available for the σ component of the individual were taken from the interval [0.01, 20.0] (which was determined experimentally to produce decent performance).
The GA was created with a population size of 100, uniform crossover, bit-flip mutation on the binary segment of the chromosome, and Gaussian mutation on the real-coded segment. The crossover usage rate was set to 1.0, the mutation rates for both types of mutation were set to 0.1, and the mutation range for the Gaussian mutation was set to 1.0.

To evaluate a given candidate input/output assignment and σ value, the inputs and outputs were used along with the σ value to construct a GRNN. This GRNN was trained using 90% of the given Pareto optimal set and was then tested on the remaining 10%. (In effect, the first element out of every 10 in the Pareto optimal set was used for testing.) The mean squared error across the test set was used as the fitness value.

4.3.2 Heuristic I/O and Evolve Single Sigma

A very simple heuristic (described in Listing 4.2) was used to determine the particular inputs and outputs that would be used by the GRNN. For each decision variable (i.e., candidate input/output) for a given problem, the variance of that variable in the training set was calculated. The decision variable with the highest variance was used as the input in order to predict the remaining decision variables.

A steady-state GA was used to evolve the value of the GRNN's σ parameter. Each individual in the population was simply a single real-number value, representing the smoothing parameter. As before, the allowed range of the σ value was [0.01, 20.0]. The GA was created with a population size of 100, uniform crossover, and Gaussian mutation. The crossover usage rate was set to 1.0, the mutation rate was set to 0.1, and the mutation range was set to 1.0. To evaluate a given candidate σ value, the pre-calculated inputs/outputs were used along with the candidate σ value to construct a GRNN. This GRNN was trained using 90% of the given Pareto optimal set and was then tested on the remaining 10%. The mean squared error across the test set was used as the fitness value.
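The 9:1 train/test fitness evaluation described above can be sketched as follows. The GRNN here is a minimal single-σ Nadaraya-Watson form included only to make the sketch self-contained; every tenth element of the Pareto set approximation is held out, and the mean squared error on the held-out outputs is the (minimized) fitness. All names are illustrative.

```python
import math

def grnn_predict(train_in, train_out, x, sigma):
    w = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / (2 * sigma ** 2))
         for t in train_in]
    s = sum(w)
    return [sum(wi * o[k] for wi, o in zip(w, train_out)) / s
            for k in range(len(train_out[0]))]

def sigma_fitness(pareto_in, pareto_out, sigma):
    """MSE on a 9:1 split: the first element of every 10 is held out for testing."""
    test_idx = set(range(0, len(pareto_in), 10))
    tr_in = [v for i, v in enumerate(pareto_in) if i not in test_idx]
    tr_out = [v for i, v in enumerate(pareto_out) if i not in test_idx]
    err, count = 0.0, 0
    for i in sorted(test_idx):
        pred = grnn_predict(tr_in, tr_out, pareto_in[i], sigma)
        err += sum((p - t) ** 2 for p, t in zip(pred, pareto_out[i]))
        count += len(pred)
    return err / count
```

A GA individual's single real value would simply be passed in as sigma, with lower error meaning better fitness.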
ALGORITHM HeuristicIO(front, N, ioMask)
variance ← CreateArray(numDecisionVars, 0.0)
FOR d ← 0 TO numDecisionVars - 1 LOOP
    sum ← 0
    FOR i ← 0 TO N - 1 LOOP
        sum ← sum + front[i].input[d]
    END LOOP
    mean ← sum / N
    sum ← 0
    sumSq ← 0
    FOR i ← 0 TO N - 1 LOOP
        sumSq ← sumSq + (front[i].input[d] - mean)²
        sum ← sum + (front[i].input[d] - mean)
    END LOOP
    variance[d] ← (sumSq - sum² / N) / (N - 1)
END LOOP
largest ← FindLargest(variance)
Initialize(ioMask, numDecisionVars, -1)
ioMask[largest] ← 1

Listing 4.2: Heuristic Calculation of I/O Values

4.3.3 Heuristic I/O and Evolve Multiple Sigmas

As in the previous section, this experiment used the same heuristic approach to find the inputs and outputs of the GRNN. However, the GRNN here was created with a separate smoothing parameter for each element of the training set. Therefore, the evolutionary approach was slightly modified to allow a chromosome of length equal to that of the Pareto optimal set produced by NSGA-II after 25600 function evaluations. As before, each σ value was allowed to range between 0.01 and 20.0, inclusive. And, as before, evaluating a given set of candidate σ values was accomplished by using a 9:1 training/testing ratio and calculating the mean squared error on the test set.

4.3.4 Heuristic I/O and Heuristic Single Sigma

As before, the heuristic approach was used to calculate the inputs and outputs of the GRNN. In this experiment, however, a heuristic approach was also applied to calculating the σ values for the GRNN. This approach is described in Listing 4.3. Essentially, for each point p in the Pareto optimal set, the distance from p to all the other points is calculated. Then, the average distance between p and its K closest neighbors is used as the σ value for element p. This is repeated for all points in the Pareto optimal set. Those calculated σ's are then averaged together to arrive at a single σ for the GRNN.

ALGORITHM HeuristicSigma(front, ioMask, N, K, sigma)
count ← 0
FOR i ← 0 TO numDecisionVars - 1 LOOP
    IF ioMask[i] THEN
        inputIndex[count] ← i
        count ← count + 1
    END IF
END LOOP
FOR i ← 0 TO N - 1 LOOP
    distance ← CreateArray(N - 1, 0.0)
    index ← 0
    FOR j ← 0 TO N - 1 LOOP
        IF i ≠ j THEN
            FOR d ← 0 TO count - 1 LOOP
                distance[index] ← distance[index] +
                    (front[i].input[inputIndex[d]] - front[j].input[inputIndex[d]])²
            END LOOP
            distance[index] ← √distance[index]
            index ← index + 1
        END IF
    END LOOP
    SortAscending(distance)
    FOR j ← 0 TO K - 1 LOOP
        sigma[i] ← sigma[i] + distance[j]
    END LOOP
    sigma[i] ← sigma[i] / K
END LOOP

Listing 4.3: Heuristic Calculation of σ Values

4.3.5 Heuristic I/O and Heuristic Multiple Sigmas

In this final experiment, the heuristic was used to determine the I/O mask, and the heuristic in the previous section was used to find a σ value for each point in the Pareto optimal set. However, the σ values found were used without being averaged together.

4.4 Results

Figures 4.1-4.6 provide graphical depictions of the results of each training algorithm on each test problem, grouped according to the indicator in question. In these figures, the test suite problems lie along the horizontal axis while the indicator values lie along the vertical axis. Each problem is associated with five vertical bars, representing the performance of each of the training approaches on that problem. In each figure, the results of the SDFLP problem were omitted because the NSGA-II algorithm failed to produce any further solutions when allowed to execute for another 100 generations. As Table 4.1 shows, however, the NEMO approach (with all training approaches) found a great number of solutions in each case for SDFLP compared to the original NSGA-II production (denoted "NSGA-II (half)" in the table).

For each training approach and test problem, the solutions found by NSGA-II and NEMO are plotted for the first of the five runs. In each of these graphs, the NSGA-II solutions are depicted in red while the NEMO solutions are in blue.
These solutions are those found only after the initial 100 generations of the NSGA-II algorithm. (This eliminates the "clutter" of solutions that were the common foundation of both approaches and serves to highlight their individual performances.) It is worth noting that these graphs were created by first plotting the NEMO results and then the NSGA-II results. This means that any visible NEMO points certainly do not obscure any NSGA-II points.

Likewise, the efficient sets produced by NSGA-II and NEMO for the first run on problems KNO1, OKA1, VLMOP2, VLMOP3, and SDFLP are represented graphically for each training approach. Only those five test problems are displayed, since they have only two decision variables. The efficient set predicted by NEMO is represented in black, the NSGA-II solutions in the efficient set are represented in red, and the NEMO solutions are represented in blue.

    File      NSGA-II (half)  EIO-ESS   HIO-ESS  HIO-EMS   HIO-HSS  HIO-HMS
    1         1502            5427      5297     12544     5439     6349
    2         1482            6747      6136     12472     5231     6356
    3         1510            6105      6130     12630     6046     6564
    4         1498            6694      6694     12689     5440     6251
    5         1519            10423     7320     12466     5933     7195
    Average   1502.2          7079.2    6315.4   12560.2   5617.8   6543
    St. Dev.  13.86           1944.13   751.04   97.95     352.1    381.91

Table 4.1: NEMO Solutions for SDFLP Compared with NSGA-II (half)

4.4.1 Evolve I/O and Evolve Single Sigma

The results of this approach on the first run are depicted in Figure 4.7. Figure 4.8 shows the efficient sets produced for this training approach. The results from all five runs are provided in Appendix A as Table A.1.

The most relevant and important indicator for this approach is training time (Figure 4.1). Evolving I/O and sigma values takes much more time than the other approaches, especially for DTLZ2a and DTLZ4a, requiring an average of more than 20 minutes to train.
Figure 4.1: Comparison of Training Method Impact on Training Time with Standard Error Bars.

Figure 4.2: Comparison of Training Method Impact on Yield Ratio with Standard Error Bars.

Figure 4.3: Comparison of Training Method Impact on NEMO Spacing with Standard Error Bars.

Figure 4.4: Comparison of Training Method Impact on NEMO Hypervolume with Standard Error Bars.

Figure 4.5: Comparison of Training Method Impact on NSGA-NEMO Binary ε+ Indicator with Standard Error Bars.

Figure 4.6: Comparison of Training Method Impact on NEMO-NSGA Binary ε+ Indicator with Standard Error Bars.

In terms of yield ratio (Figure 4.2), this approach generally does no better than the other approaches, and in many cases (OKA1, OKA2, DTLZ1a, DTLZ2a, DTLZ4a, and DTLZ7a) it performs significantly worse. With spacing (Figure 4.3), this approach performs particularly poorly on OKA1, OKA2, DTLZ1a, DTLZ2a, and DTLZ7a. Finally, in terms of hypervolume (Figure 4.4), evolving I/O and sigma values is unable to outperform other approaches.
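The spacing indicator [69] referenced throughout this analysis is commonly defined as the standard deviation of each solution's nearest-neighbor distance (Manhattan, in objective space), so that values near zero indicate an evenly spread front. A minimal Python sketch of that common formulation (not necessarily the exact implementation used in these experiments):

```python
import math

def spacing(front):
    """Schott-style spacing: for each point, take the minimum Manhattan
    distance to any other point in the front, then return the standard
    deviation of those nearest-neighbor distances."""
    n = len(front)
    d = [min(sum(abs(a - b) for a, b in zip(p, q))
             for j, q in enumerate(front) if j != i)
         for i, p in enumerate(front)]
    d_bar = sum(d) / n
    return math.sqrt(sum((d_bar - di) ** 2 for di in d) / (n - 1))
```

An evenly spaced front yields a spacing of exactly zero, which is why the near-zero bars in Figure 4.3 indicate well-distributed solution sets.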
4.4.2 Heuristic I/O and Evolve Single Sigma

The results of this approach on the first run are depicted in Figure 4.9. Figure 4.10 shows the efficient sets produced for this training approach. The results from all five runs are provided in Appendix A as Table A.2.

As with all of the evolutionary training approaches, the training time (Figure 4.1) is very high when compared to the non-evolutionary approaches, especially for DTLZ2a and DTLZ4a. In terms of yield ratio (Figure 4.2), this approach performs relatively well, on par with the yield ratio of the non-evolutionary approaches. The spacing indicator (Figure 4.3) reveals that this approach performs better than the previous one, but is outperformed by the non-evolutionary approaches. Finally, in terms of hypervolume (Figure 4.4), this approach has performance that is on par with the non-evolutionary approaches.

4.4.3 Heuristic I/O and Evolve Multiple Sigmas

The results of this approach on the first run are depicted in Figure 4.11. Figure 4.12 shows the efficient sets produced for this training approach. The results from all five runs are provided in Appendix A as Table A.3.
Figure 4.7: Results of Evolving I/O and Single Sigma on Test Suite (panels: kno1, oka1, oka2, vlmop2, vlmop3, dtlz1a, dtlz2a, dtlz4a, dtlz7a, sdflp)

Figure 4.8: Efficient Set Created by Evolving I/O and Single Sigma on Test Suite (panels: kno1, oka1, vlmop2, vlmop3, sdflp)

Figure 4.9: Results of Heuristic I/O and Evolving Single Sigma on Test Suite (panels: kno1, oka1, oka2, vlmop2, vlmop3, dtlz1a, dtlz2a, dtlz4a, dtlz7a, sdflp)

Figure 4.10: Efficient Set Created by Heuristic I/O and Evolving Single Sigma on Test Suite (panels: kno1, oka1, vlmop2, vlmop3, sdflp)

Just like the other evolutionary training approaches, the training time (Figure 4.1) is very high when compared to the non-evolutionary approaches, especially for DTLZ2a and DTLZ4a. In terms of yield ratio (Figure 4.2), this approach performs relatively well, on par with the yield ratio of the non-evolutionary approaches, with the exception of KNO1 (where it performs very poorly) and OKA2 (where it performs remarkably well, achieving a yield ratio of over 300). With the spacing indicator (Figure 4.3), this system outperforms all other training approaches. Finally, in terms of hypervolume (Figure 4.4), this approach performs well except on the KNO1 test problem.
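The hypervolume indicator [70] used in these comparisons measures the size of the objective-space region dominated by a front, relative to a bounding point. For the two-objective minimization case, it can be computed by summing rectangular slices; a minimal sketch (assuming a front with no dominated points):

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective minimization front against a dominated
    reference point `ref`. Sorting by objective 1 (ascending) makes
    objective 2 descending for a nondominated front, so each point
    contributes one rectangular slice up to the reference point."""
    pts = sorted(front)  # ascending by objective 1
    rx, ry = ref
    hv = 0.0
    for i, (x, y) in enumerate(pts):
        next_x = pts[i + 1][0] if i + 1 < len(pts) else rx
        hv += (next_x - x) * (ry - y)
    return hv
```

Larger values are better: a front that reaches further toward the ideal point dominates more of the region bounded by the reference point.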
4.4.4 Heuristic I/O and Heuristic Single Sigma

The results of this approach on the first run are depicted in Figure 4.13. Figure 4.14 shows the efficient sets produced for this training approach. The results from all five runs are provided in Appendix A as Table A.4.

Clearly, the training time (Figure 4.1) for this and the next non-evolutionary approach is quite low, averaging less than 2 seconds on all problems except DTLZ2a and DTLZ4a, where it needed around 6 seconds to train. The yield ratio (Figure 4.2) is greater than 1 (which means that NEMO produced more solutions than NSGA-II) on all but three of the test problems (OKA1, DTLZ4a, and DTLZ7a). For the spacing indicator (Figure 4.3), this approach performs very well. It actually performs statistically better than NSGA-II on four of the problems (KNO1, OKA2, VLMOP3, and DTLZ2a) and is only outperformed on DTLZ1a. The hypervolume indicator (Figure 4.4) reveals that this approach outperforms NSGA-II on six of the problems (KNO1, OKA2, VLMOP2, VLMOP3, DTLZ1a, and DTLZ2a) while being outperformed on only two problems (OKA1 and DTLZ4a).
Figure 4.11: Results of Heuristic I/O and Evolving Multiple Sigmas on Test Suite (panels: kno1, oka1, oka2, vlmop2, vlmop3, dtlz1a, dtlz2a, dtlz4a, dtlz7a, sdflp)

Figure 4.12: Efficient Set Created by Heuristic I/O and Evolving Multiple Sigmas on Test Suite (panels: kno1, oka1, vlmop2, vlmop3, sdflp)

For the ε+ indicator, neither NSGA-II nor NEMO clearly outperforms the other on average, but NSGA-II has a statistically significant lower value for this indicator on OKA2, DTLZ2a, DTLZ4a, and DTLZ7a.

4.4.5 Heuristic I/O and Heuristic Multiple Sigmas

The results of this approach on the first run are depicted in Figure 4.15.
Figure 4.16 shows the efficient sets produced for this training approach. The results from all five runs are provided in Appendix A as Table A.5.

Once again, the training time (Figure 4.1) for this non-evolutionary approach is quite low, averaging less than 2 seconds on all problems except DTLZ2a and DTLZ4a, where it needed around 6 seconds to train. The yield ratio (Figure 4.2) is greater than 1 (which means that NEMO produced more solutions than NSGA-II) on all test problems, reaching over 4 (which means 4 times the number of NSGA-II solutions) on 6 of the 10 problems. As with the previous non-evolutionary approach, the spacing indicator (Figure 4.3) reveals that NEMO trained with this approach is statistically better than NSGA-II on KNO1, OKA2, VLMOP3, and DTLZ2a, while being outperformed on DTLZ1a. For the hypervolume indicator, NEMO with this training approach is statistically superior to NSGA-II on KNO1, VLMOP2, VLMOP3, DTLZ1a, and DTLZ2a, while being outperformed on none of the problems. As before, for the ε+ indicator, neither NSGA-II nor NEMO clearly outperforms the other on average, but NSGA-II has a statistically significant lower value for this indicator on OKA2, DTLZ2a, DTLZ4a, and DTLZ7a.
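The binary additive ε+ indicator [70] used in these comparisons admits a compact definition: I_ε+(A, B) is the smallest amount by which every point of A may be shifted (worsened) in each objective while still weakly dominating every point of B. A negative value therefore means A conclusively dominates B, which is the sense in which "attaining a negative value" is used above. A sketch of this common formulation, assuming minimization:

```python
def eps_plus(a_front, b_front):
    """Binary additive epsilon indicator I_eps+(A, B) for minimization:
    for each b in B, find the a in A that minimizes the worst per-objective
    gap (a_i - b_i); the indicator is the largest such gap over all of B."""
    return max(
        min(max(ai - bi for ai, bi in zip(a, b)) for a in a_front)
        for b in b_front
    )
```

Note the indicator is not symmetric, which is why the analysis reports both NSGA-NEMO and NEMO-NSGA directions (Figures 4.5 and 4.6).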
Figure 4.13: Results of Heuristic I/O and Heuristic Single Sigma on Test Suite (panels: kno1, oka1, oka2, vlmop2, vlmop3, dtlz1a, dtlz2a, dtlz4a, dtlz7a, sdflp)

Figure 4.14: Efficient Set Created by Heuristic I/O and Heuristic Single Sigma on Test Suite (panels: kno1, oka1, vlmop2, vlmop3, sdflp)

Figure 4.15: Results of Heuristic I/O and Heuristic Multiple Sigmas on Test Suite (panels: kno1, oka1, oka2, vlmop2, vlmop3, dtlz1a, dtlz2a, dtlz4a, dtlz7a, sdflp)

Figure 4.16: Efficient Set Created by Heuristic I/O and Heuristic Multiple Sigmas on Test Suite (panels: kno1, oka1, vlmop2, vlmop3, sdflp)

4.5 Conclusions

These experiments reveal several aspects of the NEMO approach. First, the non-evolutionary training approaches are superior to the evolutionary approaches in terms of training time, which is clearly the most important indicator for this comparison. Second, the heuristic I/O and heuristic multiple sigmas (HIO-HMS) training approach is superior to the other non-evolutionary approach in the remaining indicators. Third, NEMO trained with HIO-HMS more often outperforms NSGA-II on all indicators except for ε+. Finally, the ε+ indicator revealed that neither NEMO nor NSGA-II clearly outperformed the other.
Using the HIO-HMS training algorithm to produce a NEMO is the most effective approach, taking only a few seconds to train. Such a NEMO can be quite competitive with NSGA-II in all indicators. Most importantly, it can even produce many times more solutions than NSGA-II for most of the test problems considered here.

Chapter 5

Neural Enhancement Using Other Foundational Optimization Algorithms

In the previous chapter, the NEMO approach was shown to be practical and effective in expanding the set of optimal solutions found by NSGA-II. It seems reasonable to expect that similar results would arise by using a different optimization algorithm to produce the training set for NEMO. However, it is important to determine whether NEMO can perform as well when attached to other optimization approaches. There may be aspects of certain optimization techniques that enable NEMO to perform even more successful enhancement of the Pareto optimal set. Or it may be that NEMO requires a certain level of success from the underlying optimization algorithm in order to have any measure of success of its own.

In this chapter, two additional multiobjective optimization approaches are considered: Pareto Archived Evolution Strategies (PAES) [42] and Multiobjective Particle Swarm Optimization (MOPSO) [4, 53]. These two approaches differ both from NSGA-II and from one another in interesting ways. First, PAES is an evolutionary approach, like NSGA-II, but it is not a population-based evolutionary system, being based on the simple (1+1) evolution strategy [20]. Likewise, the MOPSO approach is population-based, like NSGA-II, but it is not truly an evolutionary system since it does not explicitly use evolutionary operators such as recombination or mutation. The results achieved by using each of these algorithms to train NEMO are compared to one another and to the previous results from NSGA-II.
5.1 Test Suite

The set of multiobjective optimization problems from the previous chapter was also used for this experiment. These are the nine multiobjective benchmark problems taken from [7] (KNO1, OKA1, OKA2 [67], VLMOP2, VLMOP3 [68], DTLZ1a, DTLZ2a, DTLZ4a, and DTLZ7a) and the semi-desirable facility location problem (SDFLP) [65].

5.2 Performance Assessment

As with the test suite, most of the performance indicators from the previous chapter were used in this experiment as well. These indicators were spacing [69], hypervolume [70], and binary ε+ [70]. These indicators were described in detail in the previous chapter and, thus, will not be discussed again here. However, the yield ratio that was used in the previous chapter was not used for this work. Instead, only the sizes of the resultant Pareto optimal frontiers were used.

For each indicator, the two Pareto optimal frontiers under comparison were merged to determine the number of solutions from each that would remain nondominated. This approach reduces the size of the Pareto optimal frontiers under comparison, but a side effect is that sometimes the resultant Pareto frontier for a particular approach is empty. In such a case, it is possible for the value of the indicators to be undefined (such as the yield ratio). In the following sections, those situations will be acknowledged when necessary.

5.3 Experimental Setup

Three evolutionary multiobjective optimization approaches (NSGA-II, MOPSO, and PAES) were used to produce the training sets for NEMO. As in the previous chapter, each EMO approach was allowed 25600 function evaluations, after which the current Pareto optimal set was saved to be used for training. Then, the EMO was allowed to continue for another 25600 function evaluations (51200 total). The subsections below describe the implementations of the EMO approaches used. Three different NEMOs were then created using those training sets: NEMO_NSGA-II, NEMO_MOPSO, and NEMO_PAES.
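The "merge and keep the survivors" filtering described in Section 5.2 can be sketched directly from its prose description. This is a straightforward interpretation with hypothetical helper names, not the implementation used in the experiments; minimization is assumed:

```python
def dominates(p, q):
    """True if p Pareto-dominates q (minimization): p is no worse in every
    objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def surviving_counts(front_a, front_b):
    """Merge two fronts and count how many solutions from each remain
    nondominated in the union; only the survivors are then used when
    computing the pairwise indicators."""
    union = front_a + front_b
    keep = [p for p in union if not any(dominates(q, p) for q in union)]
    return (sum(1 for p in front_a if p in keep),
            sum(1 for p in front_b if p in keep))
```

As the text notes, one side's surviving set can be empty after this filtering, which is what makes some indicator values undefined.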
Each NEMO was allowed 25600 function evaluations to expand the training set for comparison with the EMO-only approach. As before, each NEMO consisted of a general regression neural network [32] trained using heuristic approaches for finding the input/output assignment and multiple σ values.

In order to determine statistical significance, each EMO was used to produce 30 different Pareto optimal fronts (using a different pseudorandom seed each time) on each of the 10 problems in the test suite. Therefore, 30 different runs were generated, each yielding three different EMOs and three different NEMOs.

5.3.1 NSGA-II

The implementation used for NSGA-II was the same as was used in the previous chapters. For each problem, NSGA-II was initialized using the software's default parameters as specified in Table 5.1.

    Problem   Population  Crossover    Mutation     Crossover     Mutation
              Size        Probability  Probability  Dist. Index   Dist. Index
    kno1      256         0.9          0.5          10            50
    oka1      256         0.9          0.5          10            50
    oka2      256         0.9          0.3333       10            50
    vlmop2    256         0.9          0.5          10            50
    vlmop3    256         0.9          0.5          10            50
    dtlz1a    256         0.9          0.1667       10            50
    dtlz2a    256         0.9          0.125        10            50
    dtlz4a    256         0.9          0.125        10            50
    dtlz7a    256         0.9          0.125        10            50
    sdflp     256         0.9          0.5          10            50

Table 5.1: Parameters for NSGA-II

5.3.2 PAES

The implementation used for PAES was based on C source code from Joshua Knowles' professional website [74]. This code was re-implemented in C++ by the author for these experiments using object-oriented programming techniques to make it more modular. The new implementation was tested against Knowles' existing code to ensure a correct implementation. It was successfully compiled under Borland version 5.5 and was executed on a Pentium 4 system running Windows XP. For each problem, PAES was initialized with parameters as illustrated in Table 5.2.

5.3.3 MOPSO

The implementation used for MOPSO was based on C source code from the EMOO repository [75].
As with the PAES source code, this code was re-implemented in C++ by the author for these experiments using object-oriented programming techniques. The new implementation was tested against the original code to ensure a correct implementation. It was successfully compiled under Borland version 5.5 and was executed on a Pentium 4 system running Windows XP. For all problems, MOPSO was initialized with a population size of 100, an archive size of 100, cognitive and social rates of 1.0, an inertia weight of 0.4, and 30 grid divisions. Mutation was also used on the population as described in [53] with a mutation rate of 0.1.

    Problem   Population  Archive  Grid       Mutation    Mutation
              Size        Size     Divisions  Usage Rate  Range
    kno1      100         100      5          0.1         0.2
    oka1      100         100      5          0.1         0.001
    oka2      100         100      5          0.1         0.001
    vlmop2    100         100      5          0.1         0.1
    vlmop3    100         100      5          0.1         0.1
    dtlz1a    100         100      5          0.1         0.001
    dtlz2a    100         100      5          0.1         0.1
    dtlz4a    100         100      5          0.1         0.01
    dtlz7a    100         100      5          0.1         0.01
    sdflp     100         100      5          0.1         0.1

Table 5.2: Parameters for PAES

5.4 Results

The results from this experiment consist of six Pareto optimal sets: one from each of the EMOs and one from each of the NEMOs. In order to analyze these sets and calculate the performance indicators, it was necessary to consider them pairwise as one EMO versus one NEMO. The Pareto optimal sets from each pair were "merged" to determine the solutions that "survived", and only those solutions were kept in each optimal set for comparison. Additionally, the hypervolume indicator requires a bounding point for comparison, and such a bounding point must be calculated using a pair of optimal sets.

Once the indicators were calculated for each of the 30 runs on a given pairing, the mean and standard deviation were calculated for the sizes of the Pareto optimal sets, the spacing indicator, and the hypervolume indicator. The median and interquartile range (IQR) were calculated for the binary ε+ indicator instead of the mean and standard deviation.
This is because a median of less than 0 for the ε+ indicator means that the indicator was less than 0 for at least 50% of the runs. Finally, tests for statistical significance were performed. Because there is no evidence to suggest that the distribution of the indicators is approximately normal, a Mann-Whitney rank-sum test was used to test for statistical significance. In the following nine sections, each pairwise comparison is analyzed.

For each pairwise comparison described below, tables are provided that display the average indicator values for the Pareto optimal front size, spacing indicator, and hypervolume indicator for each problem in the test suite, along with the p-value from the rank-sum test. To make the analysis more accessible, if p < 0.05, the statistically dominant average in each table is boldfaced and underlined. Additionally, tables are given that show the median and interquartile ranges for the binary ε+ indicator, along with the p-value from the rank-sum test (with similar boldfacing and underlining denoting statistical significance). Likewise, figures are provided that display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators, for each problem except SDFLP. It was removed from the figures because the indicator values produced by NEMO were, in many cases, so large that the graph was rendered essentially useless for the remaining problems in the test suite.

5.4.1 NSGA-II versus NEMO_NSGA-II

This comparison is identical to the final experiment in the previous chapter. However, in this case, 30 runs were used to ensure statistical significance. Tables 5.3-5.5 show the average Pareto optimal front size, spacing indicator, and hypervolume indicators, while Table 5.6 shows the median and interquartile ranges for the binary ε+ indicator.
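The rank-sum test used above reduces to the Mann-Whitney U statistic; the p-values reported in the tables would then come from the null distribution of U (for example, via scipy.stats.mannwhitneyu). A minimal sketch of the statistic itself, with the common half-count convention for ties:

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for sample xs versus sample ys:
    the number of (x, y) pairs with x > y, counting ties as 1/2.
    U(xs, ys) + U(ys, xs) always equals len(xs) * len(ys)."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

Because this test makes no normality assumption, it suits indicator distributions like those here, where nothing suggests the values are approximately normal.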
Additionally, Figures 5.1-5.4 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

As was seen previously, NEMO_NSGA-II greatly outperforms NSGA-II in terms of Pareto optimal set size. It also outperforms NSGA-II on the spacing and hypervolume indicators for most problems in the test suite. However, on the binary ε+ indicator, NSGA-II outperforms NEMO_NSGA-II on most problems, and it specifically dominates on OKA2, where it achieves a negative median value.

    Problem   NSGA-II              NEMO_NSGA-II         p-value
              Mean      St. Dev.   Mean      St. Dev.
    kno1      611.57    19.04      9856.07   201.07     3.00E-011
    oka1      795.53    234.77     239.1     102.12     7.38E-011
    oka2      29.17     7.48       2.97      9.73       3.17E-010
    vlmop2    2526.03   41.17      8434.1    149.72     3.00E-011
    vlmop3    2016.47   1182.48    10211.1   562.54     3.02E-011
    dtlz1a    2210.1    229.07     7867.37   1180.86    3.02E-011
    dtlz2a    162.43    44.68      10874.63  38.69      3.00E-011
    dtlz4a    3467.77   2012.24    7327.6    1348.46    3.50E-009
    dtlz7a    4274.87   1294.04    3639.5    725.25     0.19
    sdflp     1236.1    36.35      5318.03   440.75     3.01E-011

Table 5.3: Pareto Size Indicators for NSGA-II and NEMO_NSGA-II

Figure 5.1: Pareto Size Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars.

    Problem   NSGA-II              NEMO_NSGA-II         p-value
              Mean      St. Dev.   Mean      St. Dev.
    kno1      0.02      0          0         0          4.15E-014
    oka1      0.06      0.04       0.12      0.15       0.02
    oka2      0.49      0.33       0.06      0.26       2.95E-010
    vlmop2    0         0          0         0          NaN
    vlmop3    0.01      0          0         0          5.36E-009
    dtlz1a    0         0          0         0          NaN
    dtlz2a    0.06      0.02       0.01      0          4.50E-012
    dtlz4a    0.01      0.01       0.01      0          0.16
    dtlz7a    0.03      0.03       0.03      0.02       0.76
    sdflp     2.36      2.23       2.27      0.4        0.53

Table 5.4: Spacing Indicators for NSGA-II and NEMO_NSGA-II

    Problem   NSGA-II                  NEMO_NSGA-II             p-value
              Mean       St. Dev.      Mean        St. Dev.
    kno1      18242.5    782.66        350345.27   6445.41      3.02E-011
    oka1      943.09     353.19        276.74      127.2        5.97E-009
    oka2      11.6       4.45          1.59        4.41         2.14E-009
    vlmop2    237.79     3.85          655.41      9.28         3.02E-011
    vlmop3    2155.04    1314.66       6592.21     728.88       3.02E-011
    dtlz1a    99.34      13.51         331.46      69.83        3.02E-011
    dtlz2a    10.41      3.6           769.54      201.72       3.02E-011
    dtlz4a    285.93     213.42        478.05      197.55       0
    dtlz7a    3932.94    2409.43       3118.31     2098.3       0.13
    sdflp     135300000  4549725.27    6.81E+008   70958462.84  2.94E-011

Table 5.5: Hypervolume Indicators for NSGA-II and NEMO_NSGA-II

Figure 5.2: Spacing Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars.

Figure 5.3: Hypervolume Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars.

Figure 5.4: ε+ Indicators for NSGA-II and NEMO_NSGA-II with Standard Error Bars.

    Problem   NSGA-II            NEMO_NSGA-II       p-value
              Median    IQR      Median    IQR
    kno1      0.08      0.02     0.06      0.04     0.01
    oka1      0.03      0.06     0.19      1.64     4.14E-010
    oka2      -1        1.04     0         3.14     6.63E-005
    vlmop2    0         0        0         0        NaN
    vlmop3    0.01      0        0         0.01     0
    dtlz1a    0         0        0.04      0.02     1.09E-012
    dtlz2a    0.09      0.03     0.27      0.25     1.73E-009
    dtlz4a    0.02      0.01     0.03      0.03     0.01
    dtlz7a    0.01      0.01     0.03      0.01     3.24E-012
    sdflp     170       168.14   0         0        1.20E-012

Table 5.6: ε+ Indicators for NSGA-II and NEMO_NSGA-II

5.4.2 NSGA-II versus NEMO_MOPSO

In this comparison, MOPSO was used as the underlying EMO algorithm to train NEMO. Those results were compared against NSGA-II. Tables 5.7-5.9 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator, and Table 5.10 shows the median and interquartile ranges for the binary ε+ indicator.
Figures 5.5-5.8 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

In terms of the size of the Pareto optimal set, NEMO_MOPSO outperforms NSGA-II in 60% of the problems in the test suite. It outperforms NSGA-II in terms of the spacing indicator on 50% of the problems, while being outperformed on only 30% of the problems. For the hypervolume indicator, NEMO_MOPSO outperforms NSGA-II in 60% of the problems in the test suite. However, as before, on the binary ε+ indicator, NSGA-II outperforms NEMO_MOPSO on 60% of the problems. While it never dominates NEMO_MOPSO conclusively (i.e., attaining a negative value for ε+), NSGA-II does very well compared to the NEMO approach in terms of this indicator.

5.4.3 NSGA-II versus NEMO_PAES

In this comparison, PAES was used as the underlying EMO algorithm to train NEMO and compared against NSGA-II. This comparison is interesting in that NSGA-II has been shown to consistently outperform PAES in a head-to-head comparison [6]. If NEMO_PAES is capable of even competing with NSGA-II, then the power of the NEMO approach will be realized.

Problem    NSGA-II Mean   St. Dev.    NEMO_MOPSO Mean   St. Dev.   p-value
kno1       206.53         317.78      9916.73           3011.68    9.72E-010
oka1       793.83         160.03      14.4              8.05       2.94E-011
oka2       28.47          7.55        3.3               4.09       1.94E-011
vlmop2     101.43         29.16       10898.03          29.14      3.01E-011
vlmop3     881.03         823.62      10118.83          823.56     3.02E-011
dtlz1a     1726.8         933.24      7252.73           3817.22    8.88E-006
dtlz2a     3891.13        2143.89     7105              2143.56    3.32E-006
dtlz4a     3605.77        3694.08     920.63            1395.93    0.01
dtlz7a     5333.2         1423.7      1261.3            2849.29    1.73E-007
sdp        1196.33        33.13       2399.1            382.7      3.01E-011

Table 5.7: Pareto Size Indicators for NSGA-II and NEMO_MOPSO

Problem    NSGA-II Mean   St. Dev.    NEMO_MOPSO Mean   St. Dev.    p-value
kno1       0.21           0.16        0.01              0.01        5.58E-010
oka1       0.06           0.04        0.3               0.36        0.02
oka2       0.2            0.26        0.65              1.07        0.01
vlmop2     0.01           0           0                 0           6.11E-014
vlmop3     0.02           0.01        0                 0           6.27E-010
dtlz1a     0              0           9.20E-008         5.04E-007   1
dtlz2a     0.01           0           0                 0.01        1.29E-007
dtlz4a     0.02           0.03        0.01              0.01        0.18
dtlz7a     0.02           0.02        0.16              0.18        1.16E-005
sdp        4.35           1.55        0.82              0.91        3.86E-007

Table 5.8: Spacing Indicators for NSGA-II and NEMO_MOPSO

[Figure 5.5: Pareto Size Indicators for NSGA-II and NEMO_MOPSO with Standard Error Bars.]

[Figure 5.6: Spacing Indicators for NSGA-II and NEMO_MOPSO with Standard Error Bars.]

Problem    NSGA-II Mean   St. Dev.     NEMO_MOPSO Mean   St. Dev.     p-value
kno1       6169.27        10544.5      347121.6          104564.27    1.41E-009
oka1       947.34         317.35       15.75             9.68         8.15E-011
oka2       13.37          5.56         0.82              1.52         4.13E-011
vlmop2     8.46           2.06         872.23            10.12        3.01E-011
vlmop3     1060.54        830.44       5508.68           1283.84      3.02E-011
dtlz1a     74.38          42.44        310.36            157.96       6.28E-006
dtlz2a     303.52         199.25       434.55            126.85       0
dtlz4a     235.6          243.77       52.96             105.58       0
dtlz7a     4308.02        3040.91      236.97            513.52       2.67E-009
sdp        132700000      3524789.06   287100000         54732169.2   2.94E-011

Table 5.9: Hypervolume Indicators for NSGA-II and NEMO_MOPSO

Problem    NSGA-II Median   IQR         NEMO_MOPSO Median   IQR     p-value
kno1       0.51             0.59        0.14                0.33    0.01
oka1       0.05             0.04        2.26                0.04    3.72E-011
oka2       2.84             3.76        0.88                0.35    0.37
vlmop2     0.03             0.01        0                   0       7.50E-013
vlmop3     0.02             0.02        0.01                0.02    0.08
dtlz1a     0                0           0.07                0.15    1.31E-010
dtlz2a     0.02             0.02        0.59                0.03    2.03E-011
dtlz4a     0.02             0.05        0.26                0.34    0
dtlz7a     8.03E-006        3.33E-005   2.25                1.6     5.54E-010
sdp        1.9              2.46        170.11              0.19    2.72E-010

Table 5.10: ε+ Indicators for NSGA-II and NEMO_MOPSO

[Figure 5.7: Hypervolume Indicators for NSGA-II and NEMO_MOPSO with Standard Error Bars.]

[Figure 5.8: ε+ Indicators for NSGA-II and NEMO_MOPSO with Standard Error Bars.]

Tables 5.11-5.13 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator for each problem in the test suite, while Table 5.14 shows the median and interquartile ranges for the binary ε+ indicator. Likewise, Figures 5.9-5.12 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

In terms of the size of the Pareto optimal set, NEMO_PAES outperforms NSGA-II in 50% of the problems in the test suite. It outperforms NSGA-II in terms of the spacing indicator on 40% of the problems, while being outperformed on only 30% of the problems. For the hypervolume indicator, NEMO_PAES outperforms NSGA-II in 50% of the problems in the test suite. However, on the binary ε+ indicator, NSGA-II outperforms NEMO_PAES on 80% of the problems. While it never dominates NEMO_PAES conclusively (i.e., attaining a negative value for ε+), NSGA-II does very well compared to the NEMO approach in terms of this indicator.

Problem    NSGA-II Mean   St. Dev.   NEMO_PAES Mean   St. Dev.   p-value
kno1       863.67         126.84     5788.2           2295.89    1.07E-007
oka1       721.83         105.15     11.1             10.98      2.98E-011
oka2       29.03          7.82       1.13             0.35       4.03E-012
vlmop2     2412.37        42.36      8097.97          121.75     3.01E-011
vlmop3     465.83         172.62     10533.8          172.51     3.01E-011
dtlz1a     2756.23        244.18     806.47           288.01     3.02E-011
dtlz2a     2795.47        357.41     8203.43          357.41     3.02E-011
dtlz4a     7863.73        4430.74    4.3              4.92       0
dtlz7a     5609.97        387.23     2764.8           1453.37    1.20E-008
sdp        1027.97        30.6       3592.17          326.18     3.02E-011

Table 5.11: Pareto Size Indicators for NSGA-II and NEMO_PAES

[Figure 5.9: Pareto Size Indicators for NSGA-II and NEMO_PAES with Standard Error Bars.]

Problem    NSGA-II Mean   St. Dev.   NEMO_PAES Mean   St. Dev.   p-value
kno1       0.01           0.01       0.02             0.04       0
oka1       0.05           0.04       0.5              0.58       0.02
oka2       0.37           0.33       0                0          1.19E-012
vlmop2     0              0          0                0          NaN
vlmop3     0.02           0.01       0                0          5.09E-013
dtlz1a     0              0          13.64            11.3       2.07E-011
dtlz2a     0.01           0          0                0          3.78E-010
dtlz4a     0.01           0          0.28             0.45       0.34
dtlz7a     0.01           0.01       0.01             0          0.01
sdp        3.59           2.37       1.17             1.24       5.25E-007

Table 5.12: Spacing Indicators for NSGA-II and NEMO_PAES

Problem    NSGA-II Mean   St. Dev.     NEMO_PAES Mean   St. Dev.      p-value
kno1       27593.36       4615.17      210057.82        82407.67      1.07E-007
oka1       857.13         231.13       11.31            12.66         3.02E-011
oka2       46.9           45.1         0.26             0.81          3.17E-011
vlmop2     228.48         3.74         613.15           9.89          3.02E-011
vlmop3     678.68         204.08       5379.99          313.68        3.02E-011
dtlz1a     253036.69      197674.82    4718.41          5041.25       0
dtlz2a     185.34         35.88        372.58           27.55         3.02E-011
dtlz4a     709.54         510.14       0.04             0.07          8.30E-005
dtlz7a     1194.34        1587.76      1044.37          1483.07       0
sdp        115466666.67   4023494.22   404233333.33     42154627.78   2.95E-011

Table 5.13: Hypervolume Indicators for NSGA-II and NEMO_PAES

[Figure 5.10: Spacing Indicators for NSGA-II and NEMO_PAES with Standard Error Bars.]
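The spacing indicator tabulated and plotted above measures how evenly a front's points are distributed in objective space, with 0 indicating perfectly uniform spacing. A sketch of Schott's common formulation, using nearest-neighbor Manhattan distances (the dissertation defines its exact variant in an earlier chapter, so treat this as illustrative):

```python
import math

def spacing(front):
    """Schott's spacing metric over a list of objective vectors:
    the standard deviation of each point's Manhattan distance to
    its nearest neighbor.  Near 0 means an evenly spread front."""
    n = len(front)
    if n < 2:
        return 0.0
    # d[i]: Manhattan distance from point i to its closest other point
    d = [min(sum(abs(a - b) for a, b in zip(p, q))
             for j, q in enumerate(front) if j != i)
         for i, p in enumerate(front)]
    mean_d = sum(d) / n
    return math.sqrt(sum((mean_d - di) ** 2 for di in d) / (n - 1))
```

Five evenly spaced points on the line f1 + f2 = 4, for instance, yield a spacing of exactly 0.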
[Figure 5.11: Hypervolume Indicators for NSGA-II and NEMO_PAES with Standard Error Bars.]

[Figure 5.12: ε+ Indicators for NSGA-II and NEMO_PAES with Standard Error Bars.]

Problem    NSGA-II Median   IQR    NEMO_PAES Median   IQR      p-value
kno1       0.07             0.02   0.16               0.07     1.77E-009
oka1       0.11             2.18   0.69               1.79     0
oka2       2.65E-006        3.03   1.31               1.55     0.05
vlmop2     0                0      0                  0        NaN
vlmop3     0.02             0.01   0.01               0        8.43E-007
dtlz1a     0                0      0.48               0.01     8.73E-013
dtlz2a     0.02             0      0.57               0.06     8.99E-012
dtlz4a     0.03             0.02   1                  0.02     0
dtlz7a     0.01             0.01   2.61               0.08     1.22E-011
sdp        4.07             167.66 170.04             165.96   0.02

Table 5.14: ε+ Indicators for NSGA-II and NEMO_PAES

5.4.4 MOPSO versus NEMO_NSGA-II

In this comparison, NSGA-II was used as the underlying EMO algorithm to train NEMO and compared against MOPSO. Tables 5.15-5.17 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator, and Table 5.18 shows the median and interquartile ranges for the binary ε+ indicator. Figures 5.13-5.16 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

Not surprisingly, NEMO_NSGA-II outperforms MOPSO in 80% of the problems in terms of the size of the Pareto optimal set. It outperforms MOPSO in terms of the spacing indicator on 40% of the problems, while being outperformed on only 20% of the problems. For the hypervolume indicator, NEMO_NSGA-II outperforms MOPSO in 80% of the problems in the test suite. Finally, on the binary ε+ indicator, NEMO_NSGA-II outperforms MOPSO on 60% of the problems (though without any conclusive results).

Problem    MOPSO Mean   St. Dev.   NEMO_NSGA-II Mean   St. Dev.   p-value
kno1       5693.4       1244.95    5252.57             1162.38    0
oka1       31.07        14.89      484.17              122.62     3.45E-010
oka2       5.5          6.78       35.07               28.16      1.80E-009
vlmop2     4047.77      186.34     6952                186.3      3.02E-011
vlmop3     1491.77      378.88     9507.6              378.6      3.02E-011
dtlz1a     5152.43      3118.65    5801.17             3139.25    0.67
dtlz2a     48.1         50.4       10950.9             50.4       2.93E-011
dtlz4a     357.2        361.53     7817.9              1521.3     3.02E-011
dtlz7a     488.53       1323.81    5565.4              712.02     5.76E-011
sdp        954.3        49.87      4446.3              428.86     3.01E-011

Table 5.15: Pareto Size Indicators for MOPSO and NEMO_NSGA-II

5.4.5 MOPSO versus NEMO_MOPSO

In this comparison, MOPSO was used as the underlying EMO algorithm to train NEMO and compared against standard MOPSO. This is similar to the experiment carried out in the previous chapter, in which NSGA-II was used as the underlying EMO. Like that experiment, this one is a "pure" evaluation of the NEMO approach since it uses the same EMO for the training set and for the comparative set. Tables 5.19-5.21 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator, and Table 5.22 shows the median and interquartile ranges for the binary ε+ indicator. Additionally, Figures 5.17-5.20 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

[Figure 5.13: Pareto Size Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars.]

Problem    MOPSO Mean   St. Dev.   NEMO_NSGA-II Mean   St. Dev.   p-value
kno1       0            0          0.01                0.01       3.34E-011
oka1       0.15         0.19       0.09                0.1        0.74
oka2       0.85         0.97       0.35                0.51       0.5
vlmop2     0            0          0                   0          NaN
vlmop3     0.01         0          0                   0          1.55E-013
dtlz1a     0            0.01       0                   0          0.63
dtlz2a     0.04         0.04       0.01                0          1.10E-005
dtlz4a     0.04         0.04       0.01                0          0
dtlz7a     0.15         0.18       0.02                0.02       0
sdp        1.41         1.43       2.26                0.82       0

Table 5.16: Spacing Indicators for MOPSO and NEMO_NSGA-II

Problem    MOPSO Mean   St. Dev.     NEMO_NSGA-II Mean   St. Dev.      p-value
kno1       212313.37    46357.67     179715.17           39159.2       2.88E-006
oka1       34.49        19.32        594.16              231.79        2.39E-008
oka2       1.78         2.72         12.52               9.71          1.23E-009
vlmop2     390.29       18.48        526.68              12.57         3.02E-011
vlmop3     837.76       264.35       6549.11             597.26        3.02E-011
dtlz1a     274.52       165.17       205.67              137.71        0.09
dtlz2a     1.42         1.69         619.68              165.65        2.95E-011
dtlz4a     19.27        22.32        452.21              126.93        3.02E-011
dtlz7a     119.61       353.65       4547.41             3695.88       1.46E-010
sdp        96530000     3838565.64   560366666.67        66567302.39   3.01E-011

Table 5.17: Hypervolume Indicators for MOPSO and NEMO_NSGA-II

[Figure 5.14: Spacing Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars.]

[Figure 5.15: Hypervolume Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars.]

[Figure 5.16: ε+ Indicators for MOPSO and NEMO_NSGA-II with Standard Error Bars.]

Problem    MOPSO Median   IQR    NEMO_NSGA-II Median   IQR    p-value
kno1       0.01           0.01   0.1                   0.08   3.39E-010
oka1       2.25           0.06   0.04                  0.05   2.75E-011
oka2       0.75           0.39   3.01                  3.65   0.32
vlmop2     0              0      0                     0      0.33
vlmop3     0.05           0.04   0.01                  0      1.24E-010
dtlz1a     0              0      0.05                  0.05   4.56E-007
dtlz2a     0.77           0.19   0.06                  0.1    1.30E-006
dtlz4a     0.26           0.38   0.05                  0.06   1.30E-010
dtlz7a     2.91           1.5    2.17E-005             0.01   3.67E-009
sdp        170.11         0.2    1.58                  0.9    2.03E-010

Table 5.18: ε+ Indicators for MOPSO and NEMO_NSGA-II

Not surprisingly, NEMO_MOPSO outperforms MOPSO in 70% of the problems in terms of the size of the Pareto optimal set. It outperforms MOPSO in terms of the spacing indicator on 60% of the problems, while being outperformed on none of the problems. For the hypervolume indicator, NEMO_MOPSO outperforms MOPSO in 60% of the problems in the test suite. Finally, on the binary ε+ indicator, MOPSO outperforms NEMO_MOPSO on 60% of the problems (though without any conclusive results).

5.4.6 MOPSO versus NEMO_PAES

In this comparison, PAES was used as the underlying EMO algorithm to train NEMO and compared against MOPSO. Tables 5.23-5.25 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator, while Table 5.26 shows the median and interquartile ranges for the binary ε+ indicator. Figures 5.21-5.24 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

Problem    MOPSO Mean   St. Dev.   NEMO_MOPSO Mean   St. Dev.   p-value
kno1       1476.47      1037.93    9433.67           2353.51    9.75E-010
oka1       63.5         28.31      38.2              21.7       0
oka2       14.23        19.13      13.93             32.04      0.44
vlmop2     2597.33      172.99     10539.07          74.36      3.02E-011
vlmop3     1339.73      1547.22    10508.97          778.98     3.02E-011
dtlz1a     3799.57      2007.53    6382.9            3376.67    0
dtlz2a     2114.17      1047.76    9097.13           1031.99    3.02E-011
dtlz4a     480.83       797.17     1131.8            1665.3     0.16
dtlz7a     3374.77      2372.77    9332.1            1951.78    9.52E-010
sdp        1288.53      52.12      3567.2            430.29     3.01E-011

Table 5.19: Pareto Size Indicators for MOPSO and NEMO_MOPSO

Problem    MOPSO Mean   St. Dev.   NEMO_MOPSO Mean   St. Dev.   p-value
kno1       0.02         0.01       0.01              0.01       2.41E-005
oka1       0.23         0.14       0.19              0.16       0.3
oka2       0.46         0.55       0.23              0.6        0.23
vlmop2     0            0          0                 0          NaN
vlmop3     0.01         0.01       0                 0          3.55E-011
dtlz1a     0            0          0                 0.01       0.08
dtlz2a     0.02         0.01       0.01              0.01       1.15E-007
dtlz4a     0.02         0.02       0.01              0.01       0.05
dtlz7a     0.02         0.01       0.01              0.01       0
sdp        4.8          0.1        1.94              1.29       2.98E-011

Table 5.20: Spacing Indicators for MOPSO and NEMO_MOPSO

[Figure 5.17: Pareto Size Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars.]

[Figure 5.18: Spacing Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars.]

Problem    MOPSO Mean     St. Dev.     NEMO_MOPSO Mean   St. Dev.      p-value
kno1       51648.34       39912.43     320608.43         82603.25      8.10E-010
oka1       24.07          18.05        14.37             9.86          0.03
oka2       2.67           4.24         2.16              7.29          0.53
vlmop2     252.39         17.14        836.49            7.78          3.02E-011
vlmop3     898.78         962.6        5989.47           1244.66       3.02E-011
dtlz1a     191.58         111.35       244.04            134.87        0.08
dtlz2a     36.7           25.84        283.96            253.61        6.70E-011
dtlz4a     21.38          36.56        74.24             127.22        0.22
dtlz7a     91.89          139.85       958.32            1449.83       2.38E-007
sdp        137833333.33   4202489.68   423766666.67      60634395.23   2.96E-011

Table 5.21: Hypervolume Indicators for MOPSO and NEMO_MOPSO

Problem    MOPSO Mean   St. Dev.   NEMO_MOPSO Mean   St. Dev.   p-value
kno1       0.05         0.02       0.3               0.33       1.21E-008
oka1       0.04         0.1        0.13              0.16       0
oka2       0            0.05       0.05              0.6        0.03
vlmop2     0            0          0                 0          NaN
vlmop3     0.03         0.02       0                 0.01       5.39E-011
dtlz1a     0            0          0.05              0.08       1.15E-012
dtlz2a     0.17         0.12       0.09              0.05       0
dtlz4a     0.01         0.02       0.24              0.49       0
dtlz7a     0.1          0.2        0                 0.01       1.53E-008
sdp        2.82         3.36       5.11              166.89     0

Table 5.22: ε+ Indicators for MOPSO and NEMO_MOPSO

[Figure 5.19: Hypervolume Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars.]

[Figure 5.20: ε+ Indicators for MOPSO and NEMO_MOPSO with Standard Error Bars.]

NEMO_PAES outperforms MOPSO in 50% of the problems in terms of the size of the Pareto optimal set. It outperforms MOPSO in terms of the spacing indicator on 40% of the problems, while being outperformed on only 20% of the problems. For the hypervolume indicator, NEMO_PAES outperforms MOPSO in 50% of the problems in the test suite. Finally, on the binary ε+ indicator, MOPSO outperforms NEMO_PAES on 40% of the problems. On OKA2, the performance was conclusive, yielding a negative median value.
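A negative value of the binary additive ε+ indicator is what "conclusive" means in these comparisons: one front strictly dominates the other. A minimal sketch for minimization problems, following the standard additive formulation from the EMO literature (the dissertation's precise definition appears in an earlier chapter):

```python
def eps_plus(A, B):
    """Binary additive epsilon indicator I_eps+(A, B) for minimization:
    the smallest eps such that every objective vector in B is weakly
    dominated by some vector of A shifted by eps in each objective.
    A negative value means A conclusively (strictly) dominates B."""
    return max(
        min(max(a_k - b_k for a_k, b_k in zip(a, b)) for a in A)
        for b in B)
```

For mutually incomparable fronts such as {(0, 2), (2, 0)} versus {(1, 1)} the indicator is positive in both directions, which is why most of the tabulated comparisons are inconclusive.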
However, NEMO_PAES was able to outperform MOPSO in 30% of the problems, though none were conclusive.

Problem    MOPSO Mean   St. Dev.   NEMO_PAES Mean   St. Dev.   p-value
kno1       8009.97      2006.63    2411.07          1717.02    6.23E-009
oka1       41.87        25.26      20.1             19.47      6.01E-006
oka2       13.53        18.6       0.33             0.66       4.62E-009
vlmop2     4114.97      157.18     6884.6           157.21     3.02E-011
vlmop3     302.5        148.67     10697.1          148.54     3.01E-011
dtlz1a     8982.03      2029.81    1049.67          424.34     3.02E-011
dtlz2a     10.87        20.37      10988.23         20.43      2.30E-011
dtlz4a     1008.53      1410.45    16.93            24.28      0
dtlz7a     3561.6       2668.1     5525.1           1536.58    0.05
sdp        951.23       43.88      4109.73          389.29     3.01E-011

Table 5.23: Pareto Size Indicators for MOPSO and NEMO_PAES

[Figure 5.21: Pareto Size Indicators for MOPSO and NEMO_PAES with Standard Error Bars.]

Problem    MOPSO Mean   St. Dev.    NEMO_PAES Mean   St. Dev.    p-value
kno1       0            0           0.04             0.07        9.44E-012
oka1       0.15         0.16        0.52             0.43        0
oka2       0.35         0.52        0                0           5.36E-006
vlmop2     0            0           0                0           NaN
vlmop3     0.04         0.01        0                0           8.53E-013
dtlz1a     3.86E-005    4.51E-005   6.88E-006        7.70E-006   0.37
dtlz2a     0.07         0.11        0                0           0
dtlz4a     0.01         0.01        0.17             0.26        0.32
dtlz7a     0.02         0.05        0                0           0.03
sdp        2.79         2.35        1.72             1.21        0.06

Table 5.24: Spacing Indicators for MOPSO and NEMO_PAES

Problem    MOPSO Mean   St. Dev.     NEMO_PAES Mean   St. Dev.      p-value
kno1       298118.9     74605.38     83326.47         61096.21      5.71E-009
oka1       22.79        10.72        13.76            13.02         0
oka2       1.27         2.01         0.03             0.14          2.40E-008
vlmop2     398.99       15.27        500.73           12.42         3.02E-011
vlmop3     291.23       152.77       5725.36          308.86        3.02E-011
dtlz1a     481.19       122.39       9.32             5.93          3.02E-011
dtlz2a     0.4          0.93         287.66           102.47        2.40E-011
dtlz4a     56.86        93.03        0.24             0.49          0
dtlz7a     48.84        94.38        635.55           633.14        1.00E-008
sdp        101010000    3865796.08   4.68E+008        54288501.35   2.93E-011

Table 5.25: Hypervolume Indicators for MOPSO and NEMO_PAES

[Figure 5.22: Spacing Indicators for MOPSO and NEMO_PAES with Standard Error Bars.]

[Figure 5.23: Hypervolume Indicators for MOPSO and NEMO_PAES with Standard Error Bars.]

[Figure 5.24: ε+ Indicators for MOPSO and NEMO_PAES with Standard Error Bars.]

Problem    MOPSO Median   IQR      NEMO_PAES Median   IQR      p-value
kno1       0.01           0        0.3                0.24     2.19E-009
oka1       2.25           1.95     0.45               2.12     0.41
oka2       -1             0        0                  0        6.08E-005
vlmop2     0              0        0                  0        NaN
vlmop3     0.06           0.03     0.01               0.01     1.61E-011
dtlz1a     0              0.01     0.48               0.01     7.38E-012
dtlz2a     0.55           0.7      0.06               1.28     0
dtlz4a     0.07           0.07     0.99               0.07     0
dtlz7a     0.72           0.34     0.53               1.14     0.62
sdp        170.02         148.99   1.91               168.64   0

Table 5.26: ε+ Indicators for MOPSO and NEMO_PAES

5.4.7 PAES versus NEMO_NSGA-II

In this comparison, NSGA-II was used as the underlying EMO algorithm to train NEMO and compared against PAES. Tables 5.27-5.29 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator, and Table 5.30 shows the median and interquartile ranges for the binary ε+ indicator. Likewise, Figures 5.25-5.28 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

NEMO_NSGA-II outperforms PAES in 100% of the problems in terms of the size of the Pareto optimal set. It outperforms PAES in terms of the spacing indicator on 40% of the problems, while being outperformed on only 10% of the problems. For the hypervolume indicator, NEMO_NSGA-II outperforms PAES in 100% of the problems in the test suite. Finally, on the binary ε+ indicator, NEMO_NSGA-II outperforms PAES on 90% of the problems, and, on DTLZ2a, its performance is conclusive.

Problem    PAES Mean   St. Dev.   NEMO_NSGA-II Mean   St. Dev.   p-value
kno1       90.8        46.53      10397.27            153.92     3.01E-011
oka1       10.03       11.47      452.27              108.65     4.91E-011
oka2       1.17        0.46       46.13               47.29      4.07E-012
vlmop2     1050.27     33.96      8425.47             147.38     3.01E-011
vlmop3     1532.2      316.78     9465.67             315.62     3.02E-011
dtlz1a     1.7         2.35       10997.23            2.65       1.85E-011
dtlz2a     1.33        2.34       10997.67            2.34       9.85E-012
dtlz4a     4.63        4.44       8058                1439.77    2.74E-011
dtlz7a     471.77      117.47     4948.57             552.63     3.02E-011
sdp        1053.37     50.2       4217.73             397.14     3.01E-011

Table 5.27: Pareto Size Indicators for PAES and NEMO_NSGA-II

5.4.8 PAES versus NEMO_MOPSO

In this comparison, MOPSO was used as the underlying EMO algorithm to train NEMO and compared against PAES. Tables 5.31-5.33 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator, and Table 5.34 shows the median and interquartile ranges for the binary ε+ indicator. Figures 5.29-5.32 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

NEMO_MOPSO outperforms PAES in 100% of the problems in terms of the size of the Pareto optimal set. It outperforms PAES in terms of the spacing indicator on 50% of the problems, while being outperformed on only 10% of the problems. For the hypervolume indicator, NEMO_MOPSO outperforms PAES in 90% of the problems in the test suite. Finally, on the binary ε+ indicator, NEMO_MOPSO outperforms PAES on 90% of the problems, and, on OKA2 and DTLZ1a, its performance is conclusive.

[Figure 5.25: Pareto Size Indicators for PAES and NEMO_NSGA-II with Standard Error Bars.]

Problem    PAES Mean   St. Dev.   NEMO_NSGA-II Mean   St. Dev.   p-value
kno1       0.63        2.34       0                   0          4.09E-011
oka1       0.43        0.68       0.07                0.1        0.2
oka2       0.13        0.68       0.5                 0.51       4.52E-011
vlmop2     0           0          0                   0          NaN
vlmop3     0.01        0          0                   0          8.70E-014
dtlz1a     0           0          0                   0          0.16
dtlz2a     0.05        0.14       0.01                0          0
dtlz4a     0.03        0.16       0.01                0          8.19E-010
dtlz7a     0.01        0          0.01                0          NaN
sdp        1.82        2          2                   1.09       0.41

Table 5.28: Spacing Indicators for PAES and NEMO_NSGA-II

Problem    PAES Mean     St. Dev.     NEMO_NSGA-II Mean   St. Dev.      p-value
kno1       1649.12       793.33       369790.1            5840.04       3.02E-011
oka1       9.94          12.95        533.73              204.29        8.13E-011
oka2       0.84          1.99         74.6                79.76         3.57E-011
vlmop2     90.68         3.8          659.74              8.7           3.02E-011
vlmop3     644.47        192.35       6640.39             579.65        3.02E-011
dtlz1a     0.01          0.01         457.84              63.9          1.42E-011
dtlz2a     0.04          0.07         629.47              175.4         1.78E-011
dtlz4a     0.08          0.18         535.18              225.24        2.76E-011
dtlz7a     114.52        33.2         783.95              130.35        3.02E-011
sdp        99226666.67   5360193.85   534733333.33        64608600.54   3.00E-011

Table 5.29: Hypervolume Indicators for PAES and NEMO_NSGA-II

[Figure 5.26: Spacing Indicators for PAES and NEMO_NSGA-II with Standard Error Bars.]

[Figure 5.27: Hypervolume Indicators for PAES and NEMO_NSGA-II with Standard Error Bars.]

[Figure 5.28: ε+ Indicators for PAES and NEMO_NSGA-II with Standard Error Bars.]

Problem    PAES Median   IQR    NEMO_NSGA-II Median   IQR    p-value
kno1       0.53          0.21   0.04                  0.04   9.24E-009
oka1       2.27          1.95   0.09                  2.19   0
oka2       2.05          1.46   2.65E-006             0.11   0
vlmop2     0             0      0                     0      NaN
vlmop3     0.04          0.03   0.01                  0      4.95E-010
dtlz1a     0.48          0.49   0                     1.01   1.95E-005
dtlz2a     0             0.77   -1                    1.01   0
dtlz4a     1             0      0.05                  0.02   1.52E-012
dtlz7a     2.6           0.05   0.02                  0.01   1.28E-011
sdp        170.06        0.1    2.42                  8.46   1.15E-007

Table 5.30: ε+ Indicators for PAES and NEMO_NSGA-II

5.4.9 PAES versus NEMO_PAES

In this comparison, PAES was used as the underlying EMO algorithm to train NEMO and compared against standard PAES. Tables 5.35-5.37 show the average Pareto optimal front size, spacing indicator, and hypervolume indicator, whereas Table 5.38 shows the median and interquartile ranges for the binary ε+ indicator. Figures 5.33-5.36 display the average and standard errors of the optimal front sizes, spacing, and hypervolume indicators, and the median and standard errors of the binary ε+ indicators.

NEMO_PAES outperforms PAES in 90% of the problems in terms of the size of the Pareto optimal set. It outperforms PAES in terms of the spacing indicator on 50% of the problems.

Problem    PAES Mean   St. Dev.   NEMO_MOPSO Mean   St. Dev.   p-value
kno1       31.8        71.6       10060.5           2916.42    4.75E-010
oka1       15.97       15.06      29.3              21.37      0
oka2       0.17        0.46       28.23             57.68      9.55E-011
vlmop2     59.67       29.18      10939.6           29.17      2.96E-011
vlmop3     647.63      790.02     10352.07          789.89     3.02E-011
dtlz1a     19.5        45.23      10939.93          187.54     8.25E-012
dtlz2a     8.77        11.63      10990.23          11.63      2.78E-011
dtlz4a     4.37        3.12       1317.3            2005.47    0
dtlz7a     595.9       167.76     3916.77           3093.48    9.48E-006
sdp        1233.97     46.2       2597.03           418.57     3.01E-011

Table 5.31: Pareto Size Indicators for PAES and NEMO_MOPSO

Problem    PAES Mean   St. Dev.   NEMO_MOPSO Mean   St. Dev.    p-value
kno1       1.1         1.07       0.01              0.03        6.94E-007
oka1       0.31        0.38       0.22              0.25        0.81
oka2       0           0          0.3               0.5         5.35E-006
vlmop2     0.02        0.01       0                 0           9.20E-013
vlmop3     0.04        0.03       0                 0           1.10E-012
dtlz1a     0           0          6.20E-008         3.40E-007   0.54
dtlz2a     0.09        0.1        0                 0           0
dtlz4a     0.02        0.05       0.01              0.01        0.78
dtlz7a     0.01        0          0.01              0.01        5.77E-005
sdp        3.29        2.16       1.62              1.42        0.01

Table 5.32: Spacing Indicators for PAES and NEMO_MOPSO

[Figure 5.29: Pareto Size Indicators for PAES and NEMO_MOPSO with Standard Error Bars.]

[Figure 5.30: Spacing Indicators for PAES and NEMO_MOPSO with Standard Error Bars.]

Problem    PAES Mean      St. Dev.     NEMO_MOPSO Mean   St. Dev.      p-value
kno1       728.93         1951.34      339228.54         103995.7      5.81E-010
oka1       10.17          9.86         15.44             10.04         0.01
oka2       0.03           0.14         4.81              13.01         0
vlmop2     4.3            2.01         876.69            10.17         3.02E-011
vlmop3     402.55         343.97       5944.66           1253.2        3.02E-011
dtlz1a     0.15           0.35         384.34            93.13         1.10E-011
dtlz2a     0.37           0.58         332.32            217.45        2.90E-011
dtlz4a     0.04           0.05         84.31             152.2         0.01
dtlz7a     59.91          60.68        53.13             82.93         0.2
sdp        122066666.67   5570447.47   314700000         56847739.77   2.97E-011

Table 5.33: Hypervolume Indicators for PAES and NEMO_MOPSO

Problem    PAES Median   IQR      NEMO_MOPSO Median   IQR      p-value
kno1       2.14          1.85     0.04                0.17     2.52E-006
oka1       2.29          2.02     0.35                2.08     0.01
oka2       0             0        -1                  0        4.53E-008
vlmop2     0.05          0.02     0                   0        9.62E-013
vlmop3     0.03          0.03     0.01                0.01     1.59E-009
dtlz1a     0             0.48     -1                  1        1.69E-005
dtlz2a     0.58          0.34     0.29                0.37     0
dtlz4a     0.83          1.73     0.09                0.15     0.01
dtlz7a     0.87          0.94     0.69                0.59     0.05
sdp        3.33          168.16   170.02              160.55   0

Table 5.34: ε+ Indicators for PAES and NEMO_MOPSO

[Figure 5.31: Hypervolume Indicators for PAES and NEMO_MOPSO with Standard Error Bars.]
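The hypervolume figures and tables above report the size of the objective-space region a front dominates relative to a reference point, so larger values are better. For two objectives it can be computed with a simple sweep; a sketch assuming minimization (the dissertation's reference points and exact formulation are given in an earlier chapter, so treat this as illustrative):

```python
def hypervolume_2d(front, ref):
    """Hypervolume (area) dominated by a 2-objective minimization
    front relative to reference point `ref`.  Sorts by the first
    objective and sums one rectangular slab per nondominated point."""
    # keep only points that actually dominate the reference point
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:            # dominated points add no new area
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```

A single point (1, 1) against reference (3, 3), for example, dominates a 2-by-2 square of area 4, and adding a dominated point leaves the value unchanged.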
161 kno1 oka1 oka2 vlmop2vlmop3 dtlz1a dtlz2a dtlz4a dtlz7a paesNEMO?nsga2 Problem Epsilon Indicator ?1.0 ?0.5 0.0 0.5 1.0 1.5 2.0 2.5 Figure 5.32: ?+ Indicators for PAES and NEMOMOPSO with Standard Error Bars. 162 problems, while being outperformed on only 20% of the problems. For the hypervolume indicator, NEMOPAES outperforms PAES in 70% of the problems in the test suite. Finally, on the binary ?+ indicator, NEMOPAES outperforms PAES on 80% of the problems, and, on DTLZ4a, its performance is conclusive. Problem PAES NEMOPAES p-value Mean St. Dev. Mean St. Dev. kno1 173.43 109.92 9488.5 2208.73 3.01E-011 oka1 748.6 1239.38 2255.7 3977.04 0.12 oka2 13.2 18.65 375.77 1802.26 0.05 vlmop2 1121.9 39.89 8487.3 104.74 3.00E-011 vlmop3 883.03 1029.59 10964.3 142.8 2.50E-011 dtlz1a 2.6 1.28 1830.17 1215.96 2.07E-011 dtlz2a 17.4 21.19 10998.6 0.86 2.08E-011 dtlz4a 1.03 2.55 68.77 47.63 4.57E-011 dtlz7a 515.2 85.42 5627.33 1578.23 3.02E-011 sd p 1325.2 49.04 5014.7 391.87 3.01E-011 Table 5.35: Pareto Size Indicators for PAES and NEMOPAES 5.5 Conclusions The results from these experiments reveal several important facts. First, the prelim- inary results from the previous chapters are shown to hold up to statistical scrutiny. The three experiments { NSGA-II versus NEMONSGA?II, MOPSO versus NEMOMOPSO, and PAES versus NEMOPAES { reveal that the NEMO approach is very efiective when used on a given EMO algorithm to increase the size of the Pareto optimal set. The NEMO 163 kno1 oka1 oka2 vlmop2vlmop3 dtlz1a dtlz2a dtlz4a dtlz7a paesNEMO?paes Problem Pareto Front Size 0 2000 4000 6000 8000 10000 Figure 5.33: Pareto Size Indicators for PAES and NEMOPAES with Standard Error Bars. 164 Problem PAES NEMOPAES p-value Mean St. Dev. Mean St. Dev. 
kno1 0.12 0.06 0.01 0.01 1.35E-011 oka1 0.21 0.31 0.31 0.3 0.27 oka2 0.18 0.56 0.05 0.07 0.62 vlmop2 0 0 0 0 NaN vlmop3 0.02 0.01 0 0 2.05E-012 dtlz1a 0 0.01 0.01 0.01 0.03 dtlz2a 0.08 0.08 0 0 9.24E-007 dtlz4a 0.04 0.17 0.1 0.09 2.35E-005 dtlz7a 0.01 0 0 0 2.51E-011 sd p 3.87 1.7 2.43 0.1 6.72E-005 Table 5.36: Spacing Indicators for PAES and NEMOPAES Problem PAES NEMOPAES p-value Mean St. Dev. Mean St. Dev. kno1 3220.31 1983.64 334196.11 86365.77 3.02E-011 oka1 3384.32 6026.43 9868.94 18793.45 0.3 oka2 6.18 15.38 462.68 2427.21 0.49 vlmop2 98.91 4.09 647.98 8.77 3.02E-011 vlmop3 530.87 471.19 6016.88 283.83 3.02E-011 dtlz1a 17.03 14.25 31409.09 43608.72 3.01E-011 dtlz2a 0.27 0.48 262.03 73.22 2.95E-011 dtlz4a 2.12E-008 1.15E-007 3.30E-007 1.81E-006 0.99 dtlz7a 11.47 7.22 129.02 104.45 3.02E-011 sd p 136300000 5298991.45 594100000 55494237.04 2.96E-011 Table 5.37: Hypervolume Indicators for PAES and NEMOPAES 165 kno1 oka1 oka2 vlmop2vlmop3 dtlz1a dtlz2a dtlz4a dtlz7a paesNEMO?paes Problem Spacing Indicator 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Figure 5.34: Spacing Indicators for PAES and NEMOPAES with Standard Error Bars. 166 kno1 oka1 oka2 vlmop2vlmop3 dtlz1a dtlz2a dtlz4a dtlz7a paesNEMO?paes Problem Hypervolume Indicator 0 50000 100000 150000 200000 250000 300000 Figure 5.35: Hypervolume Indicators for PAES and NEMOPAES with Standard Error Bars. 167 kno1 oka1 oka2 vlmop2vlmop3 dtlz1a dtlz2a dtlz4a dtlz7a paesNEMO?paes Problem Epsilon Indicator ?1.0 ?0.5 0.0 Figure 5.36: ?+ Indicators for PAES and NEMOPAES with Standard Error Bars. 
168 Problem PAES NEMOPAES p-value Median IQR Median IQR kno1 0.3 0.12 0 0.01 1.50E-011 oka1 0.16 2.24 0 0 7.16E-010 oka2 0 0.13 0 0.75 0 vlmop2 0 0 0 0 NaN vlmop3 0.03 0.02 0 0 1.46E-012 dtlz1a 0.49 0.02 0 0 8.74E-013 dtlz2a 0.33 0.22 0 0 5.99E-009 dtlz4a 0 0 -1 0 5.76E-010 dtlz7a 0.03 0.11 0.01 0.01 1.06E-009 sd p 3.8 8 0 0 1.21E-012 Table 5.38: ?+ Indicators for PAES and NEMOPAES approach in these instances also appears to do well on all indicators except for the binary ?+ indicator, which was, however, rarely conclusive. Second, the NSGA-II versus NEMOMOPSO and NSGA-II versus NEMOPAES experi- ments reveal that the NEMO approach is capable of taking a less successful EMO algorithm and making it competitive with NSGA-II. This is especially true of PAES, which has been shown to be outperformed by NSGA-II in head-to-head comparisons [6]. In this work, however, NEMOPAES is shown to be very competitive with NSGA-II in the sizes of the Pareto optimal sets, the spacing indicator, and the hypervolume indicator. This in itself is a remarkable statement about the power of the NEMO approach. Finally, these experiments reveal that the NEMO approach has di?culty yielding com- petitive values for the ?+ indicator. In many of the experiments, the NEMO algorithm performed well on all indicators except for the ?+ indicator. Further investigation is war- ranted to determine whether this is a limitation of the NEMO approach or whether the 169 ?+ indicator is simply not salient enough to be used as a basis for comparison without additional indicators. 170 Chapter 6 Conclusions and Future Work The neural enhancement technique described in this work was developed in an efiort to learn the active areas of decision space where Pareto optimal solutions could be found. As was discussed in the introduction and literature review, this technique is applied directly to multiobjective optimization algorithms and relies heavily on neural network concepts. 
While several researchers have made use of artiflcial intelligence or pattern recognition approaches to assist with multiobjective optimization, the exhaustive literature survey conducted by the author revealed no studies (except for [65], for which this work is an extension) that attempt to solve the problem of learning promising areas of decision space. In light of this absence of research, the flrst efiort put forth in the current work was to justify the use of the neural network technique by proving its efiectiveness. It was shown that the NEMO approach could produce many times more solutions than a very successful existing EMO approach (NSGA-II). While this was a promising result, the experiment left many unanswered questions. First and foremost, would it be possible to train the NEMO in an acceptable period of time so as to make the approach computationally competitive with the existing EMO approach? Second, would the NEMO approach perform well when evaluated based on other indicators that take into account the shape and spread of the Pareto optimal frontier, rather than just its size? A flnal concern with the results of that flrst experiment involved the manner in which the NEMO and EMO approaches were compared. In that experiment, the EMO approach was given 25600 function evaluations, from which a training set was extracted. Then, the 171 NEMO was trained on that set and was given an additional 25600 function evaluations to expand its Pareto optimal frontier. Finally, its results were compared to those produced by the EMO (essentially, the training set). This seemed extremely \unfair" to the EMO since the NEMO was given a \head start." To address all of these concerns, a second experiment was carried out. In this ex- periment, flve difierent training algorithms for NEMO were evaluated and compared using multiple indicators from the multiobjective optimization literature. Additionally, the train- ing time was collected for each training approach. 
To alleviate the "unfair" comparison mentioned above, a new methodology was enacted. In this experiment, the EMO was given 25600 function evaluations in order to produce a training set, and it was then given an additional 25600 function evaluations in which to optimize that set even further. Likewise, the NEMO was given 25600 function evaluations to improve and expand upon the training set. The results from each after 51200 total function evaluations were then compared.

This second experiment generated extremely positive results. First, it led to the creation of heuristic training methods for the NEMO approach that produced training times well within the range of the existing EMO approaches. Additionally, these heuristic methods generated NEMOs that generally outperformed the EMO in terms of relative yield, spacing, and hypervolume. In terms of the binary ε+ indicator, the EMO approach performed better, but this better performance was often inconclusive due to the nature and limits of the ε+ indicator.

This experiment also raised several relevant open issues. First, while the second experiment used 5 runs in an effort to generate statistical measures of success, it seemed necessary to repeat the EMO/NEMO comparison many times in order to determine statistical significance. Additionally, it remained to be seen whether the NEMO with heuristic training could be used with other EMO algorithms with equal success. Consequently, it was important to determine whether relatively "poor" EMO algorithms could be bolstered by NEMO to make them competitive with "good" EMO algorithms.

Therefore, a final experiment was carried out that made use of the heuristically-trained NEMO using training sets generated from NSGA-II (as before), MOPSO, and PAES. Nine different pairwise comparisons were made (NEMONSGA-II/NSGA-II, NEMONSGA-II/MOPSO, etc.), each using results collected across 30 different runs.
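Of the indicators just mentioned, the hypervolume measures the region of objective space dominated by an approximation set relative to a reference point; for two objectives it reduces to summing rectangle areas after sorting the front. A minimal sketch, assuming minimization (the example front and reference point are illustrative; the reference points used in the experiments are not reproduced here):

```python
def hypervolume_2d(front, ref):
    """Area of objective space dominated by a 2-D front, bounded above
    by a reference point (minimization of both objectives assumed)."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(set(front)):   # ascending in the first objective
        if f2 < prev_f2:                # dominated points contribute nothing
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(4.0, 4.0)))  # 6.0
```

A larger hypervolume is better under this convention, since it indicates a front that is closer to the true Pareto optimal frontier, more widely spread, or both.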
When analyzing the results of this experiment, several interesting observations were made. First, the results from the previous experiments (essentially the NEMONSGA-II/NSGA-II comparison) were verified under statistical scrutiny. Second, the MOPSO and PAES EMO algorithms were both enhanced by the use of the NEMO approach. (This observation applies to the NEMOMOPSO/MOPSO and NEMOPAES/PAES comparisons.) Most importantly, however, this final experiment revealed that the NEMO approach could make less powerful EMO approaches competitive with powerful EMO algorithms. In particular, it has been shown in [6] that NSGA-II outperforms PAES. However, we show that NEMOPAES performs very well when compared to NSGA-II, making it certainly competitive if not dominant. This is an exciting result that underscores the power of the NEMO method.

It remains to be seen whether a different learning system might outperform the general regression neural network when used as the NEMO learner. It seems unlikely that a backpropagation feed-forward neural network would be successful, given the prohibitive computation time needed for training. However, it is possible that an optimized radial basis function network might provide a legitimate alternative. In such a case, it would be interesting to discover whether the particular strengths of the RBF network would provide gains on certain types of multiobjective optimization problems. There may also be additional learning approaches that are equally or more effective. A thorough investigation of this area is warranted.

Another very important direction for future work lies in finding the ideal point at which to engage the NEMO approach. It should be clear that if the EMO is halted prematurely, the NEMO will be trained on a poorly representative Pareto optimal frontier. If given additional function evaluations, the EMO will be much more likely to find solutions that dominate those found by the NEMO.
This is because the NEMO operates "horizontally," expanding the Pareto optimal frontier, whereas the EMO operates "vertically," attempting to find solutions that are more optimal than the existing set. However, if the EMO is halted near the actual Pareto optimal frontier, the NEMO can be engaged immediately to generate a comprehensive sampling of the frontier. It is important to determine the point at which the switch from EMO to NEMO should occur, including any metrics or indicators that might provide some insight in this area. If such a set of indicators could be found, it may then be possible to incorporate the NEMO approach into an existing EMO algorithm so that it could expand solutions on-line, giving the EMO a better chance to find the Pareto optimal frontier. Without such indicators, however, the NEMO approach would almost certainly be a liability, "wasting" function evaluations to fill out a sub-optimal Pareto frontier. It may instead be possible to use a NEMO approach periodically to generate promising areas of the search space for the EMO to explore.

The results presented in this work, together with these future directions, reveal the wealth of opportunity that the NEMO approach provides for multiobjective optimization. With the multi-tasking, multi-cultural, multi-billion members of this 21st century society, there is no doubt that single objective optimization will be unable to solve the complex problems that globalization has created. Multiobjective optimization arises at the dawn of this era as the answer to these difficult challenges, but it has challenges of its own to overcome. It is in answer to some of those challenges that NEMO is offered, in the hopes of pushing the limits of understanding a little further and making the world a little better, which is a multiobjective problem in itself.

Bibliography

[1] C. A. C.
Coello, "A short tutorial on evolutionary multiobjective optimization," in Proceedings of the First International Conference on Evolutionary Multi-Criterion Optimization, E. Zitzler, K. Deb, L. Thiele, C. A. C. Coello, and D. Corne, Eds. Springer-Verlag, 2001, pp. 21–40.

[2] E. Zitzler, M. Laumanns, and S. Bleuler, "A tutorial on evolutionary multiobjective optimization," in Workshop on Multiple Objective Metaheuristics (MOMH 2002). Berlin, Germany: Springer-Verlag, 2002.

[3] H. Eschenauer, J. Koski, and A. Osyczka, Multicriteria Design Optimization. Berlin: Springer-Verlag, 1990.

[4] C. A. C. Coello and M. Lechuga, "MOPSO: A proposal for multiple objective particle swarm optimization," in Proceedings of the IEEE World Congress on Computational Intelligence, 2002, pp. 1051–1056.

[5] N. Srinivas and K. Deb, "Multiobjective optimization using nondominated sorting in genetic algorithms," Evolutionary Computation, vol. 2, no. 3, pp. 221–248, 1994.

[6] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, Apr. 2002.

[7] J. Knowles, "ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 50–66, Feb. 2006.

[8] A. Garrett, G. Dozier, and K. Deb, "NEMO: Neural enhancement for multiobjective optimization," in Proceedings of the 2007 Congress on Evolutionary Computation. IEEE Press, Sep. 2007, pp. 3108–3113.

[9] I. Das and J. E. Dennis, "A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems," Structural and Multidisciplinary Optimization, vol. 14, no. 1, pp. 63–69, Aug. 1997.

[10] C. A. C. Coello, "20 years of evolutionary multi-objective optimization: What has been done and what remains to be done," in Computational Intelligence: Principles and Practice, G.
Y. Yen and D. B. Fogel, Eds. IEEE Computational Intelligence Society, 2006, pp. 73–88.

[11] T. Bäck, U. Hammel, and H.-P. Schwefel, "Evolutionary computation: Comments on the history and current state," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 3–17, Apr. 1997.

[12] Z. Michalewicz and D. B. Fogel, How to Solve It: Modern Heuristics. Springer, 2004.

[13] D. B. Fogel, "What is evolutionary computation?" IEEE Spectrum, vol. 37, no. 2, pp. 26–32, Feb. 2000.

[14] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press, 1975.

[15] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley Publishing Company, Inc., 1989.

[16] S. Forrest, "Genetic algorithms: Principles of natural selection applied to computation," Science, vol. 261, pp. 872–878, Aug. 1993.

[17] M. D. Vose, The Simple Genetic Algorithm: Foundations and Theory. MIT Press, 1999.

[18] L. J. Fogel, A. J. Owens, and M. J. Walsh, Artificial Intelligence through Simulated Evolution. New York: Wiley, 1966.

[19] D. B. Fogel, "An introduction to simulated evolutionary optimization," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 3–14, Jan. 1994.

[20] T. Bäck, F. Hoffmeister, and H.-P. Schwefel, "A survey of evolution strategies," in Proceedings of the 4th International Conference on Genetic Algorithms, R. K. Belew and L. B. Booker, Eds. Morgan Kaufmann, 1991, pp. 2–9.

[21] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE Conference on Neural Networks, Perth, Australia, 1995, pp. 1942–1948.

[22] J. Kennedy, "The particle swarm: Social adaptation of knowledge," in Proceedings of the International Conference on Evolutionary Computation, Indianapolis, IN, 1997, pp. 303–308.

[23] M.
Clerc, "The swarm and the queen: Towards a deterministic and adaptive particle swarm optimization," in Proceedings of the International Conference on Evolutionary Computation, Washington, DC, 1999, pp. 1951–1957.

[24] R. C. Eberhart and Y. Shi, "Comparing inertia weights and constriction factors in particle swarm optimization," in Proceedings of the Congress on Evolutionary Computation, Washington, DC, 2000, pp. 84–88.

[25] J. D. Schaffer, "Multiple objective optimization with vector evaluated genetic algorithms," in Proceedings of the 1st International Conference on Genetic Algorithms, L. Erlbaum, Ed., 1985, pp. 93–100.

[26] C. A. C. Coello, "Recent trends in evolutionary multiobjective optimization," in Evolutionary Multiobjective Optimization: Theoretical Advances and Applications, A. Abraham, L. Jain, and R. Goldberg, Eds. Springer-Verlag, 2005, pp. 7–32.

[27] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," IEEE Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.

[28] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.

[29] M. L. Minsky and S. Papert, Perceptrons. MIT Press, 1969.

[30] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing, D. Rumelhart and J. McClelland, Eds. MIT Press, 1986, pp. 318–362.

[31] T. Poggio and F. Girosi, "Networks for approximation and learning," Proceedings of the IEEE, vol. 78, no. 9, pp. 1484–1487, 1990.

[32] D. F. Specht, "A general regression neural network," IEEE Transactions on Neural Networks, vol. 2, no. 6, pp. 568–576, 1991.

[33] T. Kohonen, Self-Organizing Maps. Berlin, Heidelberg: Springer, 1995.

[34] G. Carpenter and S. Grossberg, "The ART of adaptive pattern recognition by a self-organizing neural network," IEEE Computer, vol. 21, no. 3, pp. 77–88, 1988.

[35] I.
A. Basheer and M. Hajmeer, "Artificial neural networks: Fundamentals, computing, design, and application," Journal of Microbiological Methods, vol. 43, pp. 3–31, 2000.

[36] C. A. C. Coello, "An updated survey of GA-based multiobjective optimization techniques," ACM Computing Surveys, vol. 32, no. 2, pp. 109–143, Jun. 2000.

[37] C. M. Fonseca and P. J. Fleming, "Genetic algorithms for multiobjective optimization: Formulation, discussion, and generalization," in Proceedings of the 5th International Conference on Genetic Algorithms, S. Forrest, Ed., 1993, pp. 416–423.

[38] ——, "An overview of evolutionary algorithms in multiobjective optimization," Evolutionary Computation, vol. 3, no. 1, pp. 1–16, 1995.

[39] J. T. Richardson, M. R. Palmer, G. Liepins, and M. Hilliard, "Some guidelines for genetic algorithms with penalty functions," in Proceedings of the 3rd International Conference on Genetic Algorithms, J. D. Schaffer, Ed. Morgan Kaufmann, 1989, pp. 191–197.

[40] D. E. Goldberg and J. Richardson, "Genetic algorithms with sharing for multimodal function optimization," in Proceedings of the Second International Conference on Genetic Algorithms, 1987, pp. 41–49.

[41] J. Horn, N. Nafpliotis, and D. E. Goldberg, "A niched Pareto genetic algorithm for multiobjective optimization," in Proceedings of the First IEEE Conference on Evolutionary Computation, 1994, pp. 82–87.

[42] J. Knowles and D. Corne, "The Pareto archived evolution strategy: A new baseline algorithm for Pareto multiobjective optimisation," in Proceedings of the 1999 Congress on Evolutionary Computation, 1999, pp. 98–105.

[43] F. Glover and M. Laguna, Tabu Search. Norwell, MA: Kluwer, 1997.

[44] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671–680, 1983.

[45] E. Zitzler and L.
Thiele, "Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach," IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, pp. 257–271, Nov. 1999.

[46] J. N. Morse, "Reducing the size of the nondominated set: Pruning by clustering," Computers and Operations Research, vol. 7, no. 1–2, 1980.

[47] P. Hajela and C.-Y. Lin, "Genetic search strategies in multicriterion optimal design," Structural Optimization, vol. 4, pp. 99–107, Aug. 1992.

[48] E. Zitzler, M. Laumanns, and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm," in Proceedings of EUROGEN 2002, Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, K. Giannakoglou, D. Tsahalis, J. Periaux, P. Papailou, and T. Fogarty, Eds., 2002, pp. 95–100.

[49] A. P. Engelbrecht, Computational Intelligence: An Introduction. John Wiley and Sons, Ltd., 2002.

[50] D. W. Corne, J. D. Knowles, and M. J. Oates, "The Pareto envelope-based selection algorithm for multiobjective optimization," in Proceedings of the Parallel Problem Solving from Nature VI Conference, M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. J. Merelo, and H.-P. Schwefel, Eds. Springer, 2000, pp. 839–848.

[51] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation, vol. 8, no. 2, pp. 173–195, 2000.

[52] C. A. Coello Coello and G. Toscano Pulido, "A micro-genetic algorithm for multiobjective optimization," in First International Conference on Evolutionary Multi-Criterion Optimization, E. Zitzler, K. Deb, L. Thiele, C. A. C. Coello, and D. Corne, Eds. Springer-Verlag, LNCS no. 1993, 2001, pp. 126–140.

[53] C. A. C. Coello, G. T. Pulido, and M. S. Lechuga, "Handling multiple objectives with particle swarm optimization," IEEE Transactions on Evolutionary Computation, vol. 8, no. 3, pp. 256–279, Jun. 2004.

[54] S. Mostaghim and J.
Teich, "Covering Pareto-optimal fronts by subswarms in multi-objective particle swarm optimization," in Proceedings of the 2004 Congress on Evolutionary Computation, 2004, pp. 1404–1411.

[55] O. Schütze, S. Mostaghim, M. Dellnitz, and J. Teich, "Covering Pareto sets by multilevel evolutionary subdivision techniques," in Proceedings of the 2nd International Conference on Evolutionary Multi-Criterion Optimization, Theory and Applications, 2003, pp. 118–132.

[56] D. Jones, M. Schonlau, and W. Welch, "Efficient global optimization of expensive black-box functions," Journal of Global Optimization, vol. 13, pp. 455–492, 1998.

[57] T. M. Mitchell, Machine Learning. McGraw Hill, 1997.

[58] D. E. Rumelhart, B. Widrow, and M. A. Lehr, "The basic ideas in neural networks," Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.

[59] A. Bryson and Y. Ho, Applied Optimal Control. Blaisdell Publishing, 1969.

[60] J. L. McClelland and D. E. Rumelhart, Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises. MIT Press, 1988.

[61] D. R. Hush and B. G. Horne, "Progress in supervised neural networks," IEEE Signal Processing Magazine, vol. 10, no. 1, pp. 8–39, Jan. 1993.

[62] I. Hatzakis and D. Wallace, "Dynamic multi-objective optimization with evolutionary algorithms: A forward-looking approach," in Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO '06), 2006, pp. 1201–1208.

[63] A. Gaspar-Cunha and A. Vieira, "A hybrid multi-objective evolutionary algorithm using an inverse neural network," in Proceedings of the Hybrid Metaheuristics Workshop at ECAI 2004, 2004, pp. 25–30.

[64] A. Gaspar-Cunha and J. A. Covas, "RPSGAe: A multiobjective genetic algorithm with elitism: Application to polymer extrusion," Lecture Notes in Economics and Mathematical Systems, 2002.

[65] H. Yapicioglu, G. Dozier, and A. E.
Smith, "Neural network enhancement of multi-objective evolutionary search," in Proceedings of the 2006 Congress on Evolutionary Computation, 2006, pp. 1909–1915.

[66] K. Deb, "NSGA-II source code in C," http://delta.cs.cinvestav.mx/~ccoello/EMOO/EMOOsoftware.html.

[67] T. Okabe, Y. Jin, M. Olhofer, and B. Sendhoff, "On test functions for evolutionary multi-objective optimization," in Proceedings of Parallel Problem Solving from Nature (PPSN VIII), 2004, pp. 792–802.

[68] D. A. V. Veldhuizen and G. B. Lamont, "Multiobjective evolutionary algorithm test suites," in Proceedings of the 1999 ACM Symposium on Applied Computing, 1999, pp. 351–357.

[69] ——, "On measuring multiobjective evolutionary algorithm performance," in Proceedings of the 2000 Congress on Evolutionary Computation, 2000, pp. 204–211.

[70] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da Fonseca, "Performance assessment of multiobjective optimizers: An analysis and review," IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, Apr. 2003.

[71] T. Okabe, Y. Jin, and B. Sendhoff, "A critical survey of performance indices for multi-objective optimisation," in Proceedings of the 2003 Congress on Evolutionary Computation, 2003, pp. 878–885.

[72] E. Zitzler, "Evolutionary algorithms for multiobjective optimization: Methods and applications," Ph.D. dissertation, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, 1999.

[73] R. Sarker and C. A. C. Coello, "Assessment methodologies for multiobjective evolutionary algorithms," Evolutionary Computation, pp. 177–195, Feb. 2002.

[74] J. Knowles, "PAES source code in C," http://dbkgroup.org/knowles/multi/.

[75] M. S. Lechuga and G. T. Pulido, "MOPSO source code in C," http://delta.cs.cinvestav.mx/~ccoello/EMOO/EMOOsoftware.html.

Appendices

Appendix A

Results from Training Algorithm Experiments

These tables display the results from the five training algorithm experiments from Chapter 4.
Each table contains the average and standard deviation across all five runs, as well as the p-values associated with the spacing, hypervolume, and binary ε+ indicators. For each of those indicators, if p < 0.05, the average is boldfaced and underlined.

Problem  File  Training Time (s)  Yield Ratio  NSGA Spacing  NEMO Spacing  NSGA Hypervolume  NEMO Hypervolume  NSGA-NEMO Binary ε+  NEMO-NSGA Binary ε+

kno1  1  12.41  18.24  0.02  0  19462.2  430559  0.04  0.15
kno1  2  15.09  17.54  0.02  0  21270.8  411739  0.06  0.89
kno1  3  13.27  17.32  0.02  0  20381.3  428533  0.04  0.08
kno1  4  13.84  17.71  0.02  0  20820.9  434878  0.05  0.03
kno1  5  15.03  16.92  0.02  0  21254.9  423308  0.05  0.21
Average  13.928  17.549  0.02  0  20638  425803  0.048  0.272
St. Dev.  1.15214  0.487729  0  0  752.14  8894.51  0.0083666  0.352165
p-value  0.00397675  0.0079365  0.141238

oka1  1  11.16  0.11  0.09  0.02  798.55  70.27  0.31  0.09
oka1  2  5.67  0.21  0.09  0.22  805.18  119.07  0.07  0.18
oka1  3  9.27  0.22  0.06  0.21  635.27  98.43  2.26  0.19
oka1  4  8.52  1.54  0.03  0.07  3739.59  1626.73  2.23  0.42
oka1  5  10.86  1.55  0  0.08  46.19  548.72  2.27  0.22
Average  9.096  0.73  0.054  0.12  1204.96  492.644  1.428  0.22
St. Dev.  2.20632  0.748886  0.0391152  0.0897218  1450.57  663.826  1.13341  0.121861
p-value  0.401965  0.420635  0.222222

oka2  1  0.13  0.19  0.48  1.04  56.83  2.79  2.65e-06  0.41
oka2  2  0.13  0.27  0.68  1.12  27.14  3.87  0  0.4
oka2  3  0.14  0.21  0.28  0.82  35.47  4.72  0  0.41
oka2  4  0.17  0.24  0.59  1.29  30.76  2.66  2.65e-06  0.53
oka2  5  0.14  0.35  0.79  1.08  17.08  3.18  0.04  0.41
Average  0.142  0.252  0.564  1.07  33.456  3.444  0.00800106  0.432
St. Dev.  0.0164317  0.0626099  0.195525  0.169115  14.7117  0.854652  0.0178880  0.0549545
p-value  0.0079365  0.0079365  0.0106623

vlmop2  1  151.05  4.8  0  9.59e-05  172.39  668.38  0  0
vlmop2  2  166.22  4.79  0  9.74e-05  175.85  666  0  0
vlmop2  3  168.27  4.88  0  9.72e-05  171.92  670.42  0  0
vlmop2  4  150.2  4.9  0  9.68e-05  172.78  668.07  0  0
vlmop2  5  149.66  4.87  0  9.68e-05  170.7  666.31  0  0
Average  157.08  4.848  0  9.682e-05  172.728  667.836  0  0
St. Dev.  9.32078  0.0496991  0  5.76194e-07  1.91243  1.78377  0  0
p-value  0.00729036  0.0079365  NaN

vlmop3  1  208.55  4.44  0.01  0  2664.43  8818.84  0.01  0.02
vlmop3  2  196  3.76  0.01  0  2645.68  9722.25  0  0.02
vlmop3  3  187.25  4.2  0.01  0  2665.84  10136.8  0.01  0.01
vlmop3  4  197.72  4  0.01  0  2570.03  9900.39  0.01  0.01
vlmop3  5  203.91  4.54  0.01  0  2579.12  8553.37  0.01  0.02
Average  198.686  4.188  0.01  0  2625.02  9426.33  0.008  0.016
St. Dev.  8.11663  0.318622  0  0  46.8427  697.885  0.00447214  0.00547723
p-value  0.00397675  0.0079365  0.0558293

dtlz1a  1  34.08  0  0  8.24e-07  201126  75.97  0  5.13
dtlz1a  2  48.5  0  0  0.14  35800000  12765.4  0  66.22
dtlz1a  3  32.06  0  0  8.66e-07  177832  75.97  0  5.13
dtlz1a  4  34.64  0  0  5.12e-10  194033  75.97  0  5.13
dtlz1a  5  36.02  0  0  0  5996.98  0.05  0  9.39
Average  37.06  0  0  0.0280003  7275798  2598.67  0  18.2
St. Dev.  6.55156  0  0  0.0626097  15945717  5683.47  0  26.9073
p-value  0.0253699  0.0200082  0.00669438

dtlz2a  1  1472.8  0  0.01  0.22  624.26  0.91  0.02  0.23
dtlz2a  2  1669.23  0  0.01  0.22  628.39  0.91  0.01  0.22
dtlz2a  3  1792.59  0  0.01  0.22  626.14  0.91  0.04  0.23
dtlz2a  4  1530.09  0  0.01  0.22  604.02  0.91  0.01  0.23
dtlz2a  5  1452.7  0  0.01  0.22  623.12  0.91  0.02  0.22
Average  1583.48  0  0.01  0.22  621.186  0.91  0.02  0.226
St. Dev.  144.295  0  0  0  9.8018  0  0.0122474  0.00547723
p-value  0.00397675  0.00749496  0.0104178

dtlz4a  1  1459.44  0  0.01  1.33e-11  524.88  0.04  0  1.06
dtlz4a  2  1368.28  0  0.01  0  525.61  0.02  1.39e-06  1.06
dtlz4a  3  1746.44  0  0.01  1.27e-11  509.45  0.04  0  1.06
dtlz4a  4  1386.06  0  0.01  0  533.43  0.02  0.01  1.06
dtlz4a  5  1502.78  0  0.01  4.25e-08  515.87  0.04  0  1.06
Average  1492.6  0  0.01  8.5052e-09  521.848  0.032  0.00200028  1.06
St. Dev.  152.023  0  0  1.90037e-08  9.3131  0.0109545  0.00447198  0
p-value  0.00729036  0.0109095  0.00669438

dtlz7a  1  384.45  0  0.01  0.37  826.02  1.1  0  0.28
dtlz7a  2  372.22  0  0.01  0.37  742.31  0.94  0  0.29
dtlz7a  3  389.08  0  0.01  0.37  841.57  1.05  0  0.28
dtlz7a  4  367.44  0  0.01  0.37  892.92  1.31  0.01  0.28
dtlz7a  5  379.67  0  0.01  0.37  808.86  1.03  0.01  0.28
Average  378.572  0  0.01  0.37  822.336  1.086  0.004  0.282
St. Dev.  8.80971  0  0  0  54.6612  0.137949  0.00547723  0.00447214
p-value  0.00397675  0.0079365  0.00856192

sdp  1  74.03  Inf  0  2.32  0  6.32e+08  0  Inf
sdp  2  74.99  Inf  0  0.18  0  7.81e+08  0  Inf
sdp  3  68.56  Inf  0  0.21  0  7.27e+08  0  Inf
sdp  4  72.7  Inf  0  2.09  0  6.52e+08  0  Inf
sdp  5  79.59  Inf  0  0.06  0  9.35e+08  0  Inf
Average  73.974  Inf  0  0.972  0  745400000  0  Inf
St. Dev.  3.98464  NaN  0  1.12990  0  121590707  0  NaN
p-value  0.00749496  0.00749496  Inf

Table A.1: Evolve I/O and Single Sigma Indicators

Problem  File  Training Time (s)  Yield Ratio  NSGA Spacing  NEMO Spacing  NSGA Hypervolume  NEMO Hypervolume  NSGA-NEMO Binary ε+  NEMO-NSGA Binary ε+

kno1  1  16.36  18.24  0.02  0  19462.2  428537  0.04  0.16
kno1  2  17.14  17.54  0.02  0  21270.8  411739  0.06  0.89
kno1  3  17.72  17.32  0.02  0  20381.3  428706  0.04  0.06
kno1  4  18.58  17.71  0.02  0  20820.9  434931  0.05  0.03
kno1  5  18.27  16.92  0.02  0  21254.9  423562  0.05  0.22
Average  17.614  17.55  0.02  0  20638  425495  0.048  0.272
St. Dev.  0.890494  0.49  0  0  752.14  8682.83  0.0083666  0.353794
p-value  0.00397675  0.0079365  0.170587

oka1  1  12.34  0.54  0.09  0.03  1881.62  594.66  0.36  0.42
oka1  2  5.78  11.57  0.09  0  4518.99  28735.5  3.5e-06  0.67
oka1  3  10.44  19.06  0.06  0  2933.96  33502.8  4.16e-06  1.84
oka1  4  9.64  1.57  0.03  0.07  3775.46  1601.57  2.23  0.42
oka1  5  11.05  0.26  0  0.09  1309.98  115.76  2.3  0.39
Average  9.85  6.6  0.054  0.038  2884  12910.1  0.978002  0.748
St. Dev.  2.47978  8.4  0.0391152  0.0408656  1318.40  16716.4  1.18428  0.620862
p-value  0.591208  0.84127  0.675174

oka2  1  0.09  8.12  0.48  0.25  295.43  2171.74  0.01  2.64
oka2  2  0.08  11.37  0.68  0  26.65  160.2  -0.01  3.35
oka2  3  0.09  8.88  0.28  0  25.5  100.82  -0.05  3.02
oka2  4  0.14  76.37  0.59  0.01  86.76  4755.86  -0.02  3.14
oka2  5  0.08  19.88  0.79  0.17  76.66  1392.24  0.34  2.17
Average  0.096  24.92  0.564  0.086  102.2  1716.17  0.054  2.864
St. Dev.  0.0250998  29.14  0.195525  0.116748  111.601  1909.36  0.161338  0.465972
p-value  0.0119252  0.031746  0.0079365

vlmop2  1  216.14  4.8  0  9.59e-05  172.39  668.39  0  0
vlmop2  2  225.75  4.79  0  9.75e-05  175.78  666.16  0  0
vlmop2  3  219.5  4.88  0  9.72e-05  171.92  670.42  0  0
vlmop2  4  211.48  4.9  0  9.68e-05  172.78  668.07  0  0
vlmop2  5  216.09  4.9  0  9.68e-05  170.15  666.03  0  0
Average  217.792  4.85  0  9.684e-05  172.604  667.814  0  0
St. Dev.  5.28475  0.05  0  6.02495e-07  2.04045  1.81017  0  0
p-value  0.00729036  0.0079365  NaN

vlmop3  1  272.31  4.44  0.01  0  2664.43  8818.86  0.01  0.02
vlmop3  2  247.03  4.37  0.01  0  2636.05  8347.01  0  0.01
vlmop3  3  249.95  4.55  0.01  0  2634.8  7405.36  0.01  0.02
vlmop3  4  279.13  4.55  0.01  0  2565.57  8246.04  0.01  0.02
vlmop3  5  269.31  4.54  0.01  0  2579.12  8553.37  0.01  0.02
Average  263.546  4.49  0.01  0  2615.99  8274.13  0.008  0.018
St. Dev.  14.2348  0.08  0  0  41.8456  532.815  0.00447214  0.00447214
p-value  0.00397675  0.0079365  0.0209213

dtlz1a  1  28.98  4.22  0  3.1e-10  115.49  1131.31  0  1.95e-05
dtlz1a  2  45.8  4.18  0  3.68e-08  115.63  1132.2  0  7.82e-05
dtlz1a  3  29.95  4.77  0  3.31e-09  103.25  1131.8  0  8.53e-05
dtlz1a  4  31.09  4.37  0  1.48e-10  110.85  1131.25  0  1.95e-05
dtlz1a  5  33.89  4.26  0  1.12e-09  113.84  1131.35  0  3.68e-05
Average  33.942  4.36  0  8.3376e-09  111.812  1131.58  0  4.786e-05
St. Dev.  6.87943  0.24  0  1.59608e-08  5.15875  0.408497  0  3.18323e-05
p-value  0.00749496  0.0079365  0.00729036

dtlz2a  1  1162.88  1.59  0.01  0  447.45  2085.62  0.02  0.58
dtlz2a  2  1316.06  1.58  0.01  6.14e-06  448.95  2066.8  0.01  0.59
dtlz2a  3  1613.16  1.59  0.01  0  460.95  2111.05  0.02  0.57
dtlz2a  4  1187.58  1.63  0.01  7.42e-05  430.12  2057.41  0.02  0.59
dtlz2a  5  1236.02  1.59  0.01  6.31e-06  456.11  2102.96  0.02  0.58
Average  1303.14  1.6  0.01  1.733e-05  448.716  2084.77  0.018  0.582
St. Dev.  182.901  0.02  0  3.19434e-05  11.7453  22.8786  0.00447214  0.0083666
p-value  0.00729036  0.0079365  0.00923674

dtlz4a  1  1177.44  1.24  0.01  0  558.32  779  0.01  0.25
dtlz4a  2  1069.48  1.61  0.01  0  439.89  879.66  0.01  0.4
dtlz4a  3  1374.08  1.68  0.01  0  428.84  645.88  0.02  0.25
dtlz4a  4  1081.72  1.64  0.01  0  447.86  686.54  0.01  0.69
dtlz4a  5  1203.84  1.65  0.01  0  430.1  682.57  0.01  0.69
Average  1181.31  1.56  0.01  0  461.002  734.73  0.012  0.456
St. Dev.  122.584  0.18  0  0  54.9512  94.7389  0.00447214  0.222216
p-value  0.00397675  0.0079365  0.00923674

dtlz7a  1  305.94  2.51  0.01  9.77e-05  645.08  1046.92  0  1.72
dtlz7a  2  333.89  2.57  0.01  0  574.57  1282.63  0  1.63
dtlz7a  3  339.38  2.23  0.01  0  899  1376.93  4.4e-11  1.48
dtlz7a  4  313.48  2.56  0.01  9.85e-05  698.19  1242.01  0  1.72
dtlz7a  5  346.92  2.52  0.01  0  725.65  1361.57  0  1.62
Average  327.922  2.48  0.01  3.924e-05  708.498  1262.01  8.8e-12  1.634
St. Dev.  17.4614  0.14  0  5.37323e-05  121.089  132.484  1.96774e-11  0.098387
p-value  0.00669438  0.0079365  0.00946735

sdp  1  79.23  Inf  0  2.35  0  6.11e+08  0  Inf
sdp  2  82.19  Inf  0  2.18  0  7.1e+08  0  Inf
sdp  3  81.45  Inf  0  0.22  0  7.27e+08  0  Inf
sdp  4  88.11  Inf  0  2.09  0  6.52e+08  0  Inf
sdp  5  86.91  Inf  0  2  0  8.36e+08  0  Inf
Average  83.578  Inf  0  1.768  0  707200000  0  Inf
St. Dev.  3.77496  NaN  0  0.874969  0  85572776  0  NaN
p-value  0.00749496  0.00749496  Inf

Table A.2: Heuristic I/O and Evolve Single Sigma Indicators

Problem  File  Training Time (s)  Yield Ratio  NSGA Spacing  NEMO Spacing  NSGA Hypervolume  NEMO Hypervolume  NSGA-NEMO Binary ε+  NEMO-NSGA Binary ε+

kno1  1  15.89  0.1  0.02  2.78e-05  19462.2  2834.98  0.01  5.22
kno1  2  16.28  0.097  0.02  5.59e-05  21270.8  2727.24  0.02  6.02
kno1  3  17.27  0.096  0.02  1.76e-05  20381.3  2910.33  0.01  4.92
kno1  4  17.59  0.097  0.02  8.7e-07  20820.9  2889.3  0.02  4.46
kno1  5  17.97  0.092  0.02  2.09e-05  21254.9  2828.88  0.02  5
Average  17  0.096  0.02  2.4614e-05  20638  2838.15  0.016  5.124
St. Dev.  0.8821  0.003  0  2.00981e-05  752.14  71.1043  0.00547723  0.572259
p-value  0.00749496  0.0079365  0.0109095

oka1  1  12.5  17.66  0.09  9.75e-05  4183.03  32559.8  -0.48  3.83
oka1  2  5.89  18.39  0.09  0  5322.52  48700.1  -0.56  3.98
oka1  3  10.61  13.64  0.06  0  2948.77  14011  -0.05  1.84
oka1  4  9.99  7.06  0.03  0  2614.18  5570.1  -0.03  1.89
oka1  5  11.06  1.36  0  0  15.72  8.1  -0.2  1.81
Average  10.01  11.62  0.054  1.95e-05  3016.84  20169.8  -0.264  2.67
St. Dev.  2.48180  7.29  0.0391152  4.36033e-05  1990.58  20156.1  0.244397  1.12900
p-value  0.0441713  0.150794  0.0079365

oka2  1  0.09  261.9  0.48  0  279.63  8578.79  -1.03  4.33
oka2  2  0.08  366.7  0.68  0  107.69  10297.1  -0.95  4.69
oka2  3  0.13  323.5  0.28  0  151.96  12393.3  -1.16  4.88
oka2  4  0.17  289.5  0.59  0  152.47  9717.75  -0.88  4.96
oka2  5  0.09  421.8  0.79  0  71.91  6297.18  -0.63  4.53
Average  0.112  332.7  0.564  0  152.732  9456.82  -0.93  4.678
St. Dev.  0.0376829  63.4  0.195525  0  78.525  2244.51  0.197358  0.256652
p-value  0.00749496  0.0079365  0.0079365

vlmop2  1  218.36  4.79  0  3.99e-05  172.39  860.17  0  0.39
vlmop2  2  226.56  4.73  0  3.97e-05  175.43  856.28  0  0.4
vlmop2  3  224.97  4.81  0  3.98e-05  171.92  859.42  0  0.4
vlmop2  4  213.2  4.83  0  3.95e-05  171.93  853.48  0  0.39
vlmop2  5  213.38  4.83  0  3.92e-05  169.43  852  0  0.39
Average  219.294  4.8  0  3.962e-05  172.22  856.27  0  0.394
St. Dev.  6.28483  0.04  0  2.77489e-07  2.13841  3.57595  0  0.00547723
p-value  0.00749496  0.0079365  0.0065017

vlmop3  1  269.8  4.44  0.01  0  11167  14052.9  0  1.23
vlmop3  2  244.34  4.37  0.01  0  11454.4  14088.5  0  1.23
vlmop3  3  243.61  4.55  0.01  0  10957.6  13638.6  0  1.23
vlmop3  4  262.47  4.55  0.01  0  11302.8  14824.4  0  1.23
vlmop3  5  262.02  4.54  0.01  0  10858.2  13894.4  0  1.23
Average  256.448  4.49  0.01  0  11148  14099.8  0  1.23
St. Dev.  11.8005  0.08  0  0  244.152  442.229  0  0
p-value  0.00397675  0.0079365  0.00397675

dtlz1a  1  30.84  4.22  0  6.13e-11  115.49  1131.28  0  1.95e-05
dtlz1a  2  45.64  4.18  0  1.07e-10  115.55  1131.49  0  1.95e-05
dtlz1a  3  30.09  4.77  0  1.88e-10  103.24  1131.69  0  6.85e-05
dtlz1a  4  31.28  4.37  0  8.41e-11  110.85  1131.26  0  1.95e-05
dtlz1a  5  32.64  4.26  0  2.04e-09  113.84  1131.32  0  3.6e-05
Average  34.098  4.36  0  4.9608e-10  111.794  1131.41  0  3.26e-05
St. Dev.  6.5185  0.24  0  8.644e-10  5.14821  0.181852  0  2.13026e-05
p-value  0.00749496  0.0079365  0.00669438

dtlz2a  1  1263.83  1.59  0.01  6.13e-06  447.45  2077.58  0.01  0.6
dtlz2a  2  1317.97  1.58  0.01  6.16e-06  448.95  2066.83  0.01  0.59
dtlz2a  3  1595.53  1.59  0.01  6.18e-06  460.95  2107.93  0.02  0.59
dtlz2a  4  1393.7  1.63  0.01  6.2e-06  430.12  2054.71  0.01  0.59
dtlz2a  5  1282.77  1.59  0.01  6.2e-06  456.11  2103.96  0.02  0.59
Average  1370.76  1.6  0.01  6.174e-06  448.716  2082.2  0.014  0.592
St. Dev.  135.103  0.02  0  2.96648e-08  11.7453  23.1776  0.00547723  0.00447214
p-value  0.00729036  0.0079365  0.00856192

dtlz4a  1  1232.91  1.62  0.01  0  438.8  877.96  0.01  0.69
dtlz4a  2  1077.58  1.61  0.01  0  439.89  584.27  0.01  0.69
dtlz4a  3  1389.55  1.68  0.01  0  428.84  765.54  0.01  0.69
dtlz4a  4  1086.53  1.64  0.01  0  447.86  702.74  0.01  0.69
dtlz4a  5  1205.44  1.65  0.01  0  430.1  400.61  0.01  0.69
Average  1198.40  1.64  0.01  0  437.098  666.224  0.01  0.69
St. Dev.  127.369  0.03  0  0  7.80511  182.549  0  0
p-value  0.00397675  0.150794  0.00397675

dtlz7a  1  324.81  2.51  0.01  0  884.33  1642.18  0  1.72
dtlz7a  2  343.77  2.57  0.01  9.84e-05  574.57  915.58  0  1.71
dtlz7a  3  334.34  2.44  0.01  9.63e-05  701.17  1093.68  2e-12  1.71
dtlz7a  4  311.61  2.56  0.01  9.76e-05  698.19  1198.54  0  1.72
dtlz7a  5  353.75  2.52  0.01  9.75e-05  681.34  1127.08  0  1.73
Average  333.656  2.52  0.01  7.796e-05  707.92  1195.41  4e-13  1.718
St. Dev.  16.3616  0.05  0  4.35874e-05  111.523  270.618  8.94427e-13  0.0083666
p-value  0.00749496  0.0079365  0.00923674

sdp  1  82.72  Inf  0  0.02  0  1.34e+09  0  Inf
sdp  2  83.84  Inf  0  0.02  0  1.35e+09  0  Inf
sdp  3  79.53  Inf  0  0.02  0  1.32e+09  0  Inf
sdp  4  89.34  Inf  0  0.02  0  1.27e+09  0  Inf
sdp  5  89.5  Inf  0  0.02  0  1.32e+09  0  Inf
Average  84.986  Inf  0  0.02  0  1.32e+09  0  Inf
St. Dev.  4.34596  NaN  0  0  0  30822070  0  NaN
p-value  0.00397675  0.00729036  Inf

Table A.3: Heuristic I/O and Evolve Multiple Sigmas Indicators

Problem  File  Training Time (s)  Yield Ratio  NSGA Spacing  NEMO Spacing  NSGA Hypervolume  NEMO Hypervolume  NSGA-NEMO Binary ε+  NEMO-NSGA Binary ε+

kno1  1  0.06  18.24  0.02  0  19462.2  416652  0.04  0.14
kno1  2  0.08  17.54  0.02  0  21270.8  410043  0.06  0.87
kno1  3  0.09  17.32  0.02  0  20381.3  427874  0.04  0.05
kno1  4  0.09  17.71  0.02  0  20820.9  434539  0.05  0.01
kno1  5  0.09  16.92  0.02  0  21254.9  418882  0.05  0.22
Average  0.082  17.55  0.02  0  20638  421598  0.048  0.258
St. Dev.  0.0130384  0.49  0  0  752.14  9646.44  0.0083666  0.351667
p-value  0.00397675  0.0079365  0.288844

oka1  1  0.08  0.18  0.09  0.01  766.48  44.28  0.32  0.42
oka1  2  0.02  0.3  0.09  0.05  1798.47  437.12  0.19  0.38
oka1  3  0.05  0.3  0.06  0.09  2024.26  364.63  2.29  0.41
oka1  4  0.05  0.4  0.03  0.07  1966.83  616.08  2.23  0.42
oka1  5  0.08  0.35  0  0.07  1363.74  128.21  2.3  0.39
Average  0.056  0.31  0.054  0.058  1583.96  318.064  1.466  0.404
St. Dev.  0.0250998  0.08  0.0391152  0.0303315  525.176  232.464  1.10677  0.0181659
p-value  1  0.0079365  0.675174

oka2  1  0  26.07  0.48  0.01  334.72  7697.9  -6e-05  2.96
oka2  2  0  12  0.68  0  26.65  172.25  -0.01  3.35
oka2  3  0  19.21  0.28  0  34.08  361.72  -0.04  3
oka2  4  0  4.97  0.59  0.15  14.36  78.22  0.28  3.14
oka2  5  0  196.6  0.79  0.02  111.32  27523.1  -0.01  3
Average  0  51.77  0.564  0.036  104.226  7166.64  0.043988  3.09
St. Dev.  0  81.35  0.195525  0.0642651  134.343  11833.7  0.132783  0.160624
p-value  0.0119252  0.0555556  0.0116673

vlmop2  1  0.98  3.76  0  0  172.39  561.49  0  0
vlmop2  2  1.06  3.74  0  0  176.19  569.02  0  0
vlmop2  3  1.02  3.86  0  0  171.95  575.43  0  0
vlmop2  4  0.99  3.89  0  0  172.99  573.75  0  0
vlmop2  5  0.97  3.8  0  0  170.77  563.48  0  0
Average  1.004  3.81  0  0  172.858  568.634  0  0
St. Dev.
0.0364692 0.06 0 0 2.03242 6.12594 0 0 p-v alue NaN 0.0079365 NaN vlmop3 1 1.31 4.2 0.01 0 2653.55 7242.62 0.01 0.01 vlmop3 2 1.28 4.04 0.01 0 2652.53 6908.2 0.01 0.01 vlmop3 3 1.28 4.24 0.01 0 2666.58 6920.45 0.01 0.01 vlmop3 4 1.34 4.21 0.01 0 2583.81 6775.09 0.01 0.01 vlmop3 5 1.34 4.41 0.01 0 2558.02 7384.96 0.01 0.01 Av erage 1.31 4.22 0.01 0 2622.9 7046.26 0.01 0.01 St. Dev. 0.03 0.13 0 0 48.6383 255.782 0 0 p-v alue 0.00397675 0.0079365 NaN dtlz1a 1 0.11 4.22 0 4.62e-07 115.66 1132.45 0 0 dtlz1a 2 0.17 4.18 0 1.46e-06 116.8 1143.09 0 0 dtlz1a 3 0.13 4.77 0 6.13e-07 103.29 1131.59 0 0 dtlz1a 4 0.13 4.37 0 1.15e-05 121.03 1233.13 0 0 dtlz1a 5 0.14 4.26 0 6.06e-05 170.04 1689.31 0 0 Av erage 0.136 4.36 0 1.4927e-05 125.364 1265.91 0 0 St. Dev. 0.0219089 0.24 0 2.59483e-05 25.8336 240.458 0 0 p-v alue 0.00749496 0.0079365 NaN Con tin ued on Next Page. .. 197 Problem File Training Yield NSGA NEMO NSGA NEMO NSGA-NEMO NEMO-NSGA Time (s) Ratio Spacing Spacing Hyp erv olume Hyp erv olume Binary ?+ Binary ?+ dtlz2a 1 5.34 1.59 0.01 0 458.42 1744.09 0.02 0.12 dtlz2a 2 6.03 1.58 0.01 0 448.95 1679.39 0.02 0.15 dtlz2a 3 7 1.59 0.01 0 460.95 1742.37 0.02 0.1 dtlz2a 4 5.72 1.63 0.01 0 439.72 1702.93 0.03 0.11 dtlz2a 5 5.39 1.59 0.01 0 456.11 1693.31 0.02 0.14 Av erage 5.896 1.6 0.01 0 452.83 1712.42 0.022 0.124 St. Dev. 0.676927 0.02 0 0 8.58585 29.3523 0.00447214 0.0207364 p-v alue 0.00397675 0.0079365 0.00970079 dtlz4a 1 5.41 0.36 0.01 0.01 453.48 151.58 0.02 0.24 dtlz4a 2 5 0.35 0.01 0.01 460.1 149.9 0.01 0.24 dtlz4a 3 6.8 0.42 0.01 0.01 433.7 164.97 0.02 0.22 dtlz4a 4 5.17 0.32 0.01 0.01 492.6 163.09 0.02 0.24 dtlz4a 5 6.58 0.36 0.01 0.01 439.72 147.48 0.02 0.25 Av erage 5.792 0.36 0.01 0.01 455.92 155.404 0.018 0.238 St. Dev. 
0.836224 0.04 0 0 23.0490 8.0357 0.00447214 0.0109545 p-v alue NaN 0.0079365 0.008784 dtlz7a 1 1.22 0.96 0.01 0.01 660.75 739.88 0.01 0.16 dtlz7a 2 1.16 0.99 0.01 0.01 699.96 823.73 0.01 0.12 dtlz7a 3 1.67 0.94 0.01 0.01 667.22 710.63 0.01 0.17 dtlz7a 4 1.13 0.996 0.01 0.01 734.58 785.7 0.01 0.11 dtlz7a 5 1.22 0.93 0.01 0.01 660.63 634.82 0.01 0.13 Av erage 1.28 0.96 0.01 0.01 684.628 738.952 0.01 0.138 St. Dev. 0.221472 0.03 0 0 32.3240 72.4899 0 0.0258844 p-v alue NaN 0.222222 0.00749496 Con tin ued on Next Page. .. 198 Problem File Training Yield NSGA NEMO NSGA NEMO NSGA-NEMO NEMO-NSGA Time (s) Ratio Spacing Spacing Hyp erv olume Hyp erv olume Binary ?+ Binary ?+ sd p 1 0.42 Inf 0 2.31 0 6.34e+08 0 Inf sd p 2 0.39 Inf 0 2.36 0 6.16e+08 0 Inf sd p 3 0.42 Inf 0 0.24 0 7.62e+08 0 Inf sd p 4 0.41 Inf 0 2.31 0 6.05e+08 0 Inf sd p 5 0.42 Inf 0 0.23 0 7.6e+08 0 Inf Av erage 0.412 Inf 0 1.49 0 675400000 0 Inf St. Dev. 0.0130384 NaN 0 1.14584 0 78827660 0 NaN p-v alue 0.00729036 0.00749496 Inf Table A.4: Heuristic I/O and Heuristic Single Sigma Indicators 199 Problem File Training Yield NSGA NEMO NSGA NEMO NSGA-NEMO NEMO-NSGA Time (s) Ratio Spacing Spacing Hyp erv olume Hyp erv olume Binary ?+ Binary ?+ kno1 1 0.06 18.24 0.02 0 19462.2 426036 0.04 0.15 kno1 2 0.08 17.54 0.02 0 21270.8 409695 0.06 0.87 kno1 3 0.08 17.32 0.02 0 20381.3 425499 0.04 0.06 kno1 4 0.09 17.71 0.02 0 20820.9 432499 0.05 0.02 kno1 5 0.08 16.92 0.02 0 21254.9 417180 0.05 0.21 Av erage 0.078 17.55 0.02 0 20638 422182 0.048 0.262 St. Dev. 0.0109545 0.49 0 0 752.14 8849.31 0.0083666 0.347951 p-v alue 0.00397675 0.0079365 0.170587 ok a1 1 0.06 0.51 0.09 0.14 766.37 172.71 0.31 0.72 ok a1 2 0.03 3.07 0.09 0 1546.46 1952.5 0.18 2.41 ok a1 3 0.06 10.29 0.06 0 2542.74 18879.1 2.26 0.65 ok a1 4 0.05 9.04 0.03 0.04 4220.01 18526.7 2.2 0.6 ok a1 5 0.06 10.27 0 0.02 92.14 11343.3 2.27 0.59 Av erage 0.052 6.64 0.054 0.04 1833.54 10174.9 1.444 0.994 St. Dev. 
0.0130384 4.54 0.0391152 0.0583095 1616.47 8867.53 1.09582 0.79324 p-v alue 0.524518 0.222222 1 ok a2 1 0 6.6 0.48 0.01 41.61 112.75 0.02 3.05 ok a2 2 0 12.47 0.68 0 26.65 130.18 -0.01 3.35 ok a2 3 0 30.62 0.28 0 50.76 589.98 -0.05 3.37 ok a2 4 0 0.63 0.59 0.53 14.36 7.14 0.28 3.14 ok a2 5 0 224.31 0.79 0 76.16 5127.17 -5.96e-10 1.68 Av erage 0 54.93 0.564 0.108 41.908 1193.44 0.048 2.918 St. Dev. 0 95.35 0.195525 0.235945 23.682 2210.44 0.132174 0.705386 p-v alue 0.0344536 0.150794 0.0079365 Con tin ued on Next Page. .. 200 Problem File Training Yield NSGA NEMO NSGA NEMO NSGA-NEMO NEMO-NSGA Time (s) Ratio Spacing Spacing Hyp erv olume Hyp erv olume Binary ?+ Binary ?+ vlmop2 1 0.94 3.93 0 0 172.39 560.98 0 0 vlmop2 2 1.05 3.85 0 0 176.19 558.56 0 0 vlmop2 3 0.98 4.07 0 0 171.92 571.77 0 0 vlmop2 4 0.95 4.03 0 0 172.87 564.65 0 0 vlmop2 5 0.91 4.03 0 0 170.78 563.64 0 0 Av erage 0.966 3.98 0 0 172.83 563.92 0 0 St. Dev. 0.0531977 0.09 0 0 2.03196 4.99017 0 0 p-v alue NaN 0.0079365 NaN vlmop3 1 1.27 4.44 0.01 0 2664.43 7629.78 0.01 0.02 vlmop3 2 1.25 4.3 0.01 0 2680.45 6935.73 0 0.01 vlmop3 3 1.22 4.15 0.01 0 2692.73 5029.64 0.01 0.05 vlmop3 4 1.27 4.41 0.01 0 2565.49 6802.92 0.01 0.01 vlmop3 5 1.27 4.08 0.01 0 2518.74 4756.18 0.02 0.05 Av erage 1.256 4.28 0.01 0 2624.37 6230.85 0.01 0.028 St. Dev. 0.0219089 0.16 0 0 77.5361 1264.78 0.00707107 0.0204939 p-v alue 0.00397675 0.0079365 0.144698 dtlz1a 1 0.11 4.22 0 4.06e-07 115.64 1132.27 0 0 dtlz1a 2 0.17 4.18 0 1.81e-06 117 1145.03 0 0 dtlz1a 3 0.11 4.77 0 6e-07 103.31 1131.78 0 0 dtlz1a 4 0.11 4.37 0 1.19e-05 121.84 1241.28 0 0 dtlz1a 5 0.13 4.26 0 4.71e-05 163.28 1621.78 0 0 Av erage 0.126 4.36 0 1.23632e-05 124.214 1254.43 0 0 St. Dev. 0.0260768 0.24 0 1.99974e-05 22.8819 210.389 0 0 p-v alue 0.00749496 0.0079365 NaN Con tin ued on Next Page. .. 
201 Problem File Training Yield NSGA NEMO NSGA NEMO NSGA-NEMO NEMO-NSGA Time (s) Ratio Spacing Spacing Hyp erv olume Hyp erv olume Binary ?+ Binary ?+ dtlz2a 1 5.17 1.59 0.01 0 572.24 2179.57 0.02 0.19 dtlz2a 2 5.78 1.58 0.01 0 448.95 1745.74 0.02 0.15 dtlz2a 3 6.75 1.59 0.01 0 467.11 1817.43 0.02 0.17 dtlz2a 4 5.52 1.63 0.01 0 720.62 2723.83 0.03 0.17 dtlz2a 5 5.19 1.59 0.01 0 456.11 1753.36 0.02 0.17 Av erage 5.682 1.6 0.01 0 533.006 2043.99 0.022 0.17 St. Dev. 0.648205 0.02 0 0 116.253 419.914 0.00447214 0.0141421 p-v alue 0.00397675 0.0079365 0.008784 dtlz4a 1 5.16 0.99 0.01 0.01 485.41 475.8 0.02 0.23 dtlz4a 2 4.83 0.998 0.01 0.01 439.89 375.24 0.02 0.12 dtlz4a 3 6.58 1.56 0.01 0.01 432.7 495.03 0.02 0.09 dtlz4a 4 5.05 0.94 0.01 0.01 448.72 343.72 0.02 0.36 dtlz4a 5 6.38 0.89 0.01 0.01 432.39 220.29 0.02 0.1 Av erage 5.6 1.08 0.01 0.01 447.822 382.016 0.02 0.18 St. Dev. 0.815138 0.27 0 0 22.042 110.945 0 0.115109 p-v alue NaN 0.547619 0.00749496 dtlz7a 1 1.16 1.0 0.01 0.01 781.96 817.73 0.01 0.41 dtlz7a 2 1.11 1.33 0.01 0.01 928.14 1234.17 0.01 0.33 dtlz7a 3 1.33 0.98 0.01 0.01 844.32 838.98 0.01 0.15 dtlz7a 4 1.25 1.98 0.01 0.01 969.05 1465.9 0.01 0.14 dtlz7a 5 1.28 1.55 0.01 0.01 898.46 1049.05 0.01 0.16 Av erage 1.226 1.37 0.01 0.01 884.386 1081.17 0.01 0.238 St. Dev. 0.0896103 0.42 0 0 73.1041 274.096 0 0.123976 p-v alue NaN 0.420635 0.00749496 Con tin ued on Next Page. .. 202 Problem File Training Yield NSGA NEMO NSGA NEMO NSGA-NEMO NEMO-NSGA Time (s) Ratio Spacing Spacing Hyp erv olume Hyp erv olume Binary ?+ Binary ?+ sd p 1 0.42 Inf 0 2.14 0 8.02e+08 0 Inf sd p 2 0.42 Inf 0 2.14 0 7.7e+08 0 Inf sd p 3 0.44 Inf 0 2.11 0 8.18e+08 0 Inf sd p 4 0.42 Inf 0 0.17 0 7.65e+08 0 Inf sd p 5 0.45 Inf 0 2.01 0 9.63e+08 0 Inf Av erage 0.43 Inf 0 1.714 0 823600000 0 Inf St. Dev. 0.0141421 NaN 0 0.864772 0 80989505 0 NaN p-v alue 0.00729036 0.00749496 Inf Table A.5: Heuristic I/O and Heuristic Multiple Sigmas Indicators 203
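The Spacing and Binary ε+ columns in Tables A.2 through A.5 are standard quality indicators from the multiobjective optimization literature. As a minimal illustrative sketch only, using the common textbook definitions (Schott's spacing metric and the additive binary ε-indicator for minimization), and not the implementation used to produce these results (the function names are mine):

```python
def spacing(front):
    """Schott's spacing metric: the sample standard deviation of each
    point's nearest-neighbor Manhattan distance within a nondominated
    front.  A value of 0 indicates perfectly evenly spaced solutions."""
    dists = []
    for i, p in enumerate(front):
        dists.append(min(sum(abs(a - b) for a, b in zip(p, q))
                         for j, q in enumerate(front) if j != i))
    mean = sum(dists) / len(dists)
    return (sum((d - mean) ** 2 for d in dists) / (len(dists) - 1)) ** 0.5

def binary_epsilon(a, b):
    """Additive binary epsilon indicator I_eps+(A, B) for minimization:
    the smallest shift eps such that every point of B is weakly
    dominated by some point of A translated by eps in each objective."""
    return max(min(max(ai - bi for ai, bi in zip(p, q)) for p in a)
               for q in b)
```

Under these definitions, a negative Binary ε+ value (as in the oka1 and oka2 rows of Table A.3) indicates that every point of the second front is strictly dominated by some point of the first, which is why the paired NSGA-NEMO and NEMO-NSGA columns are reported together.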