Fast Solution of Large-Body Problems Using Domain Decomposition and Null-Field Generation in the Method of Moments Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information. Tyler N. Killian Certificate of Approval: Michael E. Baginski Associate Professor Electrical Engineering Sadasiva M. Rao, Chair Professor Electrical Engineering Lloyd S. Riggs Professor Electrical Engineering Tin-Yau Tam Professor Mathematics George T. Flowers Dean Graduate School Fast Solution of Large-Body Problems Using Domain Decomposition and Null-Field Generation in the Method of Moments Tyler N. Killian A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Auburn, Alabama December 18, 2009 Fast Solution of Large-Body Problems Using Domain Decomposition and Null-Field Generation in the Method of Moments Tyler N. Killian Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights. Signature of Author Date of Graduation iii Vita Tyler N. Killian, son of John and Robin Killian, was born on March 9, 1983, in Gadsden, Alabama. He attended high school at Sand Rock High School in Sand Rock, Alabama and graduated in 2001. After high school he attended Auburn Uni- versity where he graduated Summa Cum Laude with a Bachelor?s degree in electrical engineering in 2005. In the fall semester 2005, he entered graduate school at Auburn University and obtained the Master?s degree in electrical engineering in the summer of 2008. The following semester he entered the Ph.D. program in electrical engineering at Auburn University. iv Dissertation Abstract Fast Solution of Large-Body Problems Using Domain Decomposition and Null-Field Generation in the Method of Moments Tyler N. Killian Doctor of Philosophy, December 18, 2009 (M.S., Auburn University, 2008) (B.S., Auburn University, 2005) 104 Typed Pages Directed by Sadasiva M. Rao In this work, a new Method of Moments (MoM) solution procedure for calculat- ing electromagnetic scattering and radiation by electrically large conducting bodies is presented. By using domain-decomposition, conducting structures are divided into several disjoint pieces. By replacing basis functions on each piece of the structure with specially designed functions, null fields may be produced on surrounding areas thereby decoupling sections of the geometry. Also, the geometrical divisions induce a partitioning on the overall system matrix. By creating these null fields, the blocks in the system matrix with the largest element values are eliminated. The result is a block-diagonally-dominant moment matrix that can be used in an iterative proce- dure for rapid convergence. Furthermore, due to the nature of the algorithm, the solution procedure can be divided cleanly among multiple processors for extra sav- ings in CPU resources. Finally, since an iterative procedure is employed, the large memory requirements typical in MoM problems can be effectively sidestepped. v Acknowledgments To the author?s parents, he expresses his appreciation for their love and encour- agement throughout life. To his advisor, Dr. Sadasiva M. Rao, he thanks for his patience and top-notch guidance. 
His expertise has been invaluable for this work. To Dr. Michael Baginski, he thanks for his encouragement and interesting tech- nical discussions. To the Naval Research Laboratory, he expresses his appreciation for the financial support that made this work possible as well as for their helpful technical suggestions. To the NASA Langley Research Center, he thanks for the GSRP fellowship and travel funding that allowed him to present his research at technical conferences. vi Style manual or journal used Journal of Approximation Theory (together with the style known as ?aums?). Bibliography follows van Leunen?s A Handbook for Scholars. Computer software used The document preparation package TEX (specifically LATEX) together with the departmental style-file aums.sty. vii Table of Contents List of Figures x 1 Introduction and Overview 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Current Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.4 General Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.5 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.6 Parallel Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.7 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2 Two-dimensional conducting cylinders 21 2.1 Electric-Field Integral Equation - EFIE . . . . . . . . . . . . . . . . . 21 2.2 Magnetic-Field Integral Equation - MFIE . . . . . . . . . . . . . . . . 25 2.3 Combined-Field Integral Equation - CFIE . . . . . . . . . . . . . . . 27 2.4 Element Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.5 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3 Arbitrarily shaped three-dimensional conductors 43 3.1 Electric Field Integral Equation - EFIE . . . . . . . . . . . . . . . . . 43 3.2 Magnetic Field Integral Equation - MFIE . . . . . . . . . . . . . . . . 47 3.3 Combined Field Integral Equation - CFIE . . . . . . . . . . . . . . . 48 3.4 Element Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.5 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4 Three-dimensional conductor periodic arrays 58 4.1 Integral Equation Formulation . . . . . . . . . . . . . . . . . . . . . . 58 4.2 Element Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.3 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 viii 5 Conclusions 64 5.1 Scaling and Parallel Efficiency . . . . . . . . . . . . . . . . . . . . . . 64 5.2 Element Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5.3 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 Bibliography 70 Appendices 72 A Block Gauss-Seidel Iterative Solver 73 B Radar Cross Section (RCS) 75 C Matrix Partitioning and Parallel Processing 78 D GPU Acceleration 90 ix List of Figures 1.1 Pulse functions on two 2D strips. . . . . . . . . . . . . . . . . . . . . 13 2.1 Pulse function for two-dimensional TM polarization. . . . . . . . . . . 22 2.2 Pulse function for two-dimensional TE polarization. . . . . . . . . . . 
24 2.3 Two-dimensional TE polarization testing locations. . . . . . . . . . . 25 2.4 Element grouping for two-dimensional case. . . . . . . . . . . . . . . 29 2.5 Elimination of groups within near-field. . . . . . . . . . . . . . . . . . 29 2.6 Cross section of the 50? square cylinder. . . . . . . . . . . . . . . . . 32 2.7 Real currents in the shadow region (Face#1) for square cylinder with 50? sides with TM incident wave Ho = 1 at 350. . . . . . . . . . . . . 33 2.8 Imaginary currents in the shadow region (Face#1) for square cylinder with 50? sides with TM incident wave Ho = 1 at 350. . . . . . . . . . 33 2.9 Real currents in the shadow region (Face#2) for square cylinder with 50? sides with TM incident wave Ho = 1 at 350. . . . . . . . . . . . . 34 2.10 Imaginary currents in the shadow region (Face#2) for square cylinder with 50? sides with TM incident wave Ho = 1 at 350. . . . . . . . . . 34 2.11 Real currents in the shadow region (Face#1) for square cylinder with 50? sides with TE incident wave Ho = 1 at 350. . . . . . . . . . . . . 35 2.12 Imaginary currents in the shadow region (Face#1) for square cylinder with 50? sides with TE incident wave Ho = 1 at 350. . . . . . . . . . 35 2.13 Real currents in the shadow region (Face#2) for square cylinder with 50? sides with TE incident wave Ho = 1 at 350. . . . . . . . . . . . . 36 2.14 Imaginary currents in the shadow region (Face#2) for square cylinder with 50? sides with TE incident wave Ho = 1 at 350. . . . . . . . . . 36 x 2.15 Object with complex shape and concavity. . . . . . . . . . . . . . . . 37 2.16 Real currents for object with concavity with 100? total circumference and TM incident wave with Ho = 1 at 450. (a) Real part and (b) Imaginary part. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.17 Imaginary currents for object with concavity with 100? total circum- ference and TM incident wave with Ho = 1 at 450. (a) Real part and (b) Imaginary part. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.18 Real currents for object with concavity with 100? total circumference and TE incident wave with Ho = 1 at 450. (a) Real part and (b) Imaginary part. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.19 Imaginary currents for object with concavity with 100? total circum- ference and TE incident wave with Ho = 1 at 450. (a) Real part and (b) Imaginary part. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.20 Geometry for two-dimensional strip array. Each array element is 2? in length and they are separated by 0.1?. . . . . . . . . . . . . . . . . . 40 2.21 RCS for 2D strip array with 100 2? elements. The incident wave is normal to the array with Ho = 1 and has TM polarization. . . . . . . 41 2.22 RCS for 2D strip array with 100 2? elements. The incident wave is normal to the array with Ho = 1 and has TE polarization. . . . . . . 41 3.1 Quantities for RWG basis functions. . . . . . . . . . . . . . . . . . . . 45 3.2 Grouping for arbitrary 3d case. X?s represent null groups. . . . . . . . 50 3.3 Triangular mesh for one octant of 5? radius sphere. . . . . . . . . . . 54 3.4 Bistatic RCS for 5? radius sphere. First and second iterations and MoM BOR code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.5 Triangular mesh for section of 12??12? square plate. . . . . . . . . 55 3.6 Bistatic RCS for 12??12? square plate. The first and second iterations are shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
55 3.7 Triangular mesh for French Mirage aircraft geometry. . . . . . . . . . 56 3.8 Bistatic RCS for aircraft, first and second iterations . . . . . . . . . . 56 xi 4.1 Finite periodic conducting array periodic in two dimensions. . . . . . 59 4.2 Elimination of groups within near-field for finite periodic array. . . . . 59 4.3 Geometry for 50 x 50 0.5?2 plate finite periodic array. . . . . . . . . . 63 4.4 Bistatic RCS result after 1 and 2 iterations. . . . . . . . . . . . . . . 63 5.1 Ratio of single CPU time to 1, 2, 4, and 6 processors for machine one. 67 5.2 Ratio of single CPU time to 1, 2, 4, 6, and 8 processors for machine two. 67 D.1 Ratio of CPU to GPU execution times for matrix fill and conjugate gradient solver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 xii Chapter 1 Introduction and Overview In this chapter, we will introduce a basic overview of the technique as well as the motivation behind it. Furthermore, the strengths and weaknesses of current and previous techniques will be highlighted. The general concepts used in the technique will be given as well as the simple steps used to achieve accurate currents for large- body problems. Furthermore, this chapter acts as an outline for applying the method to the various formulations in the following chapters. 1.1 Motivation The Method of Moments [1] is an accurate method used in electromagnetics to solve for the electric currents induced on an object due to an incident field. The object of interest is discretized for representation on a computer. Usually, this is either done with a triangular mesh for three-dimensional objects or line segments for two-dimensional objects. Since the Method of Moments is a Boundary Element Method (BEM), then only the object under study needs to be discretized. Once this is done, a set of N basis functions are defined on the object to represent the desired currents. Finally, a system matrix representing the interaction between each pair of basis functions is constructed. This matrix can be inverted to solve the linear system relating the currents to the incident field. Filling the matrix typically requires O(N2) operations while the inversion procedure requires O(N3) operations. For large-body problems, these steps represent a problem both in terms of storage and solution 1 speed. In order to address these issues, we need a method that can solve the problem piecemeal to reduce storage and also a solver requiring much fewer operations than full matrix inversion or LU decomposition. 1.2 Current Methods There are currently a variety of methods used to combat the aforementioned issues with the Method of Moments. Brute force methods, such as using supercom- puters, attempt to directly solve the linear system using multiple processors [10]. Furthermore, they can solve large systems by storing the system matrix on hard disk. However, this method scales very poorly. Parallel LU decomposition cannot be fully parallelized and therefore the efficient gains become smaller with each additional pro- cessor. Furthermore, large problems can require hundreds of gigabytes of storage if the full system matrix is generated. Another method is the use of entire domain basis functions. Here, functions are defined over larger parts of the geometry. Using this technique, fewer functions can be used on the overall geometry. The downside is that these functions can only be defined for canonical shapes such as squares, circles, etc. 
If the geometry cannot be represented in terms of these shapes, the functions cannot be utilized. Another popular method is the Fast Multipole Method (FMM) [8]. Here, the geometry is broken up and basis functions are grouped together. By using wave translation, each set of functions is allowed to radiate ?as a group? thereby allowing us to avoid calculating all the individual elements in the system matrix when using an iterative solver such as the conjugate gradient method. By accelerating the vector products in the iterative solver, large problems have been solved [9]. Since the method approximates parts of the system matrix, it may not be accurate when solving for precise current values on a scatterer. Furthermore, the iterative procedure is typically 2 the conjugate gradient solver [12], which can display erratic or slow convergence and numerical problems due to precision round-off. An additional method is to generate characteristic basis functions, where entire domain functions are generated at run time [11]. Here, the structure is broken up into pieces. For each piece, an incident wave is applied at various angles and the resulting currents are found for each case. Each current solution then becomes a basis function. A singular value decomposition is then used to find the minimal set of basis functions from this collection that is necessary to represent the currents on that part of the structure. Once this has been done for each piece of the structure, all the functions are collected and a new system matrix is generated and solved using these new functions. If there are very few basis functions, the new linear system can be solved very efficiently. However, since each piece is considered separately, there is no guarantee that the new basis functions can properly account for the interaction with the rest of the structure. Furthermore, for complex shapes, there is no guarantee that the number of unknowns will be much less than when sub-domain functions are used. Also, research has been done in trying to create a sparse system matrix [16],[17]. In [16], a special set of basis functions is used where the functions produce a highly directional field pattern with small sidelobes. Test functions in line with the sidelobes will then produce small matrix elements. The elements are then simply dropped since they are small in magnitude. Although this is an interesting idea, it appears that the system matrix becomes ill- conditioned for the new basis functions. Furthermore, the solution is still sensitive to these matrix elements, irregardless of how small they are and therefore they cannot be dropped. In [17], a set of entire domain functions are developed with coefficients that are solved for at runtime. Although the number of unknowns remains the same, the new functions generate a highly sparse, banded matrix. This system may then be solved without matrix inversion. Unfortunately, the functions are not easily extended 3 to three-dimensional problems. Finally, attempts have been made to diagonalize the system matrix by a basis change to fully computed entire-domain basis functions [7]. In this work, we use the concepts introduced in [6] and [7] for decoupling various parts of the structure. However, we do not attempt to form a full eigenfunction solution over the entire structure as that is not computationally efficient. Instead, we decouple grouped basis functions according to a nearest neighbor criteria. 
By applying these concepts in a different way, we are able to form a much more efficient solution. 1.3 Concepts In this section, the basic concepts used in the method will be covered. First, it is imperative that sub-domain functions be used in the original MoM formulation. The original MoM matrix will form the basis for the method. Here, coupling between functions depends directly on their spatial separation. Typically, the coupling falls off as 1R or 1R2 where R is the distance between two functions. Thus, the largest elements in the system matrix are for those functions which are closest together spatially. Our strategy will be to eliminate those elements in the system matrix which are largest. In order to accomplish this, first functions will be grouped spatially using a nearest neighbor criteria. The system matrix can then be rearranged such that elements in the same group are in the same partitions in the matrix. Next, we take each group one at a time. Adjacent groups within a given radius (usually a few wavelengths) are considered ?near-field groups?, while those outside are called ?far-field groups.? We wish to decouple our chosen group from those groups which are in the near-field region. We generate a new set of basis functions on the group to replace the sub- domain functions. These functions consist of a linear combination of the original sub-domain functions with the added criteria that they produce a null field on the 4 groups in the near-field. This corresponds to creating zeros in the system matrix thereby eliminating the largest elements. Once this procedure has been carried out for each group, we are left with a new system matrix with partitions corresponding to the constructed groups. Furthermore, this final matrix will be block-diagonally-dominant due to the elimination of near-field groups. This matrix will converge very quickly in an iterative scheme making use of its structure, such as the Gauss-Seidel scheme. Computing the new basis essentially results in solving for the near-field interaction while the iterative procedure solves for the far-field interactions. Furthermore, the iterative procedure will take very few iterations allowing us to speed up the solution procedure. So, by making use of the physics of the problem, we can solve the near- field and far-field problems separately enabling us to maintain accuracy where it is most needed and gain speed where accuracy is not as essential. Finally, each process along the way can be subdivided cleanly so that the method is amenable to parallel processing. 1.4 General Procedure The procedure begins with a standard MoM formulation using sub-domain basis functions. Sub-domain basis functions are defined only over a small portion of the geometry and are typically on the order of a tenth of a wavelength in size. This requirement ensures that when two basis functions are separated by a large distance, the corresponding matrix elements representing the coupling between the functions will be small in magnitude. Next, the functions are placed into multiple disjoint groups. These groups may be decided by the geometry of the problem. For example, for an antenna array, the functions on each array element may be a group in the system. Otherwise, we may simply group elements which are electrically close to one 5 another. The functions within each group are then renumbered if necessary so that these groups effectively partition the system matrix so that each group will correspond to one self-block along the diagonal. 
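As a rough illustration of this grouping step, the sketch below clusters sub-domain basis functions by the spatial proximity of their centroids and records, for each group, which other groups fall inside a chosen near-field radius. It is only a minimal sketch under assumed inputs (an array of basis-function centroids); the greedy seeding rule, the function names, and the parameter choices are illustrative assumptions and not the grouping code actually used in this work.

```python
import numpy as np

def form_groups(centroids, group_diameter):
    """Greedy spatial grouping of sub-domain basis functions.

    centroids      : (N, dim) array of basis-function centroid coordinates
    group_diameter : target group extent (e.g. a few wavelengths)

    Returns a list of index arrays; each array holds the basis functions
    belonging to one group, which induces the block partitioning of the
    MoM matrix described above.
    """
    n = centroids.shape[0]
    unassigned = set(range(n))
    groups = []
    while unassigned:
        seed = min(unassigned)                     # pick any remaining function as a seed
        d = np.linalg.norm(centroids - centroids[seed], axis=1)
        members = [i for i in sorted(unassigned) if d[i] <= group_diameter / 2.0]
        groups.append(np.array(members, dtype=int))
        unassigned -= set(members)
    return groups

def near_field_groups(groups, centroids, near_radius):
    """For every source group, list the groups whose center-to-center
    distance lies within near_radius; these are the groups on which
    null fields will later be enforced."""
    centers = np.array([centroids[g].mean(axis=0) for g in groups])
    near = []
    for i, ci in enumerate(centers):
        dist = np.linalg.norm(centers - ci, axis=1)
        near.append([j for j in range(len(groups)) if j != i and dist[j] <= near_radius])
    return near
```

Once such groups are available, reordering the unknowns group by group makes each partition Z_{G_i,G_j} a contiguous sub-block of the stored matrix, which is the arrangement assumed in the remainder of this chapter.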
Once the groups have been properly formed, we select a "source" group and then begin creating a new set of basis functions for that group. We choose "test" points (points where the new basis functions will create null fields) corresponding to those elements in the MoM matrix which are the largest. Since our original basis functions are sub-domain functions, the largest elements in the matrix will correspond to functions which are geometrically close to one another on the structure. Consequently, we establish a nearest-neighbor criterion whereby groups that are within a predetermined radius (typically a few wavelengths) are included as test groups. Since these groups are close to the source, the coupling between these elements and the source elements will be significant and the corresponding matrix elements will be large.

Once the test points have been chosen, we form a new set of basis functions to replace the source group functions. Each new basis function is a linear combination of a sub-domain function from the source group, which is given a coefficient of one, and all the functions from the test groups, each of which is assigned a thus far undetermined coefficient. To solve for these unknown coefficients, or weights, for a source group of size K, we form K linear systems:

\[
\begin{bmatrix}
Z_{t_1t_1} & Z_{t_1t_2} & \cdots & Z_{t_1t_n}\\
Z_{t_2t_1} & Z_{t_2t_2} & & \vdots\\
\vdots & & \ddots & \\
Z_{t_nt_1} & \cdots & & Z_{t_nt_n}
\end{bmatrix}
\begin{bmatrix}\alpha_{1,j}\\ \vdots\\ \alpha_{n,j}\end{bmatrix}
= -\begin{bmatrix}Z_{t_1s_j}\\ \vdots\\ Z_{t_ns_j}\end{bmatrix}
\tag{1.1}
\]

where j = 1,...,K, there are n testing locations, and the Z_{m,n} elements come from the MoM system matrix. The column vectors [α_{i,j}] with i = 1,...,n represent the desired coefficients for forming the new basis functions. Notice that the matrix to be inverted is just a principal submatrix of the original MoM matrix with the indices of the chosen test functions. Also, since there are n unknowns and n equations, this system can be solved exactly if the matrix is nonsingular. Since the matrix effectively represents a smaller method of moments problem, the matrix will be nonsingular. Furthermore, as we move from one new source basis to another (as j changes) in the same source group, only the right-hand side of Eq. 1.1 is modified. Therefore, for each source group, it is only necessary to invert one matrix. Finally, in most situations, the number of test points is larger than the number of functions in the source group. In this case, solving Eq. 1.1 can be made more efficient by LU decomposition followed by forward/backward substitution for the multiple right-hand sides.

Next, we let f_i for i = 1,...,K be the original sub-domain functions and g_i for i = 1,...,K the new set of functions. The new functions are given by:

\[
\begin{bmatrix}g_{s_1}\\ g_{s_2}\\ \vdots\\ g_{s_K}\end{bmatrix}
=
\begin{bmatrix}f_{s_1}\\ f_{s_2}\\ \vdots\\ f_{s_K}\end{bmatrix}
+
\begin{bmatrix}
\alpha_{1,1} & \alpha_{2,1} & \cdots & \alpha_{n,1}\\
\alpha_{1,2} & \alpha_{2,2} & & \vdots\\
\vdots & & \ddots & \\
\alpha_{1,K} & \cdots & & \alpha_{n,K}
\end{bmatrix}
\begin{bmatrix}f_{t_1}\\ f_{t_2}\\ \vdots\\ f_{t_n}\end{bmatrix}
\tag{1.2}
\]

Note that the [α] column vectors in Eq. 1.1 become rows in the matrix seen in Eq. 1.2. Finally, this process is carried out for each group in the structure. By changing our source functions, we create a new system matrix Z̃. This can be represented formally as:

\[
\tilde{Z} = ZR
\tag{1.3}
\]

where Z is the original MoM system matrix for the sub-domain functions and R is a sparse matrix representing a basis change from the sub-domain functions to the new, null-field-producing functions. R is constructed as follows.
Note, each column represents a new basis function. Since each new function is a linear combination of the original functions we have gi = Nsummationdisplay j=1 ?jfj (1.4) where N is the number of unknowns. Column i in R will then be [?1,?2,...,?N]T. Using our procedure, the replaced source function is always assigned a weight of 1 when the new functions are created. This means R will have ones along the diagonal. Furthermore, test points are only chosen at locations near the source, not over the entire structure. For points not chosen, we simply place a zero in R. Consequently, R will be a sparse matrix, becoming more sparse as the system size increases. As an example, suppose we have a system size of N = 5 with sub-domain functions f1,f2,f3,f4,f5. In this case, Z, ?Z, and R are all 5 ? 5 matrices. Now suppose we wish to replace f2 with a function g2 = f2+?1f1+?2f3+?3f5 while leaving f1,f3,f4,f5 untouched. In this case we have R = ? ?? ?? ?? ?? ?? ?? ?? ? 1 ?1 0 0 0 0 1 0 0 0 0 ?2 1 0 0 0 0 0 1 0 0 ?3 0 0 1 ? ?? ?? ?? ?? ?? ?? ?? ? (1.5) where f1,f3,f4,f5 are transformed into themselves and f2 is transformed into g2. Note that although the subdomain functions were created with the purpose of rep- resenting the electric current, they still form a true basis in that they may represent 8 any function, including other basis functions. In this way, any basis function can be constructed from the original sub-domain functions without having to re-derive a new mathematical formulation from scratch for the new functions. Using this procedure, we may transform all of the original basis functions (using only one R matrix), and thereby create a new system matrix. If we have M groups and our original MoM matrix is partitioned as ? ?? ?? ?? ?? ?? ? ZG1,G1 ZG1,G2 ??? ZG1,GM ZG2,G1 ZG2,G2 ... ... ZGM,G1 ZGM,GM ? ?? ?? ?? ?? ?? ? ? ?? ?? ?? ?? ?? ? IG1 IG2 ... IGM ? ?? ?? ?? ?? ?? ? = ? ?? ?? ?? ?? ?? ? VG1 VG2 ... VGM ? ?? ?? ?? ?? ?? ? (1.6) where the groups Gi for i = 1,...,M may vary in size, then the new system matrix will look similar to the following: ?Z = ? ?? ?? ?? ?? ?? ?? ?? ?? ?? ? ?ZG 1,G1 [0] ?ZG 1,G3 ?ZG 1,G4 ??? [0] [0] ?ZG2,G2 ?ZG 3,G1 [0] ?ZG 3,G3 ?ZG 4,G1 ?ZG 4,G2 ?ZG 4,G4 ... ... [0] [0] ?ZGn,Gn ? ?? ?? ?? ?? ?? ?? ?? ?? ?? ? ? ?? ?? ?? ?? ?? ? ?IG 1 ?IG 2 ... ?IG M ? ?? ?? ?? ?? ?? ? = ? ?? ?? ?? ?? ?? ? VG1 VG2 ... VGM ? ?? ?? ?? ?? ?? ? (1.7) An important observation here is that certain partitions have been replaced with zeros due to our choice of basis functions. Furthermore, the change of basis for a group has the effect of creating zeros in that group?s block-column. Also, these zeros produce nulls at locations which are close geometrically, but need not be close in the system matrix. In fact, we do not have a banded matrix in general. Finally, solving 9 this system gives us the coefficients for the new basis functions, not the original sub-domain functions. Our original system is transformed as follows: ZI = V ? ZRR?1I = V ? (ZR)(R?1I) = V ? ?Z?I = V (1.8) Once the new system has been solved, we can obtain the coefficients for the original functions with I = R?I (1.9) Since all the largest terms in the system matrix have been eliminated, the newly created matrix is block-diagonally-dominant. Roughly, this means that the elements of the diagonal blocks are much larger than the off-diagonal elements. We begin solving this system by inverting the diagonal blocks and solving for currents as if the non-diagonal blocks were zero: ? ?? ?? ?? ?? ?? ? ?ZG 1 0 ??? 0 0 ?ZG2 ... ... 
0 ?ZGM ? ?? ?? ?? ?? ?? ? ? ?? ?? ?? ?? ?? ? ?IG 1 ... ?IG M ? ?? ?? ?? ?? ?? ? = ? ?? ?? ?? ?? ?? ? VG1 ... VGM ? ?? ?? ?? ?? ?? ? (1.10) In effect, what we are doing is solving the system by taking into account only the near- field interaction between elements. Since this type of interaction is very complex and involves multiple reflections, we must solve this part ofthe problem exactly. Generally, this solution will be fairly accurate overall and will lead to a more quickly converging solution when used as the initial guess in an iterative scheme. An additional iterative 10 scheme may be useful because the off-diagonal blocks are not necessarily zero and therefore it may be necessary to go through an additional step to further enhance the accuracy of our solution. Next, we take the initial current guess and use it as the starting point for the Block-Gauss-Seidel iterative scheme as described in Appendix A. Since our matrix ?Z is block-diagonally-dominant and our iterative solver favors this type of matrix, it will quickly converge to a solution. Typically, this requires only a few iterations. Furthermore, this scheme has an interesting physical interpretation. To solve for a new current value ?Ik+1Gj , we use ?Ik+1G j = ?Z?1G j,Gj ? ??V Gj ? Msummationdisplay i=1 inegationslash=j ?ZG j,Gi ?IkG i ? ?? (1.11) where there are M groups and ?IkGi is the previously found solution for ?IGi. The summation in Eq. 1.11 represents the total field on group j due to the currents on the rest of the structure. This means that Eq. 1.11 represents a modification in current values at location j due to reflections on the structure. Therefore, our initial current guess represents currents due to the incident wave plus all reflections within each group. Further iteration represents additional reflections between the groups. Also, note that since each group usually radiates in all directions, subsequent reflections will be progressively smaller in magnitude as energy is radiated into thesurrounding space. Consequently, groups that are spatially separated by large distances will require very few reflections to obtain convergence. To summarize, we have five basic steps: 1. Group Formation - The geometry of the problem is divided up into disjoint groups. Each group consists of a set of sub-domain basis functions. 11 2. Create new basis functions - For each group, a new set of basis functions is constructed - one for each member of the group. For a given basis function in the group, the new function will consist of that sub-domain function plus a linear combination of the source functions at all the locations along the structure where null-fields shall be produced. The function inside the group is given a coefficient of one, while those outside (where nulls are produced) are given unknown coefficients or weights. These weights can be solved for exactly by a matrix inversion. Furthermore, only one matrix inversion is necessary for each group. 3. Create new system matrix ?Z - Once the new functions are generated, a new system matrix is generated. Since the newly generated functions are linear combinations of the original sub-domain functions, the elements of the new system matrix are combinations of those from the sub-domain system matrix. Also, the matrices can be partitioned such that the blocks correspond to the basis groups found in step 1. The new matrix will be block-diagonally-dominant since all the largest elements (due to near-field coupling) have been eliminated. 
These null blocks are generated by the specially constructed basis functions. 4. Construct initial guess - We construct an initial guess in order to accelerate the convergence of the iterative procedure. Here, we invert the diagonal blocks of the new matrix and solve for the currents on the structure. This amounts to finding the currents due to the incident wave excitation and the most dominant near-field interactions on the structure. Since each group interacts weakly with the ignored groups, this solution will be fairly accurate but can be made as accurate as desired by further iteration. 12 5. Iterate for accuracy - The initial guess represents an approximate solution, but one can iterate further to obtain more accuracy if necessary. Our iteration scheme is the block adaptation of the Gauss-Seidel solver. Since the matrix is block-diagonally-dominant and the solver is based on this property, it will converge very quickly. Therefore, very few iterations will be necessary to get a sufficiently accurate solution. Notice that the method makes no mention of any specific MoM formulation. In fact, as long as the basis functions are sub-domain functions, the MoM system matrix will have a diagonally strong structure. Furthermore, the functions can be properly groupedandtheabove procedure may beapplied. Throughout theremaining chapters, the procedure will be applied to a variety of formulations, including both two-dimensional and three-dimensional problems. 1.5 Simple example Here a simple example will be used to illustrate and clarify the concepts discussed in the previous sections. In the example, there are 2 two-dimensional strips, each carrying three pulse functions as shown in Figure 1.1. We will place the functions on each strip into 2 separate groups: G1 = {f1,f2,f3} and G2 = {f4,f5,f6}. Figure 1.1: Pulse functions on two 2D strips. 13 Here, the geometry provides us with a convenient group selection. For an arbi- trary case, we simply create groups with functions that are geometrically close to one another on the given structure. Our choice of groups induces a natural partitioning on the MoM matrix. The original sub-domain system matrix is given by: Z = ? ?? ?? ?? ?? ?? ?? ?? ?? ?? ? ? ?? ?? ?? ? Z1,1 Z1,2 Z1,3 Z2,1 Z2,2 Z2,3 Z3,1 Z3,2 Z3,3 ? ?? ?? ?? ? ? ?? ?? ?? ? Z1,4 Z1,5 Z1,6 Z2,4 Z2,5 Z2,6 Z3,4 Z3,5 Z3,6 ? ?? ?? ?? ? ? ?? ?? ?? ? Z4,1 Z4,2 Z4,3 Z5,1 Z5,2 Z5,3 Z6,1 Z6,2 Z6,3 ? ?? ?? ?? ? ? ?? ?? ?? ? Z4,4 Z4,5 Z4,6 Z5,4 Z5,5 Z5,6 Z6,4 Z6,5 Z6,6 ? ?? ?? ?? ? ? ?? ?? ?? ?? ?? ?? ?? ?? ?? ? = ? ?? ? [ZG1G1] [ZG1G2] [ZG2G1] [ZG2G2] ? ?? ? (1.12) where ZGiGj is the partition consisting of all the sub-domain MoM matrix elements with sources from group Gj and test points on group Gi. In order for Z to be block-diagonally dominant, the elements in ZG1G1 and ZG2G2 should be much larger in magnitude than the elements in ZG1G2 and ZG2G1. To eliminate ZG2G1, we create three new entire-domain basis functions to replace the sub-domain functions on Group 1, which we refer to as the source group. These will be given by: g1 = f1 +?1f4 +?1f5 +?1f6 g2 = f2 +?2f4 +?2f5 +?2f6 (1.13) g3 = f3 +?3f4 +?3f5 +?3f6 14 The weights ?i,?i, and ?i are chosen such that their net effect is to produce null fields at every point on the second strip. We will refer to Group 2 as the test group. To give an example of how we find the coefficients for the new basis functions, we will solve for g1. 
Mathematically, we can represent the null-field generation using the following criteria:

\[
\begin{aligned}
Z_{4,1} + \alpha_1 Z_{4,4} + \beta_1 Z_{4,5} + \gamma_1 Z_{4,6} &= 0\\
Z_{5,1} + \alpha_1 Z_{5,4} + \beta_1 Z_{5,5} + \gamma_1 Z_{5,6} &= 0\\
Z_{6,1} + \alpha_1 Z_{6,4} + \beta_1 Z_{6,5} + \gamma_1 Z_{6,6} &= 0
\end{aligned}
\tag{1.14}
\]

We can obtain the coefficients α_1, β_1, and γ_1 with the following relationship:

\[
\begin{bmatrix}\alpha_1\\ \beta_1\\ \gamma_1\end{bmatrix}
= -
\begin{bmatrix}
Z_{4,4} & Z_{4,5} & Z_{4,6}\\
Z_{5,4} & Z_{5,5} & Z_{5,6}\\
Z_{6,4} & Z_{6,5} & Z_{6,6}
\end{bmatrix}^{-1}
\begin{bmatrix}Z_{4,1}\\ Z_{5,1}\\ Z_{6,1}\end{bmatrix}
\tag{1.15}
\]

From this relationship, we can make a couple of important observations. First, the matrix to be inverted is identical to the system matrix for the standard MoM problem involving only the segments 4, 5, and 6. Since this problem is well-defined, the inverse must exist and we can therefore solve for the coefficients exactly. Furthermore, if we try to solve for g_2 or g_3, we will use the same matrix but will multiply it by a different column vector. Therefore, only one matrix must be inverted to solve for the coefficients for each group.

Once we solve Eq. 1.15, we obtain α_1, β_1, and γ_1 and thereby generate the new basis function g_1. In a similar fashion, we can form g_2 and g_3. These functions replace f_1, f_2, and f_3. Likewise, we can replace f_4, f_5, and f_6 with new functions g_4, g_5, and g_6 to produce nulls on the first strip, thereby eliminating Z_{G_1G_2}. We form a new system matrix Z̃ by a source basis change to the newly formed g_i's. The elements in Z̃ are then simply linear combinations of the original elements of Z. For example, if we wish to generate the element Z̃_{2,1}, we compute

\[
\tilde{Z}_{2,1} = Z_{2,1} + \alpha_1 Z_{2,4} + \beta_1 Z_{2,5} + \gamma_1 Z_{2,6}
\tag{1.16}
\]

In matrix notation, we create Z̃ with the following basis transformation:

\[
\begin{bmatrix}
\tilde{Z}_{1,1} & \cdots & \tilde{Z}_{1,6}\\
\vdots & & \vdots\\
\tilde{Z}_{6,1} & \cdots & \tilde{Z}_{6,6}
\end{bmatrix}
=
\begin{bmatrix}
Z_{1,1} & \cdots & Z_{1,6}\\
\vdots & & \vdots\\
Z_{6,1} & \cdots & Z_{6,6}
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & \alpha_4 & \alpha_5 & \alpha_6\\
0 & 1 & 0 & \beta_4 & \beta_5 & \beta_6\\
0 & 0 & 1 & \gamma_4 & \gamma_5 & \gamma_6\\
\alpha_1 & \alpha_2 & \alpha_3 & 1 & 0 & 0\\
\beta_1 & \beta_2 & \beta_3 & 0 & 1 & 0\\
\gamma_1 & \gamma_2 & \gamma_3 & 0 & 0 & 1
\end{bmatrix}
\tag{1.17}
\]

Note that we only need to store the lower-left and upper-right blocks of the transformation matrix since the remaining partitions are simply identity matrices. Also, note that we only need to store coefficients for locations where we want to produce null fields. In larger problems, we only produce nulls on near-field elements, not the entire structure. Thus, if we have Z̃ = ZR, then R will be very sparse and we will only need a minor amount of storage. Our final partitioned matrix is:

\[
\tilde{Z} =
\begin{bmatrix}
\tilde{Z}_{G_1G_1} & 0\\
0 & \tilde{Z}_{G_2G_2}
\end{bmatrix}
\tag{1.18}
\]

where the off-diagonal blocks are exactly zero by design. In this example, the system may be solved exactly by simply inverting the sub-blocks Z̃_{G_1G_1} and Z̃_{G_2G_2}. In a problem where there are several groups, it is unnecessary to produce nulls everywhere outside of a source group. Only the test points in the near-field neighborhood must have nulls since their corresponding matrix elements are the most significant. Additional test points from the remaining groups may be picked up sparsely if one so chooses, although this appears to be unnecessary and adds complexity to the algorithm.
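To make the two-strip construction above concrete, the following sketch repeats it numerically on a small 6 × 6 system. The matrix Z used here is a random, diagonally loaded stand-in for a true MoM matrix (purely an illustrative assumption); the steps — solving Eq. 1.15 for the weights, assembling the transformation matrix of Eq. 1.17, and forming Z̃ = ZR — mirror the procedure just described.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 6x6 sub-domain MoM matrix of the two-strip example.
# (A real Z would come from the integral-equation formulations of Chapter 2;
#  a random, diagonally loaded matrix is enough to demonstrate the algebra.)
Z = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Z += 5.0 * np.eye(6)                               # keep the toy matrix well conditioned

G1, G2 = [0, 1, 2], [3, 4, 5]                      # groups: f1..f3 and f4..f6 (0-based)

R = np.eye(6, dtype=complex)

# New functions on each group are built to null the field on the other group (Eq. 1.15).
for src, tst in [(G1, G2), (G2, G1)]:
    Ztt = Z[np.ix_(tst, tst)]                      # principal submatrix of the test group
    Zts = Z[np.ix_(tst, src)]                      # coupling from source functions to tests
    R[np.ix_(tst, src)] = -np.linalg.solve(Ztt, Zts)   # one factorization, K right-hand sides

Z_new = Z @ R                                      # basis change, Eqs. 1.16-1.17

print(abs(Z_new[np.ix_(G2, G1)]).max())            # ~1e-15: nulls on strip 2 from group 1
print(abs(Z_new[np.ix_(G1, G2)]).max())            # ~1e-15: nulls on strip 1 from group 2

# Solve the decoupled system block by block, then map back to the
# sub-domain coefficients with I = R I_new (Eq. 1.9).
V = rng.standard_normal(6) + 1j * rng.standard_normal(6)
I_new = np.zeros(6, dtype=complex)
for grp in (G1, G2):
    I_new[grp] = np.linalg.solve(Z_new[np.ix_(grp, grp)], V[grp])
I = R @ I_new
print(abs(Z @ I - V).max())                        # matches a direct solve of Z I = V
```

Because every off-diagonal block is nulled in this two-group case, the block solve is exact; in larger problems only the near-field blocks are nulled, so the remaining off-diagonal blocks are merely small and the iterative refinement described earlier is used instead.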
The only requirement is that the group blocks corre- sponding to the self terms dominate the terms in their respective columns and rows. Since ?Z can be made highly block-diagonally dominant, the system will converge quickly when used in an iterative scheme. 1.6 Parallel Processing Finally, since each part of the solution procedure can be further divided into computationally separate pieces, the method is highly amenable to parallel processing. In this section, we will discuss how each part of the algorithm may be implemented in a parallel scheme. In order to construct the new system matrix ?Z, we must first generate the ele- ments of the original matrix Z. Each one of the matrix elements may be computed 17 completely independently from the other elements. Furthermore, the elements may be distributed evenly across each of the processors allowing for equal work load. There- fore, this part of the algorithm tends to scale linearly with the number of processors used. In the present work, only one block-row of Z was stored in memory at a time. The row was broken up according to the group choice and then the column blocks corresponding to the groups were divided amongst the processors. When solving for the coefficients used to construct the new basis functions, each source group can be considered separately. Each processor may be assigned a set of groups, each of which requires a single matrix inversion. If the groups are similar in size, they may simply be distributed evenly across the processors. If they are uneven, the processors can each get a different number of groups in order to balance the load. Here, dynamic scheduling may be used so that when a CPU becomes available, it simply advances to the next group and begins solving for weights. In this type of situation, if one CPU becomes occupied with a large set of coefficients, the remaining CPUs can continue to work on smaller groups. Good load balancing can be achieved in this way. For example, one CPU can solve for the coefficients on one group with N null-field points, while another can solve eight groups, each of which has N/2 null- field points. Here, each group solution carries a complexity of O(N3) where N is the number of null-field points that must be produced for that source group. Typically, one iteration of the solver takes roughly the same amount of time as does one matrix-vector multiply with the same dimensions as the system size. Therefore, the iteration procedure generally does not require parallel processing since it is a very fast procedure and requires only a few iterations. In this work, this procedure was not made parallel. However, it is possible to create a parallel version if necessary. If the Gauss-Seidel version is used, then as we move from one group to the next solving for currents, we may take each group and assign subsets of the basis 18 functions to each processor. Each processor can then form the summations given in Eq. A.3. The inverse in Eq. A.3 may then be done by a single CPU. Alternatively, one could use a block-Jacobi iterative solver, which is essentially the same as the Gauss-Seidel solver with the exception that once a set of currents are found for a group, they are not used until the next iteration. The Gauss-Seidel version uses those new current values immediately, leading to faster convergence. 
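The difference between the two update rules is easy to see in code. Below is a minimal, serial sketch of the block iteration described by Eq. 1.11 and Appendix A; the function name, the fixed number of sweeps, and the use of dense NumPy blocks are illustrative assumptions rather than the actual solver used in this work (a production version would also reuse the factorizations of the diagonal blocks rather than re-solving them each sweep).

```python
import numpy as np

def block_iteration(Z, V, groups, n_sweeps=3, gauss_seidel=True):
    """Minimal block Gauss-Seidel / block Jacobi sweeps for the system Z I = V.

    Z      : (N, N) block-diagonally-dominant system matrix (the new matrix Z-tilde)
    V      : (N,)  excitation vector
    groups : list of index arrays defining the diagonal blocks
    """
    I = np.zeros(len(V), dtype=complex)

    # Initial guess: invert only the diagonal blocks (near-field solution, Eq. 1.10).
    for g in groups:
        I[g] = np.linalg.solve(Z[np.ix_(g, g)], V[g])

    for _ in range(n_sweeps):
        I_prev = I.copy()
        for g in groups:
            # Gauss-Seidel uses currents updated earlier in this sweep;
            # Jacobi uses only the values from the previous sweep.
            src = I if gauss_seidel else I_prev
            # Field on group g radiated by the currents on all other groups (Eq. 1.11).
            coupling = Z[g, :] @ src - Z[np.ix_(g, g)] @ src[g]
            I[g] = np.linalg.solve(Z[np.ix_(g, g)], V[g] - coupling)
    return I
```

Setting gauss_seidel=False gives the block-Jacobi variant just mentioned, in which each group's update depends only on the previous sweep.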
In a parallel scheme, the block-Jacobi solver would have the benefit that each group of currents could be found independently and thus the algorithm is slightly more parallel than the Gauss-Seidel version. 1.7 Efficiency Here we compare the efficiency of the method to a full LU decomposition solution. We assume that we have divided a structure with N basis functions into K groups, each with M unknowns. Furthermore, we assume that each source group produces a total of P null field points. There are three major sources that determine the amount of required memory. First, we must store the coefficients for each group. There are M sets of P coefficients for every source group. Therefore, we must store K?M ?P complex numbers for the coefficients. Next, we need enough scratch space to solve for the coefficients. This requires enough room to store a P ?P matrix, or P2 complex values. Also, we must store a block-row of the partitioned matrix for use in the iterative scheme. This requires storing M?N complex values. So we have a required storage of approximately KMP + P2 + MN complex values. For the full solution, we must store the entire matrix for a total of N2 complex values. Note that we have K ? N, M ? N, and P ? N so that we require much less storage than does storing the entire matrix. For example, suppose that we have N = 100000, K = 400, 19 M = 250, and P = 2000. Then we have KMP+P2+MNN2 = 0.024. So, in this case, we require less than 3 percent of the storage needed for storing the full matrix. We can also compare the necessary CPU requirements for the two methods. For a LU decomposition, the number of operations are on the order of N3. For our method, there are a few essential steps that determine the characteristic CPU scaling of the solution. First, since the number of groups scales linearly with N, then the required operations for finding the coefficients does so as well. Next, creating the new system matrix is equivalent to multiplying the sub-domain system matrix by a sparse matrix. Normally, matrix multiplication is an O(N3) process, but since one of the matices is highly sparse, this step will scale as N2. Finally, the algorithm requires generating the original system matrix and this is an N2 procedure. Therefore, our method is an order of magnitude faster than a full solution. The numerical examples given in this work demonstrate this type of behavior in terms of actual wall clock time. 20 Chapter 2 Two-dimensional conducting cylinders In this chapter, the procedure is applied to arbitrarily shaped two-dimensional conducting cylinders, where the cylinders are infinite along the z-axis. First, we develop the integral equations for the electric and magnetic field formulations for both TE and TM polarizations. Next, we show how those are used to construct the combined-field formulation to avoid problems with internal resonances for closed bodies. Furthermore, we discuss how the geometry is decomposed into groups so that the method may be properly applied. Also, we display the technique used to generate null fields so that groups located near one another on the structure may be mathematically decoupled from each other in order to generate a block-diagonally- dominant matrix. Finally, some example problems and results are given to show the effectiveness of the technique. 2.1 Electric-Field Integral Equation - EFIE In this section, we present the Electric Field Integral Equation for both Trans- verse Electric (TE) and Transverse Magnetic (TM) incident waves. 
Also, we represent the induced current in terms of pulse expansion functions and develop the matrix com- ponents when testing with pulse functions. Applying the boundary condition that the total tangential electric field must be zero at the surface of the conductor, the 21 following integral equation for the TM case may be derived [9]: k? 4 integraldisplay C Jz(??)H(2)o (k|????|)dl? = Eiz(??) (2.1) where ? is an observation point and ?? is a source point along the contour C. Note there is only a z-component of current. We expand this current in terms of pulse functions: Jz(?) = Nsummationdisplay n=1 ?nPn(?) (2.2) where the ?n?s are the desired unknown coefficients. If we represent the contour C as a collection of line segments (edges), then the pulses are defined as shown in Figure 2.1: Pn(?) = ? ??? ??? 1 ? ? edgen 0 otherwise (2.3) Figure 2.1: Pulse function for two-dimensional TM polarization. Expanding Eq. 2.1 we arrive at: k? 4 Nsummationdisplay n=1 ?n integraldisplay C PnH(2)o (k|????|)dl? = Eiz(?) (2.4) Using the Galerkin Method and testing each side of Eq. 2.4 using the inner product ?a,b? = integraldisplay C a?bdl (2.5) 22 we obtain the entries for the system matrix: Zm,n = k?4 integraldisplay C Pm integraldisplay C PnH(2)o (k|????|)dl? = k?lm4 integraldisplay C H(2)o (k|?mct ???|)dl? (2.6) where ?mct is the centroid and lm is the length of the mth edge. The final integral can be evaluated by Gaussian Quadrature. The excitation vector is given by: Vm = integraldisplay C PmEiz(?)dl = lmEiz(?mct) (2.7) The TE case begins with the following integral equation [9]: E(?)??al = k?4 integraldisplay C Jl(??)H(2)o (k|????|)(?a?l ??al)dl? + ?4k ddl integraldisplay C dJl(??) dl? H (2) o (k|??? ?|)dl? (2.8) where C is the contour of the object. Again, as in the TM case, we represent the current in terms of pulse functions, except in this case each function is defined from the midpoint of one segment to the midpoint of an adjacent segment. Also, rather than represent the derivative on Jl in Eq. 2.8 in terms of delta functions, we define the charge in terms of pulse doublets. If Pn, shown in Figure 2.2, is defined as: Pn = ? ??? ??? 1 ? ? (ctn,ctn+1) 0 otherwise (2.9) 23 where ctn and ctn+1 are the centroids of the nth and n+1th edges, respectively, then we can define the charge as: dPn dl = ?? ??? ??? ??? ??? ? 1 ln ? ? edgen ?1 ln+1 ? ? edgen+ 1 0 otherwise (2.10) where ln and ln+1 are the lengths of the nth and n+ 1th edges. Figure 2.2: Pulse function for two-dimensional TE polarization. Applying an expansion/testing procedure similar to the TM case we obtain the MoM matrix components: Zm,n = k?4 integraldisplay C Pm integraldisplay C PnH(2)o (k|????|)(?a?l ??al)dl?dl + ?4k integraldisplay C Pm ddl integraldisplay C dPn dl? H (2) o (k|??? ?|)dl?dl (2.11) To evaluate the integrals, we can do the following: integraldisplay C Pm integraldisplay C PnH(2)o (k|????|)(?a?l ??al)dl?dl = lm2 integraldisplay C H(2)o (k|?m?1/4 ???|)(?a?l ??alm)dl? +lm+12 integraldisplay C H(2)o (k|?m+1/4 ???|)(?a?l ??alm+1)dl? (2.12) 24 and d dl integraldisplay C dJl(??) dl? H (2) o (k|??? ?|)dl? = ? integraldisplay C dPn dl? H (2) o (k|?m+1/2 ?? ?|)dl? + integraldisplay C dPn dl? H (2) o (k|?m?1/2 ?? ?|)dl? (2.13) where the testing quantities and locations can be seen in Figure 2.3. Figure 2.3: Two-dimensional TE polarization testing locations. 
The incident field vector is given by: Vm = integraldisplay C PmEi(?)??aldl = lm2 Ei(?m?1/4)??alm + lm+12 Ei(?m+1/4)??alm (2.14) 2.2 Magnetic-Field Integral Equation - MFIE By applying the boundary condition on the magnetic field, we can arrive at another formulation for both the 2D TE and TM polarizations. First, the boundary condition on the magnetic field is: J(?) = ?n?Ht(?), ? ? C (2.15) 25 where J is the total current, Ht is the total magnetic field, and C is the contour of the conductor. The MFIE only applies to closed bodies and thus ?n may be defined as a vector pointing to the outside of the object of interest. The integral equation for the TM polarization case is [9]: Hi(?)??al = Jz(?)2 + jk4 integraldisplay C ? Jz(??) parenleftBigg ?n? ??? ? |????| parenrightBigg H(2)1 (k|????|)dl? (2.16) where the deleted integral is evaluated everywhere except ? = ??. Expanding and testing with pulse functions we obtain the matrix components: Zm,n = integraldisplay C PmPn 2 dl = lm 2 , for m = n (2.17) Zm,n = jk4 integraldisplay C Pm integraldisplay C Pn parenleftBigg ?n? ??? ? |????| parenrightBigg H(2)1 (k|????|)dl? = jklm4 integraldisplay C parenleftBigg ?n? ? m ct ?? ? |?mct ???| parenrightBigg H(2)1 (k|????|)dl?, for m negationslash= n (2.18) where ?mct is the centroid of edge m and the remaining integral may be evaluated using Gaussian Quadrature. The excitation vector is given by: integraldisplay C PmHi(?)??al = lmHi(?mct)??alm (2.19) We can also write an integral equation for TE polarization [9]: ?Hiz(?) = Jl(?)2 + jk4 integraldisplay C ? Jl(??) parenleftBigg ?n? ? ??? ? |????| parenrightBigg H(2)1 (k|????|)dl? (2.20) Again, the deleted integral is not evaluated at ? = ??. Furthermore, as with the EFIE TE case, we expand and test with pulse functions extending from the centroid of one 26 edge to the centroid of the adjacent edge. After applying the expansion and testing procedure, we arrive at the following MoM matrix components: Zm,n = integraldisplay C PmPn 2 dl = 1 2 bracketleftBiggl m 2 + lm+1 2 bracketrightBigg , for m = n (2.21) Zm,n = jk4 integraldisplay C Pm integraldisplay C Pn parenleftBigg ?n? ? ??? ? |????| parenrightBigg H(2)1 (k|????|)dl? = jk4 bracketleftBiggl m 2 integraldisplay C parenleftBigg ?n? ? ?m?1/4 ?? ? |?m?1/4 ???| parenrightBigg H(2)1 (k|?m?1/4 ???|)dl? (2.22) + lm+12 integraldisplay C parenleftBigg ?n? ? ?m+1/4 ?? ? |?m+1/4 ???| parenrightBigg H(2)1 (k|?m+1/4 ???|)dl? bracketrightBigg , for m negationslash= n For the excitation vector we have: Vm = ? integraldisplay C PmHiz(?)dl = ? bracketleftBiggl m 2 Hz(?m?1/4) + lm+1 2 Hz(?m+1/4) bracketrightBigg (2.23) The testing locations are the same as those given in Figure 2.3. In the next section, we will combine these formulations to eliminate any matrix conditioning problems that may occur due to internal resonances for closed bodies. 2.3 Combined-Field Integral Equation - CFIE To avoid any problems due to internal resonances for closed bodies, we may use the Combined Field Integral Equation [4] given by ????n? ?n?E+ (1??)?n?H = 0 (2.24) 27 which can be implemented as ZCFIE = ?ZEFIE +?(1??)ZMFIE VCFIE = ?VEFIE +?(1??)VMFIE (2.25) where 0 ? ? ? 1 is a constant depending on the problem and ? = radicalBig? ? is the impedance of the medium. For open body problems, we let ? = 1 and for closed bodies ? is typically 0.5. Note the same equations may be used for both TM and TE polarizations. 2.4 Element Grouping For the two-dimensional case, groups are formed along the contour of the cylinder as shown in Figure 2.4. 
Basis functions along the rim are placed into disjoint sets of roughly equal size. Typically, the group size should be on the order of a few wavelengths, for example between 2λ and 6λ. The larger the group the better, as long as the group sizes are not large with respect to the overall geometry. Also, for two-dimensional geometries a given near-field radius will not enclose many unknowns, so larger group sizes are acceptable. The separation distance between two groups is taken to be the distance from the center of one group to the center of the other. For a given source group, nulls are created on groups within the near-field region (typically a couple of wavelengths). Usually, this means that the two groups adjacent to the source group are eliminated. However, if a cross section of the cylinder is sufficiently thin, it may be necessary to eliminate more than the two adjacent groups. This can be seen in Figure 2.5.

Figure 2.4: Element grouping for the two-dimensional case.

Figure 2.5: Elimination of groups within the near-field.

2.5 Numerical Results

The following two-dimensional examples, using both TM and TE polarization, show the effectiveness of the technique. All closed-body solutions utilize the combined-field formulation, while open bodies use the electric-field formulation for each polarization. Also, both codes use pulse functions for both the source basis and the test basis. Finally, the block form of the Gauss-Seidel iterative solver has been used.

Consider the square cylinder with 50λ sides illuminated by a plane wave as shown in Figure 2.6. The total contour length for this case is 200λ. Figures 2.7 and 2.8 show the real and imaginary currents on Face #1 for an incident wave with TM polarization. The currents for Face #2 are given in Figures 2.9 and 2.10. The TE case is given in Figures 2.11 and 2.12 for Face #1 and Figures 2.13 and 2.14 for Face #2. The amplitude of the incident magnetic field Ho is 1.0 Amps/m for each polarization. The angle of incidence for both cases is 35° with respect to the x-axis. The contour of the cylinder is divided into 2000 divisions using a 10-divisions-per-wavelength criterion. Further, the basis functions are collected into 40 groups with 50 basis functions per group. For each source group, the testing is carried out on the two adjacent groups. The initial guess, obtained by computing currents only from the individual groups, is refined with one iteration following the Gauss-Seidel procedure. The currents are shown to be in agreement with the standard MoM solution.

For the next example, consider a highly complex shaped contour with a concavity as shown in Figure 2.15. The total circumference is 100λ. The incident wave is at 45° with respect to the x-axis with Ho = 1.0. There are 25 groups with 40 sub-domain basis functions in each group. Figures 2.16 and 2.17 show the real and imaginary currents for the TM case. Figures 2.18 and 2.19 show the currents for the TE case. The numerical results, after one iteration, are compared with the standard MoM solution and show good agreement for each case.

Next, we consider the large two-dimensional strip array shown in Figure 2.20. Each element is 2λ in length and there are 100 collinear elements in the array spaced 0.1λ apart. Each element is a group, with 20 segments per group, resulting in 100 groups. Testing for a given source group is done on the two adjacent elements. If the element is at either end of the array, then testing is done only on the single adjacent element.
The incident wave is normal to the array and has magnitude Ho = 1.0. Figure 2.21 shows the bistatic radar cross section with respect to the azimuthal angle for the TM case. The TE case is given in Figure 2.22. The results are compared with the conventional MoM solution and good agreement is evident in each case.

Next, we show the real-time results for a 700λ circumference two-dimensional circular cylinder using the new procedure and compare with the conventional method. The scattering case is solved for a 600 MHz incident wave at 180° with respect to the x-axis with Ho = 1.0 and TM polarization. The cross section of the cylinder is divided into 7000 edges using a 10-divisions-per-wavelength criterion. For the conventional MoM solution, the matrix fill and execution times are 12 minutes and 189 minutes, respectively. For the new procedure, the 7000 unknowns are divided into 70 groups of 100 unknowns each. In the following table, we provide the computational time for each step involved. The processor for the test machine is a 2.4 GHz Pentium 4.

1. Matrix fill: 12 minutes 20 seconds
2. Construction of new basis: 16.1 seconds
3. Construction of new system matrix: 15.7 seconds
4. Iterative solution (single iteration): 15.8 seconds
5. New method total solution time (add 1, 2, 3, and 4): 13 minutes 8 seconds

Figure 2.6: Cross section of the 50λ square cylinder.

Figure 2.7: Real currents in the shadow region (Face #1) for the square cylinder with 50λ sides with TM incident wave Ho = 1 at 35°.

Figure 2.8: Imaginary currents in the shadow region (Face #1) for the square cylinder with 50λ sides with TM incident wave Ho = 1 at 35°.

Figure 2.9: Real currents in the shadow region (Face #2) for the square cylinder with 50λ sides with TM incident wave Ho = 1 at 35°.

Figure 2.10: Imaginary currents in the shadow region (Face #2) for the square cylinder with 50λ sides with TM incident wave Ho = 1 at 35°.

Figure 2.11: Real currents in the shadow region (Face #1) for the square cylinder with 50λ sides with TE incident wave Ho = 1 at 35°.

Figure 2.12: Imaginary currents in the shadow region (Face #1) for the square cylinder with 50λ sides with TE incident wave Ho = 1 at 35°.

Figure 2.13: Real currents in the shadow region (Face #2) for the square cylinder with 50λ sides with TE incident wave Ho = 1 at 35°.

Figure 2.14: Imaginary currents in the shadow region (Face #2) for the square cylinder with 50λ sides with TE incident wave Ho = 1 at 35°.

Figure 2.15: Object with complex shape and concavity.
Figure 2.16: Real currents for the object with concavity, 100λ total circumference, TM incident wave with Ho = 1 at 45°.

Figure 2.17: Imaginary currents for the object with concavity, 100λ total circumference, TM incident wave with Ho = 1 at 45°.

Figure 2.18: Real currents for the object with concavity, 100λ total circumference, TE incident wave with Ho = 1 at 45°.

Figure 2.19: Imaginary currents for the object with concavity, 100λ total circumference, TE incident wave with Ho = 1 at 45°.

Figure 2.20: Geometry for the two-dimensional strip array. Each array element is 2λ in length and the elements are separated by 0.1λ.

Figure 2.21: RCS for the 2D strip array with 100 2λ elements. The incident wave is normal to the array with Ho = 1 and has TM polarization.

Figure 2.22: RCS for the 2D strip array with 100 2λ elements. The incident wave is normal to the array with Ho = 1 and has TE polarization.

2.6 Conclusions

In this chapter, the method was applied to arbitrary 2D conducting cylinders. Open and closed bodies were considered with both transverse electric and transverse magnetic incident waves, and scattering was computed from canonical shapes as well as complex shapes. Accurate and efficient results were obtained in each case. Furthermore, it was shown that for 2D cases the method virtually eliminates the solve time, so the overall solution is limited by the matrix fill time. The groups for two-dimensional cases may be formed arbitrarily or may be decided by the geometry of the object. Ideally, the groups should be a few wavelengths in length, and the near-field designation should include only a few wavelengths in radius.

Chapter 3
Arbitrarily shaped three-dimensional conductors

In this chapter, we apply the method to arbitrarily shaped three-dimensional conductors. First, the integral equations are presented for both the electric-field and magnetic-field boundary conditions, and each equation is expanded with a set of sub-domain basis functions in preparation for applying the method. The combined-field formulation is also presented to avoid the internal resonance problems associated with closed bodies. Next, we discuss how the algorithm is applied in the general case: how groups should be formed and to which groups the null-field procedure should be applied. The decoupling procedure is illustrated, and we discuss how to handle near-field and far-field interactions on an arbitrary three-dimensional body. Finally, we apply the algorithm to some challenging scattering problems to illustrate the accuracy and effectiveness of the method.
3.1 Electric Field Integral Equation - EFIE

The Electric Field Integral Equation (EFIE) for Perfect Electrical Conductors (PECs) is developed as follows. First, by definition of a PEC, the total tangential electric field at the surface of the conductor is identically zero:

E^i_tan(r) + E^s_tan(r) = 0,  r ∈ S_c   (3.1)

or

E^i_tan(r) = −E^s_tan(r),  r ∈ S_c   (3.2)

where E^i_tan and E^s_tan are the incident and scattered fields, respectively, and S_c is the conductor surface. The scattered field at a point r is related to the currents on the conductor by

E^s(r) = −jωA(r) − ∇Φ(r)   (3.3)

where A and Φ are the magnetic vector potential and the electric scalar potential, given by

A(r) = μ ∫_{S_c} J(r′) G(r, r′) dS′   (3.4)

Φ(r) = −(1/(jωε)) ∫_{S_c} ∇′·J(r′) G(r, r′) dS′   (3.5)

with the Green's function

G(r, r′) = e^{−jkR}/(4πR),  R = |r − r′|   (3.6)

Substituting for the scattered field, we get the desired integral equation:

E^i(r)_tan = [jωA(r) + ∇Φ(r)]_tan   (3.7)

Next, we discretize the geometry of the object of interest using a triangular surface mesh. Then we define a set of basis functions over the surface to approximate the current on the PEC. Let f_n, n = 1, ..., N, be a basis defined on the surface of the conductor. The total current on the conductor is then given by

J(r) = Σ_{n=1}^{N} α_n f_n(r)   (3.8)

Here we use the RWG basis functions as given in [2]. Each function is defined over a pair of triangles as

f_n(r) = (l_n/(2A_n^+)) ρ_n^+,  r ∈ T_n^+
       = (l_n/(2A_n^−)) ρ_n^−,  r ∈ T_n^−
       = 0,  otherwise   (3.9)

where A_n^+ and A_n^− are the areas of the positive- and negative-side triangles T_n^+ and T_n^−, respectively. The quantities in Eq. 3.9 are shown in Figure 3.1.

Figure 3.1: Quantities for RWG basis functions.

Substituting Eq. 3.9 into Eqs. 3.4 and 3.5, we get

A(r) = Σ_{n=1}^{N} α_n A_n(r)   (3.10)

Φ(r) = Σ_{n=1}^{N} α_n Φ_n(r)   (3.11)

where

A_n(r) = μ ∫_{S_c} f_n(r′) G(r, r′) dS′   (3.12)

Φ_n(r) = −(1/(jωε)) ∫_{S_c} ∇′·f_n(r′) G(r, r′) dS′   (3.13)

Since we have N unknowns, we must generate N linearly independent equations. We do this by performing a testing procedure on each side of Eq. 3.7. Using the inner product

⟨f, g⟩ ≡ ∫_{S_c} f·g dS   (3.14)

and the Galerkin procedure, we test each side of Eq. 3.7 with the previously defined RWG functions. We thereby generate a linear system

Z I = V   (3.15)

which can be solved to find the coefficients α_n. The testing procedure provides the quantities

Z_{m,n} = jω⟨A_n, f_m⟩ + ⟨∇Φ_n, f_m⟩   (3.16)

as well as

V_m = ⟨E^i, f_m⟩   (3.17)

The potential terms can be evaluated at the centroids of the testing triangles:

⟨A_n, f_m⟩ = (l_m/2) [ A_n(r_m^{c+})·ρ_m^{c+} + A_n(r_m^{c−})·ρ_m^{c−} ]   (3.18)

⟨∇Φ_n, f_m⟩ = −l_m [ Φ_n(r_m^{c+}) − Φ_n(r_m^{c−}) ]   (3.19)

and the excitation vector is given by

⟨E^i, f_m⟩ = (l_m/2) [ E^i(r_m^{c+})·ρ_m^{c+} + E^i(r_m^{c−})·ρ_m^{c−} ]   (3.20)

Finally, we solve for the currents I_m = α_m.

3.2 Magnetic Field Integral Equation - MFIE

The Magnetic Field Integral Equation (MFIE) for PECs is developed as follows. This method is only applicable to closed bodies, since we must assume the fields inside are zero. First, we apply the boundary conditions on the magnetic field at the surface of the conductor. The magnetic field is related to the current on the structure through

J(r) = n̂ × (H^i(r) + H^s(r)),  r ∈ S_c   (3.21)

where n̂ is a unit vector normal to the surface of the conductor and pointing outward, H^i is the incident wave, and H^s is the scattered field due to the currents on the conductor. It is shown in [5] that, for r not on an edge, we may write the scattered portion as

n̂ × H^s(r) = lim_{r→S_c} (1/μ) n̂ × ∇ × A(r) = J(r)/2 + n̂ × ∫_{S_c}′ J(r′) × ∇′G(r, r′) dS′,  r ∈ S_c   (3.22)

where G(r, r′) = e^{−jkR}/(4πR), R = |r − r′|, and r approaches the surface from outside the object. The deleted (principal-value) integral is evaluated everywhere except r = r′. Substituting Eq. 3.22 into Eq. 3.21, we arrive at the integral equation

n̂ × H^i(r) = J(r)/2 − n̂ × ∫_{S_c}′ J(r′) × ∇′G(r, r′) dS′,  r ∈ S_c   (3.23)

As with the EFIE in Eq. 3.8, we may expand the current using RWG functions. Applying the Galerkin procedure with the inner product in Eq. 3.14, we arrive at a linear system ZI = V with the following elements:

Z_{m,n} = (1/2)⟨f_m, f_n⟩ − l_m [ (1/(2A_m^+)) ∫_{T_m^+} ρ_m^+ · n̂_m^+ × ∫_{S_c}′ f_n(r′) × ∇′G(r, r′) dS′ dS
        + (1/(2A_m^−)) ∫_{T_m^−} ρ_m^− · n̂_m^− × ∫_{S_c}′ f_n(r′) × ∇′G(r, r′) dS′ dS ]   (3.24)

and

V_m = l_m [ (1/(2A_m^+)) ∫_{T_m^+} ρ_m^+ · n̂_m^+ × H^i(r) dS + (1/(2A_m^−)) ∫_{T_m^−} ρ_m^− · n̂_m^− × H^i(r) dS ]   (3.25)

The integrals may be evaluated as in [3]. Again, the deleted integral is evaluated everywhere except r = r′. Solving this system provides the coefficients for the basis defined on the structure.

3.3 Combined Field Integral Equation - CFIE

To avoid any problems due to internal resonances for closed bodies, we may use the Combined Field Integral Equation [4], given by

−(α/η) n̂ × n̂ × E + (1 − α) n̂ × H = 0   (3.27)

which can be implemented as

Z_CFIE = α Z_EFIE + η(1 − α) Z_MFIE   (3.28)

V_CFIE = α V_EFIE + η(1 − α) V_MFIE   (3.29)

where 0 ≤ α ≤ 1 is a constant depending on the problem and η = sqrt(μ/ε) is the impedance of the medium. For open-body problems we let α = 1, and for closed bodies α is typically 0.5.

3.4 Element Grouping

In this section, we discuss the grouping procedure for the arbitrary three-dimensional conducting case. In order to account for the near-field interactions, we use the null-producing procedure on those groups that lie within a given perimeter. As shown in Figure 3.2, we take each group as a source. All groups within a given near-field radius are selected for the decoupling procedure. Each basis function in a given group is replaced with a linear combination of that basis function, with a coefficient of 1, and all the basis functions in the null groups, which are assigned unknown coefficients. By requiring that the new basis functions produce null fields everywhere on the selected groups, we generate zeros within the new system matrix. Once this has been carried out for all the groups, we may solve for the currents on the structure by first inverting the self-block matrices to account for near-field interactions, and then solving for the remaining current contributions via the Gauss-Seidel iterative procedure. Note also that the coefficients for all the groups can be solved for independently of one another. In other words, parallel processing may be used to solve for weights throughout the structure simultaneously.

Figure 3.2: Grouping for the arbitrary 3D case. The ×'s represent null groups.
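To make the weight computation concrete, the following Fortran 90 fragment is a minimal sketch of the decoupling step for a single source group. It assumes that the moment-matrix blocks Znn (testing and expansion on the null groups) and Zns (testing on the null groups, expansion on the source group) have already been filled, and that LAPACK's cgesv routine is linked for the dense solve; the array names and sizes are illustrative only and are not taken from the actual code used in this work.

program null_weights
   implicit none
   ! Illustrative sizes: ns = basis functions in the source group,
   ! nn = basis functions in the surrounding null (near-field) groups.
   integer, parameter :: ns = 4, nn = 6
   complex :: Znn(nn,nn), Zns(nn,ns), W(nn,ns)
   integer :: ipiv(nn), info, i, j

   ! Placeholder entries; in practice Znn and Zns are blocks of the
   ! standard moment matrix.
   do i = 1, nn
      do j = 1, nn
         Znn(i,j) = cmplx(1.0/(abs(i-j)+1.0), 0.1*i)
      end do
      do j = 1, ns
         Zns(i,j) = cmplx(0.5/(i+j), -0.2)
      end do
   end do

   ! Each new basis function is the original source-group function (weight 1)
   ! plus a weighted combination of the null-group functions.  Forcing the
   ! tested fields on the null groups to vanish gives  Znn * W = -Zns.
   W = -Zns
   call cgesv(nn, ns, Znn, nn, ipiv, W, nn, info)
   if (info /= 0) stop 'weight solve failed'

   print *, 'weights attached to source function 1:', W(:,1)
end program null_weights

Each column of W holds the coefficients attached to the null-group functions when the corresponding source-group function is replaced; in the notation of Appendix C this is the block R_{2,1} = −Z_{2,2}^{−1} Z_{2,1}.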
Furthermore, since each set of group coefficients may be solved for separately, a parallel procedure shows a high degree of parallel efficiency regardless of how many processors are used, provided that the number of processors does not exceed the number of groups.

3.5 Numerical Results

In this section, we present some numerical results to verify the accuracy of the algorithm. First, using the CFIE formulation, we show a bistatic RCS calculation for a sphere of radius 5λ, where λ is the wavelength at the given frequency, as seen in Figure 3.3. In this case, we have 92,550 unknowns. The structure is divided into 314 groups, each roughly 1λ² in area. For each source group, all groups within a 2λ distance were included as near-field groups. Figure 3.4 shows the bistatic RCS for one and two iterations as well as the RCS computed with a MoM BOR (Body of Revolution) code. The incident field E^i = η â_θ arrives at an angle of (θ, φ) = (45°, 0°) with a frequency of 300 MHz, and the RCS is calculated for the x-z plane, as shown in Figure 3.4. Since there is very little difference between the plots, a single iteration suffices in this case. Also note that, for this code, the initial guess given in Sec. 1.4 is implemented as the first iteration of the solver; thus, the first iteration only includes near-field interactions.

A good way to determine the quality of the solution is to calculate the residual

r = V − Z I   (3.30)

where V is the excitation vector, Z is the standard moment matrix (sub-domain functions), and I is the current vector for the sub-domain functions. The total error is then given by

Total error = Σ_{i=1}^{N} |r_i|   (3.31)

where N is the number of unknowns for the system. The average error per term is then

Avg. error per term = (Total error)/N   (3.32)

For the case of the sphere, the average error per term is 0.096 for one iteration, while it is only 0.0142 for two iterations. So the spherical case compares very well with a standard Method of Moments solution. The RCS was also compared to a MoM BOR code and shows excellent agreement. When computed on eight 3.0 GHz AMD Opteron CPUs, the solution took approximately 25 wall-clock hours and less than 4.5 GB of storage. This includes the time taken for a serialized RCS calculation on a single CPU (around 2 hours). Storing the full matrix in memory would require approximately 64 GB of storage, and an LU decomposition would require approximately 19 days and 11 hours to solve on 8 CPUs, assuming a perfectly parallel solver. The weight-generation step also shows proper scaling when using multiple CPUs: for a single CPU the weight-generation procedure takes 49 hours, for 4 CPUs it took 9 hours 14 minutes, and it required only 4 hours 28 minutes for 8 CPUs.
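The error measure of Eqs. (3.30)-(3.32) is simple to evaluate once the iterative solution has been mapped back to the sub-domain coefficients. The following Fortran 90 fragment is a minimal, self-contained sketch; the 3-by-3 system and the candidate current vector are made-up values used purely for illustration.

program residual_error
   implicit none
   integer, parameter :: n = 3          ! illustrative system size
   complex :: Z(n,n), V(n), I_sol(n), r(n)
   real    :: total_err, avg_err

   ! Placeholder standard (sub-domain) moment matrix, excitation vector,
   ! and candidate current vector.
   Z = reshape( [ (2.0,0.5), (0.1,0.0), (0.0,0.1),   &
                  (0.1,0.0), (3.0,0.2), (0.2,0.0),   &
                  (0.0,0.1), (0.2,0.0), (1.5,0.1) ], [n,n] )
   V     = [ (1.0,0.0), (0.5,0.5), (0.0,1.0) ]
   I_sol = [ (0.45,-0.10), (0.12,0.20), (-0.05,0.60) ]

   ! r = V - Z*I (Eq. 3.30); total and average error (Eqs. 3.31 and 3.32).
   r = V - matmul(Z, I_sol)
   total_err = sum(abs(r))
   avg_err   = total_err / real(n)

   print *, 'total error          =', total_err
   print *, 'average error / term =', avg_err
end program residual_error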
For the next example, we compute the bistatic RCS of a 12 m × 12 m square plate with 42,883 unknowns, as seen in Figure 3.5. Here we have an open structure and therefore utilize only the EFIE. The plate is excited with an incident wave E^i = η â_θ at an angle of (θ, φ) = (45°, 0°) with a frequency of 300 MHz. The RCS is computed for the x-z plane. The groups were formed by dividing the plate into a square grid of 144 groups, each roughly 1λ² in size and having on average 298 unknowns. Using a 2λ near-field criterion, the solver went through two iterations, and the RCS for each is shown in Figure 3.6. The average error for the first iteration is 0.612, while the average error for the second is 0.232. When computed on eight 3.0 GHz AMD Opteron CPUs, the solution took approximately 2 hours 45 minutes of wall-clock time and less than 2 GB of storage. Storing the full matrix in memory would require 13.7 GB, and computing an LU decomposition on 8 CPUs would take approximately 1 day and 22 hours, assuming the solver is fully parallel.

As a final example, we calculate the bistatic RCS of a French Mirage fighter jet. The geometry may be seen in Figure 3.7. For this case we have an open structure and therefore use the EFIE only. From end to end the jet is approximately 33λ long, across the wings it is approximately 23λ wide, and from the bottom to the highest point on the plane (the vertical fin) it is about 10λ high. The total surface area is 550λ². There are 159,293 unknowns, so a full LU decomposition in this case would be computationally expensive both in terms of memory and CPU time. In fact, the required storage for a full LU decomposition would be 189 GB, and the LU decomposition itself would take approximately 99 days 5 hours, assuming the solver was fully parallel. For this case, we formed the groups by placing the aircraft inside a cubical grid with each cell having dimensions of 1λ × 1λ × 1λ. All functions within a given cell were grouped together. There are 702 groups with an average group size of 226 functions, and all groups within 2λ of a given source group were decoupled with null-field points. Using an incident wave E^i = η â_θ at an incident angle of (θ, φ) = (45°, 0°) with a frequency of 750 MHz, we calculate the bistatic RCS for the x-z plane as shown in Figure 3.8. After running two iterations, the average error per term for the first iteration is 0.224, while the error for the second iteration is 0.218. The error demonstrates that our solution agrees very well with the standard MoM solution, and more iterations may be performed for further accuracy if necessary. Also note that the grouping scheme was generated automatically; thus it is possible to automate the entire process for an arbitrary three-dimensional conductor.

Figure 3.3: Triangular mesh for one octant of the 5λ-radius sphere.

Figure 3.4: Bistatic RCS for the 5λ-radius sphere: first and second iterations and the MoM BOR code.

Figure 3.5: Triangular mesh for a section of the 12λ × 12λ square plate.

Figure 3.6: Bistatic RCS for the 12λ × 12λ square plate. The first and second iterations are shown.

Figure 3.7: Triangular mesh for the French Mirage aircraft geometry.

Figure 3.8: Bistatic RCS for the aircraft, first and second iterations.

3.6 Conclusions

In this chapter, the method was applied to arbitrary three-dimensional conductors. First, we developed the integral equations for the electric-field, magnetic-field, and combined-field formulations. The electric-field formulation may be used for open bodies, while the combined field may be used for closed bodies in order to avoid problems with internal resonances. Next, we discussed how basis functions should be grouped together using a domain-decomposition technique. Furthermore, a near-field region was defined where all the functions within a certain radius of a source group were eliminated by developing a new set of basis functions.
These functions were defined in such a way that the net effect of the parts of each function was to produce null fields everywhere on the structure within the near-field radius. This procedure decouples all strongly interacting groups and allows us to produce a strongly block-diagonally-dominant matrix that converges very quickly within an iterative scheme such as the block Gauss-Seidel method. Furthermore, some challenging example problems were simulated to show the effectiveness of the method. The results agree very well with the standard Method of Moments solution using sub-domain basis functions. We can conclude that a good rule of thumb for the size of each group is roughly 1λ², and that the near-field criterion should generally be set at approximately a 2λ radius. A larger radius leads to a more quickly converging iterative solution, but could result in difficulty in solving for the group coefficients. Finally, the numerical results demonstrated that the technique can be made highly parallel and that the solution time decreases in direct proportion to the number of processors used.

Chapter 4
Three-dimensional conductor periodic arrays

In this chapter, we use our method to solve a special case of arbitrary 3D conductors: finite conducting periodic arrays. The motivation behind this special case is to demonstrate that the geometry of certain problems may be utilized to further enhance the solution efficiency. Although each of the array elements may be any arbitrary three-dimensional surface, the arrays themselves are periodic only in two dimensions, as shown in Figure 4.1. However, the concepts applied here may easily be extended to cases where there is periodicity in three dimensions.

4.1 Integral Equation Formulation

For the finite periodic array, we use the same integral formulation as was used in the arbitrary 3D case. In the case where each array element is an open body, we use the EFIE only. If each element is a closed body, we use the combined-field formulation as seen in Section 3.3 with α = 0.5. This allows us to avoid any issues arising due to internal resonances within each element.

4.2 Element Grouping

Although nothing prevents each array element from being subdivided into groups, we assume here that each element forms one and only one group. Null fields are produced on those groups that are horizontally and vertically adjacent to a given source group. This can be seen in Figure 4.2.

Figure 4.1: Finite conducting array, periodic in two dimensions.

Figure 4.2: Elimination of groups within the near field for the finite periodic array.

For groups on the edges and corners of the array, there will only be three and two adjacent groups, respectively. The finite periodic array is a special case in the general method because the weights used to produce the null fields are largely redundant. In fact, for two-dimensional arrays one must solve for at most 9 sets of weights: one for each of the four corners, one for an element along each of the four sides, and one for an element internal to the structure. Once these weights have been computed, they can easily be copied to the other groups in the system. Computing all of the corners separately is computationally inexpensive and greatly simplifies the programming. Note that while the physical structure may show symmetry for edge groups and corner groups, this symmetry may not exist numerically due to the meshing of each element and the basis functions defined there.
So, in general, one cannot solve for the coefficients at one corner of the structure and then directly copy those weights to the other three corners. The same reasoning applies to elements along the edges of the array.

4.3 Numerical Results

Here we present a bistatic RCS calculation for a finite periodic array. Each element is a 0.5λ × 0.5λ square plate, each expanded with 64 RWG basis functions. There are 2500 plates arranged on a 50 × 50 grid for a total of 160,000 unknowns. Each plate is spaced 0.75λ from each of the adjacent plates. The array is excited with a 300 MHz incident wave with E_θ = 120π and E_φ = 0 at an angle of θ = 45° and φ = 0°. Finally, the bistatic RCS is calculated for the x-z plane. Figure 4.3 displays the geometry for the problem. Figure 4.4 shows a comparison between the first and second iterations of the Gauss-Seidel scheme; note that there is a negligible difference between the RCS values. When compared to the standard Method of Moments solution, the average error per current term (either real or imaginary part) for one iteration was 0.288, while it was 0.122 for two iterations. Here the error per term is calculated by first obtaining the error vector

r = V − Z I   (4.1)

where V is the excitation vector, Z is the standard moment matrix (sub-domain functions), and I is the current vector for the sub-domain functions. The total error is then given by

Total error = Σ_{i=1}^{N} |r_i|   (4.2)

where N is the number of unknowns for the system. The average error per term is then

Avg. error per term = (Total error)/N   (4.3)

When computed on eight 3.0 GHz AMD Opteron CPUs, the solution took approximately 19.5 wall-clock hours and less than 500 MB of storage. This includes the time taken for a serialized RCS calculation on a single CPU (around 3 hours). A full LU decomposition of this problem would require 190 GB of storage and would have a solution time of approximately 100 days and 13 hours, assuming the solver is fully parallel. Also note that, for this code, the initial guess given in Sec. 1.4 is implemented as the first iteration of the solver. Thus, for this case, the simulation shows that the currents are almost entirely determined by the near-field interactions on the structure.

4.4 Conclusions

In this section, a special case of the 3D conductor formulation was considered: a finite array of elements periodic in two dimensions was constructed and the bistatic RCS was calculated. There are two essential reasons for simulating this special case. First, the problem lends itself very well to the iterative algorithm. Since there is spacing between the elements, there is already significant decoupling between different parts of the structure, so this type of structure is well suited for the algorithm. Second, due to the geometry, the decoupling weights of many groups are redundant, which significantly reduces the amount of necessary computation. After calculating at most 9 different sets of weights, the remaining groups may be decoupled by simply copying the weights throughout the structure. Finally, a typical problem of a square-plate periodic array was simulated, and the results were shown to compare very well with the standard Method of Moments solution. Only one or two iterations were necessary, and therefore the solution was shown to be much more efficient than a standard LU decomposition or conjugate gradient solution.
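As a small illustration of the bookkeeping behind the weight reuse described in this chapter, the following Fortran 90 sketch tags every element of an nx-by-ny array with one of the nine classes (interior, four edges, four corners) for which a distinct set of decoupling weights would be computed. The routine and variable names are illustrative and are not part of the actual code used in this work.

program weight_classes
   implicit none
   ! Classify the elements of a 50 x 50 periodic array so that the
   ! decoupling weights are computed once per class and then copied.
   integer, parameter :: nx = 50, ny = 50
   integer :: ix, iy, count_per_class(9)

   count_per_class = 0
   do iy = 1, ny
      do ix = 1, nx
         count_per_class(element_class(ix, iy)) = &
            count_per_class(element_class(ix, iy)) + 1
      end do
   end do
   print *, 'elements per weight class:', count_per_class

contains

   integer function element_class(ix, iy)
      integer, intent(in) :: ix, iy
      integer :: ex, ey
      ! ex/ey flag the position along each direction:
      ! 0 = interior, 1 = low edge, 2 = high edge.
      ex = 0; if (ix == 1) ex = 1; if (ix == nx) ex = 2
      ey = 0; if (iy == 1) ey = 1; if (iy == ny) ey = 2
      element_class = 3*ey + ex + 1      ! a unique class number from 1 to 9
   end function element_class

end program weight_classes

Only one representative element from each class needs its weights solved for; the remaining elements simply copy them, subject to the meshing caveat noted above.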
Figure 4.3: Geometry for the 50 × 50 finite periodic array of 0.5λ square plates.

Figure 4.4: Bistatic RCS result after one and two iterations.

Chapter 5
Conclusions

In this work, we applied a domain-decomposition technique in which an arbitrary conducting structure may be divided into an arbitrary number of groups. Next, a basis transformation allowed us to decouple groups adjacent to one another on the structure. This was accomplished using the criterion that each group should radiate null fields at locations on the structure chosen by a nearest-neighbor criterion. A new block-diagonally-dominant system matrix was constructed from the matrix corresponding to a set of sub-domain basis functions. Using this new matrix, we were able to apply a linear iterative solver capable of converging within a few iterations. Since a full LU decomposition was not used and iterative solvers require very little memory, we were able to avoid both the CPU-time and memory-size constraints usually found in the standard Method of Moments. Finally, this method has been outlined in a very general format and may therefore be applied to any existing Method of Moments code formulated with sub-domain basis functions. Various geometries in two and three dimensions were simulated to illustrate the successful application of the method to arbitrary conducting structures.

5.1 Scaling and Parallel Efficiency

In this section, we discuss the scaling as well as the parallel efficiency of each part of the algorithm. Referring to Sec. 1.4, we see that there are five major steps in the algorithm. The scaling and efficiency of each step is as follows:

1. Group formation - This is part of the preprocessing stage. All groups are formed prior to running the code, so this step is not considered in the scaling behavior of the algorithm.

2. Create new basis functions - Although the scaling here depends upon the geometry and the nearest-neighbor criterion, we may still construct a rule of thumb. If we assume that all the groups are roughly the same size for all geometries, the scaling will be O(N). Furthermore, the coefficients (weights) for each source group may be solved for independently. Therefore, the groups may be evenly distributed across multiple processors and the efficiency will be near 100%.

3. Create the new system matrix Z̃ - This step is represented by the operation Z̃ = ZR. For dense matrices, matrix multiplication is typically an O(N³) operation. However, in this case R is highly sparse, so forming the new system matrix is an O(N²) process. For parallel operation, we can view the matrix multiplication as forming columns of Z̃. Since the columns may be evenly distributed to each of the processors, this step will also have a parallel efficiency of approximately 100%.

4. Construct the initial guess - This step consists of inverting the diagonal blocks of Z̃. Assuming group sizes are roughly the same, the number of blocks requiring inversion grows linearly with the number of unknowns, so the scaling for this step is O(N). Furthermore, the blocks may be inverted in any order and may be evenly distributed across multiple processors, so the parallel efficiency will be near 100%. Note, however, that because the groups are generally small with respect to the overall structure, this step will most likely be negligible in terms of absolute execution time.

5. Iterate for accuracy - Here, each iteration is roughly equivalent to a matrix-vector product in terms of scaling.
Furthermore, very few iterations are required, so this step scales as O(N²). Viewing each iteration as a matrix-vector product, we can consider it a collection of dot products, one for each row of the matrix dotted with the vector. These dot products may be evenly distributed over multiple processors, and therefore the parallel efficiency will be near 100%.

The graphs in Figures 5.1 and 5.2 demonstrate the parallel efficiency of the arbitrary three-dimensional code. The test problem was a linear array of cubes, each with dimensions of 0.2 m × 0.2 m × 0.2 m and spaced 0.8 m apart. Each cube has 72 unknowns, and there are 40 cubes for a total of 2880 unknowns. Nulls were placed on adjacent boxes, so the two cubes on the ends of the array generated 72 null points and those within the array generated 144. Using different numbers of CPUs, the entire program was timed to display the parallel efficiency of the overall algorithm. This involves generating the new basis coefficients, creating the new system matrix, a single matrix fill, and one iteration of the block Gauss-Seidel solver. Two test machines were used. The first has 6 CPUs, each a 64-bit 1.4 GHz Intel Itanium 2 processor; here the times were taken for 1, 2, 4, and 6 CPUs. For each case, we plot the ratio of the single-processor time to the time for the given case. The results are shown in Figure 5.1. The second machine has eight 2.3 GHz AMD Opterons; for this machine 1, 2, 4, 6, and 8 processors were used, and the results are given in Figure 5.2. The ideal ratio for each case is also given. This test demonstrates that the overall algorithm is almost ideally parallel, which is difficult to achieve in standard methods due to typical solvers such as LU decomposition.

Figure 5.1: Ratio of single-CPU time to the time for 1, 2, 4, and 6 processors on machine one.

Figure 5.2: Ratio of single-CPU time to the time for 1, 2, 4, 6, and 8 processors on machine two.

5.2 Element Grouping

Element grouping should begin with a consideration of the geometry of the object to be simulated. If there are parts of the object that are likely to be highly coupled, whether due to their proximity or for other reasons, then the newly created basis functions for each part should produce null fields on the other part. Furthermore, any type of symmetry or periodicity may be utilized in order to make the solution procedure more efficient. For example, the coefficients for the elements of a periodic array are largely redundant, and thus several matrix inversions may be avoided, as the weights may simply be copied throughout the structure.

For arbitrarily shaped objects, the process can be fully automated. The groups may be formed by simply creating a cubical grid enclosing the entire object and then grouping together all those basis functions that happen to fall into a common cell. A cell size of approximately 1λ × 1λ × 1λ appears to work well, and a good rule of thumb for the near-field criterion in this case is 2λ. Using these criteria generally leads to rapid convergence, requiring only a few iterations of the iterative solver.
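A minimal sketch of this automated grid-based grouping is shown below in Fortran 90. It assumes only that a centroid is available for every basis function (for RWG functions, for example, the midpoint of the shared edge could be used); the cell size, array names, and the integer key used to identify a cell are illustrative choices rather than the dissertation's actual implementation.

program grid_grouping
   implicit none
   ! Assign basis functions to groups by dropping their centroids into a
   ! cubical grid of side "cell" (about one wavelength).
   integer, parameter :: nfun = 8
   real, parameter :: cell = 1.0        ! cell size in wavelengths (assumed)
   real :: cen(3,nfun), xyz_min(3)
   integer :: grp(3,nfun), key(nfun), n, k

   ! Placeholder centroids; in practice these come from the surface mesh.
   call random_number(cen)
   cen = 5.0*cen

   xyz_min = minval(cen, dim=2)
   do n = 1, nfun
      do k = 1, 3
         grp(k,n) = int((cen(k,n) - xyz_min(k))/cell)   ! cell index along x, y, z
      end do
      ! Collapse the three indices into a single group key (assumes fewer
      ! than 1000 cells along each axis).
      key(n) = grp(1,n) + 1000*grp(2,n) + 1000000*grp(3,n)
   end do

   do n = 1, nfun
      print '(a,i2,a,3f6.2,a,i9)', 'function ', n, '  centroid ', cen(:,n), '  group key ', key(n)
   end do
end program grid_grouping

Basis functions sharing the same key belong to the same group; sorting or hashing the keys then yields the group lists used by the rest of the algorithm.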
5.3 Further Research

Further research includes applying the method to various moment formulations. Typical moment codes include support for wire and dielectric structures. This method should apply to these codes very easily. Furthermore, it is often desirable to compute multiple right-hand sides as well as multiple frequencies. Multiple right-hand sides can probably be solved simultaneously in the iterative solver while still maintaining a reasonable efficiency. Multiple frequencies require re-solving the entire system multiple times; more research needs to be done in this area, such as possibly reusing the weights across multiple frequencies, in order to solve wide-bandwidth problems. There is also potential for optimizing the various parts of the algorithm. For example, finding redundant structures within the geometry allows for faster weight generation. Also, since the algorithm is generally limited by the matrix fill time, it would be beneficial to find ways of speeding up this step, for example by using less rigorous integrations when functions are separated by large distances. Furthermore, Graphics Processing Units (GPUs) may be applied here, since the matrix fill may easily be made parallel [10], as may the other parts of the algorithm. For shared-memory, multiple-CPU architectures, a parallel LU decomposition would lower the memory requirements. In the current implementation, each processor needs its own scratch space for computing the new basis coefficients; a parallel LU decomposition would require scratch space for only a single matrix. The group coefficients could then be found one group at a time while maintaining the speed of a parallel code. Also, the algorithm shows tremendous potential for distributed computing architectures: it would be extremely effective on very large problems if hundreds of processors could be utilized to solve a single problem. Finally, the concepts here may offer a more efficient solution in the time domain as well.

Bibliography

[1] R.F. Harrington, Field Computation by Moment Methods (Classic Reissue). New York: IEEE Press, 1993.

[2] S.M. Rao, D.R. Wilton, and A.W. Glisson, "Electromagnetic scattering by surfaces of arbitrary shape," IEEE Trans. Antennas Propag., Vol. 30, pp. 409-418, May 1982.

[3] S.M. Rao, "Electromagnetic Scattering and Radiation of Arbitrarily-Shaped Surfaces by Triangular Patch Modeling," Ph.D. Dissertation, University of Mississippi, Aug. 1980.

[4] J.R. Mautz and R.F. Harrington, "H-field, E-field, and combined-field solutions for conducting bodies of revolution," Archiv fuer Elektronik und Uebertragungstechnik, Vol. 32, pp. 157-164, Apr. 1978.

[5] J. Van Bladel, Electromagnetic Fields. New York: McGraw-Hill, 1964.

[6] M.L. Waller, "Development and Application of Adaptive Basis Functions to Generate a Diagonal Moment Method Matrix for Electromagnetic Field Problems," Ph.D. Dissertation, Auburn University, Aug. 2001.

[7] M.L. Waller and S.M. Rao, "Application of adaptive basis functions for a diagonal moment matrix solution of arbitrarily shaped three-dimensional conducting body problems," IEEE Trans. Antennas Propag., Vol. 50, No. 10, pp. 1445-1452, Oct. 2002.

[8] V. Rokhlin, "Rapid solution of integral equations of scattering theory in two dimensions," Journal of Computational Physics, Vol. 86, No. 2, pp. 414-439, 1990.

[9] W. Gibson, The Method of Moments in Electromagnetics. Chapman and Hall/CRC: Taylor and Francis Group, 2008.

[10] Y. Zhang, M. Taylor, T. Sarkar, A. De, M. Yuan, H. Moon, and C. Liang, "Parallel in-core and out-of-core solution of electrically large problems using the RWG basis functions," IEEE Antennas and Propagation Magazine, Vol. 50, No. 5, Oct. 2008.

[11] V. V. S.
Prakash and R. Mittra, "Characteristic basis function method: A new technique for efficient solution of method of moments matrix equations," Microwave and Optical Technology Letters, Vol. 36, No. 2, pp. 95-100.

[12] T. Sarkar and S.M. Rao, "The application of the conjugate gradient method for the solution of electromagnetic scattering from arbitrarily oriented wire antennas," IEEE Trans. Antennas Propag., Vol. 32, No. 4, pp. 398-403, Apr. 1984.

[13] J.R. Westlake, A Handbook of Numerical Matrix Inversion and Solution of Linear Equations. New York: Wiley, 1968.

[14] "CUDA Zone - The resource for CUDA developers," NVIDIA, 23 July 2009, http://www.nvidia.com/cuda.

[15] "NVIDIA CUDA Programming Guide, Version 2.1," NVIDIA, Dec. 2008, available at http://www.nvidia.com/cuda.

[16] "The impedance matrix localization (IML) method for moment-method calculations," IEEE Antennas and Propagation Magazine, Vol. 32, No. 5, pp. 18-30.

[17] "A New Technique to Generate a Sparse Matrix Using the Method of Moments for Electromagnetic Scattering Problems," Microwave and Optical Technology Letters, Vol. 19, No. 4, Nov. 1998.

Appendices

Appendix A
Block Gauss-Seidel Iterative Solver

The Gauss-Seidel iterative scheme [13] for solving linear systems may be used to construct an analogous method for partitioned systems. The Gauss-Seidel solver is a straightforward algorithm for solving a linear system

A x = b   (A.1)

If there are M unknowns in the vector x, we obtain a set of solutions x_m^k, m = 1, 2, ..., M, for each iteration step k. For the first iteration (k = 0) we may simply put in a "guess" for the solution; a simple choice is x = 0. Subsequent solutions may be obtained with the following summation:

x_m^{k+1} = (1/A_{m,m}) [ b_m − Σ_{i=1, i≠m}^{M} A_{m,i} x_i^k ]   (A.2)

where m = 1, 2, ..., M, but the x_m values may be calculated in any order. If we have a partitioned system rather than a typical linear system, we may construct a similar algorithm:

[x]_m^{k+1} = [A]_{m,m}^{−1} [ [b]_m − Σ_{i=1, i≠m}^{M} [A]_{m,i} [x]_i^k ]   (A.3)

where each [x]_m and [b]_m is a subvector of x and b, respectively, and each [A]_{m,n} is a submatrix of A. We will call this the block Gauss-Seidel method. Note that, according to the summation formula in Eq. A.3, the calculated values [x]_m^{k+1} are not used until the subsequent iteration; in that case the algorithm is generally referred to as the Jacobi algorithm. However, in this work the values of [x]_m^{k+1} are reused as soon as they are available. This leads to faster convergence and is generally referred to as the Gauss-Seidel algorithm [13].
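A minimal, self-contained Fortran 90 sketch of this block iteration is given below. It assumes LAPACK's cgesv is linked for the diagonal-block solves; the block sizes and matrix entries are artificial values chosen only to give a block-diagonally-dominant test system, and the refactorization of each diagonal block inside the loop would in practice be done once and reused.

program block_gauss_seidel
   implicit none
   ! Sketch of Eq. (A.3): nb blocks of size bs, with updated sub-vectors
   ! reused immediately (Gauss-Seidel rather than Jacobi).
   integer, parameter :: nb = 2, bs = 2, n = nb*bs, niter = 10
   complex :: A(n,n), b(n), x(n), rhs(bs), Amm(bs,bs)
   integer :: ipiv(bs), info, it, m, i, lo, hi

   ! An artificial block-diagonally-dominant system.
   A = (0.1,0.05)
   do i = 1, n
      A(i,i) = (4.0,1.0)
   end do
   b = (1.0,0.0)
   x = (0.0,0.0)                         ! initial guess

   do it = 1, niter
      do m = 1, nb
         lo = (m-1)*bs + 1
         hi = m*bs
         ! rhs_m = b_m - sum over i /= m of A_mi * x_i
         rhs = b(lo:hi) - matmul(A(lo:hi,:), x) + matmul(A(lo:hi,lo:hi), x(lo:hi))
         Amm = A(lo:hi,lo:hi)
         call cgesv(bs, 1, Amm, bs, ipiv, rhs, bs, info)
         if (info /= 0) stop 'block solve failed'
         x(lo:hi) = rhs                  ! reuse the update immediately
      end do
   end do

   print *, 'residual norm after iterating:', sqrt(sum(abs(b - matmul(A, x))**2))
end program block_gauss_seidel

Deferring the update of x(lo:hi) until a full sweep has completed would turn this loop into the Jacobi variant mentioned above.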
Appendix B
Radar Cross Section (RCS)

In this section, we present the mathematical steps involved in calculating the radar cross section (RCS) for the numerical results given in the previous chapters. Note that before any of the following steps may be applied, one must first obtain the electric currents on the given structure. For two-dimensional problems, the following definition was used for the RCS:

σ = 2πρ |E^s|² / |E^i|²   (B.1)

where E^i is the incident field and E^s is the scattered field calculated in the far field in the direction of

ρ̂ = cos φ x̂ + sin φ ŷ   (B.2)

The scattered electric field in the far-field region is given by

E^s = −jωA   (B.3)

where A is the magnetic vector potential and ω = 2πf, with f the frequency for the problem. For the two-dimensional examples, this quantity is

E^s = −(kη/4) ∫_C J(ρ′) H_0^{(2)}(k|ρ − ρ′|) dC′   (B.4)

Using the far-field (large-argument) approximation for the kernel, as x → ∞,

H_0^{(2)}(x) ≈ sqrt(2j/(πx)) e^{−jx}   (B.5)

Also, since we are in the far field, we can approximate |ρ − ρ′| as

|ρ − ρ′| = (ρ − ρ′)·(ρ − ρ′)/|ρ − ρ′| ≈ ρ − ρ̂·ρ′   (B.6)

Substituting B.5 and B.6 into B.4, we arrive at

E^s = −kη sqrt(j/(8kπ)) (e^{−jkρ}/√ρ) ∫_C J(ρ′) e^{jk(ρ̂·ρ′)} dC′   (B.7)

Note that this equation is valid for both TM and TE incidence. The polarization of J changes with the incident-wave polarization, as does the way the integration is performed, since it depends upon the current expansion functions.

We may derive the three-dimensional case in a similar fashion. Here, the RCS definition is given by

σ = lim_{r→∞} 4πr² |E^s|² / |E^i|²   (B.8)

where E^i is the incident field and E^s is the scattered field calculated in the far field in the direction of

r̂ = sin θ cos φ x̂ + sin θ sin φ ŷ + cos θ ẑ   (B.9)

The electric field in the far-field region is given by

E^s = −jωA = −jωμ ∫ J e^{−jkR}/(4πR) dv′   (B.10)

If we let r′ be the integration coordinates, then as r → ∞ we can approximate R as

R = (r − r′)·(r − r′)/|r − r′| ≈ r − r̂·r′   (B.11)

giving us

E^s ≈ −(jωμ/(4πr)) e^{−jkr} ∫ J e^{jk r̂·r′} dv′   (B.12)

For RWG current expansion functions, we can evaluate the integral as

E^s = −(jωμ/(8πr)) e^{−jkr} Σ_{n=1}^{N} α_n l_n [ ρ_n^{c+} e^{jk r̂·r_n^{c+}} + ρ_n^{c−} e^{jk r̂·r_n^{c−}} ]   (B.13)

where the α_n are the coefficients of the expansion functions. From these equations, we may obtain the radar cross section of any arbitrary conductor in either two or three dimensions.

Appendix C
Matrix Partitioning and Parallel Processing

A topic closely related to this work is inverting a partitioned matrix. Typically, matrix inversion is an O(N³) process for full matrices, and it is difficult to create a parallel matrix-inversion scheme. However, using the basic concepts present in this work, it may be possible to partition a matrix and, by trading submatrix inverses for matrix multiplies, create a highly parallel solution scheme. Consider the partitioned matrix

Z = [ [Z_{1,1}] [Z_{1,2}] ; [Z_{2,1}] [Z_{2,2}] ]   (C.1)

Also suppose that we interpret this as representing a MoM system matrix for two bodies, where Z_{i,j} is the submatrix for expansion functions on body j and testing functions on body i. If we replace the basis functions on body 1 with functions producing nulls on body 2, and vice versa, then we can completely decouple the bodies and solve each system independently. In terms of matrix operations, we wish to produce a matrix R with the same partitioning scheme as Z such that

Z̃ = [ [Z_{1,1}] [Z_{1,2}] ; [Z_{2,1}] [Z_{2,2}] ] [ [R_{1,1}] [R_{1,2}] ; [R_{2,1}] [R_{2,2}] ] = [ [Z̃_{1,1}] [0] ; [0] [Z̃_{2,2}] ]   (C.2)

where Z̃ is the new system matrix resulting from the change of basis. Furthermore, we force the diagonal blocks of R to be identity matrices:

[R_{1,1}] = I   (C.3)

[R_{2,2}] = I   (C.4)

Using the fact that the off-diagonal blocks of Z̃ must be zero, we arrive at the following relationships:

Z_{2,1} + Z_{2,2} R_{2,1} = 0  ⟹  R_{2,1} = −Z_{2,2}^{−1} Z_{2,1}   (C.5)

Z_{1,1} R_{1,2} + Z_{1,2} = 0  ⟹  R_{1,2} = −Z_{1,1}^{−1} Z_{1,2}   (C.6)

We can then construct the diagonal blocks of Z̃ with

Z̃_{1,1} = Z_{1,1} + Z_{1,2} R_{2,1}   (C.7)

Z̃_{2,2} = Z_{2,2} + Z_{2,1} R_{1,2}   (C.8)

Then, we can invert Z̃ by simply inverting the diagonal blocks:
bracketleftBig? Z1,1 bracketrightBig?1 [0] [0] bracketleftBig? Z2,2 bracketrightBig?1 ? ?? ? (C.9) 79 Also, note that ?Z?1 = R?1Z?1 (C.10) So we can get the inverse of Z by simply multiplying ?Z?1 by R: Z?1 = R?Z?1 = ? ?? ? [I] [R1,2] [R2,1] [I] ? ?? ? ? ?? ? bracketleftBig? Z?11,1 bracketrightBig [0] [0] bracketleftBig? Z?12,2 bracketrightBig ? ?? ? (C.11) = ? ?? ? bracketleftBig? Z?11,1 bracketrightBig bracketleftBig R1,2 ?Z?12,2 bracketrightBig bracketleftBig R2,1 ?Z?11,1 bracketrightBig bracketleftBig? Z?12,2 bracketrightBig ? ?? ? (C.12) Substituting the values found in Eqs. C.5 to C.8 we can get the inverse in terms of the partitions for the original system matrix Z: Z?1 = ? ?? ? bracketleftbiggparenleftBig Z1,1 ?Z1,2Z?12,2Z2,1 parenrightBig?1bracketrightbigg bracketleftbigg ?Z1,1Z1,2 parenleftBig Z2,2 ?Z2,1Z?11,1Z1,2 parenrightBig?1bracketrightbigg bracketleftbigg ?Z2,2Z2,1 parenleftBig Z1,1 ?Z1,2Z?12,2Z2,1 parenrightBig?1bracketrightbigg bracketleftbiggparenleftBig Z2,2 ?Z2,1Z?11,1Z1,2 parenrightBig?1bracketrightbigg ? ?? ? (C.13) Viewing it this way, we can see that there are now 4 partition inverses, 4 partition multiplies (if done in the proper order), and 2 matrix subtractions. Now suppose Z is of dimensions N ? N with N an even number and each of the partitions has dimensions N2 ? N2 . Then, taking all 4 partition inverses will result in about half the number of operations as taking the full matrix inverse since 4 parenleftBigN 2 parenrightBig3 = N32 . Although this part of the algorithm cannot be made fully parallel, it is possible to substantially accelerate it with multiple processors [10]. The remaining operations are only matrix multiplies and subtractions. In these two cases, one can achieve up to 100% efficiency when doing parallel processing. 80 Now consider a case where we have an N ? N matrix with N = 2m for some integer m. We can partition our matrix into 2?2 partitions: Z = ? ?? ?? ?? ?? ?? ? [2?2] [2?2] ... [2?2] [2?2] [2?2] ... ... [2?2] [2?2] ? ?? ?? ?? ?? ?? ? (C.14) We apply the partitioned inverse scheme recursively: Z = ? ?? ? [2?2] [2?(N ?2)] [(N ?2)?2] [(N ?2)?(N ?2)] ? ?? ? (C.15) We can use this step until we reach a point where we either a) already have the needed partition inverses we need or b) must invert a 2?2 partition. For case b, the inverse may be performed analytically and each element may be calculated in parallel. Furthermore, as mentioned previously, all remaining operations may be performed in parallel. So, this strategy may offer a highly parallel alternative for computing a full matrix inverse. A full inverse is important in a linear system when the number of right hand sides is larger than the dimensions of the system. In this case, it is more efficient to compute a full matrix inverse rather than performing an LU decomposition and then solving for multiple right hand sides. The following Fortran 90 code implements the above procedure and performs an in-place matrix inversion for a matrix have dimensions N ? N where N = 2m for some integer m. 81 program fm sol use fm lib implicit none integer :: m, n, nb, dm real :: myre1 , myre2 , mylg complex :: j = (0 ,1) complex, dimension (: ,:) , allocatable :: Zsi nb = 512 ! matrix dimension ! Fill matrix Zsi with random real and ! imaginary values between ?5.0 and 5.0 allocate( Zsi (nb,nb)) do m = 1,nb do n = 1,nb call random number(myre1) call random number(myre2) Zsi(m,n) = (10.0?myre1?5.0) + j ? (10.0?myre2?5.0) end do end do mylg = ( log (real(nb))/ log (2.0)) dm = nint (mylg) ! 
number of ( recursive ) levels call pwtin( Zsi ) ! Perform matrix inverse end program fm sol 82 module fm lib contains subroutine tby2i (Zs) implicit none complex, dimension(2 ,2) , intent(inout) :: Zs complex :: mydet , mytmp mydet = Zs(1 ,1)?Zs(2 ,2) ? Zs(1 ,2)?Zs(2 ,1) mytmp = Zs(1 ,1) Zs(1 ,1) = Zs(2 ,2) Zs(2 ,2) = mytmp Zs(1 ,2) = ?Zs(1 ,2) Zs(2 ,1) = ?Zs(2 ,1) Zs(1 ,1) = Zs(1 ,1)/mydet Zs(1 ,2) = Zs(1 ,2)/mydet Zs(2 ,1) = Zs(2 ,1)/mydet Zs(2 ,2) = Zs(2 ,2)/mydet end subroutine tby2i 83 ! In?place matrix multiply ! Replaces Za subroutine ipmml(Za,Zb) implicit none complex, dimension (: ,:) , intent(inout) :: Za complex, dimension (: ,:) , intent(in) :: Zb integer :: m,n,p,mc,nc , pc complex, dimension(size(Za,2)) :: Rt mc = size(Za,1) nc = size(Zb,2) pc = size(Za,2) do m = 1,mc Rt = 0.0 do n = 1,nc do p = 1,pc Rt(n) = Rt(n) + Za(m,p)?Zb(p,n) end do end do Za(m,1: nc) = Rt end do end subroutine ipmml 84 ! In?place matrix multiply ! Replaces Zb subroutine ipmmr(Za,Zb) implicit none complex, dimension (: ,:) , intent(in) :: Za complex, dimension (: ,:) , intent(inout) :: Zb integer :: m,n,p,mc,nc , pc complex, dimension(size(Zb,1)) :: Rt mc = size(Za,1) nc = size(Zb,2) pc = size(Za,2) do n = 1,nc Rt = 0.0 do m = 1,mc do p = 1,pc Rt(m) = Rt(m) + Za(m,p)?Zb(p,n) end do end do Zb(1:mc,n) = Rt end do end subroutine ipmmr 85 ! Compute Za ? Zb?Zc replacing Za subroutine ipsmm(Za,Zb,Zc) implicit none complex, dimension (: ,:) , intent(inout) :: Za complex, dimension (: ,:) , intent(in) :: Zb, Zc integer :: m,n,p,mc,nc , pc complex, dimension(size(Za,2)) :: Rt mc = size(Zb,1) nc = size(Zc,2) pc = size(Zb,2) do m = 1,mc Rt = 0.0 do n = 1,nc do p = 1,pc Rt(n) = Rt(n) + Zb(m,p)?Zc(p,n) end do Za(m,n) = Za(m,n) ? Rt(n) end do end do end subroutine ipsmm 86 ! Compute Za + Zb?Zc replacing Za subroutine ipamm(Za,Zb,Zc) implicit none complex, dimension (: ,:) , intent(inout) :: Za complex, dimension (: ,:) , intent(in) :: Zb, Zc integer :: m,n,p,mc,nc , pc complex, dimension(size(Za,2)) :: Rt mc = size(Zb,1) nc = size(Zc,2) pc = size(Zb,2) do m = 1,mc Rt = 0.0 do n = 1,nc do p = 1,pc Rt(n) = Rt(n) + Zb(m,p)?Zc(p,n) end do Za(m,n) = Za(m,n) + Rt(n) end do end do end subroutine ipamm 87 recursive subroutine pwtin(Za) implicit none complex, dimension (: ,:) , target , intent(inout) :: Za integer :: m,n,nb complex, dimension (: ,:) , pointer :: pa , pb, pc , pd nb = size(Za,1) pa => Za(1:nb/2 ,1:nb/2) pb => Za(1:nb/2,nb/2+1:nb) pc => Za(nb/2+1:nb ,1:nb/2) pd => Za(nb/2+1:nb,nb/2+1:nb) if (nb==2) then call tby2i (Za) return end if call pwtin(pd) call ipmml(pb,pd) call ipsmm(pa ,pb, pc) call pwtin(pa) call ipmmr(pd,pc) call ipmmr(pa ,pb) call ipamm(pd,pc ,pb) 88 call ipmml(pc ,pa) do m = 1,size(pb,1) do n = 1,size(pb,2) pb(m,n) = ?pb(m,n) end do end do do m = 1,size(pc ,1) do n = 1,size(pc ,2) pc(m,n) = ?pc(m,n) end do end do end subroutine pwtin end module fm lib 89 Appendix D GPU Acceleration GPUs (Graphics Processing Units) may contain hundreds of cores and are able to execute up to thousands of threads simultaneously [14]. Although the codes used in this work have not been accelerated with GPU hardware, this has the potential to speed up the overall procedure significantly. In fact, the major limiting factor in this work has been the matrix fill time for the MoM system matrix. Luckily, filling this matrix can be done in a highly parallel manner. Each element may be calculated independently from all the other elements in the matrix. Furthermore, they may be calculated in any order. 
Thus, this procedure is embarrassingly parallel and is suitable for implementation on a GPU.

To test this idea, we have implemented a standard two-dimensional EFIE formulation with TM polarization, as seen in Chapter 2. Here, each thread is assigned to a single element of the system matrix, and threads working in parallel may then quickly fill the matrix. To solve the linear system, we used a conjugate gradient iterative solver; iterative solvers are more amenable to parallel processing than are direct solutions. In this algorithm there are a number of matrix-vector multiplies, and these determine how quickly the solver executes. To implement this procedure on the GPU, we assigned each thread to a row of the system matrix: in each matrix-vector multiply, each thread performs one dot product between its assigned row and the vector by which the matrix is multiplied.

We have tested this code on both a 2.67 GHz Core i7 CPU and an NVIDIA Tesla C1060 GPU and compared the results for problem sizes from 1000 up to 22,000 unknowns (the upper memory limit for the GPU). The ratio of the CPU execution time to that of the GPU is shown in Figure D.1. Note that there is a roughly linear gain in speed as the number of unknowns increases, which implies that the GPU execution time grows roughly one power of N more slowly than the CPU execution time.

There is also a fluctuation in the graph for the matrix fill time, which could be due to a number of factors. First, when programming this routine, the threads must be assigned in multiples of 32 for maximum performance (an artifact of the NVIDIA cards) [15]; however, the matrix cannot always be perfectly divided into partitions of this size. To remedy this situation, the implementation used here fills the largest submatrix of the system matrix that may be partitioned in this way, and the remaining terms are then calculated separately in smaller blocks. This introduces latency that may behave in an unpredictable manner. Furthermore, there may be differences in how the compilers translate each of the codes: a separate compiler must be used for the CPU and the GPU, and they may optimize operations differently. Also, the speed at which different operations (multiplication, addition, etc.) are handled on each device may vary, as the GPU has a complex cache/memory architecture. Finally, the variables for the GPU are in single precision (a hardware limitation) while those on the CPU are in double precision, which affects how the operations are performed as well as how the code is compiled and optimized.

The speed of the conjugate gradient solver also improves as the number of unknowns increases, although much more slowly than the matrix fill does. This is because the solver has only been parallelized along one dimension of the matrix, while the fill has been parallelized along two. Still, for large problems the solver is almost ten times faster on the GPU than on the CPU.

Overall, it appears that this technique would greatly accelerate the matrix fill and therefore the algorithm presented in this work. Furthermore, inexpensive programmable GPUs are currently available for desktop machines and can also be found in many supercomputers, making this a practical option.

Figure D.1: Ratio of CPU to GPU execution times for the matrix fill and the conjugate gradient solver.