GENERTIA: A SYSTEM FOR VULNERABILITY ANALYSIS, DESIGN AND REDESIGN OF IMMUNITY-BASED ANOMALY DETECTION SYSTEM

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

________________________
Haiyu Hou

Certificate of Approval:

_________________________________
David Umphress
Associate Professor
Computer Science and Software Engineering

_________________________________
Richard Chapman
Associate Professor
Computer Science and Software Engineering

_________________________________
Gerry Dozier, Chair
Associate Professor
Computer Science and Software Engineering

_________________________________
Joe F. Pittman
Interim Dean
Graduate School

GENERTIA: A SYSTEM FOR VULNERABILITY ANALYSIS, DESIGN AND REDESIGN OF IMMUNITY-BASED ANOMALY DETECTION SYSTEM

Haiyu Hou

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
December 15, 2006

GENERTIA: A SYSTEM FOR VULNERABILITY ANALYSIS, DESIGN AND REDESIGN OF IMMUNITY-BASED ANOMALY DETECTION SYSTEM

Haiyu Hou

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon request of individuals or institutions and at their expense. The author reserves all publication rights.

______________________________
Signature of Author

______________________________
Date of Graduation

VITA

Haiyu Hou, son of Chengya Hou and Shuquan Deng, was born June 8, 1973, in Panzhihua, Sichuan, China. After graduating from Panzhihua No. 3 High School in 1991, he attended Sichuan University and graduated with a Bachelor of Science degree in Biochemistry in July, 1995. He then studied Molecular Biology for three years at the Shanghai Institute of Biochemistry, Chinese Academy of Sciences. In June, 1998, he entered Auburn University and graduated with a Master of Science degree in Textile Science in May, 2000. In May, 2002, he obtained a second Master of Science degree, in Computer Science. He has since continued his studies in Computer Science toward the degree of Doctor of Philosophy at Auburn University.

DISSERTATION ABSTRACT

GENERTIA: A SYSTEM FOR VULNERABILITY ANALYSIS, DESIGN AND REDESIGN OF IMMUNITY-BASED ANOMALY DETECTION SYSTEM

Haiyu Hou

Doctor of Philosophy, December 15, 2006
(M.S., Auburn University, 2002)
(M.S., Auburn University, 2000)
(B.S., Sichuan University, 1995)

169 Typed Pages

Directed by Gerry Dozier

The principles of immunology have been applied to the design and implementation of artificial systems and algorithms for solving a broad range of mathematical and engineering problems, resulting in a new computational paradigm termed the Artificial Immune System (AIS). This dissertation focuses on improving the anomaly detection performance of an AIS. A typical AIS can be described in terms of three factors: 1) the pattern and detector representations, 2) the matching rules that decide the affinity between detectors and patterns, and 3) the algorithms that govern the generation, death and regeneration of detectors. Traditional representations and matching rules have been shown to cause AISs to suffer from a series of problems, including poor scalability. This dissertation proposes a constraint-based representation and corresponding matching rules to address these problems.
This dissertation proposes GENERTIA, a system that proactively improves the performance of an AIS by discovering and patching its vulnerabilities. GENERTIA consists of two subsystems: a red team and a blue team. The red team discovers vulnerabilities in the AIS, and the blue team designs detectors to patch the discovered vulnerabilities. The two teams strengthen an AIS in an interactive, co-evolutionary fashion. GENERTIA is applied to an AIS-based intrusion detection system (IDS). Experiments show that GENERTIA can effectively increase the detection rate of the IDS with little increase in the false positive rate.

The GENERTIA blue team provides a novel approach to the generation of a compact and effective detector set. This leads to the proposal of an anomaly detection-based classification system. This classification system consists of multiple subsystems, with each subsystem being an AIS that discriminates one class of patterns from the other classes. GENERTIA is also applied to the proposed classification system to improve its classification accuracy by improving the performance of its individual subsystems.

Style manual or journal used: Evolutionary Computation
Computer software used: Microsoft Word 2003

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
 1.1 Goals
 1.2 Main Contributions
 1.3 Dissertation Outline
2 BACKGROUND
 2.1 Immunology for Computer Scientists
  2.1.1 Structure and Functions of the Immune System
   2.1.1.1 Innate and Adaptive Immunity
   2.1.1.2 Development of Immunity
   2.1.1.3 Clonal Deletion and Clonal Selection
   2.1.1.4 Elimination of Pathogens
  2.1.2 Immunological Principles for Design of Computer Systems
 2.2 Artificial Immune Systems
  2.2.1 Immune Network Theory
  2.2.2 Clonal Selection Theory
  2.2.3 Negative Selection Theory
   2.2.3.1 NSA Defined Over Hamming Shape-Space
   2.2.3.2 Real-Valued NSA
   2.2.3.3 A Third Representation
  2.2.4 AIS-Based IDSs
   2.2.4.1 The Need for Anomaly Detection-Based IDSs
   2.2.4.2 Overview of Immunity-Based Intrusion Detection
   2.2.4.3 Prototypes of AIS-Based IDSs
3 AN AIS WITH CONSTRAINT-BASED DETECTORS
 3.1 Pattern Representation
 3.2 Detector Representation
 3.3 Matching Rule
4 GENERTIA – A SYSTEM FOR AIS VULNERABILITY ANALYSIS AND REDESIGN
 4.1 Introduction
 4.2 GENERTIA Red Team
 4.3 GENERTIA Blue Team
5 A GENERTIA-BASED INTRUSION DETECTION SYSTEM
 5.1 Introduction
 5.2 Experiments
  5.2.1 Data Set
  5.2.2 Vulnerability Analysis
  5.2.3 Generating Detectors
  5.2.4 GENERTIA Redesign
  5.2.5 Performance Comparison of AISs with/without Redesign
 5.3 Discussion
  5.3.1 The Importance of the Preliminary Work on GENERTIA
  5.3.2 Inefficiencies and Improvements of GENERTIA
  5.3.3 Adaptation of GENERTIA to Dynamical Environment
6 AN IMMUNITY-BASED CLASSIFICATION SYSTEM
 6.1 Introduction
 6.2 An AIS-Based Classification System
  6.2.1 Anomaly Detection-Based Classification
  6.2.2 Training of Sub-Systems
  6.2.3 Classification
 6.3 Experiments
  6.3.1 Wisconsin Breast Cancer Dataset
  6.3.2 Fisher's Iris Dataset
  6.3.3 Vulnerability Analysis
 6.4 Discussion
7 APPLYING GENERTIA TO IMMUNITY-BASED CLASSIFICATION SYSTEM
 7.1 Introduction
 7.2 The Pima Indian Diabetes Dataset
 7.3 Initial Result: Classifiers without GENERTIA Redesign
 7.4 Applying GENERTIA to the Classification System
 7.5 A Virtual Expert
 7.6 GENERTIA Redesign with Virtual Expert
 7.7 Summary
8 CONCLUSION AND FUTURE WORK
 8.1 Research Objectives
 8.2 Dissertation Summary
 8.3 Future Work
REFERENCES

LIST OF TABLES

Table 5.1 Network Traffic Volume of the Hosts Subjected to Attacks in the 1998 Lincoln Lab Dataset
Table 5.2 Comparing the Efficiency of the GRT and Random Generation
Table 5.3 Time cost in terms of NDC for generating detectors using RGNS and GENERTIA
Table 6.1 Performances of AISs Trained with the NSA on the Wisconsin Breast Cancer Dataset
Table 6.2 Comparison of Performance of Varied-Sized and Constraint-Based Detectors Using Fisher's Iris Dataset
Table 6.3 Performances of AISs Trained with the NSA on Fisher's Iris Dataset
Table 6.4 Comparison of Classifiers on the Wisconsin Breast Cancer Dataset and the Fisher's Iris Dataset
Table 7.1 Some Previous Results for the Pima Indian Diabetes Dataset
Table 7.2 Errors Made by AIS with Randomly Generated Detectors
Table 7.3 Accuracy of the AIS with Randomly Generated Detectors
Table 7.4 Results for AIS with GBT-Designed Detectors
Table 7.5 Results for 2-Subsystems Classifier with Randomly Generated Detectors
Table 7.6 Results for 2-Subsystems Classifier with Randomly Generated Detectors
Table 7.7 Results for 2-Subsystems Classifier with GBT-Designed Detectors
Table 7.8 Results for 2-Subsystems Classifier with 2 Subsystems Redesigned by GENERTIA
Table 7.9 Results for 2-Subsystems Classifier with 1 Subsystem Redesigned by GENERTIA
Table 7.10 Results for Classifier Redesigned by GENERTIA with Virtual Expert

LIST OF FIGURES

Figure 4.1 The flow chart shows the overall GRT algorithm
Figure 4.2 The flow chart shows the overall GBT algorithm
Figure 5.1 The architecture of GENERTIA for AIS-based IDS
Figure 5.2 The effect of release threshold on the GBT activity
Figure 5.3 Comparing the GRT effort needed to discover vulnerabilities in systems trained with and without GENERTIA
Figure 5.4 ROC diagrams for AISs with/without GENERTIA redesign
Figure 5.5 The average number of the GBT iterations needed to design detectors
Figure 6.1 Flow chart of the algorithm used to generate detectors for the classification system
Figure 6.2 The difference in placement and coverage between the detector sets generated by the NSA and the genetic algorithm
Figure 6.3 Vulnerability analysis of the classifier for Fisher's Iris Dataset
Figure 6.4 Vulnerability analysis of the classifier for Wisconsin Breast Cancer Dataset

CHAPTER 1

INTRODUCTION

Biology has always been a rich source of inspiration for developing computational models and problem solving methods [De Castro and Von Zuben 2004]. Artificial Neural Networks [Haykin 1999] and Evolutionary Computation [DeJong and Spears 1993] are good examples. During the past two decades [Dasgupta 1999], the immune system has drawn significant attention, and the theories and metaphors of the immune system have been successfully applied to the development of systems and algorithms for solving scientific and engineering problems. As a result, a new intelligent computational paradigm, the Artificial Immune System (AIS), has come into being [De Castro and Timmis 2002a, De Castro and Timmis 2003].

The natural immune system (NIS) is an extremely complex system, and some key immune principles, such as self-nonself discrimination, are still actively debated by immunologists holding contradictory theories [Langman and Cohn 2000]. The main purpose of the immune system is to protect the body from various harmful entities that, in most cases, come from outside the body, such as bacteria, viruses, fungi and protozoan parasites. Because these entities are the major causes of many diseases and ailments, they are collectively termed pathogens [Janeway and Travers 2001]. The NIS has complicated self-nonself discrimination mechanisms to effectively detect and eliminate these pathogens. Another prominent feature of the NIS is its ability to remember the pathogens that have been successfully recognized, so that the system can launch a rapid and more specific response when the same or similar pathogens reappear. These two immunological mechanisms are the most interesting metaphors for computer scientists, and accordingly, most AIS research can be classified into two categories [Dasgupta 1999]: 1) techniques inspired by the self-nonself recognition mechanism, and 2) techniques inspired by the immune memory mechanism.

The AIS community is interested in the development of novel models and algorithms inspired by immunology, as well as the application of these models and algorithms to new industrial or engineering application areas [Timmis et al. 2004, Hart and Timmis 2005]. A pioneering study by Farmer et al. [Farmer et al. 1986] developed a computational model based on the Idiotypic Network Theory, which was proposed by the theoretical immunologist Jerne [Jerne 1974] to explain the immune memory mechanism. In this work, Farmer et al. [Farmer et al. 1986] revealed the similarity between their model and the classifier system introduced by Holland et al. [Holland and Reitman 1978], which suggested the possibility of developing immunity-based machine learning techniques. Following this novel idea, Cooke and Hunt [Cooke and Hunt 1995, Hunt and Cooke 1995, Hunt and Cooke 1996] proposed a supervised machine learning technique using an AIS for DNA sequence classification. Based on these studies, Timmis et al. [Timmis et al. 2000] proposed an AIS that can be applied to data analysis and clustering. Following this line of research, Watkins and his colleagues [Watkins and Boggess 2002, Watkins and Timmis 2002, Watkins and Timmis 2003] proposed an efficient supervised learning algorithm for pattern classification.
Research based on self-nonself recognition can be traced back to the pioneering work of Forrest and her group, who proposed the negative selection algorithm (NSA) [Forrest et al. 1994], inspired by the clonal deletion process in the vertebrate thymus that removes self-destructive T-cells to prevent autoimmunity [Forrest et al. 1997, pp. 15-16 in Janeway and Travers 2001]. The NSA is primarily used to generate detectors for anomaly detection. The anomaly detection problem can be defined as follows [Tanase 2002]: given a set of samples with assigned values, establish a function to classify future samples as normal or abnormal. Usually, only normal samples are available (or only normal samples are used) for building the function. From the perspective of the NSA, the universe of patterns is divided into two parts: the self space, which consists of all normal patterns, and the nonself space, which consists of all abnormal patterns. The self and nonself spaces are complementary to each other, and there is no overlap between them. The NSA generates detectors that tend to cover only the nonself space.

Since the introduction of the NSA, much research has been devoted to improving its efficiency and effectiveness [D'haeseleer et al. 1996, Wierzchon 2000, Ayara et al. 2002, Balthrop et al. 2002b, Gonzalez et al. 2002, Singh 2002, Gonzalez et al. 2003, Esponda et al. 2004] and to applying it to a broad range of application areas [Dasgupta 1999]. These include computer security [Forrest et al. 1994, Forrest et al. 1996, Forrest et al. 1997], network intrusion detection [Hofmeyr et al. 1998, Hofmeyr 1999, Hofmeyr and Forrest 2000, Kim and Bentley 2001b, Hou et al. 2002, Harmer et al. 2002, Dasgupta and Gonzalez 2002], industrial mill fault detection [Dasgupta and Forrest 1995, Dasgupta and Forrest 1996, Taylor and Corne 2003, Dasgupta et al. 2004], and pattern classification [Lee and Sim 2004], to name a few. Although previous studies have shown the AIS to be a promising problem solving technique, much work remains to be done to make the AIS truly useful for real-world applications. Some of the major research issues are: enhancement of the representations, improvement of the efficiency and effectiveness of AISs, exploration of immune mechanisms that have not yet been incorporated into current AISs, and integration of existing AIS algorithms for improved performance.

1.1 Goals

This dissertation deals with AISs that are based on a self/nonself recognition mechanism. The main purpose of this dissertation is to contribute ideas that improve the anomaly detection performance of AISs. Specifically, the goal is to make an AIS more effective by improving its ability to correctly distinguish self from nonself. This goal is met through the realization of the following main objectives:

1. To define different encoding schemes for the self/nonself representation, for which a binary-string representation is very common. A binary representation makes the analysis of the problem space easier; however, this representation usually does not provide information that can be easily understood by a human. Since an AIS deals with data that have rich semantic content, a higher-level representation will facilitate the abstraction of meaningful knowledge.

2. To define a new detector representation. Strong detectors are needed to improve the performance of an AIS. With a binary representation, detectors have a detection range of constant size.
The NSA has been shown to have a severe scalability problem [Kim and Bentley 2001a], so AISs based on the NSA cannot deal with high-dimensional data sets. This scalability problem may be solved or alleviated with a better-designed detector representation. Different detector representations also suggest the definition of different matching schemes.

3. To develop a new detector generation algorithm. The effectiveness of an AIS depends on the size of its detector set. AISs that use the NSA for detector generation are not efficient [Kim and Bentley 2001a, Stibor et al. 2005c]. A more efficient detector generation algorithm should be able to generate a small detector set with high effectiveness. This is especially necessary for real-time applications such as network intrusion detection, where a large detector set may not be suitable [Kim and Bentley 2001a].

4. To improve the performance of an AIS by reducing the number of "holes" in the system; in other words, to increase the detection rate of the AIS. Nonself may go undetected through "holes", which are pieces of nonself space that are not covered by the existing detectors. The existence of holes makes an AIS vulnerable to certain attacks; therefore, holes in an AIS are also referred to as vulnerabilities of the AIS in this dissertation. It is inefficient to reduce the number of holes simply by increasing the size of the detector set. A "self-inspection" mechanism for discovering and patching holes may effectively improve the performance of the AIS.

5. To verify the techniques proposed in this dissertation, experiments with AISs in two application areas, network intrusion detection and pattern classification, are carried out.

1.2 Main Contributions

The goal of this dissertation is to improve the performance of anomaly detection-based AISs. This is achieved by introducing new self/nonself and detector representations, referred to as a constraint-based representation [Hou et al. 2002], and, accordingly, new matching schemes designed for this representation, including an any-r-interval matching rule and a contiguous-r-interval matching rule. Experiments [Hou and Dozier 2004] have shown that AISs with the new representation have better performance than AISs with the traditional binary representation.

A genetic algorithm-based system, termed GENERTIA [Dozier 2003, Dozier et al. 2004b, Dozier et al. 2004a, Hou and Dozier 2005], is developed to improve the performance of an AIS by iteratively discovering and patching holes in the system. GENERTIA consists of two subsystems: the GENERTIA red team and the GENERTIA blue team. The objective of the GENERTIA red team is to discover the vulnerabilities in the AIS. The objective of the GENERTIA blue team is to design detectors for the vulnerabilities discovered by the GENERTIA red team. The two teams cooperate in an interactive co-evolutionary fashion to strengthen the AIS.

Detectors are designed to cover nonself space. Coverage overlap between detectors is inevitable, but coverage overlap does not provide extra detection ability for a system. AISs that depend on the NSA may generate a large number of overlapping detectors. With GENERTIA, detectors are only generated to cover nonself space that is not covered by the existing detectors in the AIS. This effectively reduces the coverage overlap between detectors, which results in an AIS with a compact detector set and high performance [Hou and Dozier 2006b].
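The interaction between the two teams can be illustrated with a small, self-contained sketch. It is only a simplified stand-in for GENERTIA: the actual red and blue teams are genetic algorithm-based and are detailed in Chapter 4, whereas the sketch below uses random search and a simple interval-shrinking heuristic; the function names, the toy self set, and the integer alphabet are illustrative assumptions rather than the dissertation's actual parameters.

```python
import random

def matches(detector, pattern):
    # A constraint-style detector: one (low, high) interval per field.
    return all(lo <= v <= hi for v, (lo, hi) in zip(pattern, detector))

def covered(detectors, pattern):
    return any(matches(d, pattern) for d in detectors)

def red_team(detectors, self_set, dim, rng, tries=5000):
    # Search for a hole: a pattern that is neither self nor covered by any detector.
    # (The real GENERTIA red team evolves such patterns with a genetic algorithm.)
    for _ in range(tries):
        p = [rng.randint(0, 9) for _ in range(dim)]
        if p not in self_set and not covered(detectors, p):
            return p
    return None

def blue_team(hole, self_set, width=2):
    # Build a detector around the hole, shrinking it until it matches no self pattern.
    # (The real GENERTIA blue team evolves detectors that also maximize coverage.)
    det = [(v - width, v + width) for v in hole]
    while width > 0 and any(matches(det, s) for s in self_set):
        width -= 1
        det = [(v - width, v + width) for v in hole]
    return det

rng = random.Random(0)
self_set = [[1, 2, 3], [1, 2, 4], [2, 2, 3]]   # toy "normal" patterns
detectors = []
for _ in range(10):                            # iteratively discover and patch holes
    hole = red_team(detectors, self_set, dim=3, rng=rng)
    if hole is None:
        break
    detectors.append(blue_team(hole, self_set))
print(len(detectors), "detectors designed")
```

Because the blue team only designs a detector for a region the red team has shown to be uncovered, each new detector adds coverage rather than overlapping existing detectors, which is the source of the compact detector sets described above.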
AISs designed by GENERTIA have very compact detector sets and high performance [Hou and Dozier 2005]. This provides an opportunity to develop classification systems that are based on anomaly detection [Hou and Dozier 2006a]. An anomaly detection-based classification system is proposed. The proposed classification system consists of multiple subsystems, with each subsystem being an AIS for one class of the classification problem. Experimental results on benchmark data sets show that such a classification system compares favorably with well-known classifiers, and that the compact detector set can be easily converted into rules that are meaningful to humans [Hou and Dozier 2006a]. The performance of the classification system can be further improved by applying GENERTIA to the individual subsystems. GENERTIA can reduce the vulnerabilities (holes) in the subsystems and therefore effectively reduce the number of false negatives that make some patterns unclassifiable.

1.3 Dissertation Outline

Chapter 2 presents a biological background on immunology, focusing on the basic concepts that are necessary for understanding AISs. Chapter 2 then provides a survey of AIS research that is related to the work proposed in this dissertation.

Chapter 3 presents a constraint-based AIS that performs anomaly detection. Chapter 3 begins with the rationale of a simple AIS that uses the NSA to generate detectors, followed by the presentation of the representation of self/nonself in the form of vectors of integers, the constraint-based representation. The any-r-interval and contiguous-r-interval matching rules are also introduced.

Chapter 4 introduces GENERTIA, which consists of two subsystems: a red team and a blue team. The GENERTIA red team is used to discover "holes", or vulnerabilities, in the system, while the GENERTIA blue team is used to design corresponding detectors for the holes discovered by the red team. GENERTIA can be applied to an anomaly detection AIS to iteratively reduce the holes in the system.

Chapter 5 presents an application of GENERTIA to an AIS-based intrusion detection system. Experiments are carried out to show the applicability of GENERTIA to immunity-based intrusion detection systems for performance improvement.

Chapter 6 proposes an anomaly detection-based classification system. This classification system consists of multiple subsystems, with each subsystem being an anomaly detection-based AIS. These subsystems depend on a GENERTIA blue team to generate negative detectors, which can be easily converted into classification rules. Experiments are conducted with benchmark data sets to test the performance of this classifier.

Chapter 7 presents the application of GENERTIA to the proposed classification system for better performance. After the classification system is built with the GENERTIA blue team, the red team can be used to discover the vulnerabilities in the subsystems, and these discovered vulnerabilities can be patched by the GENERTIA blue team. This process may improve the accuracy of the classification system by reducing the number of patterns that cannot be classified due to false negatives from subsystems.

Chapter 8 gives a summary of the work proposed in this dissertation and provides several directions for future research.

CHAPTER 2

BACKGROUND

This chapter first briefly introduces a number of concepts in immunology. These concepts are necessary for the understanding of AISs.
Following the introductory concepts is a list of immunological principles that are of interest to computer scientists. Next, a survey of the research on AISs that is related to the work presented in this dissertation is provided.

2.1 Immunology for Computer Scientists

This section provides a succinct introduction to the structure and functions of the natural immune system and summarizes the principles that have inspired computer scientists and engineers in the development of computational models and algorithms.

2.1.1 Structure and Functions of the Immune System

The natural immune system (NIS) exists in all vertebrates and serves as a vital system for defending the body from various infections that lead to disease [Janeway and Travers 2001]. The NIS is an extremely complex system, with many immune mechanisms not completely understood. This section provides an abstract view of the system while omitting many details of the specific mechanisms. The purpose of this section is to serve as a reference for the subsequent discussion of its artificial counterparts. Detailed reviews of the structural and functional analysis of the NIS may be found in [Janeway and Travers 2001, Krogh 2000]. This introduction is organized into the following four parts:

1. "Innate and Adaptive Immunity" introduces the two types of immunity, which are either specific or nonspecific to the invading pathogens;

2. "Development of Immunity" introduces the development of the various cells and molecules involved in the immune response;

3. "Clonal Deletion and Clonal Selection" introduces the generation of immune cells that are specific to certain pathogens;

4. "Elimination of Pathogens" briefly describes how pathogens are eliminated from the body after they are recognized by the immune system.

2.1.1.1 Innate and Adaptive Immunity

The NIS protects the body from infectious diseases caused by various pathogens [Janeway and Travers 2001, Krogh 2000]. There are four major groups of pathogens: viruses, bacteria, fungi and protozoan parasites. Pathogens elicit immune responses after they enter the body. Substances that can stimulate a specific response of the immune system are referred to as antigens, and certain pathogens display specific antigens (usually membrane proteins) on their surfaces. Generic pathogens that display commonly shared antigens on their surfaces may be recognized and eliminated by a type of leukocyte (white blood cell) referred to as a phagocyte, which is attracted to a site of infection to engulf and digest pathogens. This is called innate immunity. Specific pathogens invoke an antigen-specific immune response, which involves specific lymphocytes that synthesize antibodies that bind to a specific antigen and promote the elimination of the antigen. This is called adaptive immunity. The adaptive immune system responds more quickly and efficiently to a repeat infection, and such immunity is long lasting.

2.1.1.2 Development of Immunity

The NIS is a collection of cells and organs that work together to provide immunity [Janeway 2001, Krogh 2000]. NIS cells wander around the body to detect infections. NIS organs are where leukocytes mature and where they interact with antigens to become fully active effector cells and memory cells.

The primary immune organs, including the bone marrow and thymus, are where leukocytes form and mature. Lymphocytes are antigen-specific leukocytes responsible for adaptive immunity. They have membrane receptors called antibodies that bind to antigens.
Each lymphocyte recognizes one specific antigen. Lymphocytes specific for many diverse antigens are produced continually, even in the absence of antigen exposure. Secondary lymphoid organs, including the spleen and lymph nodes, are where leukocytes meet antigens. Some fluid leaves the blood circulation at the capillaries and bathes the tissues, supplying nutrients and washing away waste products. The fluid, called lymph, then collects in the lymphatic vessels and passes through the lymph nodes on its way back to the blood stream. If the tissues are infected, antigens are carried to the lymph nodes, where they contact lymphocytes and initiate adaptive immune responses. When a lymphocyte encounters its specific antigen, it proliferates and differentiates into a clone of effector cells with the same antigen specificity.

2.1.1.3 Clonal Deletion and Clonal Selection

Lymphocytes, a vital component of the immune system, carry antigen receptors that bind to antigens carried by pathogens such as viruses, bacteria, fungi and protozoan parasites. The binding between an antigen and a receptor is highly specific and is determined by the physical and chemical characteristics of the binding surfaces of the two molecules. Each lymphocyte carries copies of a certain antigen receptor whose antigen-binding specificity is determined by its encoding gene. A unique genetic mechanism that operates during lymphocyte development is capable of generating on the order of 10^11 different variants of antigen receptor genes [pp. 144 in Janeway and Travers 2001], a repertoire that is large enough to bind virtually any possible antigen, including molecules that come from the body itself. The clonal elimination process removes the lymphocytes that carry potentially self-reactive antigen receptors before they mature and join the body's immune defenses.

A lymphocyte binds an antigen with specific antibodies, which are encoded in groups of gene segments. As lymphocytes develop in the bone marrow and thymus, they must recombine several of these gene segments to produce a functional antibody. Recombination occurs randomly, so the millions of lymphocytes produced daily collectively have millions of different antigen specificities. Since receptor generation is random, cells specific for self-antigens are also produced. Anything binding to immature lymphocytes in the thymus and bone marrow results in their death, which is known as clonal deletion [Janeway 2001, Krogh 2000]; this is why mature lymphocytes do not bind to self components.

Mature naive lymphocytes are those that have not yet bound to any foreign antigen. They leave the primary lymphoid organs and re-circulate through the secondary lymphoid organs. If a lymphocyte does not bind its specific antigen within a few weeks, it dies. If it does bind to an antigen, it proliferates into a clone of lymphocytes that differentiate into antigen-eliminating effector cells. Memory lymphocytes specific for the same antigen are also produced. This process is called clonal selection [Janeway 2001, Krogh 2000]. It explains the specificity of adaptive immunity.

2.1.1.4 Elimination of Pathogens

The antibody is the primary weapon for antigen elimination in adaptive immunity. After being activated by antigens (and other signals), lymphocytes become plasma cells that secrete antibodies, which are specific to the activating antigens. Antibodies bind to pathogens to deactivate them.
They also coat the pathogens and, with the help of other molecules, generate signals that attract phagocytes to engulf and digest the coated pathogens. Many digestive enzymes are released by phagocytes to decompose pathogens in this process.

2.1.2 Immunological Principles for Design of Computer Systems

Although our understanding of the complex immune system is still at a very preliminary stage, computer scientists have gained many valuable insights that have led to the design of artificial immune systems (AISs) for a broad range of application areas [Dasgupta 1999]. The following is a list of some of the most important immunological principles that can be applied in the design of an artificial immune system [Hofmeyr 1999, Gonzalez 2003].

1. Distributed Processing: The immune system consists of millions of immune cells and macromolecules that circulate through the blood and lymph systems to detect and respond to immune stimuli. These components interact locally to provide protection in a completely distributed fashion. There is no central control or coordination, and therefore no single point of failure in the system.

2. Adaptability and Memory: The immune system can adapt to pathogenic structures by learning to recognize them and keeping a memory of them. When a certain pathogen is detected for the first time, a primary response is induced, and some lymphocytes that recognize this pathogen are kept as memory cells, which can respond more rapidly when the same type of pathogen is encountered again. These memory cells serve as signature-based detectors.

3. Anomaly Detection: For the immune system, there is no explicit definition of pathogens. Any substance that is not a component of the body itself is potentially harmful to the body. The immune system learns self empirically and develops the ability to detect pathogens, including ones that have never been encountered before. The ability to detect novel pathogens is vital because any organism will always encounter them in its lifetime.

4. Diversity and Robustness: Different immune components provide diverse pattern recognition so that the immune system can detect a variety of pathogens. These components are multitudinous and redundant, so the loss of a few of them will not significantly affect the overall functionality of the immune system.

2.2 Artificial Immune Systems

With traditional computation, computer scientists have found it either difficult or impossible to perform a key range of tasks associated with design, decision making, logistics and scheduling, pattern recognition, problem solving and autonomous intelligence [De Castro and Timmis 2003]. Over the years, great progress towards designing software for such tasks has emerged by taking inspiration from a range of natural, mainly biological, systems [De Castro and Von Zuben 2004]. Artificial Neural Networks and Evolutionary Computation are good examples of successful applications of the biological metaphor to the solution of difficult problems [Haykin 1999, DeJong and Spears 1993]. The immune system, possessing very powerful information processing capabilities such as feature extraction, pattern recognition, and adaptive learning and memory, has provided computer scientists with rich metaphors for developing computational models and problem solving methods [Dasgupta 1999, De Castro and Timmis 2002].
During the past decade, the principles abstracted from the natural immune system have been applied to build artificial systems for solving a broad range of computational tasks, resulting in a new research paradigm referred to as an Artificial Immune System (AIS) [Dasgupta 1999, De Castro and Timmis 2002].

The immune system is a very complex system that is still under active research [Langman and Cohn 2000]. There are various theories regarding the structures and mechanisms the immune system employs to fulfill its overall objective, and these theories even have contradictory details [Langman and Cohn 2000]. Accordingly, current AIS research has adopted a number of those theories; however, a typical AIS implements only a few [Dasgupta 1999]. Among the various immunological theories, three are of most interest: Immune Network Theory, Clonal Selection Theory, and Negative Selection Theory. The following sections review AIS models and algorithms based on these three theories.

2.2.1 Immune Network Theory

Idiotypic Network Theory was originally proposed by Jerne [Jerne 1974] to describe the immune system as a self-regulatory network of B-cells. In this theory, B-cells may recognize each other through the presence of certain paratopes on the surface of the B-cell. Two B-cells are connected if they have matching paratopes, and the strength of the connection is proportional to the extent to which they match each other, termed affinity. Connected B-cells both stimulate and suppress each other in the expression of antibodies. The stabilization of the network is obtained at an equilibrium state between stimulation and suppression.

Varela et al. [Varela et al. 1988] proposed a cognitive model of the immune system with four significant functionalities: 1) the recognition of various molecules, 2) the memory of encountered molecules, 3) the discrimination of self and nonself molecules, and 4) the associative memory to recognize unseen molecules. They also proposed that these functionalities are associated with three immune network properties: structure, dynamics, and meta-dynamics. The recognition of antigens by antibodies and the connections between antibodies constitute the structure of the immune network. The dynamics describes the changing of the network structure by updating the strength of the connections between antibodies upon the encountering of new molecules. Meta-dynamics refers to the changing of the network structure by adding new connections and deleting outdated connections.

A seminal work that suggested developing immunity-based computational algorithms was by Farmer et al. [Farmer et al. 1986], where the authors developed a computational model based on the Idiotypic Network Theory and suggested a strong similarity between their model and the classifier system introduced by Holland et al. [Holland and Reitman 1978]. In their model, antibodies and antigens are represented as binary strings. An antibody clones multiple copies of itself upon stimulation. The number of copies produced in cloning is determined by a concentration level which, in turn, is determined by four factors: 1) stimulation by matching antigens, 2) stimulation by neighboring antibodies, 3) suppression by neighboring antibodies, and 4) cell death due to a finite lifespan.

Cooke and Hunt [Cooke and Hunt 1995] proposed using an AIS based on the immune network theory to distinguish promoter sequences from other, non-promoter DNA sequences.
This AIS consists of a network of B-cells that generate antibodies to classify DNA sequences; each B-cell can be stimulated by DNA sequences (antigens) or by other B-cells, and can be suppressed by other B-cells as well.

Hunt et al. [Hunt et al. 1999] developed the Artificial Immune Network (AIN) model, in which the B-cell population is made up of two sub-populations: the initial population and the cloned population. The initial population is generated from a subset of the training dataset to create the B-cell network, and the remainder is used as training antigens. During the training process, antigens are randomly chosen and presented to the B-cell network. A binding between an antigen and a B-cell results in the cloning and mutation of the B-cell. The mutation yields a diverse set of antibodies that can be used for classification. New B-cells are either integrated into the network at the closest B-cells or discarded if integration fails. For any antigen that cannot be bound by the current B-cells, a new B-cell is generated using the antigen as a template and is incorporated into the network.

Timmis et al. [Timmis et al. 2000] proposed the Artificial Immune NEtwork (AINE), which has some similarity to AIN. During the training process, B-cells can be stimulated by the antigens from a training set, and the strength of the stimulation is determined by the Euclidean distance between the antigen and the B-cell involved in the stimulation. Upon stimulation, a B-cell can clone itself, and mutations are introduced to produce B-cells with a variety of binding specificities. The training process results in a network of B-cells that represents the structure of the training set and can later be used for data clustering. Based on AINE, Timmis and Neal [Timmis and Neal 2000] proposed the Resource limited Artificial Immune Network (RAIN) to address efficiency problems that resulted from the explosion of the B-cell population in AINE. In RAIN, an artificial recognition ball (ARB) is used to represent a number of similar B-cells. The concept of an ARB is borrowed from the idea of the recognition ball in theoretical immunology, which refers to the region in shape space that an antibody can recognize. Each ARB is assigned a number of B-cells depending on its level of stimulation; therefore, the whole network is resource limited.

Watkins and his colleagues [Watkins and Boggess 2002, Watkins and Timmis 2002, Watkins and Timmis 2003] proposed the Artificial Immune Recognition System (AIRS) as a supervised learning algorithm. This algorithm consists of four stages. The first stage involves data normalization, parameter discovery, and the seeding of the memory cell and initial ARB populations using zero or more randomly selected training samples. In the second stage, a training sample (regarded as an antigen) is presented to the memory cell population, and the memory cell with the highest affinity with the antigen is selected for generating ARBs. In the case where no matching memory cell is found, a new memory cell is generated using the antigen as a template and added to the memory cell population. Then, through the cloning and mutation of the selected memory cell, ARBs are produced and added to the ARB population. In the third stage, ARBs compete for resources according to their stimulation levels (the ability to correctly classify a given antigen), with resources allocated from ARBs with lower stimulation levels to ARBs with higher levels. The ARBs with zero resources are removed from the population.
In the final stage, for each antigen, the ARB with the highest stimulation level is compared with the memory cell selected in the second stage, and the one (either the ARB or the memory cell) with the higher stimulation level is added to a set of established memory cells. At this point, the next antigen in the training set is selected and the training process proceeds with the second stage. This process continues until all antigens have been presented to the system.

2.2.2 Clonal Selection Theory

According to The American Heritage Stedman's Medical Dictionary (2nd Edition, © 2004 by Houghton Mifflin Company), Clonal Selection Theory is "the theory that the mutation of stem cells produces all possible templates for antibody production and that exposure to a specific antigen selectively stimulates the proliferation of the cell with the appropriate template to form a clone or colony of specific antibody-forming cells". Simply put, the Clonal Selection Theory says that an antigen selects, from among a variety of lymphocytes, those with receptors capable of reacting with part of the antigen. As a result of this interaction, specific lymphocytes are activated to proliferate (clonally expand). They may then secrete molecules of antibody that can combine with the antigen. If the antigen is part of the surface of a virus or bacterium, then the antibody labels that organism as foreign ("non-self"). The organism is then ingested by phagocytic cells and degraded.

De Castro and Von Zuben [De Castro and Von Zuben 2000] developed an artificial immune network model, referred to as aiNet, for pattern recognition and clustering. The network cells in aiNet are represented in a similar way as in AINE, and a Minimum Spanning Tree algorithm is employed to identify clusters. The input data are assumed to be unlabeled, and a competitive learning algorithm is used to iteratively reduce the mean square error of the input data clustering. The recognition of an input pattern (antigen) results in proliferation, mutation and selection, as suggested by the Clonal Selection Theory.

Based on the aiNet model, de Castro and Von Zuben [De Castro and Timmis 2002b] proposed a multimodal function optimization algorithm, named CLONALG, for machine learning and pattern recognition. This algorithm has two repertoires of strings: a set of antigens consisting of the training data and a set of antibodies consisting of candidate solutions. The antibody set is further divided into two sub-populations: the memory and non-memory repertoires. The algorithm runs for a predefined number of generations. During each generation, a randomly selected antigen is presented to the antibody population, and the n antibodies with the highest affinities are allowed to clone themselves with a cloning rate proportional to their affinities: the higher the affinity, the more clones generated for each of the n selected antibodies. These clones are then subjected to mutation with a mutation rate inversely proportional to their affinities: the higher the affinity, the smaller the mutation rate. Among these antibodies, the one with the highest affinity replaces the corresponding memory antibody in the memory repertoire if it has a higher affinity than that memory antibody, and the d antibodies with the highest affinities replace the d non-memory antibodies with the lowest affinities in the non-memory repertoire.
CLONALG is a population-based algorithm that involves mechanisms of reproduction, mutation and selection, and in this respect it is similar to evolutionary computation (EC). The distinct features of CLONALG, however, are: 1) it can reach a diverse set of locally optimal solutions, and 2) it takes into account the fitness values of individuals (cell affinity) in order to determine the cloning and mutation rates.

2.2.3 Negative Selection Theory

The negative selection algorithm (NSA), an approach to anomaly detection using negative detectors, was originally proposed by Forrest et al. [Forrest et al. 1994] to model the clonal deletion process that the natural immune system uses to prevent autoimmunity [Forrest et al. 1997, pp. 15-16 in Janeway and Travers 2001]. T-cells, a type of lymphocyte, have antigen receptors covering their surface for binding to antigens. The binding is specific: a certain antigen receptor binds only a specific antigen, while all the antigen receptors on a T-cell have the same specificity. During the generation of T-cells, the gene encoding the antigen receptor undergoes a pseudo-random genetic rearrangement process, which means the antigen-binding specificity of the T-cell is random, so it has the potential to bind any antigen, including molecules of the body itself. Therefore, before these T-cells mature, they are censored in such a way that all T-cells that bind to self components (cells, macromolecules, etc.) are destroyed, and only the T-cells that pass the censoring process are allowed to leave the thymus and circulate in the body to perform immunological functions.

The NSA proposed by Forrest et al. [Forrest et al. 1994] generates detectors using a similar strategy: detectors are generated at random and censored against a given set of self patterns. All detectors that detect any self pattern are removed. This results in a collection of detectors that potentially detect any patterns except the self patterns. The effectiveness of the resulting detector set, in terms of detection of nonself patterns, depends on the size of the detector set: the larger the detector set, the higher the probability that a given nonself pattern will be detected. The algorithm is more formally described in [Esponda et al. 2004] from an anomaly detection perspective:

1. Define self, RS (real self), as a set of patterns of fixed length l, of which only a subset, S, is available for learning. The set of all l-length patterns is referred to as U (the universe). The task of anomaly detection is to distinguish the patterns in RS from the anomalous patterns in U − RS.

2. Candidate detectors are randomly generated and censored against S; surviving detectors are negative detectors, each covering a subset of U − RS. Good coverage of U − RS may be obtained if a sufficient number of negative detectors is generated.

2.2.3.1 NSA Defined Over Hamming Shape-Space

The NSA does not specify the implementation in terms of pattern representation and matching rule. The NSA proposed by Forrest [Forrest et al. 1994] employed binary strings to represent patterns and detectors, and an r-contiguous-bit matching rule was
Theoretical analysis on the NSA shows that, with binary-coded detectors and r-contiguous-bit matching rule, for certain probability of detection, the number of candidate detectors required for censoring to generate certain number of valid detectors, is exponential in the size of self [D?haeseleer et al. 1996, Ayara et al. 2002]. This causes a scalability problem due to the time complexity that is proportional to the number of candidate detectors and the size of self. The time complexity problem of the NSA motivated the proposal of other algorithms, including the greedy algorithm [D?haeseleer et al. 1996], linear time algorithm [D?haeseleer et al. 1996], binary template algorithm [Wierzchon 2000] and negative selection with mutation [Ayara et al. 2002], all based on binary string representation. Singh [Singh 2002] proposes an extension of the greedy algorithm for a higher alphabet size. These algorithms have time complexities that are linear in the size of the self, but still suffer from a time complexity that is exponential in the r value (the length of shortest substrings that define matching) and require space complexities that are also exponential in the r value [Ayara et al. 2002]. While high space complexity is not desirable for network intrusion detection applications, a significant drawback of these algorithms is that they are only applicable to r-contiguous-bit matching rules. 29 Nevertheless, this representation has been adopted by Hofmeyr and Forrest [Hofmeyr and Forrest 2000] to implement LISYS, an AIS-based network intrusion detection system (IDS). Inspired by an earlier anomaly detection-based network security system [Mukherjee et al. 1994], Hofmeyr and Forrest defined self patterns as normally occurring TCP/IP connections that are abstracted as data path triples, including the IP addressees of the two hosts, the port number on the server, and the direction of the connection. A data path triple can be represented as a 49-bit sting. With an extended NSA that generates and censors detectors in an online environment with dynamic self, LISYS obtained satisfying results with simulated network traffic. Kim and Bentley [Kim and Bentley 2001a] argue that in order to detect intrusions in real network traffic, a more complicated representation of self patterns is required. For example, the connection vector described in [Mukherjee et al. 1994] contains 15 fields. They used a pattern representation with higher dimension (33 fields) and higher cardinality alphabet (0 - 9) and reported extreme difficulty in generating detectors using the NSA due to the serious scaling problem. In their experiments, for 20 minutes worth of network traffic data, 20 hours of training time was needed to generate 1000 valid detectors, and these detectors could only detect less than 16% of the nonself patterns. This lead to their conclusion that the NSA cannot handle real network traffic, and they suggested that this algorithm should only work as a filter for invalid detectors. Balthrop and his colleagues [Balthrop et al. 2002a] argue that the difficulty encountered by Kim 30 and Bentley in [Kim and Bentley 2001a] was caused by a bad choice of pattern representation (large alphabet) and improper settings of parameters (r value). They also point out that the NSA and matching rule are separable pieces and that a bad choice of representation and matching rule may result in poor performance of the NSA. In another paper, Balthrop and his colleagues [Balthrop et al. 
2002b] introduce a variant of the r-contiguous-bit matching rule termed r-chunk, where detectors are defined by r contiguous bits while the other bits are simply considered "do not care". A pattern matches an r-chunk detector if they coincide at the defined r contiguous positions. r-chunk matching subsumes r-contiguous-bit matching in the sense that any r-contiguous-bit matching detector can be represented as a set of r-chunk matching detectors. They have shown that the r-chunk matching rule may improve the performance of the NSA. A formal analysis of the r-chunk matching rule is given by Esponda et al. in [Esponda et al. 2004], where they provide formulas for calculating the total number of generable detectors and the number of holes for which no valid detector can be generated, given r, the pattern length l, and |S|, the size of the self set used for training. Stibor and his colleagues [Stibor et al. 2004] extend r-chunking to higher alphabets and propose an algorithm for efficiently generating r-chunk detectors over arbitrary alphabet sizes. Their analysis of this algorithm shows that the r-chunk matching rule is not suitable for an NSA with a higher alphabet size, since the alphabet size has a strong negative influence on the total number of generable detectors.

In a subsequent study, Stibor and his colleagues [Stibor et al. 2005a] investigate the appropriateness of applying the NSA to network intrusion detection. In this work, empirical and theoretical analysis shows that the number of holes increases exponentially if the r-chunk length r is not close to the pattern length l. However, for all the detector generating algorithms mentioned above, both the time and space complexities increase exponentially in r, which means that the NSA will be inefficient for large values of r. On the other hand, for a network intrusion detection application, complicated patterns are required to characterize the network connections and packets, which may take 30 bytes [Mukherjee et al. 1994]. Combining these two factors, Stibor et al. conclude that the NSA is not appropriate for network intrusion detection applications.

Fortunately, the conclusion drawn in [Stibor et al. 2005a] is restricted to the NSA defined over Hamming Shape-Space. Hamming Shape-Space defines a universe that consists of all strings of fixed length over a finite alphabet. This includes all the representations mentioned above. However, other options exist, for example, real-coded vectors.

2.2.3.2 Real-Valued NSA

Dasgupta and Gonzalez [Dasgupta and Gonzalez 2002] propose a real-valued representation to characterize the self space and generate detectors to cover the complement of the self space. In their approach, the pattern space is normalized into an n-dimensional unitary space, [0, 1]^n. Self space is represented as hyperspheres, each defined by a center and a radius. Detectors are represented as hypercubes, defined by lower and upper values on each dimension. Good detectors should have large coverage and small overlap with the self space. They propose using a genetic algorithm to design such detectors. Gonzalez and Dasgupta [Gonzalez et al. 2002] propose a Real-Valued Negative Selection (RNS) algorithm defined over an n-dimensional space, [0, 1]^n. In this algorithm, a pattern is simply an n-dimensional vector that defines its location in the feature space, while detectors are represented as hyperspheres, each defined by an n-dimensional vector that corresponds to its center and by a real value that represents its radius.
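Before turning to how these hypersphere detectors are matched and generated (described next), a minimal sketch may help make the representation concrete. It assumes the Euclidean-distance matching rule and a simple variable radius in the style of the V-Detector approach; it is not the published RNS algorithm, which instead relocates candidate detectors and measures the distance to self by the median distance to the k nearest self points. All names, the toy self points, and parameter values are illustrative assumptions.

```python
import math
import random

def matches(pattern, center, radius):
    # A pattern is classified as nonself if it falls inside the detector hypersphere.
    return math.dist(pattern, center) < radius

def generate_detectors(self_points, self_radius, n_detectors, dim, seed=0):
    # Simplified accept/reject generation: a random center is kept only if it lies
    # outside every self hypersphere; its radius extends to the nearest self region.
    rng = random.Random(seed)
    detectors = []
    while len(detectors) < n_detectors:
        center = [rng.random() for _ in range(dim)]
        d_self = min(math.dist(center, s) for s in self_points)
        if d_self > self_radius:
            detectors.append((center, d_self - self_radius))
    return detectors

# toy example in [0, 1]^2
self_points = [[0.5, 0.5], [0.52, 0.48], [0.45, 0.55]]
detectors = generate_detectors(self_points, self_radius=0.1, n_detectors=20, dim=2)
print(any(matches([0.9, 0.1], c, r) for c, r in detectors))  # likely flagged as nonself
```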
Pattern-detector matching is defined by the Euclidean distance between the pattern's location and the center of the detector, together with the radius of the detector: if the distance is less than the radius, the pattern is matched by the detector and classified as nonself. Detectors are randomly generated and relocated so that they do not match self, and the distance between a detector and self is defined by the median distance to the k nearest self points. A later version of RNS [Gonzalez et al. 2003] also defines a self radius for every self point in order to allow nearby points to be considered self. Ji and Dasgupta [Ji and Dasgupta 2004a] propose a real-valued NSA with variable-sized detectors (termed V-Detectors). This algorithm randomly generates a detector center that lies outside the self hyperspheres and then continuously increases the radius of the detector until it contacts the self region. The algorithm terminates when a predefined number of detectors has been generated or a predefined nonself coverage, estimated statistically, has been reached [Ji and Dasgupta 2005].

The real-valued NSA was proposed to avoid the scalability problem that arises when the algorithm is defined over Hamming shape-space. It does not apply negative selection to censor invalid detectors. If termination is decided by the number of detectors, the complexity is linear in the size of the self set used for training [Stibor et al. 2005b]; the complexity is not easily computed, however, if the coverage threshold is used as the termination criterion. Because each detector can have a different volume of coverage, real-valued detectors may leave fewer holes than binary-coded detectors; however, this also makes it difficult to conduct a formal analysis of the algorithm as was done in [Esponda et al. 2004, Forrest et al. 1994]. Therefore, it is difficult to predict the performance of the algorithm on a given data set. Although good results have been obtained with the real-valued NSA on some data sets [Gonzalez et al. 2002, Gonzalez et al. 2003, Ji and Dasgupta 2004a], very poor results have also been reported [Stibor et al. 2005b].

One of the most important parameters of a real-valued NSA is the radius of the self hyperspheres, which affects the performance of the algorithm by affecting the setting of the radii of the detectors. In the aforementioned work on the real-valued NSA, this parameter is chosen arbitrarily [Gonzalez et al. 2003, Ji and Dasgupta 2004a]; however, Stibor et al. [Stibor et al. 2005b] argue that a proper self radius can only be determined when anomalous samples are available. While it is usually the case that both normal and abnormal samples are available for classification problems, abnormal samples are usually not available for anomaly detection-based network intrusion detection, and even if they are available, intrusion detection systems that depend on abnormal samples do not truly perform anomaly detection but rather signature-based detection. Since all the previously mentioned work was conducted with low-dimensional data sets, Stibor et al. [Stibor et al. 2005b] argue that it is not clear how the real-valued NSA performs in high-dimensional spaces. It should also be noted that the real-valued NSA treats each dimension with equal weight; that is, each of the n values from all fields contributes equally to the final self/nonself decision. Whether this is a good idea needs further investigation, since intuition suggests that certain fields may well deserve more weight than others in the decision process. The idea of weighing fields differently is weakly and implicitly included in r-contiguous-bit
The idea of weighting fields differently is weakly and implicitly present in the r-contiguous-bit matching rule, in the sense that not all fields are considered for matching.

2.2.3.3 A Third Representation

Harmer and Lamont's CDIS [Harmer et al. 2002] also employs the NSA to generate detectors for virus detection and network intrusion detection. Binary strings of different sizes, up to 128 bits, were tested for virus detection; for intrusion detection, 320-bit strings are used to represent network packets comprising 29 data fields. A variety of matching rules based on similarity measures were investigated in the search for a good matching rule. The Rogers and Tanimoto similarity measure [Rogers and Tanimoto 1960] was identified as the best pattern matching rule, providing a good compromise between the specificity and generality of the detectors. This similarity measure is calculated as follows [Harmer et al. 2002]. Let X, Y ∈ {0, 1}^n and define, over the positions i = 1..n:

a = the number of positions where X_i = 1 and Y_i = 1;
b = the number of positions where X_i = 1 and Y_i = 0;
c = the number of positions where X_i = 0 and Y_i = 1;
d = the number of positions where X_i = 0 and Y_i = 0.

The similarity between X and Y is then (a + d) / (a + d + 2(b + c)). If the similarity is above a predefined threshold, X and Y match. The pattern representation is exactly the same as that of Hamming Shape-Space; however, the matching rule places this representation in a third category. While the prototype successfully detects nonself strings, it also exhibits low efficiency: the extremely long negative selection time needed to generate detectors makes it infeasible for real-world application.

2.2.4 AIS-Based IDSs

Because of the functional similarity between the immune system and network intrusion detection systems, significant research [Hofmeyr et al. 1998, Hofmeyr 1999, Hofmeyr and Forrest 2000, Kim and Bentley 2001b, Dasgupta and Gonzalez 2002, Harmer et al. 2002] has been done in the past decade in an effort to build immunity-based IDSs.

2.2.4.1 The Need for Anomaly Detection-Based IDSs

With the world becoming increasingly interconnected by computer networks, IDSs have been playing an important role in the automatic response to a wide variety of network-borne security problems [Anderson et al. 1995, Kabay 2001]. Currently, most commercial IDSs, such as Cisco Secure 2, Dragon 3, NFR 4 and Snort 5, are signature-based (rule-based) systems. These systems operate by collecting and scanning for signatures of known viruses and attacks. However, this "reactive" approach leaves the network vulnerable to novel attacks, and keeping the growing set of signatures up to date quickly becomes a burden for system administrators [Leon 2000]. This situation merits the development of "proactive" systems that are capable of anomaly detection [Hofmeyr and Forrest 2000, Tanase 2002]. In anomaly detection, a model (or profile) of the normal behavior of the system is built, and any discrepancy from the model is deemed an abnormality. This approach is based on the assumption that the normal behavior of a system exhibits stable patterns over time.

2 Cisco Secure, http://www.cisco.com/warp/public/cc/pd/sqsw/sqidsz/.
3 Enterasys Dragon, http://www.enterasys.com/products/ids/.
4 NFR, http://www.nfr.com/solutions/.
5 Snort, http://www.snort.org.
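As a concrete illustration of the similarity measure quoted in Section 2.2.3.3 above, the following minimal sketch computes the Rogers and Tanimoto similarity between two binary strings and applies a matching threshold; the 8-bit strings and the threshold value of 0.5 are illustrative and are not taken from CDIS.

```python
def rogers_tanimoto_similarity(x, y):
    """Rogers-Tanimoto similarity between two equal-length binary strings."""
    assert len(x) == len(y)
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return (a + d) / (a + d + 2 * (b + c))

def matches(detector, pattern, threshold):
    """A detector matches a pattern if the similarity exceeds the threshold."""
    return rogers_tanimoto_similarity(detector, pattern) > threshold

# Example: two 8-bit strings (a = 3, b = 1, c = 1, d = 3)
x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 0, 0, 1, 1, 0]
print(rogers_tanimoto_similarity(x, y))      # 0.6
print(matches(x, y, threshold=0.5))          # True at this illustrative threshold
```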
Any observable system behavior can be used to build a model, such as sequences of system calls [Forrest et al. 1996, Hofmeyr et al. 1998], network traffic density [Dasgupta and Gonzalez 2002], and network packet characteristics [Heberlein et al. 1990, Hofmeyr and Forrest 2000, Harmer et al. 2002]. 2.2.4.2 Overview of Immunity-Based Intrusion Detection In the context of anomaly detection, network intrusion detection can be viewed as the problem of classifying network traffic as normal or abnormal, which is comparable to the task of the natural immune system (NIS) 6 . The IS protects the body by detecting and eliminating foreign (nonself) substances such as viruses, bacteria, fungi, and protozoan parasites [Dasgupta 1999]. The similarity between the problem of network security and that faced by the IS has inspired computer scientists to develop artificial immune systems (AISs) for computer security [Forrest et al. 1994, Forrest et al. 1996, Dasgupta 1999, Hofmeyr and Forrest 2000, Esponda et al. 2004]. The research in the area of immunity-based IDSs was pioneered by Forrest and her group in 1994 when thy proposed a change detection algorithm inspired by the self/nonself discrimination in the NIS [Forrest et al. 1994]. The NIS performs anomaly 6 An alternative theory, referred to as Danger Theory, argues that the immune system responds to danger signals rather than non-self substances. Application of Danger Theory to AISs is a new research branch in the AIS community [15]. 38 detection using a negative detection approach [Janeway and Travers 2001, Esponda et al. 2004]: it produces censored detectors (lymphocytes and antibodies) that potentially bind to everything except self substances, and everything bound by detectors is subject to elimination from the body [Forrest et al. 1997]. A typical AIS-based IDS tries to mimic the natural IS by evolving a population of negative detectors [Hofmeyr and Forrest 2000, Dasgupta and Gonzalez 2002, Harmer et al. 2002, Hofmeyr 1999, Hofmeyr and Forrest 2000, Hou et al. 2002, Hou and Dozier 2004, Kim and Bentley 2001b, Dozier et al. 2004a]: network traffic that is matched by the detectors is classified as abnormal (nonself) and unmatched traffic is classified as normal (self). In [Forrest et al. 1997], Forrest provides an excellent review of the immunological principles that are of interest to the design of AISs. Detectors of AISs are typically generated using the Negative Selection Algorithm (NSA) [Forrest et al. 1994, Hofmeyr 1999]. In the NSA, candidate detectors are randomly generated, and normal samples are presented to the candidate detectors. The detectors matching any normal samples are replaced by another randomly generated candidate detector. Detectors that do not match any normal samples are considered valid detectors. This process continues until a desired number of valid detectors are generated. This method is efficient in that it employs a relatively small set of detectors that has a high probability of detecting anomalies in the protected system. This is because the detectors have highly variant matching specificities. The assumption behind the 39 algorithm is that the normal behavior of a system exhibits stable and observable patterns over time, and any perturbation or malfunction in the system will show patterns that are significantly different from the ones observed during the system?s normal state. 2.2.4.3 Prototypes of AIS-Based IDSs In this section, we review some previous research in the field of AIS-based IDSs. 
We then summarize the advantages and disadvantages of using AIS-based IDSs. More details will be devoted to Hofmeyr?s LISYS since the AIS in this dissertation is largely inspired by this system. The structure and implementation of our AIS is given in the next section. LISYS is an AIS-based IDS [Hofmeyr and Forrest 2000]. It attempts to classify network traffic as either normal (self) or abnormal (nonself). With LISYS, the observable behavior of a network system is modeled using data path triples (DPTs) that represent a TCP/IP connection: the IP addresses of the two hosts involved in the connection, and the service (port) number of the connection. The presumption is that abnormal activities on the network will produce abnormal DPTs that can be distinguished from the DPTs produced from normal activities. The normal DPTs are used to model the normal behavior of the network system, while detectors that detect abnormal behavior are generated in the complement space. Detectors and DPTs are implemented as 49-bit strings and an r-contiguous-bit matching rule is used to define the match between a 40 detector and a DPT. With this rule, two strings match if they have identical bits in at least r contiguous positions. To generate detectors that are adaptable to the normal behavior of the monitored system, LISYS maintains and evolves a population of detectors using an extended negative selection algorithm. The evolutionary process of these detectors is as follows. Initially, for each host, a randomly generated set of immature detectors is created. These detectors are exposed to normal network traffic (self) for a user-specified amount of time, termed as tolerisation period. During this period, if an immature detector matches a self packet then the detector dies and is regenerated. Immature detectors that have survived the process become mature detectors. Thus, mature detectors will typically match nonself packets. A mature detector has a user-specified lifespan, t mature , to match m mature (termed as match threshold) packets. This time represents the learning phase of a detector. Mature detectors that fail to match m mature packets during their learning phase die and are regenerated as immature detectors. When a mature detector matches m mature packets during its learning phase, a primary response (alarm) is invoked. The match threshold is set to reduce false positives (false alarms), and it is based on the assumptions that nonself packets tend to occur in temporal clumps. 41 Once a mature detector sounds an alarm, it awaits a response from the network administrator to verify that the detector has identified a valid attack on the system. This response is referred to as co-stimulation [Hofmeyr and Forrest 2000]. If co-stimulation does not occur within a prescribed amount of time the mature detector will die and be regenerated as an immature detector. Those detectors that receive co-stimulation are promoted to being memory detectors and are assigned a longer life time of t memory . Co- stimulation prevents mature detectors from becoming memory detectors by matching self packets. This process models the activation mechanism of mature T-cells, which, besides binding antigen, need a second signal that indicates the occurrence of body damage [Hofmeyr and Forrest 2000]. Memory detectors invoke stronger (secondary) responses when they match at least m memory nonself packets (where usually m mature > m memory ). 
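The following is a minimal, highly simplified sketch of the detector life cycle just described, assuming discrete time steps and treating tolerization, activation, and co-stimulation as simple counters; the class name and the parameter names (tolerization_period, t_mature, m_mature, costim_window, and so on) are illustrative stand-ins for the LISYS parameters, not LISYS itself.

```python
class Detector:
    """Simplified LISYS-style detector life cycle: immature -> mature -> memory."""

    def __init__(self, matches, params):
        self.matches = matches          # function: packet -> bool
        self.p = params                 # dict of life-cycle parameters
        self.state = "immature"
        self.age = 0
        self.match_count = 0
        self.await_costim = 0

    def step(self, packet, is_self_training=False, costimulated=False):
        """Advance one time step; returns 'alarm', 'die', or None."""
        self.age += 1

        if self.state == "immature":
            # Dies (to be regenerated) if it matches self during tolerization.
            if is_self_training and self.matches(packet):
                return "die"
            if self.age >= self.p["tolerization_period"]:
                self.state, self.age, self.match_count = "mature", 0, 0

        elif self.state == "mature":
            if self.matches(packet):
                self.match_count += 1
            if self.match_count >= self.p["m_mature"]:
                self.await_costim += 1            # primary alarm raised, waiting for the administrator
                if costimulated:
                    self.state = "memory"         # promoted: longer lifetime, lower threshold
                    self.age, self.match_count = 0, 0
                elif self.await_costim > self.p["costim_window"]:
                    return "die"                  # no co-stimulation: die and be regenerated
                return "alarm"
            if self.age >= self.p["t_mature"]:
                return "die"                      # failed to activate during its learning phase

        elif self.state == "memory":
            if self.matches(packet):
                self.match_count += 1
                if self.match_count >= self.p["m_memory"]:
                    return "alarm"                # secondary response
            if self.age >= self.p["t_memory"]:
                return "die"
        return None
```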
The different finite lifetimes and the dynamics of the three types of detectors renders a rolling coverage of the detector set, which makes LISYS an adaptive system that constantly adjusts its matching behavior according to the current normal behavior of the system. LISYS also uses several other mechanisms to improve the system effectiveness. For example, to reduce holes (nonself for which detectors cannot be generated [D?haeseleer et al. 1996]) in the system, LISYS uses multiple representations (permutation masks); to improve the detection rate on coordinated distributed attack, 42 LISYS employs distributed detectors and adjusts match threshold using sensitivity level and decay rate. Many other researchers also have designed immunity-based IDSs using novel algorithms and techniques [Kim and Bentley 2001b, Dasgupta and Gonzalez 2002, Harmer et al. 2002]. Kim and Bentley [Kim and Bentley 2001b] introduced the dynamic clonal selection algorithm (DynamiCS) to deal with dynamic self in a continuously changing environment. DynamiCS is based the dynamics of three types of detectors: immature, mature and memory detectors, which was originally introduced in Hofmeyr?s LISYS. In order to reduce the false positive rate, many other mechanisms introduced in LISYS were also implemented in DynamiCS, such as detector tolerisation, activation threshold, and costimulation. DynamiCS is designed to learn by examining a small self sample set at a period of time and replacing detectors that do not represent the current normal behavior. To adapt to the ever-changing normal behavior of a system, DynamiCS introduced the concept of a gene library evolution using hyper-mutation, where immature detectors are generated through mutation on the deleted memory detectors, and this helps the AIS to be able to detect nonself at a satisfactory level without losing self-tolerance. Dasgupta and Gonzalez [Dasgupta and Gonzalez 2002] present an immunity- based technique that can be used to build a system which generates negative detectors using a genetic algorithm. The detectors can differentiate varying degrees of abnormality 43 in network traffic. The self and nonself space is viewed as an n-dimension space, with each feature of the monitored behavior being a dimension. Normality is defined by the coverage of multiple hyperspheres with a center at each normal sample and a radius v that represents allowed variability for generalization. Abnormality in nonself space is viewed as deviation from the normality in self space. In their implementation, the detectors are real-valued and in the form of hyper-cubic rules: for each feature of the monitored behavior, a low/high boundary is used to define the nonself space in the corresponding dimension. This gives a crisp nonself characterization function. To achieve a non-crisp characterization function, different variability values are used to represent results in a collection of detectors hierarchically organized according to the v values, and unknown samples can be characterized as the level of deviation from the normality. Dasgupta and Gonzalez also proposed the use of a genetic algorithm that evolves hyper-cubic rules that have large coverage of nonself space and small overlapping with other rules. Harmer and Lamont?s computer defense immune system (CDIS) [Harmer et al. 2002] is a distributed agent-based system built upon a hierarchical layered architecture, capable of both virus and intrusion detection. 
In this system, network intrusion detectors are built in the form of 320-bit signature strings to monitor packets of three common protocols: TCP, UDP and ICMP. For TCP packets, 29 data fields comprise a detector; for UDP and ICMP, 17 fields are used. Each field takes a value with a range from 1 to 32 bits, and 28 extra bits are used to define whether a particular field is considered valid or 44 not (first field that defines the protocol type is considered always valid). They proposed that the Rogers and Tanimoto Similarity Measure [Rogers and Tanimoto 1960] was the best pattern matching rule that provides a good compromise between the specificity and generality of the detectors. The detectors are randomly generated and censored using the negative selection algorithm before they are used to monitor the traffic packets. The detectors have a life cycle that is similar to the one proposed by Hofmeyr. When any detector matches a pattern, costimulation from a human expert is required before the detector signals other agent effectors and obtains an extended lifetime. This makes CDIS an interactive system. There are a number of advantages offered by AIS-based IDSs as summarized in [Hofmeyr 1999, Hofmeyr and Forrest 2000]. AISs designed using immunological principles are robust because they perform distributed and parallel protection, which also makes them error-tolerant. They are self-organized and provide diverse protection in the form of anomaly detection as well as signature-based detection. These systems usually can be tunable to balance between the required resource and effectiveness and between the false positive and false negative rates. One major advantage is that they provide a form of passively proactive protection via negative selection. This enables them to detect novel attacks. Another major advantage is that AISs are capable of adapting to dynamically changing environments. As the characteristics of self traffic change over time, a properly tuned AIS will be able to effectively adapt to the dynamically changing 45 definition of self. Finally, the detectors evolved by AIS-based IDSs can easily be converted into Snort rules or TCPdump filters. This is especially true when constraint- based detectors are used [Hou et al. 2002, Hou and Dozier 2006a]. These constraint- based detectors are easy to understand and can provide network administrators with important forensic information when new attacks are detected. A significant problem with AIS-based IDSs such as LISYS is scalability [Kim and Bentley 2001a, Stibor et al. 2005b]. It was observed that for systems that are based on the negative selection algorithm (NSA), to maintain certain detection level, the computation time needed to generate detectors increases exponentially with the size of the self set [D?haeseleer et al. 1996, Kim and Bentley 2001a]. Some efficient algorithms have been proposed to alleviate the problem, such as the Linear Time Detector Generating Algorithm and the Greedy Detector Generating Algorithm [D?haeseleer et al. 1996, Ayara et al. 2002]. These algorithms are designed for generating binary-coded detectors that employ an r-contiguous-bit matching rule, and they cannot be applied to other detector representations and matching rules. Although these algorithms have a linear time complexity in the size of the self set, they require time and space exponential in r, the length of continuous bits that decide a match. 
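To make the cost discussion above concrete, here is a minimal sketch of brute-force negative selection over binary strings with the r-contiguous-bits matching rule; the string length, the value of r, and the toy self set are illustrative, and real generators such as the linear-time and greedy algorithms avoid this naive candidate loop.

```python
import random

def rcb_match(a, b, r):
    """r-contiguous-bits rule: two strings match if they agree in at least r contiguous positions."""
    run = best = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best >= r

def negative_selection(self_set, n_detectors, length, r, rng=random.Random(0)):
    """Randomly generate candidates and keep only those that match no self string."""
    detectors = []
    while len(detectors) < n_detectors:
        cand = tuple(rng.randint(0, 1) for _ in range(length))
        if not any(rcb_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

# Toy example: 12-bit strings, r = 6, five random self strings
rng = random.Random(42)
self_set = [tuple(rng.randint(0, 1) for _ in range(12)) for _ in range(5)]
detectors = negative_selection(self_set, n_detectors=3, length=12, r=6)
nonself = tuple(1 - b for b in self_set[0])
# True if some detector covers this string; False would indicate a hole for it.
print(any(rcb_match(d, nonself, 6) for d in detectors))
```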
Recently, the appropriateness of applying the NSA to network intrusion detection has been critically examined with respect to both efficiency [Kim and Bentley 2001a, Stibor et al. 2004, Stibor et al. 2005a, Stibor et al. 2005b] and effectiveness [Kim and Bentley 2001a, Stibor et al. 2004, Stibor et al. 2005a, Stibor et al. 2005b, Stibor et al. 2005c]. These studies suggest that, if defined over certain representations, AISs based on the NSA alone cannot be applied effectively and efficiently to network intrusion detection. In this work, we use constraint-based detectors, which are real-coded and have varying-sized coverage, to build compact AISs without sacrificing detection rate.

Another major disadvantage of AIS-based IDSs is that, at present, there is no way to know exactly what types of attacks will pass through undetected. A number of techniques have been developed to reduce the size of potential holes. Hofmeyr and Forrest [Hofmeyr and Forrest 2000] use a permutation mask to introduce diversity into the pattern representation and thereby reduce the number of undetectable patterns, under the assumption that the union of different representations will detect more patterns than any single representation. Balthrop et al. [Balthrop et al. 2002b] propose the concepts of r-chunks and crossover closure to achieve greater detector coverage. Dasgupta and Gonzalez [Dasgupta and Gonzalez 2002] use multi-level detectors representing different deviation levels in an effort to model the non-crisp boundary between self and nonself. Nevertheless, none of these techniques can alert the designer or users to the types of attacks that will go undetected by the AIS.

CHAPTER 3
A CONSTRAINT-BASED AIS

This chapter presents a simple AIS that is inspired by the self-nonself recognition mechanism of the immune system. It employs the Negative Selection Algorithm proposed by Forrest et al. [Forrest et al. 1994] to generate negative detectors that can detect anomalies deviating from the normal behavior or patterns of the system being monitored. This AIS is a simplified version of Hofmeyr's LISYS [Hofmeyr and Forrest 2000] with a novel detector representation, which we refer to as constraint-based detectors [Hou et al. 2002, Hou and Dozier 2004]. Timmis and his colleagues proposed that the structure of an AIS can be described with three components [De Castro and Timmis 2002a, Stepney et al. 2004]: the representation of the samples in the self/nonself space, the representation of the detectors, and the matching rule that is used to decide the match between a sample and a detector. We present the constraint-based AIS in terms of these three aspects.

3.1 Pattern Representation

Our patterns are represented as vectors of integers, (x_0, x_1, x_2, ..., x_n), where n is the number of fields included in the pattern representation and x_i is the value of the corresponding field. The cardinality (and range) of each field can be decided by the actual values of that field in the data set; for continuous values it can be decided after discretization, as performed in [Kim and Bentley 2001a]. This representation makes the pattern space similar to one defined over Hamming shape-space with higher alphabet cardinality, as described in the previous section; however, the detector representation and matching methods make it entirely different.

3.2 Detector Representation

Our detectors are implemented as vectors of intervals, (lb_0..ub_0, lb_1..ub_1, lb_2..ub_2, ...,
lb_n..ub_n), where each interval corresponds to a field included in the pattern representation and covers the values that fall within it [Hou et al. 2002, Hou and Dozier 2004]. We have designed two types of interval with different coverage shapes. The first is termed a linear interval, in which the lower bound is less than or equal to the upper bound and the coverage runs strictly from the lower bound to the upper bound. The second is termed a circular interval, in which the interval can wrap around so that the coverage is continuous; in this case, the lower bound need not be less than the upper bound. For example, over an alphabet of cardinality n (values 0 to n-1), a circular interval of 4..1 actually covers two sections: 4..(n-1) and 0..1. Notice that circular intervals subsume linear intervals.

These two types of interval are designed for data with and without "polarity". We use the term "polarity" to describe the distribution of samples in pattern space. For any field, the maximum and minimum values of that field can be regarded as two poles; if values of one characteristic tend to occur in the vicinity of one pole while values of another characteristic occur in the vicinity of the other pole, we say that this field has polarity. For example, there are 9 fields in the Wisconsin Breast Cancer Data [39], and each field takes 10 values from 1 to 10, with higher values indicating a higher likelihood of the presence of cancer and lower values indicating a higher likelihood of its absence. Obviously, this data set has polarity. For data sets with polarity, it may be a good choice to employ linear intervals; for data sets where a clear polarity is not obvious, it may be more reasonable to employ circular intervals.

Notice that detectors in this form can be viewed as constraints: a detector population can be viewed as a population of constraints where self corresponds to the set of all solutions that satisfy the constraints and nonself corresponds to the set of all solutions that violate at least one constraint. We thus refer to this type of detector as a constraint-based detector.

3.3 Matching Rules

We have designed two matching rules for this representation to determine a match between a pattern and a constraint-based detector. The any-r-intervals matching rule is similar to rules based on Hamming distance [Hou et al. 2002, Hou and Dozier 2004]: if any r values of a pattern fall within the corresponding r intervals of a detector, then that detector is said to match the pattern. For example, given an alphabet of 0-9 with cardinality 10, the pattern (4, 4, 0, 2, 2, 0) is matched by the circular detector (8..3, 1..1, 4..2, 1..3, 7..8, 0..9) if r is not greater than 3. The setting of r defines the specificity or generality of the match between a detector and a pattern. The r-contiguous-intervals matching rule is similar to the r-contiguous-bits (rcb) rule: if at least r contiguous values of a pattern fall within the corresponding r contiguous intervals of a detector, then that detector is said to match the pattern. In both matching rules, the value r works as a matching threshold, defining the specificity or generality of the detectors: the higher the r value, the more specific the match between detectors and patterns, and the fewer patterns a detector can match.
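The following is a minimal sketch of circular intervals and the two matching rules, reproducing the worked example above (alphabet 0-9, pattern (4, 4, 0, 2, 2, 0), detector (8..3, 1..1, 4..2, 1..3, 7..8, 0..9)); the function names are illustrative.

```python
def in_circular_interval(value, lb, ub):
    """True if value falls in the interval lb..ub, which may wrap around the alphabet."""
    if lb <= ub:                       # linear interval lb..ub
        return lb <= value <= ub
    return value >= lb or value <= ub  # wrapped: lb..(cardinality-1) and 0..ub

def any_r_intervals_match(detector, pattern, r):
    """Detector matches pattern if at least r fields fall inside their intervals."""
    hits = sum(1 for (lb, ub), x in zip(detector, pattern)
               if in_circular_interval(x, lb, ub))
    return hits >= r

def r_contiguous_intervals_match(detector, pattern, r):
    """Variant: at least r contiguous fields must fall inside their intervals."""
    run = best = 0
    for (lb, ub), x in zip(detector, pattern):
        run = run + 1 if in_circular_interval(x, lb, ub) else 0
        best = max(best, run)
    return best >= r

pattern = (4, 4, 0, 2, 2, 0)
detector = [(8, 3), (1, 1), (4, 2), (1, 3), (7, 8), (0, 9)]
print(any_r_intervals_match(detector, pattern, r=3))         # True: fields 3, 4 and 6 are covered
print(any_r_intervals_match(detector, pattern, r=4))         # False
print(r_contiguous_intervals_match(detector, pattern, r=2))  # True: fields 3 and 4 are contiguous hits
```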
Our AIS is a real-coded implementation, which has a significant advantage over the more commonly used binary-coded implementations, as described in [Gonzalez et al. 2002]: "The real valued representation is closer to the problem space. In most cases, it is possible to map the detectors back to the original space, giving them a meaning in the problem domain context." In binary-coded implementations (where the r-contiguous-bit matching rule is common), when a detector matches some packet, it is difficult to analyze how the detector matched it and why the matched packet is abnormal.

Notice that, similar to real-valued detectors with varied radii, each constraint-based detector may have a different volume of coverage in pattern space, whereas detectors defined over Hamming shape-space all have exactly the same volume of coverage. The number of patterns a constraint-based detector can match is determined by the width of each of its intervals. Using detectors with variable coverage to model nonself space has advantages: it is more economical to use a few "large" detectors to cover large continuous regions of nonself space, while it is more convenient to use "small" detectors to cover small nonself regions surrounded by self space, where "normal-sized" detectors do not fit. This is exactly how real-valued detectors with variable coverage can avoid the "holes" introduced by binary-coded detectors with a constant size of coverage.

CHAPTER 4
GENERTIA: A SYSTEM FOR AIS VULNERABILITY ANALYSIS AND REDESIGN

4.1 Introduction

The mission of an AIS for anomaly detection can be thought of as learning to distinguish between self and nonself patterns. There are two types of possible errors associated with the anomaly detection process: self (normal) patterns erroneously detected as nonself (abnormal) patterns, and nonself (abnormal) patterns erroneously regarded as self (normal) patterns and therefore not detected by the system. In statistical terms, these two types of errors are referred to as false positives and false negatives, respectively [Marchette 2001]. Accordingly, the performance of such AISs can be evaluated with two measurements: the false negative rate and the false positive rate. The false negative rate is the proportion of nonself that has not been successfully detected by the system, and the false positive rate is the proportion of self erroneously detected by the system. The false negative rate can also be expressed as 1 minus the corresponding detection rate, which is the proportion of nonself successfully detected by the system.

False negatives are also referred to as vulnerabilities (holes) [Hofmeyr and Forrest 2000, Marchette 2001, Harmer et al. 2002, Dozier 2003], in the sense that attacks will go undetected through these holes (in the abnormal pattern space) because the system lacks detectors covering the corresponding regions of pattern space where these abnormal patterns exist. The existence of holes may cause nonself patterns to go undetected. As mentioned in Chapter 2, one of the major disadvantages of using an AIS is that there is no way to know exactly which nonself patterns will not be detected by the system. A number of techniques have been proposed to reduce the number of potential holes [Hofmeyr and Forrest 2000, Balthrop et al. 2002b, Dasgupta and Gonzalez 2002, Stibor et al. 2004]; however, the problem of possible holes still exists, and there is no way to ensure that a certain level of confidence has been achieved regarding the possibility of unknown nonself being detected [Hofmeyr 1999]. Dozier et al.
proposed a technique to improve the design of AIS-based intrusion detection systems by searching for vulnerabilities in an AIS [Dozier 2003, Dozier et al. 2004a-b]. Based on this fundamental work, we present a system that can be used to improve the performance of immunity-based systems through vulnerability analysis and AIS redesign. This system, which we refer to as GENERTIA (GENetic and EvolutionaRy Teams for Interactive design and Analysis), works by locating and patching the holes in an AIS.

The system consists of two subsystems, the GENERTIA Red Team (GRT) and the GENERTIA Blue Team (GBT). The GRT performs AIS vulnerability analysis, that is, it discovers vulnerabilities (holes) in the AIS, while the GBT redesigns the AIS by generating detectors that patch the vulnerabilities discovered by the GRT. The GRT and the GBT cooperate to recursively reduce the holes in the AIS. The GENERTIA approach may be more efficient, in terms of the additional coverage gained per added detector, than adding randomly generated detectors through the NSA, and it may also be more effective in terms of reducing the false positive rate.

4.2 GENERTIA Red Team

The objective of the GRT is to perform vulnerability analysis on the AIS. The concept of the GENERTIA red team was originally proposed by Dozier in [Dozier 2003, Dozier et al. 2004a-b], where genetic and particle swarm red teams were used to analyze the vulnerabilities in an AIS. In this dissertation, the GRT employs a steady-state Genetic Algorithm (GA) [DeJong and Spears 1993], although other EC paradigms such as Particle Swarm Optimization are also good candidates [Dozier et al. 2004b].

The fundamental idea of the red team is very simple: present the AIS with nonself patterns and let the AIS tell us the percentage of the detector set that fails to match these patterns. These nonself ("red") patterns can be randomly generated; however, applying a GA makes the discovery of vulnerabilities more efficient. Information about the discovered vulnerabilities can then be used by the AIS to heal ("redesign") itself. The GRT can be seen as a "white-hat" hacker agent: it finds vulnerabilities in AISs.

In order to effectively evolve a population of red patterns, the GRT needs the AIS to provide a fitness value for each candidate vulnerability. The fitness value of a red pattern can be measured as its closeness to a state in which it can avoid detection by the AIS. In this dissertation, it is simply calculated as the number of detectors in the system that fail to detect it; a red pattern is assigned a fitness of zero if it is part of self. Thus, a red pattern that avoids detection by all of the detectors (that is, one whose fitness value equals the size of the detector set in the current AIS) potentially represents a vulnerability in the system.

The red patterns are represented as vectors of integers, as described in Chapter 3. The initial red pattern population is randomly generated. For each GRT iteration, an offspring is created by selecting two parents, p1 and p2, from the population using binary tournament selection [Goldberg and Deb 1989] and mating the parents using uniform crossover 7 [Sywerda 1989]: each integer x_i in the offspring is randomly selected from either x_p1,i or x_p2,i, where x_p1,i and x_p2,i are the corresponding integer values of the two parents.

7 We chose uniform crossover for simplicity; other operators may improve the performance of the GRT.
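A minimal sketch of one GRT iteration as described in this section (binary tournament selection, uniform crossover, single-field mutation, and fitness equal to the number of detectors that fail to detect a red pattern, zero for self patterns) is given below; the callback parameters (detectors, is_self, matches, cardinalities) and the simplified population handling are illustrative, and the full algorithm with restarts is the one shown in Figure 4.1.

```python
import random

def grt_iteration(population, detectors, is_self, matches, cardinalities,
                  vulnerabilities, rng=random.Random(0)):
    """One steady-state GA step of the GENERTIA red team (simplified sketch)."""

    def fitness(p):
        # Number of detectors that fail to detect the red pattern; zero if p is self.
        return 0 if is_self(p) else sum(1 for d in detectors if not matches(d, p))

    def tournament():
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    p1, p2 = tournament(), tournament()
    # Uniform crossover: each field of the offspring is copied from one of the parents.
    child = [rng.choice(pair) for pair in zip(p1, p2)]
    # Mutation: one randomly chosen field is replaced by a random legal value.
    i = rng.randrange(len(child))
    child[i] = rng.randrange(cardinalities[i])
    child = tuple(child)

    if fitness(child) == len(detectors):
        # The offspring evades every detector: record it as a candidate vulnerability
        # rather than inserting it into the population.
        vulnerabilities.append(child)
    else:
        # Steady-state replacement: the offspring replaces the worst individual if fitter.
        worst = min(range(len(population)), key=lambda k: fitness(population[k]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
```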
The resulting offspring is then mutated by having one of its integer replaced by a random number. The worst individual in the population is replaced by the offspring if the offspring has a better fitness value. However, if the newly generated offspring has a fitness that is equals to the size of current detector set, it will not replace the worst individual. Instead, it will be added to a vulnerability set. This way, we can use population sizes that are smaller than the number of requested vulnerabilities. The GRT population is randomly regenerated if the GRT is not able to discover any vulnerability in a user-specified number of iterations. A counter records the number of unfruitful iterations during which no vulnerability is discovered, and this counter is reset when a vulnerability is discovered. This mechanism prevents the GRT from being trapped in an area that is too close to the self space or the coverage of existing detectors. The GRT iterations go on until a user-specified number of vulnerabilities have been discovered or it has exceeded a user-specified limit on the number of population restarts. Meeting the second stopping criteria suggests that the current AIS is of certain security level so that the GRT has difficulty to find vulnerabilities in it. The vulnerability set will later be used by the GBT to redesign the AIS. Figure 4.1 shows the GRT algorithm in the form of a flow chart. 57 Figure 4.1: The flow chart shows the overall GRT algorithm. 58 4.3 GENERTIA Blue Team As mentioned in the previous section, GENERTIA is a system used for AIS vulnerability analysis, design and redesign. It consists of two subsystems: the GRT and the GBT. The GRT is used to discover the holes in the AIS, and the GBT generates detectors that patch these holes, so the GRT and the GBT cooperate to strengthen an AIS. The objective of the GBT is to initially design an AIS and then redesign it to make it more effective. The GBT takes a set of vulnerabilities discovered by the GRT and creates an additional detector set that covers these vulnerabilities. This additional detector set is then combined with the current detector set (which will result in a larger set of detectors). The additional detector set is developed as follows. First, the GBT uses a steady- state GA to evolve a population of candidate detectors (CDs). These CDs are in the same representation as described in Chapter 3 and the initial population is randomly generated. Each CD is assigned a fitness based on the number of vulnerabilities it covers. If a CD matches a self pattern in the training set then it is assigned a fitness of zero. Each time a CD is created with a fitness that is greater than the current best fitness, a release threshold counter is reset to zero. The release threshold counter is incremented on each iteration for which no better CD is found. When the release threshold counter 59 gets above a user-specified number of iterations (where there is no improvement in the fitness of the best fit CD), the CD with the best fitness is then removed from the population and added to the additional detector set. Once this has been done, the release threshold counter is reset to zero and a randomly generated CD is used to replace the removed CD. This process of evolving, selecting the best fit CD, and adding it to the additional detector set is repeated until the additional detector set covers all of the vulnerabilities discovered by the GRT. At this point, the additional set of detectors is added to the detector population. 
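A minimal sketch of the blue-team loop just described is given below, with candidate-detector evolution abstracted into caller-supplied callbacks (random_cd, evolve_offspring, matches); the bookkeeping follows the release-threshold description above, the population size of 50 is the value used in this dissertation, and the population-regeneration safeguard described in the next paragraph is omitted for brevity.

```python
import random

def gbt_design(vulnerabilities, self_set, matches, random_cd, evolve_offspring,
               release_threshold, pop_size=50, rng=random.Random(0)):
    """Simplified GENERTIA blue team: evolve candidate detectors (CDs) and release the
    fittest CD into the additional detector set whenever no improvement has been seen
    for release_threshold iterations."""

    def fitness(cd):
        # Zero if the CD matches any training self pattern; otherwise the number
        # of not-yet-covered vulnerabilities that it covers.
        if any(matches(cd, s) for s in self_set):
            return 0
        return sum(1 for v in vulnerabilities if matches(cd, v))

    population = [random_cd(rng) for _ in range(pop_size)]
    additional, stagnant = [], 0

    while vulnerabilities:
        child = evolve_offspring(population, fitness, rng)  # tournament + crossover + mutation
        best_i = max(range(len(population)), key=lambda k: fitness(population[k]))
        if fitness(child) > fitness(population[best_i]):
            # Improvement: steady-state replacement of the worst CD and counter reset.
            worst_i = min(range(len(population)), key=lambda k: fitness(population[k]))
            population[worst_i] = child
            stagnant = 0
        else:
            stagnant += 1

        if stagnant >= release_threshold:
            best_i = max(range(len(population)), key=lambda k: fitness(population[k]))
            best = population[best_i]
            if fitness(best) > 0:
                additional.append(best)
                # Covered vulnerabilities are removed so later CDs add new coverage.
                vulnerabilities[:] = [v for v in vulnerabilities if not matches(best, v)]
            population[best_i] = random_cd(rng)   # released CD is replaced by a random CD
            stagnant = 0
    return additional
```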
To prevent the GBT from running indefinitely, the GBT population is randomly regenerated if the GBT cannot find any valid detector during a predefined number of iterations. The GBT is only given a limited number of opportunities for population regeneration, and when this limit is reached, the GBT stops and all the remaining vulnerabilities are discarded. The difficulty of the GBT finding valid detectors for the remaining vulnerabilities suggests that these vulnerabilities are too close to the self space so that they probably should be regarded as generalization of self. To reduce the detector set size, redundant detectors are not allowed. In order to accomplish this, when a CD is released, the vulnerabilities that are matched by this 60 detector are removed from the vulnerability set, so that the detectors released later will have different coverage. Figure 4.2 shows the GBT algorithm in the form of a flow chart. The steady-state GA used by the GBT uses binary tournament selection [Goldberg and Deb 1989] to select two parents and applies uniform crossover [Sywerda 1989] to create one offspring 8 : each integer interval in the offspring is randomly selected from the two corresponding integer intervals of the two parents. The offspring then has one of its intervals (randomly chosen) mutated by randomly selecting new lower and upper bounds for the interval. In the experiments in this dissertation, the population size of the GBT was arbitrarily set to 50, and for each vulnerability set, the GBT was allowed 10 population regenerations including the initial generation. 8 Again we used uniform crossover for simplicity; other operators may improve the performance of the GBT. 61 Figure 4.2: A flow chart shows the overall GBT algorithm. ?RT? stands for release threshold 62 CHAPTER 5 A GENERTIA-BASED INTRUSION DETECTION SYSTEM 5.1 Introduction Artificial Immune Systems have been employed to build systems for a broad range of applications [Dasgupta 1999], and because of the functional similarity between the immune system and network intrusion detection systems, significant researches [Hofmeyr et al.1998, Hofmeyr 1999, Hofmeyr and Forrest 2000, Kim and Bentley 2001b, Dasgupta and Gonzalez 2002, Harmer et al.2002] have been done to build AIS-based intrusion detection systems (IDSs). From an anomaly detection perspective, the functionality of an IDS is to classify the network traffic as either normal or abnormal. Abnormal traffic are potentially harmful to the network system or the hosts on the network. All the above mentioned AIS-based IDSs try to build a model (a detector set) of the network traffic for the protected network system by learning the normal patterns displayed by the system, and this model is later used to classify network packets as normal or abnormal. Obviously the performance of an AIS-based IDS depends on the learning process in which the AIS is trained and the model is built. As mentioned in the previous chapter, the hypothesis developed through any learning algorithm has two types of errors: false 63 positives and false negatives. For intrusion detection, false negatives happen when abnormal packets are wrongfully classified as normal, and the consequence is the protected system being compromised. No matter how well an IDS is designed and configured, vulnerabilities always exist and are subjected to exploit from hackers from all over the world. 
Vulnerabilities are usually discovered and patched after the protected system has been found compromised by hackers through these vulnerabilities in the IDS [Leon 2000]. So, to effectively reduce the probability of being compromised, an aggressively proactive IDS must seek to discover and patch the vulnerabilities before an attacker can exploit them. But how? In this chapter, we show that the GENERTIA system, proposed in Chapter 4, can be applied to strengthen an AIS-based IDS to reduce the number of vulnerabilities in the IDS. Figure 5.1 shows the architecture of GENERTIA and its interaction with an AIS- based IDS for host-based intrusion detection. The GBT is used to design an IDS based on input from the network manager. After a preliminary host-based IDS has been designed, the GRT performs a strength and vulnerability analysis of the IDS, based on the input specifications of the network manager. The GRT analysis results in information concerning the relative strength of detectors (Labeled as RSD in Figure 5.1) comprising the IDS as well as a list of vulnerabilities (holes in the IDS). This information is then 64 given to the GBT to be used to redesign the IDS. The GRT and the GBT work cooperatively in a co-evolution fashion [Potter and De Jong 1998] to make the AIS more effective. The preliminary results presented in this paper are based on a host-based IDS. However, these results can be generalized to a network where each host has an instance of GENERTIA running on it. Figure 5.1 The architecture of GENERTIA for AIS-based IDS 5.2 Experiments In this dissertation, the population size of the GRT was arbitrarily set to 50, and the vulnerability set size was arbitrarily set to 300, the GRT population regeneration 65 threshold was arbitrarily set to 5000, and the GRT was allowed 10 population regenerations including the initial generation. 5.2.1 Data Set The purpose of this experiment is to show that the performance of an AIS can be improved by GENERTIA design/redesign. The data set used in our experiments was taken from the MIT Lincoln laboratory 1998 DARPA Intrusion Detection Evaluation Data Set [Lincoln Lab 1998]. Although this data set has been criticized because it was not generated by actual users and has some weaknesses such as the lack of noisy traffic [McHugh 2000], we adopted this data set because it is publicly available and has been widely used for IDS performance testing. This data set contains 35 days of simulated network traffic, with each packet marked as background (normal) or attack (abnormal) packet. The packet volumes of the hosts that were subject to attack are listed in Table 5.1 (for the hosts on 172.16.115/116, the subnet traffic volumes are listed). From this data set, we filtered out packets involving port 80 (http 9 ) and converted each packet into the patterns in the form as described in Chapter 3. The patterns are implemented in the form (ip_address, port, src), where ip_address contains 4 integers that compose the IP address of the remote host, port 9 We filtered out the traffic involving web server port number 80 because, according to [8], the traffic to this port continuously changes and does not have stable definition of normal behavior in term of data triple. 66 contains 1 integer and represents the port number on the receiving host, and src is assigned to 0 if the packet is incoming or 1 if the packet is outgoing. Such a pattern is referred to as a data triple (DT). 
The first 4 integers can take value between 0 and 255, and the fifth integer can take value between 0 and 69 10 . For example, a DT, (131.204.20.120, 21, 0), represents an incoming connection from a host with IP address of 131.204.20.120 to the local host through a port whose number is mapped to 21 (the ports were mapped into 70 categories according to [Hofmeyr 1999, page 46]). To obtain a host-based data set, the DTs involving host 172.16.112.50 were extracted. We chose this host because majority of the network traffic involves this host, and this host was the target of most attacks as shown in Table 5.1. We extracted the normal DTs and deleted all duplicates to obtain the self set. The self set consisted of 112 self DTs. This self set is used as a bound self space for the experiments (we assume that any other DTs are abnormal DTs). Our AISs were trained on approximately 80% of the self set with 89 randomly selected self DTs. The remaining 23 self DTs were used to test for false positives. Our test set consisted of a total of 1604 distinct DTs, including all attack packets launched during the 35 day period. Note that there are 22 abnormal DTs also appeared in the 10 In TCP/IP, 16 bits are used for port number, resulting in 64k different numbers; however, by separating the well-known or undefined ports into groups, these numbers may by mapped in to 70 categories [22, page 46]. 67 normal set. This is because some attacks, such like port-scan and denial-of-service, may generate packets that also occur in normal traffic. Table 5.1 Network Traffic Volume of the Hosts Subjected to Attacks in 1998 Lincoln Lab Dataset IP Address Normal Packets Self DTs Abnormal Packets Nonself DTs 172.016.112.050 9954 112 1404320 1604 172.016.112.149 12656 68 70 15 172.016.112.194 13288 67 70 15 172.016.112.207 11407 68 70 15 172.016.113.050 4056 54 217447 369 172.016.113.084 12149 66 70 15 172.016.113.105 12038 66 70 15 172.016.113.204 12463 67 70 15 172.016.114.050 5937 93 27730 392 172.016.114.148 34111 94 71 15 172.016.114.168 12065 69 72 15 172.016.114.169 11846 69 71 15 172.016.114.207 12570 64 70 15 172.016.115. 610 56 12 3 172.016.116. 409 38 8 2 5.2.2 Vulnerability Analysis In this experiment, we investigated the ability of the GRT to discover vulnerabilities. First, we trained our AISs to generate detector set of sizes taken from the set: {10, 20, 40, 80, 160, 320}. These detectors were randomly generated with negative selection. The 68 performance of the trained AISs was tested using the test set, and then we used the GRT to find 300 vulnerabilities in the systems. We compared the efficiency of the GRT with another ?red team? that created red DTs by random generation. The experiment was carried out as follows: for a certain size of the detector set, an AIS was trained using a training set, the vulnerability analysis was then repeated 10 times (due to the initial random GRT population, the result varies for each analysis on the same AIS), and the average of the 10 runs was calculated for this AIS; this process was repeated for 10 AISs, and the average of the 10 systems was calculated. We then repeat the above process and calculation for the other 5 training sets. Table 5.2 shows the average result of this experiment. Column 1 is the size of the detector set. The detection rate (DR) and the false positive rate (FPR) are shown in Columns 2 and 3. 
The DR was calculated as the number of abnormal DTs detected divided by 1611 (the number of abnormal DTs); the FPR was calculated as the number of normal DTs detected divided by 23 (the number of normal DTs that were not included in the training set). Columns 4 and 5 show the numbers of iterations needed to find the first vulnerability in the AIS using the GRT and random generation, respectively; the numbers of iterations needed to find 300 vulnerabilities using the GRT and random generation are recorded in Columns 6 and 7, respectively. The numbers in parentheses are the standard deviations.

The GRT used a population of 50 red DTs that are initially randomly generated, and we counted the generation of each DT as one iteration, including the first 50 random DTs. So the "real" GRT iterations begin at the 50th iteration; before that point the GRT has the same efficiency as the random red team. Accordingly, numbers in the table that are less than 50 indicate that the vulnerabilities were discovered by random DTs.

Table 5.2 Comparing the efficiency of the GRT and random generation (numbers in parentheses are standard deviations)

Detector Set Size | DR     | FPR    | 1 Vulnerability, GRT | 1 Vulnerability, Random | 300 Vulnerabilities, GRT | 300 Vulnerabilities, Random
5                 | 7.17%  | 0.94%  | 1 (0.5)              | 2 (0.5)                 | 407 (14.4)               | 458 (19.7)
10                | 14.89% | 2.46%  | 2 (0.0)              | 2 (0.0)                 | 526 (34.2)               | 658 (58.7)
20                | 27.80% | 4.06%  | 4 (0.5)              | 3 (0.5)                 | 764 (62.2)               | 1128 (115.1)
40                | 43.94% | 7.61%  | 8 (1.6)              | 9 (0.7)                 | 1111 (83.0)              | 2607 (453.8)
80                | 60.69% | 15.51% | 26 (3.4)             | 24 (6.4)                | 1633 (143.8)             | 6960 (1805.7)
160               | 77.62% | 22.83% | 51 (10.8)            | 80 (18.3)               | 2162 (203.6)             | 23953 (5692.8)
320               | 91.52% | 32.83% | 102 (14.3)           | 339 (95.0)              | 3199 (328.1)             | 113225 (50185.3)

As one may expect, using the GRT is more efficient than random generation. One can also see that increasing the detector set size results in an increased number of iterations needed to find vulnerabilities; this is because adding detectors reduces the number of vulnerabilities remaining in the system.

5.2.3 Generating Detectors

In this experiment, the effectiveness of the GBT was investigated. One important parameter for the GBT is the release threshold, which defines the number of iterations the GBT must go through without finding a better CD before the best CD is added to the detector set. To investigate the effect of the release threshold on the efficiency of the GBT, release threshold settings of 100, 200, 500 and 1000 were tested. The experiment was carried out as follows: an AIS with 320 detectors 11 was trained with a training set, then the GRT was used to discover 300 vulnerabilities in the system and the GBT was allowed to design detectors for the vulnerability set; this GRT/GBT iteration was repeated 10 times for this AIS and the average of the 10 runs calculated; this process was repeated for 10 AISs and the average calculated over the 10 AISs. The whole process and calculation was then repeated with the other 5 training sets, and finally the average results over the 6 training sets were calculated. Figure 5.2 shows the average GBT performance in terms of the number of detectors generated and the number of iterations taken to generate these detectors, with the x- and y-error bars showing the standard deviation values. In this figure, each point

11 In this and the following experiments, we generated 320 detectors to keep the detector set size consistent with the settings used in later experiments, in which we used detector sets of sizes 10, 20, 40, 80, 160, and 320.
It was also learnt from later experiments that for GENERTIA-redesigned AISs, with a release threshold of 1000, 320 is a rough limit on the size of the detector set, beyond which it will become difficult to generate more detectors. 71 represents a different setting of release threshold, the x-axis represents the size of the additional detector set generated to cover the discovered vulnerabilities, and the y-axis represents the GBT iterations needed to generate the additional detector set. One can see in Figure 5.2 that increasing the release threshold (RT) can lead to smaller additional detector sets. This is because the higher the release threshold, the better the chance of generating CDs that match more vulnerabilities, however, at the price of more GBT iterations as shown in the figure. Figure 5.2 The effect of release threshold on the GBT activity. The Pareto front shows the trade-off between the size of the additional detector set and the GBT iterations needed to generate these detectors, with different GBT release threshold. The points on the curve represent the average values and the error bars show the standard deviation values. The setting of the release threshold is not trivial. If the threshold is set too high, then the GBT tends to waste many iterations trying to search for a better CD even when the chance for finding such a detector may be small. This is especially true when the 72 vulnerability set contains DTs that are far away from each other. They can be separated by self DTs so that a single detector cannot cover red DTs without covering part of self. On the other hand, if the release threshold is set too low, the GBT tends to generate more CDs than are needed. Therefore, there is a trade-off. In the following experiments, we set the GBT release threshold to 1000, which is relatively high, in order to generate fewer detectors, at the cost of more GBT iterations. Using a lower release threshold, GENERTIA should be able to more quickly discover/patch vulnerabilities, at the cost of a larger detector set. 5.2.4 GENERTIA Redesign The objective of GENERTIA redesign is to make an AIS-based IDS more effective and more efficient as well. Effectiveness of an IDS is how accurately the system can detect intrusion, while the efficiency of an IDS is how economical the system can be in terms of computing resources (memory, CPU time etc.) consumed to perform intrusion detection. In this experiment, we evaluate the effectiveness of an AIS using the GRT effort for finding vulnerabilities in the system, and the associated efficiency using the size of the detector set in the system. The number of GRT iterations needed to discover the vulnerabilities in an IDS can reflect the effectiveness of the system because it shows how hard the GRT must work in order to discover vulnerabilities. The size of the detector set 73 can reflect the efficiency of the system because a smaller number of detectors require less memory and computation. To show that a GENERTIA-redesigned AIS-based IDS can outperform ones without GENERTIA, experiments were carried out to compare the effectiveness of a system that used GENERTIA and one that did not. For the system that used GENERTIA, the AIS was trained to generate detector set of sizes taken values from the set: {10, 20, 40, 80, 160, 320}. First, the system generated only 1 detector. Then the GRT was used to discover 300 vulnerabilities in the system, and the GBT was allowed to generate detectors for these vulnerabilities. 
Next the generated detectors were added to the system?s detector set. The GRT/GBT cycle was repeated continuously until the detector size reached the specified size. This experiment shows how GENERTIA can redesign a weak system (which has only 1 detector) to make it stronger. The GRT/GBT cycle is actually the redesign process although the whole process can also be viewed as the process of designing a system from scratch. We will refer to a GENERTIA-redesigned AIS as a redesigned system for simplicity. For the AIS that did not use GENERTIA, the system was simply trained to generate RGNS detector set of sizes also taken values from: {10, 20, 40, 80, 160, 320}. We refer to this AIS as an RGNS system for simplicity. After the two systems were built the GRT was used to perform vulnerability analysis on the systems. With each of the 6 training sets, 10 AISs were built; and the performance of each system was determined by taking the average of 10 runs. So, the result reported here was the average of 600 runs for each parameter setting. 74 Figures 5.3a and 5.3b show the results of the experiments. The x-axis is the size of the detector set, and the y-axis is the number of the GRT iterations needed to discover the first (Figure 5.3a) and the 300th (Figure 5.3b) vulnerabilities in the system. From these figures, it is clear that for a system with larger detector sets, more effort was required to discover vulnerabilities. The reason for this is, as we mentioned in Experiment I, that larger detector sets have larger coverage of nonself space, which results in fewer vulnerabilities in the system. However, by comparing the two curves in each figure, we can see that there was a big difference between the redesigned system and the RGNS systems. Although increasing the size of the detector set constantly improved the performance of both systems, in terms of the number of GRT iterations needed to discover vulnerabilities in the systems, the improvement process is much more dramatic in the redesigned system as GENERTIA-designed detectors were added to the system. For the redesigned system, the system effectiveness grew exponentially with the increase of detector set size. For the RGNS systems, the growth of system effectiveness was linear to the increase of the size of detector set. Comparing the redesigned system with the RGNS system, we found that when the size of detector set was 320, for the RGNS system, the average number of GRT iterations needed to discover the first and the 300th vulnerabilities were 102 (with standard deviation of 14) and 3198 (328), respectively; for the redesigned system, it required 4038 (819) GRT iterations to discover the 1st vulnerability and 283177 (48151) iterations to discover the 300th vulnerability. 75 (a) (b) Figure 5.3 Comparing the GRT effort needed to discover 1 (a) and 300 (b) vulnerabilities in systems trained with and without GENERTIA. Detectors generated through GENERTIA design have different coverage. So for a GENERTIA-redesigned system, a larger detector set will have larger coverage. For the RGNS system, a larger detector set does not necessarily have larger coverage, because the randomly generated detectors may have large overlapping coverage with detectors 76 already existing in the system. We found that for a system with 10240 RGNS detectors, it usually took less than 2500 GRT iterations to discover a vulnerability in the system. Obviously, simply increasing the detector set size is not an efficient way to improve system performance. 
GENERTIA offers a more efficient approach to improving the performance of AIS-based IDSs by adding a minimal number of detectors for a maximal amount of coverage. 5.2.5 Performance Comparison of AISs with/without Redesign To better understand the effect of GENERTIA redesign on the performance on the AIS, experiments were carried out to compare a GENERTIA-redesigned AIS and an AIS without redesign. Each experiment consisted of 2 stages: training and test. In the training stage, for the redesigned AIS, 1 RGNS detector was generated, then the GRT/GBT cycle was applied to design the rest of the detectors; for the AIS without redesign, it simply generated a defined number of RGNS detectors. In the test stage, the systems were subjected to the test (attack) set. We performed cross-validation by using the 6 training sets described in Section IV. For each of the 6 training sets, the experiment was repeated for 10 times, and the average performance of two systems during their 60 runs was used for comparison. 77 The performance of the systems was evaluated with respect to both detection and false positive rates. As mentioned in the first experiment, the detection rate was calculated as the number of abnormal DTs detected divided by 1611(the total number of abnormal DTs); the false positive rate was calculated as the number of normal DTs detected divided by 23 (the number of normal DTs that were not included in the training set). Figure 5.4 shows the average performance of the IDSs in the form of the receiver operating characteristics (ROC) diagrams, in which the performance of the systems is plotted as detection rate versus false positive rate. The y-axis represents detection rate, and x-axis represents false positive rate. The optimal performance (100% detection rate and 0% false positive rate) is located in the upper left corner. We speculate the reason why the GENERTIA-redesigned system outperformed the one without redesign as follows. GENERTIA is an interactive system. During the GENERTIA redesign process, vulnerabilities discovered by the GRT are only candidate ?holes?, and they need to be confirmed to be real ?holes? in the system by the network administrator before detectors are designed to patch these holes. The reason why GENERTIA may increase detection rate and reduce false positive rate can be explained as follows. The DT space is divided into 3 parts: ?self? represents the known self DTs (at the very beginning, it consists of all the DTs in the training set), ?nonself? represents the DTs that are currently covered by the detector set and these DTs are deemed as ?known? nonself, ?unknown? represents the DTs that are neither ?known? self nor ?known? nonself. 78 The vulnerabilities discovered the GRT are unknown DTs, and it is the responsibility of the network administrator to classify these DTs as self or nonself. If it is classified as self, it is added to the training set (becomes a known self); no detectors will be allowed to cover this DT, and the AIS will not generate false positive from this DT, so this tends to lower the false positive rate. If it is classified as nonself, then new detector is designed to cover this DT and this tends to raise the true positive rate (or detection rate). Performance Comparison 0.00 0.20 0.40 0.60 0.80 1.00 0.00 0.20 0.40 0.60 0.80 1.00 False Positive Rate Detection Rate Without GENERTIA With Genertia (a) Figure 5.4 ROC diagrams for AISs with/without GENERTIA redesign. (Part 1 of 2) (a) Full scale. 
In both curves, the points at the lower left corner represent a detector set of size 5; size of the detector set doubles at each point along the curve towards the upper right corner. 79 Performance Comparison 77.62/22.83 (160) 94.90/39.78 (640) 91.52/32.83 (320) 95.39/34.42 (80) 92.16/24.28 (40) 78.95/16.74 (20) 0.75 0.80 0.85 0.90 0.95 1.00 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 False Positive Rate Detection Rate Without GENERTIA With GENERTIA (b) Figure 5.4 ROC diagrams for AISs with/without GENERTIA redesign. (Part 2 of 2) (b) Detail of the upper left corner, with the labels beside the curves showing the numbers in the form ?Detection Rate/False Positive Rate (Detector Set Size)?. 5.3 Discussion 5.3.1 The Importance of the Preliminary Work on GENERTIA Implementing and maintaining secure computer systems is difficult, because there is no way of ensuring that a certain level of security has been achieved [Anderson et al. 1995, Hofmeyr 1999]. However, by iteratively applying vulnerability analysis and system redesign, we can attempt to minimize the chance of the system being compromised. The preliminary experimental results presented in this paper show that one can improve the 80 effectiveness of an immunity-based IDS by applying cycles of vulnerability analysis and the system redesign. Lightweight IDSs (in terms of the size of detector/signature sets) are always desirable, regardless if it is signature-based or an anomaly detection system. An IDS with too many detectors, not only unnecessarily consumes more resources of the host on which it runs, but also slows down packet processing. When the IDS is overloaded by the network traffic, it creates a bottleneck, and this makes the IDS a vulnerable target of denial-of-service attacks. Reducing the size of the detector set is vital for an AIS to be useful. To become a memory detector, a mature detector usually requires co-stimulation from a network administrator [Hofmeyr and Forrest 2000, Harmer et al. 2002, Kim and Bentley 2004], so reducing the total number of detectors may save the administrator from a flood of co-stimulation requests. So a GENERTIA-redesigned AIS can alleviate the burden of the administrator by reducing the false positive rate. GENERTIA provides a novel approach to the redesign/design of AIS-based IDSs. Although this work has focused on the network intrusion detection, however, the concept of GENERTIA may be applied to other anomaly detection systems that build and use models of normal behavior. In fact GENERTIA can also be applied to signature-based IDS to produce a hybrid system that perform both signature-based and anomaly detection. For example, GENERTIA can be applied to Snort for vulnerability analysis by changing 81 the GRT fitness function to the number of rules a red DT can avoid. The role of GRT could also be replaced by any attack/hacker software so that the system administrator could find out whether the system would be vulnerable to such attack. The concept of GENERTIA may also be used in the development of general machine learning applications. 5.3.2 Inefficiencies and Improvements of GENERTIA At the price of a higher false positive rate, one can increase the detection rate of an AIS with a larger detector set as seen in Figure 5.3. However, with GENERTIA, the cost to generate additional detectors grows with the increasing in the number of existing detectors. This is because the existing detectors determine the number of and distribution of the vulnerabilities in nonself space. 
The vulnerabilities that are relatively far away from self are easily discovered by the GRT and patched by GBT-designed detectors in the early stages. As the detector set grows, the majority of the remaining vulnerabilities tend to be distributed around self space, where they are difficult for the GRT to discover and difficult for the GBT to patch with new detectors. An easy way to address this problem is to "fine tune" the match between the detector and the DT. Dasgupta and Gonzalez [Dasgupta and Gonzalez 2002] proposed a multi-level approximation approach to model the nonself space that lies in the vicinity of self space. Similarly, the GBT may design detectors with various matching threshold values, using higher matching thresholds to match the nonself DTs that are close to self space. In our prototype system, the GRT is implemented using a steady-state GA; however, it is possible to implement a GRT that alternately applies different algorithms, such as a "genetic swarm" as suggested by Dozier [Dozier et al. 2004b]. Since different algorithms have different search behaviors, combining a variety of algorithms may lead to a more thorough search [Dozier et al. 2004b]. The GAs used by the GRT and the GBT both use uniform crossover operators. We chose these operators for simplicity rather than for efficiency, and it is likely that other crossover operators would improve the efficiency of the two teams. The parameters were set arbitrarily, and optimizing them may also improve GENERTIA.
We have shown that GENERTIA can design/redesign lightweight IDSs; however, this comes at the expense of computation time. We have not yet established a complexity expression for GENERTIA, but we can give a rough analysis by comparing the time complexity of generating detectors by RGNS and by GENERTIA. We could compare the two in terms of the number of detector trials; however, the cost of generating an RGNS detector is very different from that of generating a detector through GENERTIA. So we compare the two in terms of the number of detector checks (NDC): the number of checks of whether a detector matches a DT. We recorded the NDC while performing the experiments described in the previous section; Table 5.3 shows the average NDC needed to generate detectors with RGNS and GENERTIA. From this table, one can see that GENERTIA is significantly more expensive than RGNS.
Table 5.3 Time cost in terms of NDC for generating detectors using RGNS and GENERTIA. (The numbers in parentheses are standard deviation values.)
Detector Set Size    RGNS             GENERTIA
5                    1270 (176)       936544 (132023)
10                   2602 (195)       1571719 (153587)
20                   5221 (208)       3436167 (340054)
40                   10451 (533)      7163712 (618000)
80                   20937 (909)      16113746 (2732696)
160                  42522 (3082)     42315748 (5016890)
320                  84421 (4914)     300251538 (52273681)
With RGNS, the cost in computation time is linear in the number of detectors generated, because each newly generated detector is independent of the ones already in the system; it goes through the same negative selection process independently and, statistically, costs the same amount of computing power. With the experimental setup and parameter settings in our experiments, the cost of generating an RGNS detector is roughly 260 NDC. In the ideal case, the number of detector checks needed to generate a valid RGNS detector equals the size of the training set.
For the failed trials, assuming the average NDC performed per trial is half the size of the training set, we can calculate that the average number of detector trials needed to successfully generate one RGNS detector is (260 - 89)/(89/2) + 1 ≈ 4.8.
With GENERTIA, the cost of generating detectors comes from the cost of the GRT discovering vulnerabilities and the cost of the GBT designing detectors for those vulnerabilities. The GRT uses a GA to evolve red DTs. During each GRT iteration, an offspring is generated and its fitness evaluated. The NDC performed for the fitness evaluation of each red DT equals the size of the detector set. Since the number of GRT iterations increases exponentially with the size of the detector set, the NDC performed by the GRT follows the same trend. The GBT also uses a GA to evolve CDs (candidate detectors). During each GBT iteration, an offspring is generated and its fitness evaluated. To evaluate the fitness of a CD, detector checks are performed against the DTs in the training set and the red DT set from the GRT.
The data for GENERTIA in Table 5.3 is plotted in Figure 5.5. From this figure we can see that the computation cost in terms of NDC increased exponentially with the size of the detector set. The trend of the curve is very similar to the curves in Figure 5.3, which show the GRT iterations needed to find vulnerabilities in an AIS. If we consider the GRT effort needed to find vulnerabilities in an AIS as a measurement of the strength of the system, then the resemblance of these curves suggests that the cost of building an AIS is proportional to the strength of the resulting system.
As a real-time application, a successful IDS must have good scalability. The heavy computation needed by GENERTIA may seriously limit the feasibility of its application as an IDS redesign system. Improving GENERTIA by designing a more efficient GRT and GBT will alleviate the problem.
Figure 5.5 Time for generating detectors with GENERTIA: the average number of detector checks (NDC) versus the size of the detector set.
Besides, there is another way to reduce the computation time of running GENERTIA. We can simply set a limit on the computation time for the GRT. Rather than being given unlimited time to discover 300 vulnerabilities, the GRT can be given an iteration limit; it terminates and hands off the discovered vulnerabilities when the limit is exceeded. This may reduce the number of discovered vulnerabilities, and it is possible that no vulnerability is discovered at all within the given limit. On the other hand, no vulnerability being discovered within the given limit may suggest that a certain level of security has been obtained. This provides a rough measurement for evaluating the strength of an AIS. As we have seen in Figure 5.5, the cost of building an AIS is proportional to the quality of the resulting system. If an optimal system is too expensive to obtain, we have the option of settling for a satisfactory one: a system that resists hacking by a GRT for a given number of iterations.
5.3.3 Adaptation of GENERTIA to a Dynamic Environment
The experiments conducted in this chapter are based on a static self set. However, the environment in which an IDS operates changes continuously due to many factors, such as security policy modifications, computer configuration alterations, and the addition or deletion of user accounts.
An IDS must constantly and automatically adapt to these changes without severe performance degradation. To design/redesign detectors for an AIS, GENERTIA need a set of sample normal patterns (training set). This training set may be initialized by collecting normal samples during an attack-free period or by collecting samples and filtering out abnormal samples. Each of the samples in the training set has a timestamp and when it is time-out, it is removed from the training set. Each time a same sample occurs on the network traffic the timestamp of the corresponding sample in the training set is updated (samples that occur on the network but do not appear in the training set may be deemed as generalization of the sample self set, although they may also originate from undetected attack packets). Each time a false positive occurs, the corresponding sample is added to the training set. This dynamics keeps the training set representative of the normal behavior of the network traffic during the past recent period. 87 GENERTIA, running as a separate process from the AIS-based IDS, does not need to be running constantly as the AIS itself. It may be scheduled to run at certain time interval, or it may be triggered when the size of the detector set (which contains only mature detectors) is dropped below certain threshold due to killing of detectors that cause false positives. This is comparable to the online adaptive training used by LYSIS [Hofmeyr and Forrest 2000], in which the immature detectors are exposed to the network traffic for a certain period for toleration, and the collection of the self samples occurred during this period correspond to the self sample set used for training in our AIS. The difference is that LYSIS uses online adaptive training, while our AIS uses offline adaptive training. One advantage of offline adaptive training is that the toleration of immature detectors is extremely shortened. Another advantage is that immature detectors are not tolerant to potential attack packets that avoid detection. With online adaptation, immature detectors have to be tolerant to all the packets that occur during the toleration period, including potential undetected attacks. So if an attack goes undetected, it may always goes undetected as long as it keeps appearance on the network traffic, because new detectors that detect it will be regenerated. 88 CHAPTER 6 AN IMMUNITY-BASED CLASSIFICATION SYSTEM In this chapter, we propose the construction of a classification system based on anomaly detection that employs constraint-based detectors, which can be generated using a GENERTIA blue team. 6.1 Introduction Classification is the mapping of data to a set of predefined classes. Usually, this problem is solved by developing a model that accurately classifies the given training data and then using the model to classify the target data. There are many approaches for developing a model [Dunham 2003]: statistic analysis [Hastie et al. 2001], neural networks [Orr 1996], decision trees [Quinlan 1986], rule-based algorithms [Witten and Frank 2000], distance- based algorithms [Kennedy et al. 1998] etc. All of them have some advantages and disadvantages. The anomaly detection performed by AISs is essentially to solve a two-class classification problem: distinguishing nonself from self. The success of applying AISs to a broad range of anomaly detection application suggests a promising approach to design 89 immunity-based classification systems. 
The remarkable resemblance between an AIS and a classification system has been noted in [Hofmeyr and Forrest 2000]. However, there are several major concerns that have hampered the research in this direction: 1. Anomaly detection systems are practically two-class classification systems, while conventional classification systems usually solve multi-class classification problem. 2. Anomaly detection systems are usually trained using only negative (normal) samples in the training phase, while conventional classification systems are trained with samples from all classes. 3. Anomaly detection systems usually have many (hundreds of) anomaly detectors. The large number of detectors facilitates the effective distinguishing abnormal patterns from normal ones; however, from a classification perspective, the large number of detectors does not facilitate the abstracting of logical rules. 4. All learning systems have two types of errors: false positives and false negatives. Anomaly detection systems might have a bias of more tolerance for one type of error; for example, an anomaly-based intrusion detection system might be more concerned with false negatives (intrusions that are not detected) than with false positives (normal activities that are detected). Usually the two types of errors are equally undesirable in conventional classification systems. 90 It should be noticed that there are classification systems that are based on the Artificial Immune Network Model, for example, see [Watkins and Boggess 2002, Watkins et al. 2003, Hamaker and Boggess 2004]. However, these systems are practically based on K-Nearest-Neighbor approach. This chapter presents the AIS-Based Classifier (ABC), a novel classification system based on an AIS that performs anomaly detection. Our experimental results on the benchmark Wisconsin Breast Cancer Dataset and Fisher?s Iris Dataset show that the performance of our classification system is favorably comparable to some well-known classification systems on these datasets. Further experiments reveal that the GRT can be employed to quickly discover novel patterns that cannot be classified by the system. In other words, all the subsystems in the classifier cannot detect these patterns. This indicates that, although this classifier has high classification accuracy on these two datasets, there exist a large number of patterns that this classifier may not be able to correctly classify. These novel patterns may be used by the GBT to redesign the classifier to improve its performance. However, this is only possible if domain expert knowledge is available, because even though the GBT can be employed to design suitable detectors for the patterns, the newly designed detectors needs to be added to the right subsystems to detect these patterns. 91 6.2 An AIS-Based Classification System 6.2.1 Anomaly Detection-Based Classification As mentioned above, anomaly detection can be viewed as a two-class classification problem: classifying a given sample as normal or abnormal. An anomaly detection system builds a model that reflects the characteristics of the normal samples. A multi- class classification system can be developed by combining multiple anomaly detection systems. For example, a three-class classification problem can be solved by building anomaly detection systems for the first two classes. 
Samples that belong to these two classes should be classified as normal by the respective systems, anything that is classified as abnormal by both systems should be considered belonging to the third class. Generally, an n-class classification system can be constructed with n-1 subsystems. It is also possible to build a subsystem for each class. This is especially useful if the test set may contain samples that do not belong to all the pre-defined classes and these samples will usually be misclassified as some class by traditional classification systems. This approach seems circuitous to first map to an anomaly detection problem (to distinguish one class from all the rest) and then iterate on the multi-classes that remain, however, the whole process is straightforward: each subsystem is built the same way except that different training sets are used for training. The system is also expected to have high accuracy because each classification has to be agreed upon by all subsystems, 92 which provides a crosscheck mechanism. There are also some other advantages for adopting this approach. For example, the Fisher?s Iris data contains sample measurements of three types of iris flowers, classification systems trained on these samples may correctly classify unseen samples from these three types of flowers, however, a sample from a fourth type of iris flowers would be wrongfully classified as one of the three types, while an anomaly detection-based classification system has chance to single out these samples as novel flowers. 6.2.2 Training of Subsystems A typical anomaly detection AIS generates detectors using the Negative Selection Algorithm. The performance of systems trained this way depends on the size of the detector set: larger detector set may cover larger area of nonself space thus detects more anomalies. There are several problems with this approach: 1. this algorithm only uses normal samples while both normal and abnormal samples are provided for classification tasks; 2. the proper size of the detector set is not easily estimated; 3. the larger the detector set the more false positives may be generated, which results in more classification errors; 4. large number of detectors may be undesired compared to a compact set if logical rules are to be extracted from the detector set. 93 The proposed novel classification system, Anomaly Detection-Based Classifier (ADBC), employs a genetic algorithm for detector generation addresses the problems described above. With this approach, each subsystem is built by evolving a population of candidate detectors using a GENERTIA blue team as described in Chapter 4, and it is reviewed in the following paragraphs. First, the subsystem randomly generates a population of candidate detectors. Then the population is evolved to search for valid detectors using a training set. The training set contains both normal and abnormal samples. A valid detector matches some abnormal samples in the training set, but does not match any normal samples in the training set. The fitness value of a candidate detector is the number of abnormal samples it matches, or zero in the case it matches any normal sample. During each iteration, using binary-tournament selection, two candidate detectors are allowed to produce offspring using uniform crossover, then the resulted offspring has a randomly selected interval replaced with a randomly generated interval. The offspring replaces the worst individual in the population if the offspring had a better fitness value. 
If, for a predefined number of iterations (termed the release threshold), one candidate detector remains the best individual in the population and has a fitness value greater than zero, then this candidate detector is released from the population as a valid detector. During a release, the valid detector is replaced with a randomly generated detector, and the abnormal samples that were matched by the valid detector are removed from the training set. The whole process continues until there are no abnormal samples left in the training set. All the released detectors compose the detector set of the subsystem. This detector generation approach is depicted in Figure 6.1.
Figure 6.1 Flow chart of the algorithm used to generate detectors for the classification system
6.2.3 Classification
After all the subsystems are trained, they are combined to form a multi-class classification system. In the classification process, each sample is presented to the subsystems, and each subsystem independently classifies it as normal or abnormal. The final classification is decided by comparing the results of the subsystems. For each sample, if one subsystem classifies it as normal and all the other subsystems classify it as abnormal, then the sample is assigned to the class of the subsystem that claims it is normal. In all other cases, the sample is not classified. These cases occur in the following situations:
1. More than one subsystem classifies a sample as normal. This situation usually suggests an overlap of the normal sample spaces of the corresponding subsystems.
2. All the subsystems classify a sample as abnormal. This happens when one subsystem misclassifies a normal sample as abnormal. In rare situations, it might suggest that the sample belongs to a novel class rather than to one of the pre-defined classes.
6.3 Experiments
In this section, we test the proposed classification system using two well-known benchmark classification problems: the Wisconsin Breast Cancer Dataset and Fisher's Iris Dataset. Both sets are taken from the Machine Learning Repository of the University of California at Irvine (http://www.ics.uci.edu/~mlearn/MLRepository.html). In the following experiments, the release threshold of the genetic algorithm is arbitrarily set to 5000.
6.3.1 Wisconsin Breast Cancer Dataset
The Wisconsin Breast Cancer Dataset contains sample data collected from University of Wisconsin Hospitals [Wolberg and Mangasarian 1990]. For each sample, 9 features are assessed and a value between 1 and 10 is assigned to each feature, with 1 corresponding to a normal state and 10 to the most abnormal state. Therefore, each sample is recorded as a 9-dimensional vector of integers. There are 699 samples in this data set, including 458 benign and 241 malignant samples, and the two types of samples are not linearly separable. Among these samples, 16 are incomplete, with missing feature values. Although some classification systems simply discard the samples with missing values, we have arbitrarily replaced all the missing values with random values. This is a two-class classification problem, which can easily be converted into an anomaly detection problem: given a set of benign samples, generate a system that detects samples having characteristics different from those of the given benign samples. Such an anomaly detection system can be built using the Negative Selection Algorithm (NSA). Since both benign and malignant samples are available for training, we can also build a system by generating detectors using the genetic algorithm described in Section 6.2.2.
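The detector-generation procedure reviewed above (random initial population, binary tournament selection, uniform crossover, single-interval mutation, steady-state replacement, and release after a period of stagnation) can be sketched roughly as follows. This is our own simplified reading, not the dissertation's code: integer features, a fixed matching threshold R, and a small default release threshold are assumptions made so that the toy example runs quickly.

import random

R = 3              # matching threshold: at least R features must fall inside the intervals
N_FEATURES = 4
VALUE_MAX = 79     # toy feature range 0..VALUE_MAX

def matches(detector, sample):
    return sum(lo <= v <= hi for (lo, hi), v in zip(detector, sample)) >= R

def random_interval():
    a, b = sorted(random.randint(0, VALUE_MAX) for _ in range(2))
    return (a, b)

def random_detector():
    return tuple(random_interval() for _ in range(N_FEATURES))

def fitness(detector, normal, abnormal):
    # A candidate that touches any normal sample is invalid (fitness 0);
    # otherwise its fitness is the number of abnormal samples it matches.
    if any(matches(detector, s) for s in normal):
        return 0
    return sum(matches(detector, s) for s in abnormal)

def tournament(pop, fits):
    i, j = random.randrange(len(pop)), random.randrange(len(pop))
    return pop[i] if fits[i] >= fits[j] else pop[j]

def make_offspring(p1, p2):
    child = [a if random.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
    child[random.randrange(N_FEATURES)] = random_interval()             # mutate one interval
    return tuple(child)

def generate_detectors(normal, abnormal, pop_size=50, release_threshold=500, max_iters=200000):
    abnormal = list(abnormal)
    detectors, pop = [], [random_detector() for _ in range(pop_size)]
    stagnant, best_prev, iters = 0, None, 0
    while abnormal and iters < max_iters:
        iters += 1
        fits = [fitness(d, normal, abnormal) for d in pop]
        best = max(range(pop_size), key=fits.__getitem__)
        stagnant = stagnant + 1 if pop[best] == best_prev else 0
        best_prev = pop[best]
        if stagnant >= release_threshold and fits[best] > 0:
            detectors.append(pop[best])                                  # release a valid detector
            abnormal = [s for s in abnormal if not matches(pop[best], s)]
            pop[best], stagnant, best_prev = random_detector(), 0, None
            continue
        child = make_offspring(tournament(pop, fits), tournament(pop, fits))
        worst = min(range(pop_size), key=fits.__getitem__)
        if fitness(child, normal, abnormal) > fits[worst]:
            pop[worst] = child                                           # steady-state replacement
    return detectors

Calling generate_detectors(normal_samples, abnormal_samples) returns the released detector set for one subsystem; each subsystem of the classifier is trained the same way on its own normal/abnormal split.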
In this experiment, we test the performance of the constraint-based system using both the NSA and the genetic algorithm. Because of the existence of polarity (see Chapter 4), detectors with linear intervals are used for this dataset. For system training using the NSA, there are two parameters to set: the size of the detector set and the matching threshold, r. We test the system with detector sets of varied sizes. The r value is set to 5, which was found to be a good choice after trying the other possible settings. The experiment was repeated 100 times. During each run, a randomly selected 90% of the dataset was used for training and the remaining 10% was used for testing. The experimental results in Table 6.1 are the averages of 100 runs for each parameter setting.
Table 6.1 Performance of AISs trained with the NSA on the Wisconsin Breast Cancer Dataset
Size    False Benign (Negative)    False Malignant (Positive)    Total Errors
100     14.8 (4.3)                 1.7 (0.8)                     16.5
200     4.4 (3.4)                  2.6 (1.1)                     7.0
400     1.9 (1.6)                  4.6 (1.5)                     6.5
500     1.2 (0.8)                  4.7 (2.3)                     5.9
1000    0.2 (0.4)                  5.8 (2.7)                     6.0
2000    0 (0)                      6.9 (2.9)                     6.9
In Table 6.1, the numbers in the column labeled "Size" indicate the size of the detector set of each subsystem rather than the total number of detectors in the system. A false benign is the misclassification of a malignant sample as benign, while a false malignant is the misclassification of a benign sample as malignant. The total number of errors is the sum of the two. The numbers in parentheses are the standard deviations. From this table we can see that the performance of systems trained with the NSA is affected by the size of the detector set. Increasing the size reduces false benigns but also increases false malignants. The best performance of the system is obtained with a detector set of around 500 detectors for each subsystem.
The system built with detectors generated using the genetic algorithm has better performance. In our experiment with 100 runs, the average size of the detector set is 6.2 (standard deviation of 0.6), the average number of false benigns is 2.1 (1.2), and the average number of false malignants is 1.7 (1.3). These results indicate that the anomaly detection system built using the proposed algorithm is more accurate than the system simply trained with the NSA. A major reason is that our system is trained with both negative and positive samples, while the system trained with the NSA uses only negative samples. However, results not reported here indicate that a system trained with the genetic algorithm using fewer samples may still outperform the system trained with the NSA, so there are other reasons for the performance difference.
We speculate that another reason is the one depicted in Figure 6.2. A two-dimensional universe is used for simplicity. The universe can be divided into two spaces, benign and malignant, which are not linearly separable, as shown in Figure 6.2a. Part of each subspace is available for training an anomaly detection system, so we further divide the universe into four subspaces: known benign, unknown benign, known malignant and unknown malignant, as shown in Figure 6.2b. Our goal is to develop a set of detectors from the known benign and known malignant samples such that, ideally, the detectors cover the malignant space but do not cover the benign space. The training process forces detectors away from the known benign, and it is expected that this trend will generalize to the unknown benign.
However, the resulting detectors will always cover part of the unknown benign space, because the known benign samples are usually not fully representative of the benign space. With the NSA, only the known benign samples are used for training, and the generation of each detector is independent of the other detectors. The size of each detector is randomly decided, and usually a large number of detectors is needed for high coverage of the malignant space. The generated detectors can be highly overlapped and spread over the entire universe except the known benign region, as shown in Figure 6.2c, which corresponds to a high malignant detection rate but also a high false malignant rate. With the genetic algorithm, both known benign and known malignant samples are used for evolving detectors. The genetic algorithm evolves detectors that have large coverage of the known malignant region and stops when all the known malignant samples are covered by valid detectors, which results in a compact detector set with large coverage of the malignant space, as shown in Figure 6.2d. The known malignant samples also serve as a guide for the placement of the detectors, so that detectors will not be generated to cover benign space that is not represented by the known benign samples, such as the left corners. This effect may greatly reduce the false malignant rate.
Figure 6.2 The difference in placement and coverage between the detector set generated by the NSA and the detector set evolved using the genetic algorithm. (a) The universe is divided into two spaces: benign and malignant. (b) Parts of both the benign and the malignant spaces are available for training. (c) The detectors generated by the NSA have varied sizes of coverage, and high coverage of the malignant space depends on a large number of detectors. (d) The detectors evolved by the genetic algorithm tend to have large coverage, and high coverage of the malignant space is obtained if the known malignant samples are representative of the whole malignant space.
With the compact detector set, the anomaly detection system can easily be converted into a rule-based classification system. In this experiment, we can obtain a classification system with 6 rules, with the condition part of each rule taking the form of disjunctive feature constraints:
x_0 ∈ lb_0..ub_0 ∨ ... ∨ x_i ∈ lb_i..ub_i ∨ ... ∨ x_n ∈ lb_n..ub_n
where x_i is the ith feature of a sample, and lb_i and ub_i are the lower and upper bounds of the interval for the ith feature. A sample is classified as malignant if at least r of its features satisfy the corresponding feature constraints; if a sample is not classified as malignant by any rule in the rule set, it is classified as benign. As an example, the rule <7..10 7..10 2..10 10..10 8..10 2..10 4..10 1..10 2..9> classifies 201 malignant samples in the dataset. These are M-of-N rules in classification rule terminology. Such rules may facilitate data analysis aimed at finding a logical description of the data. The system trained using the NSA can also be converted into a rule-based classification system; however, since the performance of that system depends on a large detector set, the conversion results in a large number of rules. The accuracy of a classifier is calculated as the percentage of samples correctly classified by the classifier. The average accuracy of our classifier is 99.46%.
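Since each constraint-based detector converts directly into an M-of-N rule, applying the resulting rule set is only a few lines of code. The sketch below uses the example rule quoted above and r = 5, the matching threshold reported for this dataset; the function names and the sample vector at the end are ours, invented purely to show the call.

# Example rule from the text: nine (lower, upper) feature intervals.
RULE = [(7, 10), (7, 10), (2, 10), (10, 10), (8, 10), (2, 10), (4, 10), (1, 10), (2, 9)]
R = 5  # matching threshold used for the Wisconsin Breast Cancer experiments

def rule_fires(rule, sample, r=R):
    # M-of-N rule: at least r of the sample's feature values fall inside the rule's intervals.
    return sum(lo <= v <= hi for (lo, hi), v in zip(rule, sample)) >= r

def classify(rules, sample):
    # A sample is malignant if any rule fires; otherwise it is benign.
    return "malignant" if any(rule_fires(rule, sample) for rule in rules) else "benign"

print(classify([RULE], [8, 7, 5, 10, 9, 3, 4, 2, 3]))  # hypothetical sample -> "malignant"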
For better understanding of the performance of this anomaly-based classifier, we compare it with some well-known classifiers in Table 6.4 (on page 106). These results are directly from [Duch 2002a-b]. We have inserted our system into the table according to the accuracy ranking, and we can see that the performance of our classification system is quite favorably comparable to the well-known classifiers. 104 6.3.2 Fisher?s Iris Dataset The Fisher?s Iris Dataset contains three classes of iris flowers: Setosa, Virginica and Versicolor. There are 50 samples for each class. Each sample in the data set is defined by 4 measurements: petal length, petal width, sepal length and sepal width. Setosa is linearly separated from other two classes, while Virginica and Versicolor have considerable overlap. The goal of the genetic algorithm (GA) is to generate a small set of detectors that has high detection rate and low false positive rate as well. To our knowledge, currently, there is no literature available that describes an anomaly detection based classification system, but the Fisher?s Iris Dataset has been constantly used for testing performances of various anomaly detection algorithms. Since each subsystem of our classification system is actually an anomaly detection system, we can compare the performance of the subsystems with the available literatures. The Variable-sized detector generation algorithm (here we refer it as the VGA) [Ji and Dasgupta 2004b], is proposed as an augmentation of the NSA anomaly detection. With V-detectors, the universe is defined as a normalized n-dimensional unitary space, [0, 1] n , and detectors are defined by an n-dimensional vector that corresponds to the center and by a real value that represents its radius. Sample-detector matching is defined by the Euclidean distance between the sample location and the center of the detector and by the 105 radius of the detector. The VGA randomly generates a detector center that lies outside self space then continuously increases the radius of the detector until it contacts with self region. The algorithm terminates when a predefined number of detectors are generated or predefined nonself coverage has been reached by some simple statistic analysis. The most interesting characteristic of the V-detectors is their variable sizes of coverage, which is also with the constraint-based detectors, except that V-detectors are virtually hyperspheres while constraint-based detectors are hypercubes. The VGA uses only normal samples for training which corresponds to an anomaly detection problem definition where usually only normal samples are available. We tested the performances of the three subsystems. For each subsystem, the samples in one class of iris were used as normal samples, and the samples in the other two class of iris were used as abnormal samples, as described in [Ji and Dasgupta 2004b]. The comparison between the two algorithms provides a better understanding of the performance of our algorithm. In [Ji and Dasgupta 2004b], 50% and 100% of the normal samples were used for training in two experiments. In our experiments, 33%, 50% and 100% of the whole dataset were used for training. The performances of the V-detectors and the constraint-based detectors generated using the GA are compared in Table 6.2. The results for V-detector are directly from [Ji and Dasgupta 2004b]. In this table, column of ?Detector Type? 
indicates the type of detectors (variable-sized detector or constraint-based detector) and the percentage of samples used for training; for V-detector, 106 the percentage means the portion of normal samples used for training, while for C- detector, the percentage means the portion of all samples used for training (consisting of equal amount of normal and abnormal samples). The column of ?Iris Class? indicates the individual subsystems. For example, for the Setosa subsystem, the Setosa samples were used as normal samples, and samples in other two classes were used as abnormal samples. The column ?DR(%)? indicates the detection rate, which was calculated as the number of abnormal samples detected by the corresponding subsystem. The column ?FPR(%)? indicates the false positive rate, which is calculated as the number of normal samples detected by the corresponding subsystem. The column ?Detector Set Size? indicates the number of detectors that were generated by the VGA or the GA. From this table, one can see that the genetic algorithm is capable of generating a very compact detector set with detection and false positive rates favorably comparable to the V-detector set. One reason is that we have used more samples for training (both normal and abnormal samples were used while VGA only uses normal samples). However, in the case where we used 33% of the whole dataset, which corresponds to the number of samples of 100% of a normal set, the overall performance of the three subsystems is still better than the three V-detector systems (considering both the detection rate and the false positive rate), especially in terms of detector set size. We believe the speculation depicted in Figure 6.2 for the NSA is also applicable to the VGA, except that V-detectors 107 are virtually hyper-spheres rather than hypercubes and that the VGA has a mechanism to increase the size of the coverage of V-detectors. In the next experiment, we test our immunity-based classification system, namely, we will combine the anomaly detection subsystems as a three-class classification system for the Fisher?s Iris Dataset. The classification system consists of three subsystems, one for each class of iris flower. Again, for comparison reason, we build classification systems Table 6.2 Comparison of Performance of Varied-Sized and Constraint-Based Detectors Using Fisher?s Iris Dataset Detector Type Iris Class DR (%) FPR (%) Detector Set Size Setosa 99.97 1.32 20 (7.87) Versicolor 88.3 8.42 153.24 (38.8) V-Detector (50%) Virginica 93.58 13.18 218.36 (66.11) Setosa 99.98 0 16.44 (5.63) Versicolor 85.95 0 110.08 (22.61) V-Detector (100%) Virginica 81.87 0 108.12 (30.74) Setosa 98.8 0.4 1 (0) Versicolor 95.5 5.6 1.8 (0.4) C-Detector (33%) Virginica 95.9 6.2 1.5 (0.7) Setosa 98.9 0.2 1 (0) Versicolor 96.0 2.4 2.4 (0.5) C-Detector (50%) Virginica 98.4 6 1.6 (0.7) Setosa 100 0 1 (0) Versicolor 100 0 4.6 (0.5) C-Detector (100%) Virginica 100 0 3.6 (0.8) 108 using both the NSA and the genetic algorithm. The matching thresholds for the detectors of the three subsystems are set to 3, which appears to be a proper choice after testing other possible settings. For each subsystem, samples in one class are treated as normal set, and the samples in the other two classes are treated as abnormal set. For each experiment, a randomly selected 90% of normal and abnormal sets are used for training, so for each subsystem, 90 abnormal samples and 45 normal samples are used for training. 
The first system is built by generating detectors for each subsystem using the NSA. The average performances over 100 runs of each parameter setting are shown in Table 6.3. The numbers in the column labeled ?Size? indicate the size of detector set of each subsystem rather than the total number of detectors in the system. The numbers in the parentheses are standard deviation values. There are two types of errors: unclassified and misclassified, and the total of the two decides the accuracy of the system. Samples are unclassified when there are conflicts between classification results of any two subsystems, namely, more than 2 subsystems claim the sample to be normal, or all three subsystems claim the sample to be abnormal. Samples are misclassified when it is classified to be a class other than the actual class it belongs to. Misclassification is rare, because it indicates that two subsystems simultaneously misclassify this sample: one subsystem declares it abnormal but it actually does belong to this class, while another subsystem declares it normal but it actually does not belong to that class. 109 From this table, we can see that the accuracy of the system is primarily decided by the number of unclassified samples. The size of the detector set has tremendous effects on the accuracy of the classification. Increasing the size of the detector set reduces the number of misclassification but it may increase the number of unclassified samples. The best performance of the system is obtained when the size of detector set is around 400 for each subsystem. Before the size reaching 400, increasing detectors reduces unclassified samples because this reduces the number of abnormal samples that are not correctly detected with fewer detectors. After the size reaching 400, increasing detectors increase unclassified samples because more normal samples will be detected. This leads to the case where all subsystems declare these samples as abnormal and the system cannot classify them. The second classification system is built with detectors generated using the genetic algorithm. In our experiment with 100 runs, the average sizes of the detector sets are 1 (with a standard deviation of 0), 4.2 (1.1) and 3.4 (1.0) for the subsystems of Setosa, Versicolor, and Virginica, respectively, the average number of unclassified samples is 1.1 (0.9), and the average number of misclassified samples is 0.4 (0.5). Therefore the average number of errors of the system is 1.5 (1.0), which corresponds to an accuracy rate of 99%. 110 Table 6.3 Performance of AISs Trained with the NSA on the Fisher?s Iris Dataset Size Unclassified Misclassified Total Errors 100 13.6 (3.3) 0.4 (0.9) 14 200 9.8 (2.9) 0.1 (0.3) 9.9 400 8.6 (3.0) 0.2 (0.4) 8.8 500 9.3 (2.8) 0 (0) 9.3 1000 9.6 (2.5) 0 (0) 9.6 2000 10.6 (1.8) 0 (0) 10.6 For better understanding of the performance of this immunity-based classification system, we compare it with some well-known classifiers in Table 6.4. These results are directly from [Duch 2002a-b]. We have inserted our system (termed ADBC, standing for Anomaly Detection Based Classifier) into the table according to the accuracy ranking. From this table one can see that the performance of our classification system is quite favorably comparable to the well-known classifiers. 
Table 6.4 Comparison of classifiers on the Wisconsin Breast Cancer Dataset and Fisher's Iris Dataset
Rank   Wisconsin Breast Cancer        Fisher's Iris
1      ADBC            99.46%         Grobian (rough)   100%
2      C-MLP2LN        99.0%          ADBC              99.0%
3      FSM             98.3%          SSV               98.0%
4      C4.5            96.0%          C-MLP2LN          98.0%
5      RIAC            95.0%          PVM (2 rules)     98.0%
6                                     PVM (1 rule)      97.3%
7                                     AIRS              96.7%
8                                     FuNe-I            96.7%
9                                     NEFCLASS          96.7%
10                                    CART              96.0%
Our classification system for the Iris dataset can also be converted into a rule-based system that consists of three sets of rules. First, the detectors of each of the three subsystems are converted into a set of rules. For example, the detectors from the Setosa subsystem are converted into M-of-N rules such that any sample satisfying one of these rules is classified as "not Setosa". The resulting rule-based system consists of three such sets of rules, one for each class. As an example, the following are three sets of rules we obtained that classify the Iris dataset with 100% accuracy:
Set 1:
Rule 1: <52..79 20..42 24..64 6..23> → ¬Setosa
Set 2:
Rule 2: <43..79 20..44 10..20 19..25> → ¬Versicolor
Rule 3: <43..51 24..36 54..69 14..22> → ¬Versicolor
Rule 4: <51..66 28..30 49..52 18..22> → ¬Versicolor
Rule 5: <59..64 22..23 49..52 18..23> → ¬Versicolor
Set 3:
Rule 6: <43..77 32..42 11..49 1..16> → ¬Virginica
Rule 7: <52..70 24..40 33..39 16..17> → ¬Virginica
The conditional part of each rule is interpreted as: "for a given sample, if the number of feature values that fall into the corresponding intervals of this rule reaches 3 (the matching threshold, r), then ...". These sets of rules can be combined into the following 3 rules:
1. ¬Set 1 ∧ Set 2 ∧ Set 3 → Setosa
2. Set 1 ∧ ¬Set 2 ∧ Set 3 → Versicolor
3. Set 1 ∧ Set 2 ∧ ¬Set 3 → Virginica
where "Set i" means that the given sample satisfies at least one rule in Set i, and "¬Set i" means that the given sample does not satisfy any rule in Set i.
6.3.3 Vulnerability Analysis
The proposed classifier showed satisfactory performance in the previous experiments. However, there is still room for improving its performance. For the Iris dataset classifier, among the incorrectly classified patterns were patterns that could not be classified at all, which was due to false negatives made by one of the subsystems. For the Breast Cancer Dataset classifier, false negatives also led to malignant patterns being classified as benign. Since GENERTIA is designed to reduce the false negatives of an anomaly detection system, it is natural to think of using GENERTIA to improve the performance of the classifier by reducing these false negatives.
In this section, vulnerability analysis is performed on the classification systems. A vulnerability is a pattern that is classified as normal by all the subsystems; such a pattern cannot be classified by the classifier. In the pattern space, these patterns reside outside the coverage of the detectors of all subsystems. The GRT can be employed to discover these patterns, and the GBT can be used to design detectors for these vulnerabilities. These GRT/GBT cycles may effectively reduce the number of vulnerabilities in the system. The detectors designed by the GBT should be added to the subsystems that should detect the corresponding patterns; otherwise, these detectors will cause false positives.
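Before turning to the question of where the GBT's patching detectors should be deployed, note that the vulnerability test itself is easy to state in code: a pattern is a vulnerability exactly when no detector in any subsystem matches it. The sketch below is ours; the three one-detector subsystems reuse intervals from the rule sets listed above, and the test pattern is a hypothetical vector chosen only to illustrate the check.

R = 3  # matching threshold used for the Iris subsystems

def matches(detector, pattern):
    return sum(lo <= v <= hi for (lo, hi), v in zip(detector, pattern)) >= R

def flags_abnormal(detector_set, pattern):
    # A subsystem calls a pattern abnormal if any of its detectors matches it.
    return any(matches(d, pattern) for d in detector_set)

def is_vulnerability(subsystems, pattern):
    # Every subsystem treats the pattern as normal, so the combined classifier
    # cannot assign it to any class.
    return not any(flags_abnormal(ds, pattern) for ds in subsystems.values())

subsystems = {
    "Setosa":     [((52, 79), (20, 42), (24, 64), (6, 23))],
    "Versicolor": [((43, 79), (20, 44), (10, 20), (19, 25))],
    "Virginica":  [((43, 77), (32, 42), (11, 49), (1, 16))],
}
print(is_vulnerability(subsystems, (40, 19, 21, 5)))  # True: no subsystem covers this pattern

The harder question, taken up next, is which subsystems should receive the detectors that the GBT designs for such patterns.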
For example, in the case of the Fisher?s Iris dataset, if the GRT discovers a pattern that is not detected by all the three subsystems and this pattern actually represents a Versicolor iris, then detector that matches this pattern should be added to both Setosa and Virginica subsystems. However, which subsystems the detectors should be added to is not easily decided. GENERTIA has no way to make such a decision without the help from a domain expert. Unfortunately, domain experts were not available for this research 13 . In later experiments, these detectors are added to an additional set, and vulnerabilities discovered later are the samples that cannot be detected by all the detector sets. The purpose of these experiments is to show that the number of vulnerabilities of the subsystems in the Anomaly Detection-Based Classifier can be reduced by GENERTIA, which may improve the performance of the Classifier by reducing the number of patterns that cannot be classified. This is measured by the number of GRT iterations needed to find vulnerabilities in the classifier, because, as discussed in Chapter 5, the more GRT iteration needed the fewer vulnerabilities in the classifier. 13 The American Iris Society and all the horticulture departments of the SEC universities have been contacted for assistance in iris pattern classification without success. We contacted the UAB medical school, and they couldn?t help classify the breast cancer patterns. 114 In this section, classifier for the Fisher?s Iris Dataset consists of three subsystems and the classifier for the Wisconsin Breast Cancer Dataset consists of two subsystems, one for the benign patterns and one for the malignant patterns. These classifiers are trained with all the available patterns. Therefore the entire test set is now known to the classifier, and the classifier has 100% classification accuracy. Vulnerability analysis is performed on these ?perfect? classifiers 14 . First, a GRT was used to discover the 300 vulnerabilities in the system, and then a GBT was used to develop detectors that cover the holes discovered by the GRT. When these detectors were added to the system as an additional detector set, the vulnerabilities in the system is reduced ? here we assume that with domain expert knowledge, the newly designed detectors can be deployed properly. The GRT/GBT cycle was carried out twice, and vulnerability analysis was carried out on the system. The GRT iterations needed to find the first and the 300th vulnerabilities were recorded to show the vulnerabilities in the classifier are reduced. The experiment results shown in Figures 6.3 and 6.4 are the average of 10 runs for Fisher?s Iris Dataset and Wisconsin Breast Cancer Dataset, respectively. The x-axis is the GRT/GBT cycles consumed for building/redesigning the system, and y-axis is the number of the GRT iterations needed to find the 1st (a) or the 300th (b) vulnerabilities in the classifiers. Cycle 0 marks the number of GRT iterations needed to find vulnerabilities 14 Certainly training with entire dataset makes the classifiers overfit; however, this does not matter because our purpose is to show that GENERTIA can be employed to perform vulnerability analysis and system redesign of the classifier. 
Figure 6.3 Vulnerability analysis of the classifier for Fisher's Iris Dataset. (a) The GRT effort (GRT iterations) needed to find the 1st vulnerability, versus the number of GRT/GBT cycles; (b) the GRT effort needed to find the 300th vulnerability.
Figure 6.4 Vulnerability analysis of the classifier for the Wisconsin Breast Cancer Dataset. (a) The GRT effort needed to find the 1st vulnerability, versus the number of GRT/GBT cycles; (b) the GRT effort needed to find the 300th vulnerability.
in the initial system. Cycle 1 marks the number of GRT iterations needed to find vulnerabilities after the additional detector set is added to the system. Cycle 2 marks the number of GRT iterations needed to find vulnerabilities after a second GRT/GBT cycle. The curves in these figures have the same shape. One can see from these figures that GENERTIA quickly reduces the vulnerabilities in the classifiers. During each GRT/GBT cycle, the average number of detectors added to the system is less than 3 for the classifiers of both datasets. Assuming that the number of GRT iterations needed to find a vulnerability is inversely related to the number of patterns the system cannot classify, the GRT/GBT cycles reduce these patterns in an effective manner: adding a small number of detectors can exponentially reduce the number of patterns that cannot be classified by the classifier.
6.4 Discussion
In this chapter, we propose a novel classification system based on a constraint-based anomaly detection system. The genetic algorithm employed by the GENERTIA blue team is used to generate a very compact detector set that has a high detection rate and a low false positive rate. Experimental results showed that the classification accuracy of the system is favorably comparable to well-known classification systems. We also suggest that constraint-based detectors can easily be converted into rules for a logical understanding of the data being classified.
Although the classifier performs satisfactorily on the Fisher's Iris Dataset and the Wisconsin Breast Cancer Dataset, vulnerability analysis shows that the GRT can easily find patterns that cannot be classified by the classifier. The GBT can be employed to design detectors to cover the identified vulnerabilities. However, in order to properly deploy the newly designed detectors, domain expert knowledge is needed to classify the vulnerabilities. Since the classifier is based on negative detectors, and since false negatives are inevitable because the training set is not complete, there will always be patterns that cannot be classified. In the next chapter, we test the classifier on a third dataset and show that GENERTIA can be employed to improve the performance of a classifier with the help of a "virtual expert".
CHAPTER 7
APPLYING GENERTIA TO ANOMALY DETECTION-BASED CLASSIFIER
In this chapter, we apply GENERTIA to the proposed classification system to improve its classification accuracy. The accuracy of the proposed classifier depends on the performance of its subsystems.
Patterns that are not correctly classified, including misclassified and ?unclassified?, are resulted from the two types of errors made by the subsystems: the false positives and false negatives. The GENERTIA cycles can be employed to reduce the number of ?unclassified? samples by reducing the false negatives of the subsystems. 7.1 Introduction An anomaly detection based classification system was presented in Chapter 6. The proposed classification system consists of a group of subsystems. For each subsystem, an AIS is built that contains a set of negative detectors, which can be generated through the Negative Selection Algorithm (NSA) or the GENERTIA blue team (GBT). As mentioned in Chapter 4, there are two types of errors associated with a hypothesis obtained with any learning algorithm, false positives and false negatives. These errors 120 will cause ?misclassified? and / or ?unclassified? errors in the proposed classification system. The errors of misclassification are usually fewer than errors resulted from ?unclassified? samples. This is because a misclassification indicates that at least two subsystems simultaneously misclassify this pattern: one subsystem declares it abnormal but it actually does belong to this class, while another subsystem declares it normal but it actually does not belong to that class. From the experiment results in the previous chapter, we can see that the accuracy of the system is primarily decided by the number of unclassified patterns. So, reducing the number of unclassified patterns is important in order to improve the performance of the proposed classification system. Both false positives and false negatives in a subsystem may lead to patterns being ?unclassified?. Both false positives and false negatives are inevitable unless the training sets are complete but this is usually not true. The false negatives indicate the existence of vulnerabilities in the AIS. As discussed in previous chapters, simply increasing the size of the detector set using the NSA is not sufficient to reduce the number of vulnerabilities. One reason is that, because of the coverage overlaps among detectors, the gain in increased pattern space coverage is usually not proportional to the increase in the size of the detector. The other reason is that increase in the detector set also results in the increase in the false positive rate. A good learning algorithm should increase the 121 detection rate (because of increased pattern space coverage) with minimum increase in the false positive rate. In this chapter, the proposed classification system is tested with a third data set, the Indian Pima Diabetes Dataset, and efforts are made to improve its accuracy. 7.2 The Pima Indian Diabetes Dataset The Pima Indian Diabetes Dataset is a well-known benchmark dataset for machine learning research. A variety of classification algorithms have been tested on this dataset and no algorithm performed exceptionally well. Duch et al. [Duch 2002a] have collected the various past results on this dataset, as shown in Table 7.1. From this table we can see that the best of these results is at 77.7% with Logdisc. The Pima Indian Diabetes dataset contains two classes of samples, healthy and diabetes. These samples have 8 attributes, including number of pregnancies, plasma glucose concentration in an oral glucose tolerance test, diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), body mass index (kg/m 2 ), 2-hour serum insulin (mu U/ml), diabetes pedigree function, and age (years). 
Each sample is assigned a yes-or-no sign for diabetic test according to WHO criteria. Totally, there are 768 samples in the dataset, with 500 samples (65.1%) being healthy, and 268 samples (34.9%) being diabetes. Therefore, the best reported classification result on this dataset is just a little 122 more than 10% higher than the default accuracy (65.1%, which is obtained by classifying every sample as healthy), which means that this dataset presents a difficult classification problem. Table 7.1 Some Previous Results for the Pima India Diabetes Dataset (Source: [Duch 2002a]) Method Accuracy % Logdisc (Logistic Discrimination) 77.7 DIPOL92 77.6 Linear Discriminative Analysis 77.5-77.2 SMART (Projection Pursuit Regression) 76.8 Fisher Discriminative Analysis 76.5 MLP+BP (Multi-Layer Perceptron + Back Propagation) 76.4 LVQ (Learning Vector Quantization) 75.8 RBF (radial basis function) 75.7 SSV DT (Separability Split Value Decision Tree) 73.7? 4.7 C4.5 DT (C4.5 Decision Tree) 73.0 Bayesian 72.2? 6.9 CART (Classification And Regression Trees) 72.8 SOM (Self-Organizing Map) 72.7 ID3 (Iterative Dichotomizer version 3) 71.7? 6.6 IBL (Instance Based Learning) 70.4? 6.2 kNN (k Nearest Neighbors) 67.6 C4.5 rules 67.0?2.9 OCN2 (Ordered CN2 induction algorithm) 65.1? 1.1 Default 65.1 QDA (Quadratic Discriminative Analysis) 59.5 123 All of the 8 attributes are continuous values, with each attribute having different data range. In order to be processed with the constraint-based anomaly detection system, this dataset was first discretized using following steps: 1. For each of the 8 attributes, the values from all the samples are collected and sorted. 2. The sorted values were equally divided into 40 sections. 3. Each section, defined as a value range with lower and upper values on the section boundaries, is assigned an integer starting from 0 in the ascending order. Now, for a given sample, each value of its attributes can be mapped into a certain integer depending on the value range to which the value belongs. For example, for a given sample, if its value for the diabetes pedigree function is between 0.349 - 0.371, it will be assigned an integer of 18 for this attribute. Processed in this way, each attribute should have a data range of 0 ? 39. However, for some attributes, there are more than 40 samples having the same value for a given attribute; therefore, the data ranges for these attributes were smaller than 39. The data ranges of the 8 attributes were 12, 39, 23, 28, 21, 39, 39, and 27 respectively. This means the size of the sample space is around 260 billion. 124 The choice of dividing the raw data into 40 sections was arbitrarily made, but it should be noted that this choice may affect the performance of the classifier. If a larger number is chosen, the sample space will be larger for the system to deal with, on the other hand, if a smaller number is chosen, the sample space will be smaller, but the normal samples and the abnormal samples may be closer to each other, blurring the difference between the two. It is also worth noting that, after the dataset is discretized this way, all the samples are distinct, which means that there is no sample occurring in both the normal and abnormal sets. 7.3 Initial Result: Classifiers without GENERTIA redesign In this section, the two implementations of classifiers were tested: classifiers with detectors generated from random generation and negative selection (RGNS), and classifiers with detectors designed by the GBT. 
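A minimal sketch of the equal-frequency discretization described above is given below. The function names are ours, and repeated raw values are handled only crudely, which is the same reason some attributes end up with fewer than 40 distinct codes; it is an illustration of the idea, not the exact preprocessing code used here.

def build_bins(values, n_sections=40):
    # Equal-frequency binning: sort the observed values of one attribute and cut
    # the sorted list into n_sections slices; each slice's (min, max) becomes a bin.
    ordered = sorted(values)
    size = max(1, len(ordered) // n_sections)
    bins = []
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        bins.append((chunk[0], chunk[-1]))
    # Collapse consecutive identical ranges caused by repeated raw values.
    merged = []
    for b in bins:
        if not merged or b != merged[-1]:
            merged.append(b)
    return merged

def discretize(value, bins):
    # Map a raw value to the integer code of the first bin whose upper bound covers it.
    for code, (_, hi) in enumerate(bins):
        if value <= hi:
            return code
    return len(bins) - 1

column = [0.12, 0.35, 0.35, 0.37, 0.41, 0.62, 0.88, 1.20]   # hypothetical attribute values
bins = build_bins(column, n_sections=4)
print([discretize(v, bins) for v in column])                # -> [0, 0, 0, 1, 2, 2, 3, 3]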
7.3 Initial Results: Classifiers without GENERTIA Redesign

In this section, two implementations of the classifier were tested: classifiers with detectors generated by random generation and negative selection (RGNS), and classifiers with detectors designed by the GBT. Since the dataset has only two classes, a classifier can be implemented with two subsystems, one for each class; it can also be implemented with a single subsystem, in which case the classification problem is treated as an anomaly detection problem. Therefore, there were four test cases in total.

In the first test case, a classifier with a single subsystem of randomly generated detectors was tested. The healthy samples were used as normal, and the detectors of the subsystem should detect the diabetes samples as abnormal. 70% (350) of randomly selected healthy samples were used for training and the remaining 418 samples were used for testing. Since there are 8 attributes associated with each sample, there are 8 possible values for the matching threshold of the detectors, and all 8 settings were tested. The classifier was also tested with detector sets of different sizes. For each setting, the experiment was repeated 10 times and the average results are shown in Tables 7.2 and 7.3. Table 7.2 shows the details of the classification errors, in terms of false negatives and false positives. Table 7.3 shows the corresponding accuracy, calculated as one minus the number of errors divided by the number of samples in the testing set.

From Table 7.3 we can see that increasing the size of the detector set resulted in higher accuracy, and from Table 7.2 we can see that the increase in accuracy was due to the decrease in the number of false negatives. The reason that the classifiers with higher detector matching thresholds had higher accuracy than the ones with lower matching thresholds was that the former were more general and tended to match more patterns, and thus produced fewer false negatives. However, increasing the size of the detector set also caused a rapid increase in the number of false positives, and even with 6400 detectors the accuracy of the classifier was less than 60%, which is lower than the default accuracy of 65%. The performance of this classifier was very poor.

[Tables 7.2 and 7.3: classification errors (false negatives and false positives) and the corresponding accuracy for the single-subsystem classifier with randomly generated detectors.]

In the second test case, a classifier with a single subsystem consisting of detectors designed by the GBT was tested. Again, the healthy samples were used as normal and the diabetes samples as abnormal. 70% (350) of randomly selected healthy samples together with 70% (187) of randomly selected diabetes samples were used by the GBT to design detectors; the remaining 231 samples were used for testing. All 8 possible settings of the matching threshold were tested. Table 7.4 shows the average results of 10 runs for each experimental setup. As expected, the detectors designed by the GBT performed much better than the randomly generated detectors, for the same reason explained in Chapter 5. The accuracy of the classifiers, however, is only slightly higher than the default accuracy.

Table 7.4 Results for AIS with GBT-Designed Detectors (70% of the samples were used for training and the remaining 231 samples (30%) for testing)

Matching Threshold   False Negatives   False Positives   Total Errors   Accuracy
3                    29.0              55.0              84.0           63.64%
4                    31.7              47.7              79.3           65.66%
5                    39.7              37.7              77.3           66.52%
6                    35.0              41.3              76.3           66.96%
7                    38.0              36.7              74.7           67.68%
8                    36.3              35.7              72.0           68.83%
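For concreteness, the following sketch shows how a single-subsystem classifier of this kind can be read: a constraint-based detector carries one interval per attribute, a pattern is taken to match a detector when at least r of its attribute values fall inside the corresponding intervals (one plausible reading of the Any-r-interval threshold of Chapter 3; the precise semantics used in the experiments are those of Chapter 3), and a pattern is flagged abnormal if any detector matches it. The class and function names are illustrative.

    # Illustrative sketch of constraint-based detectors and threshold matching.
    # Interval bounds are inclusive; attribute values are the discretized integers.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Detector:
        intervals: List[Tuple[int, int]]   # one (low, high) interval per attribute

    def matches(detector: Detector, pattern: List[int], r: int) -> bool:
        """Assumed Any-r-interval-style rule: match if at least r attribute
        values fall inside their corresponding intervals."""
        hits = sum(1 for (low, high), v in zip(detector.intervals, pattern)
                   if low <= v <= high)
        return hits >= r

    def classify_single_subsystem(detectors: List[Detector],
                                  pattern: List[int], r: int) -> str:
        """Single-subsystem classifier: any matching detector flags the pattern
        as abnormal ('diabetes'); otherwise it is treated as normal ('healthy')."""
        return "diabetes" if any(matches(d, pattern, r) for d in detectors) else "healthy"

Under this reading, a false negative is an abnormal pattern matched by no detector, and a false positive is a normal pattern matched by at least one detector.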
In the third test case, a classifier with two subsystems of randomly generated detectors was tested. In this experiment, 70% of randomly selected samples from both classes were used for negative selection of detectors for the two subsystems respectively; the remaining 231 samples were used for testing. In this case, as described in Chapter 6, the correct classification of a sample requires that both subsystems correctly classify it as normal or abnormal. There are three other possible situations, all of which result in an incorrect classification: 1) both subsystems classify the sample incorrectly, which results in a misclassification; 2) both subsystems classify it as normal; or 3) both classify it as abnormal. The sample is "unclassified" in the latter two situations.

When an "unclassification" occurs, the classifier is forced to make a guess. Since the healthy samples clearly outnumber the diabetes samples (which suggests that the healthy samples occupy a larger portion of the sample space), an intuitive solution is to bias the guess toward healthy when an unclassification takes place, in other words, to classify "unclassified" samples as healthy. Tables 7.5 and 7.6 show the average results of 10 runs for each experimental setup. Table 7.5 shows the details of the numbers of misclassifications and unclassifications, and Table 7.6 shows the results from the same experiment runs after the bias toward healthy is used to assign a class to unclassified samples.

In the fourth test case, the detectors of the two subsystems were designed by the GBT, using the same implementation of the GBT as detailed in Chapter 6. Table 7.7 shows the average results of 10 runs for each experimental setup. Again, the detectors designed by the GBT performed better than the randomly generated detectors.

[Tables 7.5 and 7.6: misclassifications and unclassifications for the 2-subsystem classifier with randomly generated detectors, without and with the default bias toward healthy.]

Table 7.7 Results for 2-Subsystem Classifier with GBT-Designed Detectors (70% of the samples were used for training and the remaining 231 samples (30%) for testing)

Matching Threshold   Error   Misclassified   Unclassified   Misclassified-by-Default   Total Misclassified   Accuracy
3                    123.5   35.5            88.0           37.9                       73.4                  68.23%
4                    119.2   37.4            81.8           34.7                       72.1                  68.79%
5                    116.5   40.0            76.5           33.3                       73.3                  68.27%
6                    114.0   35.1            78.9           35.1                       70.2                  69.61%
7                    112.7   35.4            77.3           34.6                       70.0                  69.71%
8                    114.9   36.2            78.7           34.6                       70.8                  69.35%

In this and the following tables, "Error" is the total number of misclassified and unclassified samples, "Misclassified-by-Default" is the number of unclassified samples that the default guess assigns to the wrong class, and "Total Misclassified" is the sum of "Misclassified" and "Misclassified-by-Default", from which the accuracy is computed over the 231 test samples.

From these results, one can see that the performance of these classification systems was very poor; even the best accuracies were only slightly higher than the default value. These unsatisfying results may seem surprising, given that in Chapter 6 the classifier performed very well on the other two datasets. The main reason is the large overlap between the normal (healthy) and abnormal (diabetes) regions of the sample space in this dataset: the proposed classification system is based on anomaly detection, but the normal samples in this dataset are not very different from the abnormal samples.

There is also another reason for the unsatisfying performance of the system. From those tables, one can see that the classification errors were mainly caused by unclassification, and a careful review of the results revealed that more than half of these unclassifications were due to both subsystems classifying a sample as normal, which means both subsystems produced many false negatives. As discussed in previous chapters, the existence of false negatives indicates vulnerabilities in an anomaly detection system. Since GENERTIA was developed to detect and cover the vulnerabilities in such a system, it can hopefully be applied to improve the performance of the classification system.
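To make the preceding discussion concrete, the sketch below shows one way the two subsystems' decisions can be combined, including the default bias toward healthy used above. It assumes each subsystem exposes a detect() call that returns True when any of its detectors matches the pattern; the names are illustrative, not the dissertation's implementation.

    # Combining the two subsystems' outputs (sketch; names are illustrative).
    # subsystem_healthy treats healthy samples as normal (detects diabetes);
    # subsystem_diabetes treats diabetes samples as normal (detects healthy).

    def combine(subsystem_healthy, subsystem_diabetes, pattern, default="healthy"):
        detected_by_healthy_side = subsystem_healthy.detect(pattern)    # True -> "looks like diabetes"
        detected_by_diabetes_side = subsystem_diabetes.detect(pattern)  # True -> "looks like healthy"

        if not detected_by_healthy_side and detected_by_diabetes_side:
            return "healthy"        # both subsystems agree on healthy
        if detected_by_healthy_side and not detected_by_diabetes_side:
            return "diabetes"       # both subsystems agree on diabetes
        # Both say normal or both say abnormal: the sample is unclassified,
        # and the classifier falls back on the default guess.
        return default

In the unclassified branch, the case where neither subsystem detects the pattern corresponds to simultaneous false negatives, which is exactly the situation that dominates the errors reported above.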
In the following sections, efforts were made to improve the classifier with two subsystems whose detectors were designed by the GBT. The other three implementations were abandoned for two reasons. First, the GBT-designed subsystems performed better than the subsystems with randomly generated detectors. Second, with two subsystems in the classifier, we can focus on reducing the number of unclassifications, since the two subsystems provide a cross-check mechanism that results in fewer misclassifications.

7.4 Applying GENERTIA to the Classification System

The proposed classification system employs a GBT to design detectors using the known positive samples as abnormal patterns, and the learning stops when all the known abnormal patterns are covered by the detector set. One major drawback of this detector generation approach is that, when the known positive samples are not representative of the whole positive pattern space, as is the case for the Pima Indian Diabetes Dataset, the resulting AIS may contain a large number of vulnerabilities. The GENERTIA concept can be applied to the individual subsystems to improve the overall performance of the proposed classification system. After the individual subsystems are built, the GENERTIA red team (GRT) and GBT cycles can be applied to the AISs in the subsystems to reduce their vulnerabilities and, hopefully, reduce the number of unclassifications.

A small modification has been made to the GBT. The original blue team tries to design detectors for every vulnerability, which means that the GBT returns only when all the vulnerabilities have been covered by newly designed detectors. In the modified algorithm, the GBT is given a limit on the number of iterations, and it exits if no valid candidate detector can be produced within that limit. There are two motivations for this modification. First, since the normal and abnormal patterns overlap heavily in the pattern space, some abnormal patterns may lie so close to the normal patterns that detectors designed for them would inevitably cover even more normal patterns, producing more false positives than the false negatives that would occur if these detectors were not designed. Second, because these abnormal patterns are so close to normal patterns, it is very difficult to design detectors for them, and the modification prevents the GBT from running forever.

There are two strategies for redesigning the classifier using GENERTIA: redesigning both subsystems in parallel, or redesigning only one subsystem. Both strategies were tested in this section. The reason for redesigning only one subsystem is discussed later in this section.

With the first redesign strategy, GENERTIA was applied to both subsystems to reduce the number of false negatives in the two AISs. The experiments proceeded as follows:

1. A classifier with two subsystems was built using the GBT, with 70% randomly selected healthy and diabetes samples as training sets.
2. The GRT was given a limit of 1,000,000 iterations to discover 100 vulnerabilities in one subsystem.
3. The GBT was used to design detectors for that subsystem to cover the vulnerabilities discovered in Step 2. The GBT stops if no valid detector can be found within a limit of 1,000,000 iterations, and any remaining vulnerabilities are discarded.
4. Repeat Steps 2-3 for the other subsystem.
5. Repeat Steps 2-4 until the GRT cannot find 100 vulnerabilities within the given iteration limit for one of the subsystems.
6. Test the classifier with the whole dataset, assigning the default value, "healthy", whenever an unclassification occurs. Record the numbers of unclassifications and misclassifications.
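A compact sketch of this redesign loop (Steps 2-5) is given below. The two callables and the add_detectors call are hypothetical stand-ins for the GRT and GBT searches described in Chapter 4; only the control flow and the iteration budgets come from the steps above.

    # Sketch of the parallel redesign loop (Steps 2-5 above); interface names
    # are hypothetical stand-ins for the GRT/GBT implementations of Chapter 4.

    GRT_LIMIT = 1_000_000   # iteration budget for vulnerability discovery
    GBT_LIMIT = 1_000_000   # iteration budget for detector design
    BATCH = 100             # vulnerabilities requested per red-team call

    def redesign_both(subsystems, find_vulnerabilities, design_detectors):
        while True:
            for ais in subsystems:
                # Step 2: probe this subsystem for patterns it fails to detect.
                vulns = find_vulnerabilities(ais, BATCH, GRT_LIMIT)
                if len(vulns) < BATCH:
                    return subsystems        # Step 5: stop when the GRT runs dry
                # Step 3: design covering detectors; vulnerabilities the
                # iteration-limited GBT cannot cover are simply discarded.
                ais.add_detectors(design_detectors(ais, vulns, GBT_LIMIT))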
Table 7.8 shows the experimental results. Compared with the results in Table 7.7, where the classifiers were not redesigned by GENERTIA, one can see that the performance of the classifiers surprisingly degraded after the GENERTIA redesign, with the accuracy dropping from around 69% to 65%. A careful reading of the two tables shows that the redesign introduced a large number of unclassified samples, even though the number of misclassifications was reduced significantly. These samples were unclassified because both subsystems classified them as abnormal. In other words, the AISs in both subsystems were making many false positives, and around 85% of the samples were unclassified. These unclassified samples were assigned the default value "healthy", so the overall accuracy of the classifiers was very close to the default accuracy of 65%.

Table 7.8 Results for 2-Subsystem Classifier with Both Subsystems Redesigned by GENERTIA (70% of the samples were used for training and the remaining 231 samples (30%) for testing)

Matching Threshold   Error   Misclassified   Unclassified   Misclassified-by-Default   Total Misclassified   Accuracy
3                    216.1   5.6             210.5          75.8                       81.4                  64.76%
4                    212.0   4.9             207.1          77.2                       82.1                  64.46%
5                    212.7   10.6            202.1          70.7                       81.3                  64.81%
6                    209.5   7.6             201.9          71.6                       79.2                  65.71%
7                    209.8   9.2             200.6          71.9                       81.1                  64.59%
8                    211.1   5.0             206.1          76.0                       81.0                  64.94%

With the second redesign strategy, GENERTIA was employed to reduce the number of false negatives in only one AIS. In this case, the two subsystems are "unbalanced" in terms of their ability to detect anomalies, with the redesigned one being more powerful. The subsystem chosen for GENERTIA redesign was the one that treated healthy samples as normal and diabetes samples as abnormal. The reason for this choice follows from the default value that is assigned when an unclassification occurs: the classifier assigns "healthy" to unclassified samples. By redesigning the subsystem that treats healthy samples as normal, it becomes much safer to assign the default value to unclassified samples. The idea is that, given an unclassified pattern, if it is detected by neither of the two AISs, then it is more likely that the weaker one caused a false negative; on the other hand, if the pattern is detected by both subsystems, then chances are that the weaker one is correct, because the stronger one tends to make false positive errors. Therefore, when a sample cannot be classified: if both subsystems detected it, it is classified as healthy, because the redesigned subsystem (the stronger one) has probably made a false positive; if neither subsystem detected it, it is still classified as healthy, because the subsystem that has not been redesigned (the weaker one) has probably made a false negative.

The following experiments followed the same steps listed above, except that only one subsystem was redesigned, the one that treats the healthy samples as normal. The results of these experiments are shown in Table 7.9.
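Before reading Table 7.9, it may help to spell out how the accuracy figures in Tables 7.7 through 7.10 follow from the error counts. The short check below is plain arithmetic on the matching-threshold-6 row of Table 7.8 and reproduces the reported value; the variable names are illustrative.

    # How the table columns relate (worked check on the threshold-6 row of Table 7.8).
    TEST_SET_SIZE = 231

    misclassified = 7.6               # both subsystems wrong
    unclassified = 201.9              # both say normal, or both say abnormal
    misclassified_by_default = 71.6   # unclassified samples the default guess gets wrong

    error = misclassified + unclassified                              # 209.5 ("Error" column)
    total_misclassified = misclassified + misclassified_by_default    # 79.2
    accuracy = 1 - total_misclassified / TEST_SET_SIZE                # 0.6571 -> 65.71%

    print(error, total_misclassified, round(100 * accuracy, 2))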
Table 7.9 Results for 2-Subsystem Classifier with One Subsystem Redesigned by GENERTIA (70% of the samples were used for training and the remaining 231 samples (30%) for testing; the redesigned subsystem was the one that treats healthy samples as normal)

Matching Threshold   Error   Misclassified   Unclassified   Misclassified-by-Default   Total Misclassified   Accuracy
3                    127.8   34.3            93.5           31.6                       65.9                  71.49%
4                    124.8   35.3            90.5           31.6                       66.9                  71.06%
5                    129.7   36.6            93.1           28.5                       65.1                  71.80%
6                    125.4   33.8            91.6           30.9                       64.7                  72.01%
7                    119.6   31.1            88.5           33.6                       64.6                  71.97%
8                    121.3   32.7            88.6           33.7                       66.4                  71.24%

Compared with Table 7.7, one can see that the performance of the classifiers clearly improved, with the classification accuracy increasing from around 69% to around 71.5%. A careful review of these two tables reveals an interesting point. The redesigned classifiers had more errors (the total number of misclassified and unclassified samples) than the classifiers without redesign, because more samples became unclassified. However, the redesigned classifiers made fewer misclassification errors, and fewer misclassification errors resulted from the unclassified samples when they were subjected to the default classification function.

The GENERTIA redesign process had two effects. First, it increased the number of false positives, which increased the number of unclassifications; second, it reduced the number of false negatives made by the redesigned subsystem. In the following discussion, Subsystem0 is the subsystem that treats healthy samples as normal and Subsystem1 is the one that treats diabetes samples as normal.

The first effect can be illustrated by the following scenario. A healthy sample is detected by Subsystem1 but not by Subsystem0, and is therefore correctly classified. After the GENERTIA redesign, Subsystem0 is able to detect this sample, which results in an unclassification; the sample is then correctly classified only because of the default classification function.

The second effect can be illustrated by an unclassified sample that would be misclassified by default, for example a diabetes sample that is detected by neither subsystem before the redesign and is therefore assigned "healthy". After the GENERTIA redesign, Subsystem0 detects it, and it is correctly classified.

In the following case, however, the GENERTIA redesign introduces false positives and negatively affects the performance of the classifier. Consider a healthy sample that Subsystem1 fails to detect. Before the redesign it is unclassified and is assigned "healthy" by default, which happens to be correct; after the redesign Subsystem0 detects it, both subsystems are now wrong, and the false positive results in a misclassification. False positives are introduced when the vulnerabilities discovered by the GRT are actually unknown normal patterns and the GBT designs detectors for these patterns. Therefore, it is important to consider minimizing false positives when using GENERTIA to redesign the classifier.

7.5 A Virtual Expert

GENERTIA was originally developed to reduce false negatives in anomaly detection systems. If a system has been redesigned to detect more anomalies, it also tends to generate more false positives. Unlike the intrusion detection domain of Chapter 5, where false negatives (undetected attacks) may be considered a more serious problem than false positives (normal traffic flagged as attacks), in a classification application false positives and false negatives are equally undesirable. When applied to classification systems, GENERTIA therefore needs a mechanism for minimizing false positives.
To minimize the false positives, the candidate vulnerabilities discovered by the GRT should be confirmed to be real vulnerabilities (rather than some form of generalization of normal patterns) before they are used by the blue team to design detectors. Ideally, for a real classification application, this should be done by a domain expert. For example, in the case of developing a classifier that predicts the presence of diabetes from the 8 given covariates, a doctor is needed to decide whether the patterns that the current classifier cannot correctly classify represent healthy or diabetes. It should be noted that adding the role of such a domain expert makes GENERTIA an interactive tool for the design and/or redesign of AISs for anomaly detection.

For this experiment, a virtual expert was developed to take the role of a doctor with expertise in predicting the presence of diabetes given a pattern consisting of the 8 attributes. A rule-based classifier was built from the best reported rules discovered by other machine learning algorithms. Two rules, obtained from [Duch 2002b], were employed to develop the rule-based classifier. These two rules are:

1. IF f2 > 144.5 OR (f2 > 123.5 AND f6 > 32.55) THEN diabetes, ELSE healthy. This rule is from SSV, with an overall accuracy of 76.2%.
2. IF (f2 <= 151 AND f6 <= 47) THEN healthy, ELSE diabetes. This rule is from C-MLP2LN with L-units, with an overall accuracy of 75%.

In these two rules, f2 and f6 stand for the second and sixth attributes of the samples. Combining these two rules, we have: "IF f2 > 144.5 OR (f2 > 123.5 AND f6 > 32.55) OR NOT (f2 <= 151 AND f6 <= 47) THEN diabetes, ELSE healthy", with an overall accuracy of 76.4%. After converting the numbers in the rule to their corresponding integers according to the intervals obtained in the discretization process, the rule becomes "IF (f2 > 22 OR f2 > 17 AND f6 > 16 OR NOT (f2 < 23 AND f6 < 29)) THEN diabetes, ELSE healthy", with an overall accuracy of 76.4%. A GA was then used to optimize these parameters, yielding a number of rules with high accuracy. From these rules, one was selected and simplified as "IF (f2 > 25 AND f6 > 15 OR f2 >= 35 OR f6 >= 39) THEN diabetes, ELSE healthy", which has an accuracy of 78%, better than the best result obtained from the known classifiers shown in Table 7.1. Converted back to the original measurements, this rule becomes "IF (f2 > 127 AND f6 > 29.9 OR f2 >= 160 OR f6 >= 49) THEN diabetes, ELSE healthy".

The confusion matrix of this virtual expert rule is:

                         Actual Healthy   Actual Diabetes
Predicted Healthy             435              104
Predicted Diabetes             65              164

From this confusion matrix, one can see that the virtual expert tends to misclassify diabetes as healthy (104 of the 268 diabetes samples) and has high accuracy in predicting healthy samples (435 of the 500 healthy samples are predicted correctly). This means the vulnerabilities confirmed by this virtual expert are very likely to be real vulnerabilities. Although this also means that some real vulnerabilities discovered by the GRT may be disconfirmed by the virtual expert, this is not very harmful to the redesign process as long as the confirmed vulnerabilities are real ones. The confusion matrix also suggests that the disconfirmed vulnerabilities should not be regarded as normal patterns and added to the normal set, because the probability of these patterns being abnormal is still somewhat high. Although the accuracy of this virtual expert is far from perfect, it is better than that of the best known classifiers for the Pima Indian Diabetes Dataset.
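A minimal sketch of this virtual expert, written directly from the simplified rule above in its raw-measurement form, is given below, together with its use as a filter on the red team's candidate vulnerabilities. The function names and the decode helper (mapping a discretized pattern back to raw f2 and f6) are illustrative assumptions, not the dissertation's code.

    # Virtual expert built from the simplified rule reported above
    # (f2: plasma glucose concentration, f6: body mass index).

    def virtual_expert(f2, f6):
        """Return 'diabetes' or 'healthy' for a sample, using raw measurements."""
        if (f2 > 127 and f6 > 29.9) or f2 >= 160 or f6 >= 49:
            return "diabetes"
        return "healthy"

    def confirm_vulnerabilities(candidates, decode):
        """Keep only the GRT candidates the expert labels 'diabetes'; 'decode' is
        a hypothetical helper recovering raw f2 and f6 from a discretized pattern."""
        confirmed = []
        for pattern in candidates:
            f2, f6 = decode(pattern)
            if virtual_expert(f2, f6) == "diabetes":
                confirmed.append(pattern)
        return confirmed

Applied to the 768 samples, the rule reproduces the confusion matrix above: 435 of the 500 healthy samples and 164 of the 268 diabetes samples are classified correctly, for an overall accuracy of (435 + 164)/768, roughly 78%.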
In any case, having this virtual expert check the discovered vulnerabilities is better than not having one at all. In the following section, this virtual expert is used to check the vulnerabilities discovered by the GRT, and only the confirmed ones are given to the GBT to design detectors.

7.6 GENERTIA Redesign with Virtual Expert

The experiments conducted in this section were the same as the ones in Section 7.4 where GENERTIA was applied to redesign only one subsystem, except that an additional step was inserted into the GRT/GBT cycles: the vulnerabilities discovered by the GRT were checked by the virtual expert, and only those confirmed by the virtual expert to be real vulnerabilities were given to the GBT to design corresponding detectors; the rest were simply discarded. For each parameter setup, the experiment was repeated 10 times, and Table 7.10 shows the average results of these experiments. Comparing Tables 7.10 and 7.9, one can see that the virtual expert improved the performance of the classifiers. Both the misclassified and the unclassified samples were fewer than the corresponding errors made by classifiers redesigned without the help of the virtual expert.

Table 7.10 Results for Classifier Redesigned by GENERTIA with Virtual Expert (70% of the samples were used for training and the remaining 231 samples (30%) for testing; the redesigned subsystem was the one that treats healthy samples as normal)

Matching Threshold   Error   Misclassified   Unclassified   Misclassified-by-Default   Total Misclassified   Accuracy
3                    113.8   31.4            82.4           29.6                       61.0                  73.59%
4                    109.1   32.1            77.0           27.8                       59.9                  74.09%
5                    110.2   30.6            79.6           31.5                       62.1                  73.13%
6                    106.4   27.8            78.6           32.0                       59.8                  74.11%
7                    104.6   28.2            76.4           31.3                       59.5                  74.23%
8                    103.3   30.5            72.8           29.9                       60.4                  73.86%

7.7 Summary

In Chapter 5, GENERTIA was applied to improve the performance of an AIS-based intrusion detection system. In this chapter, GENERTIA was applied to another application, the proposed anomaly detection-based classification system. The GRT can be applied to discover the vulnerabilities in the classification system; specifically, the GRT can be used to identify the patterns that are classified as "self" by more than one subsystem. If we assume that each pattern belongs to only one class, then these patterns represent vulnerabilities in one or more subsystems. Domain expert knowledge is needed to confirm the actual classes that these patterns belong to, and then the GBT can be used to design detectors for these patterns.

The proposed classifier had poor performance on a third dataset, the Pima Indian Diabetes Dataset. Experiments showed that the performance of the classifier can be improved with the GENERTIA redesign process.

CHAPTER 8
CONCLUSION AND FUTURE WORK

8.1 Research Objectives

The goal of this dissertation is to improve the performance of an Artificial Immune System for anomaly detection. Typically, an AIS uses the Negative Selection Algorithm to generate negative detectors, which are capable of detecting patterns possessing characteristics different from the normal patterns used for negative selection. An AIS can be described with three factors: the representation of the self/nonself patterns and the detectors, the matching rule that decides how patterns are matched by detectors, and an algorithm that describes how the detectors are generated, killed and regenerated.
Accordingly, the work in this dissertation was focused on these three factors, and the objectives of this dissertation include: proposing new representations for self/nonself patterns and detectors, proposing new matching rules, and proposing new algorithms for generating detectors.

8.2 Dissertation Summary

New representations for self/nonself patterns and detectors have been proposed in this research. AISs based on constraint-based detectors and the corresponding matching rules, including the Any-r-interval and Contiguous-r-interval matching rules, have been implemented. With the new representations, self/nonself patterns are in the form of vectors of integers, and detectors are in the form of vectors of intervals. This high-level representation facilitates the abstraction of meaningful knowledge.

GENERTIA, a system for AIS vulnerability analysis and redesign, has been implemented. GENERTIA contains two parts: a red team that is used to discover vulnerabilities in an AIS, and a blue team that is used to design valid detectors for the discovered vulnerabilities. These two teams cooperate in a co-evolutionary fashion to redesign the AIS. To investigate the effectiveness of GENERTIA, it has been applied to two AIS-based systems: a network intrusion detection system (IDS) and an AIS-based classification system. Experiments show that GENERTIA can be used to improve the performance of AIS-based IDSs in terms of both efficiency and effectiveness.

The GENERTIA blue team provides a novel approach to the generation of detectors. This leads to the construction of an AIS-based classification system. This classifier consists of multiple subsystems in the form of AISs, one for each class of samples in the classification problem. The classification of a sample is decided by the combined results from all the AISs. The performance of this classification system has been tested on two benchmark datasets, the Wisconsin Breast Cancer Dataset and Fisher's Iris Dataset. Results show that the proposed classification system compares favorably with some well-known classification systems.

The proposed classifier performed poorly on a third dataset, the Pima Indian Diabetes Dataset. One main reason was that the subsystems made many false negatives, which left the classifier unable to classify the corresponding samples. GENERTIA can be used to reduce the vulnerabilities (which may result in false negatives) in the individual subsystems of the classification system. However, the GENERTIA redesign process also introduces more false positives, which likewise cause inaccurate classification. This is because some discovered vulnerabilities may actually be unknown normal samples rather than abnormal samples. To minimize the false positives, the vulnerabilities discovered by the GENERTIA red team need to be confirmed by domain experts before the blue team designs detectors for them. This makes GENERTIA an interactive tool for redesigning AIS-based anomaly detection systems. Experiments show that the application of GENERTIA to the classifier improved its performance on the Diabetes dataset.

8.3 Future Work

There are several directions for continuing research efforts. First, it is important to be able to mathematically characterize the efficiency and effectiveness of constraint-based AISs, in terms of the theoretical effort needed to generate a certain number of detectors and the theoretical probability of a pattern being (or not being) detected by a detector set of a given size. This work has been done for the binary-coded version.
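For reference, the commonly cited analysis of the binary-coded case (see, for example, [Forrest et al. 1994] and [D'haeseleer et al. 1996]) estimates the probability that a random r-contiguous-bits detector matches a random string of length l, and from it the number of detectors needed to reach a target failure probability. The sketch below records those standard approximations from the literature; they are not results derived in this dissertation, and the example parameters are arbitrary.

    from math import log

    def p_match_r_contiguous(l: int, r: int) -> float:
        """Approximate probability that a random binary string of length l is
        matched by a random detector under r-contiguous-bits matching:
        P_M ~= 2**(-r) * ((l - r)/2 + 1)."""
        return 2.0 ** (-r) * ((l - r) / 2.0 + 1.0)

    def detectors_needed(l: int, r: int, p_fail: float) -> float:
        """Approximate number of detectors needed so that the probability of an
        arbitrary nonself string escaping detection is at most p_fail:
        N_R ~= -ln(p_fail) / P_M."""
        return -log(p_fail) / p_match_r_contiguous(l, r)

    # Example: 32-bit strings, r = 8, 1% failure probability.
    print(detectors_needed(32, 8, 0.01))

Extending this kind of estimate to constraint-based detectors and interval matching is the open problem referred to above.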
The AISs implemented in this dissertation make a crisp boundary between the self and nonself space, which is based on the assumption that the normal is distinct from the abnormal. However, for some application domains there may be no such clear cut between the two spaces. It may therefore be interesting to build "fuzzy" constraint-based AISs. One possible approach involves building AISs with different levels of coverage, with the detectors at each level having a certain matching threshold. The matching threshold defines the generality or specificity of the matching between detectors and patterns, and detectors with a higher matching threshold are more specific than detectors with a lower matching threshold. Therefore, with a set of detectors with different matching thresholds, once an abnormal pattern is matched, it is possible to tell at which level the pattern is matched by the detector set, and matching by detectors with higher matching thresholds indicates a higher probability of the matched pattern being abnormal.

The GENERTIA blue team uses a GA to design detectors for the discovered vulnerabilities. A candidate detector is assigned a fitness value of zero if it matches any sample in the training set. While this works fine for datasets in which the normal and abnormal samples are distinct, for datasets such as the Pima Indian Diabetes Dataset, where the two classes have a large overlap in pattern space, it may be more effective to adopt a different candidate detector fitness function. For example, rather than assigning zero fitness to candidates that cover training samples, the fitness of a candidate could be penalized for each training sample it covers (a sketch of such a fitness function is given at the end of this section). This may generate detectors that cover a large portion of the abnormal space and only a little of the normal space.

Another interesting research direction is to abstract rules from the proposed classification system. Since the system is based on constraint-based detectors, the detectors are already in the form of M-of-N rules. However, it would be more useful if these rules could be simplified into a form that can be directly understood by a human reader. For complex datasets, such as the Pima Indian Diabetes Dataset, the detector set becomes much larger, and this simplification becomes even more necessary. It may require dissecting detectors into individual intervals and ranking the intervals according to the number of samples they match in each class.

These ideas, and others, extend beyond the research reported in this dissertation and might be directions for future research.
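As referenced above, a minimal sketch of the penalized candidate-detector fitness idea is given here. The covers predicate, the penalty weight, and the function name are illustrative assumptions; this is not the GBT's actual fitness function.

    # Sketch of a penalized candidate-detector fitness for overlapping classes.
    # 'covers(detector, pattern)' is any matching predicate (for example, the
    # interval rule sketched in Chapter 7). Instead of zeroing the fitness when
    # a normal training sample is covered, each covered normal sample subtracts
    # a fixed penalty.

    def penalized_fitness(detector, vulnerabilities, normal_samples, covers, penalty=1.0):
        covered_vulns = sum(1 for p in vulnerabilities if covers(detector, p))
        covered_normals = sum(1 for p in normal_samples if covers(detector, p))
        return covered_vulns - penalty * covered_normals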
REFERENCES

[Anderson et al. 1995] Anderson, D., Frivold, T., and Valdes, A. "Next-generation intrusion detection expert system (NIDES): a summary", Technical Report SRI-CSL-95-07, Computer Science Laboratory, SRI International, 1995, available at: http://www.sdl.sri.com/papers/4/s/4sri/4sri.pdf.
[Aicklin and Cayzer 2002] Aicklin, U. and Cayzer, S. "The danger theory and its application to artificial immune systems", Proceedings of the First International Conference on Artificial Immune Systems, 141-148.
[Ayara et al. 2002] Ayara, M., Timmis, J., de Lemos, R., and Duncan, R. "Negative Selection: How to Generate Detectors", Proceedings of the First International Conference on Artificial Immune Systems, 89-98.
[Balthrop et al. 2002a] Balthrop, J., Forrest, S., and Glickman, M. "Revisiting LISYS: Parameters and Normal Behavior", Proceedings of the 2002 Congress on Evolutionary Computation, 1045-1050.
[Balthrop et al. 2002b] Balthrop, J., Esponda, F., Forrest, S., and Glickman, M. "Coverage and generalization in an Artificial Immune System", Proceedings of the 2002 Genetic and Evolutionary Computation Conference, 3-10.
[Cooke and Hunt 1995] Cooke, D. and Hunt, J. "Recognizing Promoter Sequences Using an Artificial Immune System", Proceedings of the Intelligent Systems in Molecular Biology Conference, 89-97, AAAI Press, 1995.
[Dasgupta and Forrest 1995] Dasgupta, D. and Forrest, S. "Tool breakage detection in milling operations using a negative selection algorithm", Technical Report CS95-5, University of New Mexico.
[Dasgupta and Forrest 1996] Dasgupta, D. and Forrest, S. "Novelty detection in time series data using ideas from immunology", Proceedings of the ISCA 5th International Conference on Intelligent Systems.
[Dasgupta 1999] Dasgupta, D. "An overview of artificial immune systems and their applications", Artificial Immune Systems and Their Applications, D. Dasgupta, ed., Springer, 3-21, 1999.
[Dasgupta and Gonzalez 2002] Dasgupta, D. and Gonzalez, F. "An immunity-based technique to characterize intrusions in computer network", IEEE Transactions on Evolutionary Computation, 6(3):281-291.
[Dasgupta et al. 2004] Dasgupta, D., KrishnaKumar, K., Wong, D., and Berry, M. "Negative selection algorithm for aircraft fault detection", Proceedings of the 2004 International Conference on Artificial Immune Systems, 1-13.
[De Castro and Timmis 2002a] de Castro, L. and Timmis, J. Artificial Immune Systems: A New Computational Intelligence Approach, Springer, ISBN 1852335947.
[De Castro and Timmis 2002b] de Castro, L. and Timmis, J. "An Artificial Immune Network for Multimodal Function Optimization", Proceedings of the World Congress on Computational Intelligence.
[De Castro and Timmis 2003] de Castro, L. and Timmis, J. "Artificial immune systems as a novel soft computing paradigm", Soft Computing, 7:526-544.
[De Castro and Von Zuben 2000] de Castro, L. and Von Zuben, F. "An evolutionary immune network for data clustering", Proceedings of the Brazilian Symposium on Artificial Neural Networks, 84-89.
[De Castro and Von Zuben 2004] de Castro, L. and Von Zuben, F. Recent Developments in Biologically Inspired Computing, Idea Group Publishing, 2004.
[DeJong and Spears 1993] DeJong, K. and Spears, W. "On the state of Evolutionary Computation", Proceedings of the Fifth International Conference on Genetic Algorithms, 618-623.
[D'haeseleer et al. 1996] D'haeseleer, P., Forrest, S., and Helman, P. "An immunological approach to change detection: algorithms, analysis and implications", Proceedings of the 1996 IEEE Symposium on Computer Security and Privacy, 110-119.
[Dozier 2003] Dozier, G. "IDS vulnerability analysis using GENERTIA red teams", Proceedings of the International Conference on Security and Management, 2003, pp. 171-176.
[Dozier et al. 2004b] Dozier, G., Brown, D., Hurley, J., and Cain, K. "Vulnerability analysis of AIS-based intrusion detection systems via genetic and particle swarm red teams", Proceedings of the 2004 Congress on Evolutionary Computation, pp. 111-116.
[Dozier et al. 2004a] Dozier, G., Brown, D., Hurley, J., and Cain, K. "Vulnerability analysis of immunity-based intrusion detection systems using evolutionary hackers", Proceedings of the 2004 Genetic and Evolutionary Computation Conference, pp. 263-274.
[Duch 2002a] Duch, W. Datasets Used for Classification: Comparison of Results, available at: http://www.phys.uni.torun.pl/kmk/projects/datasets.html
[Duch 2002b] Duch, W. Logical Rules Extracted from Data, available at: http://www.phys.uni.torun.pl/kmk/projects/rules.html
[Duch et al. 2001] Duch, W., Adamczak, R., and Grabczewski, K. "A new methodology of extraction, optimization and application of crisp and fuzzy logical rules", IEEE Transactions on Neural Networks, 12:277-306.
[Dunham 2003] Dunham, M. "Classification", Data Mining, Introductory and Advanced Topics, Prentice Hall, Upper Saddle River, NJ, 2003.
[Esponda et al. 2004] Esponda, F., Forrest, S., and Helman, P. "A formal framework for positive and negative detection schemes", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34(1):357-373.
[Farmer et al. 1986] Farmer, J., Packard, N., and Perelson, A. "The Immune System, Adaptation, and Machine Learning", Physica D, 22:187-204.
[Forrest et al. 1993] Forrest, S., Javornik, B., Smith, R., and Perelson, S. "Using genetic algorithms to explore pattern recognition in the immune system", Evolutionary Computation, 1:191-211.
[Forrest et al. 1994] Forrest, S., Perelson, S., Allen, L., and Cherukuri, R. "Self-nonself discrimination in a computer", Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, 202-212.
[Forrest et al. 1996] Forrest, S., Hofmeyr, S., Somayaji, A., and Longstaff, T. "A sense of self for unix processes", Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp. 120-128.
[Forrest et al. 1997] Forrest, S., Hofmeyr, S., and Somayaji, A. "Computer immunology", Communications of the ACM, 40(10):88-96.
[Goldberg and Deb 1989] Goldberg, D. and Deb, K. "A comparative analysis of selection schemes used in genetic algorithms", Foundations of Genetic Algorithms, Rawlins, G., ed., pp. 69-93, Morgan Kaufmann, 1989.
[Gonzalez et al. 2002] Gonzalez, F., Dasgupta, D., and Kozma, R. "Combining negative selection and classification techniques for anomaly detection", Proceedings of the 2002 Congress on Evolutionary Computation, 2002, pp. 705-710.
[Gonzalez 2003] Gonzalez, F. "A study of artificial immune systems applied to anomaly detection", PhD dissertation, Department of Computer Science, University of Memphis.
[Gonzalez et al. 2003a] Gonzalez, F., Dasgupta, D., and Gomez, J. "The effect of binary matching rules in negative selection", Proceedings of the 2003 Genetic and Evolutionary Computation Conference.
[Gonzalez et al. 2003b] Gonzalez, F., Dasgupta, D., and Nino, D. "A randomized real-valued negative selection algorithm", Proceedings of the 2003 International Conference on Artificial Immune Systems, 261-272.
[Hamaker and Boggess 2004] Hamaker, J. and Boggess, L. "Non-Euclidean Distance Measures in AIRS, an Artificial Immune Classification System", Proceedings of the 2004 Congress on Evolutionary Computation, 1067-1073.
[Harmer et al. 2002] Harmer, P., Williams, P., Gunsch, G., and Lamont, G. "An artificial immune system architecture for computer security applications", IEEE Transactions on Evolutionary Computation, 6(3):252-280.
[Hart and Timmis 2005] Hart, E. and Timmis, J. "Application areas of AIS: the past, the present and the future", Proceedings of the 2005 International Conference on Artificial Immune Systems, 483-497.
[Hastie et al. 2001] Hastie, T., Tibshirani, R., and Friedman, J., eds. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics), New York: Springer-Verlag, 2001.
[Haykin 1999] Haykin, S. Neural Networks: A Comprehensive Foundation, Prentice Hall, ISBN 0-13-273350-1.
[Heberlein et al. 1990] Heberlein, T., Dias, V., Levitt, N., Mukherjee, B., Wood, J., and Wolber, D. "A network security monitor", Proceedings of the 1990 IEEE Symposium on Security and Privacy, pp. 296-303.
[Hofmeyr et al. 1998] Hofmeyr, S., Forrest, S., and Somayaji, A. "Intrusion Detection Using Sequences of System Calls", Journal of Computer Security, 6:151-180.
[Hofmeyr 1999] Hofmeyr, S. "An immunological model of distributed detection and its application to computer security", PhD dissertation, Department of Computer Science, University of New Mexico, 1999.
[Hofmeyr and Forrest 2000] Hofmeyr, S. and Forrest, S. "Architecture for an artificial immune system", Evolutionary Computation, 8(4):443-473.
[Holland and Reitman 1978] Holland, J. and Reitman, J. "Cognitive Systems Based on Adaptive Algorithms", in Waterman, D. and Hayes-Roth, F., editors, Pattern-Directed Inference Systems, Academic Press, New York, 1978.
[Hou et al. 2002] Hou, H., Zhu, J., and Dozier, G. "Artificial immunity using constraint-based detectors", Proceedings of the 2002 World Automation Congress, pp. 239-244.
[Hou and Dozier 2004] Hou, H. and Dozier, G. "Comparing the performance of binary-coded detector and constraint-based detector", Proceedings of the 2004 Congress on Evolutionary Computation, pp. 773-777.
[Hou and Dozier 2005] Hou, H. and Dozier, G. "Immunity-Based Intrusion Detection System Design, Vulnerability Analysis, and GENERTIA's Genetic Arms Race", Proceedings of the ACM Symposium on Applied Computing, 961-965.
[Hou and Dozier 2006a] Hou, H. and Dozier, G. "An Anomaly-Detection Based Classification System", Proceedings of the 2006 Congress on Evolutionary Computation (CEC 2006).
[Hou and Dozier 2006b] Hou, H. and Dozier, G. "An Evaluation of Negative Selection Algorithm with Constraint-Based Detectors", Proceedings of the 44th ACM Southeast Conference, 134-139.
[Janeway and Travers 2001] Janeway, C. and Travers, P. Immunobiology: The Immune System in Health and Disease, Current Biology Ltd., London, 2001.
[Jerne 1974] Jerne, N. "Towards a Network Theory of the Immune System", Annales d'Immunologie (Institut Pasteur), 125C:373-389, 1974.
[Ji and Dasgupta 2004a] Ji, Z. and Dasgupta, D. "Real-valued negative selection algorithm with variable-sized detectors", GECCO 2004, Lecture Notes in Computer Science, 3102:287-298.
[Ji and Dasgupta 2004b] Ji, Z. and Dasgupta, D. "Augmented Negative Selection Algorithm with Variable-Coverage Detectors", Proceedings of the 2004 Congress on Evolutionary Computation, 1080-1088.
[Ji and Dasgupta 2005] Ji, Z. and Dasgupta, D. "Estimating the detector coverage in a negative selection algorithm", Proceedings of the 2005 Genetic and Evolutionary Computation Conference, ???-???.
[Kabay 2001] Kabay, M. "Studies and surveys of computer crime", available at: http://www.securitystats.com/reports/Studies_and_Surveys_of_Computer_Crime.pdf.
[Kennedy et al. 1998] Kennedy, R., Lee, Y., Roy, B., Reed, C., and Lippman, R. Solving Data Mining Problems through Pattern Recognition, Englewood Cliffs, NJ: Prentice Hall, 1998.
[Kim and Bentley 2001a] Kim, J. and Bentley, P. "An evaluation of negative selection in an artificial immune system for network intrusion detection", Proceedings of the 2001 Genetic and Evolutionary Computation Conference, 1330-1337.
[Kim and Bentley 2001b] Kim, J. and Bentley, P. "Towards an artificial immune system for network intrusion detection: an investigation of clonal selection with a negative selection operator", Proceedings of the 2001 Congress on Evolutionary Computation, 1244-1252.
[Kim and Bentley 2004] Kim, J. and Bentley, P. "Immune memory and gene library evolution in the dynamic clonal selection algorithm", Genetic Programming and Evolvable Machines, 5(4):361-391.
[Krogh 2000] Krogh, D. "The immune system: defending the body from invaders", Biology, ISBN 0023668911, 521-532.
[Langman and Cohn 2000] Langman, R. and Cohn, M. "Editorial summary", Seminars in Immunology, 12:343-344.
[Lee and Sim 2004] Lee, D. and Sim, K. "Negative Selection Algorithm for DNA Sequence Classification", Proceedings of the 2nd International Conference on Soft Computing and Intelligent Systems.
[Leon 2000] Leon, M. "Internet virus boom", Infoworld, 22(3):36-37, 2000.
[Lincoln Lab 1998] Lincoln Laboratory, 1998 DARPA Intrusion Detection Evaluation Data Set, available at: http://www.ll.mit.edu/IST/ideval/data/1998/1998_data_index.html.
[Marchette 2001] Marchette, D.J. Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint, Springer, 2001.
[McHugh 2000] McHugh, J. "Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluation as performed by Lincoln Laboratory", ACM Transactions on Information and System Security, 3(4):262-294, 2000.
[Mitchell 1997] Mitchell, T. Machine Learning, McGraw-Hill, 1997.
[Mukherjee et al. 1994] Mukherjee, B., Heberlein, T., and Levitt, K. "Network Intrusion Detection", IEEE Network, 8(3):26-41.
[Orr 1996] Orr, M. Introduction to Radial Basis Function Networks, Technical Report, Centre for Cognitive Science, University of Edinburgh, 1996.
[Percus et al. 1993] Percus, J., Percus, O., and Perelson, A. "Predicting the size of the T-cell receptor and antibody combining region from consideration of efficient self-nonself discrimination", Proceedings of the National Academy of Sciences, U.S.A., 90:1691-1695.
[Potter and De Jong 1998] Potter, M. and De Jong, K. "The coevolution of antibodies for concept learning", Lecture Notes in Computer Science, 1498:530-535.
[Rogers and Tanimoto 1960] Rogers, S. and Tanimoto, T. "A computer program for classifying plants", Science, 132:1115-1118.
[Quinlan 1986] Quinlan, J. "Induction of decision trees", Machine Learning, 11(1):81-106.
[Singh 2002] Singh, S. "Anomaly detection using negative selection based on the r-contiguous matching rule", Proceedings of the 2002 International Conference on Artificial Immune Systems, 99-106.
[Stepney et al. 2004] Stepney, S., Smith, R., Timmis, J., and Tyrell, A. "Towards a Conceptual Framework for Artificial Immune Systems", Proceedings of the 3rd International Conference on Artificial Immune Systems, 53-64.
[Stibor et al. 2004] Stibor, T., Bayarou, K., and Eckert, C. "An investigation of r-chunk detector generation on higher alphabets", GECCO 2004, Lecture Notes in Computer Science, 3102:299-307.
[Stibor et al. 2005a] Stibor, T., Timmis, J., and Eckert, C. "On the appropriateness of negative selection defined over hamming shape-space as a network intrusion detection system", Proceedings of the 2005 Congress on Evolutionary Computation, 995-1002.
[Stibor et al. 2005b] Stibor, T., Timmis, J., and Eckert, C. "A comparative study of real-valued negative selection to statistical anomaly detection techniques", ICARIS 2005, Lecture Notes in Computer Science, 3627:262-275.
[Stibor et al. 2005c] Stibor, T., Mohr, P., and Timmis, J. "Is negative selection appropriate for anomaly detection?", Proceedings of the 2005 Genetic and Evolutionary Computation Conference, pp. 321-328.
[Sywerda 1989] Sywerda, G. "Uniform crossover in genetic algorithms", Proceedings of the Third International Conference on Genetic Algorithms, 1989, pp. 2-9.
[Tanase 2002] Tanase, M. "One of these things is not like the others: the state of anomaly detection", 2002, available at: http://online.securityfocus.com/infocus/1600.
[Timmis et al. 2000] Timmis, J., Neal, M., and Hunt, J. "An Artificial Immune System for Data Analysis", Biosystems, 55(1):143-150, 2000.
[Timmis and Neal 2000] Timmis, J. and Neal, M. "A Resource Limited Artificial Immune System for Data Analysis", in Research and Development in Intelligent Systems XVII, Proceedings of ES2000, (Cambridge, UK), 19-32, 2000.
[Timmis et al. 2004] Timmis, J., Knight, T., de Castro, L., and Hart, E. "An overview of artificial immune systems", in Paton, R., Bolouri, H., Holcombe, M., Parish, J., and Tateson, R., editors, Computation in Cells and Tissues: Perspectives and Tools for Thought, Natural Computation Series, 51-86, Springer, November 2004.
[Taylor and Corne 2003] Taylor, D. and Corne, D. "An investigation of the negative selection algorithm for fault detection in refrigeration systems", Proceedings of ICARIS 2003.
[Watkins and Boggess 2002] Watkins, A. and Boggess, L. "A resource limited artificial immune classifier", Proceedings of the 2002 Congress on Evolutionary Computation, 1546-1551.
[Watkins and Timmis 2002] Watkins, A. and Timmis, J. "Artificial Immune Recognition System (AIRS): Revisions and Refinements", Proceedings of the 1st International Conference on Artificial Immune Systems, 173-191, 2002.
[Watkins et al. 2003] Watkins, A., Timmis, J., and Boggess, L. "Artificial Immune Recognition System (AIRS): An Immune-Inspired Supervised Learning Algorithm", Genetic Programming and Evolvable Machines, 5(3):291-317, 2004.
[Wierzchon 2000] Wierzchon, S. "Generating optimal repertoire of antibody strings in an artificial immune system", Intelligent Information Systems, Advances in Soft Computing Series, Physica-Verlag/Springer-Verlag, 119-133.
[Witten and Frank 2000] Witten, I. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, San Francisco: Morgan Kaufmann, 2000.
[Wolberg and Mangasarian 1990] Wolberg, W. and Mangasarian, O. "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., 87:9193-9196.