AGENT-BASED SIMULATION OF BEHAVIORAL ANTICIPATION IN 
COMPUTER NETWORKS: A COMPARATIVE STUDY OF 
 ANTICIPATORY FAULT MANAGEMENT 
  
 
Except where reference is made to the work of others, the work described in this thesis is 
my own or was done in collaboration with my advisory committee. This thesis does not 
include proprietary or classified information. 
 
 
 
 
________________________________________ 
Avdhoot Kishore Saple 
 
 
 
 
Certificate of Approval: 
 
 
____________________________       ____________________________ 
Drew Hamilton           Levent Yilmaz, Chair 
Associate Professor         Assistant Professor 
Computer Science and Software                                    Computer Science and Software 
Engineering                                 Engineering 
 
 
____________________________      ____________________________ 
Gerry Dozier          Stephen L. McFarland 
Associate Professor          Acting Dean 
Computer Science and Software      Graduate School 
Engineering             
 
 
 
 
 
 
 
Avdhoot Kishore Saple 
 
 
 
 
A Thesis 
Submitted to 
the Graduate Faculty of 
Auburn University 
in Partial Fulfillment of the 
Requirements for the 
Degree of 
Master of Science 
 
 
 
 
 
Auburn, Alabama 
May 11, 2006 
 
 
 
 
 
Avdhoot K. Saple  
 
 
 
 
 
 
 
Permission is granted to Auburn University to make copies of this thesis at its discretion, 
upon request of individuals or institutions and at their expense. The author reserves all 
publication rights. 
 
 
 
__________________________ 
Signature of Author  
 
 
__________________________ 
Date of Graduation  
  
 
THESIS ABSTRACT 
 
Avdhoot K. Saple 
Master of Science, May 11, 2006 
(B.E., Mumbai University, 2004) 
 
98 Typed Pages 
Directed by Dr. Levent Yilmaz 
 
Network fault management is concerned with the detection, isolation, and
correction of anomalous conditions that occur in a computer network. The present state of
the art in fault management falls into two main categories: reactive
rule-based approaches and intelligent monitoring systems. We explore the concept of
anticipatory behavior to develop an intelligent agent-based network management model,
which uses an anticipatory agent to proactively detect the occurrence of faults using a
predictive Bayesian model of network performance. To analyze the
effectiveness of the anticipatory technique, we compare it with alarm correlation and
rule-based reactive fault management strategies. Results of the comparative analysis are
presented to demonstrate the potential of the anticipatory technique in detecting network
anomalies. Our findings indicate that the anticipatory technique improves network
performance significantly more than the reactive techniques. We furthermore describe a
methodology for adaptive restructuring of the network based on the simulated annealing
process. We observe that adaptive restructuring gives significantly better performance
under the reactive rule-based fault management technique than under the
anticipatory strategy.
 
 
 
 
ACKNOWLEDGEMENTS 
 
 
I wish to thank my advisor, Dr. Levent Yilmaz, for his support and guidance along
the way, and above all for being so patient and understanding with me. Thanks to my
committee members for reviewing my thesis. I would also like to thank my colleagues
and friends for the wonderful times together. Above all, thanks to my parents for
always being there to listen. They have been my constant source of inspiration.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Style manual or journal used APA Style 
 
 
 
Computer software used Microsoft Word. Images drawn using Microsoft PowerPoint, 
UML Diagrams drawn using ArgoUML. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
TABLE OF CONTENTS 
 
List of Figures
List of Tables
1 Introduction
    1.1 The Need for Fault Management in Computer Networks
    1.2 Anticipatory Fault Management
    1.3 Research Objective
2 Overview of Fault Management in Computer Networks
    2.1 Rule Based Approaches
        2.1.1 Case-Based Reasoning
    2.2 Alarm Correlation
        2.2.1 Alarm Correlation Using Finite State Machines
    2.3 Pattern Matching
    2.4 Statistical Analysis
    2.5 Intelligent Probing for Fault Management
    2.6 Proactive Fault Detection Using Intelligent Agents
3 Anticipatory Systems
    3.1 Anticipatory Agents
    3.2 Preventive State Anticipation
4 Agent-based Modeling of Reactive and Anticipatory Control in Computer Networks
    4.1 The DEVS Network Model
        4.1.1 The DEVS Formalism
    4.2 Reactive Agents
    4.3 Anticipatory Agents - A Bayesian Approach
    4.4 Adaptive Restructuring of the DEVS Network Model
5 Detailed Design of the DEVS Network Model
    5.1 The DEVS Basic and Coupled Model
    5.2 The Distributed Network Monitoring System
    5.3 Component Overview
    5.4 Monitoring Agents
    5.5 Reactive/Anticipatory Agents
    5.6 Control Agent
    5.7 Annealer
6 Experiment Design and Simulation Results
    6.1 Experiment Design
    6.2 Simulation Results
        6.2.1 Sensitivity Analyses
        6.2.2 Results with Adaptive Control Using Annealer
7 Conclusions
    7.1 The Limitations of Anticipatory Fault Management
    7.2 Future Work
References
Appendices
Appendix A
Appendix B
Appendix C
 
 
LIST OF FIGURES 
3.1 Basic Components for Anticipatory Agents
3.2 The Basic Architecture of an Anticipatory Agent in Our Model
4.1 Reactive and Anticipatory Control
4.2 Conceptual Framework of the DEVS Formalism
4.3 Control Flow Diagram Depicting the Operation of the Annealer
5.1 Designed Network Model
5.2 Experimental Frame
5.3 Activity Diagram of the Experimental Frame
5.4 Sequence Diagram for Interactions of Various Components within the Experimental Frame
5.5 Activity Diagram of Network Component due to Fault(s)
5.6 Activity Diagram of a Monitoring Agent
5.7 Sequence Diagram Showing Interaction between Monitoring Agents
5.8 Activity Diagram of a Reactive Agent
5.9 Activity Diagram of an Anticipatory Agent
5.10 Sequence Diagram for Interaction between Monitoring and Management Agents
5.11 Activity Diagram of a Control Agent
5.12 Sequence Diagram for Management and Control Agents
5.13 Activity Diagram of the Annealer
6.1 Simulated Model in DEVS Environment
6.2 Response Surfaces
6.3 Sensitivity Analyses for Variation of Complexity for Reactive Technique
6.4 Sensitivity Analyses for Variation of Complexity for Alarm Correlation Technique
6.5 Sensitivity Analyses for Variation of Complexity for Anticipatory Technique
6.6 Performance Metrics with and without Annealer for Reactive Technique
6.7 Performance Metrics - Alarm Correlation and Anticipatory Technique
A.1 Design Class Diagram of the DEVS Network Model
B.1 Calculation of C.I. for Reactive vs Alarm Correlation Technique
B.2 Calculation of C.I. for Reactive vs Anticipatory Technique
B.3 Calculation of C.I. for Alarm Correlation vs Anticipatory Technique
C.1 Performance Metrics for Low Link Delay and Level 1 Complexity
C.2 Performance Metrics for Moderate Link Delay and Level 1 Complexity
C.3 Performance Metrics for High Link Delay and Level 1 Complexity
C.4 Performance Metrics for Low Link Delay and Level 2 Complexity
C.5 Performance Metrics for Moderate Link Delay and Level 2 Complexity
C.6 Performance Metrics for High Link Delay and Level 2 Complexity
C.7 Performance Metrics for Low Link Delay and Level 3 Complexity
C.8 Performance Metrics for Moderate Link Delay and Level 3 Complexity
C.9 Performance Metrics for High Link Delay and Level 3 Complexity
 
 
 
 
 
 
 
 
 
 
 
 
 
LIST OF TABLES 
Table 1.1 An Overview of Fault Management Techniques
Table 4.1 Sample Set of Evidence to be Processed by the Naïve Bayesian Classifier
Table 6.1 Confidence Intervals for Performance Metrics
 
 
Chapter 1 
Introduction 
Network fault management (Oates 1995) entails the detection, isolation and 
correction of anomalous conditions that occur in a network. It can be decomposed into 
three subtasks: fault identification (Katzela and Schwartz 1995), fault diagnosis (Agre
1986), and fault remediation. Fault identification involves detecting deviation from 
normal behavior followed by identification of its nature, whereas fault diagnosis involves 
determining the root cause of the identified problem. Fault remediation is the formulation 
of a course of action that addresses the problem. All three stages of fault management 
involve reasoning and decision making based on information about current and past states 
of the network. 
1.1 The Need for Fault Management in Computer Networks 
As businesses and individuals have become increasingly reliant on computer
networks, the complexity of those networks has grown along a number of dimensions. 
The phenomenal growth of the Internet in recent years provides a clear example of the 
extent to which the use of computer networks is becoming ubiquitous (Oates 1995). As 
computer networks increase in size, heterogeneity, complexity and pervasiveness, 
effective management of such networks simultaneously becomes more important and 
more difficult. 
The use of computer networks for business is expanding enormously. The average
number of electronic point-of-sale transactions in the United States went from 38 per day
in 1985 to 1.2 million per day in 1993. An average of $800 billion is transferred among
partners in international currency markets every day; about $1 trillion is transferred daily
among US banks; and an average of $2 trillion worth of securities is traded daily in New
York markets. Nearly all of these financial transactions pass over information networks.
Consequently, the losses incurred due to faults in these networks are very high.
Related dollar losses are estimated to be between $100,000 and $10 million (Herdman
1994). Hence it is important to have an effective and efficient network fault management
technique to restore the proper functioning of computer and information networks.
1.2 Anticipatory Fault Management  
Traditionally, network management activities, such as fault management, have 
been performed with direct human involvement. However, these activities are becoming 
more demanding and data intensive, due to the heterogeneous nature and increasing size 
of networks today. For these reasons, it is becoming necessary to automate network 
management activities. Artificial intelligence technologies can play an important role in 
the problem solving and reasoning techniques that are employed in fault management.  
Anticipatory fault management takes a novel approach to designing
autonomous agents, based on the idea of anticipatory systems (Ekdhal et al. 1994).
An anticipatory system has a model of itself and of the relevant part of its environment
and uses the model to predict the future. The predictions are then used to determine
the agent's behavior. An anticipatory system is thus a system that uses knowledge
of predicted future states to decide what action to take in the present. Anticipatory fault
management can be carried out by having an intelligent anticipatory agent reside
on the network under observation. The agent makes use of adaptive learning methods (Butz
et al. 2003) to detect abnormal behavior before a fault actually occurs. It first acquires
a picture of the network's health by processing observations of
performance variables to obtain the probability of each measured variable at a given time. It
then combines all this information to build a predictive model, which provides a
method for estimating probabilities and allows the agent to combine observed
information with prior knowledge (Hood and Ji 1998). The agent therefore gets a
complete picture of the network's health, which it uses to carry out adaptive behavior for
fault identification and diagnosis.
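The anticipatory loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the agent developed in this thesis: the class name AnticipatoryAgent, the single utilization metric, the threshold of 85, and the linear extrapolation used as the "model" are all hypothetical stand-ins for the predictive model discussed in later chapters.

```python
from collections import deque


class AnticipatoryAgent:
    """Toy agent that acts on a predicted future value, not only the current one."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold           # hypothetical limit that marks a fault
        self.history = deque(maxlen=window)  # most recent observations

    def predict_next(self):
        """Extrapolate one step ahead from the average change over the window."""
        values = list(self.history)
        if len(values) < 2:
            return float(values[-1]) if values else 0.0
        average_step = (values[-1] - values[0]) / (len(values) - 1)
        return values[-1] + average_step

    def step(self, observation):
        """Record an observation and choose the present action from the prediction."""
        self.history.append(observation)
        predicted = self.predict_next()
        action = "preventive_action" if predicted > self.threshold else "no_action"
        return predicted, action


# Hypothetical link utilization readings (percent), all still below the threshold.
agent = AnticipatoryAgent(threshold=85)
trace = [agent.step(utilization) for utilization in (50, 60, 70, 80)]
```

In this run the last observation (80) is still below the threshold, yet the agent already chooses a preventive action because the predicted next value is above it. A purely reactive agent would wait until the threshold is actually crossed.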
1.3 Research Objective  
Given the importance of network fault management, the goal of this research
is to develop an anticipatory technique for network fault management
and to compare it with two widely used techniques: the reactive rule-based
strategy and the alarm correlation approach. We compare the three techniques based
on the following network performance metrics: throughput, turnaround time, and the drop
rate of packets.
To facilitate the experiment, a simulation model of a computer network is
developed using the DEVS (Zeigler and Sarjaughian 2003) modeling and simulation
framework. Reactive and anticipatory agents are embedded into the network for network
fault management. The reactive agent operates on a simple rule-based engine that detects
faults based on a predefined fuzzy rule base. We use a Naïve Bayesian classifier (Luger
and Stubblefield 1989) as part of the anticipatory agent. The Bayesian classifier acts as a
predictive model for the anticipatory agent to facilitate prediction of faults based on past
data. Our findings indicate that anticipatory fault management performs significantly
better than the reactive and alarm correlation techniques under the experimental
conditions in which the model is tested. Furthermore, we study which technique
performs better with network reconfiguration. Network reconfiguration consists of
restructuring the network model through a simulated annealing process (Carley and Svoboda
1996). The restructuring is based on varying network operational parameters such as the
operation of the switch, the routing strategy of the router, and the link delay of nodes. The
results of network adaptation show that simulated annealing can be applied to the
reactive technique, where it gives better performance than under the alarm
correlation and anticipatory techniques.
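To make the role of the Bayesian classifier concrete, the following is a minimal Python sketch of a Naïve Bayesian classifier over categorical evidence. It is not the classifier implemented in this thesis: the feature names (load, drops), the training samples, and the use of Laplace smoothing are assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(samples):
    """samples: list of (features_dict, label). Returns the counts used for prediction."""
    label_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (label, feature) -> counts of each value
    feature_values = defaultdict(set)    # feature -> distinct values seen in training
    for features, label in samples:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return label_counts, value_counts, feature_values


def predict(model, features):
    """Return the posterior probability of each label given the evidence."""
    label_counts, value_counts, feature_values = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        log_p = math.log(count / total)  # prior P(label)
        for name, value in features.items():
            seen = value_counts[(label, name)][value]
            # Laplace-smoothed likelihood P(value | label); smoothing is an assumption.
            log_p += math.log((seen + 1) / (count + len(feature_values[name])))
        log_scores[label] = log_p
    highest = max(log_scores.values())
    unnormalized = {l: math.exp(s - highest) for l, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {l: v / z for l, v in unnormalized.items()}


# Hypothetical past evidence: discretized performance variables and observed outcome.
samples = [
    ({"load": "high", "drops": "high"}, "fault"),
    ({"load": "high", "drops": "high"}, "fault"),
    ({"load": "high", "drops": "low"}, "normal"),
    ({"load": "low", "drops": "low"}, "normal"),
    ({"load": "low", "drops": "low"}, "normal"),
]
model = train_naive_bayes(samples)
posterior_busy = predict(model, {"load": "high", "drops": "high"})
posterior_idle = predict(model, {"load": "low", "drops": "low"})
```

The posterior for the "fault" label can then be compared against a decision threshold to trigger preventive action; the threshold itself would have to be chosen experimentally.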
The thesis is organized as follows. Chapter 2 reviews the present state of the art in
fault management of computer networks. Chapter 3 presents the design of
anticipatory systems. In Chapter 4, we discuss the agent-based modeling of reactive and
anticipatory control in computer networks. Chapter 5 discusses the detailed design of the
DEVS model. Chapter 6 covers the experiment design, simulation, and results.
Finally, in Chapter 7 we conclude by discussing the open issues as well as planned future
work.
  
  
Chapter 2 
 
Overview of Fault Management in Computer Networks 
 
 
Researchers have approached the problem of fault management using various
techniques such as artificial intelligence (Corn et al. 1988; Joseph et al. 1989; Wright et
al. 1988; Yamahira et al. 1989), machine learning (Langley and Simon 1995), and state
space modeling (Rouvellou and Hart 1995). A fault is simply a malfunction in some
component of the network, either hardware or software. At an abstract level, fault
identification can be thought of as a function, I, with inputs and outputs. The input to the
function is a description of network state, S, and the output is a set of hypotheses, H,
concerning the existence of n different faults. Each hypothesis may specify the
indications in S of the corresponding fault and may contain some amount of diagnosis
information. That is, identification and diagnosis are rarely totally decoupled.
Fault identification is therefore a process or function that maps from network
states to fault hypotheses:
$I : S \rightarrow H$
Different approaches to fault identification define S in a distinct manner (Oates
1995).
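The mapping from network states to fault hypotheses can be sketched as a small Python function. The representation below is an assumption made for illustration only: the state is a dictionary of measured variables, each hypothesis names a fault and the indications in the state that support it, and the two detectors with their thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

NetworkState = Dict[str, float]  # S: measured variables describing the network


@dataclass(frozen=True)
class Hypothesis:
    """One element of H: a suspected fault and the indications in S that support it."""
    fault: str
    indications: Tuple[str, ...]


Detector = Callable[[NetworkState], Optional[Hypothesis]]


def identify(state: NetworkState, detectors: List[Detector]) -> List[Hypothesis]:
    """I: S -> H. Collect the hypotheses raised by each detector for this state."""
    hypotheses = []
    for detect in detectors:
        hypothesis = detect(state)
        if hypothesis is not None:
            hypotheses.append(hypothesis)
    return hypotheses


def congestion_detector(state: NetworkState) -> Optional[Hypothesis]:
    # Hypothetical threshold on link utilization.
    if state.get("link_utilization", 0.0) > 0.9:
        return Hypothesis("congestion", ("link_utilization",))
    return None


def packet_loss_detector(state: NetworkState) -> Optional[Hypothesis]:
    # Hypothetical threshold on the packet drop rate.
    if state.get("drop_rate", 0.0) > 0.05:
        return Hypothesis("packet_loss", ("drop_rate",))
    return None


detectors = [congestion_detector, packet_loss_detector]
found = identify({"link_utilization": 0.95, "drop_rate": 0.01}, detectors)
```

The approaches reviewed in this chapter differ mainly in what goes into the state and in how each detector reaches its hypothesis.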
2.1 Rule Based Approaches 
Early work in the area of fault or anomaly detection was based on expert systems.
In expert systems, an exhaustive database containing the rules of behavior of the faulty
system is used to determine whether a fault has occurred by matching observations against
predefined rules for network anomalies (Lewis 1993). Rule-based systems are too slow for
real-time applications and depend on prior knowledge about the fault conditions of the
network. The identification of faults in this approach depends on symptoms that are
specific to a particular manifestation of a fault. Examples of these symptoms are
excessive utilization of bandwidth, the number of open TCP connections, and exceeded
total throughput (Thottan and Ji 2003). An expert system model using fuzzy cognitive maps
(FCMs) (Ndouusse and Okuda 1996) can be used to model intelligently the
propagation and interaction of network faults. Fuzzy expert systems (Kandel 1991) are
especially attractive in a dynamic environment because they favor silicon
implementation and learning, and they replace the lengthy symbolic graph search with
computational inference. Traditional expert systems with symbolic knowledge
representation implemented with "IF/THEN" conditional statements require complicated
and lengthy matching schemes that are too slow for real-time systems such as networks.
Furthermore, traditional expert systems lack support for on-line mathematical analysis, an
essential feature common in engineering systems. Fuzzy expert systems (FES) provide
an alternative to symbolic intelligence. In an FES, vague causal reasoning is represented
numerically, and hence is amenable to computational processing. In particular, an FES
that uses a graph-based knowledge representation can easily be converted into causal
matrices, thus offering an appealing computational feedback memory recall capability.
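As an illustration of how a causal graph becomes a causal matrix that can be iterated numerically, the following Python sketch runs a simple fuzzy cognitive map with a two-valued threshold function. It is a generic textbook-style update, not the model of the cited authors; the three concepts, the edge weights, and the clamping of the source concept are assumptions.

```python
def fcm_step(state, weights, clamped=()):
    """One update: concept j becomes active if its weighted causal input is positive."""
    n = len(state)
    new_state = []
    for j in range(n):
        if j in clamped:
            new_state.append(state[j])  # externally observed concepts keep their value
            continue
        total = sum(state[i] * weights[i][j] for i in range(n))
        new_state.append(1 if total > 0 else 0)
    return new_state


def fcm_run(state, weights, clamped=(), max_steps=20):
    """Iterate until the state stops changing or max_steps is reached."""
    for _ in range(max_steps):
        next_state = fcm_step(state, weights, clamped)
        if next_state == state:
            break
        state = next_state
    return state


# Hypothetical concepts: 0 = link_failure, 1 = congestion, 2 = packet_loss.
# weights[i][j] is the causal influence of concept i on concept j.
weights = [
    [0.0, 0.8, 0.0],  # link_failure -> congestion
    [0.0, 0.0, 0.9],  # congestion   -> packet_loss
    [0.0, 0.0, 0.0],
]
final_state = fcm_run([1, 0, 0], weights, clamped=(0,))
```

Starting from an observed link failure, the map settles in a state where congestion and packet loss are also active, which is the kind of fault propagation the causal matrix is meant to capture.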
2.1.1 Case-Based Reasoning
Case-based reasoning is an extension of rule-based systems (Lewis 1993). It
differs from FCM in that, in addition to rules, a picture of previous fault scenarios
is used to make decisions. A picture here refers to the
circumstances or events that led to the fault. In order to adapt the case-based reasoning
scheme to the changing network environment, adaptive techniques are used to obtain the
functional dependence of relevant criteria, such as network load and collision rate, on
previous trouble tickets (Lewis and Dreo 1993). The trouble ticketing system is used to
perform two functions: to prepare for problem diagnostics through filtering, and to infer
the root cause of the problem. Using case-based reasoning for describing fault scenarios
also suffers from heavy dependence on past information. Furthermore, the identification of
relevant criteria for the different faults will, in turn, require a set of rules to be developed.
In addition, using any function approximation, such as back propagation, causes an
increase in computation time and complexity. The number of functions to be learned also
increases with the number of faults studied.
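The retrieval step of case-based reasoning can be illustrated with a short Python sketch. It assumes, purely for illustration, that each past trouble ticket stores numeric criteria (network load and collision rate) together with a diagnosed root cause, and that similarity is plain Euclidean distance; the cited systems learn this dependence adaptively rather than using a fixed metric.

```python
import math


def retrieve_case(case_base, query):
    """Return the stored case whose criteria are closest to the query."""
    def distance(case):
        return math.sqrt(sum((case["criteria"][name] - value) ** 2
                             for name, value in query.items()))
    return min(case_base, key=distance)


# Hypothetical trouble tickets from earlier fault scenarios.
case_base = [
    {"criteria": {"network_load": 0.90, "collision_rate": 0.30},
     "root_cause": "overloaded segment"},
    {"criteria": {"network_load": 0.20, "collision_rate": 0.05},
     "root_cause": "faulty interface card"},
]
best_case = retrieve_case(case_base, {"network_load": 0.85, "collision_rate": 0.25})
```

The heavy dependence on past information noted above is visible here: a new situation can only be matched to root causes that already appear in the case base.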
2.2 Alarm Correlation  
A fault is a disorder occurring in the managed network. Faults happen within the
managed network, while alarms are external manifestations of faults (Rouvellou and Hart
1995). Alarms are defined by vendors, generated by network equipment, and
observable by network operators. Similar alarm messages with different time stamps are
interpreted as separate alarms. Modern telecommunication networks may produce
thousands of alarms per day, making the task of real-time network surveillance and fault
management difficult. Due to the large volume of alarms, network operators frequently
overlook or misinterpret them. To reduce the number of alarms displayed on operators'
terminals, current network management systems apply alarm filtering procedures or, in
the case of bursts of alarms, send them directly to a printer or database (Jakobson and
Weissman 1993). Furthermore, a single fault in a large communication network may
result in a large number of fault alarms, making the isolation of the primary source of
failure a difficult task (Katzela 1995).
2.2.1 Alarm Correlation Using Finite State Machines 
External observations of alarms may give the impression that one alarm causes
another. However, the causality is not between alarms, but rather between faults. Finite
state machines model alarm sequences that occur during and prior to fault events. For
instance, a probabilistic finite state machine model is built for a known network fault
using historical data (Lazar et al. 1992). State machines are designed with the intention of
not just detecting an anomaly but also possibly identifying and diagnosing the problem.
The sequences of alarms obtained from the different points in the network are modeled as
states of a finite state machine. The alarms are assumed to contain information such as
the device name as well as the symptom and the time of occurrence. The transitions
between the states are measured using prior events (Katzela and Schwartz 1995; Rouvellou
and Hart 1995; Bouloutas et al. 1990). A given cluster of alarms may have a number of
explanations, and the objective is to find the best explanation among them. The best
explanation is obtained by identifying a near-optimal set of nodes with minimum
cardinality such that the entities in the set together explain all the alarms and at least
one of the nodes in the set is the most likely one to be at fault (Lazar et al. 1992;
Jakobson and Weissman 1993). From an observer's point of view, fault detection and
identification require checking whether a network device behaves as the FSM specifies and,
if not, how it deviates from the expected behavior (Lazar et al. 1992). Alarm correlation
may be used for network fault isolation and diagnosis, selecting corrective actions,
proactive maintenance, and trend analysis (Jakobson and Weissman 1993).
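The search for a small set of nodes that together explain all observed alarms can be sketched as a set-cover problem. The Python code below uses the standard greedy heuristic, which is a swapped-in approximation and not the algorithm of the cited authors; the node names, alarm identifiers, and the mapping from nodes to the alarms they could produce are hypothetical.

```python
def explain_alarms(alarms, causes):
    """Greedy set cover.

    alarms: iterable of observed alarm identifiers.
    causes: dict mapping each candidate node to the set of alarms it could produce.
    Returns a small (not necessarily minimum) list of nodes explaining every alarm.
    """
    uncovered = set(alarms)
    chosen = []
    while uncovered:
        # Pick the node that explains the most still-unexplained alarms;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(causes), key=lambda node: len(causes[node] & uncovered))
        gained = causes[best] & uncovered
        if not gained:
            raise ValueError(f"no candidate explains alarms: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical correlation knowledge for three network elements.
causes = {
    "router1": {"a1", "a2", "a3"},
    "switch1": {"a3", "a4"},
    "link1": {"a1"},
}
explanation = explain_alarms({"a1", "a2", "a3", "a4"}, causes)
```

A probabilistic variant would weight each candidate by its likelihood of being at fault instead of counting explained alarms, which is closer to the "most likely" criterion described above.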
2.3 Pattern Matching 
This approach describes anomalies as deviations from normal behavior and
attempts to deal with the variability in the network environment (Feather and Maxion
1993; Papavassiliou et al. 2000). In this approach, online learning is used to build a
traffic profile for a given network. Traffic profiles are built using symptom-specific
feature vectors such as link utilization, packet loss, and number of collisions. These
profiles are then categorized by time of day, day of week, and special days, such as
weekends and holidays. When newly acquired data fails to fit within some confidence
interval of the developed profiles, an anomaly is declared. One method captures the
normal behavior of a time series as templates and sets tolerance limits based on
different levels of standard deviation. These limits are tested using extensive data
analysis (Feather and Maxion 1993). The authors also propose a pattern matching scheme
to detect address usage anomalies by tracking each address at 5-minute intervals. A
template of the mean and standard deviation of the usage of each address is then used to
detect anomalous behavior. The anomaly vectors from any new data are checked against the
template feature vector for a given anomaly, and if a match occurs, a fault is declared.
For simple, unvaried data, a mechanism called the Performance and Anomaly Monitoring
System, or PAMS, is used (Feather 1992; Maxion 1989; Maxion 1990; Maxion and
Feather 1990). PAMS highlights anomalous points in time series data by developing a
prediction of normal behavior, called a template, and tolerance limits, called envelopes,
based on a model of data variance. Current data that falls outside the tolerance
envelopes is considered anomalous. The efficiency of the pattern matching approach
depends on the accuracy of the traffic profile generated. Given a new network, it may be
necessary to spend a considerable amount of time building traffic profiles. In the face of
evolving network topologies and traffic conditions, this method may not scale gracefully
(Thottan and Ji 2003).
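The template-and-envelope idea can be illustrated with a short Python sketch. It is not the PAMS implementation: the time-slot key, the measurements, and the envelope width of three standard deviations are assumptions chosen for the example.

```python
import statistics


def build_template(history):
    """history: dict mapping a time slot to past measurements for that slot.

    Returns a dict mapping each slot to (mean, standard deviation).
    """
    return {slot: (statistics.mean(values), statistics.pstdev(values))
            for slot, values in history.items()}


def is_anomalous(template, slot, value, k=3.0):
    """Flag a value that falls outside mean +/- k standard deviations for its slot."""
    mean, std = template[slot]
    return abs(value - mean) > k * std


# Hypothetical link utilization (percent) seen on past Mondays at 09:00.
history = {"mon_09": [10, 12, 11, 9, 13]}
template = build_template(history)
normal_reading = is_anomalous(template, "mon_09", 12)
spike_reading = is_anomalous(template, "mon_09", 20)
```

Keeping one template per time slot is what allows the profile to follow daily and weekly traffic patterns, and it is also why building profiles for a new network takes time.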
2.4 Statistical Analysis 
As the network evolves, each of the methods described above requires significant
recalibration or retraining. However, using statistical approaches (Thottan and Ji 2003),
it is possible to continuously track the behavior of the network. Statistical analysis has
been used to detect both anomalies corresponding to network failures (Thottan 2000) and
network intrusions (Wang et al. 2002). Interestingly, both of these cases make use of
the standard sequential change point detection approach. The Flooding Detection System
(Wang et al. 2002) uses measured network data that describes TCP operations to detect
SYN flooding attacks. SYN flooding attacks capitalize on the limitation that TCP servers
maintain all half-open connections. Once the queue limit is reached, future TCP
connection requests are denied. The sequential change point detection employed here
makes use of the nonparametric cumulative sum (CUSUM) method. Using this approach
on trace-driven simulations, it has been shown that SYN flooding attacks can be detected
with high accuracy and reasonably short detection times. When detecting anomalies due
to failures, we are confronted with the problem of detecting a host of potential scenarios.
Each of these failure scenarios differs in its manifestations as well as its
characteristics. Thus, it is necessary to obtain a rich set of network information that
could cover a wide variety of network operations. The primary source for such in-depth
information is the SNMP MIB data. Designing a failure detection system using MIB
data necessitates the use of a general method, since MIB variables exhibit varying
statistical characteristics (Thottan 2000).
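The sequential change point detection discussed in this section can be sketched with the usual nonparametric CUSUM recursion, in which a statistic accumulates the amount by which each sample exceeds a drift term and is reset to zero whenever it would become negative. The Python code below is a generic sketch; the drift and threshold values and the sample series are assumptions and are not taken from the cited work.

```python
def cusum_detect(samples, drift, threshold):
    """Return the index at which the CUSUM statistic first exceeds threshold, else None.

    The statistic follows y_n = max(0, y_(n-1) + x_n - drift).
    """
    statistic = 0.0
    for index, sample in enumerate(samples):
        statistic = max(0.0, statistic + sample - drift)
        if statistic > threshold:
            return index
    return None


# Hypothetical normalized measurements: quiet traffic followed by a sustained shift.
samples = [0.1, 0.0, 0.2, 0.1, 1.0, 1.2, 1.1]
alarm_index = cusum_detect(samples, drift=0.5, threshold=1.0)
quiet_index = cusum_detect([0.1, 0.0, 0.2, 0.1], drift=0.5, threshold=1.0)
```

Because the statistic only grows under a sustained shift, a single noisy sample does not raise an alarm, while a persistent change is detected within a few samples.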
2.5 Intelligent Probing for Fault Management 
Intelligent probing makes use of probing technology (CAIDA 2005) for cost-effective
fault diagnosis in computer networks. Probes are test transactions that can be
actively selected and sent through the network. A distributed system can be represented
as a "dependency graph" where nodes can be either hardware elements (e.g.,
workstations, servers, routers) or software components or services, and links can
represent both physical and logical connections between the elements. Probes offer the
opportunity to develop an approach to diagnosis that is more active than traditional
"passive" event correlation and similar techniques. A probe is a command or transaction
sent from a particular machine, called a probing station, to a server or a network element in
order to test a particular service. This work addresses the probing problem using methods
from artificial intelligence. We call the resulting approach intelligent probing. The probes
are selected by reasoning about the interactions between the probe paths. For diagnosis
we use a local inference approximation scheme, for instance over a Bayesian network (Huard
and Lazar 1996) or other probabilistic dependency models (Katzela and Schwartz 1995),
which avoids the intractability of exact inference for large networks (Brodie et al. 2002).
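To make the idea of reasoning over probe paths concrete, the following Python sketch uses a deliberately simplified, deterministic rule instead of the probabilistic inference described above. It assumes a single faulty element, and the probe names, node names and the function `candidate_faults` are hypothetical. A node is a candidate if it lies on every failed probe path and on no successful one.

```python
def candidate_faults(probe_paths, results):
    """Narrow down faulty nodes from probe outcomes (single-fault assumption).

    probe_paths -- dict mapping a probe name to the nodes on its path
    results     -- dict mapping a probe name to True (succeeded) or False (failed)
    """
    failed = [set(probe_paths[p]) for p, ok in results.items() if not ok]
    if not failed:
        return set()
    # Any node on a successful probe path is assumed to be healthy.
    healthy = set()
    for probe, ok in results.items():
        if ok:
            healthy |= set(probe_paths[probe])
    # The fault must lie on every failed path and not among the healthy nodes.
    return set.intersection(*failed) - healthy


probe_paths = {
    "p1": ["station", "router", "web"],
    "p2": ["station", "router", "db"],
    "p3": ["station", "switch", "web"],
}
results = {"p1": False, "p2": True, "p3": False}
suspects = candidate_faults(probe_paths, results)
```

Choosing probes whose paths overlap in different ways is what lets a small number of probes isolate a fault.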
 2.6 Proactive Fault Detection using Intelligent Agents 
Current fault management implementations generally rely on the expertise of a
human network manager, which is translated into a set of rules and then into threshold levels
on the measurement variables being collected. As networks become more complex and
change more frequently, the human network manager will find it hard to maintain a
sufficient level of expertise on a particular network's behavior (Hood and Ji 1998). Fault
management research has covered approaches such as expert systems, finite state
machines, advanced database techniques, and probabilistic methods (Lazar et al. 1992).
The drawback of all these approaches is that they require a specification of the faults to
be detected, and it is not feasible to specify all possible faults. Also, changes in network
configuration, applications, and traffic can alter the type and nature of possible faults,
which makes modeling them impractical in many cases. Intelligent agents that reside at
network nodes use adaptive learning methods to detect abnormal behavior before a fault
actually occurs (Ekaette and Far 2003). In this approach, the intelligent agent processes
information collected by Simple Network Management Protocol (SNMP) agents, and uses it to
detect the network anomalies that typically precede a fault (Yemini 1994). The SNMP
agents collect information about the network node through their management information
base, or MIB, which holds a set of variables pertinent to that particular node. The
intelligent agents learn the normal behavior of each measurement variable and combine
the information in the probabilistic framework of a Bayesian network (Huard and Lazar
1996; Pearl 1998). This yields a picture of the network health from the perspective of the
network node, which can be used to trigger local corrective action or a message to a
centralized network manager (Hood and Ji 1998).
Table 1.1 Fault Management Techniques

Proposed System                        | Methodology                                                                                                     | Complexity | Scalable | Detect new fault patterns
Rule based approach                    | Anomaly detection by conventional rule based systems.                                                           | Low        | No       | No
Alarm Correlation                      | Incorporation of finite state machines to model alarm sequences that occur during and prior to fault events.    | Moderate   | No       | No
Pattern matching                       | An anomaly is considered as variability in network environment.                                                 | Moderate   | No       | Yes, but introduces overheads
Statistical analysis                   | Employment of statistical approaches to continuously track the behavior of the network.                         | Moderate   | Yes      | Yes
Intelligent probing                    | Use of probing technology for fault diagnosis.                                                                  | High       | Yes      | Yes
Proactive fault detection using agents | Deployment of software agents that detect, correlate and selectively seek to derive a clear explanation of faults. | High    | Yes      | Yes

 
Chapter 3  
Anticipatory Systems 
The idea that anticipations influence and guide behavior has been increasingly
appreciated over the last decades. Anticipations appear to play a major role in the
coordination and realization of adaptive behavior. Various disciplines have explicitly
recognized anticipations. For example, philosophy has long addressed the sense of
reasoning, generalization, and association. More recently, experimental
psychology has confirmed the existence of anticipatory behavioral processes in animals
and humans (Butz et al. 2003).
Anticipation is an important characteristic of intelligence. Proactive behavior 
requires anticipatory abilities. A seminal work on anticipatory systems is the one written 
by Rosen (1985). A brief introduction to and serious concern about anticipation follows:
"Strictly speaking, an anticipatory system is one in which present change of state depends
upon future circumstances, rather than merely on the present or past. As such,
anticipation has routinely been excluded from any kind of systematic study, on the
grounds that it violates the causal foundation on which all of theoretical science must
rest, and on the grounds that it introduces a telic element which is scientifically
unacceptable. Nevertheless, biology is replete with situations in which organisms can
generate and maintain internal predictive models of themselves and their environments,
and utilize the predictions of these models about the future for purpose of control in the
present. Many of the unique properties of organisms can really be understood only if
these internal models are taken into account. Thus, the concept of a system with an
internal predictive model seemed to offer a way to study anticipatory systems in a
scientifically rigorous way" (Rosen 1985).
3.1 Anticipatory Agents 
Perception is a required characteristic of agents. Hence, they can be
designed to perceive the current state of themselves and others. They can also be designed
to create current image(s) of future state(s). Perception requires mechanisms that enable
interpretive capabilities. Perception invariably involves sensory qualities, and
introspection entails accessing the sensations and perceptions that the agent would introspect.
Perceptions are derived by interpreting sensory inputs within the context of
the current world and the agent's self-model. Prototype inference, orientation accounting,
and situational classification mechanisms could be used to realize the interpretation
capabilities of an agent. The interpretation process results in perceptions. An anticipatory
agent needs to deliberate upon perceptions through introspection and reflection in order to
anticipate. Introspection is deliberate and attentive because higher-order intentional states
are themselves attentive and deliberate. An introspective agent should have access
mechanisms to its internal representation, operations, behavioral potentials, and beliefs
about its context. Reflection uses the introspective mechanisms to deliberate on the agent's
situation in relation to the embedding environmental context. These features collectively
result in anticipation capabilities that orient and situate an agent for accurate future projections.
Figure 3.1 presents interpretation and introspection as critical components within the
micro-architecture of an anticipatory agent.
 
Figure 3.1 Basic Components for Anticipatory Agents 
 
3.2 Preventive State Anticipation 
A special kind of anticipation arises when an anticipated undesired situation makes an
agent adapt its behavior in order to prevent this situation from occurring. For example,
assume that we are going out for a walk and that the sky is full of dark clouds. Using our 
internal weather model and our knowledge about the current weather situation, we 
anticipate that it will probably begin to rain during the walk. This makes us foresee that 
our clothes will get wet which, in turn, might cause us to catch a cold, something we 
consider a highly undesirable state. So, in order to avoid catching a cold we will adapt 
our behavior and bring an umbrella when going for the walk. 
In the suggested framework, an anticipatory agent consists mainly of three 
entities: an object system (S), a world model (M) and a meta-level component 
(Anticipator). The object system is an ordinary (i.e., non-anticipatory) dynamic system. 
M is a description of the environment including S, but excluding the Anticipator. The
importance of having an internal model that includes both the agent as part of the
environment and (a large portion of) its abilities has been stressed by, for instance,
Zeigler (1990). The Anticipator makes predictions using M and uses these predictions to
change the dynamic properties of S. Although the different parts of an anticipatory agent 
certainly are causal systems, the agent taken as a whole, nevertheless behaves in an 
anticipatory fashion. 
When implementing an anticipatory agent, the component S corresponds to some
kind of reactive system similar to the ones mentioned above. This component is referred
to as the Reactor. The Anticipator corresponds to a more deliberative meta-level component
that is able to "run" the world model faster than real time. When doing this, it reasons
about the current situation compared to the predicted situations and its goals, and decides
whether (and how) to change the Reactor. The resulting architecture is illustrated in
Figure 3.2.
 

Figure 3.2. The Basic Architecture of an Anticipatory Agent in Our Model.
[Figure: labeled components are Sensors, World Model, Anticipator, Reactor, Effectors, and Anticipatory layer.]

We can summarize the operation of the architecture as follows: The sensors receive input
from the environment. This data is then used in two different ways: (1) to update the
World Model and (2) to serve as stimuli for the Reactor. The Reactor reacts to these
stimuli and provides a response that is forwarded to the effectors, which then carry out
the desired action(s) in the environment. Moreover, the Anticipator uses the World
Model to make predictions, and on the basis of these predictions the Anticipator decides
whether, and which, changes to the dynamical properties of the Reactor are necessary. Every
time the Reactor is modified, the Anticipator should, of course, also update the part of the
World Model describing the agent accordingly. Thus, the working of an anticipatory
agent can be viewed as two concurrent processes, one reactive at the object level and one
more deliberative at the meta-level (Davidsson 2003).
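The operation summarized above can be sketched in Python. This is only an illustration under stated assumptions: the class and method names, the threshold rule in the Reactor, the linear-extrapolation predictor in the World Model, and all numeric values are hypothetical, and the two concurrent processes are run sequentially for simplicity.

```python
class Reactor:
    """Object-level system S: a fixed stimulus-response rule."""

    def __init__(self, threshold):
        self.threshold = threshold

    def react(self, load):
        return "throttle" if load > self.threshold else "forward"


class WorldModel:
    """Model M: remembers sensor readings and the agent's own Reactor setting."""

    def __init__(self, reactor_threshold):
        self.history = []
        self.reactor_threshold = reactor_threshold

    def update(self, load):
        self.history.append(load)

    def predict(self, steps):
        # Assumed predictor: extend the last observed change linearly.
        if len(self.history) < 2:
            return self.history[-1] if self.history else 0.0
        slope = self.history[-1] - self.history[-2]
        return self.history[-1] + steps * slope


class Anticipator:
    """Meta-level component: changes the Reactor when M predicts trouble."""

    def __init__(self, limit, horizon, cautious_threshold):
        self.limit = limit
        self.horizon = horizon
        self.cautious_threshold = cautious_threshold

    def deliberate(self, model, reactor):
        if model.predict(self.horizon) > self.limit:
            reactor.threshold = self.cautious_threshold
            # Keep the model's description of the agent consistent.
            model.reactor_threshold = reactor.threshold


def run(readings, reactor, model, anticipator):
    actions = []
    for load in readings:                        # sensor input
        model.update(load)                       # (1) update the World Model
        anticipator.deliberate(model, reactor)   # meta-level process
        actions.append(reactor.react(load))      # (2) stimulus for the Reactor
    return actions


reactor = Reactor(threshold=0.9)
model = WorldModel(reactor_threshold=0.9)
anticipator = Anticipator(limit=0.9, horizon=3, cautious_threshold=0.5)
actions = run([0.2, 0.4, 0.6], reactor, model, anticipator)
```

With the original threshold of 0.9 the Reactor alone would forward all three readings. Because the predicted load exceeds the limit, the Anticipator lowers the threshold and the third reading is throttled.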
 
 
  
 
 
 
 
  
 
 
Chapter 4  
 
Agent-based Modeling of Reactive and Anticipatory Control in Computer Networks 
 
The overall architecture of the simulation is primarily composed of the following
components, as shown in Figure 4.1:
 
Figure 4.1 Reactive and Anticipatory Control
• Network Model: The first component is a basic model of a typical computer
network. The network model is the basis for the design of, and experimentation with, the
fault management techniques. The network model is built on a simulation
framework and comprises basic network components, including
switches, routers, hosts and links.
• Monitoring Layer: The monitoring layer consists of multiple monitoring agents
that are embedded over individual network components or over a group of
components (e.g., a monitoring agent is allocated for each subnet). The
monitoring agents may have disjoint functions or potentially overlapping
responsibilities for increased reliability.
• Management Layer: The management layer comprises the reactive or the
anticipatory agents, according to the technique being used. The reactive agent
works on a rule-based approach. It interprets the data acquired from the
monitoring agents and communicates with the control layer to take corrective
action (Thottan and Ji, 2003). Similarly, the anticipatory agent works on the
principle of a Naïve Bayesian classifier (Luger and Stubblefield, 1989) and
interacts with the control layer to take corrective action.
• Control Layer: The control layer is responsible for carrying out corrective action
based on the information it gets from the management layer. The corrective
action is carried out by triggering local corrective action or by sending a
message to the individual components of the network model (Hood and Ji 1998).
 
 
4.1 The DEVS Network Model 
The network model is developed in the DEVS (Discrete Event System 
Specification) formalism. A brief description of the simulated network components is as 
follows: 
• Generator: This component is responsible for generation of network packets
(payloads to be processed by the hosts).
• Transducer: A Transducer is responsible for calculation of the various network
performance metrics.
• Links: Simulation of links is carried out on crucial connections in the network. A
link is treated as a processor, and its overloading is simulated as an increase
in the processing time of the processor.
• Switch: A switch forms a connection between different subnets to facilitate
forwarding of packets among them.
• Router: A router follows a routing algorithm such as distance vector routing, link
state routing, hierarchical routing, or broadcast routing to facilitate forwarding of
packets among hosts. We use the distance vector routing strategy, by which
packets are forwarded along the path with the best known distance to each destination
(the distance is measured in terms of the processing time of hosts).
• Hosts: Hosts are entities that process jobs or payload. They can be network
clients, servers, printers, plotters, etc.
• Monitoring agents: The monitoring agents record performance metrics such as
network throughput, latency and packet drop rate. They report these data to the
management layer, where the reactive agents draw inferences using their rules, while the
anticipatory agent updates its predictive model.
• Management agents: The management agents are the reactive and the anticipatory
agents. They receive data from the monitoring agents and prompt the control agent
to take the appropriate action.
4.1.1 The DEVS Formalism 
The Discrete Event System Specification (DEVS) formalism (Zeigler and 
Sarjoughian, 2003) provides a means of specifying a mathematical object called a system. 
Basically, a system has a time base, inputs, states, and outputs, and functions for 
determining next states and outputs given current states and inputs. Discrete event 
systems represent certain constellations of such parameters just as continuous systems do. 
For example, the inputs in discrete event systems occur at arbitrarily spaced moments, 
while those in continuous systems are piecewise continuous functions of time. The 
insight provided by the DEVS formalism is the simple way that it characterizes how
discrete event simulation languages specify discrete event system parameters. Having
this abstraction, it is possible to design new simulation languages with sound semantics
that are easier to understand.
The conceptual framework underlying the DEVS formalism is shown in
Figure 4.2. The conceptual framework consists of the following elements:
• Model: It is a set of instructions for generating data comparable to that observable
in the real system. The structure of the model is its set of instructions. The
behavior of the model is the set of all possible data that can be generated by
faithfully executing the model instructions.
• Simulator: It exercises the model's instructions to actually generate its behavior.
• Experimental Frame: It captures how the modeler's objectives impact model
construction, experimentation and validation. The DEVS experimental frames are
formulated as model objects in the same manner as the models of primary interest.
In this way, model/experimental frame pairs form coupled model objects with the
same properties as other objects of this kind. It will become evident later that this
uniform treatment yields key benefits in terms of modularity and system entity
structure representation.

Figure 4.2 Conceptual Framework of the DEVS Formalism (adopted from "Introduction
to DEVS Modeling & Simulation with JAVA™", Zeigler and Sarjoughian 2003)

The basic objects are related by two relations:
• The modeling relation, linking real system and model, defines how well the model
represents the system or entity being modeled. In general terms, a model can
be considered valid if the data generated by the model agrees with the data
produced by the real system in an experimental frame of interest.
• The simulation relation, linking model and simulator, represents how faithfully the
simulator is able to carry out the instructions of the model.
The basic items of data produced by a system or model are time segments. These
time segments are mappings from intervals defined over a specified time base to values in
the ranges of one or more variables. The variables can either be observed or measured.
The structure of a model may be expressed in a mathematical language called a formalism.
The discrete event formalism focuses on the changes of variable values and generates
time segments that are piecewise constant. Thus, an event is a change in a variable value,
which occurs instantaneously.
In essence, the formalism defines how to generate new values for variables and
the times at which the new values should take effect. An important aspect of the formalism
is that the time intervals between event occurrences are variable (in contrast to discrete
time, where the time step is a fixed number).
4.2 Reactive Agents 
Fuzzy reactive agents are used to determine the proneness to failure.
Reactive agents work in a hard-wired stimulus-response manner. Each and every
situation must be considered in advance. The reactive agent follows a fuzzy rule-based
approach to infer the occurrence of a fault. A system becomes a fuzzy system when its
operations are entirely or partially governed by fuzzy logic or are based on fuzzy sets. A
crisp set is a collection of distinct (precisely defined) elements. In classical set theory, a
crisp set can be a superset containing other crisp sets. A superset will represent the
universe of discourse if it defines the boundaries within which all elements reside. In any
given situation, a new element can be tested to see whether it belongs to any set. A fuzzy
set, on the other hand, is a collection of distinct elements with a varying degree of
relevance or inclusion (Berkan and Trubatch 1997). The Reactive Agent gets the network
node information through various performance metrics that are collected by the
monitoring agents embedded in the network and uses predefined rules to infer failures
based on their degradation. Fuzzy rules consist of antecedents and consequents. The
antecedent variables (one or more variables that represent the conditions to be met before
any conclusion can be made) comprise the network throughput and latency. The
consequents (set of outputs) comprise the proneness to failure of each of the network
components. A sample set of fuzzy rules contained in the reactive agent for a
network component (for instance, a host) can be outlined as follows:
• If Throughput is High and Latency is Low then Fault_proneness is Low
• If Throughput is Moderate and Latency is Low then Fault_proneness is Low
• If Throughput is Low and Latency is Low then Fault_proneness is Moderate
• If Throughput is High and Latency is Moderate then Fault_proneness is Low
• If Throughput is Moderate and Latency is Moderate then Fault_proneness is
Moderate
• If Throughput is Low and Latency is Moderate then Fault_proneness is Moderate
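The sample rules above can be evaluated with a small Python sketch. The rule table is taken from the list, but everything else is an assumption made for illustration: metrics are taken to be normalized to the range 0 to 1, the triangular membership functions in `TERMS` are hypothetical, and the common min/max operators are used for "and" and for combining rules with the same consequent.

```python
def tri(x, a, b, c):
    """Triangular membership with peak at b; a == b or b == c gives a shoulder."""
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b) if x > b else 1.0


# Hypothetical linguistic terms over a metric normalized to [0, 1].
TERMS = {
    "Low": (0.0, 0.0, 0.5),
    "Moderate": (0.0, 0.5, 1.0),
    "High": (0.5, 1.0, 1.0),
}

# (Throughput, Latency) -> Fault_proneness, as in the sample rules.
RULES = [
    ("High", "Low", "Low"),
    ("Moderate", "Low", "Low"),
    ("Low", "Low", "Moderate"),
    ("High", "Moderate", "Low"),
    ("Moderate", "Moderate", "Moderate"),
    ("Low", "Moderate", "Moderate"),
]


def fault_proneness(throughput, latency):
    """Return the firing strength of each Fault_proneness term."""
    strength = {}
    for t_term, l_term, consequent in RULES:
        # "and" is modeled as the minimum of the two memberships.
        fire = min(tri(throughput, *TERMS[t_term]), tri(latency, *TERMS[l_term]))
        # Rules with the same consequent are combined with the maximum.
        strength[consequent] = max(strength.get(consequent, 0.0), fire)
    return strength


healthy_host = fault_proneness(throughput=1.0, latency=0.0)
degraded_host = fault_proneness(throughput=0.0, latency=0.5)
```

A defuzzification step, which would turn these strengths into a single fault-proneness value, is omitted here.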
 
4.3 Anticipatory Agents - A Bayesian Approach
The architectural framework of the anticipatory agent is described in the previous
chapter. It primarily comprises a predictive model and an anticipator. We make use of
a Naïve Bayesian classifier for constructing the predictive model of the anticipatory
agent. The strength of the Naïve Bayesian classifier is that it provides a theoretical
framework for combining statistical data with prior knowledge about the problem
domain for making future projections.
Before getting to the Naïve Bayesian classifier, we give an overview of basic probability
theory. $p(A \mid B)$ is known as the conditional probability of event A happening given
that event B has occurred. We can express the conditional probability as follows:
$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$
or
$$p(A \mid B) = \frac{\text{\# of times A and B occur}}{\text{\# of times B occurs}}$$
The following example shows how a fault is detected by the anticipatory agent by making
use of the Naïve Bayesian classifier. Consider the sample of evidence specified in Table
4.1.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Table 4.1 Sample Set of Evidence to be Processed by the Naïve Bayesian Classifier
 
 
 
The Network takes the value High if there is abnormality above a certain threshold in
one or both of the subnets, and Normal otherwise. Subnet 1 and Subnet 2 take the
value High if any of the components in the respective subnet has failed, and Normal
otherwise. The probability that there is a fault in host 1, given evidence
that subnet 1 is High, is given by
$$p(\text{Host 1} = \text{yes} \mid \text{Subnet 1} = \text{High}) = \frac{\text{\# of times (Host 1 = yes \& Subnet 1 = High)}}{\text{\# of times (Subnet 1 = High)}} = 6/8$$
Similarly, the probability that there is a fault in host 1, given evidence
that subnet 1 is High and the network is High, is given by
$$p(\text{Host 1} = \text{yes} \mid \text{Subnet 1} = \text{High}, \text{Network} = \text{High}) = \frac{\text{\# of times (Host 1 = yes \& Subnet 1 = High \& Network = High)}}{\text{\# of times (Subnet 1 = High \& Network = High)}} = 4/5$$
However, the number of conditional probabilities in a data set can be very high. This is
where Bayes' rule comes in. It is derived as follows:
Given $p(A \mid B) = p(A \cap B)/p(B)$, we know that
$p(B \mid A) = p(B \cap A)/p(A)$ and $p(B \cap A) = p(B \mid A)\,p(A)$.
Now, since $p(A \cap B) = p(B \cap A)$,
$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$$
This is known as Bayes' rule.
Consider the following problem as an application of Bayes' rule:
Given that (Subnet 1 = High, Network = High), is there a fault in host 1?
We can express this as:
$$p(\text{Host 1} = \text{yes} \mid \text{Subnet 1} = \text{High}, \text{Network} = \text{High}) = \frac{p(\text{Subnet 1} = \text{High}, \text{Network} = \text{High} \mid \text{Host 1} = \text{yes})\; p(\text{Host 1} = \text{yes})}{p(\text{Subnet 1} = \text{High}, \text{Network} = \text{High})}$$
A general equation for this is:
$$p(C_i \mid A_1 A_2 \ldots A_n) = \frac{p(A_1 A_2 \ldots A_n \mid C_i)\,p(C_i)}{\sum_k p(A_1 A_2 \ldots A_n \mid C_k)\,p(C_k)}$$
However, the conditional probability $p(A_1 A_2 \ldots A_n \mid C_i)$ may be difficult to
compute. If conditional independence among the attributes of the query is assumed, we have
the following:
$$p(C_i \mid A_1 A_2 \ldots A_n) = \frac{p(A_1 \mid C_i)\,p(A_2 \mid C_i) \cdots p(A_n \mid C_i)\,p(C_i)}{\sum_k p(A_1 \mid C_k)\,p(A_2 \mid C_k) \cdots p(A_n \mid C_k)\,p(C_k)}$$
The result of Naïve Bayesian classification is as follows:
$$\text{Result} = \arg\max_{C_k} \Big[ p(C_k) \prod_i p(A_i \mid C_k) \Big]$$
where
$$p(A_i \mid C_k) = \frac{\#(A_i \cap C_k)}{\#(C_k)}$$
It can be illustrated by the following example. Suppose we are given that (Subnet 1 =
High and Network = High) and we need to know whether there is a fault on host 1.
From the above Naïve Bayesian classifier equation:
$$\text{Result}_{(\text{Host 1} = \text{yes})} = p(\text{Host 1} = \text{Yes}) \cdot p(\text{Subnet 1} = \text{High} \mid \text{Host 1} = \text{Yes}) \cdot p(\text{Network} = \text{High} \mid \text{Host 1} = \text{Yes}) = (6/14)(1)(3/6) = 0.21428$$
$$\text{Result}_{(\text{Host 1} = \text{no})} = p(\text{Host 1} = \text{No}) \cdot p(\text{Subnet 1} = \text{High} \mid \text{Host 1} = \text{No}) \cdot p(\text{Network} = \text{High} \mid \text{Host 1} = \text{No}) = (8/14)(3/8)(3/8) = 0.08035$$
Hence we see that $\text{Result}_{(\text{Host 1} = \text{yes})} > \text{Result}_{(\text{Host 1} = \text{no})}$, and hence the predictive
model predicts a potential fault in host 1. The Anticipator thereby notifies the control
layer to take the corresponding corrective action for host 1. Note that the set of evidence
available to the Bayesian classifier is continuously updated according to the events taking
place in the network. After a fixed interval of time (say 5 time units), the classifier computes
the result $\text{Result} = \arg\max_{C_k} [\, p(C_k) \prod_i p(A_i \mid C_k) \,]$ based on the state of the components at that
time instant.
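The arithmetic of the worked example can be checked with a short Python sketch. The counts below are read off the fractions used in the example (6/14, 3/6, 8/14, 3/8, 3/8); interpreting the factor 1 as 6 out of 6 is an assumption, since Table 4.1 is not reproduced here. The function and variable names are hypothetical.

```python
from fractions import Fraction

# Counts per class of Host 1 (yes = fault, no = no fault), 14 samples in total.
class_counts = {"yes": 6, "no": 8}

# Number of samples in each class that also show the given evidence.
feature_counts = {
    "yes": {"Subnet1=High": 6, "Network=High": 3},
    "no": {"Subnet1=High": 3, "Network=High": 3},
}


def naive_bayes_scores(class_counts, feature_counts, evidence):
    """Compute p(C_k) * prod_i p(A_i | C_k) for every class C_k."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, total)                      # prior p(C_k)
        for feature in evidence:
            score *= Fraction(feature_counts[label][feature], count)
        scores[label] = score
    return scores


scores = naive_bayes_scores(
    class_counts, feature_counts, ["Subnet1=High", "Network=High"]
)
prediction = max(scores, key=scores.get)                    # arg max over classes
posterior_yes = scores["yes"] / sum(scores.values())        # normalized score
```

The two scores evaluate to 3/14 and 9/112, which match the decimal values in the example, and the prediction is a fault in host 1. Dividing by the sum of the scores, as in the general equation, gives a normalized value of 8/11 for the fault class.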
4.4 Adaptive Restructuring of the DEVS Network Model 
Adaptive restructuring can be described as modifying the operating regimes of the
DEVS network model in an effort to improve its performance based on the network
conditions at a particular instant of time. We intend to find which fault management
technique performs better under adaptation. In this section we describe the methodology
by which operating regimes of the DEVS network model are modified based on certain
parameters to preserve the proper functioning of the network.
In the case of the DEVS network model, we need to decide on the set of operating
regimes that we intend to modify in the DEVS environment such that the
normal operation of the network model is preserved. Based on this requirement, we
define the following modes of operation for certain components of the network
model. We then combine three of those modes to form a particular operating regime.
The modes of operation are as follows:
Sw_Mode_0: Original operation of the switch.
Sw_Mode_1: Operation of the switch with random forwarding of packets to each of
the subnets.
Ro_Mode_0: Original configuration of the router.
Ro_Mode_1: Modified packet forwarding strategy, in which packets are forwarded
to the host with the highest instantaneous value of throughput.
Li_Mode_0: The original value of link delay as defined.
Li_Mode_1: Increase the value of link delay by 50%.
Li_Mode_2: Decrease the value of link delay by 50%.
The operating regimes of the DEVS environment are as follows:
1) Sw_Mode_0 AND Ro_Mode_0 AND Li_Mode_1 
2) Sw_Mode_0 AND Ro_Mode_0 AND Li_Mode_2 
3) Sw_Mode_1 AND Ro_Mode_0 AND Li_Mode_0 
4) Sw_Mode_0 AND Ro_Mode_1 AND Li_Mode_1 
5) Sw_Mode_1 AND Ro_Mode_1 AND Li_Mode_0 
6) Sw_Mode_1 AND Ro_Mode_1 AND Li_Mode_1 
7) Sw_Mode_1 AND Ro_Mode_1 AND Li_Mode_2 
Adaptivity can be modeled as a simulated annealing process. Simulated annealing
consists of capturing a new state of the DEVS model. The new state is obtained by
applying any of the operating regimes. This is followed by recording the performance
metric (network throughput) of the network in the new state for a finite amount of
time. This recorded metric is then compared with the metric obtained in the previous state
(the state of the DEVS model before the operating regimes are modified) and the change in
the performance metric is recorded. The Metropolis criterion (Carley and Svoboda 1996) is
then used to determine whether or not to adopt the new state. The Metropolis criterion
states that a change is always accepted if the forecast performance for a hypothetical
organization is better than the known performance of the current organization. A
"hypothetical organization" can be interpreted as a new organization that can be obtained
by applying design changes to the current organization. Furthermore, when the forecast is
poorer, the change may still be accepted with a probability that is calculated using the
Boltzmann equation
$$P = P_0 \, e^{-\text{cost}(t)/T}$$
such that $\text{cost}(t) = 0 - \text{performance}(t)$, and $P_0$ is the probability of accepting a "bad"
design for the previous iteration. The above process is then repeated until the temperature
reaches a freezing point or until the simulation time ends, whichever is earlier.
Temperature is defined as the model's current level of risk aversion, in other words, the
degree to which the DEVS network model is open to accepting a change of state. The
temperature always drops after every new state has been adopted for the DEVS model.
The freezing point is the point at which a state is in its final form and no more adaptation or
change is allowed (Carley and Svoboda 1996).
To implement the above notion in the DEVS environment, we include an
additional component in the network design called the "annealer". The total simulation
time for the network operation is fixed at 1000 time units and the annealer is made to
operate after every 100 time units. After application of a new state, the annealer records
the performance metric for a finite amount of time, as mentioned above; this finite time is
fixed at 50 time units. Each operation of the annealer can be termed an iteration.
The annealer operates based on the following algorithm.
1. Set the initial value of temperature T = 0.433 and α = 0.975, where α is the rate at
which the DEVS model learns to be risk averse. The initial value of temperature
(0.433) corresponds to a probability of 0.9 for changes to be accepted.
2. Derive the new state of the DEVS model by applying an operating regime at
random as described above and record the performance metric (network
throughput) of the DEVS model in the new state.
3. If the recorded performance metric is better than the one obtained in the old state,
continue with step 2; otherwise proceed with step 4.
4. If the new recorded metric is poorer than the older one, use the Metropolis
criterion to determine whether the new state can be adopted in the network.
5. Set the new values of temperature and probability:
$P_0 = P$
$T(t+1) = \alpha \times T(t)$
6. Continue steps 2 through 5 until a freezing point is reached (P = 0.55 and T = 0.345)
or the simulation time ends, whichever is earlier.
The above algorithm, followed by the annealer, can be depicted by the following control
flow model:
 
Figure 4.3 Control Flow Diagram Depicting the Operation of the Annealer 
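The annealer loop can also be sketched in Python. This is a sketch under stated assumptions and not the DEVS implementation: `measure` is a hypothetical callback standing in for running the network model and recording its throughput, a regime is encoded as a (switch, router, link) tuple of mode numbers, and the acceptance test uses the textbook Metropolis form exp(-delta/T), with delta being the drop in throughput, rather than the P_0-scaled equation given in the text. The temperature schedule and the freezing temperature follow the values stated in the algorithm.

```python
import math
import random

# The seven operating regimes as (Sw_Mode, Ro_Mode, Li_Mode) tuples.
REGIMES = [
    (0, 0, 1), (0, 0, 2), (1, 0, 0), (0, 1, 1),
    (1, 1, 0), (1, 1, 1), (1, 1, 2),
]


def anneal(measure, regimes, t0=0.433, alpha=0.975, t_freeze=0.345,
           max_iters=10, rng=None):
    """Return (adopted regime, its performance, final temperature).

    measure(regime) -- hypothetical callback returning the throughput observed
                       under a regime; None denotes the original configuration
    max_iters       -- assumed cap, e.g. one annealer run per 100 time units
    """
    rng = rng or random.Random(0)
    current = None
    current_perf = measure(current)
    temperature = t0
    for _ in range(max_iters):
        if temperature <= t_freeze:          # freezing point reached
            break
        candidate = rng.choice(regimes)      # step 2: apply a regime at random
        perf = measure(candidate)
        delta = current_perf - perf          # positive when the new state is worse
        # Better states are always adopted; worse ones only with a probability
        # that shrinks as the temperature drops (assumed Metropolis form).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_perf = candidate, perf
        temperature *= alpha                 # T(t+1) = alpha * T(t)
    return current, current_perf, temperature
```

With the stated values, the temperature falls below 0.345 after nine updates, which fits within the ten annealer operations that a 1000 time unit run with one operation every 100 time units allows.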
 
We then compare the final results of the performance metrics when simulated
annealing is implemented with our original results (without adaptation) for each of the
fault management techniques to analyze the network performance under adaptive
reconfiguration.
 
Chapter 5  
Detailed Design of the DEVS Network Model 
Chapter 4 describes the detailed architecture for agent-based modeling of reactive
and anticipatory control in computer networks. This chapter provides details pertaining to
the design of the DEVS network simulation model that implements a distributed network
monitoring (DNM) system (Prietula et al. 1998).
5.1 The DEVS Basic and Coupled Models 
In the DEVS formalism, one must specify 1) basic models from which larger ones
are built, and 2) how these models are connected together in a hierarchical fashion. A basic
model contains the following information:
• the set of input ports through which external events are received
• the set of output ports through which external events are sent
• the set of state variables and parameters: two state variables are usually present,
"phase" and "sigma" (in the absence of external events the system stays in the
current "phase" for the time given by "sigma")
• the time advance function, which controls the timing of internal transitions - when
the "sigma" state variable is present, this function just returns the value of
"sigma"
• the internal transition function, which specifies to which next state the system will
transit after the time given by the time advance function has elapsed
• the external transition function, which specifies how the system changes state
when an input is received - the effect is to place the system in a new "phase" and
"sigma", thus scheduling it for a next internal transition; the next state is computed
on the basis of the present state, the input port and the value of the external event,
and the time that has elapsed in the current state
• the confluent transition function, which is applied when an input is received at the
same time that an internal transition is to occur - the default definition simply
applies the internal transition function before applying the external transition
function to the resulting state
• the output function, which generates an external output just before an internal
transition takes place.
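The elements of a basic model listed above can be illustrated with a small Python sketch of a single-job processor, similar in spirit to how a link or host is treated as a processor in the network model. The class, its method names and the port names "in" and "out" are hypothetical; they are not the DEVSJAVA API.

```python
import math


class Processor:
    """A basic (atomic) model with the state variables "phase" and "sigma"."""

    def __init__(self, processing_time):
        self.processing_time = processing_time
        self.phase, self.sigma, self.job = "passive", math.inf, None

    def time_advance(self):
        # With "sigma" present, the time advance is simply its value.
        return self.sigma

    def internal_transition(self):
        # Processing has finished: return to the passive phase.
        self.phase, self.sigma, self.job = "passive", math.inf, None

    def external_transition(self, elapsed, port, value):
        if self.phase == "passive" and port == "in":
            # Accept the job and schedule the next internal transition.
            self.phase, self.sigma, self.job = "busy", self.processing_time, value
        else:
            # Already busy (or unknown port): ignore the input and keep
            # the remaining processing time.
            self.sigma -= elapsed

    def confluent_transition(self, port, value):
        # Default definition: internal transition first, then external.
        self.internal_transition()
        self.external_transition(0.0, port, value)

    def output(self):
        # Called just before an internal transition takes place.
        return ("out", self.job)


link = Processor(processing_time=5.0)
link.external_transition(0.0, "in", "packet-1")   # a packet arrives
```

A simulator would call `output()` and then `internal_transition()` once `time_advance()` has elapsed without further input.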
Basic models may be coupled in the DEVS formalism to form a coupled model. A 
coupled model tells how to couple (connect) several component models together to form 
a new model. This latter model can itself be employed as a component in a larger coupled 
model, thus giving rise to a hierarchical construction. A coupled model contains the 
following information: 
• the set of components 
• the set of input ports through which external events are received 
• the set of output ports through which external events are sent 
• the external input coupling, which connects the input ports of the coupled model to 
one or more of the input ports of the components 
• the external output coupling, which connects output ports of components to output 
ports of the coupled model; thus, when an output is generated by a component, it 
may be sent to a designated output port of the coupled model and thus be 
transmitted externally 
• the internal coupling, which connects output ports of components to input ports of 
other components; hence, when an output is generated by a component, it may be 
sent to the input ports of designated components (in addition to being sent to an 
output port of the coupled model) (Zeigler and Sarjoughian, 2003). 
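The three kinds of coupling can be sketched as a routing table. The Python fragment below is only an illustration under our own assumptions: the class CoupledModel, its methods, and the component names "router" and "host" are hypothetical and do not reproduce the DEVSJAVA API.

```python
from collections import defaultdict


class CoupledModel:
    """Hypothetical container that stores components and their couplings."""

    def __init__(self, name):
        self.name = name
        self.components = set()
        self.couplings = defaultdict(list)  # (source, port) -> [(target, port)]

    def add_component(self, component_name):
        self.components.add(component_name)

    def add_coupling(self, source, source_port, target, target_port):
        for end in (source, target):
            if end != self.name and end not in self.components:
                raise ValueError(f"unknown component: {end}")
        self.couplings[(source, source_port)].append((target, target_port))

    def kind(self, source, target):
        # Classify a coupling by whether either end is the coupled model itself.
        if source == self.name:
            return "external input"
        if target == self.name:
            return "external output"
        return "internal"

    def route(self, source, source_port):
        # Destinations that receive an event emitted on (source, source_port).
        return list(self.couplings.get((source, source_port), []))


subnet = CoupledModel("subnet")
subnet.add_component("router")
subnet.add_component("host")
subnet.add_coupling("subnet", "in", "router", "in")   # external input coupling
subnet.add_coupling("router", "out", "host", "in")    # internal coupling
subnet.add_coupling("host", "out", "subnet", "out")   # external output coupling
```

Because a coupled model exposes input and output ports just as a basic model does, an instance such as subnet could itself be added as a component of a larger coupled model, which is what gives rise to the hierarchical construction.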
5.2 The Distributed Network Monitoring System  
A distributed network monitoring (DNM) system consists of a hierarchical 
structure with a set of network components and is endowed with monitoring agents that 
cooperate in monitoring the network. The network can be divided into several regions or 
subnetworks (in our case we consider two distinct subnets). Within each subnetwork, a 
set of monitoring agents is jointly responsible for maintaining up-to-date models of host 
and router performance and availability. These monitoring agents belong to the 
monitoring layer described in the previous chapter. Monitoring agents are responsible 
for notifying the management layer of the status of network components as well 
as subnetworks. The management layer, which consists of the reactive and the anticipatory 
agents, uses the data acquired from the monitoring layer to make control decisions for 
the management of network faults. Figure 5.1 shows a hypothetical network structure based 
on the notion of a distributed network monitoring (DNM) system. 
 
Figure 5.1 Designed Network Model 
5.3 Component Overview 
We briefly describe the crucial components of our network model and their 
functions. 
• Experimental frame 
The experimental frame consists primarily of three subcomponents: the generator, 
the transducer, and the fault injection mechanism. The generator generates packets 
to be processed by the network components on the basis of a specific inter-arrival 
time. The transducer is responsible for computing the network performance 
metrics (throughput, turnaround time, and drop rate of packets). The throughput is 
defined as the average rate of job departures from the architecture, estimated by the 
number of jobs processed during the observation interval, divided by the length of 
the interval. A job's turnaround time is the length of time between its arrival at 
the processor and its departure as a completed job. The drop rate of packets is 
defined as the percentage of packets dropped due to network faults. The fault 
injection mechanism, which is embedded in the experimental frame, generates 
"fault packets" at a random rate. A "fault packet", when encountered by a network 
component, induces a certain level of degradation in the throughput and turnaround 
time of the component. 
 
Figure 5.2 Experimental Frame 
The activity diagram for the experimental frame is shown in Figure 5.3. The 
generator and the fault injection mechanism start as soon as the simulation begins. 
The transducer computes the performance metrics based on the number of packets 
and the number of faults incurred. The packet generator and the fault injection 
mechanism cease to operate when the simulation time ends. 
 
 
 
 
 
Figure 5.3 Activity Diagram of the Experimental Frame  
The interaction among the different components of the experimental frame is 
shown by a sequence diagram in Figure 5.4. As shown in the sequence diagram, 
the generator initially starts generating network packets to be processed by the 
hosts in the network. As soon as a packet is generated, the transducer is 
triggered. The transducer records the simulation time at which the packet is 
generated. On completion of packet processing by any of the hosts in the network, 
the transducer records the completion time. Based on the arrival and completion 
times, it computes the throughput and the turnaround time. If the 
transducer fails to record the completion time due to packet loss, it records the 
packet as lost and appends it to the list of dropped packets, which is used to 
calculate the drop rate of packets. When the simulation time has elapsed and no 
more metrics are to be recorded, the transducer triggers the generator and the fault 
injection mechanism to cease generation of network packets and fault packets, 
respectively.  
 
 
Figure 5.4 Sequence Diagram for Interactions of Various Components within the 
Experimental Frame  
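The bookkeeping performed by the transducer can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implemented component: the class and method names are hypothetical, and a packet whose completion is never recorded is simply counted as dropped when the metrics are computed.

```python
class Transducer:
    """Hypothetical metric recorder for the experimental frame."""

    def __init__(self):
        self.arrival_time = {}   # packet id -> generation time, still outstanding
        self.turnarounds = []    # turnaround time of each completed packet

    def record_arrival(self, packet_id, now):
        self.arrival_time[packet_id] = now

    def record_completion(self, packet_id, now):
        # Turnaround time = completion time - arrival time.
        self.turnarounds.append(now - self.arrival_time.pop(packet_id))

    def metrics(self, observation_interval):
        completed = len(self.turnarounds)
        dropped = len(self.arrival_time)  # assumption: never completed == dropped
        generated = completed + dropped
        return {
            # Jobs processed during the interval divided by its length.
            "throughput": completed / observation_interval,
            "avg_turnaround": sum(self.turnarounds) / completed if completed else 0.0,
            # Percentage of generated packets that were lost.
            "drop_rate_percent": 100.0 * dropped / generated if generated else 0.0,
        }
```

For example, if three packets are generated during an interval of 100 time units and two of them complete with turnaround times of 5 and 15, the sketch reports a throughput of 0.02, an average turnaround time of 10, and a drop rate of about 33 percent.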
• Switches, routers, and hosts 
The operational specifications of the switch, router, and hosts are 
discussed in the previous chapter. We now briefly describe the effect of a fault 
on each of these components. The switch, the router, and the 
hosts degrade in a similar way when a fault packet is encountered. The 
degradation can be seen as a three-step process. On encountering the first fault 
packet, the component's normal operation is disrupted and it is said to change to a 
"low degradation" state; in other words, when a fault is encountered, the processing 
time of the component is doubled. This can be interpreted as follows: 
degradation causes the component to delay the operation it is carrying out. 
This in turn affects the throughput, turnaround time, and drop rate of packets 
of that component, and hence of the subnet to which it belongs and 
consequently of the complete network. On encountering a control packet, the 
component returns to normal operation. If, however, another fault packet is 
encountered before a control packet, the component degrades further to a state of 
"moderate degradation" and then to "high degradation", after which it 
completely ceases to operate. The activity diagram describing the behavior of a 
network component on encountering fault packet(s) is shown in Figure 5.5.  
 
Figure 5.5 Activity Diagram of Network Component due to Fault(s). 
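The degradation behavior can be viewed as a small state machine. The Python sketch below is illustrative only. The doubling of processing time in the "low degradation" state follows the description above; the factors used for the "moderate" and "high" states, the separate "failed" state entered on a further fault packet, and the rule that a failed component is not revived by a control packet are our assumptions for the purpose of the example.

```python
LEVELS = ["normal", "low", "moderate", "high", "failed"]
# Only the doubling at "low" comes from the text; 4x and 8x are assumed.
SLOWDOWN = {"normal": 1, "low": 2, "moderate": 4, "high": 8}


class NetworkComponent:
    """Hypothetical switch, router, or host with a degradation level."""

    def __init__(self, base_processing_time):
        self.base_processing_time = base_processing_time
        self.level = "normal"

    def on_fault_packet(self):
        # Each fault packet moves the component one level further down.
        index = LEVELS.index(self.level)
        self.level = LEVELS[min(index + 1, len(LEVELS) - 1)]

    def on_control_packet(self):
        # Assumption: a failed component cannot be revived by a control packet.
        if self.level != "failed":
            self.level = "normal"

    def processing_time(self):
        # A failed component no longer processes packets at all.
        if self.level == "failed":
            return None
        return self.base_processing_time * SLOWDOWN[self.level]
```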
 
 
• Links 
Links are the interface between different components of the network. They 
regulate the flow of packets. Since we need to vary the delay of the links for 
experimental purposes, the links must be implemented in such a way that 
varying the delay is practical. Hence we implement a link as a form of processor, 
that is, an entity that takes some finite amount of time (the link 
delay) to process a job and forward it. The DEVS processor component, which 
models a link, has no buffering capability. Therefore, when a job arrives while the 
processor is busy, the processor simply ignores it and the job is lost. This affects 
the drop rate of packets. 
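The effect of an unbuffered link on the drop rate can be sketched as follows. The Python code is an illustration under our own assumptions (the names Link, offer, and drop_rate_percent are hypothetical); it only shows why a larger link delay causes more arrivals to find the link busy.

```python
class Link:
    """Hypothetical link modeled as a processor without a buffer."""

    def __init__(self, delay):
        self.delay = delay
        self.busy_until = 0.0
        self.forwarded = 0
        self.dropped = 0

    def offer(self, now):
        """Return the forwarding time of the packet, or None if it is dropped."""
        if now < self.busy_until:
            # The link is still busy and has no buffer: the packet is ignored.
            self.dropped += 1
            return None
        self.busy_until = now + self.delay
        self.forwarded += 1
        return self.busy_until


def drop_rate_percent(delay, arrival_times):
    # Percentage of offered packets that the link had to ignore.
    link = Link(delay)
    for t in arrival_times:
        link.offer(t)
    return 100.0 * link.dropped / len(arrival_times)
```

With packets arriving every 2 time units, a link delay of 1 drops nothing, whereas a link delay of 3 drops every second packet.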
5.4 Monitoring Agents 
Monitoring agents are deployed within each of the subnets to record the 
individual performance metrics of components. These metrics include the throughput, the 
turnaround time, and the drop rate of packets. The monitoring agents report the fault 
proneness of network components to the reactive or anticipatory 
agents, which in turn take the required action according to their functionality. The 
activity diagram depicting the behavior of an individual monitoring agent is shown in 
Figure 5.6. 
 
 
Figure 5.6 Activity Diagram of a Monitoring Agent 
The monitoring agents are deployed at various levels in the DEVS network 
model. These include (1) the component level monitoring agents that monitor the 
individual performance of routers and hosts, (2) the subnet level monitoring agents that 
monitor a subnet as a whole, and (3) the network monitoring agent that monitors the 
performance of the complete network. The sequence diagram shown in Figure 5.7 depicts 
the interaction between the monitoring agents at different levels in the DEVS network 
model. 
 
 
Figure 5.7 Sequence Diagram showing Interaction between Monitoring Agents 
 
5.5 Reactive / Anticipatory agents 
The data received from the monitoring agents serve as inputs to the reactive and 
anticipatory agents. The reactive agent functions on simple fuzzy rules while the 
anticipatory agent functions on the basis of a Naïve Bayesian classifier as described in the 
previous chapter. Figure 5.8 shows the activity diagram of the reactive agent. 
 
 
 
Figure 5.8 Activity Diagram of a Reactive Agent 
As shown in Figure 5.8, the reactive agent gets the data from the monitoring 
agent, which contains the details of the fault proneness of the various components in the 
network. The reactive agent then matches these against the predefined fuzzy rules and 
triggers the control agent to take the corresponding action. The activity diagram of the 
anticipatory agent is shown in Figure 5.9. In contrast to the reactive agent, the 
anticipatory agent has an additional "learner" component which builds a predictive model 
of fault proneness in the network based on the Naïve Bayesian classifier. It then 
computes the probabilities of failure of the components as described in Chapter 4 and 
thereby triggers the control agent to take corrective action. 
 
 
Figure 5.9 Activity Diagram of an Anticipatory Agent 
The interactions between the monitoring and the management agents (reactive and 
anticipatory) are shown with a sequence diagram in Figure 5.10. From the sequence 
diagram, it can be seen that the reactive agent is supplied data only by the component 
level monitoring agents. Since the reactive agent works on simple fuzzy rules, it 
interprets the data obtained from the component level monitoring agents to determine the 
components that require corrective action. The anticipatory agent, on the other hand, 
predicts the fault proneness of the various network components, and hence it 
needs a complete picture of the network. The monitoring agents at the subnet and the 
network level help the anticipatory agent to dynamically learn the network behavior with 
respect to fault proneness and thereby constantly update the evidence of the Naïve Bayesian 
classifier.  
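The idea of constantly updating the evidence of a Naïve Bayesian classifier can be illustrated with a small Python sketch. It is not the classifier specified in Chapter 4: the feature names, the discretization of metrics into "low", "medium", and "high" levels, the class labels "ok" and "fault", and the use of Laplace smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class NaiveBayesFaultModel:
    """Hypothetical incremental Naive Bayes model over discretized metrics."""

    def __init__(self, classes=("ok", "fault"), levels=("low", "medium", "high")):
        self.classes = classes
        self.levels = levels
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (class, feature) -> level counts

    def update(self, observation, label):
        # Add one labeled observation reported by the monitoring agents.
        self.class_counts[label] += 1
        for feature, level in observation.items():
            self.feature_counts[(label, feature)][level] += 1

    def posterior(self, observation):
        total = sum(self.class_counts.values())
        scores = {}
        for cls in self.classes:
            # Laplace-smoothed prior times the product of smoothed likelihoods.
            score = (self.class_counts[cls] + 1) / (total + len(self.classes))
            for feature, level in observation.items():
                counts = self.feature_counts[(cls, feature)]
                score *= (counts[level] + 1) / (self.class_counts[cls] + len(self.levels))
            scores[cls] = score
        norm = sum(scores.values())
        return {cls: score / norm for cls, score in scores.items()}
```

An anticipatory agent built along these lines would call update whenever new monitoring data arrive and would trigger the control agent when the posterior probability of "fault" for a component exceeds a chosen threshold; the threshold itself is a further assumption.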
 
 
 
 
Figure 5.10 Sequence Diagram for Interaction between Monitoring and 
Management Agents 
5.6 Control Agent 
The control agent operates on the basis of the output from the reactive and 
anticipatory agents. The data it obtains from the management agents consist of details 
pertaining to the network component that is undergoing degradation. The control agent then 
triggers corrective action on those components in the form of corrective messages (Hood 
and Ji 1998). When these corrective messages are encountered by network components, 
they regain their normal operation. The activity diagram of the control agent operation is 
shown in Figure 5.11. 
 
Figure 5.11 Activity Diagram of a Control Agent 
The interactions among the control agent and the management agents are shown in 
Figure 5.12. As described above, the control agent is only responsible for triggering 
corrective action on the components that are undergoing degradation or may be 
prone to degradation, based on the output from the reactive or the anticipatory agent, 
respectively.  
5.7 Annealer 
The function of the annealer is to facilitate dynamic updating of the parameters of 
the DEVS model based on different operating regimes. The annealer operates 
independently of the management and control agents as described in the previous chapter. 
The activity diagram depicting the various states traversed is shown in Figure 5.13. 
The annealer starts by modifying the operating regimes of the DEVS network based on 
certain parameters to reconfigure the network. It then records the performance metrics of 
the reconfigured network for a finite amount of time. If the recorded metrics of the 
reconfigured network are better than those of the previous configuration, the new 
configuration with the new operating regimes is adopted. 
 
 
Figure 5.12 Sequence Diagram for Management and Control Agents. 
 
If the metrics recorded are poorer, the Metropolis criterion is used to decide 
whether or not to adopt the new state. According to the Metropolis criterion, the new 
configuration (with poorer performance) is accepted with a certain probability that depends 
on the temperature of the network (the probability decreases as the temperature decreases). 
Temperature is defined as the degree to which the DEVS network model is open to 
accepting a change of state. The temperature drops every time a new state is adopted for 
the DEVS model. 
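The acceptance rule of the annealer can be sketched in Python as shown below. The sketch uses the standard Metropolis form exp(-(increase in cost) / temperature) and treats a lower cost (for example, a lower turnaround time) as better; the function names, the geometric cooling factor, and the default parameter values are assumptions and are not taken from the implemented annealer.

```python
import math
import random


def accept(old_cost, new_cost, temperature, rng=random):
    """Metropolis criterion: always accept improvements, sometimes accept worse."""
    if new_cost <= old_cost:
        return True
    if temperature <= 0:
        return False
    # Acceptance probability shrinks as the temperature decreases.
    return rng.random() < math.exp(-(new_cost - old_cost) / temperature)


def anneal(initial, neighbor, cost, temperature=1.0, cooling=0.9, steps=100, seed=0):
    """Hypothetical annealing loop over network configurations."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    for _ in range(steps):
        candidate = neighbor(current, rng)   # reconfigure the network
        candidate_cost = cost(candidate)     # record its performance
        if accept(current_cost, candidate_cost, temperature, rng):
            current, current_cost = candidate, candidate_cost
            temperature *= cooling           # temperature drops after each adoption
    return current, current_cost
```

Here neighbor stands for the modification of the operating regimes and cost for the performance metrics recorded over a finite amount of time; both are supplied by the caller.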
 
 
Figure 5.13 Activity diagram of the annealer 
A complete class diagram of all the components described above is shown in 
Figure 1 of the Appendix. The classes Entity, Devs, Coupled, Viewabledigraph and 
Viewableatomic form the basic components of the DEVS framework. Entity is the base 
class of objects to be put into containers. The class Devs contains two main model 
classes, atomic and coupled. The class atomic realizes the atomic level of the underlying 
DEVS formalism. It has elements corresponding to each of the parts of this formalism. 
Coupled is the major class which embodies the hierarchical model composition 
constructs of the DEVS formalism. A coupled model is defined by specifying its 
component models. Components are instances of the Devs class, thus enabling 
hierarchical composition. The class Viewabledigraph is a derived class of coupled which 
makes it possible to define a coupled model in an explicit manner. In addition to 
components, it enables the specification of the coupling relation, which establishes the 
desired communication links among the components (internal coupling) and between them and 
the external world (external input and external output coupling). The processor class is a 
simple processor representing the storage of a job and the passage of time for its execution. 
The classes switch, control, reactive, subnet1, subnet2, generator, transducer, anticipator, 
and monitor have their respective functions as described in the above sections and in 
previous chapters. The Multiserver coordinator routes incoming jobs for processing and 
collects results for final output. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Chapter 6  
Experiment Design and Simulation Results 
This chapter describes the experimental design for the network model implemented 
in DEVS, followed by the experimental results. We use Borland JBuilder(TM) as the 
Integrated Development Environment (IDE) for implementing the network model in DEVS. 
6.1 Experiment Design 
The DEVS-based network model comprises two subnets. Each subnet includes 
a router and three hosts. An experimental frame generates the packets to be processed by 
the network components on the basis of a specific inter-arrival time. A fault injection 
mechanism, which generates "fault packets" at a random rate, is also embedded in the 
experimental frame. A "fault packet", when encountered by a network component, induces a 
certain level of degradation in the throughput and turnaround time of the component. 
Monitoring agents are deployed throughout the network over each of the network components 
to record the performance metrics (throughput, turnaround time, and drop rate) throughout 
the simulation. The throughput is defined as the average rate of job departures from the 
architecture, estimated by the number of jobs processed during the observation interval, 
divided by the length of the interval. A job's turnaround time is the length of time 
between its arrival at the processor and its departure as a completed job. The drop rate of 
packets is defined as the percentage of packets dropped due to network faults. A sample 
screen shot of the DEVS environment is shown in Figure 6.1. 
 
Figure 6.1 Simulated Model in DEVS Environment 
6.2 Simulation Results 
Each of the fault management techniques (reactive, alarm correlation, and 
anticipatory) is simulated by varying the levels of the link delay and the complexity of 
the network. The number of replications for each fault management technique is 270. 
This results from the sum of the replications under each combination of configuration 
levels (i.e., link delay, network complexity). Each replication is run for 1000 time units. 
The t-test is performed with respect to the mean values obtained for throughput, 
turnaround time, and the drop rate of packets for each of the fault management techniques, 
and the confidence intervals are recorded. From the confidence intervals obtained at the 95 
percent level, we observe that the intervals obtained for reactive vs. anticipatory and 
anticipatory vs. alarm correlation for all three parameters do not contain zero, and 
hence the difference in their mean values is statistically significant. The intervals 
obtained for the reactive vs. alarm correlation techniques for all three parameters 
contain zero. Hence the difference between their means is not statistically 
significant. Table 6.1 shows the results of the t-test.  
Table 6.1 Confidence Intervals for Performance Metrics 

                    Reactive            Alarm Correlation   Anticipatory 
Reactive            -------             (-0.001, 0.019)     (-0.025, -0.006) 
Alarm Correlation   (-0.019, 0.001)     -------             (-0.035, -0.014) 
Anticipatory        (0.006, 0.025)      (0.014, 0.035)      ------- 

                             6.1.1 Performance of Network Throughput 

                    Reactive            Alarm Correlation   Anticipatory 
Reactive            -------             (-38.69, 15.55)     (25.11, 70.19) 
Alarm Correlation   (-15.55, 38.69)     -------             (33.62, 84.81) 
Anticipatory        (-70.19, -25.11)    (-84.81, -33.62)    ------- 

                          6.1.2 Performance of Network Turnaround Time 

                    Reactive            Alarm Correlation   Anticipatory 
Reactive            -------             (-8.54, 1.96)       (5.67, 9.59) 
Alarm Correlation   (-1.96, 8.54)       -------             (5.9, 15.93) 
Anticipatory        (-9.59, -5.67)      (-15.93, -5.9)      ------- 

                     6.1.3 Performance with Respect to Drop Rate of Packets 
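The way a confidence interval on the difference of two means is read can be illustrated with a short Python sketch. It is not the procedure used to produce Table 6.1: the function names are hypothetical, and the sketch uses a normal approximation to the t distribution with unpooled variances, which is an assumption that is only reasonable for large samples such as the 270 replications per technique.

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference_ci(a, b, confidence=0.95):
    """Approximate confidence interval for mean(a) - mean(b)."""
    # Normal approximation to the t critical value (assumed large samples).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = mean(a) - mean(b)
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff - z * standard_error, diff + z * standard_error


def is_significant(interval):
    # The difference is significant when the interval does not contain zero.
    low, high = interval
    return not (low <= 0 <= high)
```

Applied to the throughput entries of Table 6.1, is_significant((-0.001, 0.019)) is false for reactive vs. alarm correlation, whereas is_significant((0.006, 0.025)) is true for anticipatory vs. reactive.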
 
 
We analyze the behavior of each of the performance metrics (throughput, turnaround 
time, and drop rate of packets) with respect to the variation of link delay and complexity 
of the network, for each of the fault management techniques. We plot response surfaces 
for each of the dependent variables (throughput, turnaround time, 
and drop rate of packets) against the independent variables (link delay and complexity). 
Figure 6.2 shows the responses obtained. 
 
Figure 6.2 Response Surfaces 
 
We observe that the throughput obtained with each of the techniques is significantly 
better at lower values of link delay, while throughput is less dependent on the complexity 
of the network. There is a considerable improvement in the turnaround time at higher 
levels of complexity. The drop rate of packets is significantly lower at 
lower values of link delay; there is also a significant reduction in the drop rate of 
packets at higher values of complexity. As shown in Figure 6.2, the performance parameters 
for the alarm correlation technique show a linear dependency on the 
link delay. In contrast, the reactive and anticipatory techniques are less sensitive to link 
delay up to a certain threshold. The linearity exhibited by the alarm correlation technique 
can be explained by the fact that the fault patterns are recorded beforehand and hence the 
variation of the performance metrics is linear, whereas for the other two techniques this is 
not the case. 
6.2.1 Sensitivity Analyses 
We perform sensitivity analysis based on each level of network complexity. We 
fix the value of complexity and analyze the variation of each of the performance metrics 
with respect to variation of link delay. The graphs obtained for the reactive, alarm 
correlation and the anticipatory techniques are shown in Figure 6.3, Figure 6.4, and 
Figure 6.5 respectively. 
 
 
 
 
 
 
 
 
 
Figure 6.3 Sensitivity Analyses for Variation of Complexity for Reactive Technique  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 6.4 Sensitivity Analyses for Variation of Complexity for Alarm Correlation Technique  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 6.5 Sensitivity Analyses for Variation of Complexity for Anticipatory Technique  
 
From the graphs above, we interpret the effect of complexity on each of the 
performance metrics for the fault management techniques. 
a. Throughput 
Complexity has no observable effect on the throughput 
in the case of the reactive technique. The throughput declines as 
the link delay of the network increases and remains constant as the 
link delay increases from level 3 to level 4. For the alarm correlation technique, 
the throughput is almost linearly dependent on the link 
delay. At complexity level 2, the throughput is constant for levels 2 
and 3 of link delay. In the case of the anticipatory technique, at the low complexity level, 
the throughput is not sensitive to link delay at the initial levels, 
after which it varies linearly with link delay. The throughput is again 
independent of link delay at very high levels of link delay. For moderate and high levels of 
complexity, the throughput is sensitive to link delay at low and 
moderate levels of link delay. At high levels of link delay, the throughput is less 
sensitive to an increase in link delay. 
b. Turnaround Time 
For the reactive technique, the turnaround time is less sensitive to link delay at 
very low and very high levels of link delay. This trend is observed for all levels of 
complexity for the reactive technique. For the alarm correlation technique, the 
turnaround time is highly dependent on link delay for levels 1 and 2 
of complexity. At level 3 of complexity, the turnaround time is less 
sensitive at higher values of link delay. For the anticipatory technique, the 
turnaround time is less sensitive to link delay at very low and very high levels of link 
delay for low complexity. The turnaround time becomes more sensitive to link delay 
as the complexity level increases, except for level 3 and level 4 of link delay, 
where the turnaround time is seen to depreciate to some extent. 
c. Drop Rate of Packets 
The drop rate of packets follows a trend very similar to that of the 
turnaround time described above. 
 
 
6.2.2 Results with Adaptive Control using Annealer 
The experimental design is kept the same, with the addition of the annealer 
component. We perform 30 replications for each of the fault management techniques with 
network adaptation. The link delay and complexity values are fixed to "Low" and "Level 
1" respectively. We then compare the final values of throughput, turnaround time, and 
drop rate for each replication with network adaptation enabled and disabled. 
Figures 6.6 and 6.7 show the graphs obtained for the reactive, alarm correlation, and 
anticipatory techniques.  
[Figure 6.6 panels: Comparison of Network Throughput, Network Turnaround Time, and Drop 
Rate of Packets for Reactive Technique; x-axis: Replications; series: With Annealer, 
Without Annealer] 
 
Figure 6.6 Performance Metrics with and without Annealer for Reactive Technique 
 
 
Figure 6.7 Performance Metrics - Alarm Correlation and Anticipatory Technique 
From Figure 6.6, we see that the reactive technique performs markedly better with 
network adaptation in terms of the performance metrics. In contrast, the performance 
metrics obtained for the alarm correlation and anticipatory techniques are not satisfactory 
(Figure 6.7). This can be explained by the fact that the adaptation works independently of 
the recorded behavior of faults in the case of the alarm correlation technique and 
independently of the predictive model (the Naïve Bayesian classifier) in the case of the 
anticipatory technique. The lack of communication between the control agent and the 
annealer leads to a high number of control packet generations. This in turn is responsible 
for the degradation of network performance for the alarm correlation and the anticipatory 
techniques. For example, the Naïve Bayesian classifier triggers the control agent based on 
the evidence it collects. This evidence may no longer be correct once the network is 
reconfigured by the annealer.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Chapter 7 
Conclusions  
Network fault management is a crucial area in the field of computer networks. 
The goal of fault management is to detect, log, notify users of, and (to the extent 
possible) automatically fix network problems to keep the network running effectively. 
Because faults can cause downtime or unacceptable network degradation, fault 
management is perhaps the most widely implemented of the ISO network 
management elements. Our approach to anticipatory fault management provides a 
novel methodology for applying agent-based behavioral anticipation to effective 
fault management. The comparative analyses presented in the previous chapter describe 
the effectiveness of our technique with respect to reactive techniques. It can be observed 
that the anticipatory technique performs significantly better than the reactive and 
alarm correlation techniques with respect to network throughput, turnaround time, and 
the drop rate of packets. The results obtained from the network adaptation methodology 
show that the annealer performs considerably better with the reactive technique 
than with the alarm correlation and anticipatory techniques. 
7.1 The Limitations of Anticipatory Fault Management  
In our approach, the topology of the network is assumed to be fixed and static. 
The Naïve Bayesian approach will cease to work if the topology changes (is dynamic) 
and hence cannot be applied to ad hoc networks. Furthermore, network adaptation 
cannot be carried out effectively for the anticipatory technique due to the lack of 
communication between the control agent and the annealer. To get around this problem, an 
additional component must be incorporated in the architecture to act as an interface 
between the annealer and the control agent. 
Another limitation to our approach is the level of network detail being considered. 
For instance, we do not consider any hardware details like faults occurring in the network 
components due to hardware failure or physical wear and tear. One more problem with 
the adaptation technique is the selection of operating regimes that are to be modified. The 
operating regimes are decided based on the DEVS toolkit; hence these regimes do not 
include possible adaptation parameters such as addition, deletion, or modification of 
component functionality.    
7.2 Future Work 
Given the limitations described in the previous section, improving the 
anticipatory technique for network fault management requires additional effort and 
better tools that capture additional details about network operation and network 
components. Furthermore, a framework needs to be devised that would enable 
anticipatory fault management in ad hoc networks, where the network topology 
changes dynamically. A methodology also needs to be developed for a hardware 
implementation of the design. This could be done by embedding the anticipatory control 
in an Integrated Circuit (IC) and embedding the fabricated IC in a real-time network. 
Another important issue to be addressed is the improvement of the network adaptation 
framework. The Naïve Bayesian classifier should be designed with additional logic to 
take into account the decisions made by the annealer. The same can also be achieved by 
having an additional component that acts as an interface between the control agent and the 
annealer. The additional efforts described above would result in a high degree of 
improvement in network fault management. 
The methodology of agent based behavioral anticipation towards fault 
management in networks can be effectively deployed in the present computer industry 
and will effectively contribute towards reducing losses incurred due to network faults. 
Making improvements on our technique as mentioned above will provide a versatile 
framework for effective fault management in computer networks. 
    
 
 
 
 
 
 
 
 
 
 
 
 
 
 
REFERENCES 
A. Bouloutas, G. Hart, and M. Schwartz, "On the design of observers for failure detection 
of discrete event systems," in Network Management and Control. New York: Plenum, 
1990. 
A. Lazar, W. Wang, and R. Deng, "Models and algorithms for network fault detection 
and identification: A review," in Proc. IEEE Int. Contr. Conf., 1992. 
B. Ekdahl, E. Astor and P. Davidsson, "Towards Anticipatory Agents", Intelligent Agents: 
ECAI-94 Workshop on Agent theories, architectures and languages. pp 191-202. 
1994.   
B. Zeigler and H. Sarjoughian, "Introduction to DEVS modeling and Simulation with 
JAVA(TM): Developing component-based Simulation Models", Arizona State 
University, August 2003. 
CAIDA. Cooperative association for internet data analysis. [Online]. Available: 
http://www/caida.org/Tools. 
Carley K. and Svoboda D., "Modeling Organizational Adaptation as a Simulated 
Annealing Process", Sociological methods & research, Vol. 25 No. 1, August 1996. 
Corn, P.A., Dube, R., McMichael, A.F., & Tsay, J.L. (1988), "An autonomous distributed 
expert system for switched network maintenance". In Proceedings of IEEE 
GLOBECOM'88 (pp. 1530-1537).
C. Hood and Chuanyi Ji, "Intelligent agents for proactive fault detection", IEEE, March 
1998. 
Davidsson P, "A Framework for Preventive State Anticipation", M. Butz et al. (Eds.): 
Anticipatory behavior in adaptive learning systems, LNAI 2684, pp. 151-166, 2003. 
Edidiong Uyai Ekaette and Behrouz Homayoun Far, "A framework for distributed fault 
management using intelligent software agents", CCECE 2003 - CCGEI 2003, IEEE, 
2003. 
F. Feather and R. Maxion, "Fault detection in an ethernet network using anomaly 
signature matching," in Proc. ACM SIGCOMM, vol. 23, San Francisco, CA, Sept. 
1993, pp. 279-288. 
Frank E. Feather. "Fault Detection in an Ethernet Network via Anomaly Detectors". PhD 
thesis, Department of Electrical and Computer Engineering, Carnegie Mellon 
University, 1992. 
G. Jakobson and M. D. Weissman, "Alarm correlation," IEEE Network, vol. 7, pp. 52-59, 
Nov. 1993. 
H. Wang, D. Zhang, and K. G. Shin, "Detecting syn flooding attacks," in Proc. IEEE 
INFOCOM, 2002. 
I. Katzela and M. Schwartz, "Schemes for fault identification in communication 
networks", IEEE/ACM Trans. Networking, Vol. 3, pp. 753-764, Dec. 1995.   
I. Rouvellou and G. Hart, "Automatic alarm correlation for fault identification," in Proc. 
IEEE INFOCOM, Boston, MA, Apr. 1995, pp. 553-561. 
J. Agre, "A message-based fault diagnosis procedure," Proceedings of the ACM 
SIGCOMM conference on Communications architectures & protocols, Vol 6 Issue 3, 
Aug 1986.  
Joseph, C., Kindrick, J., Muralidhar, K., So, C. & Toth-Fejel, T. (1989) "MAP fault 
management expert system". In Meandzija, B. & Westcott, J. (Eds.) Integrated 
Network Management, I. North-Holland: Elsevier Science Publishers B.V. 
J. F. Huard and A. A. Lazar, "Fault Isolation Based on Decision-Theoretic 
Troubleshooting", Technical Report 442-96-08, Center for Telecommunications 
Research, Columbia University, New York (1996). 
J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible 
Inference", Morgan Kaufmann, San Mateo, Calif., 1988. 
Hood C. and Ji C., "Intelligent agents for proactive fault detection", IEEE Information 
Communication Conference (Infocom 97). 1997. 
L. Lewis, "A case based reasoning approach to the management of faults in 
communication networks," in Proc. IEEE INFOCOM, vol. 3, San Francisco, CA, 
Mar. 1993, pp. 1422-1429. 
L. Lewis and G. Dreo, "Extending trouble ticket systems to fault diagnosis", IEEE 
Network, vol. 7, pp. 44-51, Nov 1993. 
Kandel A, "Fuzzy Expert Systems." CRC press, 1991. 
Luger G. F. and Stubblefield W. A., "Artificial Intelligence: Structures and Strategies for
Complex Problem Solving", The Benjamin/Cummings Publishing Company, Inc.,
1989.
M. Brodie, I. Rish and S. Ma, "Intelligent probing: A cost-effective approach to fault
diagnosis in computer networks", IBM Systems Journal, Vol. 41, No. 3, 2002.
M. Thottan and Chuanyi Ji, "Anomaly detection in IP Networks", IEEE Transactions on
Signal Processing, vol. 51, No. 8, August 2003.
M. Thottan, "Fault detection in IP networks," Ph.D. dissertation, Rensselaer Polytech.
Inst., Troy, NY, 2000. Under patent with RPI.
M. Butz, O. Sigaud and P. Gerard, "Internal Models and Anticipations in Adaptive
Learning Systems", Anticipatory Behavior in Adaptive Learning Systems, LNAI 2684,
pp. 86-109, 2003.
Pat Langley and Herbert A. Simon (1995), "Applications of Machine Learning".
Communications of the ACM, Vol. 38, No. 11.
Prietula M., Carley K., and Gasser L., "Simulating organizations: computational models
of institutions and groups", Menlo Park, CA: AAAI Press/MIT Press, 1998.
R. Herdman, "Information security and privacy in network environments", The Office of
Technology Assessment (OTA), September 15, 1994.
Roy Maxion, "Unanticipated Behavior as a Cue for System-Level Diagnosis", in 8th
International Phoenix Conference on Computers and Communications, IEEE, March
1989.
Roy A. Maxion, "Anomaly Detection for Diagnosis", in Twentieth International
Symposium on Fault-Tolerant Computing, IEEE, March 1990.
Rosen R., "Anticipatory Systems - Philosophical, Mathematical and Methodological
Foundations", Pergamon Press, New York.
R.A. Maxion and F.E. Feather, "A Case Study of Ethernet Anomalies in a Distributed
Computing Environment", IEEE Transactions on Reliability 39(4), 433-443, 1990.
T. Oates, "Fault identification in Computer Networks: A review and a New
Approach", CS-TR 95-113, 1995.
T. D. Ndousse and T. Okuda, "Computational intelligence for distributed fault
management in networks using fuzzy cognitive maps," in Proc. IEEE ICC, Dallas,
TX, Jun. 1996, pp. 1558-1562.
Thottan M. and Ji C., "Anomaly detection in IP Networks", IEEE Transactions on Signal
Processing, vol. 51, No. 8, August 2003.
Wright, J.R., Zielinski, J.E. & Horton, E.M. (1988) "Expert systems development: the
ACE system". In Liebowitz, J. (Ed.) Expert System Applications to
Telecommunications. New York: John Wiley & Sons.
Yamahira, T., Kiriha, Y. & Sakata, S. (1989) "Unified fault management scheme for
network troubleshooting expert system". In Meandzija, B. & Westcott, J. (Eds.)
Integrated Network Management, I. North-Holland: Elsevier Science Publishers B.V.
Y. Yemini, "A Critical Survey of Network Management Protocol Standards," in
Telecommunications Network Management into the 21st Century, S. Aidarous and T.
Plevyak, eds., IEEE Press, Piscataway, N.J., 1994.
Zeigler B.P., "Object-Oriented Simulation with Hierarchical, Modular Models -
Intelligent Agents and Endomorphic Systems", Academic Press, 1990.
Zeigler B. and Sarjoughian H., "Introduction to DEVS Modeling & Simulation with
JAVA™: Developing Component-based Simulation Models", August 2003.
 
APPENDICES 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Appendix A 
Design Class Diagram 
 
Figure A.1 Design Class Diagram of the DEVS Network Model 
 
Appendix B 
 
Calculation of Confidence Intervals for the T test 
 
A two-sided $100(1 - \alpha)\%$ confidence interval (C.I.) for the comparison of means $\theta_1 - \theta_2$ is
given by:

$$\bar{Y}_{.1} - \bar{Y}_{.2} \pm t_{\alpha/2,\,\nu}\; \mathrm{s.e.}(\bar{Y}_{.1} - \bar{Y}_{.2})$$

where

$$\mathrm{s.e.}(\bar{Y}_{.1} - \bar{Y}_{.2}) = S_p \sqrt{\frac{1}{R_1} + \frac{1}{R_2}}$$

$R_1$, $R_2$ are the numbers of replications, and

$$S_p^2 = \frac{(R_1 - 1) S_1^2 + (R_2 - 1) S_2^2}{R_1 + R_2 - 2}$$

where $S_p^2$ is an unbiased estimator of the variance $\sigma_i^2$, with $\nu = R_1 + R_2 - 2$ degrees of
freedom. We perform the t test at the 95% confidence level. Tables B.1, B.2 and B.3
show the tabulated calculations for the confidence intervals.
B.1 Calculation of C.I. for Reactive vs. Alarm Correlation Technique

                   S_1         S_2         s.e.        C.I.
Throughput         0.001675    0.002685    0.00538     (-0.001, 0.019)
Turnaround Time    11504.17    17237.6     13.84       (-38.69, 15.55)
Drop Rate          471.63      607.11      2.68        (-8.54, 1.96)
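
The pooled-variance formulas above can be checked numerically. The following is a minimal sketch in Python, not the procedure actually used for the thesis. It rests on three assumptions that the appendix does not state: the tabulated S_1 and S_2 values are treated as sample variances, each technique is taken to contribute 150 observations (a count chosen only because it reproduces the tabulated standard error), and the critical value 1.96 is the large-sample approximation. The difference of means is not tabulated, so the midpoint of the Table B.1 interval stands in for it. All function names are hypothetical.

```python
import math


def pooled_variance(var1, r1, var2, r2):
    """Pooled variance S_p^2 from two sample variances and replication counts."""
    return ((r1 - 1) * var1 + (r2 - 1) * var2) / (r1 + r2 - 2)


def standard_error(var1, r1, var2, r2):
    """Standard error of the difference of means: S_p * sqrt(1/R1 + 1/R2)."""
    s_p = math.sqrt(pooled_variance(var1, r1, var2, r2))
    return s_p * math.sqrt(1.0 / r1 + 1.0 / r2)


def confidence_interval(mean_diff, se, t_crit):
    """Two-sided interval: mean_diff +/- t_crit * se."""
    half_width = t_crit * se
    return (mean_diff - half_width, mean_diff + half_width)


# Turnaround Time row of Table B.1 (Reactive vs. Alarm Correlation).
# ASSUMPTION: the S_1 and S_2 entries are sample variances.
VAR_REACTIVE = 11504.17
VAR_ALARM = 17237.6
# ASSUMPTION: 150 observations per technique (not stated in the appendix).
R = 150
# ASSUMPTION: large-sample critical value for a 95% two-sided interval.
T_CRIT = 1.96
# ASSUMPTION: midpoint of the tabulated interval (-38.69, 15.55), used in
# place of the untabulated difference of sample means.
MEAN_DIFF = -11.57

se = standard_error(VAR_REACTIVE, R, VAR_ALARM, R)
low, high = confidence_interval(MEAN_DIFF, se, T_CRIT)

print(f"s.e. = {se:.2f}")  # prints "s.e. = 13.84"
print(f"C.I. = ({low:.2f}, {high:.2f})")
```

Under these assumptions the computed standard error agrees with the tabulated 13.84, and the interval endpoints fall within about 0.01 of the tabulated ones. An interval that excludes zero indicates a statistically significant difference between the two techniques at the chosen level.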
 
 
B.2 Calculation of C.I. for Reactive vs. Anticipatory Technique

                   S_1         S_2         s.e.        C.I.
Throughput         0.001675    0.001813    0.004822    (-0.025, -0.006)
Turnaround Time    11504.17    8359.43     11.5        (25.11, 70.19)
Drop Rate          471.63      379.17      2.38        (5.67, 9.59)
 
 
B.3 Calculation of C.I. for Alarm Correlation vs. Anticipatory Technique

                   S_1         S_2         s.e.        C.I.
Throughput         0.002685    0.001813    0.00547     (-0.035, -0.014)
Turnaround Time    8359.43     17237.6     13.06       (33.62, 84.81)
Drop Rate          379.17      607.91      2.56        (5.9, 15.93)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Appendix C 
 
Sample Data Sets 
  
For each experiment, we perform 30 replications with certain levels of complexity and 
link delay. The data sets obtained for the performance metrics are as follows: 
 
Figure C.1 Performance Metrics for Low Link Delay and Level 1 Complexity 
 
 
 
 
 
 
Figure C.2 Performance Metrics for Moderate Link Delay and Level 1 Complexity
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure C.3 Performance Metrics for High Link Delay and Level 1 Complexity 
 
 
 
 
 
 
 
 
 
Figure C.4 Performance Metrics for Low Link Delay and Level 2 Complexity 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure C.5 Performance Metrics for Moderate Link Delay and Level 2 Complexity 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure C.6 Performance Metrics for High Link Delay and Level 2 Complexity 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure C.7 Performance Metrics for Low Link Delay and Level 3 Complexity 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure C.8 Performance Metrics for Moderate Link Delay and Level 3 Complexity
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure C.9 Performance Metrics for High Link Delay and Level 3 Complexity 