AGENT-BASED SIMULATION OF BEHAVIORAL ANTICIPATION IN COMPUTER NETWORKS: A COMPARATIVE STUDY OF ANTICIPATORY FAULT MANAGEMENT

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. This thesis does not include proprietary or classified information.

Avdhoot Kishore Saple

Certificate of Approval:

Drew Hamilton, Associate Professor, Computer Science and Software Engineering
Levent Yilmaz, Chair, Assistant Professor, Computer Science and Software Engineering
Gerry Dozier, Associate Professor, Computer Science and Software Engineering
Stephen L. McFarland, Acting Dean, Graduate School

A Thesis Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Master of Science

Auburn, Alabama
May 11, 2006

Permission is granted to Auburn University to make copies of this thesis at its discretion, upon request of individuals or institutions and at their expense. The author reserves all publication rights.

THESIS ABSTRACT

Avdhoot K. Saple
Master of Science, May 11, 2006
(B.E., Mumbai University, 2004)
98 Typed Pages
Directed by Dr.
Levent Yilmaz

Network fault management is concerned with the detection, isolation, and correction of anomalous conditions that occur in a computer network. The present state of the art in fault management classifies existing methodologies into two main categories: reactive rule-based approaches and intelligent monitoring systems. We explore the concept of anticipatory behavior to develop an intelligent agent-based network management model, which uses an anticipatory agent to proactively detect the occurrence of faults using a predictive Bayesian model of network performance. To analyze the effectiveness of the anticipatory technique, we compare it with alarm correlation and rule-based reactive fault management strategies. Results of the comparative analysis are presented to demonstrate the potential of the anticipatory technique in detecting network anomalies. Our findings indicate that the anticipatory technique improves network performance significantly more than the reactive techniques do. We furthermore describe a methodology for adaptive restructuring of the network based on the simulated annealing process. We observe that adaptive restructuring gives significantly better performance under the reactive rule-based fault management technique than under the anticipatory strategy.

ACKNOWLEDGEMENTS

I wish to thank my advisor, Dr. Levent Yilmaz, for his support and guidance along the way, and above all for being so patient and understanding with me. Thanks to my committee members for reviewing my thesis. I would also like to thank my colleagues and friends for the wonderful times together. Above all, thanks to my parents for always being there to listen. They have been my constant source of inspiration.

Style manual or journal used: APA Style

Computer software used: Microsoft Word. Images drawn using Microsoft PowerPoint; UML diagrams drawn using ArgoUML.
TABLE OF CONTENTS

List of Figures
List of Tables
1 Introduction
  1.1 The Need for Fault Management in Computer Networks
  1.2 Anticipatory Fault Management
  1.3 Research Objective
2 Overview of Fault Management in Computer Networks
  2.1 Rule Based Approaches
    2.1.1 Case-Based Reasoning
  2.2 Alarm Correlation
    2.2.1 Alarm Correlation Using Finite State Machines
  2.3 Pattern Matching
  2.4 Statistical Analysis
  2.5 Intelligent Probing for Fault Management
  2.6 Proactive Fault Detection using Intelligent Agents
3 Anticipatory Systems
  3.1 Anticipatory Agents
  3.2 Preventive State Anticipation
4 Agent-based Modeling of Reactive and Anticipatory Control in Computer Networks
  4.1 The DEVS Network Model
    4.1.1 The DEVS Formalism
  4.2 Reactive Agents
  4.3 Anticipatory Agents - A Bayesian Approach
  4.4 Adaptive Restructuring of the DEVS Network Model
5 Detailed Design of the DEVS Network Model
  5.1 The DEVS Basic and Coupled Model
  5.2 The Distributed Network Monitoring System
  5.3 Component Overview
  5.4 Monitoring Agents
  5.5 Reactive/Anticipatory Agents
  5.6 Control Agent
  5.7 Annealer
6 Experiment Design and Simulation Results
  6.1 Experiment Design
  6.2 Simulation Results
    6.2.1 Sensitivity Analyses
    6.2.2 Results with Adaptive Control using Annealer
7 Conclusions
  7.1 The Limitations of Anticipatory Fault Management
  7.2 Future Work
References
Appendices
  Appendix A
  Appendix B
  Appendix C

LIST OF FIGURES

3.1 Basic Components for Anticipatory Agents
3.2 The Basic Architecture of an Anticipatory Agent in Our Model
4.1 Reactive and Anticipatory Control
4.2 Conceptual Framework of the DEVS Formalism
4.3 Control Flow Diagram Depicting the Operation of the Annealer
5.1 Designed Network Model
5.2 Experimental Frame
5.3 Activity Diagram of the Experimental Frame
5.4 Sequence Diagram for Interactions of Various Components within the Experimental Frame
5.5 Activity Diagram of Network Component due to Fault(s)
5.6 Activity Diagram of a Monitoring Agent
5.7 Sequence Diagram showing Interaction between Monitoring Agents
5.8 Activity Diagram of a Reactive Agent
5.9 Activity Diagram of an Anticipatory Agent
5.10 Sequence Diagram for Interaction between Monitoring and Management Agents
5.11 Activity Diagram of a Control Agent
5.12 Sequence Diagram for Management and Control Agents
5.13 Activity Diagram of the Annealer
6.1 Simulated Model in DEVS Environment
6.2 Response Surfaces
6.3 Sensitivity Analyses for Variation of Complexity for Reactive Technique
6.4 Sensitivity Analyses for Variation of Complexity for Alarm Correlation Technique
6.5 Sensitivity Analyses for Variation of Complexity for Anticipatory Technique
6.6 Performance Metrics with and without Annealer for Reactive Technique
6.7 Performance Metrics - Alarm Correlation and Anticipatory Technique
A.1 Design Class Diagram of the DEVS Network Model
B.1 Calculation of C.I for Reactive vs Alarm Correlation Technique
B.2 Calculation of C.I for Reactive vs Anticipatory Technique
B.3 Calculation of C.I for Alarm Correlation vs Anticipatory Technique
C.1 Performance Metrics for Low Link Delay and Level 1 Complexity
C.2 Performance Metrics for Moderate Link Delay and Level 1 Complexity
C.3 Performance Metrics for High Link Delay and Level 1 Complexity
C.4 Performance Metrics for Low Link Delay and Level 2 Complexity
C.5 Performance Metrics for Moderate Link Delay and Level 2 Complexity
C.6 Performance Metrics for High Link Delay and Level 2 Complexity
C.7 Performance Metrics for Low Link Delay and Level 3 Complexity
C.8 Performance Metrics for Moderate Link Delay and Level 3 Complexity
C.9 Performance Metrics for High Link Delay and Level 3 Complexity

LIST OF TABLES

Table 1.1 An Overview of Fault Management Techniques
Table 4.1 Sample Set of Evidence to be Processed by the Naïve Bayesian Classifier
Table 6.1 Confidence Intervals for Performance Metrics

Chapter 1
Introduction

Network fault management (Oates 1995) entails the detection, isolation, and correction of anomalous conditions that occur in a network. It can be decomposed into three subtasks: fault identification (Katzela and Schwartz 1995), fault diagnosis (Agre 1986), and fault remediation. Fault identification involves detecting a deviation from normal behavior and then identifying its nature, whereas fault diagnosis involves determining the root cause of the identified problem. Fault remediation is the formulation of a course of action that addresses the problem. All three stages of fault management involve reasoning and decision making based on information about current and past states of the network.
1.1 The Need for Fault Management in Computer Networks

As businesses and individuals have become increasingly reliant on computer networks, the complexity of those networks has grown along a number of dimensions. The phenomenal growth of the Internet in recent years provides a clear example of the extent to which the use of computer networks is becoming ubiquitous (Oates 1995). As computer networks increase in size, heterogeneity, complexity, and pervasiveness, effective management of such networks simultaneously becomes more important and more difficult.

The use of computer networks for business is expanding enormously. The average number of electronic point-of-sale transactions in the United States went from 38 per day in 1985 to 1.2 million per day in 1993. An average of $800 billion is transferred among partners in international currency markets every day; about $1 trillion is transferred daily among US banks; and an average of $2 trillion worth of securities is traded daily in New York markets. Nearly all of these financial transactions pass over information networks. Consequently, the losses incurred due to faults in these networks are enormously high. Related dollar losses are estimated to be between $100,000 and $10 million (Herdman 1994). Hence it is important to have an effective and efficient network fault management technique to restore the proper functioning of computer and information networks.

1.2 Anticipatory Fault Management

Traditionally, network management activities, such as fault management, have been performed with direct human involvement. However, these activities are becoming more demanding and data intensive because of the heterogeneous nature and increasing size of today's networks. For these reasons, it is becoming necessary to automate network management activities. Artificial intelligence technologies can play an important role in the problem solving and reasoning techniques that are employed in fault management.
Anticipatory fault management involves a novel approach to designing autonomous agents that is based on the idea of anticipatory systems (Ekdhal et al. 1994). An anticipatory system has a model of itself and of the relevant part of its environment, and it uses the model to predict the future. The predictions are then used to determine the agent's behavior. An anticipatory system is thus a system that uses knowledge of future states to decide what action to take in the present.

Anticipatory fault management can be carried out by having an intelligent anticipatory agent reside on the network under observation. The agent makes use of adaptive learning methods (Butz et al. 2003) to detect abnormal behavior before a fault actually occurs. It first acquires a picture of the network's health by processing observations of the performance variables to obtain the probability of each measured variable at a given time. It then combines all the information to build a predictive model, which provides a method for estimating probabilities and allows the agent to combine observed information with prior knowledge (Hood and Ji 1998). The agent therefore gets a complete picture of the network's health, which it uses to carry out adaptive behavior for fault identification and diagnosis.

1.3 Research Objective

Given the importance of network fault management, the goal of this research is to develop an anticipatory technique for network fault management and to compare it with two widely used techniques: the reactive rule-based strategy and the alarm correlation approach. We compare the three techniques on the following network performance metrics: throughput, turnaround time, and packet drop rate. To facilitate the experiment, a simulation model of a computer network is developed using the DEVS (Zeigler and Sarjaughian 2003) modeling and simulation framework.
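The way an anticipatory agent might combine per-variable observations with prior knowledge, as described in Section 1.2, can be illustrated with a small Bayesian calculation. The sketch below assumes the observed variables are conditionally independent given the network condition (the naive Bayes assumption). The function name `fault_posterior` and all probabilities are hypothetical values chosen for illustration; they are not taken from the thesis experiments.

```python
from math import prod


def fault_posterior(prior_fault, likelihoods_fault, likelihoods_normal):
    """Return P(fault | observations) under a naive independence assumption.

    prior_fault        -- prior probability that a fault is developing
    likelihoods_fault  -- P(observation_i | fault) for each observed variable
    likelihoods_normal -- P(observation_i | normal) for the same variables
    """
    joint_fault = prior_fault * prod(likelihoods_fault)
    joint_normal = (1.0 - prior_fault) * prod(likelihoods_normal)
    return joint_fault / (joint_fault + joint_normal)


# Hypothetical evidence: low throughput, high turnaround time, high drop rate.
prior = 0.05
p_obs_given_fault = [0.8, 0.7, 0.6]
p_obs_given_normal = [0.1, 0.2, 0.05]

posterior = fault_posterior(prior, p_obs_given_fault, p_obs_given_normal)
print(f"P(fault | evidence) = {posterior:.3f}")
```

With these illustrative numbers, three mildly informative observations lift a 5% prior to a posterior above 90%, which is the sense in which combining evidence lets an agent act before a fault has fully manifested.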
Reactive and anticipatory agents are embedded into the network for network fault management. The reactive agent operates on a simple rule-based engine that detects faults using a predefined fuzzy rule base. We use a Naïve Bayesian classifier (Luger and Stubblefield 1989) as part of the anticipatory agent. The Bayesian classifier acts as a predictive model for the anticipatory agent, enabling it to predict faults from past data. Our findings indicate that anticipatory fault management performs significantly better than the reactive and alarm correlation techniques under the experimental conditions in which the model is tested.

Furthermore, we study which technique performs better with network reconfiguration. Network reconfiguration consists of restructuring the network model as a simulated annealing process (Carley and Svoboda 1996). The restructuring is based on varying network operational parameters such as the operation of the switch, the routing strategy of the router, and the link delay of nodes. The results of network adaptation show that simulated annealing can usefully be applied to the reactive technique, where it gives better performance than with the alarm correlation and anticipatory techniques.

The thesis is organized as follows. Chapter 2 reviews the present state of the art in fault management of computer networks. Chapter 3 covers the design of anticipatory systems. In Chapter 4, we discuss the agent-based modeling of reactive and anticipatory control in computer networks. Chapter 5 discusses the detailed design of the DEVS model. Chapter 6 covers experiment design, simulation, and results. Finally, in Chapter 7 we conclude by discussing open issues as well as planned future work.

Chapter 2
Overview of Fault Management in Computer Networks

Researchers have approached the problem of fault management using various techniques such as artificial intelligence (Corn et al. 1988; Joseph et al. 1989; Wright et al.
1988; Yamahira et al. 1989), machine learning (Langley and Simon 1995), and state space modeling (Rouvellou and Hart 1995). A fault is simply a malfunction in some component of the network, either hardware or software. At an abstract level, fault identification can be thought of as a function, I, with inputs and outputs. The input to the function is a description of the network state, S, and the output is a set of hypotheses, H, concerning the existence of n different faults. Each hypothesis may specify the indications in S of the corresponding fault and may contain some amount of diagnostic information; that is, identification and diagnosis are rarely totally decoupled. Fault identification is therefore a function that maps network states to fault hypotheses:

$I : S \rightarrow H$

Different approaches to fault identification define S in distinct ways (Oates 1995).

2.1 Rule Based Approaches

Early work in the area of fault or anomaly detection was based on expert systems. In expert systems, an exhaustive database containing the rules of behavior of the faulty system is used to determine whether a fault has occurred by matching observations against predefined rules for network anomalies (Lewis 1993). Rule-based systems are too slow for real-time applications and depend on prior knowledge about the fault conditions of the network. The identification of faults in this approach depends on symptoms that are specific to a particular manifestation of a fault. Examples of these symptoms are excessive utilization of bandwidth, the number of open TCP connections, and total throughput being exceeded (Thottan and Ji 2003). An expert system model using fuzzy cognitive maps (FCMs) (Ndouusse and Okuda 1996) can be used to obtain an intelligent model of the propagation and interaction of network faults.
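The mapping from a network state S to a set of hypotheses H, and the symptom-matching style of rule-based identification, can be sketched as follows. This is a minimal illustration under stated assumptions: the state is a dictionary of measured variables, and each rule fires when a variable exceeds a fixed threshold. The rule names, variable names, and thresholds are hypothetical and do not come from the thesis or from any cited system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    fault: str      # name of the hypothesized fault
    evidence: str   # the indication in S that supports it


# Hypothetical rule base: (fault name, state variable, threshold).
RULES = [
    ("bandwidth_saturation", "link_utilization", 0.90),
    ("connection_exhaustion", "open_tcp_connections", 1000),
]


def identify(state):
    """I: S -> H. Map a network state to a list of fault hypotheses."""
    hypotheses = []
    for fault, variable, threshold in RULES:
        value = state.get(variable)
        # A rule fires only when the symptom it encodes is present in S.
        if value is not None and value > threshold:
            hypotheses.append(
                Hypothesis(fault, f"{variable}={value} exceeds {threshold}")
            )
    return hypotheses


state = {"link_utilization": 0.95, "open_tcp_connections": 200}
for h in identify(state):
    print(h.fault, "<-", h.evidence)
```

The sketch also shows the limitation noted above: a fault whose symptoms are not encoded in `RULES` produces no hypothesis at all.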
Fuzzy expert systems (Kandel 1991) are especially attractive in a dynamic environment because they favor silicon implementation and learning, and because they avoid lengthy symbolic graph search in favor of computational inference. Traditional expert systems with symbolic knowledge representation, implemented with "IF/THEN" conditional statements, require complicated and lengthy matching schemes that are too slow for real-time systems such as networks. Furthermore, traditional expert systems lack support for on-line mathematical analysis, an essential feature common in engineering systems. Fuzzy Expert Systems (FES) provide an alternative to symbolic intelligence. In FES, vague causal reasoning is represented numerically and hence is amenable to computational processing. In particular, an FES that uses a graph-based knowledge representation can easily be converted into causal matrices, thus offering an appealing computational feedback memory recall capability.

2.1.1 Case-Based Reasoning

Case-based reasoning is an extension of rule-based systems (Lewis 1993). It differs from FCM in that, in addition to rules, a picture of previous fault scenarios is used to make decisions. A picture here refers to the circumstances or events that led to the fault. In order to adapt the case-based reasoning scheme to the changing network environment, adaptive techniques are used to obtain the functional dependence of relevant criteria, such as network load and collision rate, on previous trouble tickets (Lewis and Dreo 1993). The trouble ticketing system is used to perform two functions: to prepare for problem diagnosis through filtering, and to infer the root cause of the problem. Using case-based reasoning to describe fault scenarios also suffers from heavy dependence on past information.
Furthermore, the identification of relevant criteria for the different faults will, in turn, require a set of rules to be developed. In addition, using any function approximation technique, such as back propagation, increases computation time and complexity. The number of functions to be learned also increases with the number of faults studied.

2.2 Alarm Correlation

A fault is a disorder occurring in the managed network. Faults happen within the managed network, while alarms are external manifestations of faults (Rouvellou and Hart 1995). Alarms, which are defined by vendors and generated by network equipment, are observable by network operators. Similar alarm messages with different time stamps are interpreted as separate alarms. Modern telecommunication networks may produce thousands of alarms per day, making the task of real-time network surveillance and fault management difficult. Because of the large volume of alarms, network operators frequently overlook or misinterpret them. To reduce the number of alarms displayed on operators' terminals, current network management systems apply alarm filtering procedures or, in the case of bursts of alarms, send them directly to a printer or database (Jakobson and Weissman 1993). Furthermore, a single fault in a large communication network may result in a large number of fault alarms, making the isolation of the primary source of failure a difficult task (Katzela 1995).

2.2.1 Alarm Correlation Using Finite State Machines

External observation of alarms may give the impression that one alarm causes another. However, the causality is not between alarms but between faults. Finite state machines model the alarm sequences that occur during and prior to fault events. For instance, a probabilistic finite state machine model is built for a known network fault using historical data (Lazar et al. 1992).
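One simple way to picture a probabilistic state machine learned from historical alarm data is a first-order transition model: each alarm type is a state, and transition probabilities are estimated from observed alarm sequences. The sketch below is an illustrative simplification, not the model of the cited work; the alarm names and the history are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate P(next alarm | current alarm) from historical sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def sequence_probability(transitions, seq):
    """Probability of an alarm sequence given its first alarm.

    Transitions never seen in the history get probability 0.0.
    """
    p = 1.0
    for current, nxt in zip(seq, seq[1:]):
        p *= transitions.get(current, {}).get(nxt, 0.0)
    return p


# Hypothetical alarm histories recorded before a known link failure.
history = [
    ["link_degraded", "high_latency", "link_down"],
    ["link_degraded", "high_latency", "link_down"],
    ["link_degraded", "packet_loss", "link_down"],
    ["link_degraded", "high_latency", "recovered"],
]
model = fit_transitions(history)
observed = ["link_degraded", "high_latency", "link_down"]
print(sequence_probability(model, observed))
```

A new alarm sequence that scores highly under the model for a known fault is evidence that the same fault is developing; a sequence with zero probability indicates behavior the model has never seen.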
State machines are designed with the intention of not just detecting an anomaly but also possibly identifying and diagnosing the problem. The sequences of alarms obtained from different points in the network are modeled as states of a finite state machine. The alarms are assumed to contain information such as the device name, the symptom, and the time of occurrence. The transitions between the states are measured using prior events (Katzela and Schwartz 1995; Rouvellou and Hart 1995; Bouloutas et al. 1990). A given cluster of alarms may have a number of explanations, and the objective is to find the best explanation among them. The best explanation is obtained by identifying a near-optimal set of nodes with minimum cardinality such that the entities in the set explain all the alarms and at least one of the nodes in the set is the one most likely to be at fault (Lazar et al. 1992; Jakobson and Weissman 1993). From an observer's point of view, fault detection and identification require checking whether a network device behaves as the FSM specifies and, if not, how it deviates from the expected behavior (Lazar et al. 1992). Alarm correlation may be used for network fault isolation and diagnosis, selecting corrective actions, proactive maintenance, and trend analysis (Jakobson and Weissman 1993).

2.3 Pattern Matching

This approach describes anomalies as deviations from normal behavior and attempts to deal with the variability in the network environment (Feather and Maxion 1993; Papavassiliou et al. 2000). In this approach, online learning is used to build a traffic profile for a given network. Traffic profiles are built using symptom-specific feature vectors such as link utilization, packet loss, and number of collisions. These profiles are then categorized by time of day, day of week, and special days such as weekends and holidays. When newly acquired data fails to fit within some confidence interval of the developed profiles, an anomaly is declared.
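The profile-and-tolerance idea can be sketched with a per-category mean and standard deviation. The code below is a minimal illustration that assumes one scalar feature (link utilization), profiles keyed by a (day type, hour) bucket, and a tolerance of k standard deviations. The bucket scheme, the value of k, and the sample data are assumptions made for the example, not parameters from the cited systems.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_profile(samples):
    """Build bucket -> (mean, standard deviation) from (bucket, value) pairs.

    Buckets with fewer than two samples are skipped because a sample
    standard deviation cannot be computed for them.
    """
    by_bucket = defaultdict(list)
    for bucket, value in samples:
        by_bucket[bucket].append(value)
    return {b: (mean(v), stdev(v)) for b, v in by_bucket.items() if len(v) >= 2}


def is_anomalous(profile, bucket, value, k=3.0):
    """Flag a value that falls outside mean +/- k standard deviations."""
    if bucket not in profile:
        return False  # no profile learned yet for this category
    mu, sigma = profile[bucket]
    return abs(value - mu) > k * sigma


# Hypothetical link-utilization history for weekdays at 09:00.
bucket = ("weekday", 9)
training = [(bucket, u) for u in (0.40, 0.42, 0.38, 0.41, 0.39)]
profile = build_profile(training)

print(is_anomalous(profile, bucket, 0.41))  # within the learned tolerance
print(is_anomalous(profile, bucket, 0.75))  # far outside the learned tolerance
```

The example also makes the cost of the approach visible: every category needs enough history before it can be judged, which is why a new or rapidly changing network requires a long profiling period.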
One method captures the normal behavior of a time series as templates and sets tolerance limits based on different levels of standard deviation. These limits are tested using extensive data analysis (Feather and Maxion 1993). The authors also propose a pattern matching scheme to detect address usage anomalies by tracking each address at 5-minute intervals. A template of the mean and standard deviation of the usage of each address is then used to detect anomalous behavior. The anomaly vectors from any new data are checked against the template feature vector for a given anomaly, and if a match occurs, a fault is declared. For simple, unvaried data, a mechanism called the Performance and Anomaly Monitoring System, or PAMS, is used (Feather 1992; Maxion 1989; Maxion 1990; Maxion and Feather 1990). PAMS highlights anomalous points in time series data by developing a prediction of normal behavior, called a template, and tolerance limits, called envelopes, based on a model of data variance. Current data that falls outside the tolerance envelopes is considered anomalous. The efficiency of the pattern matching approach depends on the accuracy of the traffic profile generated. Given a new network, it may be necessary to spend a considerable amount of time building traffic profiles. In the face of evolving network topologies and traffic conditions, this method may not scale gracefully (Thottan and Ji 2003).

2.4 Statistical Analysis

As the network evolves, each of the methods described above requires significant recalibration or retraining. However, using statistical approaches (Thottan and Ji 2003), it is possible to continuously track the behavior of the network. Statistical analysis has been used to detect anomalies corresponding to network failures (Thottan 2000) as well as network intrusions (Wang et al. 2002). Interestingly, both of these cases make use of the standard sequential change point detection approach.
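Sequential change point detection can be illustrated with a one-sided cumulative sum (CUSUM) statistic, a common choice for this task. The statistic accumulates the amount by which each observation exceeds an allowed drift and raises an alarm once the accumulated excess crosses a threshold. The sketch below is a generic version of the technique; the `drift` and `threshold` values and the observation series are hypothetical, and no claim is made that they match the parameters of the cited detection systems.

```python
def cusum_alarm(observations, drift, threshold):
    """Return the index of the first alarm, or None if no change is detected.

    score_n = max(0, score_{n-1} + x_n - drift); alarm when score_n > threshold.
    """
    score = 0.0
    for i, x in enumerate(observations):
        # Normal samples (x < drift) pull the score back toward zero, so only
        # a sustained upward shift can accumulate enough to trigger an alarm.
        score = max(0.0, score + x - drift)
        if score > threshold:
            return i
    return None


# Hypothetical normalized measurement that shifts upward at index 4.
series = [0.1, 0.0, 0.2, 0.1, 1.5, 1.6, 1.4, 1.7]
print(cusum_alarm(series, drift=0.5, threshold=2.0))
```

Choosing `drift` above the normal mean keeps the score near zero during normal operation, while `threshold` trades detection delay against false alarms.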
The Flooding Detection System (Wang et al. 2002) uses measured network data that describes TCP operations to detect SYN flooding attacks. SYN flooding attacks capitalize on the limitation that TCP servers maintain all half-open connections. Once the queue limit is reached, future TCP connection requests are denied. The sequential change point detection employed here makes use of the nonparametric cumulative sum (CUSUM) method. Using this approach on trace-driven simulations, it has been shown that SYN flooding attacks can be detected with high accuracy and reasonably short detection times.

When detecting anomalies due to failures, we are confronted with the problem of detecting a host of potential scenarios. Each of these failure scenarios differs in its manifestations as well as its characteristics. Thus, it is necessary to obtain a rich set of network information that can cover a wide variety of network operations. The primary source of such in-depth information is the SNMP MIB data. Designing a failure detection system using MIB data necessitates the use of a general method, since MIB variables exhibit varying statistical characteristics (Thottan 2000).

2.5 Intelligent Probing for Fault Management

Intelligent probing makes use of probing technology (CAIDA 2005) for cost-effective fault diagnosis in computer networks. Probes are test transactions that can be actively selected and sent through the network. A distributed system can be represented as a "dependency graph" in which nodes can be hardware elements (e.g., workstations, servers, routers), software components, or services, and links can represent both physical and logical connections between the elements. Probes offer the opportunity to develop an approach to diagnosis that is more active than traditional "passive" event correlation and similar techniques.
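How probe outcomes narrow down a fault can be shown with a deliberately simple set-based sketch. It assumes a single faulty element, that a probe fails exactly when its path contains that element, and that each probe path is known as a set of nodes from the dependency graph. The node names and probe paths are hypothetical, and the exact set reasoning here stands in for the probabilistic inference used in practice.

```python
def candidate_faults(probe_paths, results):
    """Return the nodes consistent with the probe outcomes.

    probe_paths -- probe name -> set of nodes the probe traverses
    results     -- probe name -> True if the probe succeeded

    Single-fault assumption: the faulty node lies on every failed probe
    path and on no successful probe path.
    """
    failed = [probe_paths[p] for p, ok in results.items() if not ok]
    if not failed:
        return set()
    suspects = set.intersection(*failed)
    for p, ok in results.items():
        if ok:
            suspects -= probe_paths[p]
    return suspects


paths = {
    "p1": {"station", "router1", "web"},
    "p2": {"station", "router1", "db"},
    "p3": {"station", "router2", "mail"},
}
outcomes = {"p1": False, "p2": True, "p3": True}
print(candidate_faults(paths, outcomes))
```

In this example the successful probes exonerate the probing station and `router1`, leaving only the web server as a suspect. Selecting which probes to send so that the suspect set shrinks fastest is the part of the problem that calls for the more sophisticated reasoning.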
A probe is a command or transaction sent from a particular machine, called a probing station, to a server or a network element in order to test a particular service. The probing problem can be addressed using methods from artificial intelligence; the resulting approach is called intelligent probing. The probes are selected by reasoning about the interactions between the probe paths. For diagnosis, a local-inference approximation scheme is applied to a probabilistic dependency model, for instance a Bayesian network (Huard and Lazar 1996) or another such model (Katzela and Schwartz 1995); this avoids the intractability of exact inference for large networks (Brodie et al. 2002).

2.6 Proactive Fault Detection using Intelligent Agents

Current fault management implementations generally rely on the expertise of a human network manager, which is translated into a set of rules and then into threshold levels on the measurement variables being collected. As networks become more complex and change more frequently, the human network manager will find it hard to maintain a sufficient level of expertise on a particular network's behavior (Hood and Ji 1998). Fault management research has covered approaches such as expert systems, finite state machines, advanced database techniques, and probabilistic methods (Lazar et al. 1992). The drawback of all these approaches is that they require a specification of the faults to be detected, and it is not feasible to specify all possible faults. Also, changes in network configuration, applications, and traffic can alter the type and nature of possible faults, which makes modeling them impractical in many cases. Intelligent agents that reside at network nodes use adaptive learning methods to detect abnormal behavior before a fault actually occurs (Ekaette and Far 2003).
In this approach, the intelligent agent processes information collected by Simple Network Management Protocol (SNMP) agents, and uses it to detect the network anomalies that typically precede a fault (Yemini 1994). The SNMP agents collect information about the network node through their management information base, or MIB, which holds a set of variables pertinent to that particular node. The intelligent agents learn the normal behavior of each measurement variable and combine the information in the probabilistic framework of a Bayesian network (Huard and Lazar 1996; Pearl 1998). This yields a picture of the network health from the perspective of the network node, which can be used to trigger local corrective action or a message to a centralized network manager (Hood and Ji 1998).

Table 1.1 Fault Management Techniques

Proposed System | Methodology | Complexity | Scalable | Detects new fault patterns
Rule-based approach | Anomaly detection by conventional rule-based systems. | Low | No | No
Alarm correlation | Incorporation of finite state machines to model alarm sequences that occur during and prior to fault events. | Moderate | No | No
Pattern matching | An anomaly is considered as variability in the network environment. | Moderate | No | Yes, but introduces overheads
Statistical analysis | Employment of statistical approaches to continuously track the behavior of the network. | Moderate | Yes | Yes
Intelligent probing | Use of probing technology for fault diagnosis. | High | Yes | Yes
Proactive fault detection using agents | Deployment of software agents that detect, correlate and selectively seek to derive a clear explanation of faults. | High | Yes | Yes

Chapter 3
Anticipatory Systems

The idea that anticipations influence and guide behavior has been increasingly appreciated over the last decades. Anticipations appear to play a major role in the coordination and realization of adaptive behavior. Various disciplines have explicitly recognized anticipations.
For example, philosophy has been addressing the sense of reasoning, generalization, and association for a long time. More recently, experimental psychology has confirmed the existence of anticipatory behavior processes in animals and humans over the last decades (Butz et al. 2003). Anticipation is an important characteristic of intelligence. Proactive behavior requires anticipatory abilities.

A seminal work on anticipatory systems is the one written by Rosen (1985). A brief introduction to, and serious concern about, anticipation follows:

"Strictly speaking, an anticipatory system is one in which present change of state depends upon future circumstances, rather than merely on the present or past. As such, anticipation has routinely been excluded from any kind of systematic study, on the grounds that it violates the causal foundation on which all of theoretical science must rest, and on the grounds that it introduces a telic element which is scientifically unacceptable. Nevertheless, biology is replete with situations in which organisms can generate and maintain internal predictive models of themselves and their environments, and utilize the predictions of these models about the future for purpose of control in the present. Many of the unique properties of organisms can really be understood only if these internal models are taken into account. Thus, the concept of a system with an internal predictive model seemed to offer a way to study anticipatory systems in a scientifically rigorous way" (Rosen 1985).

3.1 Anticipatory Agents

Perception ability is a required characteristic of agents. Hence, they can be designed to perceive the current state of self and others. They can also be designed to create current image(s) of future state(s). Perception requires mechanisms that enable interpretive capabilities. Perception invariably involves sensory qualities, and introspection entails accessing the sensations and perceptions that the agent would introspect.
Perceptions are derived as a result of interpreting sensory inputs within the context of the current world and the agent's self model. The prototype inference, orientation accounting, and situational classification mechanisms could be used to realize the interpretation capabilities of an agent. The interpretation process results in perceptions. An anticipatory agent needs to deliberate upon perceptions through introspection and reflection in order to anticipate. Introspection is deliberate and attentive because higher-order intentional states are themselves attentive and deliberate. An introspective agent should have access mechanisms to its internal representation, operations, behavioral potentials, and beliefs about its context. Reflection uses the introspective mechanisms to deliberate on the agent's situation in relation to the embedding environmental context. These features collectively result in anticipation capabilities that orient and situate an agent for accurate future projections. Figure 3.1 presents interpretation and introspection as critical components within the micro-architecture of an anticipatory agent.

Figure 3.1 Basic Components for Anticipatory Agents

3.2 Preventive State Anticipation

A special kind of anticipation occurs when an anticipated undesired situation makes an agent adapt its behavior in order to prevent that situation from occurring. For example, assume that we are going out for a walk and that the sky is full of dark clouds. Using our internal weather model and our knowledge about the current weather situation, we anticipate that it will probably begin to rain during the walk. This makes us foresee that our clothes will get wet, which in turn might cause us to catch a cold, something we consider a highly undesirable state. So, in order to avoid catching a cold, we adapt our behavior and bring an umbrella when going for the walk.
In the suggested framework, an anticipatory agent consists mainly of three entities: an object system (S), a world model (M) and a meta-level component (Anticipator). The object system is an ordinary (i.e., non-anticipatory) dynamic system. M is a description of the environment including S, but excluding the Anticipator. The importance of having an internal model that includes both the agent as part of the environment and (a large portion of) its abilities has been stressed by, for instance, Zeigler (1990). The Anticipator makes predictions using M and uses these predictions to change the dynamic properties of S. Although the different parts of an anticipatory agent certainly are causal systems, the agent taken as a whole nevertheless behaves in an anticipatory fashion.

When implementing an anticipatory agent, the component S corresponds to some kind of reactive system similar to the ones mentioned above. This component is referred to as the Reactor. The Anticipator corresponds to a more deliberative meta-level component that is able to "run" the world model faster than real time. When doing this, it reasons about the current situation compared to the predicted situations and its goals, and decides whether (and how) to change the Reactor. The resulting architecture is illustrated in Figure 3.2.

Figure 3.2 The Basic Architecture of an Anticipatory Agent in Our Model (diagram labels: Sensors, World Model, Anticipator, Reactor, Effectors, Anticipatory layer)

We can summarize the operation of the architecture as follows. The sensors receive input from the environment. This data is then used in two different ways: (1) to update the World Model and (2) to serve as stimuli for the Reactor. The Reactor reacts to these stimuli and provides a response that is forwarded to the effectors, which then carry out the desired action(s) in the environment.
Moreover, the Anticipator uses the World Model to make predictions, and on the basis of these predictions the Anticipator decides whether, and which, changes to the dynamical properties of the Reactor are necessary. Every time the Reactor is modified, the Anticipator should, of course, also update the part of the World Model describing the agent accordingly. Thus, the working of an anticipatory agent can be viewed as two concurrent processes, one reactive at the object level and one more deliberative at the meta level (Davidsson 2003).

Chapter 4
Agent-based Modeling of Reactive and Anticipatory Control in Computer Networks

The overall architecture of the simulation is primarily composed of the following components, as shown in Figure 4.1.

Figure 4.1 Reactive and Anticipatory Control

• Network Model: The first component is a basic model of a typical computer network. The network model is the basis for the design of, and experimentation with, the fault management techniques. The network model is designed on a simulation framework and comprises basic network components, including switches, routers, hosts and links.
• Monitoring Layer: The monitoring layer consists of multiple monitoring agents that are embedded in individual network components or in a group of components (e.g., a monitoring agent is allocated for each subnet). The monitoring agents may have disjoint functions or potentially overlapping responsibilities for increased reliability.
• Management Layer: The management layer comprises the reactive or the anticipatory agents, according to the technique being used. The reactive agent works on a rule-based approach. It interprets the data acquired from the monitoring agents and communicates with the control layer to take corrective action (Thottan and Ji 2003). Similarly, the anticipatory agent works on the principle of a Naïve Bayesian classifier (Luger and Stubblefield 1989) and interacts with the control layer to take corrective action.
• Control Layer: The control layer is responsible for carrying out corrective action based on the information it gets from the management layer. The control layer carries out the corrective action by triggering local corrective action or by sending a message to the individual components of the network model (Hood and Ji 1998).

4.1 The DEVS Network Model

The network model is developed in the DEVS (Discrete Event System Specification) formalism. A brief description of the simulated network components is as follows:

• Generator: This component is responsible for the generation of network packets (payloads to be processed by the hosts).
• Transducer: A transducer is responsible for the calculation of the various network performance metrics.
• Links: Simulation of links is carried out on crucial connections in the network. A link is treated as a processor, and its overloading is simulated as an increase in the processing time of that processor.
• Switch: A switch forms a connection between different subnets to facilitate forwarding of packets among them.
• Router: A router follows a routing algorithm such as distance vector routing, link state routing, hierarchical routing, or broadcast routing to facilitate forwarding of packets among hosts. We use the distance vector routing strategy, by which packets are forwarded along the route with the best known distance to each destination (the distance is measured in terms of the processing time of hosts).
• Hosts: Hosts are entities that process jobs or payloads. They can be network clients, servers, printers, plotters, etc.
• Monitoring agents: The monitoring agents record performance metrics such as network throughput, latency and packet drop rate. They report these data to the management layer, where the reactive agents infer faults using their rules, while the anticipatory agent updates its predictive model.
• Management agents: The management agents are the reactive and the anticipatory agents.
They receive data from the monitoring agents and prompt the control agent to take the corresponding action.

4.1.1 The DEVS Formalism

The Discrete Event System Specification (DEVS) formalism (Zeigler and Sarjoughian 2003) provides a means of specifying a mathematical object called a system. Basically, a system has a time base, inputs, states, and outputs, and functions for determining next states and outputs given current states and inputs. Discrete event systems represent certain constellations of such parameters, just as continuous systems do. For example, the inputs in discrete event systems occur at arbitrarily spaced moments, while those in continuous systems are piecewise continuous functions of time. The insight provided by the DEVS formalism is the simple way in which it characterizes how discrete event simulation languages specify discrete event system parameters. Having this abstraction, it is possible to design new simulation languages with sound semantics that are easier to understand. The conceptual framework underlying the DEVS formalism is shown in Figure 4.2.

Figure 4.2 Conceptual Framework of the DEVS Formalism (adapted from "Introduction to DEVS Modeling & Simulation with JAVA", Zeigler and Sarjoughian 2003)

The conceptual framework comprises the following elements:

• Model: It is a set of instructions for generating data comparable to that observable in the real system. The structure of the model is its set of instructions. The behavior of the model is the set of all possible data that can be generated by faithfully executing the model instructions.
• Simulator: It exercises the model's instructions to actually generate its behavior.
• Experimental Frame: It captures how the modeler's objectives impact model construction, experimentation and validation. The DEVS experimental frames are formulated as model objects in the same manner as the models of primary interest.
In this way, model/experimental frame pairs form coupled model objects with the same properties as other objects of this kind. It will become evident later that this uniform treatment yields key benefits in terms of modularity and system entity structure representation.

The basic objects are related by two relations:

• Modeling relation: linking the real system and the model, it defines how well the model represents the system or entity being modeled. In general terms, a model can be considered valid if the data generated by the model agrees with the data produced by the real system in an experimental frame of interest.
• Simulation relation: linking the model and the simulator, it represents how faithfully the simulator is able to carry out the instructions of the model.

The basic items of data produced by a system or model are time segments. These time segments are mappings from intervals defined over a specified time base to values in the ranges of one or more variables. The variables can be either observed or measured. The structure of a model may be expressed in a mathematical language called a formalism. The discrete event formalism focuses on the changes of variable values and generates time segments that are piecewise constant. Thus, an event is a change in a variable value, which occurs instantaneously. In essence, the formalism defines how to generate new values for variables and the times at which the new values should take effect. An important aspect of the formalism is that the time intervals between event occurrences are variable (in contrast to discrete time, where the time step is a fixed number).

4.2 Reactive Agents

Fuzzy reactive agents are used to determine the proneness to failure. Reactive agents work in a hard-wired stimulus-response manner. Each and every situation must be considered in advance. The reactive agent follows a fuzzy rule-based approach to infer the occurrence of a fault.
A system becomes a fuzzy system when its operations are entirely or partially governed by fuzzy logic or are based on fuzzy sets. A crisp set is a collection of distinct (precisely defined) elements. In classical set theory, a crisp set can be a superset containing other crisp sets. A superset represents the universe of discourse if it defines the boundaries within which all elements reside. In any given situation, a new element can be tested to see whether it belongs to any set. A fuzzy set, on the other hand, is a collection of distinct elements with a varying degree of relevance or inclusion (Berkan and Trubatch 1997).

The reactive agent obtains information about each network node through the performance metrics collected by the monitoring agents embedded in the network, and uses predefined rules to infer failures based on the degradation of those metrics. Fuzzy rules consist of antecedents and consequents. The antecedent variables (one or more variables that represent the conditions to be met before any conclusion can be made) are the network throughput and latency. The consequents (the set of outputs) are the proneness to failure of each network component. A sample set of fuzzy rules used by the reactive agent for a network component (for instance, a host) can be outlined as follows:

• If Throughput is High and Latency is Low then Fault_proneness is Low
• If Throughput is Moderate and Latency is Low then Fault_proneness is Low
• If Throughput is Low and Latency is Low then Fault_proneness is Moderate
• If Throughput is High and Latency is Moderate then Fault_proneness is Low
• If Throughput is Moderate and Latency is Moderate then Fault_proneness is Moderate
• If Throughput is Low and Latency is Moderate then Fault_proneness is Moderate

4.3 Anticipatory Agents - A Bayesian Approach

The architectural framework of the anticipatory agent is described in the previous chapter. It primarily comprises a predictive model and an anticipator.
We make use of a Naïve Bayesian classifier for constructing the predictive model of the anticipatory agent. The strength of the Naïve Bayesian classifier is that it provides a theoretical framework for combining statistical data with prior knowledge about the problem domain for making future projections. Before getting to the Naïve Bayesian classifier, we give an overview of basic probability theory.

$p(A \mid B)$ is known as the conditional probability of event A happening given that event B has occurred. We can express the conditional probability as follows:

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$

or $p(A \mid B)$ = (# of times A & B occur) / (# of times B occurs).

The following example shows how a fault is detected by the anticipatory agent by making use of the Naïve Bayesian classifier. Consider the sample of evidence specified in Table 4.1.

Table 4.1 Sample Set of Evidence to be Processed by the Naïve Bayesian Classifier

The Network takes the value High if there is abnormality above a certain threshold in one or both of the subnets and is Normal otherwise. Subnet 1 and Subnet 2 take the value High if any of the components in the respective subnet has failed and are Normal otherwise. The probability that there is a fault in host 1, given evidence that subnet 1 is High, is

p(Host 1 = yes | Subnet 1 = High) = (# of times Host 1 = yes & Subnet 1 = High) / (# of times Subnet 1 = High) = 6/8

Similarly, the probability that there is a fault in host 1, given evidence that subnet 1 is High and the network is High, is

p(Host 1 = yes | Subnet 1 = High, Network = High) = (# of times Host 1 = yes & Subnet 1 = High & Network = High) / (# of times Subnet 1 = High & Network = High) = 4/5

But the number of conditional probabilities in a data set can be very high. This is where Bayes' rule comes in. It is derived as follows. Given

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)},$$

we know that

$$p(B \mid A) = \frac{p(B \cap A)}{p(A)} \quad \text{and} \quad p(B \cap A) = p(B \mid A)\, p(A).$$

Now, since $p(B \cap A) = p(A \cap B)$,

$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}.$$

This is known as Bayes' rule. Consider the following problem as an application of Bayes' rule: given that (Subnet 1 = High, Network = High), is there a fault in host 1? We can express this as

p(Host 1 = yes | Subnet 1 = High, Network = High) = p(Subnet 1 = High, Network = High | Host 1 = yes) p(Host 1 = yes) / p(Subnet 1 = High, Network = High)

A general equation for this is

$$p(C_i \mid A_1 A_2 \ldots A_n) = \frac{p(A_1 A_2 \ldots A_n \mid C_i)\, p(C_i)}{\sum_k p(A_1 A_2 \ldots A_n \mid C_k)\, p(C_k)}$$

However, the conditional probability $p(A_1 A_2 \ldots A_n \mid C_i)$ may be difficult to compute. If conditional independence among the attributes of the query is assumed, we have the following:

$$p(C_i \mid A_1 A_2 \ldots A_n) = \frac{p(A_1 \mid C_i)\, p(A_2 \mid C_i) \cdots p(A_n \mid C_i)\, p(C_i)}{\sum_k p(A_1 \mid C_k)\, p(A_2 \mid C_k) \cdots p(A_n \mid C_k)\, p(C_k)}$$

The result of Naïve Bayesian classification is as follows:

$$\text{Result} = \arg\max_{C_k} \Big[\, p(C_k) \prod_i p(A_i \mid C_k) \Big],$$

where $p(A_i \mid C_k)$ = (# of $A_i$ & $C_k$) / (# of $C_k$).

This can be illustrated by the following example. Suppose we are given that (Subnet 1 = High and Network = High) and we need to know whether there is a fault on host 1. From the above Naïve Bayesian classifier equation:

Result (host1 = yes) = p(Host 1 = Yes) * p(Subnet 1 = High | Host 1 = Yes) * p(Network = High | Host 1 = Yes) = (6/14)*(1)*(3/6) = 0.21428

Result (host1 = no) = p(Host 1 = No) * p(Subnet 1 = High | Host 1 = No) * p(Network = High | Host 1 = No) = (8/14)*(3/8)*(3/8) = 0.08035

Since Result (host1 = yes) > Result (host1 = no), the predictive model predicts a potential fault in host 1. The Anticipator thereby notifies the control layer to take the corresponding corrective action for host 1. Note that the set of evidence given to the Bayesian classifier is continuously updated according to the events taking place in the network. After a fixed interval of time (say 5 time units), the classifier computes the result, $\text{Result} = \arg\max_{C_k} [\, p(C_k) \prod_i p(A_i \mid C_k) ]$, based on the state of the components at that time instant.
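The worked example above can be reproduced with a short sketch. The code below is illustrative only and is not the thesis implementation: it hard-codes the counts quoted in the example (6 of 14 samples with a fault in host 1, and so on) because the rows of Table 4.1 are not reproduced here, and the names `naive_bayes_scores`, `class_counts` and `match_counts` are hypothetical.

```python
from fractions import Fraction


def naive_bayes_scores(class_counts, match_counts):
    """Return p(C_k) * prod_i p(A_i | C_k) for every class C_k.

    class_counts: {class label: number of samples with that class}
    match_counts: {class label: [number of samples of that class that
                                 also match each observed attribute value]}
    """
    total = sum(class_counts.values())
    scores = {}
    for label, n_class in class_counts.items():
        score = Fraction(n_class, total)          # prior p(C_k)
        for n_match in match_counts[label]:
            score *= Fraction(n_match, n_class)   # likelihood p(A_i | C_k)
        scores[label] = score
    return scores


# Counts taken from the worked example (assumed to summarize Table 4.1).
class_counts = {"yes": 6, "no": 8}   # Host 1 faulty / not faulty
match_counts = {
    "yes": [6, 3],  # Subnet 1 = High, Network = High among the "yes" samples
    "no": [3, 3],   # the same two observations among the "no" samples
}

scores = naive_bayes_scores(class_counts, match_counts)
prediction = max(scores, key=scores.get)  # arg max over the classes
```

With these counts the scores are 3/14 (about 0.214) for a fault and 9/112 (about 0.080) for no fault, so the sketch selects "yes", in agreement with the result above.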
4.4 Adaptive Restructuring of the DEVS Network Model

Adaptive restructuring can be described as modifying the operating regimes of the DEVS network model in an effort to improve its performance based on the network conditions at a particular instant of time. We intend to find which fault management technique performs better under adaptation. In this section we describe the methodology by which the operating regimes of the DEVS network model are modified, based on certain parameters, to preserve the proper functioning of the network.

For the DEVS network model, we need to decide on the set of operating regimes that we intend to modify in the DEVS environment such that the normal operation of the network model is preserved. Based on this requirement, we define the following modes of operation for certain components of the network model. We then combine three of those modes to form a particular operating regime. The modes of operation are:

Sw_Mode_0: Original operation of the switch.
Sw_Mode_1: Operation of the switch with random forwarding of packets to each of the subnets.
Ro_Mode_0: Original configuration of the router.
Ro_Mode_1: Modification of the packet forwarding strategy so that packets are forwarded to the host with the highest instantaneous value of throughput.
Li_Mode_0: The original value of link delay as defined.
Li_Mode_1: Increase the value of link delay by 50%.
Li_Mode_2: Decrease the value of link delay by 50%.

The operating regimes of the DEVS environment are:

1) Sw_Mode_0 AND Ro_Mode_0 AND Li_Mode_1
2) Sw_Mode_0 AND Ro_Mode_0 AND Li_Mode_2
3) Sw_Mode_1 AND Ro_Mode_0 AND Li_Mode_0
4) Sw_Mode_0 AND Ro_Mode_1 AND Li_Mode_1
5) Sw_Mode_1 AND Ro_Mode_1 AND Li_Mode_0
6) Sw_Mode_1 AND Ro_Mode_1 AND Li_Mode_1
7) Sw_Mode_1 AND Ro_Mode_1 AND Li_Mode_2

Adaptivity can be modeled as a simulated annealing process. Simulated annealing consists of capturing a new state of the DEVS model.
The new state is obtained by applying any of the operating regimes. This is followed by recording the performance metric (network throughput) of the network in the new state for a finite amount of time. The recorded metric is then compared with the metric obtained in the previous state (the state of the DEVS model before the operating regime was modified), and the change in the performance metric is recorded. The Metropolis criterion (Carley and Svoboda 1996) is then used to determine whether or not to adopt the new state.

The Metropolis criterion states that a change is always accepted if the forecast performance for a hypothetical organization is better than the known performance of the current organization. A "hypothetical organization" can be interpreted as a new organization that can be obtained by applying design changes to the current organization. Furthermore, when the forecast is poorer, the change may still be accepted with a probability that is calculated using the Boltzmann equation

$$P = P_0\, e^{-\mathrm{cost}(t)/T}$$

such that cost(t) = 0 - performance(t), and $P_0$ is the probability of accepting a "bad" design for the previous iteration.

The above process is then repeated until the temperature reaches a freezing point or until the simulation time ends, whichever is earlier. Temperature is defined as the model's current level of risk aversion, in other words, the degree to which the DEVS network model is open to accepting a change of state. The temperature always drops after a new state has been adopted for the DEVS model. The freezing point is the point at which a state is in its final form and no more adaptation or change is allowed (Carley and Svoboda 1996).

To implement the above notion in the DEVS environment, we include an additional component in the network design called the "annealer". The total simulation time for the network operation is fixed at 1000 time units, and the annealer is made to operate after every 100 time units.
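A minimal sketch of the acceptance rule and the temperature schedule may make this concrete. It is not the thesis implementation: the function names are hypothetical, the acceptance probability is assumed to take the form P = P0 * exp(-cost / T), the numeric values (initial temperature 0.433, cooling factor 0.975, freezing temperature 0.345) are the settings this chapter quotes for the annealer, and the caller is assumed to supply a non-negative cost so that the probability stays between 0 and 1.

```python
import math
import random


def accept_probability(p0, cost, temperature):
    """Boltzmann-style acceptance, assumed here to be P = P0 * exp(-cost / T)."""
    return p0 * math.exp(-cost / temperature)


def metropolis_accept(new_perf, old_perf, p0, cost, temperature, rng=random):
    """Always accept an improvement; accept a worse state with probability P."""
    if new_perf > old_perf:
        return True
    return rng.random() < accept_probability(p0, cost, temperature)


def cooling_schedule(t0=0.433, alpha=0.975, freezing=0.345):
    """Yield temperatures T(t+1) = alpha * T(t) until the freezing point."""
    temperature = t0
    while temperature > freezing:
        yield temperature
        temperature *= alpha


# Temperatures visited before the model "freezes".
temperatures = list(cooling_schedule())
```

With these values the schedule yields nine temperatures before the freezing point is crossed, which is roughly in line with an annealer that runs every 100 time units over a 1000-unit simulation.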
After a new state is applied, the annealer records the performance metric for a finite amount of time, as mentioned above; this time is fixed at 50 time units. Each operation of the annealer is termed an iteration. The annealer operates based on the following algorithm.

1. Set the initial value of temperature T = 0.433 and α = 0.975, where α is the rate at which the DEVS model learns to be risk averse. The initial value of temperature (0.433) corresponds to a probability of 0.9 for changes to be accepted.
2. Derive the new state of the DEVS model by applying an operating regime at random, as described above, and record the performance metric (network throughput) of the DEVS model in the new state.
3. If the recorded performance metric is better than the one obtained in the old state, continue with step 2; otherwise proceed with step 4. If the newly recorded metric is poorer than the old one, use the Metropolis criterion to determine whether the new state can be adopted in the network.
4. Set the new values of temperature and probability: $P_0 = P$ and $T(t+1) = \alpha\, T(t)$.
5. Continue steps 2 through 5 until a freezing point is reached (P = 0.55 and T = 0.345) or the simulation time ends, whichever is earlier.

The above algorithm, followed by the annealer, can be depicted by the following control flow model:

Figure 4.3 Control Flow Diagram Depicting the Operation of the Annealer

We then compare the final performance metrics obtained when simulated annealing is implemented with our original results (without adaptation) for each of the fault management techniques, to analyze the network performance under adaptive reconfiguration.

Chapter 5
Detailed Design of the DEVS Network Model

Chapter 4 describes the detailed architecture for agent-based modeling of reactive and anticipatory control in computer networks.
This chapter provides details pertaining to the design of the DEVS network simulation model, which implements a distributed network monitoring (DNM) system (Prietula et al. 1998).

5.1 The DEVS Basic and Coupled Models

In the DEVS formalism, one must specify 1) basic models from which larger ones are built, and 2) how these models are connected together in hierarchical fashion. A basic model contains the following information:

• the set of input ports through which external events are received
• the set of output ports through which external events are sent
• the set of state variables and parameters: two state variables are usually present, "phase" and "sigma" (in the absence of external events the system stays in the current "phase" for the time given by "sigma")
• the time advance function, which controls the timing of internal transitions; when the "sigma" state variable is present, this function just returns the value of "sigma"
• the internal transition function, which specifies to which next state the system will transit after the time given by the time advance function has elapsed
• the external transition function, which specifies how the system changes state when an input is received; the effect is to place the system in a new "phase" and "sigma", thus scheduling it for a next internal transition; the next state is computed on the basis of the present state, the input port and the value of the external event, and the time that has elapsed in the current state
• the confluent transition function, which is applied when an input is received at the same time that an internal transition is to occur; the default definition simply applies the internal transition function before applying the external transition function to the resulting state
• the output function, which generates an external output just before an internal transition takes place

Basic models may be coupled in the DEVS formalism to form a coupled model.
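To make the elements of a basic model concrete, the following sketch models a single-job processor in plain Python. It is an illustration under stated assumptions, not the DEVSJAVA API and not the thesis code: the class name `Processor`, its method names, and the port names "in" and "out" are hypothetical, and the model has no buffer, so a job that arrives while it is busy is ignored.

```python
from dataclasses import dataclass

INFINITY = float("inf")


@dataclass
class Processor:
    """Hypothetical basic (atomic) model that processes one job at a time."""
    processing_time: float = 2.0
    phase: str = "passive"    # state variable "phase"
    sigma: float = INFINITY   # state variable "sigma": time left in this phase
    job: object = None

    def time_advance(self):
        # With a "sigma" state variable, time advance just returns sigma.
        return self.sigma

    def external_transition(self, elapsed, port, value):
        # Next state depends on the present state, the input and elapsed time.
        if self.phase == "passive":
            self.job = value
            self.phase, self.sigma = "busy", self.processing_time
        else:
            # Busy: ignore the new job and keep the remaining processing time.
            self.sigma -= elapsed

    def output(self):
        # Called just before the internal transition takes place.
        return ("out", self.job)

    def internal_transition(self):
        self.job = None
        self.phase, self.sigma = "passive", INFINITY

    def confluent_transition(self, port, value):
        # Default rule: internal transition first, then the external one.
        self.internal_transition()
        self.external_transition(0.0, port, value)
```

A simulator would call `time_advance` to schedule the next internal event, call `output` just before `internal_transition`, and call `external_transition` whenever an input arrives on a port.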
A coupled model tells how to couple (connect) several component models together to form a new model. This latter model can itself be employed as a component in a larger coupled model, thus giving rise to a hierarchical construction. A coupled model contains the following information:

• the set of components
• the set of input ports through which external events are received
• the set of output ports through which external events are sent
• the external input coupling, which connects the input ports of the coupled model to one or more of the input ports of the components
• the external output coupling, which connects output ports of components to output ports of the coupled model; thus, when an output is generated by a component, it may be sent to a designated output port of the coupled model and so be transmitted externally
• the internal coupling, which connects output ports of components to input ports of other components; hence, when an output is generated by a component, it may be sent to the input ports of designated components (in addition to being sent to an output port of the coupled model) (Zeigler and Sarjoughian 2003)

5.2 The Distributed Network Monitoring System

A distributed network monitoring (DNM) system consists of a hierarchical structure with a set of network components and is endowed with monitoring agents that cooperate in monitoring the network. The network can be divided into several regions or subnetworks (in our case we consider two distinct subnets). Within each subnetwork, a set of monitoring agents is jointly responsible for maintaining up-to-date models of host and router performance and availability. These monitoring agents belong to the monitoring layer, as described in the previous chapter. Monitoring agents are responsible for notifying the management layer regarding the status of network components as well as subnetworks.
The management layer, which consists of the reactive and the anticipatory agents, utilizes the data acquired from the monitoring layer to make control decisions for the management of network faults. Figure 5.1 shows a hypothetical network structure based on the notion of a distributed network monitoring (DNM) system.

Figure 5.1 Designed Network Model

5.3 Component Overview

We give a brief description of the crucial components in our network model and their functions.

• Experimental frame
The experimental frame primarily consists of three subcomponents: the generator, the transducer and the fault injection mechanism. The generator generates packets to be processed by the network components on the basis of a specific inter-arrival time. The transducer is responsible for the computation of network performance metrics (throughput, latency and drop rate of packets). The throughput is defined as the average rate of job departures from the architecture, estimated by the number of jobs processed during the observation interval divided by the length of the interval. A job's turnaround time is the length of time between its arrival at the processor and its departure as a completed job. The drop rate of packets is defined as the percentage of packets dropped due to network faults. The fault injection mechanism, which is embedded in the experimental frame, generates "fault packets" at a random rate. A "fault packet", when encountered by a network component, induces a certain level of degradation in the throughput and latency of the component.

Figure 5.2 Experimental Frame

The activity diagram for the experimental frame is shown in Figure 5.3. The generator and the fault injection mechanism start as soon as the simulation begins. The transducer computes the performance metrics based on the number of packets and the number of faults incurred. The packet generator and the fault injection mechanism cease to operate when the simulation time ends.
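The three metrics computed by the transducer can be sketched as follows. This is a hedged illustration rather than the thesis code: the function name `transducer_metrics` and its inputs (dictionaries of arrival and completion times keyed by a packet identifier) are hypothetical, every packet without a recorded completion time is treated as dropped, and latency is taken to be the mean turnaround time.

```python
def transducer_metrics(arrivals, completions, interval):
    """Compute throughput, mean turnaround time and drop rate.

    arrivals:    {packet_id: arrival time}
    completions: {packet_id: completion time}, processed packets only
    interval:    length of the observation interval (same time units)
    """
    processed = [pid for pid in arrivals if pid in completions]
    dropped = [pid for pid in arrivals if pid not in completions]

    # Jobs processed during the interval divided by the interval length.
    throughput = len(processed) / interval
    # Turnaround time: completion time minus arrival time, per packet.
    turnaround = [completions[pid] - arrivals[pid] for pid in processed]
    mean_turnaround = sum(turnaround) / len(turnaround) if turnaround else 0.0
    # Drop rate as a percentage of all generated packets.
    drop_rate = 100.0 * len(dropped) / len(arrivals) if arrivals else 0.0
    return throughput, mean_turnaround, drop_rate


# Four packets generated in a 10-unit window; packet 3 is never completed.
arrivals = {1: 0.0, 2: 2.0, 3: 4.0, 4: 6.0}
completions = {1: 3.0, 2: 5.0, 4: 8.0}
throughput, mean_turnaround, drop_rate = transducer_metrics(
    arrivals, completions, 10.0
)
```

In this toy run three of the four packets complete, giving a throughput of 0.3 packets per time unit, a mean turnaround time of 8/3 time units, and a drop rate of 25%.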
Figure 5.3 Activity Diagram of the Experimental Frame

The interaction among the different components of the experimental frame is shown by the sequence diagram in Figure 5.4. As shown in the sequence diagram, the generator initially starts generating network packets to be processed by the hosts in the network. As soon as a packet is generated, the transducer is triggered and records the simulation time at which the packet was generated. On completion of packet processing by any of the hosts in the network, the transducer records the completion time. Based on the arrival and completion times, it computes the throughput and turnaround time. If the transducer fails to record a completion time because the packet was lost, it records the packet as lost and appends it to the list of dropped packets, which is used to calculate the drop rate of packets. At the end of the simulation time, when no more metrics are to be recorded, the transducer triggers the generator and the fault injection mechanism to cease generation of network packets and fault packets, respectively.

Figure 5.4 Sequence Diagram for Interactions of Various Components within the Experimental Frame

• Switches, routers, and hosts

The operational specifications of the switch, router, and hosts are discussed in the previous chapter. We now briefly describe the effect of a fault on each of these components. The switch, router, and hosts degrade in a similar way when a fault packet is encountered. The degradation can be seen as a three-step process. On encountering the first fault packet, the component's normal working is disrupted and it is said to change to a "low degradation" state; in other words, when a fault is encountered, the processing time of the component is doubled. This can be interpreted as degradation causing the component to delay the operation it is carrying out.
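The staged degradation of a component can be viewed as a small state machine. The sketch below is an illustration in Python under stated assumptions, not the DEVS model itself: the names `State`, `SLOWDOWN` and `Component` are hypothetical. Only the doubling of processing time at the first stage comes from the text; the slowdown factors for the later stages (moderate and high degradation, described in the next paragraph) are placeholders, as is the choice that a component which has ceased to operate is not revived by a control packet.

```python
from enum import IntEnum


class State(IntEnum):
    NORMAL = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3
    FAILED = 4


# Only the factor 2 for LOW is stated in the text; 4 and 8 are assumed values.
SLOWDOWN = {State.NORMAL: 1, State.LOW: 2, State.MODERATE: 4, State.HIGH: 8}


class Component:
    def __init__(self, base_processing_time):
        self.base = base_processing_time
        self.state = State.NORMAL

    def on_fault_packet(self):
        # Each fault packet pushes the component one stage further.
        if self.state < State.FAILED:
            self.state = State(self.state + 1)

    def on_control_packet(self):
        # Assumption: a control packet restores any degraded, still-running component.
        if self.state != State.FAILED:
            self.state = State.NORMAL

    def processing_time(self):
        """Current processing time, or None once the component has stopped."""
        if self.state == State.FAILED:
            return None
        return self.base * SLOWDOWN[self.state]
```

For example, a component with a base processing time of 5 time units takes 10 after the first fault packet and returns to 5 once a control packet arrives.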
Degradation in turn affects the throughput, turnaround time and drop rate of packets of that component, hence of the subnet to which it belongs, and consequently of the complete network. On encountering a control packet, the component returns to normal operation. If, however, another fault packet is encountered before a control packet, the component degrades further to a state of "moderate degradation" and then to "high degradation", after which it completely ceases to operate. The activity diagram describing the behavior of a network component on encountering fault packet(s) is shown in Figure 5.5.

Figure 5.5 Activity Diagram of Network Component due to Fault(s)

• Links

Links are the interface between the different components of the network. They regulate the flow of packets. Since we need to vary the delay of the links for experimental purposes, the links must be implemented in such a way that varying the delay is practical. Hence we implement a link as a form of processor, that is, an entity that takes some finite amount of time (the link delay) to process a job and forward it. The DEVS processor component that models a link has no buffering capability. Therefore, when a job arrives while the processor is busy, the job is simply ignored. This affects the drop rate of packets.

5.4 Monitoring Agents

Monitoring agents are deployed within each of the subnets to record the individual performance metrics of components. These metrics include the throughput, the turnaround time, and the drop rate of packets. The monitoring agents report the fault proneness of network components to the reactive or anticipatory agents, which in turn take the required action according to their functionality. The activity diagram depicting the behavior of an individual monitoring agent is shown in Figure 5.6.

Figure 5.6 Activity Diagram of a Monitoring Agent

The monitoring agents are deployed at various levels in the DEVS network model.
These include (1) the component-level monitoring agents, which monitor the individual performance of routers and hosts, (2) the subnet-level monitoring agents, which monitor a subnet as a whole, and (3) the network monitoring agent, which monitors the performance of the complete network. The sequence diagram in Figure 5.7 depicts the interaction between the monitoring agents at different levels in the DEVS network model.

Figure 5.7 Sequence Diagram showing Interaction between Monitoring Agents

5.5 Reactive / Anticipatory Agents

The data received from the monitoring agents form the inputs to the reactive and anticipatory agents. The reactive agent functions on simple fuzzy rules, while the anticipatory agent functions on the basis of a Naïve Bayesian classifier, as described in the previous chapter. Figure 5.8 shows the activity diagram of the reactive agent.

Figure 5.8 Activity Diagram of a Reactive Agent

As shown in Figure 5.8, the reactive agent gets data from the monitoring agent describing the fault proneness of the various components in the network. The reactive agent then matches these data against the predefined fuzzy rules and triggers the control agent to take the corresponding action. The activity diagram of the anticipatory agent is shown in Figure 5.9. In contrast to the reactive agent, the anticipatory agent has an additional "learner" component, which builds a predictive model of fault proneness in the network based on the Naïve Bayesian classifier. It then computes the probabilities of failure of the components, as described in Chapter 4, and thereby triggers the control agent to take corrective action.

Figure 5.9 Activity Diagram of an Anticipatory Agent

The interactions between the monitoring and the management agents (reactive and anticipatory) are shown in the sequence diagram in Figure 5.10. From the sequence diagram, it can be seen that the reactive agent is supplied data only by the component-level monitoring agents.
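To illustrate how the "learner" component of the anticipatory agent could maintain and query its predictive model, the following Python sketch implements a generic categorical Naïve Bayesian classifier that is updated one observation at a time. It is not the classifier of Chapter 4: the class name `NaiveBayes`, the encoding of each observation as three discretized metric levels (throughput, turnaround time, drop rate), the labels "fault" and "ok", and the add-one smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.values = defaultdict(set)              # feature index -> values seen so far

    def update(self, features, label):
        """Add one observation, e.g. features=("low", "high", "high"), label="fault"."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[(i, label)][value] += 1
            self.values[i].add(value)

    def posterior(self, features):
        """Return {label: P(label | features)} with add-one smoothing."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            p = n / total  # prior P(label)
            for i, value in enumerate(features):
                seen = self.feature_counts[(i, label)][value]
                p *= (seen + 1) / (n + len(self.values[i]))  # smoothed P(value | label)
            scores[label] = p
        z = sum(scores.values())
        return {label: s / z for label, s in scores.items()}


# Hypothetical evidence: (throughput level, turnaround level, drop-rate level).
nb = NaiveBayes()
nb.update(("low", "high", "high"), "fault")
nb.update(("low", "high", "low"), "fault")
nb.update(("high", "low", "low"), "ok")
nb.update(("high", "low", "low"), "ok")
nb.update(("high", "high", "low"), "ok")
p = nb.posterior(("low", "high", "high"))
```

With this evidence, a component showing low throughput, high turnaround time and high drop rate is assigned a fault probability above 0.9, which is the kind of estimate on which an anticipatory agent could base its trigger to the control agent.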
Since the reactive agent works on simple fuzzy rules, it interprets the data obtained by the component-level monitoring agents to determine the components that require corrective action. The anticipatory agent, on the other hand, predicts the fault proneness of the various network components, and hence it needs a complete picture of the network. The monitoring agents at the subnet and network levels help the anticipatory agent to dynamically learn the fault-proneness behavior of the network and thereby constantly update the evidence of the Naïve Bayesian classifier.

Figure 5.10 Sequence Diagram for Interaction between Monitoring and Management Agents

5.6 Control Agent

The control agent operates on the basis of the output from the reactive and anticipatory agents. The data it obtains from the management agents consist of details pertaining to the network component that is under degradation. The control layer then triggers corrective action on those components in the form of corrective messages (Hood and Ji, 1998). When these corrective messages are encountered by network components, they regain their normal operation. The activity diagram of the control agent operation is shown in Figure 5.11.

Figure 5.11 Activity Diagram of a Control Agent

The interactions between the control agent and the management agents are shown in Figure 5.12. As described above, the control agent is responsible only for triggering corrective action on the components that are under degradation or may be prone to degradation, based on the output from the reactive or anticipatory agent, respectively.

5.7 Annealer

The function of the annealer is to facilitate dynamic updating of the parameters of the DEVS model based on different operating regimes. The annealer operates independently of the management and control agents, as described in the previous chapter. The activity diagram depicting the various states traversed is shown in Figure 5.13.
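The accept-or-reject cycle of the annealer, which the remainder of this section describes in prose, follows the general pattern of simulated annealing. The Python sketch below is a generic illustration and not the thesis code: the function names, the scalar `cost` (lower is better, standing in for whatever combination of recorded metrics is used), the cooling factor and the step count are assumptions, and the acceptance probability uses the standard Metropolis form exp(-delta/T), which the text does not spell out.

```python
import math
import random


def accept_probability(delta, temperature):
    """Metropolis criterion: always accept improvements, sometimes accept worse states."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def anneal(initial, neighbor, cost, temperature=1.0, cooling=0.95, steps=200, seed=0):
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    for _ in range(steps):
        candidate = neighbor(current, rng)    # reconfigure the network
        candidate_cost = cost(candidate)      # record metrics of the new configuration
        delta = candidate_cost - current_cost
        if rng.random() < accept_probability(delta, temperature):
            current, current_cost = candidate, candidate_cost
            # The temperature drops each time a new state is adopted.
            temperature = max(temperature * cooling, 1e-12)
    return current, current_cost


# Toy usage: the "configuration" is an integer and the cost is its squared distance from 3.
state, final_cost = anneal(10, lambda x, rng: x + rng.choice((-1, 1)), lambda x: (x - 3) ** 2)
```

Because the temperature only falls when a state is adopted, a poorer configuration becomes progressively less likely to be accepted as the run proceeds.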
The annealer starts by modifying the operating regimes of the DEVS network, based on certain parameters, to reconfigure the network. It then records the performance metrics of the reconfigured network for a finite amount of time. If the recorded metrics of the reconfigured network are better than those of the previous configuration, the new configuration with the new operating regimes is adopted.

Figure 5.12 Sequence Diagram for Management and Control Agents

If the recorded metrics are poorer, the Metropolis criterion is used to decide whether or not to adopt the new state. According to the Metropolis criterion, the new configuration (with poorer performance) is accepted with a certain probability that depends on the temperature of the network (the probability decreases as the temperature decreases). Temperature is defined as the degree to which the DEVS network model is open to accepting a change of state. The temperature drops each time a new state is adopted for the DEVS model.

Figure 5.13 Activity Diagram of the Annealer

A complete class diagram of all the components described above is shown in Figure A.1 of Appendix A. The classes Entity, Devs, Coupled, ViewableDigraph and ViewableAtomic form the basic components of the DEVS framework. Entity is the base class of objects to be put into containers. The class Devs has two main model subclasses, atomic and coupled. The class atomic realizes the atomic level of the underlying DEVS formalism. It has elements corresponding to each of the parts of this formalism. Coupled is the major class that embodies the hierarchical model composition constructs of the DEVS formalism. A coupled model is defined by specifying its component models. Components are instances of the Devs class, thus enabling hierarchical composition. Class ViewableDigraph is a derived class of Coupled that makes it possible to define a coupled model in an explicit manner.
In addition to components, it enables the specification of the coupling relation, which establishes the desired communication links among the components (internal coupling) and between them and the external world (external input and external output coupling). The processor class is a simple processor representing the storage of jobs and the passage of time for their execution. The classes switch, control, reactive, subnet1, subnet2, generator, transducer, anticipator and monitor have their respective functions as described in the sections above and in previous chapters. The Multiserver coordinator routes incoming jobs for processing and collects results for final output.

Chapter 6
Experiment Design and Simulation Results

This chapter describes the detailed experimental design of the network model built in DEVS, followed by the experimental results. We use Borland JBuilder(TM) as the Integrated Development Environment (IDE) for implementing the network model in DEVS.

6.1 Experiment Design

The DEVS-based network model comprises two subnets. Each subnet includes a router and three hosts. An experimental frame generates the packets to be processed by the network components on the basis of a specific inter-arrival time. A fault injection mechanism, which generates "fault packets" at a random rate, is also embedded in the experimental frame. A "fault packet", when encountered by a network component, induces a certain level of degradation in the throughput and latency of the component. Monitoring agents are deployed over each of the network components to record the performance metrics (throughput, latency and drop rate) throughout the simulation. The throughput is defined as the average rate of job departures from the architecture, estimated by the number of jobs processed during the observation interval divided by the length of the interval.
A job's turnaround time is the length of time between its arrival at the processor and its departure as a completed job. The drop rate of packets is defined as the percentage of packets dropped due to network faults. A sample screenshot of the DEVS environment is shown in Figure 6.1.

Figure 6.1 Simulated Model in DEVS Environment

6.2 Simulation Results

Each of the fault management techniques (reactive, alarm correlation and anticipatory) is simulated by varying the levels of the link delay and the complexity of the network. The number of replications for each fault management technique is 270. This results from the sum of the replications under each combination of configuration levels (i.e., link delay and network complexity). Each replication is run for 1000 time units. The t test is performed on the mean values obtained for throughput, turnaround time and the drop rate of packets for each of the fault management techniques, and the confidence intervals are recorded. From the confidence intervals obtained at the 95 percent level, we observe that the intervals obtained for the reactive vs. anticipatory and anticipatory vs. alarm correlation comparisons do not contain zero for any of the three parameters, and hence the difference in their mean values is statistically significant. The intervals obtained for the reactive vs. alarm correlation comparison contain zero for all three parameters. Hence the difference between their means is not statistically significant. Table 6.1 shows the results of the t-test.
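The interval check described above can be sketched in a few lines using the pooled-variance formula of Appendix B. This is a minimal Python illustration rather than the procedure actually used to produce the reported values: the function names `pooled_ci` and `significant` are hypothetical, and the default critical value of 1.96 is the large-sample approximation to the t quantile at the 95 percent level, so the proper quantile for the actual degrees of freedom should be passed in when samples are small.

```python
import math
from statistics import mean, variance


def pooled_ci(y1, y2, t_crit=1.96):
    """Two-sided confidence interval for mean(y1) - mean(y2) with pooled variance."""
    r1, r2 = len(y1), len(y2)
    # Pooled variance: weighted average of the two unbiased sample variances.
    sp2 = ((r1 - 1) * variance(y1) + (r2 - 1) * variance(y2)) / (r1 + r2 - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / r1 + 1 / r2)
    diff = mean(y1) - mean(y2)
    return diff - t_crit * se, diff + t_crit * se


def significant(ci):
    """The difference of means is significant when the interval excludes zero."""
    low, high = ci
    return not (low <= 0.0 <= high)
```

For instance, `pooled_ci([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], t_crit=2.306)` gives an interval of roughly (-3.31, 1.31), which contains zero, so `significant` returns False for it. The sample lists here are made up for illustration and are not simulation data.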
Table 6.1 Confidence Intervals for Performance Metrics

                     Reactive            Alarm Correlation    Anticipatory
Reactive             -------             (-0.001, 0.019)      (-0.025, -0.006)
Alarm Correlation    (-0.019, 0.001)     -------              (-0.035, -0.014)
Anticipatory         (0.006, 0.025)      (0.014, 0.035)       -------

6.1.1 Performance of Network Throughput

                     Reactive            Alarm Correlation    Anticipatory
Reactive             -------             (-38.69, 15.55)      (25.11, 70.19)
Alarm Correlation    (-15.55, 38.69)     -------              (33.62, 84.81)
Anticipatory         (-70.19, -25.11)    (-84.81, -33.62)     -------

6.1.2 Performance of Network Turnaround Time

                     Reactive            Alarm Correlation    Anticipatory
Reactive             -------             (-8.54, 1.96)        (5.67, 9.59)
Alarm Correlation    (-1.96, 8.54)       -------              (5.9, 15.93)
Anticipatory         (-9.59, -5.67)      (-15.93, -5.9)       -------

6.1.3 Performance with Respect to Drop Rate of Packets

We analyze the behavior of each of the performance metrics (throughput, turnaround time and drop rate of packets) with respect to the variation of link delay and complexity of the network, for each of the fault management techniques. We plot response surfaces for each of the dependent variables (throughput, turnaround time and drop rate of packets) against the independent variables (link delay and complexity). Figure 6.2 shows the responses obtained.

Figure 6.2 Response Surfaces

We observe that the throughput obtained with each of the techniques is significantly better at lower values of link delay, while throughput is less dependent on the complexity of the network. There is a considerable improvement in the turnaround time at higher levels of complexity. For the drop rate of packets, the percentage is significantly lower at lower values of link delay; there is also a significant reduction in the drop rate at higher values of complexity. As shown in Figure 6.2, the performance parameters for the alarm correlation technique show a linear dependency on the link delay.
Also, the reactive and anticipatory techniques are less sensitive to link delay up to a certain threshold. The linearity exhibited by the alarm correlation technique can be explained by the fact that the fault patterns are recorded beforehand, and hence the variation of the performance metrics is linear, whereas this is not the case for the other two techniques.

6.2.1 Sensitivity Analyses

We perform a sensitivity analysis at each level of network complexity. We fix the value of complexity and analyze the variation of each of the performance metrics with respect to the variation of link delay. The graphs obtained for the reactive, alarm correlation and anticipatory techniques are shown in Figure 6.3, Figure 6.4, and Figure 6.5, respectively.

Figure 6.3 Sensitivity Analyses for Variation of Complexity for Reactive Technique

Figure 6.4 Sensitivity Analyses for Variation of Complexity for Alarm Correlation Technique

Figure 6.5 Sensitivity Analyses for Variation of Complexity for Anticipatory Technique

From the graphs above, we interpret the effect of complexity on each of the performance metrics for the fault management techniques.

a. Throughput

It can be observed that complexity has no observable effect on the throughput in the case of the reactive technique. The throughput declines as the link delay of the network increases and remains constant as the link delay increases from level 3 to level 4. For the alarm correlation technique, the variation of throughput is almost linearly dependent on the link delay. At complexity level 2, the throughput is constant for levels 2 and 3 of link delay. In the case of the anticipatory technique, at the low complexity level, throughput is not sensitive to link delay at the initial levels, after which it varies linearly with link delay. The throughput is again independent of link delay at very high levels of link delay.
For moderate and high levels of complexity, the throughput is sensitive to link delay at low and moderate levels of link delay. At high levels of link delay, the throughput is less sensitive to an increase in link delay.

b. Turnaround Time

For the reactive technique, the turnaround time is less sensitive to link delay at very low and very high levels of link delay. This trend is observed for all levels of complexity for the reactive technique. For the alarm correlation technique, the turnaround time is highly dependent on link delay for levels 1 and 2 of complexity. At level 3 of complexity, the turnaround time is less sensitive at higher values of link delay. For the anticipatory technique, the turnaround time is less sensitive to link delay at very low and very high levels of link delay for low complexity. The turnaround time becomes more sensitive to link delay as the complexity level increases, except for level 3 and level 4 of link delay, where the turnaround time is seen to depreciate to some extent.

c. Drop Rate of Packets

The drop rate of packets follows a trend very similar to that of the turnaround time, as described in the section above.

6.2.2 Results with Adaptive Control using Annealer

The experimental design is kept the same, with the addition of the annealer component. We perform 30 replications for each of the fault management techniques with network adaptation. The link delay and complexity values are fixed to "Low" and "Level 1", respectively. We then compare the final values of throughput, turnaround time and drop rate for each replication with network adaptation enabled and disabled. Figures 6.6 and 6.7 show the graphs obtained for the reactive, alarm correlation, and anticipatory techniques.
[Figure 6.6 panels: Comparison of Network Throughput, Comparison of Network Turnaround Time, and Comparison of Drop Rate of Packets for the Reactive Technique; each panel plots the metric against replications 1 to 30, with annealer and without annealer.]

Figure 6.6 Performance Metrics with and without Annealer for Reactive Technique

Figure 6.7 Performance Metrics - Alarm Correlation and Anticipatory Technique

From Figure 6.6, we see that the reactive technique performs markedly better with network adaptation in terms of the performance metrics. In contrast, the performance metrics obtained for the alarm correlation and anticipatory techniques are not satisfactory (Figure 6.7). This can be explained by the fact that the adaptation works independently of the recorded behavior of faults in the case of the alarm correlation technique, and independently of the predictive model (the Naïve Bayesian classifier) in the case of the anticipatory technique. The lack of communication between the control agent and the annealer leads to a high number of control packets being generated, which in turn degrades network performance for the alarm correlation and anticipatory techniques. For example, the Naïve Bayesian classifier triggers the control agent based on the evidence it collects. This evidence may no longer be correct once the network is reconfigured by the annealer.

Chapter 7
Conclusions

Network fault management is a crucial area in the field of computer networks. The goal of fault management is to detect, log, notify users of, and (to the extent possible) automatically fix network problems to keep the network running effectively.
Because faults can cause downtime or unacceptable network degradation, fault management is perhaps the most widely implemented of the ISO network management elements. Our approach to anticipatory fault management provides a novel methodology for applying agent-based behavioral anticipation to effective fault management. The comparative analyses presented in the previous chapter describe the effectiveness of our technique relative to reactive techniques. It can be observed that the anticipatory technique performs significantly better than the reactive and alarm correlation techniques with respect to network throughput, turnaround time, and the drop rate of packets. The results obtained from the network adaptation methodology show that the annealer performs exceptionally well with the reactive technique as compared to the alarm correlation and anticipatory techniques.

7.1 The Limitations of Anticipatory Fault Management

In our approach, the topology of the network is assumed to be fixed and static. The Naïve Bayesian approach will cease to work if the topology changes dynamically and hence cannot be applied to ad hoc networks. Furthermore, network adaptation cannot be carried out effectively for the anticipatory technique because of the lack of communication between the control agent and the annealer. To get around this problem, an additional component must be incorporated in the architecture to act as an interface between the annealer and the control agent. Another limitation of our approach is the level of network detail being considered. For instance, we do not consider hardware details such as faults occurring in the network components due to hardware failure or physical wear and tear. One more problem with the adaptation technique is the selection of the operating regimes that are to be modified.
The operating regimes are determined by the DEVS toolkit; hence these regimes do not include possible adaptation parameters such as addition, deletion, or modification of component functionality.

7.2 Future Work

Given the limitations described in the previous section, improving the anticipatory technique for network fault management requires additional effort and better tools that capture more detail about network operation and network components. Furthermore, a framework needs to be devised that would enable anticipatory fault management in ad hoc networks, where the network topology changes dynamically. A methodology also needs to be developed for a hardware implementation of the design. This could be done by embedding the anticipatory control in an integrated circuit (IC) and embedding the fabricated IC in a real-time network. Another important issue to be addressed is the improvement of the network adaptation framework. The Naïve Bayesian classifier should be designed with additional logic to consider the decisions taken by the annealer. The same can also be achieved with an additional component that acts as an interface between the control agent and the annealer. The additional efforts described above would result in a high degree of improvement in network fault management. The methodology of agent-based behavioral anticipation for fault management in networks can be effectively deployed in the present computer industry and will contribute to reducing losses incurred due to network faults. Making the improvements to our technique mentioned above will provide a versatile framework for effective fault management in computer networks.

REFERENCES

A. Bouloutas, G. Hart, and M. Schwartz, "On the design of observers for failure detection of discrete event systems," in Network Management and Control. New York: Plenum, 1990.

A. Lazar, W. Wang, and R.
Deng, "Models and algorithms for network fault detection and identification: A review," in Proc. IEEE Int. Contr. Conf., 1992.

B. Ekdahl, E. Astor and P. Davidsson, "Towards Anticipatory Agents", Intelligent Agents: ECAI-94 Workshop on Agent Theories, Architectures and Languages, pp. 191-202, 1994.

B. Zeigler and H. Sarjoughian, "Introduction to DEVS Modeling and Simulation with JAVA(TM): Developing Component-based Simulation Models", Arizona State University, August 2003.

CAIDA. Cooperative association for internet data analysis. [Online]. Available: http://www/caida.org/Tools.

Carley K. and Svoboda D., "Modeling Organizational Adaptation as a Simulated Annealing Process", Sociological Methods & Research, Vol. 25, No. 1, August 1996.

Corn, P.A., Dube, R., McMichael, A.F., & Tsay, J.L. (1988), "An autonomous distributed expert system for switched network maintenance". In Proceedings of IEEE GLOBECOM'88 (pp. 1530-1537).

C. Hood and Chuanyi Ji, "Intelligent agents for proactive fault detection", IEEE, March 1998.

Davidsson P., "A Framework for Preventive State Anticipation", in M. Butz et al. (Eds.): Anticipatory Behavior in Adaptive Learning Systems, LNAI 2684, pp. 151-166, 2003.

Edidiong Uyai Ekaette and Behrouz Homayoun Far, "A framework for distributed fault management using intelligent software agents", CCECE 2003 - CCGEI 2003, IEEE, 2003.

F. Feather and R. Maxion, "Fault detection in an ethernet network using anomaly signature matching," in Proc. ACM SIGCOMM, vol. 23, San Francisco, CA, Sept. 1993, pp. 279-288.

Frank E. Feather. "Fault Detection in an Ethernet Network via Anomaly Detectors". PhD thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1992.

G. Jakobson and M. D. Weissman, "Alarm correlation," IEEE Network, vol. 7, pp. 52-59, Nov. 1993.

H. Wang, D. Zhang, and K. G. Shin, "Detecting syn flooding attacks," in Proc. IEEE INFOCOM, 2002.

I. Katzela and M. Schwartz, "Schemes for fault identification in communication networks", IEEE/ACM Trans. Networking, Vol. 3, pp. 753-764, Dec. 1995.

I. Rouvellou and G. Hart, "Automatic alarm correlation for fault identification," in Proc. IEEE INFOCOM, Boston, MA, Apr. 1995, pp. 553-561.

J. Agre, "A message-based fault diagnosis procedure," Proceedings of the ACM SIGCOMM Conference on Communications Architectures & Protocols, Vol. 6, Issue 3, Aug. 1986.

Joseph, C., Kindrick, J., Muralidhar, K., So, C. & Toth-Fejel, T. (1989) "MAP fault management expert system". In Meandzija, B. & Westcott, J. (Eds.) Integrated Network Management, I. North-Holland: Elsevier Science Publishers B.V.

J. F. Huard and A. A. Lazar, "Fault Isolation Based on Decision-Theoretic Troubleshooting", Technical Report 442-96-08, Center for Telecommunications Research, Columbia University, New York (1996).

J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference", Morgan Kaufmann, San Mateo, Calif., 1988.

Hood C. and Ji C., "Intelligent agents for proactive fault detection", IEEE Information Communication Conference (Infocom 97), 1997.

K. Carley and D. Svoboda, "Modeling Organizational Adaptation as a Simulated Annealing Process", Sociological Methods and Research, Vol. 25, No. 1, August 1996.

L. Lewis, "A case based reasoning approach to the management of faults in communication networks," in Proc. IEEE INFOCOM, vol. 3, San Francisco, CA, Mar. 1993, pp. 1422-1429.

L. Lewis and G. Dreo, "Extending trouble ticket systems to fault diagnosis", IEEE Network, vol. 7, pp. 44-51, Nov. 1993.

Kandel A., "Fuzzy Expert Systems." CRC Press, 1991.

Luger G. F. and Stubblefield W. A., "Artificial Intelligence: Structures and Strategies for Complex Problem Solving", The Benjamin/Cummings Publishing Company, Inc., 1989.

M. Brodie, I. Rish and S. Ma, "Intelligent probing: A cost-effective approach to fault diagnosis in computer networks", IBM Systems Journal, Vol. 41, No. 3, 2002.

M. Thottan and Chuanyi Ji, "Anomaly detection in IP Networks", IEEE Transactions on Signal Processing, vol. 51, No. 8, August 2003.

M. Thottan, "Fault detection in IP networks," Ph.D. dissertation, Rensselaer Polytech. Inst., Troy, NY, 2000. Under patent with RPI.

M. Butz, O. Sigaud and P. Gerard, "Internal Models and Anticipations in Adaptive Learning Systems", Anticipatory Behavior in Adaptive Learning Systems, LNAI 2684, pp. 86-109, 2003.

Pat Langley and Herbert A. Simon (1995) "Applications of Machine Learning". Communications of the ACM, Vol. 38, No. 11.

Prietula M., Carley K., and Gasser L., "Simulating Organizations: Computational Models of Institutions and Groups", Menlo Park, CA: AAAI Press/MIT Press, 1998.

R. Herdman, "Information security and privacy in network environments", The Office of Technology Assessment (OTA), September 15, 1994.

Roy Maxion. "Unanticipated Behavior as a Cue for System-Level Diagnosis". In 8th International Phoenix Conference on Computers and Communications, IEEE, March 1989.

Roy A. Maxion. "Anomaly Detection for Diagnosis". In Twentieth International Symposium on Fault-Tolerant Computing, IEEE, March 1990.

Rosen R., "Anticipatory Systems - Philosophical, Mathematical and Methodological Foundations". Pergamon Press, New York.

R. A. Maxion and F. E. Feather. "A Case Study of Ethernet Anomalies in a Distributed Computing Environment". IEEE Transactions on Reliability 39(4), 433-443, 1990.

T. Oates, "Fault identification in Computer Networks: A review and a New Approach", CS-TR 95-113, 1995.

T. D. Ndousse and T. Okuda, "Computational intelligence for distributed fault management in networks using fuzzy cognitive maps," in Proc. IEEE ICC, Dallas, TX, Jun. 1996, pp. 1558-1562.

Thottan M. and Ji C., "Anomaly detection in IP Networks", IEEE Transactions on Signal Processing, vol. 51, No. 8, August 2003.

Wright, J.R., Zielinski, J.E. & Horton, E.M. (1988) "Expert systems development: the ACE system". In Liebowitz, J. (Ed.)
Expert System Applications to Telecommunications. New York: John Wiley & Sons.

Yamahira, T., Kiriha, Y. & Sakata, S. (1989) "Unified fault management scheme for network troubleshooting expert system". In Meandzija, B. & Westcott, J. (Eds.) Integrated Network Management, I. North-Holland: Elsevier Science Publishers B.V.

Y. Yemini, "A Critical Survey of Network Management Protocol Standards," in Telecommunications Network Management into the 21st Century, S. Aidarous and T. Plevyak, eds., IEEE Press, Piscataway, N.J., 1994.

Zeigler B.P., "Object-Oriented Simulation with Hierarchical, Modular Models - Intelligent Agents and Endomorphic Systems", Academic Press, 1990.

Zeigler B. and Sarjoughian H., "Introduction to DEVS Modeling & Simulation with JAVA(TM): Developing Component-based Simulation Models", August 2003.

APPENDICES

Appendix A
Design Class Diagram

Figure A.1 Design Class Diagram of the DEVS Network Model

Appendix B
Calculation of Confidence Intervals for the T test

A two-sided $100(1-\alpha)\%$ C.I. (confidence interval) for the comparison of means $\theta_1 - \theta_2$ is given by:

$$\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2} \pm t_{\alpha/2,\,\nu}\ \mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2})$$

where

$$\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = S_p \sqrt{\frac{1}{R_1} + \frac{1}{R_2}},$$

$R_1, R_2$ are the numbers of replications, and

$$S_p^2 = \frac{(R_1 - 1) S_1^2 + (R_2 - 1) S_2^2}{R_1 + R_2 - 2},$$

where $S_p^2$ is an unbiased estimator of the variance $\sigma_i^2$ and $\nu = R_1 + R_2 - 2$ is the number of degrees of freedom. We perform the t test at the 95% confidence level. Tables B.1, B.2 and B.3 show the tabulated calculations for the confidence intervals.

B.1 Calculation of C.I. for Reactive vs Alarm Correlation Technique

                   S1          S2          s.e.        C.I.
Throughput         0.001675    0.002685    0.00538     (-0.001, 0.019)
Turnaround Time    11504.17    17237.6     13.84       (-38.69, 15.55)
Drop Rate          471.63      607.11      2.68        (-8.54, 1.96)

B.2 Calculation of C.I. for Reactive vs Anticipatory Technique

                   S1          S2          s.e.        C.I.
Throughput         0.001675    0.001813    0.004822    (-0.025, -0.006)
Turnaround Time    11504.17    8359.43     11.5        (25.11, 70.19)
Drop Rate          471.63      379.17      2.38        (5.67, 9.59)

B.3 Calculation of C.I. for Alarm Correlation vs Anticipatory Technique

                   S1          S2          s.e.        C.I.
Throughput         0.002685    0.001813    0.00547     (-0.035, -0.014)
Turnaround Time    8359.43     17237.6     13.06       (33.62, 84.81)
Drop Rate          379.17      607.91      2.56        (5.9, 15.93)

Appendix C
Sample Data Sets

For each experiment, we perform 30 replications with certain levels of complexity and link delay. The data sets obtained for the performance metrics are as follows:

Figure C.1 Performance Metrics for Low Link Delay and Level 1 Complexity
Figure C.2 Performance Metrics for Moderate Link Delay and Level 1 Complexity
Figure C.3 Performance Metrics for High Link Delay and Level 1 Complexity
Figure C.4 Performance Metrics for Low Link Delay and Level 2 Complexity
Figure C.5 Performance Metrics for Moderate Link Delay and Level 2 Complexity
Figure C.6 Performance Metrics for High Link Delay and Level 2 Complexity
Figure C.7 Performance Metrics for Low Link Delay and Level 3 Complexity
Figure C.8 Performance Metrics for Moderate Link Delay and Level 3 Complexity
Figure C.9 Performance Metrics for High Link Delay and Level 3 Complexity