Scalable, Self-Healing, and Real-Time Network Services for Directed Diffusion

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. This thesis does not include proprietary or classified information.

Kenan L. Casey

Certificate of Approval:
Kai H. Chang, Professor, Computer Science and Software Engineering
Alvin S. Lim (Chair), Associate Professor, Computer Science and Software Engineering
Homer Carlisle, Associate Professor, Computer Science and Software Engineering
Joe Pittman, Interim Dean, Graduate School

A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the degree of Master of Science. Auburn, Alabama, May 10, 2007.

Permission is granted to Auburn University to make copies of this thesis at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
Date of Graduation

Vita

Kenan Luke Casey, son of Jerry and Paula Casey, was born January 8, 1982, in Louisville, Kentucky. He graduated from Jeffersonville High School as salutatorian in 2000. He earned his Bachelor's degree in Computer Science and Mathematics from Freed-Hardeman University, Henderson, Tennessee, in 2004.

Thesis Abstract

Master of Science, May 10, 2007 (B.S., Freed-Hardeman University, 2004)
145 Typed Pages
Directed by Alvin S. Lim

Directed diffusion is a data-centric publish-subscribe routing protocol for sensor networks. We have designed, implemented, and evaluated three network services which build on the strengths of directed diffusion and lessen its weaknesses.
The system architecture emphasizes efficient communication, local route repair, and real-time response. We evaluate the performance of our improved diffusion in terms of routing overhead, delivery effectiveness, and deadline achievement. Our results demonstrate the benefits of the network services in all three respects. We increase the efficiency of flooding, improve packet delivery rates in the presence of node failure, and decrease the number of packets that miss their deadlines.

Acknowledgments

I would like to express my appreciation to Dr. Alvin Lim for the guidance he has provided throughout my study at Auburn. I would also like to express my gratitude to the advisory committee members, Dr. Kai Chang and Dr. Homer Carlisle. I am especially thankful to Dr. Gerry Dozier for his advice and encouragement. I would like to thank the Department of Education for its support through the GAANN Fellowship. This fellowship was instrumental in supporting my research. Several fellow students have made significant contributions to this research, including Raghu Neelisetti and Philip Sitton. I am thankful for their help and their friendship. Above all, I am grateful to my wife, Ashley, whose love and support have made this work possible. I thank her for her patience and encouragement during the long research process. Her companionship is my greatest joy.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style file aums.sty.

Table of Contents

List of Figures
1 Introduction
2 Background
   2.1 Sense-and-Respond Systems
   2.2 Directed Diffusion
      2.2.1 Directed Diffusion Architecture
      2.2.2 Strengths of Diffusion
      2.2.3 Weaknesses of Diffusion
      2.2.4 Route Repair Mechanisms
      2.2.5 Routing and Filter API
   2.3 Distributed Services
      2.3.1 Lookup Service
      2.3.2 Composition Service
      2.3.3 Adaptation Service
   2.4 ISEE
      2.4.1 Isview
      2.4.2 Ismanage
      2.4.3 Isplugin
      2.4.4 Isproxy
3 Motivations and Applications
   3.1 Motivations
      3.1.1 Efficient Flooding
      3.1.2 Route Repair
      3.1.3 Real-Time Communication
   3.2 Applications
      3.2.1 Emergency Medical Response
      3.2.2 Business Alerts
      3.2.3 Meteorological Command and Control
4 Related Work
   4.1 Efficient Flooding
      4.1.1 Heuristic-based
      4.1.2 Topology-based
   4.2 Route Repair
      4.2.1 WAR
      4.2.2 ADMR
      4.2.3 SWR
      4.2.4 SHORT
      4.2.5 RDMAR
      4.2.6 ABR
      4.2.7 TORA
      4.2.8 AODV-LRQT
      4.2.9 PATCH
   4.3 Real-Time Communication
      4.3.1 RAP
      4.3.2 SPEED
5 Network Services Architecture
   5.1 Clustering Mechanism
      5.1.1 Passive Clustering Overview
      5.1.2 Protocol Details
      5.1.3 Adaptation to Diffusion
   5.2 Repair Mechanism
      5.2.1 Break Detection
      5.2.2 Break Localization
      5.2.3 Localized Gradient Repair
   5.3 Real-Time Communication Mechanism
      5.3.1 DD-RAP
      5.3.2 SVM-t
      5.3.3 DVM-t
6 Network Services Design
   6.1 Design Principles
      6.1.1 Energy-efficient
      6.1.2 Scalable
      6.1.3 Localized
      6.1.4 Distributed
      6.1.5 Real-Time
      6.1.6 Reactive
   6.2 Design Decisions
      6.2.1 Clustering Mechanism
      6.2.2 Repair Mechanism
      6.2.3 Real-time Communication Mechanism
7 Network Services Implementation
   7.1 Clustering Mechanism
      7.1.1 Passive Clustering Attribute
      7.1.2 Passive Clustering Filters
   7.2 Repair Mechanism
      7.2.1 DD-Repair Attribute
      7.2.2 DD-Repair Filters
      7.2.3 Local Flooding
   7.3 Real-Time Communication Mechanism
      7.3.1 DD-RAP Attributes
      7.3.2 DD-RAP Filters
      7.3.3 Prioritized Queue
8 Performance Evaluation
   8.1 Clustering Mechanism
      8.1.1 Experiment Setup
      8.1.2 Flooding Efficiency
      8.1.3 Delivery Effectiveness
   8.2 Repair Mechanism
      8.2.1 Experiment Setup
      8.2.2 Repair Effectiveness
      8.2.3 Repair Latency
      8.2.4 Packet Overhead
   8.3 Real-Time Communication Mechanism
      8.3.1 Experiment Setup
      8.3.2 On-Time Delivery Ratio
9 Conclusions and Future Work
Bibliography

List of Figures

2.1 Distributed Service Framework
2.2 ISEE Architecture Overview
3.1 Global Route Repair
3.2 Ideal Local Route Repair
3.3 DART 2 System Architecture
4.1 Regional Flooding (Region Filter)
4.2 Relative Distance Micro-Discovery
4.3 RAP Communication Architecture
4.4 SPEED Architecture
5.1 Layered Architecture
5.2 Detailed Architecture
5.3 Example PC Topology
5.4 Full Gateway
5.5 Distributed Gateway
5.6 PC Transition State Diagram
5.7 Local Repair for Directed Diffusion
8.1 Number of interest messages versus number of nodes for transmission radius (r)
8.2 Number of exploratory data messages versus number of nodes for transmission radius (r)
8.3 Number of interest messages versus number of nodes (n) and transmission radius (r)
8.4 Number of exploratory data messages versus number of nodes (n) and transmission radius (r)
8.5 Average percent flooding reduction for each network size (n)
8.6 Delivery ratio versus number of nodes for transmission radius (r)
8.7 Delivery ratio for each repair/flooding algorithm
8.8 Latency of repair for each repair/flooding algorithm
8.9 Overhead (Interests + Exploratory Data) for each repair/flooding algorithm
8.10 Normalized Overhead (Overhead Packets/Diffusion Overhead Packets) for each repair/flooding algorithm
8.11 Overhead Packets/Data Packet for each repair/flooding algorithm
8.12 DD-RAP network topology
8.13 On-time delivery ratios of Flow 1 (high priority) versus data rate (r) for SVM and DVM
8.14 On-time delivery ratios of Flow 2 (low priority) versus data rate (r) for SVM and DVM
8.15 Percent improvement of on-time delivery ratio for SVM
8.16 Percent improvement of on-time delivery ratio for DVM

Chapter 1 Introduction

In 1991, Mark Weiser challenged computer scientists to consider a new model of computing, a model in which computers are interwoven into the "fabric of everyday life until they are indistinguishable from it" [1]. Weiser's vision, now called ubiquitous computing, emphasizes intuitive interaction with pervasive computer resources. Proactive computing is a branch of ubiquitous computing which focuses on autonomous and environmentally-based devices. Three fundamental goals for proactive computing have been proposed: get physical, get real, and get out [2]. The first goal is to connect computers to the physical world. Such computers are typically small sensors and actuators which collect data over a large physical environment and communicate it over a wireless network. The second goal emphasizes the need for real-time performance. Proactive computing systems should be faster than real-time so that feedback can be provided to automated control loops for future predictions. The final goal is to remove humans from the interactive loop in order to allow for faster-than-human response times. This change essentially shifts computation from human-centered to human-supervised. Sensor networks have emerged in response to these lofty goals.
Sensor networks are large-scale, wireless networks of resource-constrained sensor devices. They most clearly support the first goal of proactive computing, connecting the physical and virtual worlds through sensors. In reference to the second and third goals, sensor networks allow for rapid and automated response to environmental stimuli. They allow faster-than-human response by taking humans out of the loop.

Sensor devices, called nodes, consist of a processor, environmental sensors, a wireless communication device (usually a radio), and a power source (usually a battery). Current sensor devices are the size of a quarter, but the goal is to further reduce their size and cost. The SmartDust project [3], for example, envisions devices one cubic millimeter in size and capable of being suspended in air. A wide range of applications for sensor networks has been proposed. Examples include environmental monitoring, military surveillance, and medical monitoring.

Sensor networks are most noticeably differentiated from other types of networks by their large size, limited energy source, and dynamic nature. Sensor deployment is almost always ad hoc due to the sheer number of sensors. Potential deployment methods include being launched by artillery shelling, scattered by ground vehicles, or dropped by airplanes. The small size of sensor devices puts severe limits on the amount of available energy, so all operations must be energy-efficient. Sensor networks are characterized by robustness in handling the inevitable failures which occur in the field. The system must continue to function after devices fail, die, or are destroyed.

While a great deal of sensor network research has been conducted, the inherent characteristics of sensor networks provide ample challenges for system researchers. Three of the most compelling challenges of sensor networks are their immense scale, their resource scarcity, and their dynamic topologies.
Sensor networks aim to include tens of thousands to millions of nodes. To support networks of this size, communication mechanisms must be incredibly scalable. The hardware of sensor devices also provides significant challenges. Sensor devices have very limited processing, storage, and communication capabilities. Perhaps most critical is the limited energy budget available to sensor nodes. Since batteries may not be replaced in the field, all sensor network algorithms must conserve energy whenever possible. Consequently, networking protocols must maintain high computation and communication efficiency. Communication is particularly expensive in terms of energy (transmitting 1 bit consumes as much power as executing 800-1000 instructions [4]), so minimizing the number of transmissions (and receptions) is a paramount goal. Another challenge is the dynamic nature of sensor network topologies. Typically, sensor networks are deployed in an ad hoc fashion (e.g., by being dropped from a plane). The network must configure itself and maintain a working configuration in the presence of adverse environmental effects as well as node failure. In the face of anticipated node failure, the network should continue to function normally by taking advantage of node density.

The overall goal of our research is to mask the fundamental challenges of the sensor network from high-level applications and application developers. Ideally, the application programmer should not know about the scale, energy, or dynamics of the sensor network. The lower-layer protocols should transparently support any size network, adapt to changing energy levels, and perform dynamic reconfiguration in response to topology changes. Applications should not have to worry about such details. To address this issue, we propose three mechanisms which shield the application from the network-level challenges. First, we present an efficient flooding scheme to increase the scalability of the network.
Secondly, we address the dynamic nature of sensor networks through a route repair algorithm which reconfigures the network after node failure. Lastly, we propose a protocol for real-time communication which gives application developers control over data flow priorities so that time-critical messages can be delivered satisfactorily. Together, the three network services compose a new layer in the network protocol stack which applications can easily, even transparently, utilize to gain improved network performance. The network services reside slightly above the network layer but have access to the inner workings of the routing protocol. The low-level implementation of these services provides significant benefits to all the layers above the network level (i.e., transport and application). By taking advantage of the network services, any software running at higher layers will be capable of increased scalability, reactive self-healing, and real-time communication.

Chapter 2 gives background information about directed diffusion and the distributed services framework, technologies which underlie and precede our research. In Chapter 3, we discuss the motivations for our network services and describe relevant applications of sense-and-respond systems. We give an overview of research related to our three protocols in Chapter 4 and describe the architecture of the system in Chapter 5. In Chapter 6 we discuss the design principles that guided the development of the system. Chapters 7 and 8 explain the implementation and performance of the network services. We conclude our research and discuss future work in Chapter 9.

Chapter 2 Background

One relevant application area for our network services is sense-and-respond (S&R) problems. S&R problems include any problem in which data must be acquired, analyzed, and acted upon in a relatively short amount of time.
This chapter gives some background information on sense-and-respond systems and describes the fundamental network components underlying our network services. We first introduce a general architecture for sense-and-respond systems and identify how our network services fit into this taxonomy. We then give an introduction to the directed diffusion routing protocol and the distributed services framework built on top of diffusion. Finally, we summarize ISEE, a runtime framework for sensor network execution and monitoring which was used to evaluate our network enhancements.

2.1 Sense-and-Respond Systems

Chandy [5] presents a high-level summary of sense-and-respond (S&R) systems in the context of IT issues. On the most basic level, sense-and-respond systems simply sense the environment and respond appropriately. Rules in S&R systems can be defined using when-then semantics. The when-clause corresponds to the detection of an event, and the then-clause describes the execution of the response. The cost of an S&R system can be broken into three parts: the cost of false positives, the cost of false negatives, and the incremental costs of running the system. Chandy [5] outlines several important characteristics of S&R applications in order to categorize the application space. He presents the following S&R application properties:

- Response Type: Is the response human-centered or automated?
- Response Time: Is the response executed in minutes or sub-seconds?
- Event Rate: Are events generated a few per second or thousands per second?
- Condition Complexity: Are when-clauses based on a single event or a history of events? Are when-clauses based on a single stream or the fusion of multiple streams?
- Data Structure: Is the data structured in well-defined schema or generally unstructured?
- Query Structure: Are queries structured or unstructured?

Our network services support S&R systems of fairly high complexity according to this taxonomy.
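The when-then rule semantics described above can be sketched in code. This is a minimal illustration of the idea only; the Event and Rule types and all names are our own, not from Chandy or this thesis.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A sensed event: a named reading with a value (names illustrative).
struct Event {
    std::string type;
    double value;
};

// An S&R rule: the when-clause detects an event, the then-clause
// (here `respond`) executes the response.
struct Rule {
    std::function<bool(const Event&)> when;
    std::function<void(const Event&)> respond;
};

// Dispatch one event against a rule set; returns how many rules fired.
int dispatch(const Event& e, const std::vector<Rule>& rules) {
    int fired = 0;
    for (const auto& r : rules) {
        if (r.when(e)) {
            r.respond(e);
            ++fired;
        }
    }
    return fired;
}
```

A condition over a history of events or fused streams, as in the more complex categories above, would replace the single-event predicate with one that closes over buffered past events.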
We support automated responses with response times on the order of sub-seconds. Events may be generated at high rates and may compose several distinct streams of historical data that must be fused. The only exception to the trend of higher complexity is that our system utilizes structured data and structured queries.

2.2 Directed Diffusion

Directed diffusion [6, 7] is a sensor network routing protocol based on the publish-subscribe communication model. Its development was guided by the principle of data-centric routing. This is in contrast to traditional address-centric protocols, which route packets based on host information. Diffusion also emphasizes in-network processing and localized interactions in order to promote energy efficiency and scalability.

2.2.1 Directed Diffusion Architecture

Protocol Overview

The objective of directed diffusion is to perform multipoint-to-multipoint communication using named data. All routing is based on named data in the form of attribute-value tuples rather than on host information as in address-centric protocols (e.g., IP [8], DSR [9], or AODV [10]). Diffusion is also characterized by its emphasis on localized routing. Each node stores routing information only about its immediate (i.e., 1-hop) neighbors; no global information is required. Diffusion sets up subscriptions by flooding interest messages throughout the network. Nodes with matching data publish it by flooding exploratory data messages back through the network to the interested node. The subscribing node then reinforces its fastest neighbor by sending it a reinforcement message. Subsequent nodes recursively forward the reinforcement message to their fastest neighbor until the lowest-latency path back to the publishing node has been reinforced. After a reinforced path has been established, reinforced data messages flow from publisher to subscriber via unicast or efficient multicast. After the two-phase setup algorithm is complete, data is, in essence, "pulled"
from source to sink along the reinforced gradients.

Protocol Details

Diffusion uses four types of messages to establish paths within a network. A sink node subscribes to a data flow by flooding the network with interest messages that name the type of data the sink wants to receive. Intermediate nodes store the interest and record the neighbor from which it was sent. This saved path leading to the sink is known as a gradient. Nodes with data matching the interest will publish it by transmitting exploratory data along the gradients previously created. These publishing nodes are known as source nodes. When the exploratory data arrives at the sink, the sink will reinforce its single fastest neighbor (i.e., the neighbor that delivered the first exploratory data message) by sending it a reinforcement message. Any node receiving a reinforcement message will, in turn, reinforce its fastest upstream neighbor until a reinforced path all the way back to the source is established. Slower paths may be negatively reinforced to remove them from the gradient tables. Subsequent data emanating from the source, known as reinforced data, will be unicast or multicast over the reinforced path to the sink. This two-phase process results in the creation of a multipoint-to-multipoint distribution tree.

2.2.2 Strengths of Diffusion

Data-Centricity

Directed diffusion has generally proven to be well-suited to sensor networks. Its data-centric model is more appropriate for many sensor network applications: it performs more efficiently and provides more useful services for many types of sensor network applications (e.g., query processing). The use of named data provides an energy-efficient mechanism for routing data, which avoids the unnecessary complexity of host information. Since data is the primary concern, it makes sense to use it as the routing criterion instead of host address, a property largely unimportant in sensor networks.
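To make the data-centric idea concrete, interest matching against named data can be sketched as follows. This is a standalone simplification, not the ISI implementation: real diffusion attributes carry typed values and match operators, whereas here matching is plain string equality.

```cpp
#include <cassert>
#include <map>
#include <string>

// Named data as attribute-value tuples (simplified to string keys and
// string values for this sketch).
using AttrVec = std::map<std::string, std::string>;

// An interest matches a data message when every attribute the interest
// names appears in the data with the same value; the data may carry
// extra attributes the interest does not mention.
bool matches(const AttrVec& interest, const AttrVec& data) {
    for (const auto& kv : interest) {
        auto it = data.find(kv.first);
        if (it == data.end() || it->second != kv.second) {
            return false;
        }
    }
    return true;
}
```

Routing decisions in diffusion hinge on exactly this kind of comparison between an incoming message's attributes and stored interests, with no host addresses involved.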
Reactive Nature

The reactive nature of diffusion also supports the goal of energy efficiency. Directed diffusion belongs to a class of protocols that creates routes in response to the application's needs instead of proactively finding routes in advance. The reactive property is particularly relevant to sensor networks since they may remain inactive for long periods of time waiting for events of interest.

Localized Interactions

One of the most notable characteristics of diffusion is its fully localized nature. Nodes only require knowledge about their 1-hop neighbors to create gradients. No global information, whether end-to-end or multi-hop, is needed for routing. This means that the routing table scales with the number of neighbors as opposed to the total number of nodes in the network. Node density, not network size, is the determining factor in the gradient table size. This improves the scalability of the protocol with respect to space complexity. The localized nature of diffusion also simplifies the routing protocol since global information is not required to make routing decisions.

Aggregation

Another strength of the diffusion protocol is its support for in-network processing, or aggregation. Since each node understands the attribute-value tuples that compose a data message, any intermediate node may perform data aggregation. While this essentially pushes application-level data down to the network layer, the resulting benefits are significant. Aggregation essentially trades latency for reduced communication overhead by consolidating the data from multiple packets into one packet. It has been demonstrated that significant energy savings (up to 42% in [7]) can be achieved by the use of aggregation in diffusion.

Multipoint-to-Multipoint Links

A final strong point for diffusion is its support for various types of communication paradigms.
Unlike other network protocols which only provide unicast (1-to-1) capabilities, diffusion inherently supports multicast (1-to-N) communication. Additionally, diffusion's publish-subscribe model can create gather distribution trees (N-to-1) and multipoint-to-multipoint (N-to-M) dissemination paths. Other protocols may support these modes of communication with high-level protocols, but diffusion's relatively simple architecture inherently supports all four classes of links.

2.2.3 Weaknesses of Diffusion

Flooding

Perhaps the greatest weakness of directed diffusion is its reliance on flooding for path creation and maintenance. In the case of standard diffusion (two-phase pull), both interest and exploratory data messages are flooded throughout the network whenever gradients are established. This results in a huge amount of network traffic every time a new path is established or an old path is refreshed.

Limited Scalability

The cost of flooding rises with the size of the network, so scalability is significantly affected. Diffusion does not effectively support networks with multiple hundreds of nodes because of the huge overhead imposed by flooding across so many nodes. Although its localized nature supports scalability, diffusion's dependence on flooding severely limits the practical size of a network.

Latency of Global Repair

To deal with broken paths, diffusion periodically re-creates routes by performing the same procedure used to initially find routes: global flooding. This mechanism is described in Section 2.2.1. Since diffusion handles failure and mobility with global repair mechanisms, the costs incurred for repair are significant. To reduce the energy costs of global repair, the path maintenance mechanism is performed on a relatively infrequent schedule (e.g., every 60 seconds). In the worst case, data may be delayed an entire refresh interval before a broken path is repaired.
While this may be adequate for some applications, it may be completely unacceptable for others, e.g., time-critical and real-time systems.

Lack of Real-Time Communication Support

Many applications have time deadlines which should be recognized by the network and handled differently. Directed diffusion routes packets in a first-come, first-served fashion; no preference is given to higher-priority flows. While not an inherent weakness, diffusion's lack of support for real-time communication is a deficiency nonetheless.

2.2.4 Route Repair Mechanisms

Standard directed diffusion includes two mechanisms for path repair. Diffusion was designed to handle path failure primarily by the periodic re-creation of gradients using a global mechanism. The designers of diffusion also mention a local repair procedure, but fail to adequately deal with the route repair problem. This section summarizes the costly global repair mechanism and the primitive local repair algorithm proposed by the original designers of the diffusion protocol.

Global Gradient Repair

The standard and currently implemented method of handling broken links in diffusion is global gradient repair. In this method, the sink periodically refreshes the path to the source by flooding the network with interest messages. The source also periodically floods exploratory data back to the sink in order to reinforce the path that is currently fastest. Although this mechanism provides optimal paths, it comes at a high cost in terms of energy and latency. Global repair results in network-wide flooding of interest and exploratory data messages. To lessen this cost, global repair is performed at a fairly infrequent interval. As a result, links may remain broken for an entire gradient refresh interval. This means that a data flow may fail to report an event for 60 seconds (by default) in the worst case, since no effort is made to repair the link until the next gradient refresh.
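The worst-case figure follows directly from the refresh schedule, as this small sketch shows. The 60-second default interval is from the text; the uniform-break assumption behind the expected value is ours.

```cpp
#include <cassert>

// With global gradient repair every refresh_interval seconds, a link
// that breaks t seconds after the last refresh stays broken for
// (refresh_interval - t) seconds. Worst case: the break occurs
// immediately after a refresh (t == 0), giving a full-interval delay.
double repair_delay(double refresh_interval, double time_since_refresh) {
    return refresh_interval - time_since_refresh;
}

// If break times are uniform over the interval (our assumption, not
// stated in the thesis), the expected delay is half the interval.
double expected_repair_delay(double refresh_interval) {
    return refresh_interval / 2.0;
}
```

With the 60-second default, a break just after a refresh waits the full 60 seconds, and a uniformly random break waits 30 seconds on average; shortening the interval lowers this delay but floods the network proportionally more often.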
Local Gradient Repair

In [11], the designers of diffusion describe a local gradient repair strategy for the protocol. In this method, intermediate nodes may participate in reinforcement. When a node detects degradation in link quality, it can discover a new path to the source and negatively reinforce the degraded path. The problem with this scheme, as pointed out by the authors, is that every node downstream from the break will attempt to find a new path, resulting in a significant waste of resources. The authors suggest that the first nodes after the break "interpolate" data events so that downstream nodes continue to "perceive" a high-quality path and do not initiate unnecessary local repair.

This method is far from ideal. It can hardly be considered local repair if intermediate nodes upstream from the break forward reinforcement messages all the way back to the source. On average, the intermediate node must reinforce half the distance between source and sink. In the worst case, the break would be one hop away from the sink, and the so-called "local" repair would be only one hop different from global repair. Furthermore, without some mechanism on the part of the first node downstream from the failure, all the downstream nodes will perform local repair. The interpolation mechanism proposed to solve this problem is somewhat nebulous and largely unspecified.

2.2.5 Routing and Filter API

The Information Sciences Institute (ISI) at the University of Southern California has implemented directed diffusion and defined an API for developing applications based on the protocol [12]. Additionally, the implementation includes a filter API which allows modification and extension of the network protocol. User-level programs use the routing API to publish and subscribe to data flows in the network. System-level changes can be accomplished with the filter API. ISI diffusion supports three communication models: two-phase pull, one-phase pull, and one-phase push.
In two-phase pull, the standard model, subscribers flood interests throughout the network, and publishers flood exploratory data back to subscribers. The lowest-latency path over which exploratory data arrives at the subscriber is reinforced. The one-phase pull communication model eliminates the flooding of exploratory data by implicitly reinforcing the fastest path on which interest messages are received. In this case, exploratory data is never flooded. In both pull models, data is conceptually "pulled" from publisher to subscriber, i.e., the subscriber initiates the data flow. In contrast, the one-phase push model reverses the roles of subscriber and publisher: publishers initiate the transmission of data, and subscribers respond by sending reinforcement messages. In this model, publishers periodically push exploratory data throughout the network. If a subscriber is interested in the exploratory data, it reinforces a path back to the publisher to continue receiving data messages. Interest messages are never flooded in this communication paradigm.

Named data is supported through vectors of attribute pairs, each of which has a unique name (i.e., key) and a data value. These attribute vectors are the basis of diffusion's data-centric routing. The directed diffusion routing API provides access to the publish/subscribe network routing model and allows implementation of network modifications. The routing API allows user-level programs to make subscriptions for data and publications of data. The diffusion protocol is implemented in two programs: the diffusion core and the gradient filter. While the core handles the system's basic functionality, the bulk of the actual network protocol is handled by the gradient filter. The ISI diffusion routing API calls are described below. For a complete description, consult [12].

Routing API

- static NR* NR::createNR()
  This static function returns a pointer to an NR object, which encapsulates all network routing mechanisms and thus provides access to the directed diffusion stack. The routing and filter API consists of the methods exposed by the NR object.

- handle NR::subscribe(NRAttrVec* subAttrs, const NR::Callback* callback)
  - subAttrs: A pointer to a vector containing elements describing subscription information.
  - callback: A pointer to a callback object to be invoked when the subscription attributes are fulfilled.

- handle NR::unsubscribe(handle subscription_handle)
  - subscription_handle: A handle associated with a subscription, returned by the subscribe method.

- handle NR::publish(NRAttrVec* pubAttrs)
  - pubAttrs: A pointer to a vector containing elements describing the data to be sent.

- int NR::unpublish(handle publish_handle)
  - publish_handle: A handle associated with a publication, returned by the publish method.

- int NR::send(handle publish_handle, NRAttrVec* sendAttrs)
  - publish_handle: The handle associated with the data being published.
  - sendAttrs: A pointer to a vector of data attributes and the original elements describing the publication.

Filter API

A filter is a process that resides between the directed diffusion stack and the user-level applications. Filters are software modules that allow network developers to influence routing and data processing. Representative uses of filters include routing, in-network aggregation, caching, debugging, and monitoring [12]. Filters have greater access to low-level message contents (e.g., last hop and next hop) than diffusion applications. The filter API facilitates the development of filters. It is composed of the following methods; again, refer to [12] for a complete explanation.

- handle addFilter(NRAttrVec* filterAttrs, int16_t priority, FilterCallback* callback)
  - filterAttrs: A pointer to a vector containing the filter attributes.
  - priority: A unique filter priority between 2 and 253.
The higher the number, the higher the priority.
  - callback: A pointer to the object to be executed when an incoming packet matches the filter.

- int NR::removeFilter(handle filter_handle)
  - filter_handle: The handle of the filter returned by addFilter.

- void sendMessage(Message* msg, handle h, u_int16_t priority)
  - msg: The pointer to the message to send along to the next filter.
  - h: The handle to the filter returned by addFilter.
  - priority: An optional argument used to modify the filter matching scheme.

2.3 Distributed Services

In order to support dynamic, self-organizing sensor networks, three fundamental services for sensor networks have been proposed: a lookup service, a composition service, and an adaptation service [13]. These services shield higher-level services and applications from the difficulties inherent in the dynamic nature of sensor networks. Essentially, the three services mask the dynamic and distributed nature of sensor networks. Representative functionality provided by the distributed services includes remote service execution, dynamic task group creation, and fault tolerance. Figure 2.1 illustrates the overall system design using the distributed service framework.

Figure 2.1: Distributed Service Framework

2.3.1 Lookup Service

The lookup service stores information about services offered in the network and distributes this information to interested nodes. Applications utilize the lookup service through the lookup service API, which exposes the following methods:

- service_register(char* serviceType, char* serviceName, NRAttrVec* serviceAttrs)
  - serviceType: String specifying the generic service type.
  - serviceName: String specifying the unique service name.
  - serviceAttrs: An STL vector of attributes containing service information, e.g., location, address, interface definition, or mobile code.

- service_deregister(char* serviceType, char* serviceName)
  - serviceType: String specifying the generic service type to deregister.
  - serviceName: String specifying the unique service name to deregister.

- lookup_service(char* serviceType, char* serviceName, NRAttrVec* inputAttrs, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type to look up.
  - serviceName: String specifying the unique service name to look up.
  - inputAttrs: An STL vector of attributes containing the predicate for matching service providers, e.g., the number of service providers to return.
  - outputAttrs: An STL vector of attributes containing the results of the lookup. This may include a list of service providers along with application-specific information.

- service_exec(char* serviceType, char* serviceName, NRAttrVec* inputAttrs, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type to execute.
  - serviceName: String specifying the unique service name to execute.
  - inputAttrs: An STL vector of attributes containing the input to the remote service execution, including the name of the remote function and its input parameters.
  - outputAttrs: An STL vector of attributes containing the results of the remote service execution.

- service_call(char* serviceType, char* serviceName, NRAttrVec* inputAttrs, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type to execute.
  - serviceName: String specifying the unique service name to execute.
  - inputAttrs: An STL vector of attributes containing the input to the remote service execution, including the name of the remote function and its input parameters.
  - outputAttrs: An STL vector of attributes containing the results of the remote service execution.

Applications that provide services register them with the lookup server using the service_register() method. Correspondingly, services can be deregistered with service_deregister() when they are no longer available.
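The register/lookup semantics can be modeled conceptually with a small in-memory registry (a Python sketch of the behavior only; the real interface is the C++ API above, and all names and data here are illustrative):

```python
# Toy model of the lookup service's registry semantics (illustrative only).
registry = {}

def service_register(service_type, service_name, attrs):
    """Record a provider's attributes under its (type, name) key."""
    registry[(service_type, service_name)] = attrs

def service_deregister(service_type, service_name):
    """Remove a provider that is no longer available."""
    registry.pop((service_type, service_name), None)

def lookup_service(service_type, service_name):
    """Return the registered attributes, or None if no provider matches."""
    return registry.get((service_type, service_name))

service_register("temperature", "sensor-42", {"location": (10, 20)})
print(lookup_service("temperature", "sensor-42"))  # {'location': (10, 20)}
service_deregister("temperature", "sensor-42")
print(lookup_service("temperature", "sensor-42"))  # None
```

The real service additionally matches providers against predicate attributes and can return multiple candidates; the dictionary above only captures the exact-key case.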
The lookup_service() call is invoked by applications that want to find details about a service in the network, e.g., interface or location information. Lookup clients may also request a list of service providers from the lookup server in order to select the most appropriate provider (e.g., the nearest geographically). After finding a service, a client application invokes the service_exec() method to execute a service remotely. The service_call() method of the lookup service encapsulates lookup_service() and service_exec() in one call. The lookup service is completely described in [14].

2.3.2 Composition Service

Many sensor network applications involve groups of nodes coordinating to solve one problem. Consequently, the task of organizing nodes into appropriate clusters becomes crucial. The distributed composition service composes and maintains dynamic task groups. This simplifies the development and execution of collaborative sensor network applications. The composition service defines a connector abstraction which encapsulates the underlying link characteristics of a data flow. Extensive cluster information is maintained by the composition service in order to provide clients with an accurate picture of the organization of the network. Clusters may be formed ahead of time or dynamically, based on events detected in the network. The composition service is accessed using the simple API shown below.

- join_group(char* serviceType, int minGroupSize, int Xspan, int Yspan, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type of the task group.
  - minGroupSize: Minimum requested group size.
  - Xspan: Location bounds for the group in the horizontal direction.
  - Yspan: Location bounds for the group in the vertical direction.
  - outputAttrs: An STL vector of attributes containing the results of the group insertion request. This may include the group ID, the group leader ID, and the IDs of other members of the group.
- leave_group(char* serviceType, int groupID, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type of the task group.
  - groupID: Group ID of the task group that the node is leaving.
  - outputAttrs: An STL vector of attributes containing the results of the group deletion request. This may include the success or failure of the operation as well as application-specific information.

Nodes not currently belonging to a group are assigned to one when join_group() is called. Cluster assignment is based on (1) location, so that clusters are composed of nodes geographically near each other, and (2) group size, so that clusters have a similar number of members. The leave_group() method removes a node from a cluster. The composition service API allows applications to access the cluster creation and management services provided by the composition service. A complete specification of the composition service can be found in [15] [16].

2.3.3 Adaptation Service

In the most general sense, an adaptation service is a system that modifies its behavior based on changes to its environment [17]. The adaptation service for distributed sensor networks allows applications to adapt to changes in network and service configuration. It is organized into three phases: change awareness, consensus, and system behavior modification. During the change awareness phase, the environment is monitored for possible changes. Affected units communicate with the adaptation service in the consensus phase to determine an appropriate adaptation action. In the last phase, the adaptation service implements the behavior changes determined in the consensus stage. The adaptation service for sensor networks specifically handles changes such as the failure of nodes and services. The goal of the adaptation service is to mask failures and other network changes from higher-level services in order to improve fault tolerance and dynamic reconfiguration.
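The three phases can be sketched as a simple control loop (an illustrative Python sketch; the phase names come from the text, but the functions and data below are hypothetical):

```python
# Illustrative three-phase adaptation loop: awareness -> consensus -> modification.
def detect_changes(observed, expected):
    """Change awareness: report nodes that were expected but not observed."""
    return sorted(expected - observed)

def reach_consensus(changes):
    """Consensus: affected units agree on an action (here, a trivial fixed rule)."""
    return {node: "reroute-around" for node in changes}

def apply_actions(routes, actions):
    """Behavior modification: drop failed nodes from the routing state."""
    return {dst: [n for n in path if n not in actions]
            for dst, path in routes.items()}

expected = {"n1", "n2", "n3"}
observed = {"n1", "n3"}            # n2 has failed
routes = {"sink": ["n1", "n2", "n3"]}

failed = detect_changes(observed, expected)
actions = reach_consensus(failed)
routes = apply_actions(routes, actions)
print(routes)  # {'sink': ['n1', 'n3']}
```

In the real service, consensus is a distributed negotiation among affected units rather than the single local rule shown here.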
Further details regarding the adaptation service can be found in [17].

2.4 ISEE

Mark Ivester developed a runtime framework for the execution and monitoring of sensor network services [18]. The Interactive Sensor network Execution Environment (ISEE) allows for the creation, configuration, and monitoring of emulated and real sensor networks. We used ISEE to configure and control many of our experimental networks. ISEE is a graphical execution environment which facilitates sensor network experiment repeatability and real-time visualization. The control and access environment provides remote execution, logging, interaction, and analysis facilities independent of the implementation of the sensor network. The runtime framework allows for simple extensibility, scenario creation, and experiment repeatability. In combination with the distributed service framework, ISEE allows reactive and straightforward developer interaction with sensor network experiments. ISEE is composed of four modules: Isview, Ismanage, Isplugin, and Isproxy, as shown in Figure 2.2.

Figure 2.2: ISEE Architecture Overview

2.4.1 Isview

The Isview module handles user interaction and network visualization. It allows the user to create and configure sensor networks through a graphical interface. Services, including but not limited to those described in Section 2.3, may be started on any subset of the network. Additionally, Isview displays state information retrieved through the Ethernet back channel. Application state, debug information, and load statistics may be monitored.

2.4.2 Ismanage

The Ismanage module manages runtime plugins and communication with the sensor network. At its core, Ismanage sends commands to and receives state information from nodes in the sensor network. This module maintains consistent state information regarding links, nodes, and processes. Control information is sent to the nodes, and feedback is received from the network by Ismanage.
Another important responsibility of Ismanage is the dynamic loading and management of plugins.

2.4.3 Isplugin

To increase the utility of ISEE, custom plugins, called Isplugins, can be created to handle new sensor network services and protocols. Plugins extend the capabilities of Isview and Ismanage. In addition to supporting new communication modes, plugins can also modify the visualization properties of ISEE.

2.4.4 Isproxy

Isproxy is the module that manages remote execution of commands on sensor nodes. Fundamentally, Isproxy interfaces sensor nodes to the ISEE framework so that they can be remotely controlled through the graphical user interface. Commands issued by Ismanage are sent to Isproxy and executed on the node. Typical actions invoked by Isproxy include starting, stopping, and monitoring processes.

Chapter 3

Motivations and Applications

In this chapter we discuss the motivations for our enhancements to directed diffusion. We also present several existing and potential applications of S&R sensor networks.

3.1 Motivations

Although directed diffusion is generally well suited to sensor networks, it has several apparent weaknesses. Its reliance on flooding incurs a significant penalty on communication and energy efficiency. Furthermore, diffusion handles node failure rather poorly with its use of periodic global flooding. Finally, diffusion lacks any support for time-critical message delivery. We propose three network services to augment and extend directed diffusion to address these issues. First, we propose an efficient flooding scheme to increase the scalability of the network. Second, we address the dynamic nature of sensor networks through a route repair algorithm that reconfigures the network after node failure. Last, we present a protocol for real-time communication which gives application developers control over data flow priorities so that time-critical messages can be delivered satisfactorily.
In the subsequent sections we give specific motivations for each of these mechanisms.

3.1.1 Efficient Flooding

Flooding is a packet delivery process that delivers a packet to every connected node in the network [19]. Typically, flooding requires each node to rebroadcast every flooded packet so that every node is guaranteed to receive the message at least once. This type of flooding has been called blind flooding [20] and simple flooding [21]. Efficient flooding algorithms attempt to reduce the number of redundant packet transmissions through the use of heuristics or information about the network topology. Such techniques trade algorithm complexity for communication efficiency.

Flooding has frequently been used in both proactive and reactive routing protocols. Proactive routing protocols often rely on flooding for route advertisement. In this case, every node in the network must know the current status of a link in order to maintain correct routing tables. Efficient flooding techniques are often utilized by link-state protocols, since extensive neighbor information is gathered and propagated by the routing protocol [19]. By using this topology information, a node can forward flooded packets to the minimum set of neighbors needed to guarantee global reception. In reactive protocols, which are generally more appropriate for sensor networks, the principal reason for flooding is route discovery. Unlike proactive protocols, reactive protocols typically use blind flooding. Reactive protocols must flood initial route creation packets because no prior topology information is known. The path to an unknown host is found by flooding the network in search of the destination. Directed diffusion, as discussed in Section 2.2.3, relies strongly on flooding. Interests and exploratory data are flooded during the gradient setup phase. After routes are discovered, diffusion performs route maintenance using the same two-phase flooding mechanism.
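The redundancy of blind flooding is easy to see in a toy simulation: on a small graph, having every node rebroadcast once shows how many receptions are duplicates (everything here, including the example topology, is illustrative):

```python
from collections import deque

def blind_flood(adj, source):
    """Every node rebroadcasts once; count nodes reached vs. total receptions."""
    seen = {source}
    queue = deque([source])
    receptions = 0
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:   # one broadcast reaches all neighbors
            receptions += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen), receptions

# A moderately dense 6-node topology.
adj = {
    0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 3, 4],
    3: [0, 2, 5], 4: [1, 2, 5], 5: [3, 4],
}
reached, receptions = blind_flood(adj, 0)
print(reached, receptions)  # (6, 18): only 5 of 18 receptions are first-time deliveries
```

The total reception count equals the sum of node degrees, so the denser the network, the larger the fraction of receptions that are redundant; this is the effect efficient flooding schemes target.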
Due to its simplicity, blind flooding has often been implemented in reactive protocols [6] [9]. Although this naive approach to flooding simplifies the design, it results in inefficiency, since nodes may rebroadcast packets that all their neighbors have already received. In blind flooding, a node may receive duplicate copies of the same flooded packet from multiple neighbors. The performance of blind flooding is inversely related to the average number of neighbors (the neighbor degree) per node. As the neighbor degree increases, blind flooding results in more redundant packets at the network layer, a greater probability of collision at the MAC layer, and greater congestion of the wireless medium at the physical layer [22]. In dense networks it is not necessary for every node to forward each flooded packet. If only a subset of the nodes is chosen as relays, the flooding efficiency can be significantly improved. The underlying problem is to select a dominating set of nodes to relay flooded packets. More formally, we wish to find the minimal subset of forwarding nodes sufficient to deliver a flooded packet to every other node in the system [19]. This is typically accomplished with algorithms that use topology information or heuristics to identify nodes that should forward packets. We have implemented efficient flooding through a clustering technique that uses ongoing traffic to create a clustered structure. This structure allows nodes that may safely drop flooded packets to identify themselves, thereby reducing the number of unnecessary retransmissions.

3.1.2 Route Repair

Because of the dynamic nature of sensor networks, node and link failures are expected occurrences. When links fail, the routing protocol may attempt to recover from the breakage in one of two ways: end-to-end error recovery or local error recovery.
End-to-end repair protocols initiate the recovery process either by explicitly alerting the source node of the problem with a negative acknowledgment or by implicitly informing the source through the absence of a positive acknowledgment. In the former case, the intermediate node that detects the break sends a negative acknowledgment to the source node, reporting the link failure. In the latter case, the node that detects the break takes no action, so that the sender times out while waiting for a positive acknowledgment. Diffusion essentially uses the implicit approach in that the protocol defines no explicit error message, depending instead on periodic route re-discovery to repair broken links.

The problem with end-to-end repair is the high cost of network-wide flooding. This has serious implications for the performance of a system in terms of scalability, energy consumption, and latency. Protocols which depend on global error recovery mechanisms do not scale well with network size. Moreover, since every node must forward the flooded packet, each node consumes energy repairing a route which may be very distant. The latency of end-to-end repair mechanisms also suffers, since routes are completely rediscovered from the source to the sink. Figure 3.1 illustrates the high cost of global route repair in two-phase pull directed diffusion. Notice the global flood of interest and exploratory data messages in both directions. Also notice that the repaired path is only a few hops different from the original data path.

Figure 3.1: Global Route Repair

When a node relatively far away from the source fails, it makes little sense to involve the source (and other distant nodes) in the error recovery process. Ideally, only nodes around the link failure should be involved in the repair process.
In cases where the repaired path is only a few hops different from the original path, as in the previous figure, the localized approach greatly reduces the overhead associated with repair. Figure 3.2 shows an ideal case for local route repair in which distant nodes are not involved in the recovery process at all. Nodes in the immediate vicinity of the break participate in the repair algorithm. Note that neither the source nor the sink is involved in the recovery process.

Figure 3.2: Ideal Local Route Repair

The advantages of the local repair approach include increased scalability, increased flooding efficiency, and decreased repair latency. Since only a portion of the nodes are involved in local repair, the protocol scales more gracefully with network size and consumes less overall system energy. The localized nature of the recovery also lends itself to faster route repair, since the complete (source-to-sink) route does not have to be traversed. In their analytical comparison of local and end-to-end recovery mechanisms, Aron and Gupta [23] show that with end-to-end recovery the probability of successfully delivering a packet on the first attempt rapidly degrades with increasing network size (and path length), while the performance of local recovery improves with longer routes. Additionally, the amount of resources consumed per packet is several orders of magnitude larger for end-to-end repair than for local repair. In summary, local repair improves the scalability, efficiency, and latency of a network protocol.

3.1.3 Real-Time Communication

The general objective of sensor networks is distributed micro-sensing, i.e., to sense, monitor, and control physical environments. Examples include acoustic surveillance systems for monitoring homes, biometric sensors which detect harmful bio-agents in airports, and stress sensors which monitor the structural integrity of buildings during earthquakes.
In many applications, data messages have time constraints in the form of end-to-end deadlines specifying an upper bound on the communication delay of a packet. Application-specific deadlines, for example, allow packets associated with important events to take priority over periodic monitoring packets. A surveillance system may require the position of an intruder to be reported to the command center within 15 seconds of detection, while a bio-detection system at an airport may need to respond within 1 second. Data in monitoring systems often has an interval of validity, after which it is no longer useful. The validity interval of a home surveillance system, for example, will be much shorter than that of a temperature monitoring system, since the presence of an intruder is of great interest immediately after detection but may be useless 30 minutes later.

To support this time-critical property of physical environments, sensor network protocols must implement real-time communication mechanisms. The goal of real-time communication is to minimize the number of packets which miss their end-to-end deadlines. This is typically accomplished by prioritizing packets by their deadlines so that more urgent messages are sent first. Essentially, less urgent packets are delayed so that more urgent packets can reach their destinations on time. The challenge for real-time communication over sensor networks is their multi-hop nature. Although it is reasonable to give priority to packets with shorter deadlines, prioritization should also account for packets that are farther away from their destination. In order to meet its deadline, a packet with a relatively long deadline may need to be prioritized higher than a packet with a shorter deadline if the long-deadline packet must travel twice as many hops.
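One common way to combine the two factors is to rank packets by the velocity they must sustain to arrive on time, i.e., remaining hops divided by remaining time, so that a far-away packet with a generous deadline can outrank a nearby packet with a tight one. A toy sketch (not the specific protocol adopted in this thesis; all numbers are illustrative):

```python
def required_velocity(hops_remaining, time_remaining):
    """Hops per second the packet must sustain to meet its deadline."""
    return hops_remaining / time_remaining

# (name, hops to destination, seconds until deadline)
packets = [("near-tight", 2, 1.0), ("far-loose", 10, 2.0)]

# Serve the packet with the highest required velocity first.
ordered = sorted(packets, key=lambda p: required_velocity(p[1], p[2]),
                 reverse=True)
print([name for name, _, _ in ordered])  # ['far-loose', 'near-tight']
```

Here the "far-loose" packet needs 5 hops/s while the "near-tight" packet needs only 2 hops/s, so distance-aware prioritization reverses the ordering a deadline-only scheme would choose.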
To address this problem, we have extended an existing real-time communication protocol, which considers both distance and deadline, to directed diffusion. Our system allows application developers to attach deadline information to a data flow and have confidence that the network will maximize the number of packets which meet their deadlines.

3.2 Applications

The emergence of pervasive computing has expanded the frontier for S&R systems. Although the S&R architecture is broadly applicable to various fields, its potential utility has not been fully realized. We present a summary of the applications of previous S&R systems and propose several new areas well suited to the S&R paradigm.

3.2.1 Emergency Medical Response

One very compelling application of S&R systems is in the field of healthcare. The authors of [24] have developed a sensor-based emergency medical response system. At the lowest level, sensors worn by patients report vital signs and location information to local command centers (e.g., ambulances) via 802.15.4. From there, information is forwarded over a cellular or satellite link to a global command center where the data is managed as a web service. The goal of the system is to provide greater situational awareness about patient condition and arrival time so that doctors can make more informed decisions. Our real-time communication model is particularly relevant to this type of application. Since information about vital signs is being communicated, timely response of the system is critical to the care of patients. Given the large size of many hospitals, efficient flooding is also an important mechanism for satisfactory performance of S&R systems in healthcare.

3.2.2 Business Alerts

Many business applications for S&R systems have also been proposed [25] [26] [27]. The authors of [25] describe a unified event stream processing system to monitor, analyze, and detect critical business events.
The objective is to improve efficiency, reduce cost, and enable the enterprise to react quickly. The Event Stream Processor calculates metrics based on event messages received from a variety of sources, such as complaint databases or real-time production statistics. A domain expert creates rules based on the metrics to provide alerts for important business situations. Specific applications of this system include inventory replenishment, product behavior, and retail performance. Our networking improvements are also beneficial in the business domain. The real-time communication mechanism is well suited to the time-critical nature of many phenomena that demand immediate response.

3.2.3 Meteorological Command and Control

S&R systems have also been developed for hazardous weather detection. Our enhanced communication protocols are especially relevant to meteorological applications, where sensor failure is expected. The route repair mechanism allows the system to perform in challenging environments such as tornadoes and tsunamis. The redundant nature of the sensor network is used to overcome unavoidable node failure. Efficient flooding reduces congestion due to network flooding; this is relevant in the context of energy efficiency and network latency. Since hazardous weather detection systems will be constantly in use but infrequently activated, it is important to efficiently utilize the energy resources of the sensor devices. Efficient flooding saves energy by reducing redundant packet transmissions. This section describes research in the areas of tornado detection and tsunami detection.

Tornado Detection

NetRad [28] is a Distributed Adaptive Collaborative Sensing (DACS) system for early tornado detection. The goal of NetRad is to detect a tornado within 60 seconds of formation and track its centroid within a 60-second temporal region. NetRad is composed of a dense network of low-powered radars that collaborate to accurately predict tornado formation.
The radars report atmospheric readings to the System Operations and Control Center (SOCC), where a merged version of the data is analyzed to detect meteorological features. Radars are re-tasked every 30 seconds based on the utility of the features identified. Thus, sensing can be focused on the areas nearest to recently detected meteorological features so that a better picture of the tornado can be obtained.

The NetRad system is somewhat different from the S&R sensor network system we propose. Unlike our S&R system, NetRad uses a high-bandwidth, highly structured, wired network to connect the sensors. Our system is designed to communicate over relatively low-speed, ad-hoc, wireless networks. We also emphasize fast response times on the order of subseconds, as opposed to the 30-second turnaround time of NetRad.

Tsunami Detection

Our S&R system has specifically been applied to the tsunami detection problem. In this section we describe the current tsunami warning system used by the United States and a modified system which we have proposed for better detection of and response to this type of disaster.

Current System

The current tsunami warning system is composed of ten buoys in the Pacific and five in the Atlantic/Caribbean. The Deep-ocean Assessment and Reporting of Tsunamis (DART) project is maintained by the National Oceanic and Atmospheric Administration and serves as part of a tsunami warning system for the United States [29] [30] [31]. Figure 3.3 illustrates the architecture of the current system. The DART stations consist of two parts: an anchored seafloor bottom pressure recorder called a tsunameter and a companion moored surface buoy. The tsunameter detects subtle pressure changes which indicate tsunami waves. An acoustic modem transmits data from the tsunameter to the buoy, which then relays the information via satellite to land-based warning centers. The goal of the DART system is to provide accurate and early warning for tsunamis.
This includes avoiding false alarms and unnecessary evacuations.

Proposed System

Our proposed system is similar to the current system, but incorporates two major modifications. First, we suggest using a multi-hop sensor network connected by radio frequency (RF) links instead of a single-hop satellite communication scheme. Although this modification necessitates a greater number of devices, it also leads to significant savings in terms of latency, energy, and cost per device. Since satellite transmission delays are avoided, the latency of communication will be improved. Shorter-range radio broadcasts will result in longer battery life and greater sensor lifetime. Improved latency and energy efficiency represent significant advantages of our design. The use of a sensor network will also result in greater accuracy of detection with respect to resolution, allowing for more localized tsunami detection and prediction. Secondly, we propose the addition of a tsunami mitigation system that responds to tsunamis. We assume it is feasible to build an artificial barrier reef strong enough to withstand the force of a tsunami and reduce its strength before it hits the shoreline. The barrier may be engaged (or fired) when a tsunami event is detected and disengaged otherwise. Although such a barrier system would be expensive to construct, the cost may be justified for protecting particularly valuable areas (e.g. nuclear reactors). The proposed system is similar to the Thames Barrier, a series of movable gate structures which protect London from tidal floods [32], and the gate system currently being developed to shield the city of Venice from dangerously high tides [33] [34] [35]. Although the focus of our work is on the networking and analysis aspects of the system, we have also given some consideration to the design of the barrier system. We envision a defense network inspired by the concept of air bags for automobiles.
The basic idea is to create a series of artificial islands which impede the progress of the wave. We propose an underwater deployment of inflation devices which are connected by wire to ballasts on the seafloor. When given the command to fire, the barriers will inflate and float to the surface while remaining tethered to the ballast. We propose using a layered series of barriers that successively attenuate the wave. Our proposed barrier system is similar to the breakwater coastal defense systems described in [36] [37]. By using advanced prediction techniques, we hope to engage the series of barriers that most effectively obstructs the wave.

Figure 3.3: DART 2 System Architecture

Chapter 4 Related Work

4.1 Efficient Flooding

Due largely to its simplicity, blind flooding is prevalent among reactive protocols; however, because blind flooding often results in duplicate packet delivery, many resources can be conserved if a more intelligent approach to flooding is utilized. Efficient flooding techniques use topological information or heuristics to decrease the number of redundant packets transmitted during a flood. There are a variety of approaches to this problem, but we divide them into the two types proposed in [20]: heuristic-based and topology-based. Heuristic-based protocols use some sort of rule to determine whether a flooded packet should be forwarded. The topology-based algorithms make use of connectivity information to deduce which nodes need to act as packet forwarders and which nodes can drop flooded messages. We further divide the heuristic-based protocols into four subcategories and the topology-based protocols into three subcategories. In the following sections we examine each family of efficient flooding protocols and describe a few of their most important examples.

4.1.1 Heuristic-based

One approach to efficient flooding is to use heuristics to reduce the number of rebroadcasts.
Heuristics based on probabilities, counters, distance, and location have been proposed [22] [38]. The main advantage of the heuristic approach is its simplicity. The primary disadvantage is the challenge of appropriately setting the parameters of the heuristic.

Probabilistic

The probabilistic scheme [22] is similar to flooding, except that nodes rebroadcast flooded packets according to some pre-determined probability p. Thus, nodes will forward flooded packets with probability p and drop flooded packets with probability 1 − p. In blind flooding, p is always 100%. In sufficiently dense networks, lower forwarding probabilities may be used without adversely affecting delivery effectiveness. In sparse networks, however, a greater probability is required for every node to receive the message. The Fireworks protocol [39] slightly modifies the simple probabilistic scheme just described. When nodes receive a flooded packet, they broadcast it to all their neighbors with probability p and unicast it to c of their N neighbors (c < N) with probability 1 − p. This modification allows for greater delivery effectiveness and finer-grained control than the naive probabilistic scheme.

Counter-based

The counter-based approach [22] is another simple heuristic used to limit the forwarding of flooded packets. A counter c keeps track of the number of times a redundant broadcast message is received during some time interval. A counter threshold C is chosen as the maximum number of times a redundant message may be received before the rebroadcast is suppressed. Whenever c ≥ C, the rebroadcast is inhibited. If the threshold has not been exceeded, the packet will be forwarded. The compelling features of this approach are its simplicity and adaptability to varying network densities. In dense areas of the network, some nodes will refrain from rebroadcasting, while in sparse regions all nodes may forward the message.
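A minimal sketch of these two heuristics follows; this is our own illustration, and the function names and default parameter values are assumptions, not values taken from [22]:

```python
import random

def probabilistic_forward(p=0.6):
    """Probabilistic scheme: rebroadcast with fixed probability p.
    Blind flooding is the special case p = 1.0."""
    return random.random() < p

def counter_based_forward(duplicates_heard, threshold=3):
    """Counter-based scheme: suppress the rebroadcast once the same
    flooded packet has been overheard threshold (C) or more times
    during the assessment interval."""
    return duplicates_heard < threshold
```

In a dense neighborhood `duplicates_heard` quickly reaches the threshold and the node stays silent; in a sparse one the counter rarely fires, so the node forwards, which mirrors the adaptability noted above.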
Distance-based

The distance-based scheme uses the relative distance between nodes as the criterion for deciding whether or not to rebroadcast. If two nodes are relatively close to each other, then the coverage area of another rebroadcast will largely overlap with the original broadcast region, and few new nodes will receive the message. However, if a node receives a message from a distant node, a rebroadcast will, for the most part, cover a different region (and therefore should reach a different set of nodes). The obvious disadvantage of this scheme is the requirement of distance-estimation capabilities between nodes. Ni et al. [22] claim that this may be achieved without a Global Positioning System (GPS) by the use of signal strength.

Location-based

As an improvement on the distance-based scheme, a location-based approach has also been proposed [22]. In this scheme, exact (i.e. GPS) location is used to compute a precise calculation of the additional coverage area provided by a rebroadcast. When a node sends a flooded packet, it adds its own location to the packet header. Upon reception of a flooded packet, a node calculates the additional coverage obtained by a rebroadcast and rebroadcasts if this value exceeds a coverage threshold. Otherwise, the packet is dropped. Another location-based approach to efficient flooding is regional flooding [18]. In this scheme, the flooding of route discovery packets in directed diffusion is limited to a region encompassing both the source and sink. By adding the locations of the source and sink to flooded messages, packets can easily be dropped when they travel outside the region defined around the two nodes. For example, two circular regions with radii slightly greater than half the distance between the source and sink may be defined around the source and sink, as shown in Figure 4.1. This scheme has been implemented in previous research [18] as a filter using the directed diffusion filter API.
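The two-circle region test can be sketched as follows; this is our own illustration, not the filter code from [18], and the function name and slack factor are assumptions:

```python
import math

def in_region(node, source, sink, slack=1.1):
    """Regional flooding check: forward a flooded packet only if this
    node lies inside one of two circles centered at the source and sink,
    each with radius slightly greater than half the source-sink distance.
    slack (> 1) is an assumed tuning factor, not a value from [18]."""
    radius = slack * math.dist(source, sink) / 2
    return (math.dist(node, source) <= radius or
            math.dist(node, sink) <= radius)
```

A node receiving a flooded interest would evaluate this test against the locations carried in the packet header and silently drop the packet when it returns False.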
Figure 4.1: Regional Flooding (Region Filter)

Another location-based approach is the Geographic Adaptive Fidelity algorithm (GAF) [40]. The primary objective of GAF is to identify "equivalent" nodes so that some of them can be put in an energy-conserving sleep mode. From a routing perspective, nodes are equivalent when a constant routing fidelity can be maintained with only one representative node awake. Node equivalence is determined using a virtual grid which overlays the network. The grid is created with dimensions such that nodes in adjacent grid cells are able to communicate with each other. Since only one node per grid cell must be awake to maintain connectivity, nodes in the same cell are equivalent. Thus, all the nodes except one may enter a sleep state. Although not specifically an efficient flooding technique, GAF is designed to save energy throughout every phase of the routing protocol, not just route discovery.

4.1.2 Topology-based

In contrast to heuristic-based efficient flooding protocols, topology-based algorithms exploit topological information about the network to identify the best set of forwarding nodes. Most topology-based schemes use Hello message exchanges to collect topological information from neighboring nodes. Other algorithms construct a source tree rooted at the source node and restrict leaf nodes from forwarding flooded messages. A third topology-based approach involves grouping nodes into clusters in which only one member, the cluster head, is responsible for forwarding flooded packets to other cluster members.

Neighbor Knowledge-based

The neighbor knowledge-based schemes use 1- or 2-hop neighbor topology information to build a well-covered mesh of forwarding nodes. Perhaps the simplest neighbor knowledge method is Flooding with Self Pruning [41]. This protocol requires 1-hop neighbor information to be exchanged with periodic Hello packets. Each broadcast packet contains a list of the sending node's 1-hop neighbors in the header.
If the receiving node cannot reach any additional nodes by a rebroadcast (i.e., its 1-hop neighbors are a subset of the set of nodes listed in the received packet), it will refrain from rebroadcasting the packet. More advanced protocols utilize 2-hop information to achieve greater flooding efficiency. The Scalable Broadcast Algorithm (SBA) [42] uses 2-hop neighbor information and the identity of the previous hop to determine if any new nodes will be reached by rebroadcasting. This may easily be determined since 2-hop connectivity is known. Dominant Pruning [41], like SBA, uses 2-hop neighbor information to make forwarding decisions. Unlike SBA, however, Dominant Pruning requires forwarding nodes to explicitly choose which of their 1-hop neighbors will be forwarding nodes. The protocol uses a greedy set cover algorithm to recursively choose the 1-hop neighbors which cover the most 2-hop neighbors until all the 2-hop neighbors are reached. Multipoint Relaying (MPR) [43] is similar to Dominant Pruning in that upstream nodes explicitly notify a subset of their 1-hop neighbors to rebroadcast, but it differs in the way these forwarding nodes are selected. The only nodes allowed to rebroadcast a packet are those chosen as Multipoint Relays (MPRs). In turn, MPRs must choose a subset of their 1-hop neighbors as MPRs. The algorithm for choosing MPRs is outlined below:

1. Select every 1-hop neighbor that is the only way to reach some 2-hop neighbor.
2. Select the 1-hop neighbor which covers the most uncovered 2-hop neighbors.
3. Repeat step 2 until all 2-hop neighbors are covered.

A protocol similar to MPR is Span. Span [44] selects a set of coordinating (forwarding) nodes that covers all 2-hop neighbors but does so in a different fashion. If a node detects insufficient neighboring coordinator nodes, it will proactively declare itself to be a coordinator to alleviate the shortage.
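Returning to MPR, the three selection steps can be rendered as a small greedy sketch. This is our own illustration, not code from [43]; the function name and data layout are assumptions:

```python
def select_mprs(one_hop, two_hop_of):
    """Greedy MPR selection (illustrative sketch, not code from [43]).
    one_hop: set of this node's 1-hop neighbor IDs.
    two_hop_of: dict mapping each 1-hop neighbor to the set of strict
    2-hop neighbors it covers."""
    uncovered = set().union(*two_hop_of.values()) if two_hop_of else set()
    mprs = set()
    # Step 1: 1-hop neighbors that are the sole cover of some 2-hop node.
    for target in set(uncovered):
        coverers = [n for n in one_hop if target in two_hop_of.get(n, set())]
        if len(coverers) == 1:
            mprs.add(coverers[0])
    for m in mprs:
        uncovered -= two_hop_of.get(m, set())
    # Steps 2-3: repeatedly add the neighbor covering the most uncovered nodes.
    while uncovered and (one_hop - mprs):
        best = max(one_hop - mprs,
                   key=lambda n: len(two_hop_of.get(n, set()) & uncovered))
        if not two_hop_of.get(best, set()) & uncovered:
            break  # remaining 2-hop nodes unreachable via 1-hop neighbors
        mprs.add(best)
        uncovered -= two_hop_of[best]
    return mprs
```

For instance, if neighbor b alone covers 2-hop node y, step 1 selects b immediately, and the greedy loop only runs if 2-hop nodes remain uncovered.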
A node's aggressiveness in becoming a coordinator is directly related to the number of neighbors it connects and its current energy level. Like GAF (Section 4.1.1), Span attempts to construct a forwarding backbone so that other (redundant) nodes can enter an energy-saving sleep mode.

Source-Tree-based

The source-tree approach involves creating a source tree with the maximal number of leaf nodes [45] [46]. The Adaptive Link-State Protocol (ALP) [45] is an example of a source-tree-based protocol for flooding efficiency. Unlike traditional link-state protocols, ALP does not require the state of each link to be flooded to the entire network. It uses a tree structure to disseminate link-state information to only those links along paths used to reach destinations. Another source-tree-based protocol is Topology Broadcast Based on Reverse-Path Forwarding (TBRPF) [46]. TBRPF is a proactive, link-state routing protocol for mobile ad-hoc networks. A node rebroadcasts a flooded packet only if it is not a leaf node in the spanning tree formed by the minimum-hop paths from all nodes to the source node. Constructing and maintaining the tree requires significant overhead. Nodes update their tree status with each received packet and also periodically perform blind flooding.

Cluster-based

A third topology-based approach to efficient flooding is to divide the network into a clustered structure. A representative from each group serves as a cluster head. Nodes belonging to two or more clusters at the same time are called gateways. Other nodes are called ordinary nodes. A cluster is defined by the transmission radius of the cluster head. Efficient flooding is achieved by restricting rebroadcasting to non-ordinary nodes (cluster heads and gateways). The general concept of network clustering was first introduced by Ephremides et al. as the linked cluster algorithm (LCA) [47].
Cluster heads are usually elected using the Lowest ID algorithm (LID) or the Highest Degree algorithm (HD) [48]. In general, there are two approaches to clustering: active and passive.

Active

In active clustering, clusters are created whether or not data transmissions are occurring. Typically, non-trivial computation and communication overhead is required to identify cluster heads and gateways. Each node must broadcast its ID (or degree) for cluster head election and compare it to the IDs (or degrees) of all of its neighbors. Periodic packet exchanges are often used to maintain the clustered structure. Communication is also necessary for gateway selection. To increase flooding efficiency, the number of potential gateways must be reduced using some gateway selection mechanism. The Flooding Gateway Selection protocol (FGS) [49], for example, selects the best gateways using a greedy set cover algorithm and then explicitly notifies the forwarding gateways of their status.

Passive

Another approach to clustering is to form groups passively from ongoing data traffic. Passive clustering (PC) is a cluster formation mechanism designed to increase flooding efficiency in mobile and ad-hoc networks [50], [19]. The protocol reactively constructs and adaptively maintains a clustered architecture to increase flooding efficiency. Unlike the active clustering protocols described previously, PC achieves flooding reduction on the fly, without explicit signaling and cluster setup packets. PC piggybacks cluster status information on data packets and constructs the cluster structure as a by-product of ongoing user traffic. PC uses the piggybacked cluster information to deduce the role of a node as one of three states: cluster head, gateway, or ordinary node. Cluster heads broadcast flooded packets to their neighboring nodes. Gateway nodes forward flooded packets between cluster heads.
Ordinary nodes drop all flooded packets (their neighbors have already received the packet from a cluster head or gateway). By utilizing ongoing packets to share cluster information instead of explicit control messages, PC significantly reduces communication overhead and the latency of cluster setup [19]. We describe passive clustering more thoroughly in Section 5.1.

4.2 Route Repair

In this section we summarize several protocols proposed to handle route repair in mobile ad-hoc and sensor networks. Since our emphasis is on route repair, we describe the portion of each protocol that performs path repair and omit general routing aspects of the protocol whenever possible.

4.2.1 WAR

Aron and Gupta propose the Witness-Aided Routing protocol (WAR) [23], which is very similar to DSR but incorporates local correction mechanisms to recover from route failures. One of the primary goals of WAR is to avoid costly end-to-end error recovery through local repair mechanisms. WAR differs from DSR in two respects: unidirectional routing and error handling. First, WAR uses witness hosts to perform promiscuous route maintenance. Witness hosts are essentially routers that act on behalf of other nodes when they detect possible packet loss. For example, if host X sends to host Y and is overheard by W1 and W2, then W1 and W2 are witness hosts. If W1 and W2 do not hear Y forward the packet to host Z (the next hop), one of them will attempt to deliver the packet to host Z. Secondly, WAR uses a localized error recovery mechanism to repair broken paths without involving the source. When a link error occurs, the node upstream from the break broadcasts a copy of the original message with a recovery flag set. Like many other repair methods, WAR's route recovery message is constrained to a hop-limited region around the repair initiator. The hop limit, denoted as the Recovery Depth, is appended to the packet.
To successfully repair the link, route recovery messages must find a path to a downstream node that was on the original path. Downstream nodes can identify themselves by searching for their own IDs in the route listed in the packet header. Analytical results show that as network size and route length increase, the performance of end-to-end error recovery mechanisms degrades rapidly [23]. Hence, the local repair mechanisms of WAR support greater network scalability.

4.2.2 ADMR

The Adaptive Demand-Driven Multicast Routing (ADMR) protocol [51] is chiefly concerned with delivering packets to a multicast group in an on-demand fashion (instead of continuously maintaining the multicast group structure). ADMR routes messages through a tree structure from the source (root) to each group member (leaf or branch). When forwarding nodes or receiving members become disconnected from the multicast forwarding tree, ADMR invokes a local subtree repair algorithm to detect and repair the path. Each node maintains a disconnection timer for each group based on the inter-packet arrival time. If no packet is received within this time interval, ADMR assumes disconnection has occurred. Nodes that detect disconnection initiate local repair by sending a Repair Notification packet to nodes below them in the subtree (downstream). After sending a Repair Notification, nodes wait for a repair-delay period of time. If a Repair Notification is received within this interval, the node cancels its local repair (since it is further downstream from the break). The node whose parent has actually failed will receive no Repair Notification, so it identifies itself by this procedure and initiates local repair by flooding a hop-limited Reconnect packet in the neighborhood around itself. When nodes along the original path receive Reconnect packets, they forward them without incrementing the hop count. Thus, Reconnect packets can travel back to the source along the original path.
If the original path is found by the hop-limited flood, the source will respond with a Reconnect Reply packet, which will be unicast back to the repair node along the reverse of the path the Reconnect message took. In this way, a path from the source (root) to the destination around the broken link is reconstructed.

4.2.3 SWR

A Single path With Repair routing scheme (SWR) is proposed in [52]. This protocol is motivated by the desire to avoid source-initiated path repair. In SWR, the pivot node, located immediately upstream from the break, searches for alternate paths around the broken link. It does so by broadcasting a Help Request (HREQ) to its neighbors. Upon reception of an HREQ message, nodes use previously stored information about the topology of the network to create an alternate path around the failed node with Help Response (HREP) packets. SWR maintains topological knowledge of the network through cost information transmitted in each data packet. This procedure is repeated recursively upstream toward the source if the original pivot node is unable to find an alternate path.

4.2.4 SHORT

A framework of Self-Healing and Optimizing Routing Techniques (SHORT) is presented in [53]. Unlike other repair mechanisms, SHORT attempts to find new routes constantly, even when connectivity is present. In some sense, it attempts to heal broken links before they break by constantly searching for better paths. SHORT can be applied on top of existing mobile ad-hoc routing protocols (e.g. DSR or AODV) to increase performance by optimizing existing routes when a better local sub-path becomes available. Two algorithms are described which optimize proactive repair based on path length (Path Aware-SHORT) or energy level (Energy Aware-SHORT). The primary goal of this approach is to discover short-cut routing paths. Short-cuts result from the mobility of nodes in the ad hoc network.
Although connectivity may still be intact, the shortest path from source to destination may change after its initial discovery. SHORT continually seeks out and takes advantage of such short-cuts. This is accomplished through the use of a hop count (HC) field on each packet. Nodes maintain a hop comparison array for each data flow to identify short-cuts. The addition of SHORT improved the performance of DSR and AODV in terms of both path optimality and message delivery rate [53].

4.2.5 RDMAR

Relative Distance Micro-discovery Ad Hoc Routing (RDMAR) [54] localizes the flooding of route discovery queries by estimating the relative distance between the source and destination. This approach to limiting route discovery and repair floods is the distinctive characteristic of RDMAR. By assuming a maximum velocity and an average transmission range for all mobile nodes, RDMAR calculates the maximum number of hops separating two nodes, i.e., their relative distance. RDMAR requires each node to maintain a routing table with information about every other host in the network. This includes the following fields:

- Default Router - Neighbor indicating the next hop for this destination host.
- Relative Distance - Estimate of the relative distance (in hops) to this destination host.
- Time of Last Update - Time the last update was received from this destination host.
- Route Timeout - Time remaining before the route is considered invalid.
- Route Active - Flag indicating whether the route is currently active.

RDMAR uses the Time of Last Update to compute a time interval of uncertainty designated as tmotion. Assuming a velocity Micro Velocity and a transmission range Micro Range, the source and destination nodes can estimate their minimum and maximum radius of movement during the time period tmotion, as shown in Figure 4.2. This distance is divided by the Micro Range to compute the number of hops that should be traversed in the route discovery flood.
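A rough numeric sketch of how such a hop estimate might be computed is given below. This is our own illustration under the stated assumptions, not code from [54]; in particular the factor of two for mutual movement, the function name, and the parameter names are ours:

```python
import math

def micro_discovery_ttl(t_motion, micro_velocity, micro_range, last_distance_hops):
    """Estimate the TTL (in hops) for a hop-limited route discovery flood
    (illustrative sketch, not code from [54]).
    t_motion: seconds since the last update from the destination.
    micro_velocity: assumed maximum node velocity (m/s).
    micro_range: assumed average transmission range (m).
    last_distance_hops: last known relative distance in hops."""
    # Assumption: both endpoints may have moved up to v * t, so the
    # separation can grow by at most 2 * v * t meters, i.e. by
    # ceil(2vt / range) additional hops.
    extra_hops = math.ceil(2 * micro_velocity * t_motion / micro_range)
    return last_distance_hops + extra_hops
```

With a 100 m range, a 5 m/s maximum velocity, and a 10 s old update, a node last known to be 3 hops away would be flooded for at a TTL of 4 hops under these assumptions.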
RDMAR utilizes the same Micro-Discovery mechanism for route repair, except that it is initiated by an intermediate node instead of the source. Upon detection of a link failure, an intermediate node will flood route requests to the destination using relative distance information from its routing table to set the number of hops appropriately. In this way, route requests are restricted to the region between the intermediate node and the destination.

Figure 4.2: Relative Distance Micro-Discovery

4.2.6 ABR

Associativity-Based Routing (ABR) [55] [56] defines a routing metric called the degree of association stability, which is used to make routing decisions. Association stability reflects the connection stability of one node with respect to another node over time and space. The destination examines the association stability of potential routes and chooses the one which is most stable and contains the fewest hops. ABR is divided into three phases: route discovery, route re-construction, and route deletion. The second phase, route re-construction, deals with route repair in a manner very similar to SWR (Section 4.2.3). The node immediately upstream from a break broadcasts local (hop-limited) queries that perform partial route discovery. This process is repeated recursively upstream toward the source. The backtracking process is discontinued if the node currently performing the repair is more than half the distance (in hops) from the destination to the source. In this case, global route discovery is performed instead of localized repair.

4.2.7 TORA

TORA (Temporally-Ordered Routing Algorithm) [57] is a source-initiated routing protocol based on the concept of link reversal. Routes are created using a height metric to establish a Directed Acyclic Graph (DAG) rooted at the destination. Links are assigned a direction (either upstream or downstream) based on their relative heights. In order to create this structure, nodes need information about their 1-hop neighbors.
TORA incorporates a route maintenance mechanism that supports localized route repair. The node immediately upstream from the link failure generates a new reference level, which is propagated by neighboring nodes and causes them to adjust to the new height. When a node has no downstream links, it reverses the direction of its links. This link reversal propagates upstream until a new route to the destination can be found.

4.2.8 AODV-LRQT

Pan et al. [58] present an extension to the local repair mechanism of the Ad hoc On-Demand Distance Vector (AODV) routing protocol called AODV-LRQT. The modified protocol limits the extent to which the repair mechanism is applied along two network dimensions: depth and breadth. Both approaches limit the scope and cost of flooding. Decreasing the breadth means limiting the number of times a node can forward a repair route request. This is similar to the counter-based efficient flooding technique discussed in Section 4.1.1. To reduce the depth of repair, route repair packets are given a hop count, or time to live (TTL), which localizes the scope of potential new paths to some n-hop neighborhood around the broken link. These mechanisms are implemented using two algorithms: repair quota and adaptive TTL. Each node has a repair quota (RQ) to control the breadth of repairs. When the RQ has been met, a node will no longer forward route request packets. Depth reduction is implemented with an adaptive TTL. The algorithm assumes knowledge of the network topology and transmission range in order to find a hop count which is half the length of the longest path in the network. Simulation showed that, in AODV, constraining the breadth (repair quota) of route request floods provided a greater performance improvement than constraining the depth (adaptive TTL) [58].

4.2.9 PATCH

Proximity Approach To Connection Healing (PATCH) [59] is another local recovery mechanism proposed for DSR.
It aims to reduce control overhead and achieve fast, localized recovery. When an intermediate node detects a broken link, it floods a local recovery request in the two-hop region around itself, looking for downstream nodes on the path (i.e. those closer to the destination). It does so by including in the packet header a list of all the nodes between the intermediate node and the destination. If one of the nodes included in the header receives the local recovery request, it sends a local recovery reply to the source so that the new route will be used in future communication. If no route is found within some recovery interval, the source will initiate the end-to-end error recovery process.

4.3 Real-Time Communication

4.3.1 RAP

RAP is a real-time communication architecture for large-scale wireless sensor networks proposed in [60]. The goal of RAP is to provide a scalable and lightweight communication service that maximizes the number of packets meeting end-to-end deadlines. The network stack for the RAP protocol suite is shown in Figure 4.3.

Figure 4.3: RAP Communication Architecture

On the top layer, a query-event service allows application developers to easily monitor events and submit queries to the sensor network. Queries are registered beforehand and triggered when an event occurs. When an event that matches the attributes of interest occurs in the geographic area of interest, a message stamped with the timing constraint is sent to the base station at the registered location. Queries are written and registered using the following API:

- query(attributes, area, timing constraints, base station location)
  - attributes - Attributes of interest.
  - area - Geographic area of interest.
  - timing constraints - Timing constraints for the event, i.e., the end-to-end deadline.
  - base station location - Location of the base station which will receive event data.
- register event(event, area, query)
  - event - Name of the event.
  - area - Geographic area of interest.
  - query - Query associated with the event.

The Location-Addressed Protocol is a transport layer similar to UDP except that it uses location information instead of IP addresses to identify hosts. Routing is handled by Geographic Forwarding (GF), a simple but robust network protocol which forwards packets based on location information. GF greedily forwards packets to the neighbor closest to the packet's destination. GPSR is used to route packets around the perimeter of a void region. The heart of RAP is Velocity Monotonic Scheduling (VMS), the layer that provides support for real-time communication. VMS is the packet scheduling policy that determines the order in which incoming packets are forwarded. Typically, ad hoc networks forward packets in FCFS order, but this policy performs poorly in networks where data flows have different end-to-end deadlines. In contrast, RAP prioritizes packets based on their "local urgency." VMS considers both the temporal deadline and the geographic distance when scheduling packets. Thus, VMS is both deadline-aware and distance-aware: packets with shorter deadlines and packets with longer distances to the destination receive higher priorities. VMS defines the velocity of a packet as the quotient of the distance to the destination and the time deadline. By assigning priorities based on velocity, VMS is able to accurately quantify the "urgency" of a packet and thereby meet more deadlines. Two priority assignment policies are defined in VMS: static velocity monotonic (SVM) and dynamic velocity monotonic (DVM). SVM calculates a fixed velocity once at the source before the packet is transmitted. SVM computes the velocity using Equation 4.1, where p_src = (x_src, y_src) and p_dst = (x_dst, y_dst) are the locations of the source and destination respectively, ||·|| is the distance between two points, and T_deadline is the end-to-end deadline.
V = ||p_src − p_dst|| / T_deadline    (4.1)

DVM re-computes the velocity at each intermediate hop using Equation 4.2, where T_i represents the time elapsed for the packet to reach intermediate hop i. Initially, T_i = T_0 = 0 and p_i = p_src at the source node. By updating the priority at each node, a packet that is progressing more slowly than its requested velocity may dynamically increase its priority. Likewise, a packet traveling more quickly than its requested velocity may be slowed down to give way to more urgent packets.

V = ||p_i − p_dst|| / (T_deadline − T_i)    (4.2)

To implement VMS, the network must use a prioritized queue. RAP prioritizes at two levels to maximize performance: prioritization based on velocity is performed at the network layer and at the MAC layer. The network layer places packets in a queue ordered by velocity (higher velocity first). Additionally, RAP extends the IEEE 802.11 MAC protocol to adapt the wait time (DIFS) and backoff window (CW) based on the priority of the packet. RAP has been shown to significantly reduce the deadline miss ratio when compared to traditional FCFS mechanisms. When compared to DSR over standard IEEE 802.11, RAP reduced the deadline miss ratio from 90.0% to 17.9% [60]. Despite its simplicity, RAP significantly increases real-time performance.

4.3.2 SPEED

SPEED [61] is another real-time protocol for sensor networks. SPEED employs feedback control and stateless algorithms to support soft real-time communication. The protocol's design also emphasizes load balancing, localized behavior, and minimized dependence on the MAC layer. Like RAP, SPEED uses a location-based routing protocol to forward packets. The organization of the SPEED protocol suite is illustrated in Figure 4.4. At the top layer, SPEED provides an API that supports three types of communication modes: unicast, area-multicast, and area-anycast.
While unicast follows the familiar one-to-one communication model, multicast and anycast are somewhat unconventional modes based on geographic constraints. Area-multicast delivers a packet to each node in a circular region defined by a center position and radius. Area-anycast is designed for applications in which it is sufficient for one node in some area to respond for that region.

Figure 4.4: SPEED Architecture

Like other geographic routing protocols, every node in SPEED participates in a periodic beacon exchange with its neighbors to share location information. Two on-demand beacons, a delay estimate beacon and a backpressure beacon, are also employed by the protocol. By timestamping each data packet and ACK, the single-hop delay to each neighbor can be calculated. Instead of using queue size to gauge congestion, SPEED uses this single-hop delay as a metric to approximate load in the network. Routing in SPEED is handled by Stateless Non-deterministic Geographic Forwarding (SNGF), a modified version of simple Geographic Forwarding (GF). SNGF defines the forwarding candidate set, FS_i, of node i as the set of neighbors of node i that are closer to the destination. The relay speed to node j, Speed_i^j, is computed by dividing the gain in distance toward the destination achieved by forwarding to node j by the estimated time delay to node j. Formally,

Speed_i^j = (||p_dest - p_i|| - ||p_dest - p_j||) / HopDelay_i^j (4.3)

where p_dest is the location of the destination, p_i is the location of node i, ||·|| is the distance between two points, and HopDelay_i^j is the estimated delay from node i to node j. Packets are forwarded to nodes in FS_i based on their relay speed. The node with the maximum relay speed is chosen as the next hop if its relay speed is greater than S_setpoint, a system parameter. Otherwise, the packet is forwarded to the neighbor with the highest relay speed only probabilistically. If the packet is not forwarded, backpressure rerouting is initiated.
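The relay-speed selection of Equation 4.3 can be sketched in a few lines. This is a minimal illustration, not SPEED's actual implementation; the function names, tuple-based positions, and dictionary-based neighbor table are assumptions made for the example.

```python
import math

def dist(p, q):
    # Euclidean distance between two (x, y) positions
    return math.hypot(p[0] - q[0], p[1] - q[1])

def forwarding_set(p_i, neighbors, p_dest):
    # FS_i: neighbors of node i that are strictly closer to the destination
    return {j: p_j for j, p_j in neighbors.items()
            if dist(p_j, p_dest) < dist(p_i, p_dest)}

def relay_speed(p_i, p_j, p_dest, hop_delay_ij):
    # Equation 4.3: distance gained toward the destination by forwarding
    # to neighbor j, per unit of estimated single-hop delay to j
    return (dist(p_dest, p_i) - dist(p_dest, p_j)) / hop_delay_ij
```

A node would compute `relay_speed` for each member of its forwarding set and prefer the neighbor with the largest value, subject to the S_setpoint threshold described above.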
This algorithm works with the neighborhood feedback loop (NFL) to adapt the network layer to congestion. When congestion is detected by the NFL, backpressure beacons are sent upstream to find routes around the congested area. SPEED uses this same mechanism to discover routes around network voids. Although SPEED is somewhat more complex than RAP, it provides several advanced real-time and congestion-avoidance features not offered by RAP. In some sense, SPEED is a heavyweight approach to real-time communication while RAP is a more lightweight protocol.

Chapter 5
Network Services Architecture

Directed diffusion serves as the baseline network protocol for our S&R system. We have developed three improvements for diffusion to lessen its weaknesses and enhance its strengths. We first describe in detail the passive clustering algorithm for efficient flooding and its adaptation to diffusion. Next, we explain a route repair mechanism for directed diffusion which emphasizes localization of repair. Thirdly, we outline the architecture of a real-time communication protocol for diffusion which prioritizes data flows. The specific purpose of the network services is to improve the efficiency, robustness, and timeliness of delivery for directed diffusion. More generally, the purpose of the enhanced communication services is to enable developers to easily utilize the power of the distributed sensor environment without its inherent complexities. The improved network communication mechanisms allow the system to function in challenging environments which may previously have hindered functionality. Passive clustering reduces congestion resulting from network flooding. Flooding reduction is especially relevant in the context of energy efficiency and network latency. The route repair mechanism is crucial for reliable operation of the system in the face of link failures. It provides an efficient method of route healing to cope with node failure.
Also critical to the performance of the system is timely response to detected events. To this end, our real-time communication mechanism performs packet prioritization and thereby reduces the number of packets that miss deadlines. The real-time communication protocol is based on RAP [60], but has been adapted to the two-phase pull model of directed diffusion. It has also been extended to support location-unaware networks. The distributed services described in Section 2.3, the lookup service, composition service, and adaptation service, may also be used in the S&R architecture for greater system usability and reliability. Figure 5.1 shows a layered overview of the S&R architecture.

Figure 5.1: Layered Architecture

Note that the distributed services reside above the network services proposed in this thesis. The distributed services can take advantage of the lower-level services provided by the network services (i.e., efficient flooding, route repair, and real-time communication). A more detailed illustration of the architecture of the system is shown in Figure 5.2. This diagram shows the interactions among the distributed services and among the network services. Also illustrated is the abstraction of the underlying routing protocol. Although applications and distributed services see diffusion as the routing entity, in actuality the diffusion core, gradient, and all three network services are working to deliver packets. Thus, the network services transparently benefit applications.

Figure 5.2: Detailed Architecture

5.1 Clustering Mechanism

As a result of the high cost of flooding and diffusion's strong reliance on it, we chose to implement a clustering mechanism for efficient flooding. We selected passive clustering (PC) for this purpose because its design objectives are very compatible with diffusion. In Section 6.2.1, we discuss the specific reasons for this choice.
In this section, we give an overview of passive clustering and describe the detailed workings of the protocol. Finally, we describe the minor modifications necessary to adapt passive clustering to diffusion.

5.1.1 Passive Clustering Overview

The distinctive characteristic of passive clustering is its use of on-going data traffic to initiate cluster formation and communicate cluster-related information among the nodes. Using promiscuous packet reception, nodes gather cluster status information about all their 1-hop neighbors and adjust their own cluster state accordingly. Passive clustering creates clusters by assigning one of three states to a node. A node may be a cluster head (CH), a gateway (GW), or an ordinary node (OR). CHs serve as leader nodes for their clusters and forward flooded packets to each member of the cluster. GWs connect two or more CHs together, thus serving as cluster relays. ORs receive flooded packets from CHs but do not forward them to their neighbors. Passive clustering results in a clustered structure similar to the topology shown in Figure 5.3. The circles represent cluster boundaries. Corresponding to the three node states, PC defines three fundamental rules for operation: first cluster head declaration wins, the gateway selection heuristic, and ordinary nodes drop flooded packets.

Figure 5.3: Example PC Topology

First Declaration Wins

Perhaps the most distinctive PC rule is its cluster head election rule. PC selects cluster heads by allowing the first node that declares itself CH to become CH. This is known as the first declaration wins rule. If a node has not heard from another CH, it claims itself to be CH and rules the rest of the nodes in its radio range. The first declaration wins rule provides several advantageous properties. Unlike many conventional clustering schemes, PC requires no waiting period or neighbor-checking requirement to elect a CH. A node becomes CH as soon as it declares itself to be CH.
The first declaration wins approach is also less likely to result in chain re-clustering [50]. In traditional clustering protocols, when two cluster heads move within transmission range of each other, one of them must defer to the other. This can trigger cluster head changes that propagate throughout the network [62]. The first declaration wins rule reduces this effect. Thirdly, PC creates a clustering that more closely resembles the data flow in the network. By creating cluster heads based on traffic, PC achieves a better correlation between traffic flow and the resulting clustered topology.

Gateway Selection Heuristic

If too many nodes become gateways, the flooding efficiency decreases since such a large number of nodes forward flooded data. If too few nodes become gateways, network connectivity may be adversely affected. The goal is to choose a sufficient number of gateways to preserve connectivity, but very few more. To make this trade-off dynamically, PC defines a gateway selection heuristic. The gateway selection heuristic limits the number of nodes that become gateways without breaking the passive nature of PC. A node becomes a gateway according to the number of cluster heads and gateways it has overheard. Whenever a non-cluster head node hears a packet from a cluster head or gateway, the node becomes a gateway if Equation 5.1 is true. Otherwise, the node will become an ordinary node.

α · num(GW) + β > num(CH) (5.1)

Note that in Equation 5.1, num(GW) is the number of neighbors known to be gateways, num(CH) is the number of neighbors known to be cluster heads, and α and β are tunable parameters (α, β ≥ 0). The values of α and β should be chosen based on factors such as channel quality, noise level, and traffic patterns [19]. They may be locally adjusted to provide better adaptability and flexibility. This gateway selection procedure is fully distributed and requires only local information. It relies on overheard packets instead of active packet exchanges (e.g.,
cluster head-list exchanges). The disadvantage of this approach is that network connectivity may be affected by poorly chosen parameter values. If the parameters are too aggressive in reducing the number of gateways, the topology may be partitioned.

Ordinary Nodes Drop Flooded Packets

The behavior of ordinary nodes lies at the heart of flooding reduction. A node that is neither a cluster head nor a gateway becomes an ordinary node. When a node identifies itself as an ordinary node, it no longer forwards flooded packets. All flooded packets received by an ordinary node can safely be dropped because neighboring nodes will have already received the packet (either from a CH or a GW). Hence, flooding efficiency depends on the number of ordinary nodes that can be found. In a sufficiently dense network, a relatively large number of ordinary nodes should be found.

5.1.2 Protocol Details

PC defines seven clustering states for nodes: two internal states and five external states. Internal states are entered when a packet is received and serve as a tentative role for the node while the packet is being processed. The two internal states are Gateway-Ready (GW READY) and Cluster Head-Ready (CH READY). Nodes enter external states when a packet is sent. These are externally visible states which are communicated to neighboring nodes and used in making adjustments of node state. The external states are Initial (IN), Cluster Head (CH), Full Gateway (FULL GW), Ordinary Node (OR), and Distributed Gateway (DIST GW).

Initial

On startup, nodes enter the Initial state. A node in Initial state does not belong to any cluster. Nodes move from the Initial state to one of the two internal states when a packet is received. PC maintains soft-state clusters using an implicit timeout scheme. If a node does not receive any packets within the time interval t_clustertimeout, it reverts back to Initial state. Upon future reception of packets, the node will restart the clustering algorithm.
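The gateway selection heuristic, which the state descriptions below repeatedly invoke, reduces to a one-line predicate. This is a minimal sketch in Python, mirroring Equation 5.1 exactly as stated in the text; the function name and argument names are illustrative.

```python
def becomes_gateway(num_gw, num_ch, alpha, beta):
    # Equation 5.1, as stated in the text: a non-cluster-head node that
    # overhears a packet from a CH or GW becomes a gateway when the
    # inequality holds; otherwise it becomes an ordinary node.
    # num_gw and num_ch count 1-hop neighbors known to be gateways and
    # cluster heads; alpha and beta are tunable parameters (>= 0).
    return alpha * num_gw + beta > num_ch
```

Because the decision uses only locally overheard state, each node can evaluate it independently on every received packet, preserving the passive nature of PC.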
Cluster Head-Ready

Cluster Head-Ready is the internal state of potential cluster heads. A node in Initial state changes its state to CH READY only when it has received a packet from a non-CH node. Simultaneous cluster head declarations can sometimes occur due to topology changes and packet delay. To resolve these conflicts, PC uses a Lowest ID (LID) algorithm to select the node with the lowest ID to serve as cluster head. If a cluster head receives a packet from another cluster head with a lower ID, it gives up its role and enters the Gateway-Ready state. The node which loses the LID competition sends a CH Give-Up message to its neighbors informing them of the change in status.

Gateway-Ready

The Gateway-Ready state is the internal state of potential gateways (both full and distributed). A node enters GW READY state when it receives a packet from a CH. A GW READY node will become a gateway or an ordinary node depending on the gateway selection heuristic. A node will change from Gateway-Ready to FULL GW or DIST GW if an insufficient number of its neighbors are already gateways.

Cluster Head

The Cluster Head state is the external state of cluster heads. A node enters the CH state from the CH READY state if it wins the Lowest ID competition or if it has not overheard any other cluster heads.

Full Gateway

Nodes in the Full Gateway state directly connect two cluster heads. A full gateway is thus a member of two clusters. Full gateways announce the IDs of the two CHs that they connect. A node that is reachable from two CHs may declare its role as FULL GW only if it has not heard from another FULL GW node announcing the same pair of IDs. In this way, gateways also follow the first declaration wins rule. If two nodes concurrently declare themselves as FULL GW for the same pair of CHs, the node with the lowest ID will win and the loser will become an ordinary node. Full gateways are in the external state FULL GW. Figure 5.4 illustrates a full gateway.
Figure 5.4: Full Gateway

Distributed Gateway

A node that is in the Distributed Gateway state connects two clusters, but is not directly connected to one of the cluster heads. A distributed gateway is composed of two nodes working together to serve as a gateway between two clusters. This allows cluster heads that are two hops apart to be connected. Figure 5.5 illustrates a distributed gateway. Notice both the full and distributed gateways in the example topology in Figure 5.3.

Figure 5.5: Distributed Gateway

Ordinary Node

Ordinary nodes are nodes that do not forward packets. A node enters the OR state from the GW READY state based on the gateway selection heuristic. A node changes from GW READY to OR if enough of its neighbors are gateways as determined by the gateway selection heuristic. When a node can hear from more than two CHs and every pair of announced CHs is connected by a FULL GW, it will become an OR.

Incoming Packet Processing

The transition state diagram shown in Figure 5.6 summarizes the passive clustering algorithm. When a packet is received at a node, the state of the node will be changed to one of the two internal states: CH READY or GW READY. An Initial node will change its state to CH READY if the packet is from a non-CH (1). If the packet is from a gateway or an initial node, the node will enter the GW READY state (5). CHs revert to CH READY (3), and GWs (9) and ORs (8) revert to GW READY upon each incoming packet. The receiving node always updates its neighbor lists based on the state of the sending node, which is contained in the message. By stepping back to the GW READY state (8, 9, and 11), nodes can adjust their state dynamically. For example, if the number of gateways has changed, an ordinary node may promote itself to gateway. When the soft state of PC expires, nodes in each state revert back to Initial (12, 13, 14, and 15). As a result, new clusters may be composed in response to subsequent packet flooding.
This potential for re-structuring is beneficial for sensor networks since cluster head responsibilities will be handled by different nodes throughout the life of the system, thus distributing energy consumption.

Figure 5.6: PC Transition State Diagram

Outgoing Packet Processing

When a packet is ready to be sent, the node changes its PC state from an internal state to one of the five external states. If it has not heard from any other CHs, a candidate CH (CH READY) will change its state to CH (2) when it has packets to send. Otherwise it will change to GW READY. Nodes in the GW READY state will become FULL GW, DIST GW, or OR. If the number of known CHs is greater than one and there is no gateway connecting any two of the CHs, the node becomes a FULL GW (7). If a DIST GW has announced another cluster not known by the current node and no FULL GW connects them, then the node becomes a DIST GW (10). Otherwise, it becomes OR (6). If the number of known CHs equals one, then the node becomes either a DIST GW or an OR. If a node hears a DIST GW announce a CH other than its CH, or if there is no DIST GW in the cluster, it will become a DIST GW (10). Otherwise, the node will become an ordinary node (6).

5.1.3 Adaptation to Diffusion

PC state information must be appended to all flooded packets. To apply PC on a diffusion network, the state information is added to interests and exploratory data. Recall that diffusion creates gradients and periodically refreshes them. After this initial setup phase, flooding is no longer necessary since data flows over reinforced paths. Since PC piggybacks cluster status information only on flooded packets, PC creates the clustered structure during the initial setup phase of the gradient cycle. For the remainder of the cycle, no PC state information is transmitted, so PC is essentially inactive. PC is only active during periodic gradient refresh, when diffusion floods the network with interests and exploratory data.
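The piggybacking rule above can be sketched in a few lines. This is an illustrative sketch only: the message-type names and dictionary-based packet layout are assumptions for the example, not the actual diffusion packet format.

```python
# Diffusion message types that are flooded and therefore carry PC state.
# Type names and the dict-based packet layout are illustrative.
FLOODED_TYPES = {"INTEREST", "EXPLORATORY_DATA"}

def attach_pc_state(packet, pc_state):
    # PC piggybacks cluster state only on flooded packets; data on
    # reinforced paths carries no PC fields, so PC is effectively
    # inactive between gradient refreshes.
    if packet["type"] in FLOODED_TYPES:
        packet["pc_state"] = pc_state
    return packet
```

This also makes the activity pattern of PC explicit: cluster state circulates only while interests and exploratory data are being flooded.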
Since PC affects the route setup, it is possible for suboptimal routes to be discovered. The structured topology created by PC may exclude the optimal route between source and sink. The end-to-end route from source to sink will only include nodes identified as cluster heads and gateways; no ordinary nodes will be on this path. This potential for suboptimal routes is one of the inherent trade-offs of PC as compared to global flooding. PC will improve flooding efficiency but may result in longer routes.

5.2 Repair Mechanism

To deal with the shortcomings in diffusion's ability to adapt to failure and mobility, we have developed a mechanism to efficiently handle route repair. We call our local repair protocol for directed diffusion DD-Repair. Our solution emphasizes truly localized repair in order to reduce latency and energy expenditure. Its basic structure is similar to the local repair algorithm used in ADMR [51], but its methods noticeably differ from ADMR since it is tailored to directed diffusion. We divide the local repair problem into three phases: break detection, break localization, and localized gradient repair. We describe how DD-Repair handles each of these phases in this section.

5.2.1 Break Detection

The first step in adapting to node failure or mobility is detecting the link breakage. This may be accomplished in a variety of ways. We assume that appropriate algorithms may be used to reliably detect a link breakage. Our focus is on handling the break after it is detected, not adaptively detecting path breakage. For our experiments, we used a fixed event rate known a priori to detect breaks. We identified a link as broken when no data was received from a flow after an entire event interval had elapsed.

5.2.2 Break Localization

Once a break has been detected, the break localization phase begins. The goal of this phase is to identify the node immediately downstream from the broken path.
This node will initiate the localized gradient repair algorithm described in the next section. All intermediate nodes that detect a break will send a repair notification message to their 1-hop downstream neighbors along the gradients for the missing data. Every intermediate node downstream from the break should send a repair notification, and every intermediate node except the one nearest to the break should also receive a repair notification. If a node does not receive a repair notification within t_repair seconds after sending one, it must be the nearest node, so it initiates the localized gradient repair.

5.2.3 Localized Gradient Repair

The heart of DD-Repair is the localized gradient repair phase. The goal of this phase is to find and create new gradients in the area near the break by using the same basic mechanisms as global gradient repair. We first describe several potential mechanisms for restricting flooding to a localized region and then explain the inner workings of the repair process itself. Figure 5.7 illustrates the break localization and localized gradient repair phases of the algorithm.

Figure 5.7: Local Repair for Directed Diffusion

Local Flooding

In order to restrict the flooding required to find new paths, we limit the packet forwarding in one of several ways. Possible methods include simple hop-limited flooding or limiting the reconnect packets to the 1-hop neighbors of the failed node. A more complex approach is to make use of the clusters created by passive clustering to restrict flooding. The most common approach to localized flooding among repair protocols is a simple hop-limited flood. In this approach, a hop count (or time-to-live) field is incremented on each hop. When the hop count exceeds a threshold value, the flooded packets are dropped. Another straightforward approach is to limit the flooding to 1-hop neighbors of the failed node.
This can be easily implemented by including the ID of the failed node on the repair packets and restricting flooding to nodes that have overheard packets from that failed node. A third and more sophisticated strategy is to limit flooding to nodes in the one or two clusters around the failed node. The trade-offs associated with each of these approaches are discussed in Section 6.2.2.

Reconnect Interests

The first step in localized gradient repair is the local flooding of reconnect interest messages. These interest messages for the broken data flow are essentially searching for nodes upstream from the break which are still receiving data. Reconnect interests are only transmitted in a limited neighborhood (as defined by the localized flooding algorithm) around the node nearest to the break (identified during the localization phase). This originator node acts like the sink in global gradient repair. For this reason, we label it the proxy sink in local gradient repair. Nodes outside the area determined by the local flooding algorithm will drop reconnect interests, enforcing the region boundaries. Reconnect interests are flooded in the region near the break in search of upstream nodes still connected to the data flow.

Reconnect Exploratory Data

In response to reconnect interests, upstream nodes on the data path that are still receiving data transmit reconnect exploratory data messages to neighbors that sent reconnect interests. Reconnect exploratory data is exploratory data that is sent back to the proxy sink. In order to perform the most localized repair, the node directly upstream from the break should be found. This node will act like the source in global gradient repair, so we call it the proxy source in local gradient repair. To achieve this behavior, reconnect exploratory data messages are only sent from nodes that have not overheard reconnect exploratory data. Thus, the first node to send reconnect exploratory data will become the proxy source.
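The proxy-source election rule above amounts to a three-part predicate evaluated locally at each candidate node. This is a minimal sketch under the assumption that each node tracks these three booleans; the function and argument names are illustrative, not part of the DD-Repair implementation.

```python
def sends_reconnect_exploratory(on_original_path, still_receiving_data,
                                overheard_reconnect_exploratory):
    # A node answers a reconnect interest with reconnect exploratory data
    # only if it lies on the original path, is still receiving data for
    # the broken flow, and has not already overheard another node's
    # reconnect exploratory data.  The first node to answer thereby
    # becomes the proxy source; later candidates suppress themselves.
    return (on_original_path and still_receiving_data
            and not overheard_reconnect_exploratory)
```

The suppression term is what makes the election first-come-first-served: once any on-path node replies, its transmission is overheard and silences the others.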
In summary, only nodes along the original path can send reconnect exploratory data. The node nearest upstream to the break should be identified as the proxy source since it should be the first node along the existing path to receive reconnect interests. The proxy source will send reconnect exploratory data back to the interest initiator (the proxy sink).

Reconnect Reinforcement

In global gradient repair, the sink reinforces the fastest path over which the exploratory data is received. Correspondingly, in the case of local repair, the proxy sink sends reconnect reinforcement messages to the first neighbor that delivers reconnect exploratory data. In turn, this neighbor node will reinforce its fastest neighbor, and so on, all the way back to the proxy source. While this path is probably not optimal, it serves as a temporary fix until the next global gradient refresh. Since the search for the new path was restricted to a relatively small region, it is almost identical to the original optimal path found by diffusion except for the few hops where the repair was performed. As a result, the repaired path should provide a reasonably low-latency route from source to sink.

5.3 Real-Time Communication Mechanism

To support timely communication, we have developed a real-time communication service for directed diffusion based on RAP [60]. Recall from Section 4.3.1 that the RAP architecture defines a five-layer network stack. Many of the layers in the RAP network stack can be handled natively by diffusion; we have extended diffusion to provide the remaining layers. The top layer of RAP, a query-event service which allows events to be registered and triggered, is inherently provided by the data-centric, publish-subscribe nature of diffusion. Diffusion operates without a transport layer, so RAP's Location-Addressed Protocol is unnecessary in diffusion. Diffusion also handles the responsibilities assigned to Geographic Forwarding in RAP since diffusion routing is based on attribute vectors.
Although routes in diffusion are more costly to establish due to global flooding, no location information is required. RAP, in contrast, requires each node to know its own location, requiring costly GPS hardware on every node. The core component of RAP is Velocity Monotonic Scheduling (VMS), the distance-aware and deadline-aware scheduling policy. VMS prioritizes messages by computing a velocity based on the packet's distance to its destination and its temporal deadline. Although packets are prioritized at the MAC and network levels in RAP, our implementation only performs prioritization at the network layer.

5.3.1 DD-RAP

We have implemented both static (SVM) and dynamic (DVM) velocity monotonic scheduling policies on top of diffusion. In addition, we have extended the protocol to compute priority without requiring each node to know location information. Instead, we use time information to estimate distance. We denote our time-based scheduling policies SVM-t and DVM-t. Our real-time mechanism, DD-RAP, can use any of the four algorithms to compute priority: SVM, DVM, SVM-t, or DVM-t. If location information is known by each node, SVM or DVM may be used to prioritize packets in the same fashion as RAP. The sink adds its location and the deadline of the data flow to outgoing interest packets. For SVM, when the source receives an interest message, it extracts the latitude, longitude, and deadline from the packet and calculates the velocity according to Equation 4.1. In DVM, each node along the reinforced path updates the priority of the packet based on its own location and the elapsed time of the packet as defined in Equation 4.2.

5.3.2 SVM-t

If location information is not available, a time-based approach is used to estimate the distance from source to sink. The time delay between source and sink can be estimated without introducing significant communication overhead since two-way message exchanges are necessary for route creation in diffusion.
Instead of appending location information to outgoing messages, the sink node adds timestamps to reinforcement packets. When reinforcement messages are received at the source, the time delay between when the sink sent the reinforcement and when the source received it is computed. We assume this time difference is proportional to the distance from source to sink and that links are symmetric (i.e., the time delay from sink to source and from source to sink is the same). We also assume time synchronization among the nodes. The primary disadvantage of using time to gauge distance is that congestion delay will erroneously increase the distance estimate. End-to-end delay will increase due to congestion, implying a greater distance between source and sink. To compensate for this, we store a running minimum time difference for use in priority calculation. The time difference used in computing priority is the smallest end-to-end delay recorded up to this point in time. As in RAP, the priority of data packets may be calculated in two ways: statically (SVM-t) or dynamically (DVM-t). In the static case, priority is computed at the source and is never modified by intermediate nodes. In DVM-t, intermediate nodes dynamically re-calculate the priority at each hop. In SVM-t, the priority is computed according to Equation 5.2.

P = (t_source - t_sink) / T_deadline (5.2)

Note that t_sink is the time the reinforcement packet was sent from the sink, t_source is the time the reinforcement message was received at the source, and T_deadline is the deadline of the data flow in units of time (not a timestamp). Instead of a velocity, the time-based approach results in a priority value which is a ratio of times.

5.3.3 DVM-t

In the dynamic case, the priority of a packet will be re-calculated at each hop. Data messages must be timestamped by the source so that intermediate nodes can calculate the elapsed time for a given packet.
To estimate the distance to the sink, the elapsed time is subtracted from the total end-to-end time found using the same procedure as SVM-t. The elapsed time is also subtracted from the deadline to gauge the deadline urgency of the message. DVM-t calculates priority using Equation 5.3.

P = (t_source - t_sink - T_elapsed) / (T_deadline - T_elapsed) (5.3)

As in SVM-t, t_sink is the time the reinforcement packet was sent from the sink, t_source is the time the reinforcement message was received at the source, and T_deadline is the deadline of the data flow in units of time. T_elapsed is the time elapsed in sending the packet to the current hop and is computed as T_elapsed = t_now - t_data, where t_now is the current time and t_data is the time the data packet was sent from the source. The numerators in Equations 5.2 and 5.3 correspond to distance awareness, and the denominators encapsulate deadline awareness. The time difference t_source - t_sink gives an estimate of the time required for a packet to travel from source to sink. In DVM-t, the elapsed time T_elapsed is subtracted from the end-to-end transmission time in order to estimate the time (and distance) from the current node to the sink. In each gradient refresh cycle, a new reinforcement packet will be received by the source. The source will update its values of t_source and t_sink if the time difference is less than the previous difference. The difference t_source - t_sink will thus be the running minimum of all calculated time delays. Taking the minimum time difference each cycle will reduce the effects of congestion on the time delay. Note that this approach assumes a static (non-mobile) network. Our scheme achieves the same basic goals as RAP, packet prioritization based on both distance and deadline, but does so without the strong localization requirement.
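The SVM-t and DVM-t calculations, including the running-minimum delay update performed on each gradient refresh, can be sketched compactly. This is a minimal illustration under the stated assumptions (synchronized clocks, static network); the class and attribute names are ours, not part of the DD-RAP code.

```python
class TimeBasedPriority:
    # Sketch of SVM-t / DVM-t priority computation (Equations 5.2, 5.3).
    # All times are seconds on synchronized clocks, as the scheme assumes.

    def __init__(self, t_deadline):
        self.t_deadline = t_deadline   # end-to-end deadline (a duration)
        self.min_delay = float("inf")  # running min of t_source - t_sink

    def on_reinforcement(self, t_sink, t_source):
        # Each gradient refresh delivers a new sink timestamp; keeping the
        # smallest observed delay damps congestion-induced inflation.
        self.min_delay = min(self.min_delay, t_source - t_sink)

    def svm_t(self):
        # Equation 5.2: static priority, computed once at the source.
        return self.min_delay / self.t_deadline

    def dvm_t(self, t_now, t_data):
        # Equation 5.3: recomputed at each hop; t_data is the source
        # timestamp carried on the data packet.
        t_elapsed = t_now - t_data
        return (self.min_delay - t_elapsed) / (self.t_deadline - t_elapsed)
```

Note how the dynamic form raises the priority of a packet whose remaining deadline slack shrinks faster than its estimated remaining distance, matching the behavior described for DVM.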
Although our location-free design requires more communication overhead than RAP, the additional cost is minimal since the timestamp information is piggybacked on the standard packets involved in two-phase directed diffusion path discovery. No additional packets are required, only a slightly increased packet size. This trade-off may be advantageous for applications in which location information is not available. While time synchronization is a non-trivial task, it can be accomplished without additional hardware and, therefore, may be more desirable than equipping each node with an expensive GPS receiver. RAP also assumes time synchronization in order to compute elapsed time in DVM. Thus, our time-based distance estimation scheme eliminates the strong location-knowledge requirement of RAP without adding significant overhead.

Chapter 6
Network Services Design

6.1 Design Principles

In designing the sense-and-respond architecture, we were guided by several principles. Our primary design goals were energy efficiency, scalability, localization, distributability, real-time communication, and reactivity. In this section, we describe each of these design goals and explain their implications for our design decisions.

6.1.1 Energy-efficient

Power consumption is of paramount importance in sensor networks since devices usually cannot easily be recharged. Consequently, our design incorporates features that promote energy efficiency. The primary purpose of passive clustering and the local flooding involved in DD-Repair is to save energy by minimizing communication. Communication is the primary energy expense in sensor networks, so reduced transmissions equate to saved energy. Passive clustering reduces the number of messages involved in route creation, and local repair reduces the number of messages required for route maintenance. The design of DD-RAP also emphasizes energy efficiency in that no new packets are required.
Data transfer for setting up the real-time protocol is piggybacked on top of the standard two-way message exchange used by diffusion for route discovery/refresh. All three protocols work toward the goal of energy-efficiency.

6.1.2 Scalable

Another general goal of our system and all sensor networks is scalability. Protocols should scale up to networks with large numbers of nodes. The reliance of standard diffusion on global flooding greatly limits the scalability of the protocol. Passive clustering and localized flooding both address the goal of scalability by reducing unnecessary transmissions. As the network size increases, these unneeded communications lead to network congestion. With sufficiently large networks, flooding-induced congestion can completely cripple the network. The flooding reduction mechanisms support the goal of scalability.

6.1.3 Localized

One of the best properties of directed diffusion is its completely localized nature. We wanted to preserve this characteristic in the design of our protocols because of its inherent advantages. Localized protocols maintain information only about one-hop neighbors. This significantly reduces the amount of state information that must be stored and hence increases the scalability of the algorithm. The principle of localization also tends to decrease the complexity of the algorithm, simplifying design and implementation. Since the protocols are deeply integrated into diffusion, they necessarily inherit many of the localized aspects of diffusion. Passive clustering exhibits this localized nature in that nodes choose their state based only upon the states of their one-hop neighbors. DD-Repair localizes its repair work to the area immediately surrounding the failed node. Thus, the advantages of localized interactions are preserved by our algorithms.

6.1.4 Distributed

The distributed nature of sensor networks is one of their most challenging and powerful characteristics.
The goal is to fully distribute the algorithms over all the nodes in the network. This is in contrast to typical algorithms which utilize the client-server model. Our protocols address this challenge by distributing tasks among all the nodes. This is particularly evident in passive clustering, where a clustered structure is created dynamically according to the first-declaration-wins rule. Cluster formation is completely distributed. In the local repair protocol, any node may potentially detect and repair a breakage. In this sense, all nodes act as adaptation servers, forming a completely distributed adaptation service. The design of our protocols emphasizes distributed computation.

6.1.5 Real-Time

Timely communication is an important goal for many applications. Our extension of the RAP protocol to diffusion aims to achieve the goal of real-time communication. The prioritized queue gives preference to more urgent packets, helping them meet their deadlines. The real-time protocol also allows for hard deadlines by dropping packets whose deadline has been exceeded. This further reduces congestion and gives other packets a greater chance of meeting their deadlines.

6.1.6 Reactive

Our final design goal is to develop reactive protocols and mechanisms. In general, reactive protocols are more suitable for sensor networks than proactive protocols. In energy-constrained systems, it makes little sense to continuously maintain routes if no data is being transferred over those routes. This is especially relevant in sensor systems which infrequently detect events of interest. Passive clustering and the local repair protocol particularly incorporate reactive features in their design. Passive clustering reactively creates clusters in response to, and through the use of, ongoing packet transmissions. The local repair algorithm responds to broken links by reactively repairing them.
6.2 Design Decisions

6.2.1 Clustering Mechanism

The compatibility of passive clustering and diffusion was recognized by Handziski et al. [63], who first implemented passive clustering over directed diffusion. Passive clustering is especially well-suited to sensor networks due to its localized, reactive, and fully distributed nature. All state changes are made using local knowledge gained by overhearing neighboring nodes. Passive clustering's reactive cluster formation is well-matched to diffusion's reactive publish-subscribe model. Furthermore, the fully distributed nature of passive clustering maps nicely to any sensor network routing protocol, especially diffusion. No client/server message exchanges are needed. Passive clustering does not need periodic messages, but instead takes advantage of existing packets. Finally, the protocol is very resource-efficient regardless of the size or density of the network.

6.2.2 Repair Mechanism

Break Detection

While break detection was not the focus of our research, we considered several potential methods for detecting a broken link. One approach is to calculate an expected time for data events to be received from each neighbor. The source may monitor its output and calculate an outgoing event rate for each gradient. It will then append this event rate to each data packet. Intermediate nodes need only store the data rate of the packet and the time it was received. If the interval has been exceeded, a break is detected. Another strategy for break detection is for intermediate nodes to calculate expected receive times based on incoming packet reception rates. Event intervals are calculated by monitoring the input from each neighbor at intermediate nodes. The advantage of this approach is that it is completely localized and distributed. Another possibility is to use physical layer metrics (e.g., signal strength or SNR) to detect failing links.
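The interval-based timeout check described above can be sketched as follows. The function name and the tunable threshold factor are illustrative assumptions, not the thesis API:

```cpp
#include <cassert>

// Hedged sketch of interval-based break detection: the link to a neighbor
// is flagged as broken when no packet has arrived within `factor` times
// the expected event interval tr (all values in the same time unit).
bool link_broken(double now, double last_rx_time, double tr, double factor) {
    return (now - last_rx_time) > factor * tr;
}
```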
Since we did not have access to physical layer metrics, we used the constant event rate method. Given event rate Tr, we detected link breakage when a packet was not received within 1.5 × Tr. More complex event interval determination algorithms should be investigated for production systems.

Break Localization

Break localization may be carried out differently depending on the break detection scheme used. If physical layer metrics are used to identify breakages, then our break localization scheme may be completely unnecessary. Our design assumes no access to physical layer information and no explicit network layer acknowledgments. Instead, our scheme utilizes an implicit timeout procedure to detect link failure. The advantage of this approach is that it may be broadly used regardless of physical or MAC layer differences. Since no physical or MAC layer assumptions are made, our repair protocol will support a wide variety of architectures.

Local Gradient Repair

In order to restrict the flooding required to find new paths during local gradient repair, we restrict packet forwarding in one of several ways. Flooding boundaries may be found using the clusters created through the passive clustering protocol, a simple n-hop flood, or the failed node ID. These mechanisms restrict the flooding of interests and exploratory data to a limited area. Another mechanism for local flooding is to append a hop radius to the message and drop it when the hop count reaches zero. Finally, flooding may be restricted to the one-hop neighbors of the failed node, i.e., only nodes that have heard from the failed node will forward the interests and exploratory data. Our design separates the local flooding algorithm from the repair protocol. By decoupling the local flooding mechanism from the repair protocol, we gain several benefits. First, from a software engineering perspective, the modularized software is easier to write, debug, and maintain.
Secondly, our repair algorithm can be paired with any local flooding strategy. Thus, our design may easily be extended by more complex local flooding algorithms. Thirdly, the decoupling allows multiple local flooding algorithms to be used simultaneously. Multiple algorithms could be stacked on top of each other or used in different regions of the same network. For example, an algorithm could be adaptively chosen based on local conditions, e.g., network density or congestion.

6.2.3 Real-time Communication Mechanism

The design of the real-time communication protocol is directly modeled after RAP. The SVM and DVM scheduling policies from RAP have been ported to diffusion with no change in design. The two time-based scheduling policies were inspired by RAP, but involved moderate redesign.

Prioritized Queue

The prioritized queue is the essential component of DD-RAP. We recognized three options for the implementation of a prioritized queue in the diffusion API.

1. Delay-Based: Set event delay times in proportion to priority levels.
2. Block-Based: Prioritize all events that have expired during the receiving period.
3. Timer-Based: Send the highest priority packet after the expiration of a recurring send timer.

The delay-based scheme uses priority to compute the delay time associated with a particular packet. The delay time is inversely proportional to the priority. Hence, high priority packets have a lower delay while low priority packets are assigned a longer delay time. The second option is to prioritize blocks of packets that have expired timers. Diffusion switches from receiving to sending when there are no packets to receive. It then processes all expired timers, i.e., all packets in the event queue with expired send times. This block of packets is usually sent in the order it was received. Using the block-based approach, the block of expired packets will instead be sent in priority order.
Finally, the timer-based approach uses a timer to repeatedly send one packet every ttimer milliseconds. When packets are received, they are added to a queue in priority order. In a separate thread, a recurring timer sends the highest priority packet each time the timer expires. The delay-based approach has one major drawback: it delays low priority packets even when there are no high priority packets to be sent. Hence, low priority packets are penalized unnecessarily and experience greater latency as a result. The block-based scheme also presents significant difficulties. It prioritizes within very small blocks of packets because the network so frequently switches between send and receive mode. This leads to a very small granularity of prioritization which may be almost imperceptible to the application. The prioritization occurs at such a minute level that it provides no benefit whatsoever. Because of these problems, we have chosen to implement the timer-based algorithm. While this results in the addition of a timeout parameter, it is superior to the other two approaches since it effectively prioritizes packets without unnecessarily delaying low priority packets. We describe the implementation of this approach in more detail in Section 7.3.3.

SVM-t

SVM-t, as described in Section 5.3.2, computes a priority based on the deadline as given by the application and the distance from source to sink as estimated by communication delay. Although several message exchanges occur in diffusion route creation, we choose to measure the time delay of the reinforcement message as it is sent from sink to source. There are several benefits of this choice. First, it conserves bandwidth by appending timestamps only to non-flooded packets. If interests or exploratory data were used to measure the time delay, then the timestamps would have to be flooded to the entire network. In a dense network, flooding may cause congestion and thereby increase the end-to-end delay.
Secondly, the use of reinforcements as timing messages simplified the implementation since only one type of packet needed to be processed differently. The use of reinforcement messages assumes symmetry of the end-to-end path. We assume the time required for reinforcement messages to travel from sink to source is the same as for exploratory data messages to travel from source to sink. This assumption is warranted since symmetric links are already assumed by diffusion.

DVM-t

DVM-t re-calculates the priority of a packet at each intermediate node based on the elapsed time (Section 5.3.3). As the most complicated case, DVM-t posed the most design options. The primary problem was in calculating the time to the sink (the numerator in Equation 5.3). We considered two approaches: stateful and stateless. In the stateful approach, each intermediate node along the path from sink to source must maintain a time value for every data flow passing through it. The value may be computed by storing tnow − tsink for each reinforcement message received. These values should be indexed by data flow. In the stateless approach, all the information needed to compute priority is stored within each data packet. The time to the sink is estimated by subtracting the elapsed time from the total end-to-end time, tsource − tsink − Telapsed. End-to-end time is computed using the same procedure as SVM-t, and elapsed time is measured with a timestamp on each outgoing data message. The stateful approach is more complex and has greater storage requirements per node. However, it provides a better estimate of the time to the sink. The stateless algorithm is simpler to implement and more scalable but may be less accurate. We chose the stateless approach due to its simplicity and scalability. It provides sufficient accuracy, especially over longer durations. These issues may be further investigated in future work.
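The running-minimum update of the end-to-end delay estimate, performed by the source each gradient refresh cycle as described in Chapter 5, can be sketched as follows (names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Hedged sketch of the per-refresh-cycle delay update: the source keeps
// the running minimum of (tsource - tsink) across gradient refresh cycles
// to damp the effect of transient congestion on the delay estimate.
double update_delay_estimate(double current_min, double t_source, double t_sink) {
    double sample = t_source - t_sink;
    return (sample < current_min) ? sample : current_min;
}
```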
Chapter 7 Network Services Implementation

7.1 Clustering Mechanism

We implemented passive clustering using the Filter API provided by ISI diffusion [12]. This required the creation of a new diffusion attribute and a program with two filters. The attribute was used to relay information about the passive clustering state of each node among neighboring nodes. The filters intercept packets and update state information based on the PC state of the previous hop.

7.1.1 Passive Clustering Attribute

In order to communicate information about the passive clustering state of each node, we defined the class PCInfo_t to encapsulate all the passive clustering status information, including state, node ID, and cluster information. The PCInfo_t class is defined and explained below.

class PCInfo_t {
    int nodeID;
    int state;
    int CH1;
    int CH2;
    timeval ts;
    void fromAttr(NRSimpleAttribute<void *> *attr);
};

- nodeID: the diffusion ID of the node
- state: an integer representing the external passive clustering state of the node
- CH1: the ID of the node's primary cluster head
- CH2: the ID of the node's secondary cluster head; for CH and OR nodes, CH2 = -1
- ts: a timestamp corresponding to the last received message from the node
- fromAttr(NRSimpleAttribute<void *> *attr): converts attribute attr to a PCInfo_t object

To transport the PC information in diffusion packets, we pack the PCInfo_t object into a PC attribute and push the attribute onto the attribute vector of the message. The PC attribute is defined as follows:

#define PC_STATE_KEY 5000
NRSimpleAttributeFactory<void *> PCAttrFactory(PC_STATE_KEY, NRAttribute::BLOB_TYPE);

Upon reception of packets with this attribute, nodes unpack the data into a PCInfo_t object using its fromAttr() method. Nodes maintain a table containing PC information about each of their neighbors. The neighbor state table is updated on the reception of every flooded packet. The neighbor table is implemented in the class PCNodeList as shown below.
The PCNodeList class stores information about the passive clustering state of neighboring nodes as an STL list of PCInfo_t objects. The core functionality provided by the class includes adding/updating neighbors as well as counting the number of neighbors in a particular state. The PCNodeList class also has helper methods that analyze the neighbor list to determine appropriate state changes.

class PCNodeList {
    list<PCInfo_t> neighbors;

    /* Core methods */
    void update(int nodeID, PCInfo_t pcInfo);
    int count(int state);

    /* Helper methods */
    bool anyTwoCHsNotConnected(int &ch1, int &ch2);
    bool remoteDistGWexists(int ch1, int &ch2);
};

- update(int nodeID, PCInfo_t pcInfo): adds or updates the PC information of node nodeID with PC status pcInfo in the neighbor list
- count(int state): returns the number of nodes in state state in the neighbor list
- anyTwoCHsNotConnected(int &ch1, int &ch2): returns true if any pair of known CHs is not connected by a GW, and false if all pairs are connected; if a disconnected pair exists, ch1 and ch2 are set to their IDs
- remoteDistGWexists(int ch1, int &ch2): returns true if a remote distributed gateway for cluster head ch1 exists; if true, ch2 is set to the cluster ID of the remote gateway

7.1.2 Passive Clustering Filters

The PCInfo_t and PCNodeList classes are used extensively in the implementation of the passive clustering filter program. Our implementation utilized two filters in one program to intercept messages before and after the gradient filter. In the pre-gradient filter, nodes update their internal state to CH_READY or GW_READY according to the state of their neighbors. If no other CHs have been overheard, a node will enter CH_READY. Otherwise, it will enter GW_READY. Algorithm 1 summarizes the internal state update process, which is executed when packets are received.
Algorithm 1: Incoming Packet Processing for Passive Clustering
Input: Message, Neighbor State Array
Output: Internal State

switch Message.State do
    case INITIAL:
        if num(CH) = 0 then InternalState = CH_READY
        else InternalState = GW_READY
    case CH:
        if myID < Message.ID then InternalState = CH_READY
        else InternalState = GW_READY
    case FULL_GW:
        InternalState = GW_READY
    case DIST_GW:
        InternalState = GW_READY
    case CH_READY:
        InternalState = GW_READY
end

In the post-gradient filter, the external state of the node is determined and added to the outgoing packet. Nodes in CH_READY become CHs if they have not heard from any other CHs. Nodes in GW_READY enter FULL_GW, DIST_GW, or OR depending on the number of CHs and GWs among their neighbors. Algorithm 2 concisely summarizes the outgoing packet processing.

Algorithm 2: Outgoing Packet Processing for Passive Clustering
Input: Internal State, Neighbor State Array
Output: External State

if InternalState = CH_READY then
    if num(CH) = 0 then ExternalState = CH
    else InternalState = GW_READY
if InternalState = GW_READY then
    if num(CH) > 1 then
        if any two CHs are not connected by any known gateway then ExternalState = FULL_GW
        else ExternalState = OR
    else if num(CH) = 1 then
        if a DIST_GW has announced a CH other than my CH then ExternalState = DIST_GW
        if no DIST_GW exists for my CH then ExternalState = DIST_GW
        else ExternalState = OR
    else ExternalState = CH

If the external state has been determined to be OR, then the interest or exploratory data packet will be dropped. This is accomplished in the Filter API by simply not forwarding the message back to the diffusion core. Thus, only nodes in the states CH, DIST_GW, and FULL_GW are allowed to forward interests and exploratory data. Nodes in OR will receive and process packets, but not forward them. Note that Algorithm 2 and the gateway selection heuristic (Equation 5.1) represent different implementations of the same mechanism.
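Algorithm 1 can be rendered as a compact C++ function. The enum and function below are illustrative only, with num(CH) passed in as a precomputed count from the neighbor state table:

```cpp
#include <cassert>

// Hedged C++ rendering of Algorithm 1 (incoming packet processing).
// State names follow the thesis; OR is the ordinary-node state.
enum PCState { INITIAL, CH, FULL_GW, DIST_GW, CH_READY, GW_READY, OR };

// numCH: number of cluster heads currently in the neighbor state table.
PCState next_internal_state(PCState msgState, int msgID, int myID, int numCH) {
    switch (msgState) {
    case INITIAL:
        return (numCH == 0) ? CH_READY : GW_READY;
    case CH:
        // First-declaration-wins tie-break on node ID.
        return (myID < msgID) ? CH_READY : GW_READY;
    case FULL_GW:
    case DIST_GW:
    case CH_READY:
    default:
        return GW_READY;
    }
}
```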
Our implementation, although more complex, requires no tuning of the weighting parameters in Equation 5.1.

7.2 Repair Mechanism

The path repair mechanism, DD-Repair, is implemented using new attributes and the Filter API. DD-Repair is one program, but the local flooding mechanism is implemented separately. This decoupling allows any local flooding algorithm to be used in combination with the core repair protocol. In this section, we describe the additional attributes used by DD-Repair and explain the details of its implementation.

7.2.1 DD-Repair Attributes

We define two new attributes that correspond to the new messages introduced by our protocol. The Repair Notification attribute marks a data packet as a repair notification. The Reconnect attribute is added to reconnect interests and reconnect exploratory data to differentiate them from standard route discovery packets created by diffusion. The new attributes are listed below.

#define REPAIR_NOTIFICATION_KEY 4500
#define RECONNECT_KEY 4501

NRSimpleAttributeFactory<char *> RepairNotificationAttr(REPAIR_NOTIFICATION_KEY, NRAttribute::STRING_TYPE);
NRSimpleAttributeFactory<int> ReconnectAttr(RECONNECT_KEY, NRAttribute::INT32_TYPE);

7.2.2 DD-Repair Filters

The majority of local repair is implemented in one program with two filters: a pre-gradient filter and a post-gradient filter. The pre-gradient filter handles the bulk of the work. It sets data timeouts after each data packet is received. If the data timeout expires without receiving a new data packet, repair notifications are sent one hop downstream, and a repair notification timer is set. If it expires and no repair notification is received, the node becomes the proxy sink and floods interest messages with a reconnect attribute set to the failed node ID. The post-gradient filter maintains a list of upstream neighbors and downstream neighbors in order to detect link breakages.
This filter also drops reconnect exploratory data at the proxy sink and reconnect reinforcements at the proxy source. Interests and exploratory data are normally created by the diffusion routing (dr) object. To support reactive repair, we added the following two new methods to the dr class:

- int reconnectPublish(NRAttrVec *attrs)
- int reconnectSubscribe(NRAttrVec *attrs)

These methods allow us to send reconnect packets when a break is detected. The DD-Repair filter at the proxy sink invokes reconnectSubscribe() to send reconnect interests, and the proxy source invokes reconnectPublish() to transmit reconnect exploratory data in response to reconnect interests. The proxy sink sends a reconnect reinforcement message to the first neighbor that sent it a reconnect exploratory data message. Algorithm 3 summarizes the logic involved in the pre-gradient filter.

Algorithm 3: Pre-Gradient Processing for DD-Repair

if Repair Notification then
    Update state as not proxy sink
else if Data Message then
    Update receive data time
    Reset expected data timer
else if Reconnect Interest then
    if on original data path and still receiving data then
        Become proxy source
        Flood reconnect exploratory data
    else
        Forward reconnect interest
else if Reconnect Exploratory Data then
    if proxy sink then
        if reconnect exploratory data not yet received then
            Send reconnect reinforcement
    else if reconnect interests received then
        Forward reconnect exploratory data
    else
        Drop reconnect exploratory data
else
    Forward message

7.2.3 Local Flooding

DD-Repair makes no attempt to restrict the flooding of reconnect packets. To handle this task, we have written two additional programs. The node ID filter creates a list of neighbors from which a node has received any message. It drops flooded reconnect messages that contain a reconnect attribute with a failed node ID not in the list of known neighbors. This essentially restricts reconnect messages to the one-hop neighbors of the failed node.
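The node ID filter's forwarding decision can be sketched as follows. The class and method names are hypothetical stand-ins for the actual filter code:

```cpp
#include <cassert>
#include <set>

// Hedged sketch of ID-based local flooding: a node forwards a reconnect
// message only if it has previously overheard the failed node, i.e., only
// one-hop neighbors of the failure participate in the reconnect flood.
struct NodeIDFilter {
    std::set<int> heardFrom;   // IDs of all neighbors overheard so far

    void overheard(int nodeID) { heardFrom.insert(nodeID); }

    bool shouldForward(int failedNodeID) const {
        return heardFrom.count(failedNodeID) > 0;
    }
};
```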
In a sufficiently dense network, a path back to the data flow may be found within this one-hop neighbor group. In sparse networks, ID-based local flooding may restrict flooding to such a degree that repair is impossible. A second simple method to limit the flooding of reconnect messages is to limit the number of hops a packet may travel. The hop filter appends a hop attribute containing the maximum number of hops that the packet may travel. Packets from foreign hosts are dropped if the decremented hop count reaches zero. Otherwise, messages are forwarded with a decremented hop count. This approach works in sparse networks as long as a sufficiently large initial hop count is used. The main disadvantage of the hop-count algorithm is the difficulty of selecting an appropriate value for the maximum number of hops. In dense networks, a large hop count may result in wide local flooding, which is very expensive. Perhaps in the future, the hop count could be set adaptively based on the connectedness of a node. Thus, highly connected nodes would use smaller hop counts (or the ID-based algorithm) and nodes with very few neighbors would use larger hop counts.

7.3 Real-Time Communication Mechanism

Like the other network services, the real-time communication protocol, DD-RAP, was implemented in the diffusion API using attributes and filters.

7.3.1 DD-RAP Attributes

We defined several new attributes for DD-RAP. These include attributes for deadline, priority, DVM, SVM-t, and DVM-t. SVM needs only a priority value, so no explicit attribute is necessary. To utilize DD-RAP, applications simply add a deadline attribute to their publication definition. The deadline is an integer representing the deadline in milliseconds. The definition of the deadline attribute is shown below.

#define TIME_DEADLINE_KEY 7001
NRSimpleAttributeFactory<int> TDeadlineAttr(TIME_DEADLINE_KEY, NRAttribute::INT32_TYPE);

Several other attributes are necessary for the functioning of DD-RAP itself.
We defined one attribute for each variant of DD-RAP. These attributes encapsulate the data that must be communicated by each version of the protocol. DD-RAP uses the state information in each version's attribute to compute the priority and write it to the priority attribute. Thus, the priority attribute is used by all four versions of DD-RAP. These DD-RAP attribute definitions are shown below.

#define PRIORITY_KEY 7002
#define DVM_KEY 7003
#define SVM_T_KEY 7004
#define DVM_T_KEY 7005

/* SVM, DVM, SVM-t, and DVM-t */
NRSimpleAttributeFactory<float> PriorAttr(PRIORITY_KEY, NRAttribute::FLOAT32_TYPE);
/* DVM */
NRSimpleAttributeFactory<void *> DVMAttr(DVM_KEY, NRAttribute::BLOB_TYPE);
/* SVM-t */
NRSimpleAttributeFactory<void *> SVMtAttr(SVM_T_KEY, NRAttribute::BLOB_TYPE);
/* DVM-t */
NRSimpleAttributeFactory<void *> DVMtAttr(DVM_T_KEY, NRAttribute::BLOB_TYPE);

As the simplest case, SVM requires only one floating point number to be calculated at the source and sent to the destination; no state information is needed. The other three protocols (DVM, SVM-t, DVM-t) use the priority attribute to store the calculated priority but also require another attribute to communicate state information. The DVM attribute contains the time the data packet was sent from the source. The SVM-t attribute holds a structure with the two timestamps used to compute the end-to-end delay. The DVM-t attribute contains three timestamps: the two end-to-end delay timestamps and the time the data packet was sent. The definitions of the structures for SVM-t and DVM-t are shown below.

typedef struct RAP_SVM_T {
    EventTime ts_src_expdata;
    EventTime ts_snk_reinforcement;
} RAPSvmT;

typedef struct RAP_DVM_T {
    EventTime ts_src_expdata;
    EventTime ts_snk_reinforcement;
    EventTime ts_src_data;
} RAPDvmT;

7.3.2 DD-RAP Filters

Similar to passive clustering and the repair protocol, DD-RAP is implemented using a pre-gradient filter and a post-gradient filter. Each variant essentially performs the same three steps.
1. Add DD-RAP information to packets at the sink.
2. Extract DD-RAP information from packets at the source.
3. Compute priority and append it to outgoing data packets.

We explain the differing details of each version in the following subsections.

SVM

In SVM, the sink adds its location information (latitude and longitude) to outgoing interest packets. The source extracts the location information and uses it along with the deadline supplied by the application to compute the priority according to Equation 4.1. This priority is stored in the priority attribute by the source and used to prioritize the data packet at each intermediate hop.

DVM

In DVM, the sink adds its location information to outgoing interest packets. The source extracts the location information and uses it along with the deadline supplied by the application to compute the priority according to Equation 4.2. The source also timestamps the packet so that intermediate nodes can compute the elapsed time. Upon receiving a data packet, intermediate nodes extract the location of the sink and the timestamp. They use this information, along with their own location, to update the priority, again using Equation 4.2.

SVM-t

In SVM-t, the source adds a timestamp to outgoing exploratory data and the sink adds a timestamp to outgoing reinforcement messages. When the source is ready to send data, it computes the difference of these two timestamps and divides it by the deadline to find the priority (Equation 5.2). The priority is stored in the priority attribute and used at each intermediate hop for queue prioritization.

DVM-t

In DVM-t, the source adds a timestamp to outgoing exploratory data and the sink adds a timestamp to outgoing reinforcement messages. The source appends a timestamp to the data packet corresponding to the time it was sent. The time difference and deadline are used to compute the initial priority, which is written to the priority attribute.
Intermediate nodes subtract the elapsed time from the end-to-end delay and from the deadline to update the priority at each hop (Equation 5.3).

7.3.3 Prioritized Queue

We implemented the priority queue using the timer-based approach introduced in Section 6.2.3. The implementation involved changing the typical behavior of diffusion filters, which forward messages from the receive thread. Instead, we added each received message to a priority queue. The queue was implemented as a linked list of PriorityQueueEvent objects as defined below.

class PriorityQueueEvent {
public:
    Message *msg;
    handle h;
    double priority;
    PriorityQueueEvent *next;
};

The queue itself has two member variables: a pointer to the head of the queue and a mutex for thread safety. Each data message received by the receive thread of the DD-RAP filter was added to the queue using the insert() method. The run() method of the DD-RAP filter acted as the sending thread. It simply dequeued the first element in the priority queue (the highest priority message) and sent it. The mutex was necessary to lock the queue for insert and dequeue operations since they were performed by different threads. The definition of the PriorityQueue class is shown below. Besides the basic insert and dequeue operations, we also wrote several utility methods: isEmpty(), print(), and length().

class PriorityQueue {
    PriorityQueueEvent *head_;
    pthread_mutex_t mut;
public:
    PriorityQueue();
    void insert(Message *msg, int handle, double priority);
    PriorityQueueEvent *dequeue();
    bool isEmpty();
    void print();
    int length();
};

After the highest priority packet has been sent, the send thread sleeps for ttimer milliseconds before sending another packet. The value of ttimer determines the granularity of prioritization and affects the maximum bandwidth of the network. Larger values result in greater amounts of prioritization at the cost of throughput.
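A simplified, self-contained sketch of the insert and dequeue behavior is shown below. It uses an std::list and an int payload in place of the hand-rolled linked list of Message pointers, and omits the mutex to stay single-threaded:

```cpp
#include <cassert>
#include <list>

// Hedged sketch of the DD-RAP priority queue logic: insert() keeps events
// sorted by descending priority and dequeue() pops the highest-priority
// event. The real queue stores Message pointers and guards both
// operations with a pthread mutex, since they run in different threads.
struct SimplePriorityQueue {
    struct Event { int msg; double priority; };
    std::list<Event> events;

    void insert(int msg, double priority) {
        std::list<Event>::iterator it = events.begin();
        while (it != events.end() && it->priority >= priority) ++it;
        Event e = { msg, priority };
        events.insert(it, e);
    }

    bool dequeue(int &msg) {
        if (events.empty()) return false;
        msg = events.front().msg;
        events.pop_front();
        return true;
    }
};
```

In the real filter, the receive thread would call insert() and the recurring send timer would call dequeue() once every ttimer milliseconds.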
Smaller values of ttimer allow more packets to be sent, but also reduce the amount of prioritization possible. An appropriate value should be set based on the bandwidth needed by the application.

Chapter 8 Performance Evaluation

8.1 Clustering Mechanism

We evaluated our implementation of the passive clustering protocol using two metrics: flooding efficiency and delivery effectiveness. The primary purpose of passive clustering is to increase the efficiency of flooding. Our performance results show significant improvement over standard diffusion in this respect. A secondary benefit of passive clustering is increased delivery effectiveness. The packet delivery rates of diffusion may be adversely affected by the congestion caused by flooding. Since passive clustering reduces flooding-induced congestion, delivery rates can be improved by its use. In this section, we explain the experimental setup used to evaluate passive clustering and then describe its performance with respect to the two metrics.

8.1.1 Experiment Setup

To test the performance of passive clustering, we generated topologies with n nodes randomly dispersed over a 100 m × 100 m field. The nodes had a transmission radius r which determined the connectedness of the topology. The number of nodes n ∈ {25, 50, 75, 100, 150, 200}, and the transmission radius r ∈ {10, 20, 30, 40} m. We simulated directed diffusion and passive clustering for 600 seconds on the same random topology for each of the 24 combinations of n and r. A constant bit rate sender application sent one 1024 B packet each second to one receiver application for the entirety of the simulation time.

8.1.2 Flooding Efficiency

Primarily, passive clustering provides improved flooding performance by reducing the number of redundant transmissions. To evaluate flooding efficiency, we measured the total number of interest and exploratory data packets transmitted during the simulation.
The total number of interest packets transmitted during the simulation is plotted against the number of nodes (n) for varying values of r in Figure 8.1. Passive clustering provided a significant improvement over standard diffusion in every case. Note that for r = 10 (Figures 8.1(a) and 8.2(a)), none of the topologies were connected except for n = 200. Likewise, in Figures 8.1(b) and 8.2(b) the topology generated with n = 25 was not connected.

Similarly, the exploratory data packets also show a significant reduction because of passive clustering. Figure 8.2 illustrates the improvement in flooding efficiency gained by passive clustering. A more complete picture emerges when the results are plotted in three dimensions. Figure 8.3 shows the number of interest packets transmitted versus n and r. Again, notice that passive clustering is always more efficient than standard diffusion. Figure 8.4 shows the exploratory data messages plotted against n and r.

Passive clustering provides dramatically increased flooding efficiency, on the order of 50% fewer flooded packets over all network sizes and transmission radii. On average, the number of interest packets was reduced by 55% and the number of exploratory data messages was reduced by 59%. This flooding reduction directly correlates to energy savings since fewer transmissions mean less power is consumed.

(Figure 8.1: Number of interest messages versus number of nodes for transmission radius (r); panels (a) r = 10, (b) r = 20, (c) r = 30, (d) r = 40)

Figure 8.5 shows the average flooding reduction percentage of interests and exploratory data for each network size (n). Reduction is computed as (NDD − NPC) / NDD, where NDD is the number of packets required for directed diffusion and NPC is the number of packets transmitted by passive clustering.

8.1.3 Delivery Effectiveness

A secondary benefit of passive clustering is increased delivery effectiveness in large, highly-connected networks.
To measure delivery rates, we computed the delivery ratio, i.e., the number of packets received versus the number of packets sent. We included a sequentially increasing number in each packet and compared the sequence numbers received to the last number sent. This was necessary because diffusion delivers duplicate data packets, making it impossible to compare the raw number of data packets sent and received.

(Figure 8.2: Number of exploratory data messages versus number of nodes for transmission radius (r); panels (a) r = 10, (b) r = 20, (c) r = 30, (d) r = 40)

The delivery ratio for varying numbers of nodes n at each value of r is shown in Figure 8.6. Generally, passive clustering was at least equivalent to diffusion and often performed better. A performance advantage was present for highly-connected topologies because of congestion: the effects of flooding-induced congestion are more prevalent in highly dense networks. Since passive clustering reduces the number of flooded packets, it reduces congestion and allows more data packets to be delivered. For topologies with fewer than 100 nodes, the difference in performance between diffusion and passive clustering was not significant. However, in very dense networks (n ∈ {150, 200} and r ∈ {30, 40}) passive clustering performed appreciably better. The performance difference was 5–10%.

(Figure 8.3: Number of interest messages versus number of nodes (n) and transmission radius (r))

(Figure 8.4: Number of exploratory data messages versus number of nodes (n) and transmission radius (r))

Passive clustering returned delivery ratios on par with standard diffusion and incrementally improved delivery rates in dense networks. Thus passive clustering provides significant advantages in terms of both flooding efficiency and delivery effectiveness. These results imply greater potential scalability of the network and greater robustness of performance.
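The duplicate-aware delivery ratio described above can be sketched by counting distinct sequence numbers rather than raw packet totals. The assumption that sequence numbers start at 1, so the last number sent equals the number of packets sent, is ours for this sketch.

```cpp
#include <set>
#include <vector>

// Delivery ratio = distinct sequence numbers received / packets sent.
// Diffusion can deliver the same data packet more than once, so
// duplicates are collapsed by inserting sequence numbers into a set.
double deliveryRatio(const std::vector<int> &receivedSeqs, int lastSent) {
    if (lastSent <= 0) return 0.0;
    std::set<int> distinct(receivedSeqs.begin(), receivedSeqs.end());
    return (double)distinct.size() / (double)lastSent;
}
```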
Our results corroborate previous work on passive clustering over diffusion [63] in terms of improved flooding efficiency and delivery effectiveness.

8.2 Repair Mechanism

We evaluated the performance of DD-Repair by measuring the delivery ratio for data packets and the overhead associated with the local flooding. Since diffusion has no reactive repair mechanism, our addition provided significant gains in terms of delivery effectiveness. The cost of repair was generally low given the restricted flooding strategies we employed.

(Figure 8.5: Average percent flooding reduction for each network size (n))

We explain the experimental scenario and discuss the performance of DD-Repair in terms of delivery effectiveness, repair time, and overhead.

8.2.1 Experiment Setup

To test DD-Repair we used a constant bit rate application that sent one 1024 B packet each second to one receiver application. We used topologies with n nodes where n ∈ {16, 64, 100}. Each value of n was tested on topologies with an average number of neighbors d, where d ∈ {4, 6}. We measured the delivery ratio (number of data packets received / number of data packets sent) over a period of 60 seconds. This time period was chosen to correspond with one gradient refresh cycle. Halfway through this interval (t = 30 seconds), we killed all processes running on one intermediate node in the data flow. Since data packets were sent at a rate of one per second, the delivery ratio directly measures the time required to repair the broken link. In the case of diffusion, no packets were delivered after the breakage since its repair mechanism does not execute until the end of the gradient refresh cycle.

(Figure 8.6: Delivery ratio versus number of nodes for transmission radius (r); panels (a) r = 10, (b) r = 20, (c) r = 30, (d) r = 40)

We also measured the network traffic involved in each run of the simulation. We were specifically interested in the flooded packets (interests and exploratory data) since they are most relevant to gradient repair.
8.2.2 Repair Effectiveness

To gauge the effectiveness of repair, we measured the delivery effectiveness in the face of node failure. We killed one intermediate node on a one-to-one data flow and calculated the number of data packets dropped during route repair. Figure 8.7 shows the delivery ratio for each repair/flooding algorithm. Standard directed diffusion corresponds to periodic repair with global flooding (PR-GF). Reactive repair (RR) means that the protocols use the break detection mechanism. The possible strategies for flooding include global flooding (GF), hop-based (Hop), and ID-based (ID).

(Figure 8.7: Delivery ratio for each repair/flooding algorithm)

As expected, the reactive repair strategies performed significantly better than the periodic approach of standard diffusion. The global flooding and hop-based reactive repair were able to successfully repair the path and, as a result, were both able to deliver about 95% of the packets. They have similar delivery ratios because they both found a node on the original data path still receiving data (a proxy source). The ID-based flooding algorithm prevented successful path repair on degree 4 topologies because it overly restricted the flooding of reconnect packets and thus was unable to find a proxy source. It returned delivery ratios similar to periodic repair since it reverted to that mode of repair when reactive repair failed. In the degree 6 topologies, the ID-based algorithm was able to reactively repair the path since the failed node had more neighbors and more paths to potential proxy sources.

As far as delivery effectiveness is concerned, global flooding is the safest option since it will always find a path to the data flow. Its flooding cost, however, is the most expensive. The hop-based approach trades repair probability for flooding overhead. The overhead costs of each algorithm are discussed in Section 8.2.4.
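As a hedged sketch of the hop-based restriction, a reconnect packet can carry a hop counter that each forwarding node increments, rebroadcasting only while the counter is below a limit; this keeps the repair flood local to the failed node's neighborhood. The field and function names are illustrative, not DD-Repair's actual packet format.

```cpp
// Hop-limited local flooding sketch (illustrative names, not the
// DD-Repair wire format).
struct ReconnectPkt {
    int hops;      // hops traveled so far
    int hopLimit;  // maximum hops allowed for the local flood
};

// Returns true if this node should rebroadcast the packet, and
// increments the hop counter for the outgoing copy. Once the limit
// is reached, the flood stops, bounding the repair overhead.
bool shouldRebroadcast(ReconnectPkt &p) {
    if (p.hops >= p.hopLimit) return false;  // flood boundary reached
    p.hops += 1;
    return true;
}
```

A larger hop limit raises the chance of reaching a proxy source at the cost of a wider flood, which is exactly the trade-off between the Hop and GF strategies discussed above.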
8.2.3 Repair Latency

The latency of reactive repair is directly connected to the repair effectiveness. We calculated the amount of time required to successfully repair the path after intermediate node failure. The latency of repair for each repair/flooding algorithm is shown in Figure 8.8. Not surprisingly, the reactive protocols performed much better than periodic repair. They were able to repair the broken path within about 2–3 seconds of the breakage. The only exception was ID-based flooding in the degree 4 topologies, where reactive repair was unsuccessful due to the low level of connectivity. In these scenarios, the network was repaired with the periodic repair mechanism. For the most part, however, reactive repair was able to decrease the repair time by a factor of 10, from around 30 seconds to about 3 seconds.

(Figure 8.8: Latency of repair for each repair/flooding algorithm)

8.2.4 Packet Overhead

To understand the cost associated with the local flooding schemes, we measured the number of flooded packets transmitted during the simulation. The primary overhead of the repair algorithm is the additional reconnect interests and exploratory data. Figure 8.9 shows the total interests and exploratory data for each topology.

(Figure 8.9: Overhead (interests + exploratory data) for each repair/flooding algorithm)

To understand the overhead relative to standard diffusion, Figure 8.10 shows the overhead of each repair algorithm normalized with respect to diffusion. Figure 8.10 represents the number of flooded packets sent by each protocol for each flooded packet sent by diffusion (PR-GF) during one 60-second gradient refresh interval. Notice that the reconnect interests and exploratory data add less than 10% overhead compared to diffusion. With this additional overhead, significantly more data packets can be delivered.
To show the relationship between overhead and delivery ratio, we computed the ratio of overhead packets to data packets received for each algorithm. This is shown in Figure 8.11. We divided the total number of interests and exploratory data by the number of data messages successfully delivered to the sink. Smaller bars represent a lower delivery cost for data messages. Note that all three reactive repair algorithms outperform diffusion. The ID-based local flooding mechanism sometimes had the same cost as diffusion because it was unable to repair the path for low density topologies. Hop-based local flooding was the best overall performer in that it always repaired the path with relatively few overhead packets.

(Figure 8.10: Normalized overhead (overhead packets / diffusion overhead packets) for each repair/flooding algorithm)

(Figure 8.11: Overhead packets per data packet for each repair/flooding algorithm)

8.3 Real-Time Communication Mechanism

To evaluate the effectiveness of DD-RAP, we measured the on-time delivery ratio of packets received at the sinks. DD-RAP shows significant improvement over standard diffusion in terms of timely delivery during congestion. In this section, we describe the experimental setup and compare the on-time delivery rates for diffusion and DD-RAP.

8.3.1 Experiment Setup

We tested DD-RAP using the topology shown in Figure 8.12. We created two data flows that shared the same five bottleneck nodes. The bottleneck topology was chosen in order to exaggerate the effects of congestion. The two sources ran a constant bit rate application which generated one 1024 byte packet every T milliseconds, where T = t + j with t ∈ {25, 50, 75, 100, 125} and j ∈ [−0.2t, 0.2t]. We introduced the jitter term j in order to create random queue lengths. Source 1 sent packets with a deadline of 115 milliseconds to Sink 1, while Source 2 gave packets a deadline of 165 milliseconds to reach Sink 2.
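The jittered generation period just described can be sketched as follows; the use of rand() is illustrative rather than the simulator's random number generator.

```cpp
#include <cstdlib>

// Jittered inter-packet period: T = t + j, with j drawn uniformly
// from [-0.2 t, 0.2 t], so T always lies in [0.8 t, 1.2 t].
double jitteredPeriod(double t) {
    double u = (double)std::rand() / RAND_MAX;  // uniform in [0, 1]
    double j = (2.0 * u - 1.0) * 0.2 * t;       // uniform in [-0.2 t, 0.2 t]
    return t + j;
}
```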
Since the distance between sources and sinks was the same for both flows, the deadline differentiated the high priority flow (115 msec) from the low priority flow (165 msec).

(Figure 8.12: DD-RAP network topology)

In order to make an equivalent comparison, we wrote a filter to delay standard diffusion packets by the same amount as the DD-RAP filter delayed its packets. The delay filter was identical to the DD-RAP filter except that a priority of 0.0 was assigned to all packets, thus enforcing a FIFO ordering of the queue. The ttimer parameter of the priority queue was set to 50 milliseconds for all the experiments.

We evaluated DD-RAP at five different source data rates. The source data rate is the rate at which each source generated data packets. We denote this parameter as r, where r ∈ {8, 10, 14, 20, 41} KB/s. These r values correspond to sending one 1024 byte packet every t milliseconds, where t ∈ {125, 100, 75, 50, 25}, respectively. We ran each experiment 30 times to gain statistical confidence in the results. We report the average of the 30 runs along with the standard error.

8.3.2 On-Time Delivery Ratio

The main performance metric relevant to a real-time communication protocol is the on-time delivery ratio. This metric represents the ratio of packets that are received before their deadlines to the total number of packets received by the sink. It measures the effectiveness of the prioritization in helping packets meet their deadlines.

SVM

DD-RAP performed significantly better than standard directed diffusion for every tested source data rate. The performance difference was small at the lowest and highest data rates but very appreciable between these extremes. At low data generation rates there is little congestion, so neither flow has trouble meeting its deadline. At the highest data rate, congestion delays are so great that the packets miss their deadlines despite prioritization. In the middle region, congestion exists but does not dominate.
As Figures 8.13 and 8.14 illustrate, DD-RAP (SVM) delivers significantly more packets on time in this middle region (10–20 KB/s) than diffusion. Figure 8.13 shows the on-time delivery ratio for Flow 1, the high priority flow, and Figure 8.14 shows the on-time delivery ratio for Flow 2, which has a lower priority. The error bars in the plots show the standard error over the 30 independent runs.

Figure 8.15 shows the percent improvement of SVM over directed diffusion with respect to on-time delivery ratio. Notice that the improvement for Flow 1 decreased as the data rate increased, while that of Flow 2 remained near 60%. This occurred because Flow 1 had a smaller deadline, and thus DD-RAP had little flexibility in delivering these packets. As the congestion delay increased, the possibility of delivering low deadline packets decreased despite the efforts of DD-RAP.

(Figure 8.13: On-time delivery ratios of Flow 1 (high priority) versus data rate (r) for SVM and DVM)

(Figure 8.14: On-time delivery ratios of Flow 2 (low priority) versus data rate (r) for SVM and DVM)

DVM

Like SVM, DVM consistently outperformed directed diffusion at every data rate (Figures 8.13 and 8.14). The performance difference also followed a similar trend: DVM performed best relative to diffusion in the middle region (10–20 KB/s). Notice that DVM was able to deliver 80–95% of the packets for both flows over this range of data rates. This occurs for the same reasons as for SVM: congestion delays exist but are not impossible to overcome at these data rates.

Figure 8.16 shows the relative improvement of DVM over standard diffusion. DVM consistently performed 70–100% better than diffusion. This means that DVM delivered almost twice as many packets on time.

Not surprisingly, DVM returned the best performance overall. Although their performance was similar at the lowest two data rates, DVM significantly outperformed SVM at the faster rates.
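DVM's advantage stems from the per-hop priority update described earlier (intermediate nodes subtract elapsed time from the end-to-end delay and deadline, Equation 5.3). As a rough sketch of that idea, not Equation 5.3 verbatim, an intermediate node can charge the actual elapsed time against the packet's remaining deadline while crediting only the budgeted per-hop delay against its remaining expected delay; a packet delayed longer than budgeted loses slack and becomes more urgent. All names and the slack formulation here are assumptions of this illustration.

```cpp
// Illustrative per-hop state for a real-time packet (hypothetical
// fields, not the DD-RAP header).
struct RtPacket {
    double deadline;       // remaining deadline (ms)
    double expectedDelay;  // remaining expected end-to-end delay (ms)
};

// Update at an intermediate node after the packet spent `actualElapsed`
// ms since the previous hop, where `budgetedHopDelay` ms was expected
// for that hop. Returns the new priority as remaining slack:
// a smaller slack means a more urgent packet.
double updatePriority(RtPacket &p, double actualElapsed,
                      double budgetedHopDelay) {
    p.deadline -= actualElapsed;         // real time is always charged
    p.expectedDelay -= budgetedHopDelay; // expected progress per hop
    return p.deadline - p.expectedDelay; // remaining slack
}
```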
The reason for this behavior is that DVM dynamically updates the priority at each hop, thus adaptively calculating priority. DVM gives packets that have experienced longer delays along the way a chance to change their priority and compensate for lost time. Conversely, packets ahead of schedule yield to more urgent packets. In SVM, priorities are set at the source and never modified. As a result, all Flow 1 packets receive the same preferential treatment by intermediate nodes. DVM gives higher preference to packets that have been delayed more: packets in Flow 1 will be given different priorities depending on their elapsed time. This dynamic adaptation allows DVM to deliver more packets on time.

(Figure 8.15: Percent improvement of on-time delivery ratio for SVM)

(Figure 8.16: Percent improvement of on-time delivery ratio for DVM)

Chapter 9 Conclusions and Future Work

We have proposed, implemented, and evaluated three network services for directed diffusion. These network services improve and augment the standard directed diffusion protocol. They improve flooding efficiency, provide localized route repair, and support real-time packet delivery. The primary contributions of our work are outlined below.

• Implementation of passive clustering for directed diffusion in the ISI filter API
• Design and implementation of a route repair mechanism for directed diffusion which supports localized and reactive route repair
• Extension of RAP to directed diffusion and the elimination of its location-knowledge requirement

These three network services support the overall goals of sensor networks. Flooding efficiency and localized repair promote energy conservation by reducing unneeded transmissions. Our results demonstrated that passive clustering can decrease the number of flooded packets by a factor of two or more. The reactive route repair mechanism improves the robustness of the network by efficiently adapting to node failure.
The repair protocol delivers a significantly greater number of packets than diffusion. Both passive clustering and DD-Repair improve the scalability of the sensor network since they reduce the cost and scope of flooding. The real-time protocol, DD-RAP, gives application developers greater control over time-critical communication, easing application development. All three network services shield high level applications from the complexities of the underlying sensor network.

There are many possibilities for future research in the area of network services for diffusion. One of the most interesting directions is to consider the possible interactions among the services. For example, the repair protocol could use the clusters created by passive clustering to limit the flooding of repair packets. In this way, repair packets would be restricted to the clusters adjacent to the failed node. Another possible interaction is between DD-Repair and DD-RAP. If DD-RAP is extended to include a congestion detection algorithm, DD-Repair could be used to find routes around congested regions. Repair would be performed proactively in order to improve the throughput of slow links with the same mechanism used to repair node failure. Finally, DD-RAP and passive clustering could interact to compose better clusters. If congestion data is maintained at each node by DD-RAP, then it could be used to influence the creation of passive clusters. A node prone to congestion might refrain from becoming a cluster head in order to reduce its workload. Clearly, the interaction among the services could be very beneficial for the network.

We also hope to investigate other efficient flooding protocols besides passive clustering. The probabilistic and counter-based schemes seem particularly well-suited to the localized nature of diffusion since they do not assume 2-hop neighbor information. If location is known, the location-based efficient flooding approach may be useful.
The flooding with self-pruning algorithm would also be easy to implement over diffusion.

A very interesting future research area is the application of DD-Repair to mobile networks. Directed diffusion has traditionally been limited to static networks. The periodic repair of standard diffusion cannot adapt routes broken by mobility. The reactive repair provided by DD-Repair may overcome this issue by quickly repairing paths broken by node mobility.

Another possible improvement is to eliminate the global time synchronization requirement of DD-RAP. This might be accomplished by using round-trip times to estimate distance instead of one-way time delays. Since diffusion utilizes a two-way route setup process, this could easily be implemented. More advanced time synchronization protocols could also be incorporated into DD-RAP to provide more accuracy. We also plan to evaluate the SVM-t and DVM-t protocols in the future.

Our three network services provide significant enhancements to directed diffusion in several respects. We have increased the flooding efficiency of diffusion by augmenting it with passive clustering. The local repair algorithm improves the robustness of diffusion in the face of node failure without excessive flooding. We have enhanced diffusion by adding a distance- and deadline-aware real-time communication protocol. By leveraging the strengths of diffusion and minimizing its weaknesses, the network services significantly improve the utility of the routing protocol.

Bibliography

[1] Mark Weiser. The computer for the twenty-first century. Scientific American, pages 94–104, September 1991.

[2] David Tennenhouse. Proactive computing. Commun. ACM, 43(5):43–50, 2000.

[3] B. Warneke, M. Last, B. Liebowitz, and K. S. J. Pister. Smart dust: communicating with a cubic-millimeter computer. Computer, 34(1):44–51, January 2001.

[4] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David E. Culler, and Kristofer S. J. Pister.
System architecture directions for networked sensors. In Architectural Support for Programming Languages and Operating Systems, pages 93–104, 2000.

[5] K. Mani Chandy. Sense and respond systems. In 31st Annual International Conference of the Association of System Performance Professionals, December 2005.

[6] Chalermek Intanagonwiwat, Ramesh Govindan, and Deborah Estrin. Directed diffusion: a scalable and robust communication paradigm for sensor networks. In Mobile Computing and Networking, pages 56–67, 2000.

[7] John S. Heidemann, Fabio Silva, Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, and Deepak Ganesan. Building efficient wireless sensor networks with low-level naming. In Symposium on Operating Systems Principles, pages 146–159, 2001.

[8] J. Postel. Internet Protocol. RFC 791 (Standard), September 1981. Updated by RFC 1349.

[9] David B. Johnson and David A. Maltz. Dynamic source routing in ad hoc wireless networks. In Imielinski and Korth, editors, Mobile Computing, volume 353. Kluwer Academic Publishers, 1996.

[10] C. Perkins. Ad hoc on demand distance vector (AODV) routing, 1997.

[11] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva. Directed diffusion for wireless sensor networking. IEEE/ACM Transactions on Networking, 11(1):2–16, Feb. 2003.

[12] F. Silva, J. Heidemann, and R. Govindan. Network routing application programmer's interface, 2002.

[13] Alvin Lim. Distributed services for information dissemination in self-organized sensor networks. Journal of the Franklin Institute, 338:707–727, 2001.

[14] Xuan Yu. Implementation of lookup service protocols for self-organizing sensor networks. Master's thesis, Auburn University, 2001.

[15] Qiao Shen. Distributed composition service for self-organizing sensor networks. Master's thesis, Auburn University, 2002.

[16] Udadyan Naik. Implementation of distributed composition service for self-organizing sensor networks. Master's thesis, Auburn University, 2005.

[17] Ye Wang.
A dynamic adaptation service framework in self-organizing sensor networks. Master's thesis, Auburn University, 2002.

[18] Mark Ivester. Interactive and extensible runtime framework for execution and monitoring of sensor network services. Master's thesis, Auburn University, 2005.

[19] Taek Jin Kwon, M. Gerla, V. K. Varma, M. Barton, and T. R. Hsing. Efficient flooding with passive clustering: an overhead-free selective forward mechanism for ad hoc/sensor networks. Proceedings of the IEEE, 91(8):1210–1220, Aug. 2003.

[20] Yungjung Yi, M. Gerla, and Taek Jin Kwon. Efficient flooding in ad hoc networks: a comparative performance study. In IEEE International Conference on Communications (ICC '03), volume 2, pages 1059–1063, 11-15 May 2003.

[21] Brad Williams and Tracy Camp. Comparison of broadcasting techniques for mobile ad hoc networks. In MobiHoc '02: Proceedings of the 3rd ACM international symposium on Mobile ad hoc networking & computing, pages 194–205, New York, NY, USA, 2002. ACM Press.

[22] Sze-Yao Ni, Yu-Chee Tseng, Yuh-Shyan Chen, and Jang-Ping Sheu. The broadcast storm problem in a mobile ad hoc network. In MobiCom '99: Proceedings of the 5th annual ACM/IEEE international conference on Mobile computing and networking, pages 151–162, New York, NY, USA, 1999. ACM Press.

[23] D. Aron and Sandeep K. S. Gupta. Analytical comparison of local and end-to-end error recovery in reactive routing protocols for mobile ad hoc networks. In MSWIM '00: Proceedings of the 3rd ACM international workshop on Modeling, analysis and simulation of wireless and mobile systems, pages 69–76, New York, NY, USA, 2000. ACM Press.

[24] Nada Hashmi, Dan Myung, Mark Gaynor, and Steve Moulton. A sensor-based, web service-enabled, emergency medical response system. In EESR '05: Proceedings of the 2005 workshop on End-to-end, sense-and-respond systems, applications and services, pages 25–29, Berkeley, CA, USA, 2005. USENIX Association.

[25] Mitchell A.
Cohen, Jakka Sairamesh, and Mao Chen. Reducing business surprises through proactive, real-time sensing and alert management. In EESR '05: Proceedings of the 2005 workshop on End-to-end, sense-and-respond systems, applications and services, pages 43–48, Berkeley, CA, USA, 2005. USENIX Association.

[26] S. Kapoor, K. Bhattacharya, S. Buckley, P. Chowdhary, M. Ettl, K. Katircioglu, E. Mauch, and L. Phillips. A technical framework for sense-and-respond business management. IBM Systems Journal, 44(1):5–24, 2005.

[27] S. Haeckel. Adaptive Enterprise: Creating and Leading Sense-and-Respond Organizations. Harvard Business School Press, Cambridge, MA, 1999.

[28] Michael Zink, David Westbrook, Sherief Abdallah, Bryan Horling, Vijay Lakamraju, Eric Lyons, Victoria Manfredi, Jim Kurose, and Kurt Hondl. Meteorological command and control: an end-to-end architecture for a hazardous weather detection sensor network. In EESR '05: Proceedings of the 2005 workshop on End-to-end, sense-and-respond systems, applications and services, pages 37–42, Berkeley, CA, USA, 2005. USENIX Association.

[29] C. Meinig, S. E. Stalin, A. I. Nakamura, F. Gonzalez, and H. G. Milburn. Technology developments in real-time tsunami measuring, monitoring and forecasting. In Oceans 2005 MTS/IEEE, Washington, D.C., September 2005.

[30] C. Meinig, S. E. Stalin, A. I. Nakamura, and H. B. Milburn. Real-time deep-ocean tsunami measuring, monitoring, and reporting system: the NOAA DART II description and disclosure. Technical report, NOAA, 2005.

[31] F. I. Gonzalez, E. N. Bernard, C. Meinig, M. Eble, H. O. Mofjeld, and S. Stalin. The NTHMP tsunameter network. Natural Hazards, 35(1):25–39, 2005.

[32] Donna Casey. The Thames Barrier: flood defence for London. Website, http://www.environment-agency.gov.uk/regions/thames/323150/335688/341764/, 2006.

[33] Nova: Sinking city of Venice. Website, http://www.pbs.org/wgbh/nova/venice/gates.htm, October 2002.

[34] Josh McHugh. The lost city of Venice. Wired, 11(8), August 2003.

[35] Hoang H.
Tran, Paolo Sammarco, and Chiang C. Mei. Subharmonic resonance of Venice gates in waves. Journal of Fluid Mechanics, 349, 1997.

[36] Lee E. Harris. Combined recreational amenities and coastal erosion protection using submerged breakwaters for shoreline stabilization. Technical report, Florida Institute of Technology, September 2005.

[37] Lee Harris. Breakwater wave attenuation.

[38] Yu-Chee Tseng, Sze-Yao Ni, and En-Yu Shih. Adaptive approaches to relieving broadcast storms in a wireless multihop mobile ad hoc network. IEEE Transactions on Computers, 52(5):545–557, 2003.

[39] L. Orecchia, A. Panconesi, C. Petrioli, and A. Vitaletti. Localized techniques for broadcasting in wireless sensor networks. In DIALM-POMC '04: Proceedings of the 2004 joint workshop on Foundations of mobile computing, pages 41–51, New York, NY, USA, 2004. ACM Press.

[40] Ya Xu, John S. Heidemann, and Deborah Estrin. Geography-informed energy conservation for ad hoc routing. In Mobile Computing and Networking, pages 70–84, 2001.

[41] Hyojun Lim and Chongkwon Kim. Multicast tree construction and flooding in wireless ad hoc networks. In MSWIM '00: Proceedings of the 3rd ACM international workshop on Modeling, analysis and simulation of wireless and mobile systems, pages 61–68, New York, NY, USA, 2000. ACM Press.

[42] Wei Peng and Xi-Cheng Lu. On the reduction of broadcast redundancy in mobile ad hoc networks. In MobiHoc '00: Proceedings of the 1st ACM international symposium on Mobile ad hoc networking & computing, pages 129–130, Piscataway, NJ, USA, 2000. IEEE Press.

[43] Amir Qayyum, Laurent Viennot, and Anis Laouiti. Multipoint relaying: An efficient technique for flooding in mobile wireless networks. Research Report RR-3898, INRIA, February 2000.

[44] Benjie Chen, Kyle Jamieson, Hari Balakrishnan, and Robert Morris. Span: An energy-efficient coordination algorithm for topology maintenance in ad hoc wireless networks.
In Mobile Computing and Networking, pages 85–96, 2001.

[45] Scalable link-state internet routing. In ICNP '98: Proceedings of the Sixth International Conference on Network Protocols, page 52, Washington, DC, USA, 1998. IEEE Computer Society.

[46] R. G. Ogier et al. Topology dissemination based on reverse-path forwarding (TBRPF). http://www.sri.com/esd/projects/tbrpf/docs/draft-ietf-manet-tbrpf-11.txt. Online.

[47] A. Ephremides, J. E. Wieselthier, and D. J. Baker. A design concept for reliable mobile radio networks with frequency hopping signaling. Proceedings of the IEEE, 75(1):56–73, 1987.

[48] Chunhung Richard Lin and Mario Gerla. Adaptive clustering for mobile wireless networks. IEEE Journal on Selected Areas in Communications, 15(7):1265–1275, 1997.

[49] K. Mase, Y. Wada, N. Mori, K. Nakano, M. Sengoku, and S. Shinoda. Flooding schemes for a universal ad hoc network. In Industrial Electronics Society, 2000. IECON 2000. 26th Annual Conference of the IEEE, volume 2, pages 1129–1134, Nagoya, Japan, 2000.

[50] M. Gerla, T. Kwon, and G. Pei. On demand routing in large ad hoc wireless networks with passive clustering, 2000.

[51] Jorjeta Jetcheva and David B. Johnson. Adaptive demand-driven multicast routing in multi-hop wireless ad hoc networks. In Proceedings of the Second Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc 2001), pages 33–44. ACM, Oct 2001.

[52] D. Tian and N. D. Georganas. Energy efficient routing with guaranteed delivery in wireless sensor networks. IEEE Wireless Communications and Networking, 3:1923–1929, 2003.

[53] C. Gui and P. Mohapatra. A self-healing and optimizing routing technique for ad hoc networks, 2003.

[54] George Aggelou and Rahim Tafazolli. RDMAR: A bandwidth-efficient routing protocol for mobile ad hoc networks. In WOWMOM, pages 26–33, 1999.

[55] C. Toh. Associativity-based routing for ad-hoc mobile networks.

[56] E. Royer and C. Toh.
A review of current routing protocols for ad-hoc mobile wireless networks, 1999.

[57] Vincent D. Park and M. Scott Corson. A highly adaptive distributed routing algorithm for mobile wireless networks. In INFOCOM '97: Proceedings of the Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Driving the Information Revolution, page 1405, Washington, DC, USA, 1997. IEEE Computer Society.

[58] M. Pan, Sheng-Yan Chuang, and Sheng-De Wang. Local repair mechanisms for on-demand routing in mobile ad hoc networks. In Dependable Computing, 2005. Proceedings. 11th Pacific Rim International Symposium on, 2005.

[59] Genping Liu, Kai-Juan Wong, Bu-Sung Lee, Boon-Chong Seet, Chuan-Heng Foh, and Lijuan Zhu. PATCH: A novel local recovery mechanism for mobile ad-hoc networks.

[60] C. Lu, B. Blum, T. Abdelzaher, J. Stankovic, and T. He. RAP: A real-time communication architecture for large-scale wireless sensor networks, 2002.

[61] Tian He, J. A. Stankovic, Chenyang Lu, and T. Abdelzaher. SPEED: a stateless protocol for real-time communication in sensor networks. In Distributed Computing Systems, 2003. Proceedings. 23rd International Conference on, pages 46–55, 19-22 May 2003.

[62] Y. Chen, A. Liestman, and J. Liu. Clustering algorithms for ad hoc wireless networks, 2004.

[63] Vlado Handziski, Andreas Koepke, Holger Karl, Christian Frank, and Witold Drytkiewicz. Improving the energy efficiency of directed diffusion using passive clustering. In EWSN 2004, volume 2920 of LNCS, pages 172–187, 2004.