Scalable, Self-Healing, and Real-Time Network Services for Directed Diffusion

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Kenan L. Casey

Certificate of Approval:

Min-Te Sun, Assistant Professor, Computer Science and Software Engineering
Alvin S. Lim, Chair, Associate Professor, Computer Science and Software Engineering
David Umphress, Associate Professor, Computer Science and Software Engineering
Yu Wang, Assistant Professor, Computer Science and Software Engineering
George Flowers, Interim Dean, Graduate School

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 9, 2008

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author

Date of Graduation

Vita

Kenan Luke Casey, son of Jerry and Paula Casey, was born January 8, 1982, in Louisville, Kentucky. He graduated from Jeffersonville High School as salutatorian in 2000. He earned his Bachelor's degree in Computer Science and Mathematics from Freed-Hardeman University, Henderson, Tennessee, in 2004. In 2007, he completed his Master's degree in Computer Science at Auburn University.

Dissertation Abstract

Scalable, Self-Healing, and Real-Time Network Services for Directed Diffusion

Kenan L. Casey

Doctor of Philosophy, August 9, 2008
(M.S., Auburn University, 2007)
(B.S., Freed-Hardeman University, 2004)

157 Typed Pages

Directed by Alvin S. Lim

Directed diffusion is a data-centric publish-subscribe routing protocol for sensor networks. We have proposed three network services which increase the capabilities of directed diffusion. Our protocols build on the inherent strengths of diffusion and lessen its weaknesses. The system architecture emphasizes efficient communication, local route repair, and real-time response. Our suite of network services significantly improves the performance of directed diffusion by addressing the fundamental challenges of sensor networks: energy efficiency, dynamic environments, and scalability. Our design increases the efficiency of flooding, improves packet delivery rates in the presence of node failure, and decreases the number of packets that miss their deadlines. We evaluate the performance of our improved diffusion in terms of routing overhead, delivery effectiveness, and deadline achievement. Our results demonstrate the benefits of the network services in all three respects. We increase the efficiency of flooding by 48%, improve packet delivery rates in the presence of node failure by up to 28%, and decrease the number of packets that miss their deadlines by 30-60%.

Acknowledgments

I would like to express my appreciation to Dr. Alvin Lim for the guidance he has provided throughout my study at Auburn. I would also like to express my gratitude to the advisory committee members, Dr. Min-Te Sun, Dr. David Umphress, and Dr. Yu Wang. Several fellow students have made significant contributions to this research, including Raghu Neelisetti, Qing Yang, and Philip Sitton. I am thankful for their help and their friendship. Above all, I am grateful to my wife, Ashley, whose love and support have made this work possible. I thank her for her patience and encouragement during the long research process. Her companionship is my greatest joy.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums").
Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: the document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty.

Table of Contents

List of Figures
1 Introduction
2 Motivations, Objectives, and Applications
   2.1 Motivations
      2.1.1 Efficient Flooding
      2.1.2 Route Repair
      2.1.3 Real-Time Communication
   2.2 Objectives
      2.2.1 Efficient Flooding
      2.2.2 Route Repair
      2.2.3 Real-Time Communication
   2.3 Applications
      2.3.1 Emergency Medical Response
      2.3.2 Business Alerts
      2.3.3 Meteorological Command and Control
3 Related Work
   3.1 Sense-and-Respond Systems
   3.2 Directed Diffusion
      3.2.1 Directed Diffusion Architecture
      3.2.2 Strengths of Diffusion
      3.2.3 Weaknesses of Diffusion
      3.2.4 Route Repair Mechanisms
   3.3 Efficient Flooding
      3.3.1 Heuristic-based
      3.3.2 Topology-based
   3.4 Route Repair
      3.4.1 WAR
      3.4.2 ADMR
      3.4.3 SWR
      3.4.4 SHORT
      3.4.5 RDMAR
      3.4.6 ABR
      3.4.7 TORA
      3.4.8 AODV-LRQT
      3.4.9 PATCH
   3.5 Real-Time Communication
      3.5.1 RAP
      3.5.2 SPEED
      3.5.3 Other Protocols
4 Network Services Architecture
   4.1 Clustering Mechanism (PCDD)
      4.1.1 Passive Clustering Overview
      4.1.2 Protocol Details
      4.1.3 Adaptation to Diffusion
   4.2 Repair Mechanism (LRDD)
      4.2.1 Break Detection
      4.2.2 Break Localization
      4.2.3 Localized Gradient Repair
   4.3 Real-Time Communication Mechanism (RTDD)
      4.3.1 SVM and DVM
      4.3.2 SAT and DAT
      4.3.3 SRT and DRT
5 Network Services Design
   5.1 Design Principles
      5.1.1 Energy-efficient
      5.1.2 Scalable
      5.1.3 Localized
      5.1.4 Distributed
      5.1.5 Real-Time
      5.1.6 Reactive
   5.2 Design Decisions
      5.2.1 Clustering Mechanism (PCDD)
      5.2.2 Repair Mechanism (LRDD)
      5.2.3 Real-time Communication Mechanism (RTDD)
6 Network Services Implementation
   6.1 Clustering Mechanism (PCDD)
      6.1.1 Passive Clustering Attribute
      6.1.2 Passive Clustering Filters
   6.2 Repair Mechanism (LRDD)
      6.2.1 LRDD Attribute
      6.2.2 LRDD Filters
      6.2.3 Local Flooding
   6.3 Real-Time Communication Mechanism (RTDD)
      6.3.1 RTDD Attributes
      6.3.2 RTDD Filters
      6.3.3 Prioritized Queue
7 Performance Evaluation
   7.1 Simulation Setup
   7.2 Clustering Mechanism (PCDD)
      7.2.1 Experiment Setup
      7.2.2 Flooding Efficiency
      7.2.3 Delivery Effectiveness
      7.2.4 End-to-End Delay
      7.2.5 Disconnection Probability
   7.3 Repair Mechanism (LRDD)
      7.3.1 Experiment Setup
      7.3.2 Simulation Results
   7.4 Real-Time Communication Mechanism (RTDD)
      7.4.1 Simulation Setup
      7.4.2 Simulation Results
8 Conclusions and Future Work
Bibliography

List of Figures

2.1 Global Route Repair
2.2 Ideal Local Route Repair
2.3 DART 2 System Architecture
3.1 Regional Flooding (Region Filter)
3.2 Relative Distance Micro-Discovery
3.3 RAP Communication Architecture
3.4 SPEED Architecture
4.1 Layered Architecture
4.2 Detailed Architecture
4.3 Example PC Topology
4.4 Full Gateway
4.5 Distributed Gateway
4.6 PC Transition State Diagram
4.7 Local Repair for Directed Diffusion
7.1 Number of MAC-layer interest messages versus number of nodes (1 flow)
7.2 Number of routing-layer interest messages versus number of nodes (1 flow)
7.3 Number of MAC-layer interest messages versus number of nodes (2 flows)
7.4 Number of routing-layer interest messages versus number of nodes (2 flows)
7.5 Number of MAC-layer exploratory data messages versus number of nodes (2 flows)
7.6 Number of routing-layer exploratory data messages versus number of nodes (2 flows)
7.7 Number of MAC-layer exploratory data messages versus number of nodes (2 flows)
7.8 Number of routing-layer exploratory data messages versus number of nodes (2 flows)
7.9 Average energy consumed per node versus number of nodes (1 flow)
7.10 Average energy consumed per node versus number of nodes (2 flows)
7.11 Delivery ratio versus number of nodes (2 flows)
7.12 Delivery ratio versus number of nodes (2 flows)
7.13 End-to-end delay versus number of nodes (1 flow)
7.14 End-to-end delay versus number of nodes (2 flows)
7.15 Disconnection probability versus number of nodes (1 flow)
7.16 Disconnection probability versus number of nodes (2 flows)
7.17 Packets delivered for each repair and flooding algorithm
7.18 Total flooded packets for each repair and flooding algorithm
7.19 Average energy consumed per node for each repair and flooding algorithm
7.20 Total flooded packets per data packet delivered for each repair and flooding algorithm
7.21 Average energy consumed per data packet delivered for each repair and flooding algorithm
7.22 Topology 1: 2 Flows
7.23 Topology 1: 3 Flows
7.24 Delivery Ratio of Flow 1 (Topology 1)
7.25 Delivery Ratio of Flow 2 (Topology 1)
7.26 Average Delivery Ratio of Flows 1 and 2 (Topology 1)
7.27 Delivery Ratio of Flow 1 (Topology 2)
7.28 Delivery Ratio of Flow 2 (Topology 2)
7.29 Delivery Ratio of Flow 3 (Topology 2)
7.30 Average Delivery Ratio of Flows 1, 2, and 3 (Topology 2)

Chapter 1 Introduction

In 1991, Mark Weiser challenged computer scientists to consider a new model of computing, a model in which computers are interwoven into the "fabric of everyday life until they are indistinguishable from it" [1].
Weiser's vision, now called ubiquitous computing, emphasizes intuitive interaction with pervasive computer resources. Proactive computing is a branch of ubiquitous computing which focuses on autonomous and environmentally-based devices. Three fundamental goals for proactive computing have been proposed: get physical, get real, and get out [2]. The first goal is to connect computers to the physical world. Such computers are typically small sensors and actuators which collect data over a large physical environment and communicate it over a wireless network. The second goal emphasizes the need for real-time performance. Proactive computing systems should be faster than real-time so that feedback can be provided to automated control loops for future predictions. The final goal is to remove humans from the interactive loop in order to allow for faster-than-human response times. This change essentially shifts computation from human-centered to human-supervised.

Sensor networks have emerged in response to these lofty goals. Sensor networks are large-scale, wireless networks of resource-constrained sensor devices. They most clearly support the first goal of proactive computing, connecting the physical and virtual worlds through sensors. In reference to the second and third goals, sensor networks allow for rapid and automated response to environmental stimuli. They allow faster-than-human response by taking humans out of the loop. Sensor devices, called nodes, consist of a processor, environmental sensors, a wireless communication device (usually a radio), and a power source (usually a battery). Current sensor devices are the size of a quarter, but the goal is to further reduce their size and cost. The SmartDust project [3], for example, envisions devices one cubic millimeter in size and capable of being suspended in air. A wide range of applications for sensor networks has been proposed.
Examples include environmental monitoring, military surveillance, and medical monitoring. Sensor networks are most noticeably differentiated from other types of networks by their large size, limited energy source, and dynamic nature. Sensor deployment is almost always ad hoc due to the sheer number of sensors. Potential deployment methods include being launched by artillery shelling, scattered by ground vehicles, or dropped by airplanes. The small size of sensor devices puts severe limits on the amount of available energy, so all operations must be energy-efficient. Sensor networks are characterized by robustness in handling the inevitable failures which occur in the field. The system must continue to function after devices fail, die, or are destroyed.

While a great deal of sensor network research has been conducted, the inherent characteristics of sensor networks provide ample challenges for system researchers. Three of the most compelling challenges of sensor networks are their immense scale, their resource scarcity, and their dynamic topologies. Sensor networks aim to include tens of thousands to millions of nodes. To support networks of this size, communication mechanisms must be incredibly scalable. The hardware of sensor devices also provides significant challenges. Sensor devices have very limited processing, storage, and communication capabilities. Perhaps most critical is the limited energy budget available to sensor nodes. Since batteries may not be replaced in the field, all sensor network algorithms must conserve energy whenever possible. Consequently, networking protocols must maintain high computation and communication efficiency. Communication is particularly expensive in terms of energy (transmitting 1 bit consumes as much power as executing 800-1000 instructions [4]), so minimizing the number of transmissions (and receptions) is a paramount goal. Another challenge is the dynamic nature of sensor network topologies.
Since sensor networks are typically deployed in an ad hoc fashion, the network must first configure itself and then maintain a working configuration in the presence of adverse environmental effects which may result in node failure. In the face of anticipated node failure, the network should continue to function normally by taking advantage of node redundancy.

The overall goal of our research is to mask the fundamental challenges of the sensor network from high-level applications and application developers. Ideally, the application programmer should not know about the scale, energy, or dynamics of the sensor network. The lower-layer protocols should transparently support any size network, adapt to changing energy levels, and perform dynamic reconfiguration in response to topology changes. Applications should not have to worry about such details. To address this issue, we propose three mechanisms which shield the application from the network-level challenges. First, we present an efficient flooding scheme to increase the scalability of the network. Secondly, we address the dynamic nature of sensor networks through a route repair algorithm which reconfigures the network after node failure. Lastly, we propose a protocol for real-time communication which gives application developers control over data flow priorities so that time-critical messages can be delivered satisfactorily. Together, the three network services compose a new layer in the network protocol stack which applications can easily, even transparently, utilize to gain improved network performance. The network services reside slightly above the network layer but have access to the inner workings of the routing protocol. The low-level implementation of these services provides significant benefits to all the layers above the network level (i.e., transport and application).
By taking advantage of the network services, any software running at higher layers will be capable of increased scalability, reactive self-healing, and real-time communication.

In Chapter 2, we discuss the motivations for our network services and describe relevant applications of sense and response systems. Chapter 3 gives background information about directed diffusion and an overview of research related to our three protocols. We describe the architecture of the proposed system in Chapter 4 and the design principles in Chapter 5. In Chapter 6, we give an overview of the implementation of the three protocols. The simulation performance of each protocol is described in Chapter 7. We conclude in Chapter 8 with a summary of our contributions and areas for future work.

Chapter 2 Motivations, Objectives, and Applications

In this chapter we discuss the motivations for our enhancements to directed diffusion. We also present several existing and potential applications of sense and response (S&R) sensor networks. Our protocols are particularly relevant to such systems since they require efficient and robust data collection as well as timely response to events.

2.1 Motivations

Although directed diffusion is generally well-suited to sensor networks, it has several apparent weaknesses. Its reliance on flooding incurs a significant penalty on communication and energy efficiency. Furthermore, diffusion also handles node failure rather poorly with its use of periodic global flooding. Finally, diffusion lacks any support for time-critical message delivery. We propose three network services to augment and extend directed diffusion to handle these issues. First, we propose an efficient flooding scheme to increase the energy efficiency and scalability of the network. Secondly, we address the dynamic nature of sensor networks through a route repair algorithm that reconfigures the network after node failure.
Lastly, we present a protocol for real-time communication which gives application developers control over data flow priorities so that time-critical messages can be delivered satisfactorily. In the subsequent sections we give specific motivations for each of these mechanisms.

2.1.1 Efficient Flooding

Flooding is a packet delivery process that delivers a packet to every connected node in the network [5]. Typically, flooding requires each node to rebroadcast every flooded packet so that every node is guaranteed to receive the message at least once. This type of flooding has been called blind flooding [6] and simple flooding [7]. Efficient flooding algorithms attempt to reduce the number of redundant packet transmissions through the use of heuristics or information about the network topology. Such techniques increase algorithm complexity for the sake of communication efficiency.

Flooding has frequently been used in both proactive and reactive routing protocols. Proactive routing protocols often rely on flooding for route advertisement. In this case, every node in the network must know the current status of a link in order to maintain correct routing tables. Efficient flooding techniques are often utilized by link-state protocols since extensive neighbor information is gathered and propagated by such routing protocols [5]. By using topology information, a node can forward flooded packets to a reduced set of neighbors without affecting the global reception of the packet. In reactive protocols, which are generally more appropriate for sensor networks, the principal reason for flooding is route discovery. Unlike proactive protocols, reactive protocols typically use blind flooding. Reactive protocols must flood initial route creation packets because no prior topology information is known. The path to an unknown host is found by flooding the network in search of the destination. Directed diffusion relies strongly on flooding since both interests and exploratory data are flooded during the gradient setup phase.
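To make the redundancy of blind flooding concrete, the following sketch simulates both blind flooding and a relay-subset variant on a hypothetical adjacency-list topology. This is an illustrative model only (not the PCDD mechanism described later); it simply counts transmissions and receptions so the savings from restricting rebroadcasts to a relay subset are visible.

```python
from collections import deque

def blind_flood(adjacency, source):
    """Blind flooding: every node rebroadcasts each flooded packet once.
    Returns (transmissions, receptions) so the redundancy is visible."""
    rebroadcast = {source}          # nodes that have (or will have) transmitted
    transmissions = receptions = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        transmissions += 1          # one broadcast per node
        for neighbor in adjacency[node]:
            receptions += 1         # every neighbor hears every broadcast
            if neighbor not in rebroadcast:
                rebroadcast.add(neighbor)
                queue.append(neighbor)
    return transmissions, receptions

def relay_flood(adjacency, source, relays):
    """Flooding in which only a chosen relay subset rebroadcasts;
    the remaining nodes receive the packet but stay silent."""
    covered = {source}
    transmissions = receptions = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        transmissions += 1
        for neighbor in adjacency[node]:
            receptions += 1
            if neighbor not in covered:
                covered.add(neighbor)
                if neighbor in relays:
                    queue.append(neighbor)
    return transmissions, receptions, covered

# A fully connected 5-node network: blind flooding costs 5 transmissions
# and 20 receptions, while a single broadcast already reaches everyone.
clique = {i: [j for j in range(5) if j != i] for i in range(5)}
print(blind_flood(clique, 0))           # (5, 20)
print(relay_flood(clique, 0, set()))    # 1 transmission covers all 5 nodes
```

As the neighbor degree grows, the gap between the two counts widens, which is exactly the dominant-set intuition: the relay subset must merely be large enough to keep `covered` equal to the whole node set.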
After routes are discovered, diffusion performs route maintenance using the same two-phase flooding mechanism.

Due to its simplicity, blind flooding has often been implemented in reactive protocols [8] [9]. Although this naive approach to flooding simplifies the design, it results in inefficiency since nodes may rebroadcast packets that all their neighbors have already received. In blind flooding, a node may receive duplicate copies of the same flooded packet from multiple neighbors. The performance of blind flooding is inversely related to the average number of neighbors (neighbor degree) per node. As the neighbor degree increases, blind flooding results in more redundant packets (network layer), a greater probability of collision (MAC layer), and greater congestion of the wireless medium (physical layer) [10]. In dense networks it is not necessary for every node to forward each flooded packet. If only a subset of the nodes is chosen as relays, the flooding efficiency can be significantly improved. The underlying problem is to select a dominant set of nodes to relay flooded packets. More formally, we wish to find the minimal subset of forwarding nodes sufficient to deliver a flooded packet to every other node in the system [5]. This is typically accomplished with algorithms that use topology information or heuristics to identify nodes that should forward packets, as discussed in Section 3.3.

2.1.2 Route Repair

Because of the dynamic nature of sensor networks, node and link failures are expected occurrences. When links fail, the routing protocol may attempt to recover from the breakage in one of two ways: end-to-end error recovery or local error recovery. End-to-end repair protocols initiate the recovery process either by explicitly alerting the source node of the problem with a negative acknowledgment or by implicitly informing the source through the absence of a positive acknowledgment. In the former case, a negative acknowledgment is sent from the intermediate node to the source node, reporting the link failure. In the latter case, the node which detects a break takes no action, so that the sender will time out while waiting for a positive acknowledgment. Diffusion essentially uses the implicit approach in that the protocol defines no explicit error message, depending instead on periodic route re-discovery to repair broken links.

The problem with end-to-end repair is the high cost of network-wide flooding. This has serious implications on the performance of a system in terms of scalability, energy consumption, and latency. Protocols which depend on global error recovery mechanisms do not scale well with network size. Moreover, since every node must forward the flooded packet, each node consumes energy repairing a route which is possibly very distant. Latency of end-to-end repair mechanisms also suffers since routes are completely rediscovered from the source to the sink. Figure 2.1 graphically illustrates the high cost of global route repair in two-phase pull directed diffusion. Notice the global flood of interests and exploratory data messages in both directions. Also notice that the repaired path is only a few hops different than the original data path.

Figure 2.1: Global Route Repair

When a node relatively far away from the source fails, it makes little sense to involve the source (and other distant nodes) in the error recovery process. Ideally, only nodes around the link failure should be involved in the repair process. In cases where the repaired path is only a few hops different from the original path, such as in the previous figure, the localized approach greatly reduces the overhead associated with repair. Figure 2.2 shows an ideal case for local route repair where distant nodes are not involved in the recovery process at all. Nodes in the immediate vicinity of the break participate in the repair algorithm.
Note that neither the source nor the sink is involved in the recovery process.

Figure 2.2: Ideal Local Route Repair

The advantages of the local repair approach include increased scalability, increased flooding efficiency, and decreased latency of repair. Since only a portion of the nodes are involved in local repair, the protocol scales more gracefully with network size and consumes less overall system energy. The localized nature of the recovery also lends itself to faster route repair since the complete (source-to-sink) route does not have to be traversed. In their analytical comparison of local and end-to-end recovery mechanisms, Aron and Gupta [11] show that with end-to-end recovery the probability of successfully delivering a packet on the first attempt rapidly degrades with increasing network size. Additionally, the amount of resources consumed per packet is several orders of magnitude larger for end-to-end repair than for local repair. In summary, local repair improves the scalability, efficiency, and latency of a network protocol.

2.1.3 Real-Time Communication

The general objective of sensor networks is distributed micro-sensing, i.e., to sense, monitor, and control physical environments. Examples include acoustic surveillance systems for monitoring homes, biometric sensors which detect harmful bio-agents in airports, and stress sensors which monitor the structural integrity of buildings during earthquakes. In many applications, data messages have time constraints in the form of end-to-end deadlines specifying an upper bound on the communication delay of a packet. Application-specific deadlines, for example, allow packets associated with important events to take priority over periodic monitoring packets.
Surveillance systems may require the position of an intruder to be reported to the command center within 15 seconds of detection, while the bio-detection system at the airport may need to respond within 1 second. Data in monitoring systems often has an interval of validity after which it is no longer useful. The validity interval of a home surveillance system, for example, will be much shorter than that of a temperature monitoring system since the presence of an intruder will be of much greater interest immediately after his detection, but may be useless 30 minutes later.

To support this time-critical property of physical environments, sensor network protocols must implement real-time communication mechanisms. The goal of real-time communication is to minimize the number of packets which miss their end-to-end deadlines. This is typically accomplished by prioritizing packets by their deadlines so that more urgent messages are sent first. Essentially, less urgent packets are delayed so that more urgent packets can reach their destinations on time. The challenge for real-time communication over sensor networks is their multi-hop nature. Although it is reasonable to give priority to packets with shorter deadlines, prioritization should also be given to packets that are farther away from their destination. In order to meet its deadline, a packet with a relatively long deadline may need to be prioritized higher than a packet with a shorter deadline if the long-deadline packet must travel twice as many hops. Thus, both the deadline of a packet and the distance it must travel should be considered when packet prioritization is performed.

2.2 Objectives

In this section, we discuss the specific objectives of our three protocols. We explain the overall goal for each protocol and highlight the improvement that will be achieved by our design.

2.2.1 Efficient Flooding

Performing efficient flooding in sensor networks is challenging for several reasons.
First, the communication overhead necessary to collect topology information represents a significant energy expenditure. Typically, Hello packets are exchanged among all neighboring nodes, thus incurring significant packet (and energy) overhead. Secondly, such packet exchanges occur periodically, thus incurring overhead regardless of whether the network has data to transmit. The objective of our approach is to perform efficient flooding dynamically and reactively. We will use a passive clustering technique that uses ongoing traffic to create a clustered structure which allows nodes that should drop forwarded packets to identify themselves. Our approach avoids additional packet exchanges by piggybacking the clustering information on existing packets. Finally, the network performs efficient flooding reactively instead of proactively. Because the protocol is reactive, unnecessary transmissions are reduced. The dynamic and reactive characteristics of our approach promote energy efficiency by reducing communication.

2.2.2 Route Repair

The goal of our route repair approach is to quickly repair broken data paths in a highly localized fashion. Reactive repair is vital so that the network can quickly recover from the failure. Repair time should be kept to a minimum so that latency and delivery effectiveness are not harmed. Localized repair is essential in order to minimize the overhead associated with path repair. Thus, our route repair protocol aims to achieve localized and timely repair without impeding the scalability, efficiency, or timeliness of the network.

2.2.3 Real-Time Communication

Our goal with respect to real-time communication is to develop a real-time communication protocol for directed diffusion which considers both distance and deadline. This will allow application developers to add deadline information to a data flow and have confidence that the network will maximize the number of packets which meet their deadlines.
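One way a network can weigh deadline against distance, as discussed in Section 2.1.3, is to rank packets by slack per remaining hop. The sketch below is purely illustrative; the `Packet` fields and the slack-per-hop key are our own assumptions, not part of directed diffusion.

```python
from dataclasses import dataclass

# Illustrative model (not part of diffusion): rank queued packets by
# slack per remaining hop, so a distant packet with a long deadline
# can outrank a nearby packet with a shorter one.
@dataclass
class Packet:
    deadline_ms: float  # time remaining until the end-to-end deadline
    hops_left: int      # estimated hops to the destination

def priority_key(pkt: Packet) -> float:
    """Smaller key = more urgent (less slack per hop to spend)."""
    return pkt.deadline_ms / max(pkt.hops_left, 1)

queue = [
    Packet(deadline_ms=500, hops_left=2),  # 250 ms of slack per hop
    Packet(deadline_ms=800, hops_left=8),  # 100 ms per hop: more urgent
]
queue.sort(key=priority_key)  # the 8-hop packet is served first
```

Under this key, the longer-deadline packet is transmitted first because it must cross four times as many hops, which is exactly the behavior argued for above.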
The development of a real-time communication protocol for directed diffusion will significantly broaden the applicability of directed diffusion. Typically, diffusion is limited to simple data-gathering applications. This enhancement will allow diffusion networks to support complex applications which may involve timely data reporting or time-critical system response.

2.3 Applications

The emergence of pervasive computing has expanded the frontier for S&R systems. Although the S&R architecture is broadly applicable to various fields, its potential utility has not been fully realized. We present a summary of the applications of previous S&R systems and propose several new areas well-suited to the S&R paradigm.

2.3.1 Emergency Medical Response

One very compelling application of S&R systems is in the field of healthcare. The authors of [12] have developed a sensor-based emergency medical response system. At the lowest level, sensors worn by patients report vital signs and location information to local command centers (e.g., ambulances) via 802.15.4. From there, information is forwarded over a cellular or satellite link to a global command center where the data is managed as a web service. The goal of the system is to provide greater situational awareness about patient condition and arrival time so that doctors can make more informed decisions. Our real-time communication model is particularly relevant to this type of application. Since information about vital signs is being communicated, timely response of the system is critical to the care of patients. Given the large size of many hospitals, efficient flooding is also an important mechanism for satisfactory performance of S&R systems in healthcare.

2.3.2 Business Alerts

Many business applications for S&R systems have also been proposed [13] [14] [15]. [13] describes a unified event stream processing system to monitor, analyze, and detect critical business events.
The objective is to improve efficiency, reduce cost, and enable the enterprise to react quickly. The Event Stream Processor calculates metrics based on event messages received from a variety of sources such as complaint databases or real-time production statistics. A domain expert creates rules based on the metrics to provide alerts for important business situations. Specific applications of this system include inventory replenishment, product behavior, and retail performance. Our networking improvements are also beneficial in the business domain. The real-time communication mechanism is well suited to the time-critical nature of many phenomena that call for immediate response.

2.3.3 Meteorological Command and Control

S&R systems have also been developed for hazardous weather detection. Our enhanced communication protocols are especially relevant to meteorological applications where sensor failure is expected. The route repair mechanism allows the system to perform in challenging environments such as tornadoes and tsunamis. The redundant nature of the sensor network is used to overcome unavoidable node failure. Efficient flooding reduces congestion due to network flooding, which matters for both energy efficiency and network latency. Since hazardous weather detection systems will be constantly in use but infrequently activated, it is important to utilize the energy resources of the sensor devices efficiently. Efficient flooding saves energy by reducing redundant packet transmissions. This section describes the research in the areas of tornado detection and tsunami detection.

Tornado Detection

NetRad [16] is a Distributed Adaptive Collaborative Sensing (DACS) system for early tornado detection. The goal of NetRad is to detect a tornado within 60 seconds of formation and track its centroid within a 60-second temporal region. NetRad is composed of a dense network of low-powered radars that collaborate to accurately predict tornado formation.
The radars report atmospheric readings to the System Operations and Control Center (SOCC) where a merged version of the data is analyzed to detect meteorological features. Radars are re-tasked every 30 seconds based on the utility of the features identified. Thus, sensing can be focused on areas nearest to recently detected meteorological features so that a better picture of the tornado can be obtained. The NetRad system is somewhat different from the S&R sensor network system we proposed. Unlike our S&R system, the NetRad system uses a high-bandwidth, highly-structured, wired network to connect the sensors. Our system is designed to communicate over relatively low-speed, ad-hoc, wireless networks. We also emphasize fast response time on the order of sub-seconds, as opposed to the 30-second turnaround time of NetRad.

Tsunami Detection

Our S&R system has specifically been applied to the tsunami detection problem. In this section we describe the current tsunami warning system used by the United States and a modified system which we have proposed for better detection of and response to this type of disaster.

Current System. The current tsunami warning system is composed of ten buoys in the Pacific and five in the Atlantic/Caribbean. The Deep-ocean Assessment and Reporting of Tsunamis (DART) project is maintained by the National Oceanic and Atmospheric Administration and serves as part of a tsunami warning system for the United States [17] [18] [19]. Figure 2.3 illustrates the architecture of the current system. The DART stations consist of two parts: an anchored seafloor bottom pressure recorder called a tsunameter and a companion moored surface buoy. The tsunameter detects subtle pressure changes which indicate tsunami waves. An acoustic modem transmits data from the tsunameter to the buoy, which then relays the information via satellite to land-based warning centers. The goal of the DART system is to provide accurate and early warning for tsunamis.
This includes avoiding false alarms and unnecessary evacuations.

Proposed System. Our proposed system is similar to the current system, but incorporates two major modifications. First, we suggest using a multi-hop sensor network connected by radio frequency (RF) links instead of a single-hop satellite communication scheme. Although this modification necessitates a greater number of devices, it also leads to significant savings in terms of latency, energy, and cost per device. Since satellite transmission delays are avoided, the latency of communication will be improved. Shorter-range radio broadcasts will result in longer battery life and greater sensor lifetime. Improved latency and energy-efficiency represent significant advantages of our design. The use of a sensor network will also result in greater accuracy of detection with respect to resolution, allowing more localized tsunami detection and prediction. Secondly, we propose the addition of a tsunami mitigation system that responds to tsunamis. We assume it is feasible to build an artificial barrier reef strong enough to withstand the force of a tsunami and reduce its strength before it hits the shoreline. The barrier may be engaged (or fired) when a tsunami event is detected and be disengaged otherwise. Although such a barrier system would be expensive to construct, the cost may be justified for protecting particularly valuable areas (e.g., nuclear reactors). The proposed system is similar to the Thames Barrier, a series of movable gate structures which protect London from tidal floods [20], and the gate system currently being developed to shield the city of Venice from dangerously high tides [21] [22] [23]. Although the focus of our work is on the networking and analysis aspects of the system, we have also given some consideration to the design of the barrier system. We envision a defense network inspired by the concept of air bags for automobiles.
The basic idea is to create a series of artificial islands which impede the progress of the wave. We propose an underwater deployment of inflation devices which are connected by wire to ballasts on the seafloor. When given the command to fire, the barriers will inflate and float to the surface while remaining tethered to the ballast. We propose using a layered series of barriers that successively attenuate the wave. Our proposed barrier system is similar to the breakwater coastal defense systems described in [24] [25]. By using advanced prediction techniques, we hope to engage the series of barriers that most effectively obstructs the wave.

Figure 2.3: DART 2 System Architecture

Chapter 3
Related Work

3.1 Sense-and-Respond Systems

Chandy [26] presents a high-level summary of sense-and-respond (S&R) systems in the context of IT issues. On the most basic level, sense-and-respond systems simply sense the environment and respond appropriately. Rules in S&R systems can be defined using when-then semantics. The when-clause corresponds to the detection of an event, and the then-clause describes the execution of the response. The cost of an S&R system can be broken into three parts: the cost of false positives, the cost of false negatives, and the incremental costs of running the system. Chandy [26] outlines several important characteristics of S&R applications in order to categorize the application space. He presents the following S&R application properties:

Response Type: Is the response human-centered or automated?
Response Time: Is the response executed in minutes or sub-seconds?
Event Rate: Are events generated a few per second or thousands per second?
Condition Complexity: Are when-clauses based on a single event or a history of events? Are when-clauses based on a single stream or the fusion of multiple streams?
Data Structure: Is the data structured in well-defined schema or generally unstructured?
Query Structure: Are queries structured or unstructured?
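The when-then rule semantics described above can be illustrated with a small sketch. The rule constructor and the event fields below are hypothetical, chosen only to make the when-clause/then-clause split concrete.

```python
# Hypothetical sketch of an S&R when-then rule: the when-clause
# detects an event, the then-clause executes the response.
def make_rule(when, then):
    def evaluate(event):
        return then(event) if when(event) else None
    return evaluate

# Example rule: an over-threshold temperature reading triggers an alert.
overheat = make_rule(
    when=lambda e: e["type"] == "temperature" and e["value"] > 100,
    then=lambda e: f"ALERT: node {e['node']} overheating",
)

response = overheat({"type": "temperature", "value": 120, "node": 7})
ignored = overheat({"type": "temperature", "value": 72, "node": 7})  # no response
```

A false positive in this framing is a when-clause that fires on a harmless event; a false negative is one that stays silent on a harmful event, matching the cost breakdown above.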
Our network services support S&R systems of fairly high complexity according to this taxonomy. We support an automated response executed on the order of sub-seconds. Events may be generated at high rates and may compose several distinct streams of historical data that must be fused. The only exception to the trend of higher complexity is that our system utilizes structured data and structured queries.

3.2 Directed Diffusion

Directed diffusion [8] [27] is a sensor network routing protocol based on the publish-subscribe communication model. Its development was guided by the principle of data-centric routing. This is in contrast to traditional address-centric protocols which route packets based on host information. Diffusion also emphasizes in-network processing and localized interactions in order to promote energy efficiency and scalability.

3.2.1 Directed Diffusion Architecture

Protocol Overview

The objective of directed diffusion is to perform multipoint-to-multipoint communication using named data. All routing is based on named data in the form of attribute-value tuples instead of host information as in address-centric protocols (e.g., IP [28], DSR [9], or AODV [29]). Diffusion is also characterized by its emphasis on localized routing. Each node stores routing information only about its immediate (i.e., 1-hop) neighbors; no global information is required. Diffusion sets up subscriptions by flooding interest messages throughout the network. Nodes with matching data publish it by flooding exploratory data messages back through the network to the interested node. The subscribing node then reinforces its fastest neighbor by sending it a reinforcement message. Subsequent nodes recursively forward the reinforcement message to their fastest neighbor until the lowest-latency path back to the publishing node has been reinforced. After a reinforced path has been established, reinforced data messages flow from publisher to subscriber via unicast or efficient multicast.
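The recursive reinforcement walk can be sketched with toy data structures. The tables below are a hypothetical illustration, not the actual diffusion implementation: each node records, per upstream neighbor, the latency of the first exploratory data message that neighbor delivered, and reinforcement follows the fastest neighbor back toward the source.

```python
# Toy sketch of diffusion's reinforcement step (hypothetical tables).
exploratory = {               # node -> {upstream neighbor: latency in ms}
    "sink": {"B": 40, "C": 55},
    "B":    {"A": 25},
    "C":    {"A": 30},
    "A":    {"source": 10},
}

def reinforce(node, tables):
    """Return the reinforced (lowest-latency) path from `node` back
    to the publishing node, choosing the fastest neighbor at each hop."""
    path = [node]
    while node in tables:
        node = min(tables[node], key=tables[node].get)  # fastest neighbor
        path.append(node)
    return path

path = reinforce("sink", exploratory)  # sink -> B -> A -> source
```

Here the sink reinforces B (40 ms beats C's 55 ms), B reinforces A, and A reinforces the source, yielding the lowest-latency reinforced path.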
After the two-phase setup algorithm is complete, data is, in essence, "pulled" from source to sink along the reinforced gradients.

Protocol Details

Diffusion uses four types of messages to establish paths within a network. A sink node subscribes to a data flow by flooding the network with interest messages that name the type of data the sink wants to receive. Intermediate nodes store the interest and record the neighbor from which it was sent. This saved path leading to the sink is known as a gradient. Nodes with data matching the interest will publish it by transmitting exploratory data along the gradients previously created. These publishing nodes are known as source nodes. When the exploratory data arrives at the sink, the sink will reinforce its single fastest neighbor (i.e., the neighbor that delivered the first exploratory data message) by sending it a reinforcement message. Any node receiving a reinforcement message will, in turn, reinforce its fastest upstream neighbor until a reinforced path all the way back to the source is established. Slower paths may be negatively reinforced to remove them from the gradient tables. Subsequent data emanating from the source, known as reinforced data, will be unicast or multicast over the reinforced path to the sink. This two-phase process results in the creation of a multipoint-to-multipoint distribution tree.

3.2.2 Strengths of Diffusion

Data-Centricity

Directed diffusion has generally proven to be well-suited to sensor networks. Its data-centric model performs more efficiently and provides more useful services for many types of sensor network applications (e.g., query processing). The use of named data provides an energy-efficient mechanism for routing data, which avoids the unnecessary complexity of host information.
Since data is the primary concern, it makes sense to use it as the routing criterion instead of host address, a property largely unimportant in sensor networks.

Reactive Nature

The reactive nature of diffusion also supports the goal of energy efficiency. Directed diffusion belongs to a class of protocols that create routes in response to the application's needs instead of proactively finding routes in advance. The reactive property is particularly relevant to sensor networks since they may remain inactive for long periods of time waiting for events of interest.

Localized Interactions

One of the most notable characteristics of diffusion is its fully localized nature. Nodes only require knowledge about their 1-hop neighbors to create gradients. No global information, whether end-to-end or multi-hop, is needed for routing. This means that the routing table scales with the number of neighbors as opposed to the total number of nodes in the network. Node density, not network size, is the determining factor in the gradient table size. This improves the scalability of the protocol with respect to space complexity. The localized nature of diffusion also simplifies the routing protocol since global information is not required to make routing decisions.

Aggregation

Another strength of the diffusion protocol is its support for in-network processing, or aggregation. Since each node understands the attribute-value tuples that compose a data message, any intermediate node may perform data aggregation. While this essentially pushes application-level data down to the network layer, the resulting benefits are significant. Aggregation essentially trades a small amount of latency for reduced communication overhead by consolidating the data from multiple packets into one packet. It has been demonstrated that significant energy savings (up to 42% in [27]) can be achieved by the use of aggregation in diffusion.
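In-network aggregation of this kind can be sketched in a few lines. The message format below is a hypothetical illustration of attribute-value tuples, not diffusion's wire format: an intermediate node consolidates several buffered readings that match one interest into a single upstream packet.

```python
# Minimal sketch of in-network aggregation (hypothetical message format):
# collapse buffered readings for one interest into one upstream packet,
# trading a little buffering latency for fewer transmissions.
def aggregate(readings, op=max):
    return {
        "attribute": readings[0]["attribute"],
        "value": op(r["value"] for r in readings),
        "count": len(readings),  # transmissions saved: count - 1
    }

buffered = [{"attribute": "temperature", "value": v} for v in (71, 75, 73)]
packet = aggregate(buffered)  # one upstream packet instead of three
```

Because any intermediate node understands the tuples, this consolidation can happen anywhere along the path, which is the source of the energy savings cited above.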
Multipoint-to-Multipoint Links

A final strong point for diffusion is its support for various types of communication paradigms. Unlike other network protocols which only provide unicast (1→1) capabilities, diffusion inherently supports multicast (1→N) communication. Additionally, diffusion's publish-subscribe model can create gather distribution trees (N→1) and multipoint-to-multipoint (N→M) dissemination paths. Other protocols may support these modes of communication with higher-level protocols, but diffusion's relatively simple architecture inherently supports all four classes of links.

3.2.3 Weaknesses of Diffusion

Flooding

Perhaps the greatest weakness of directed diffusion is its reliance on flooding for path creation and maintenance. In the case of standard diffusion (the two-phase pull model), both interest and exploratory data messages are flooded throughout the network whenever gradients are established. This results in a huge amount of network traffic every time a new path is established or an old path is refreshed.

Scalability

The cost of flooding rises with the size of the network, so scalability is significantly affected. Diffusion does not effectively support networks with multiple hundreds of nodes because of the huge overhead imposed by flooding across so many nodes. Although its localized nature supports scalability, diffusion's dependence on flooding severely limits the practical size of a network.

Latency of Global Repair

To deal with broken paths, diffusion periodically re-creates routes by performing the same procedure used to initially find routes: global flooding. This mechanism is described in Section 3.2.1. Since diffusion handles failure and mobility with global repair mechanisms, the costs incurred for repair are significant. To reduce the energy costs of global repair, the path maintenance mechanism is performed on a relatively infrequent schedule (e.g., every 60 seconds).
In the worst case, data may be delayed an entire refresh interval before a broken path is repaired. While this may be adequate for some applications, it may be completely unacceptable for others, e.g., time-critical and real-time systems.

Lack of Real-Time Communication Support

Many applications have time deadlines which should be recognized by the network and handled differently. Directed diffusion routes packets in a first-come, first-served fashion; no preference is given to higher-priority flows. While not an inherent weakness, diffusion's lack of support for real-time communication is a deficiency nonetheless.

3.2.4 Route Repair Mechanisms

Standard directed diffusion includes two mechanisms for path repair. Diffusion was designed to handle path failure primarily by the periodic re-creation of gradients using a global mechanism. The designers of diffusion also mention a local repair procedure, but fail to adequately deal with the route repair problem. This section summarizes the costly global repair mechanism and the primitive local repair algorithm proposed by the original designers of the diffusion protocol.

Global Gradient Repair

The standard and currently implemented method of handling broken links in diffusion is global gradient repair. In this method, the sink periodically refreshes the path to the source by flooding the network with interest messages. The source also periodically floods exploratory data back to the sink in order to reinforce the path that is currently fastest. Although this mechanism provides optimal paths, it comes at a high cost in terms of energy and latency. Global repair results in network-wide flooding of interest and exploratory data messages. To lessen this cost, global repair is performed at a fairly infrequent interval. As a result, links may remain broken for an entire gradient refresh interval.
This means that a data flow may fail to report an event for 60 seconds (by default) in the worst case, since no effort is made to repair the link until the next gradient refresh.

Local Gradient Repair

In [30], the designers of diffusion describe a local gradient repair strategy for the protocol. In this method, intermediate nodes may participate in reinforcement. When a node detects degradation in link quality, it can discover a new path to the source and negatively reinforce the degraded path. The problem with this scheme, as pointed out by the authors, is that every node downstream from the break will attempt to find a new path, resulting in a significant waste of resources. The authors suggest that the first nodes after the break "interpolate" data events so that downstream nodes continue to "perceive" a high-quality path and do not initiate unnecessary local repair. This method is far from ideal. It can hardly be considered local repair if intermediate nodes upstream from the break forward reinforcement messages all the way back to the source. On average, the intermediate node must reinforce half the distance between source and sink. In the worst case, the break would be 1 hop away from the sink and the so-called "local" repair would be only 1 hop different from global repair. Furthermore, without some mechanism on the part of the first node downstream from the failure, all the downstream nodes will perform local repair. The interpolation mechanism proposed to solve this problem is somewhat nebulous and largely unspecified.

3.3 Efficient Flooding

Due largely to its simplicity, blind flooding is prevalent among reactive protocols; however, because blind flooding often results in duplicate packet delivery, many resources can be conserved if a more intelligent approach to flooding is utilized. Efficient flooding techniques use topological information or heuristics to decrease the number of redundant packets transmitted during a flood.
There are a variety of approaches to this problem, but we divide them into the two types proposed in [6]: heuristic-based and topology-based. Heuristic-based protocols use some sort of rule to determine whether a flooded packet should be forwarded. The topology-based algorithms make use of connectivity information to deduce which nodes need to act as packet forwarders and which nodes can drop flooded messages. We further divide the heuristic-based protocols into four subcategories and the topology-based protocols into three subcategories. In the following sections we examine each family of efficient flooding protocols and describe a few of their most important examples.

3.3.1 Heuristic-based

One approach to efficient flooding is to use heuristics to reduce the number of rebroadcasts. Heuristics based on probabilities, counters, distance, and location have been proposed [10] [31]. The main advantage of the heuristic approach is its simplicity. The primary disadvantage is the challenge of appropriately setting the parameters of the heuristic.

Probabilistic

The probabilistic scheme [10] is similar to flooding, except that nodes rebroadcast flooded packets according to some pre-determined probability p. Thus, nodes will forward flooded packets with probability p and drop flooded packets with probability 1 − p. In blind flooding, p is always 100%. In sufficiently dense networks, lower forwarding probabilities may be used without adversely affecting delivery effectiveness. In sparse networks, however, a greater probability is required for every node to receive the message. The Fireworks protocol [32] slightly modifies the simple probabilistic scheme just described.
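The simple probabilistic rule amounts to a single biased coin flip per flooded packet; a minimal sketch (the function name is ours):

```python
import random

# Sketch of the probabilistic forwarding rule: rebroadcast a flooded
# packet with probability p, drop it with probability 1 - p.
# Blind flooding is the special case p = 1.0.
def should_forward(p, rng=random.random):
    return rng() < p

# p = 1.0 always forwards, since random() draws from [0, 1);
# p = 0.0 always drops.
```

The whole difficulty noted above lies in choosing p: too low in a sparse network and the flood dies out before reaching every node, too high in a dense one and most rebroadcasts are redundant.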
When nodes receive a flooded packet, they broadcast it to all their neighbors with probability p and unicast it to c of their N neighbors (c < N).

A gateway-ready node declines the gateway role, becoming an ordinary node, when enough of its neighbors already serve as gateways:

α · num(GW) + β > num(CH)    (4.1)

Note that in Equation 4.1, num(GW) is the number of neighbors known to be gateways, num(CH) is the number of neighbors known to be cluster heads, and α and β are tunable parameters (α, β ≥ 0). The values of α and β should be chosen based on factors such as channel quality, noise level, and traffic patterns [5]. They may be locally adjusted to provide better adaptability and flexibility. This gateway selection procedure is fully distributed and requires only local information. It relies on overheard packets instead of active packet exchanges (e.g., cluster head-list exchanges). The disadvantage of this approach is that network connectivity may be affected by wrong values of the parameters. If the parameters are too aggressive in reducing the number of gateways, the topology may be partitioned.

Ordinary Nodes Drop Flooded Packets

The behavior of ordinary nodes lies at the heart of flooding reduction. A node that is neither a cluster head nor a gateway becomes an ordinary node. When a node identifies itself as an ordinary node, it no longer forwards flooded packets. All flooded packets received by an ordinary node can safely be dropped because neighboring nodes will have already received the packet (either from a CH or a GW). Hence, flooding efficiency is dependent on the number of ordinary nodes that can be found. In a sufficiently dense network, a relatively large number of ordinary nodes should be found.

4.1.2 Protocol Details

PC defines seven clustering states for nodes: two internal states and five external states. Internal states are entered when a packet is received and serve as a tentative role for the node while the packet is being processed. The two internal states are Gateway-Ready (GW READY) and Cluster Head-Ready (CH READY). Nodes enter external states when a packet is sent.
These are externally visible states which are communicated to neighboring nodes and used in making adjustments to node state. The external states are Initial (IN), Cluster Head (CH), Full Gateway (FULL GW), Ordinary Node (OR), and Distributed Gateway (DIST GW).

Initial

On startup, nodes enter the Initial state. A node in Initial state does not belong to any cluster. Nodes move from the Initial state to one of the two internal states when a packet is received. PC maintains soft-state clusters using an implicit timeout scheme. If a node does not receive any packets within time interval t_cluster_timeout, it reverts back to Initial state. Upon future reception of packets, the node will restart the clustering algorithm.

Cluster Head-Ready

Cluster Head-Ready is the internal state of potential cluster heads. A node in Initial state changes its state to CH READY only when it has received a packet from a non-CH node. Simultaneous cluster heads can sometimes occur due to topology changes and packet delay. To resolve these conflicts, PC uses a Lowest ID (LID) algorithm to select the node with the lowest ID to serve as cluster head. If a cluster head receives a packet from another cluster head with a lower ID, it gives up its role and enters the Gateway-Ready state. The node which loses the LID competition sends a CH Give-Up message to its neighbors informing them of the change in status.

Gateway-Ready

The Gateway-Ready state is the internal state of potential gateways (both full and distributed). A node enters GW READY state when it receives a packet from a CH. A GW READY node will become a gateway or an ordinary node depending on the gateway selection heuristic. A node will change from Gateway-Ready to FULL GW or DIST GW if an insufficient number of its neighbors are already gateways.

Cluster Head

The Cluster Head state is the external state of cluster heads.
A node enters the CH state from the CH READY state if it wins the Lowest ID competition or if it has not overheard any other cluster heads.

Full Gateway

Nodes in the Full Gateway state directly connect two cluster heads. A full gateway is thus a member of two clusters. Full gateways announce the IDs of the two CHs that they connect. A node that is reachable from two CHs may declare its role as FULL GW only if it has not heard from another FULL GW node announcing the same pair of IDs. In this way, gateways also follow the first-declaration-wins rule. If two nodes concurrently declare themselves as FULL GW for the same pair of CHs, the node with the lowest ID will win and the loser will become an ordinary node. Full gateways are in the external state FULL GW. Figure 4.4 illustrates a full gateway.

Figure 4.4: Full Gateway

Distributed Gateway

A node in the Distributed Gateway state connects two clusters, but is not directly connected to one of the cluster heads. A distributed gateway is composed of two nodes working together to serve as a gateway between two clusters. This allows cluster heads that are two hops apart to be connected. Figure 4.5 illustrates a distributed gateway. Notice both the full and distributed gateways in the example topology in Figure 4.3.

Figure 4.5: Distributed Gateway

Ordinary Node

Ordinary nodes are nodes that do not forward packets. A node enters the OR state from the GW READY state based on the gateway selection heuristic. A node changes from GW READY to OR if enough of its neighbors are gateways, as determined by the gateway selection heuristic. When a node can hear from two or more CHs and every pair of announced CHs is connected by a FULL GW, it will become an OR.

Incoming Packet Processing

The transition state diagram shown in Figure 4.6 summarizes the passive clustering algorithm. When a packet is received at a node, the state of the node will be changed to one of the two internal states: CH READY or GW READY.
An Initial node will change its state to CH READY if the packet is from a non-CH (1). If the packet is from a gateway or an initial node, the node will enter the GW READY state (5). CHs revert to CH READY (3), and GWs (9) and ORs (8) revert to GW READY, upon each incoming packet. The receiving node always updates its neighbor lists based on the state of the sending node, which is contained in the message. By stepping back to the GW READY state (8, 9, and 11), nodes can adjust their state dynamically. For example, if the number of gateways has changed, an ordinary node may promote itself to gateway. When the soft state of PC expires, nodes in each state revert back to Initial (12, 13, 14, and 15). As a result, new clusters may be composed in response to subsequent packet flooding. This potential for re-structuring is beneficial for sensor networks since cluster head responsibilities will be handled by different nodes throughout the life of the system, thus distributing energy consumption.

Figure 4.6: PC Transition State Diagram

Outgoing Packet Processing

When a packet is ready to be sent, the node changes its PC state from an internal state to one of the five external states. If it has not heard from any other CHs, a candidate CH (CH READY) will change its state to CH (2) when it has packets to send. Otherwise it will change to GW READY. Nodes in the GW READY state will become FULL GW, DIST GW, or OR. If the number of known CHs is greater than one and some pair of CHs is not yet connected by a gateway, the node becomes a FULL GW (7). If a DIST GW has announced another cluster not known by the current node and no FULL GW connects them, then the node becomes a DIST GW (10). Otherwise, it becomes OR (6). If the number of known CHs equals one, then the node becomes either a DIST GW or an OR. If a node hears a DIST GW announce a CH other than its own CH, or if there is no DIST GW in the cluster, it will become a DIST GW (10). Otherwise, the node will become an ordinary node (6).
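The gateway-ready branch of these outgoing-packet rules can be sketched as follows. The data structures are our own illustration, not the protocol's wire format, and the distributed-gateway cases are omitted for brevity.

```python
from itertools import combinations

# Hypothetical sketch of the GW READY outgoing decision described
# above: with more than one known CH, become a full gateway if some
# pair of CHs is not yet bridged (transition 7), otherwise an
# ordinary node (transition 6).
def gw_ready_next_state(known_chs, bridged_pairs):
    """known_chs: set of overheard cluster-head IDs;
    bridged_pairs: set of frozensets, CH pairs already joined by a
    full gateway (learned from overheard FULL GW announcements)."""
    if len(known_chs) > 1:
        for pair in combinations(sorted(known_chs), 2):
            if frozenset(pair) not in bridged_pairs:
                return "FULL_GW"   # (7): an unbridged pair remains
        return "OR"                # (6): every pair already bridged
    return "OR"  # single known CH; DIST_GW cases omitted in this sketch
```

Because the decision uses only overheard announcements, it preserves PC's property of requiring no dedicated control packets.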
4.1.3 Adaptation to Diffusion

PC state information must be appended to all flooded packets. To apply PC in a diffusion network, the state information is added to interests and exploratory data. Recall that diffusion creates gradients and periodically refreshes them. After this initial setup phase, flooding is no longer necessary since data flows over reinforced paths. Since PC piggybacks cluster status information only on flooded packets, PC creates the clustered structure during the initial setup phase of the gradient cycle. For the remainder of the cycle, no PC state information is transmitted, so PC is essentially inactive. PC is only active during periodic gradient refresh, when diffusion floods the network with interests and exploratory data.

Since PC affects the route setup, it is possible for suboptimal routes to be discovered. The structured topology created by PC may exclude the optimal route between source and sink. The end-to-end route from source to sink will only include nodes identified as cluster heads and gateways; no ordinary nodes will be on this path. This potential for suboptimal routes is one of the inherent trade-offs of PC as compared to global flooding: PC improves flooding efficiency but may result in slightly longer routes.

4.2 Repair Mechanism (LRDD)

To deal with the shortcomings in diffusion's ability to adapt to failure and mobility, we have developed a mechanism to efficiently handle route repair. We call our local repair protocol for directed diffusion LRDD. Our solution emphasizes truly localized repair in order to reduce latency and energy expenditure. Its basic structure is similar to the local repair algorithm used in ADMR [47], but its methods noticeably differ from ADMR since it is tailored to directed diffusion. We divide the local repair problem into three phases: break detection, break localization, and localized gradient repair. We describe how LRDD handles each of these phases in this section.
4.2.1 Break Detection

The first step in adapting to node failure or mobility is detecting the link breakage. This may be accomplished in a variety of ways. We assume that appropriate algorithms may be used to reliably detect a link breakage; our focus is on handling the break after it is detected, not on adaptively detecting path breakage. For our experiments, we used a fixed event rate known a priori to detect breaks. We identified a link as broken when no data was received from a flow after an entire event interval had elapsed.

4.2.2 Break Localization

Once a break has been detected, the break localization phase begins. The goal of this phase is to identify the node immediately downstream from the broken path. This node will initiate the localized gradient repair algorithm described in the next section. All intermediate nodes that detect a break send a repair notification message to their 1-hop downstream neighbors along the gradients for the missing data. Every intermediate node downstream from the break should send a repair notification, and every intermediate node except the one nearest to the break should also receive a repair notification. If a node does not receive a repair notification within t_repair seconds after sending one, it must be the nearest node, so it initiates the localized gradient repair.

4.2.3 Localized Gradient Repair

The heart of LRDD is the localized gradient repair phase. The goal of this phase is to find and create new gradients in the area near the break by using the same basic mechanisms as global gradient repair. We first describe several potential mechanisms for restricting flooding to a localized region and then explain the inner workings of the repair process itself. Figure 4.7 illustrates the break localization and localized gradient repair phases of the algorithm.

Local Flooding

In order to restrict the flooding required to find new paths, we limit the packet forwarding in one of several ways.
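The timeout-based break detection of Section 4.2.1 can be sketched as follows. This is a minimal sketch, assuming a fixed event rate known a priori; the FlowMonitor name, its fields, and the slack factor k are illustrative (the experiments above use one full event interval, and Section 5.2.2 uses a threshold of 1.5 times the event rate).

```cpp
#include <cassert>

// Declare a link broken when no data arrives within k * eventInterval.
struct FlowMonitor {
    double eventInterval;  // expected seconds between data events (T_r)
    double lastReceive;    // timestamp of the most recent data packet
    double k;              // slack factor (e.g., 1.0 or 1.5)

    void onData(double now) { lastReceive = now; }

    bool linkBroken(double now) const {
        return (now - lastReceive) > k * eventInterval;
    }
};
```

Each incoming data packet resets the timer; only silence longer than the threshold triggers the break localization phase.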
Possible methods include simple hop-limited flooding or limiting the reconnect packets to the 1-hop neighbors of the failed node.

Figure 4.7: Local Repair for Directed Diffusion

The most common approach to localized flooding among repair protocols is hop-limited flooding. In this approach, a hop count (or time-to-live) field is incremented on each hop. When the hop count exceeds a threshold value, the flooded packets are dropped. Another straightforward approach is to limit the flooding to 1-hop neighbors of the failed node. This can easily be implemented by including the ID of the failed node in the repair packets and restricting flooding to nodes that have overheard packets from that failed node. A third and more sophisticated strategy is to limit flooding to nodes in the one or two clusters around the failed node.

Reconnect Interests

The first step in localized gradient repair is the local flooding of reconnect interest messages. These interest messages for the broken data flow are essentially searching for nodes upstream from the break which are still receiving data. Reconnect interests are only transmitted in a limited neighborhood (as defined by the localized flooding algorithm) around the node nearest to the break (identified during the localization phase). This originator node acts like the sink in global gradient repair; for this reason, we label it the proxy sink in local gradient repair. Nodes outside the area determined by the local flooding algorithm drop reconnect interests, enforcing the region boundaries. Reconnect interests are flooded in the region near the break in search of upstream nodes still connected to the data flow.

Reconnect Exploratory Data

In response to reconnect interests, upstream nodes on the data path that are still receiving data transmit reconnect exploratory data messages to neighbors that send reconnect interests. Reconnect exploratory data is exploratory data that is sent back to the proxy sink.
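The two simpler local-flooding bounds described above, hop-limited flooding and the 1-hop neighbor rule, reduce to a per-node forwarding decision. The sketch below illustrates both under assumed names (ReconnectPacket, shouldForward); it is not the actual LRDD code.

```cpp
#include <cassert>

struct ReconnectPacket {
    int hopCount;      // incremented at each hop (TTL-style)
    int failedNodeID;  // carried so neighbors of the failed node can match
};

// Decide whether this node forwards the reconnect packet.
// useNeighborRule selects the 1-hop variant; otherwise hop-limited.
bool shouldForward(ReconnectPacket& pkt, int hopLimit,
                   bool heardFailedNode, bool useNeighborRule) {
    if (useNeighborRule)
        // 1-hop variant: forward only if we overheard the failed node.
        return heardFailedNode;
    pkt.hopCount++;                   // increment the hop count field
    return pkt.hopCount <= hopLimit;  // drop once the limit is exceeded
}
```

Nodes for which shouldForward() returns false simply drop the packet, enforcing the region boundary.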
In order to perform the most localized repair, the node directly upstream from the break should be found. This node acts like the source in global gradient repair, so we call it the proxy source in local gradient repair. To achieve this behavior, reconnect exploratory data messages are only sent from nodes that have not overheard reconnect exploratory data. Thus, the first node to send reconnect exploratory data becomes the proxy source. In summary, only nodes along the original path can send reconnect exploratory data. The node nearest upstream to the break should be identified as the proxy source since it should be the first node along the existing path to receive reconnect interests. LRDD will still function if another node further upstream from the break is identified as the proxy source. The proxy source sends reconnect exploratory data back to the interest initiator (the proxy sink).

Reconnect Reinforcement

In global gradient repair, the sink reinforces the fastest path over which the exploratory data is received. Correspondingly, in the case of local repair, the proxy sink sends reconnect reinforcement messages to the first neighbor that delivers reconnect exploratory data. In turn, this neighbor node reinforces its fastest neighbor, and so on, all the way back to the proxy source. While this path is probably not optimal, it serves as a temporary fix until the next global gradient refresh. Since the search for the new path was restricted to a relatively small region, it is almost identical to the original optimal path found by diffusion except for the few hops where the repair was performed. As a result, the repaired path should provide a reasonably low-latency route from source to sink.

4.3 Real-Time Communication Mechanism (RTDD)

To support timely communication, we have developed a real-time communication service for directed diffusion similar to RAP [56]. Recall from Section 3.5.1 that the RAP architecture defines a five-layer network stack.
Many of the layers in the RAP network stack are handled natively by diffusion. The top layer of RAP, a query-event service that allows events to be registered and triggered, is inherently provided by the data-centric, publish-subscribe nature of diffusion. Diffusion operates without a transport layer, so RAP's Location-Addressed Protocol is unnecessary in diffusion. Diffusion also handles the responsibilities assigned to Geographic Forwarding in RAP since diffusion routing is based on attribute vectors. Although routes in diffusion are more costly to establish due to global flooding, no location information is required; RAP, in contrast, requires each node to know its own location, which demands costly GPS hardware on every node. The core component of RAP is Velocity Monotonic Scheduling (VMS), the distance-aware and deadline-aware scheduling policy. VMS prioritizes messages by computing a velocity based on the packet's distance to its destination and its temporal deadline. Although packets are prioritized at both the MAC and network levels in RAP, our implementation only performs prioritization at the network layer. While this may decrease the ability of RTDD to prioritize packets, it provides a clean separation of layers (i.e., RTDD, a network-layer protocol, does not require changes in the MAC layer).

The two-phase pull model of directed diffusion may be easily extended to support real-time data flows. The primary additions to diffusion required for RTDD are a prioritized queue and a scheduling policy. We have developed both static and dynamic scheduling policies for RTDD equivalent to SVM and DVM in RAP. In addition, we have extended the protocol to compute priority without requiring each node to possess location information. We call these protocols Static Absolute Time (SAT) and Dynamic Absolute Time (DAT) since they are based on absolute time. We have also developed protocols based on relative time differences: Static Relative Time (SRT) and Dynamic Relative Time (DRT).
The advantage of using relative time is a decreased dependence on global time synchronization. RTDD can use any of the six algorithms for computing priority: SVM, DVM, SAT, DAT, SRT, or DRT. If location information is known by each node, SVM or DVM may be used to prioritize packets; otherwise, one of the time-based protocols is used.

4.3.1 SVM and DVM

To provide support for deadlines, RTDD simply adds a few new attributes to a data flow. Before interests are flooded from the sink node, the location of the sink is added to the packet as a new attribute/value tuple. Exploratory data then flows back to the sink from the source. Next, reinforcements are sent along the fastest path between sink and source. As data is published at the source, RTDD computes the priority of the packet based on the deadline (supplied by the application) and the location of the sink (supplied by the interest packet). The priority is computed as the distance between the source and sink divided by the deadline (Equation 4.2):

V = \frac{\|p_{src} - p_{dst}\|}{T_{deadline}} \qquad (4.2)

For DVM, this value is updated at each intermediate hop based on the current progress of the packet (Equation 4.3):

V = \frac{\|p_i - p_{dst}\|}{T_{deadline} - T_i} \qquad (4.3)

To enable this dynamic calculation, the source must timestamp the data packets before they are transmitted. This allows the intermediate nodes to compute the time elapsed since the packet was sent (T_i in Equation 4.3). Packets are then queued in priority order at each hop and retransmitted in prioritized order. Note that DVM requires intermediate nodes to know their location so that the updated priority may be computed. Also notice that DVM assumes global time synchronization in order to compute time differences at each intermediate node.

4.3.2 SAT and DAT

If location information is not available, a time-based approach is used to estimate the distance from source to sink. Instead of appending location information to interest messages, nodes add timestamps to the packets.
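The VMS velocities of Equations 4.2 and 4.3 above can be computed in a few lines. This is a sketch under assumed names (Pos, svm, dvm), using 2-D positions; higher velocity means higher priority.

```cpp
#include <cassert>
#include <cmath>

struct Pos { double x, y; };

// Euclidean distance between two positions.
double dist(Pos a, Pos b) { return std::hypot(a.x - b.x, a.y - b.y); }

// SVM (Equation 4.2): fixed velocity computed once at the source.
double svm(Pos src, Pos dst, double deadline) {
    return dist(src, dst) / deadline;
}

// DVM (Equation 4.3): recomputed at hop i from the remaining distance
// and the remaining time (deadline minus elapsed time T_i).
double dvm(Pos here, Pos dst, double deadline, double elapsed) {
    return dist(here, dst) / (deadline - elapsed);
}
```

A packet that has made little progress relative to its deadline gets a larger DVM velocity, and therefore moves ahead in the queue.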
Since diffusion creates routes using a two-phase packet exchange, the time delay between source and sink can be estimated without introducing significant overhead. When a reinforcement message is received at the source, the time delay between the source sending the exploratory data and the sink sending the reinforcement message is computed. We assume this time difference is proportional to the distance from source to sink. Note that this approach also assumes time synchronization among the nodes. Several time synchronization protocols for sensor networks have been proposed [62] [63] [64] [65], so this requirement is not infeasible.

Like the location-based protocols, two versions of the absolute-time-based algorithm are defined. The priority may be calculated statically at the source (SAT) or dynamically at each hop (DAT) based on absolute time. In the former case, the priority is computed according to Equation 4.4:

P = \frac{t_{sink} - t_{source}}{T_{deadline}} \qquad (4.4)

In this equation, t_source is the time the exploratory data packet was sent from the source, t_sink is the time the reinforcement message was sent from the sink, and T_deadline is the deadline of the data flow (in units of time, not a timestamp). The time-based protocols compute a priority value that is a ratio of times.

In the dynamic case, the priority of a packet is recalculated at each hop. If the source timestamps exploratory data messages and the sink timestamps reinforcement messages, then each node along the reinforced path can calculate the time delay from itself to the sink. Each node along the data path must cache the delay between the most recent exploratory data and the reinforcement message in order to support distance awareness. Data messages must also be timestamped by the source so that intermediate nodes can calculate elapsed time for a given packet. The elapsed time is subtracted from the deadline to gauge the deadline urgency of the message. DAT calculates this priority value using Equation 4.5.
P = \frac{t_{sink} - t_i}{T_{deadline} - T_{elapsed,i}} \qquad (4.5)

In this case, t_i represents the time the exploratory data packet was sent from the intermediate node i, t_sink is the time the reinforcement message was sent from the sink, T_deadline is the deadline of the data flow, and T_elapsed,i is the time elapsed in sending the data packet to hop i. Elapsed time is computed as T_elapsed,i = t_now - t_data, where t_now is the current time and t_data is the time the data packet was sent from the source. To simplify the protocol, we estimate the delay between the intermediate node and the sink (the numerator of Equation 4.5) as the total end-to-end delay minus the elapsed time; more formally, t_sink - t_i \approx (t_sink - t_source) - T_elapsed,i. Thus, Equation 4.5 becomes

P = \frac{(t_{sink} - t_{source}) - T_{elapsed,i}}{T_{deadline} - T_{elapsed,i}} \qquad (4.6)

4.3.3 SRT and DRT

As an improvement upon SAT/DAT, we have also developed variants of RTDD which compute priorities based on relative time differences. The static and dynamic versions of the relative-time-difference protocols are SRT and DRT. The primary advantage of using relative time is the reduced dependency on time synchronization. SRT and DRT use the round-trip time as a measure of path length (as opposed to end-to-end delay in SAT/DAT). In SRT, the source stores timestamps when it sends exploratory data and when it receives the reinforcement message. Since both time measurements are taken at the source, global time synchronization is not necessary. Similar to SAT, the priority of packets in SRT is computed statically at the source as the quotient of the round-trip delay and the deadline, as shown in Equation 4.7:

P = \frac{t_{reinforcement,source} - t_{exp.data,source}}{T_{deadline}} \qquad (4.7)

DRT computes the priority dynamically at each hop based on the round-trip time and the elapsed time.
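The time-based priorities of Equations 4.4, 4.6, and 4.8 reduce to simple ratios of time differences. The sketch below shows all three under assumed names (sat, dat, drt); the stateless DAT estimate replaces the remaining delay with the end-to-end delay minus the elapsed time, exactly as in Equation 4.6.

```cpp
#include <cassert>
#include <cmath>

// SAT (Equation 4.4): static priority computed once at the source.
double sat(double tSink, double tSource, double deadline) {
    return (tSink - tSource) / deadline;
}

// DAT (Equation 4.6): stateless per-hop priority using the estimate
// remaining delay = end-to-end delay - elapsed time.
double dat(double tSink, double tSource, double deadline, double elapsed) {
    return ((tSink - tSource) - elapsed) / (deadline - elapsed);
}

// DRT (Equation 4.8): same shape, but the round-trip time is measured
// entirely at the source, so no global clock is required for it.
double drt(double rtt, double deadline, double elapsed) {
    return (rtt - elapsed) / (deadline - elapsed);
}
```

As a packet nears its deadline, the shrinking denominator raises its priority at each successive hop.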
At hop i, the priority is computed according to Equation 4.8:

P = \frac{(t_{reinforcement,source} - t_{exp.data,source}) - T_{elapsed,i}}{T_{deadline} - T_{elapsed,i}} \qquad (4.8)

Note that in Equations 4.7 and 4.8, t_reinforcement,source - t_exp.data,source represents the time between when the source sends exploratory data and when it receives a reinforcement, i.e., the round-trip time. Also note that all the dynamic versions of RTDD (DVM, DAT, and DRT) require time synchronization in order to calculate the elapsed time.

The time-based protocols significantly enhance the applicability of RTDD to various network environments. The time-based techniques achieve the same goal, packet prioritization based on both distance and deadline, but do so without the strong localization requirement of SVM and DVM. The numerators in Equations 4.4-4.8 correspond to distance awareness, and the denominators encapsulate deadline awareness. Although our location-free design requires more communication overhead than the location-based algorithms, the communication is essentially free since the timestamp information is piggybacked on the packets involved in diffusion's two-phase path discovery protocol. No additional packets are required, only a slightly increased packet size. This trade-off may be advantageous for applications where location information is not available but time synchronization is possible.

Chapter 5
Network Services Design

5.1 Design Principles

In designing the network services, we were guided by several principles. Our primary design goals were energy efficiency, scalability, localization, distributability, real-time communication, and reactivity. Notice that our design principles are strongly correlated with the strengths of directed diffusion. In this section, we describe each of these design goals and explain their implications for our design decisions.

5.1.1 Energy-efficient

Power consumption is of paramount importance in sensor networks since devices usually cannot easily be recharged.
Consequently, our design incorporates several features that promote energy efficiency. The primary purpose of PCDD, and of the local flooding involved in LRDD, is to save energy by minimizing communication. Communication is the primary energy expense in sensor networks, so reduced transmissions equate to saved energy. The energy required to transmit one bit is several orders of magnitude larger than the energy needed to perform one operation: according to [66], one ground-to-ground transmission of 1 kb over 100 m expends as much energy as processing 300 million instructions. Hence, reducing unnecessary communication is essential for sensor networks. PCDD accomplishes this by reducing the number of messages involved in route creation; LRDD reduces the number of messages required for route maintenance. The design of RTDD also emphasizes energy efficiency in that no new communication overhead is required: data for setting up the real-time protocol is piggybacked on top of the standard two-way message exchange used by diffusion for route discovery and refresh. Thus, all three network services work toward the goal of energy efficiency.

5.1.2 Scalable

Another general goal of our system, and of all sensor networks, is scalability. Protocols should scale up to networks with large numbers of nodes. The reliance of standard directed diffusion on global flooding greatly limits the scalability of the protocol since flooding is so costly for large networks. PCDD and LRDD both address the goal of scalability by reducing unnecessary transmissions. As the network size increases, these unneeded communications lead to network congestion; in sufficiently large networks, flooding-induced congestion can completely cripple the network. Hence, efficient flooding is vital to the robust operation of large-scale networks. The flooding reduction mechanisms incorporated into PCDD and LRDD support the goal of scalability.

5.1.3 Localized

Localized protocols maintain information about one-hop neighbors only.
This significantly reduces the amount of state information that must be stored and hence improves the scalability and simplicity of the algorithm. By decreasing the complexity of an algorithm, the principle of localization simplifies the design and implementation. One of the best properties of directed diffusion is its completely localized nature. Since our network services are deeply integrated into diffusion, they complement its localized aspects by design. Specifically, PCDD exhibits this localized nature in that nodes choose their state based only upon the states of their one-hop neighbors; two-hop neighbor information is not required. This avoids the continual exchange of Hello messages, greatly reducing the communication overhead of PCDD compared to other efficient flooding algorithms. LRDD also supports the goal of localized interactions by confining its repair work to the area immediately surrounding the failed node. Thus, the advantages of localized interactions are preserved by our algorithms.

5.1.4 Distributed

The distributed nature of sensor networks is one of their most challenging and powerful characteristics. The goal is to fully distribute algorithms over all the nodes in the network, in contrast to typical algorithms which utilize the client/server model. Our protocols address this challenge by distributing tasks among all the nodes. This is particularly evident in PCDD, where the clustered structure is created dynamically according to the first-declaration-wins rule. Unlike many traditional algorithms, cluster formation in PCDD is completely distributed. Similarly, in LRDD, any node may potentially detect and repair a breakage; in this sense, all nodes act as adaptation servers, forming a completely distributed adaptation service. LRDD distributes the responsibility for repair among all the nodes. The design of our protocols emphasizes distributed operation.

5.1.5 Real-Time

Timely communication is an important goal for many applications.
RTDD aims to achieve the goal of real-time communication. The prioritized queue gives preference to more urgent packets, helping them meet their deadlines. RTDD supports hard deadlines by dropping packets whose deadlines have been exceeded. This further reduces congestion and gives other packets a greater chance of meeting their deadlines. RTDD significantly enhances the capabilities of directed diffusion, allowing the network to support time-critical data flows.

5.1.6 Reactive

Our final design goal is to develop reactive protocols and mechanisms. In general, reactive protocols are more suitable for sensor networks than proactive protocols since they are more energy efficient. Reactive protocols respond to events instead of proactively maintaining state information ahead of time. In energy-constrained systems, it makes little sense to continuously maintain routes if no data is being transferred over those routes. This is especially relevant in sensor systems, which infrequently detect events of interest.

PCDD and LRDD particularly incorporate reactive features in their designs. PCDD reactively creates clusters in response to, and through the use of, ongoing packet transmissions. Since packets are not exchanged beforehand, no energy is consumed until it is necessary. LRDD responds to broken links by reactively repairing them. Unlike standard directed diffusion, which periodically repairs broken links, LRDD is fundamentally a reactive technique. In actuality, LRDD is a reactive version of diffusion's own route discovery algorithm, performed locally instead of globally.

5.2 Design Decisions

5.2.1 Clustering Mechanism (PCDD)

The compatibility of passive clustering and diffusion was recognized by Handziski et al. [67], who first implemented passive clustering over directed diffusion. Passive clustering is especially well-suited to sensor networks due to its localized, reactive, and fully distributed nature.
All state changes are made using local knowledge gained by overhearing neighboring nodes. Passive clustering's reactive cluster formation is well-matched to diffusion's reactive publish-subscribe model. Furthermore, the fully distributed nature of passive clustering maps nicely to any sensor network routing protocol, especially diffusion. No client/server message exchanges are needed. Passive clustering does not need periodic messages, but instead takes advantage of existing packets. Finally, the protocol is very resource-efficient regardless of the size or density of the network.

5.2.2 Repair Mechanism (LRDD)

Break Detection

While break detection was not the focus of our research, we considered several potential methods for detecting a broken link. One approach is to calculate an expected time for data events to be received from each neighbor. The source may monitor its output and calculate an outgoing event rate for each gradient; it then appends this event rate to each data packet. Intermediate nodes need only store the data rate of the packet and the time it was received. If the interval has been exceeded, a break is detected. Another strategy for break detection is for intermediate nodes to calculate expected receive times based on incoming packet reception rates: event intervals are calculated by monitoring the input from each neighbor at intermediate nodes. The advantage of this approach is that it is completely localized and distributed. Another possibility is to use physical layer metrics (e.g., signal strength or SNR) to detect failing links. Given event rate T_r, we detected link breakage when a packet was not received within 1.5 T_r. More complex event interval determination algorithms should be investigated for production systems.

Break Localization

Break localization may be carried out differently depending on the break detection scheme used.
If physical layer metrics are used to identify breakages, then our break localization scheme may be completely unnecessary. Our design assumes no access to physical layer information and no explicit network layer acknowledgments. Instead, our scheme utilizes an implicit timeout procedure to detect link failure. The advantage of this approach is that it may be broadly used regardless of physical or MAC layer differences. Since no physical or MAC layer assumptions are made, our repair protocol supports a wide variety of architectures.

Local Gradient Repair

In order to restrict the flooding required to find new paths during local gradient repair, we restrict packet forwarding in one of several ways. Flooding boundaries may be found using the clusters created through the passive clustering protocol, a hop-limited flood, or the failed node ID. These mechanisms restrict the flooding of interest and exploratory data to a limited area.

Our design separates the local flooding algorithm from LRDD. By decoupling the local flooding mechanism from the repair protocol, we gain several benefits. First, from a software engineering perspective, the modularized software is easier to write, debug, and maintain. Second, our repair algorithm can be paired with any local flooding strategy; thus, our design may easily be extended by more complex local flooding algorithms. Third, the decoupling allows multiple local flooding algorithms to be used simultaneously. Multiple algorithms could be stacked on top of each other or used in different regions of the same network. For example, an algorithm could be chosen adaptively based on local conditions, e.g., network density or congestion.

5.2.3 Real-time Communication Mechanism (RTDD)

The prioritized queue is the essential component of RTDD. Implementing it proved to be a challenging task because several different approaches were possible. We recognized three options for the implementation of a prioritized queue in the diffusion API:

1. Delay-Based: Set event delay times in proportion to priority levels.
2. Block-Based: Prioritize all events that have expired during the receiving period.
3. Timer-Based: Send the highest priority packet after the expiration of a recurring send timer.

The delay-based scheme uses priority to compute the delay time associated with a particular packet. The delay time is inversely proportional to the priority: high priority packets have a lower delay, while low priority packets are assigned a longer delay time.

The second option is to prioritize blocks of packets that have expired timers. Diffusion switches from receiving to sending when there are no packets to receive. It then processes all expired timers, i.e., all packets in the event queue with expired send times. This block of packets is usually sent in the order it was received; using the block-based approach, the block of expired packets is instead sent in priority order.

Finally, the timer-based approach uses a timer to repeatedly send one packet every t_timer milliseconds. When packets are received, they are added to a queue in priority order. In a separate thread, a recurring timer sends the highest priority packet each time the timer expires.

The delay-based approach has one major drawback: it delays low priority packets even when there are no high priority packets to be sent. Hence, low priority packets are penalized unnecessarily and experience greater latency as a result. The block-based scheme also presents significant difficulties. It prioritizes within very small blocks of packets because the network so frequently switches between send and receive modes. This leads to a very small granularity of prioritization which may be almost imperceptible to the application; the prioritization occurs at such a minute level that it provides no benefit whatsoever. Because of these problems, we chose to implement the timer-based algorithm.
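The timer-based design described above can be sketched with a standard priority queue: arrivals are enqueued by priority, and each timer tick dequeues and transmits the single highest-priority packet. The SendQueue name and the integer packet IDs are illustrative, not from the actual implementation.

```cpp
#include <cassert>
#include <queue>

struct Packet {
    double priority;  // e.g., a VMS velocity or time-based ratio
    int id;
    // std::priority_queue keeps the largest priority on top.
    bool operator<(const Packet& o) const { return priority < o.priority; }
};

class SendQueue {
    std::priority_queue<Packet> q;
public:
    void enqueue(Packet p) { q.push(p); }

    // Called each time the recurring send timer fires; returns the id
    // of the packet handed to the radio, or -1 if the queue is empty.
    int onTimer() {
        if (q.empty()) return -1;
        Packet p = q.top();
        q.pop();
        return p.id;
    }
};
```

Because a packet is sent on every tick regardless of its priority, low priority packets are never artificially delayed when no higher priority traffic is waiting, which is exactly the flaw of the delay-based scheme.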
While this results in the addition of a timeout parameter, it is superior to the other two approaches since it effectively prioritizes packets without unnecessarily delaying low priority packets. We describe the implementation of this approach in more detail in Section 6.3.3.

The time-based protocols compute priority based on the deadline, as given by the application, and the distance from source to sink, as estimated by communication delay. Although several message exchanges occur in diffusion route creation, we chose to measure the time delay of the exploratory data message as it is sent from source to sink. We chose this message transmission because it follows the same path and direction as the actual data.

DAT and DRT recalculate the priority of a packet at each intermediate node based on the elapsed time. As the most complicated cases, they posed the most design options. The primary problem was in calculating the time to the sink (the numerators in Equations 4.5 and 4.8). We considered two approaches: stateful and stateless. In the stateful approach, each intermediate node along the path from sink to source must maintain a time value for every data flow passing through it. The value may be computed by storing t_now - t_sink for each reinforcement message received; this represents the time from node i to the sink. These values should also be indexed by data flow. In the stateless approach, all the information needed to compute priority is stored within each data packet. Using this strategy, the time to the sink is estimated by subtracting the elapsed time from the total end-to-end time, (t_sink - t_source) - T_elapsed,i. This gives an approximation of the time from the intermediate node to the sink. The stateful approach is more complex and has greater storage requirements per node; however, it provides a better estimate of the time to the sink. The stateless algorithm is simpler to implement and more scalable but may be less accurate.
We chose the stateless approach due to its simplicity and scalability. It provides sufficient accuracy, especially over longer durations.

Chapter 6
Network Services Implementation

6.1 Clustering Mechanism (PCDD)

We implemented passive clustering using the Filter API provided by ISI diffusion [68]. This required the creation of a new diffusion attribute and a program with two filters. The attribute is used to relay information about the passive clustering state of each node to neighboring nodes. The filters intercept packets and update state information based on the PC state of the previous hop.

6.1.1 Passive Clustering Attribute

In order to communicate information about the passive clustering state of each node, we defined the class PCInfo_t to encapsulate all the passive clustering status information, including the state, node ID, and cluster information. The PCInfo_t class is defined and explained below.

class PCInfo_t {
    int nodeID;
    int state;
    int CH1;
    int CH2;
    timeval ts;
    void fromAttr(NRSimpleAttribute<void*> *attr);
};

nodeID - The diffusion ID of the node
state - An integer representing the external passive clustering state of the node
CH1 - The ID of the node's primary cluster head
CH2 - The ID of the node's secondary cluster head; for CH and OR nodes, CH2 = -1
ts - A timestamp corresponding to the last received message from the node
void fromAttr(NRSimpleAttribute<void*> *attr) - Converts attribute attr to a PCInfo_t object

To transport the PC information in diffusion packets, we pack the PCInfo_t object into a PC attribute and push the attribute onto the attribute vector of the message. The PC attribute is defined as follows:

#define PC_STATE_KEY 5000
NRSimpleAttributeFactory<void*> PCAttrFactory(PC_STATE_KEY,
    NRAttribute::BLOB_TYPE);

Upon reception of packets with this attribute, nodes unpack the data into a PCInfo_t object using its fromAttr() method. Nodes maintain a table containing PC information about each of their neighbors.
The neighbor state table is updated on the reception of every flooded packet. The neighbor table is implemented in the class PCNodeList, as shown below. The PCNodeList class stores information about the passive clustering state of neighboring nodes as an STL list of PCInfo_t objects. The core functionality provided by the class includes adding/updating neighbors as well as counting the number of neighbors in a particular state. The PCNodeList class also has helper methods that analyze the neighbor list to determine appropriate state changes.

class PCNodeList {
    list<PCInfo_t> neighbors;

    /* Core methods */
    void update(int nodeID, PCInfo_t pcInfo);
    int count(int state);

    /* Helper method */
    bool anyTwoCHsNotConnected(int &ch1, int &ch2);
};

update(int nodeID, PCInfo_t pcInfo) - Adds or updates the PC information of node nodeID with PC status pcInfo in the neighbor list
count(int state) - Returns the number of nodes in state state in the neighbor list
anyTwoCHsNotConnected(int &ch1, int &ch2) - Returns true if any pair of known CHs is not connected by a GW and false if all pairs are connected. If a disconnected pair exists, ch1 and ch2 are set to their IDs.

6.1.2 Passive Clustering Filters

The PCInfo_t and PCNodeList classes are used extensively in the implementation of the passive clustering filter program. Our implementation utilized two filters in one program to intercept messages before and after the gradient filter. In the pre-gradient filter, nodes update their internal state to CH_READY or GW_READY according to the state of their neighbors. If no other CHs have been overheard, a node will enter CH_READY; otherwise, it will enter GW_READY. Algorithm 1 summarizes the internal state update process that is executed when packets are received.
Algorithm 1: Incoming Packet Processing for Passive Clustering
Input: Message, Neighbor State Array, MyState
Output: Internal State
switch MyState do
    case INITIAL
        if num(CH) = 0 then InternalState = CH_READY
        else InternalState = GW_READY
        break
    case CH
        if Message.State != CH then InternalState = CH_READY
        else if myID < Message.ID then InternalState = CH_READY
        else InternalState = GW_READY
        break
    case FULL_GW
        InternalState = GW_READY
        break
    case CH_READY
        InternalState = GW_READY
        break
end

In the post-gradient filter, the external state of the node is determined and added to the outgoing packet. Nodes in CH_READY become CHs if they have not heard from any other CHs. Nodes in GW_READY enter FULL_GW or OR depending on the number of CHs and GWs among their neighbors. Algorithm 2 concisely summarizes the outgoing packet processing.

Algorithm 2: Outgoing Packet Processing for Passive Clustering
Input: Internal State, Neighbor State Array
Output: External State
if InternalState = CH_READY then
    if num(CH) = 0 then ExternalState = CH
    else InternalState = GW_READY
if InternalState = GW_READY then
    if num(CH) > 1 then
        if any two CHs are not connected by any known gateway then
            ExternalState = FULL_GW
        else if num(CH) + ...

6.2 Repair Mechanism (LRDD)

6.2.1 LRDD Attributes

LRDD uses two new attributes, a repair notification attribute and a reconnect attribute:

NRSimpleAttributeFactory<char *> RepairNotificationAttr(REPAIR_NOTIFICATION_KEY,
    NRAttribute::STRING_TYPE);
NRSimpleAttributeFactory<int> ReconnectAttr(RECONNECT_KEY, NRAttribute::INT32_TYPE);

6.2.2 LRDD Filters

The majority of local repair is implemented in one program with two filters: a pre-gradient filter and a post-gradient filter. The pre-gradient filter handles the bulk of the work. It sets data timeouts after each data packet is received. If the data timeout expires without a new data packet being received, repair notifications are sent one hop downstream, and a repair notification timer is set. If this timer expires and no repair notification has been received, the node becomes the proxy sink and floods interest messages with a reconnect attribute set to the failed node ID.
The post-gradient filter maintains a list of upstream neighbors and downstream neighbors in order to detect link breakages. This filter also drops reconnect exploratory data at the proxy sink and reconnect reinforcements at the proxy source.

Interests and exploratory data are normally created by the diffusion routing (dr) object. To support reactive repair, we added the following two new methods to the dr class:

int reconnectPublish(NRAttrVec *attrs);
int reconnectSubscribe(NRAttrVec *attrs);

These methods allow us to send reconnect packets when a break is detected. The LRDD filter at the proxy sink invokes reconnectSubscribe() to send reconnect interests, and the proxy source invokes reconnectPublish() to transmit reconnect exploratory data in response to reconnect interests. The proxy sink sends a reconnect reinforcement message to the first neighbor that sent it a reconnect exploratory data message. Algorithm 3 summarizes the logic involved in the pre-gradient filter.

Algorithm 3: Pre-Gradient Processing for LRDD
if Repair Notification then
    Update State as Not Proxy Sink
else if Data Message then
    Update Receive Data Time
    Reset Expected Data Timer
else if Reconnect Interest then
    if On Original Data Path and Still Receiving Data then
        Become Proxy Source
        Flood Reconnect Exploratory Data
    else
        Forward Reconnect Interest
else if Reconnect Exploratory Data then
    if Proxy Sink then
        if ! Received Reconnect Exploratory Data then
            Send Reconnect Reinforcement
    else if Received Reconnect Interests then
        Forward Reconnect Exploratory Data
    else
        Drop Reconnect Exploratory Data
else
    Forward Message

Algorithm 4 shows the logic involved in the post-gradient filter. Notice that outgoing packets require much less processing. The post-gradient LRDD filter stores the neighbor information for upstream and downstream nodes along the data path. It also drops reconnect exploratory data and reconnect reinforcement packets when appropriate.
Algorithm 4: Post-Gradient Processing for LRDD
if Data Message then
    Save Next Hop in Downstream Neighbor Table
    Save Previous Hop in Upstream Neighbor Table
else if Reconnect Exploratory Data then
    Drop Message
else if Reconnect Reinforcement then
    if Proxy Source then
        Drop Message
    else
        Forward Message

6.2.3 Local Flooding

LRDD itself makes no attempt to restrict the flooding of reconnect packets. To handle this task, we have written two additional programs. The node ID filter creates a list of neighbors from which a node has received any message. It drops flooded reconnect messages that contain a reconnect attribute with a failed node ID not in the list of known neighbors. This essentially restricts reconnect messages to the one-hop neighbors of the failed node. In a sufficiently dense network, a path back to the data flow may be found within this group of nodes. In sparse networks, however, ID-based local flooding may restrict flooding to such a degree that repair is impossible.

A second, simpler method to limit the flooding of reconnect messages is to limit the number of hops a packet may travel. The hop filter appends a hop attribute containing the maximum number of hops that the packet may travel. Packets from foreign hosts are dropped if the decremented hop count reaches zero; otherwise, messages are forwarded with a decremented hop count. This approach works in sparse networks as long as a sufficiently large initial hop count is used. The main disadvantage of the hop-count algorithm is the difficulty of selecting an appropriate value for the maximum number of hops. In dense networks, a large hop count may result in "local" flooding that is very expensive.

6.3 Real-Time Communication Mechanism

Like the other network services, the real-time communication protocol, RTDD, was implemented in the diffusion API using attributes and filters.

6.3.1 RTDD Attributes

We defined several new attributes for RTDD. These include attributes for deadline, priority, DVM, SAT, DAT, SRT, and DRT.
SVM only needs a priority value, so no explicit SVM attribute is necessary. To utilize RTDD, applications simply add a deadline attribute to their publication definition. The deadline is an integer representing the deadline in milliseconds. The definition of the deadline attribute is shown below.

#define TIME_DEADLINE_KEY 7001
NRSimpleAttributeFactory<int> TDeadlineAttr(TIME_DEADLINE_KEY, NRAttribute::INT32_TYPE);

Several other attributes are necessary for the functioning of RTDD itself. We defined one attribute for each variant of RTDD. These attributes encapsulate the data that must be communicated by each version of the protocol. RTDD uses the state information in each version's attribute to compute the priority and write it to the priority attribute. Thus, the priority attribute is used by all six versions of RTDD. These RTDD attribute definitions are shown below.

#define PRIORITY_KEY 7002
#define DVM_KEY 7003
#define SAT_KEY 7004
#define DAT_KEY 7005
#define SRT_KEY 7006
#define DRT_KEY 7007

/* SVM, DVM, SAT, DAT, SRT, DRT */
NRSimpleAttributeFactory<float> PriorAttr(PRIORITY_KEY, NRAttribute::FLOAT32_TYPE);

/* DVM */
NRSimpleAttributeFactory<void *> DVMAttr(DVM_KEY, NRAttribute::BLOB_TYPE);

/* SAT */
NRSimpleAttributeFactory<void *> SATAttr(SAT_KEY, NRAttribute::BLOB_TYPE);

/* DAT */
NRSimpleAttributeFactory<void *> DATAttr(DAT_KEY, NRAttribute::BLOB_TYPE);

/* SRT */
NRSimpleAttributeFactory<void *> SRTAttr(SRT_KEY, NRAttribute::BLOB_TYPE);

/* DRT */
NRSimpleAttributeFactory<void *> DRTAttr(DRT_KEY, NRAttribute::BLOB_TYPE);

As the simplest case, SVM requires only one floating point number to be calculated at the source and sent to the destination; no state information is needed. The other five protocols use the priority attribute to store the calculated priority but also require another attribute to communicate state information. The DVM attribute contains the time the data packet was sent from the source. The SAT attribute holds a structure with the two timestamps used to compute end-to-end delay.
The DAT attribute contains three timestamps: the two end-to-end delay timestamps and the time the data packet was sent. The SRT attribute stores the two timestamps taken by the source to compute the round-trip time. Finally, the DRT attribute stores the three timestamps needed to compute the round-trip time and the elapsed time. The definitions of the structures for the time-based protocols are shown below.

typedef struct RAP_SAT {
    EventTime ts_src_expdata;
    EventTime ts_snk_reinforcement;
} RAPSAT;

typedef struct RAP_DAT {
    EventTime ts_src_expdata;
    EventTime ts_snk_reinforcement;
    EventTime ts_src_data;
} RAPDAT;

typedef struct RAP_SRT {
    EventTime ts_src_expdata;
    EventTime ts_src_reinforcement;
} RAPSRT;

typedef struct RAP_DRT {
    EventTime ts_src_expdata;
    EventTime ts_src_reinforcement;
    EventTime ts_src_data;
} RAPDRT;

6.3.2 RTDD Filters

Similar to PCDD and LRDD, RTDD is implemented using a pre-gradient filter and a post-gradient filter. Each variant essentially performs the same three steps:

1. Add RTDD information to packets at the source/sink.
2. Extract RTDD information from packets at the source.
3. Compute the priority and append it to outgoing data packets.

We explain the differing details of each version in the following subsections.

SVM

In SVM, the sink adds its location information (latitude and longitude) to outgoing interest packets. The source extracts the location information and uses it, along with the deadline supplied by the application, to compute the priority according to Equation 3.1. This priority is stored in the priority attribute by the source and used to prioritize the data packet at each intermediate hop.

DVM

In DVM, the sink adds its location information to outgoing interest packets. The source extracts the location information and uses it, along with the deadline supplied by the application, to compute the priority according to Equation 3.2. The source also timestamps the packet so that intermediate nodes can compute the elapsed time.
Upon receiving a data packet, intermediate nodes extract the location of the sink and the timestamp. They use this information, along with their own location, to update the priority, again using Equation 3.2.

SAT

In SAT, the source adds a timestamp to outgoing exploratory data, and the sink adds a timestamp to outgoing reinforcement messages. When the source is ready to send data, it computes the difference of these two timestamps and divides it by the deadline to find the priority (Equation 4.4). The priority is stored in the priority attribute and used at each intermediate hop for queue prioritization.

DAT

In DAT, the source adds a timestamp to outgoing exploratory data, and the sink adds a timestamp to outgoing reinforcement messages. The source also appends a timestamp to the data packet corresponding to the time it was sent. The time difference and deadline are used to compute the initial priority, which is written to the priority attribute. Intermediate nodes subtract the elapsed time from the end-to-end delay and deadline to update the priority at each hop (Equation 4.6).

SRT

In SRT, the source stores a timestamp when exploratory data is transmitted and then saves another timestamp when the first reinforcement message is received. When the source is ready to send data, it computes the difference of these two timestamps and divides it by the deadline to find the priority (Equation 4.7). The priority is stored in the priority attribute and used at each intermediate hop for queue prioritization.

DRT

In DRT, the source stores a timestamp when exploratory data is transmitted and then saves another timestamp when the first reinforcement message is received. The source also appends a timestamp to the data packet corresponding to the time it was sent. The time difference and deadline are used to compute the initial priority, which is written to the priority attribute.
Intermediate nodes subtract the elapsed time from the round-trip delay and deadline to update the priority at each hop (Equation 4.8).

6.3.3 Prioritized Queue

We implemented the priority queue using the timer-based approach introduced in Section 5.2.3. The implementation involved changing the typical behavior of diffusion filters, which forward messages from the receive thread. Instead, we added each received message to a priority queue. The queue was implemented as a linked list of PriorityQueueEvent objects, as defined below.

class PriorityQueueEvent {
public:
    Message *msg;
    handle h;
    double priority;
    PriorityQueueEvent *next;
};

Each data message received by the receive thread of the RTDD filter was added to the queue using the insert() method. The run() method of the RTDD filter invoked the send() method, which served as the sending thread. It dequeued the first element in the priority queue (the highest-priority message), sent it, and re-scheduled another send event in t_timer milliseconds. The definition of the PriorityQueue class is shown below. Besides the basic insert and dequeue operations, we also wrote several utility methods: isEmpty(), print(), and length().

class PriorityQueue {
    PriorityQueueEvent *head_;
public:
    PriorityQueue();
    void insert(Message *msg, int handle, double priority);
    PriorityQueueEvent *dequeue();
    bool isEmpty();
    void print();
    int length();
};

After the highest-priority packet has been sent, the send() method waits t_timer milliseconds before sending another packet. The value of t_timer determines the granularity of prioritization and affects the maximum bandwidth of the network. Larger values result in greater amounts of prioritization at the cost of throughput. Smaller values of t_timer allow more packets to be sent but reduce the amount of prioritization possible. An appropriate value should be set based on the bandwidth needed by the application.
Chapter 7
Performance Evaluation

7.1 Simulation Setup

We used ns-2.29 to simulate all three network protocols. We used the 802.11 MAC layer with directed diffusion version 3, which is supplied with ns-2. Table 7.1 shows the specific settings used in our ns-2 simulations. Simulation parameters unique to each protocol are explained in their respective sections.

Table 7.1: ns-2 Channel Parameters
Parameter            Value
Channel              Channel/WirelessChannel
Propagation Model    Propagation/TwoRayGround
Physical Medium      Phy/WirelessPhy
MAC Layer            Mac/802_11
Queue Type           Queue/DropTail/PriQueue
Link Layer           LL
Antenna              Antenna/OmniAntenna

Our energy model follows the standard energy usage model for ns-2. The parameters and values are shown in Table 7.2.

Table 7.2: ns-2 Energy Parameters
Parameter             Value (W)
Transmission Power    0.660
Reception Power       0.395
Idle Power            0.035

7.2 Clustering Mechanism (PCDD)

We evaluated our implementation of PCDD using several metrics. Since the major goal of passive clustering is to improve flooding performance, the primary metric of interest is flooding efficiency. We measured this in terms of the total number of flooded packets transmitted and the average energy consumed per node. We also measured the delivery ratio, the end-to-end delay, and the probability of disconnection for each simulation.

Our results show that PCDD significantly reduces the number of interest and exploratory data messages transmitted while also providing high delivery ratios and low end-to-end delays. The packet reduction results in better average energy consumption, especially in dense topologies. The only disadvantage of PCDD is a slightly increased probability of disconnection. However, this probability is acceptable given the greatly improved network performance.

7.2.1 Experiment Setup

To test the performance of PCDD, we generated topologies with n nodes randomly dispersed over a 1000 m x 1000 m field. The nodes had a transmission radius of 250 m.
The number of nodes n ranged over {25, 50, 75, 100, 175, 250}. Five topologies were generated for each value of n. Each topology was used in 30 independent simulation runs of 1000 seconds. In our results, we computed the average of the 30 runs and then averaged the 5 means for the n-node topology. Hence, every data point is the result of 150 (30 x 5) runs of the simulator. Since the area was held constant while the number of nodes varies, we are effectively changing the network density.

The application used for testing was a constant-bit-rate sender that sent one 1024 B packet every 5 seconds to the receiver for the entirety of the simulation time. We tested two cases: a scenario with one data flow and a scenario with two data flows. For the 1-flow case, we selected node 1 to be the sender and node n to be the receiver. In the 2-flow case, we also chose node 2 to be a sender and node n-1 to be the receiver for the second flow. Since the topologies are randomly generated, this procedure essentially creates two one-to-one data flows that are randomly located. In our results, we plot the average of the Flow 1 and Flow 2 results.

We compare PCDD to the worst-case flooding scenario (blind flooding), a near-best-case scenario (a near-optimal connected dominating set), and another local-knowledge-based efficient flooding protocol (probabilistic flooding). Blind flooding, performed by standard directed diffusion, represents the worst case since no effort is made to reduce redundant transmissions. To find the near-best-case flooding scenario, we used an evolutionary approach to find the minimal set of nodes needed for complete connectivity, a connected dominating set (CDS), for a given topology. We also compared PCDD to probabilistic flooding. Like PCDD, probabilistic flooding only utilizes information about 1-hop neighbors and operates on-line (i.e., it does not require set-up ahead of time).
Thus, we evaluated the performance of PCDD relative to the near-best and worst possible efficient flooding algorithms as well as an equivalent efficient flooding technique.

7.2.2 Flooding Efficiency

PCDD primarily improves flooding performance by reducing the number of redundant transmissions of flooded packets. To evaluate flooding efficiency, we measured the total number of interest and exploratory data packets transmitted during the simulation. The total number of interest packets transmitted during the simulation is plotted against the number of nodes (n) for both the MAC layer (Figure 7.1) and the routing layer (Figure 7.2). While not as good as the CDS and probabilistic algorithms, PCDD provided a significant improvement over standard diffusion in every case. PCDD does not perform as well as probabilistic flooding in terms of interests because PCDD uses interests to learn the clustered structure of the network. Exploratory data receives the benefit of this "learning" phase of PCDD. Furthermore, probabilistic flooding has an unfair advantage over PCDD in this comparison in that we tuned its flooding parameter offline. Hence, we chose the lowest forwarding probability based on empirical tests run beforehand; PCDD had no such foreknowledge.

Figures 7.3 and 7.4 show the number of interests transmitted in the 2-flow scenarios. These results follow the same trends as the 1-flow scenarios previously described. PCDD outperforms standard diffusion by about 20%, while CDS and probabilistic flooding perform significantly better. This performance advantage does not carry over to exploratory data, however. The exploratory data packets show an even more dramatic reduction with PCDD. Figures 7.5 and 7.6 depict the number of exploratory data messages transmitted at the MAC layer and routing layer, respectively, for the 1-flow scenario. The 2-flow scenarios exhibit the same behavior. PCDD soundly outperforms all the other protocols.
These results are shown in Figures 7.7 and 7.8 for the MAC and routing layers. PCDD provides dramatically increased flooding efficiency, on the order of 46% fewer flooded routing-layer packets and 86% fewer flooded MAC-layer packets over all network sizes in the 1-flow scenarios.

Figure 7.1: Number of MAC-layer interest messages versus number of nodes (1 flow)

On average, the number of interest packets was reduced by 22% and the number of exploratory data messages by 98%. In the 2-flow cases, the number of interest packets was reduced by 24% and exploratory data by 99%. The total number of flooded packets was reduced by 48% (MAC) and 87% (routing). Since PCDD learns the clustered structure of the network during the interest flooding phase of route discovery, it is able to restrict the flooding of exploratory data more efficiently. This allows PCDD to perform much better during the second phase of flooding (exploratory data).

Figure 7.2: Number of routing-layer interest messages versus number of nodes (1 flow)

This flooding reduction directly correlates to energy savings, since fewer transmissions mean less power consumed. Figure 7.9 shows the average energy consumed per node over different topology sizes for the 1-flow scenarios. Due to the large number of packets transmitted by blind flooding, standard directed diffusion performs poorly with increasingly dense networks. PCDD and CDS maintain almost constant energy consumption, while probabilistic flooding shows a slight increase in the larger topologies. Once again, the 2-flow scenarios follow the same general trends as the 1-flow results. Standard diffusion performs worst, while CDS performs best, closely followed by PCDD and probabilistic flooding.

Figure 7.3: Number of MAC-layer interest messages versus number of nodes (2 flows)

7.2.3 Delivery Effectiveness

Packet delivery rates may be adversely affected by the congestion caused by flooding.
Since PCDD reduces flooding-induced congestion, delivery rates can be improved when efficient flooding is performed. Thus, a secondary benefit of PCDD is increased delivery effectiveness in large, highly connected networks. To measure delivery rates, we computed the delivery ratio, i.e., the number of packets received versus the number of packets sent. The delivery ratio for varying numbers of nodes n is shown in Figure 7.11.

Not surprisingly, the near-optimal CDS produced the best delivery ratios, all better than 99%. PCDD closely followed this performance, with delivery ratios greater than 98% even in the densest topologies.

Figure 7.4: Number of routing-layer interest messages versus number of nodes (2 flows)

The performance of standard directed diffusion, in contrast, decreases as the number of nodes increases due to the effects of congestion. In the densest networks, diffusion delivered only 93% of the packets. Probabilistic flooding was the worst performer, however, with delivery ratios ranging from 79% to 90%. The results of the 2-flow scenarios are shown in Figure 7.12. As in the 1-flow results, CDS performs the best, followed closely by PCDD. Directed diffusion has acceptable performance at low densities but suffers from congestion in the large topologies. Probabilistic flooding again performs the worst, with delivery ratios between 80% and 92%.

Figure 7.5: Number of MAC-layer exploratory data messages versus number of nodes (1 flow)

PCDD returned very high delivery ratios, on par with optimal CDS-based flooding. Thus, passive clustering provides significant advantages in terms of both flooding efficiency and delivery effectiveness. These results imply greater potential for scalability of the network and greater robustness of performance. Our results corroborate previous work with PC over diffusion [67] in terms of improved flooding efficiency and delivery effectiveness.

7.2.4 End-to-End Delay

We also measured the end-to-end delay of data packets.
This metric gives another perspective on the effects of flooding-induced congestion. The average end-to-end delay of data packets for each set of 1-flow topologies is plotted in Figure 7.13.

Figure 7.6: Number of routing-layer exploratory data messages versus number of nodes (1 flow)

Notice that PCDD again closely follows the performance of the CDS algorithm. Standard diffusion suffered from extremely high delays in the large topologies (n = 250), with average delays of over 500 ms. Probabilistic flooding also performed poorly in terms of delay; its average delay times increased with increasing network density. The delays for the 2-flow scenarios (Figure 7.14) reflect trends similar to their 1-flow counterparts. CDS and PCDD are the best performers, while probabilistic flooding is slightly worse. Notice that standard directed diffusion suffers from massive delays (>2000 ms) in the largest topologies.

Figure 7.7: Number of MAC-layer exploratory data messages versus number of nodes (2 flows)

7.2.5 Disconnection Probability

One of the main disadvantages of PCDD is the possibility of disconnecting the network. If the gateway selection heuristic is overly aggressive, the network may be partitioned. This occurs because too many nodes are made ordinary nodes (i.e., removed from the flooding backbone). Figure 7.15 shows the average probability of disconnection for each network size. None of the other efficient flooding algorithms partitioned the network. PCDD, however, had a small probability of disconnecting the smaller topologies.

Figure 7.8: Number of routing-layer exploratory data messages versus number of nodes (2 flows)

For n = 25, PCDD disconnected 8.7% of the runs. In the larger topologies, disconnection was not a problem. In the 2-flow scenarios, disconnection presented more of a problem. As shown in Figure 7.16, PCDD suffered from small levels of disconnectivity across all topology sizes.
Once again, the smaller topologies were more prone to this problem, with disconnection probabilities of 8.7% in the 25-node networks. This behavior occurs because the increased number of data flows creates more "critical" nodes, i.e., nodes that cannot be ordinary without interrupting a data flow. Hence, PCDD has more opportunities to interrupt a data flow by wrongly identifying a critical node as ordinary. Although this behavior is disappointing, its relatively small rate of occurrence helps to mitigate the problem.

Figure 7.9: Average energy consumed per node versus number of nodes (1 flow)

7.3 Repair Mechanism (LRDD)

We evaluated the performance of LRDD by measuring the delivery ratio for data packets and the overhead associated with local flooding. Since diffusion has no reactive repair mechanism, our addition provided significant gains in terms of delivery effectiveness. The cost of repair was generally low, given the restricted flooding strategies we employed. We explain the experimental scenario and discuss the performance of LRDD in terms of delivery effectiveness and overhead.

Figure 7.10: Average energy consumed per node versus number of nodes (2 flows)

7.3.1 Experiment Setup

To test LRDD, we used a constant-bit-rate application that sent one 1024 B packet each second to one receiver application. We used five topologies with 250 nodes randomly deployed over a 2000 m x 2000 m field. To simulate the failures necessary to test LRDD, we generated failure scenarios consisting of a series of node failures over the course of the simulation.

Figure 7.11: Delivery ratio versus number of nodes (1 flow)

The node failures were drawn from an exponential distribution to model failures during the normal useful-life phase of the system [69]. The mean time between failures (MTBF) for the exponential distribution was drawn from {2.5, 5, 10, 15, 25}. Five failure scenarios were generated for each MTBF value.
Our metrics were computed over a period of 300 simulated seconds for each topology and failure scenario pair. For each of the 25 pairs, we ran five runs of the simulation with different initial seeds for the random number generator. Hence, each data point represents 125 runs of the simulation.

We collected several metrics. First, we measured the number of data packets delivered to the sink. We also measured the network traffic involved in each run of the simulation. We were specifically interested in the flooded packets (interests and exploratory data) since they are most relevant to gradient repair. We calculated the average energy consumed per node. Finally, we computed two metrics to evaluate the normalized overhead involved in LRDD: flooded packets per data packet and energy per data packet.

Figure 7.12: Delivery ratio versus number of nodes (2 flows)

We compare five versions of LRDD to standard directed diffusion. In LRDD with global flooding (LRDD-GF), reconnect packets were not restricted in any way and thus were flooded throughout the entire network. LRDD with ID-based local flooding (LRDD-ID) corresponds to the localized flooding strategy in which reconnect packets are only forwarded by nodes that are 1-hop neighbors of the failed node. We also tested LRDD with hop-based flooding (LRDD-Hop) with three different hop radii: 3, 4, and 5. In the figures, we denote the hop-limited protocols by appending the hop radius to the protocol name (i.e., LRDD-Hop3 means reconnect packets were forwarded in a 3-hop region).

Figure 7.13: End-to-end delay versus number of nodes (1 flow)

7.3.2 Simulation Results

Packets Received

To measure LRDD's ability to recover from failures, we measured the number of data packets delivered during the course of the 300 simulated seconds. Figure 7.17 shows the average number of data packets received by the sink for the five topologies and five failure scenarios.
As expected, LRDD-GF performs the best since it has the best chance of repairing broken routes, albeit at a high cost (global flooding).

Figure 7.14: End-to-end delay versus number of nodes (2 flows)

ID-based LRDD performs second best, followed closely by hop-based LRDD. ID-based LRDD performs better because it localizes the flooding to a region centered at the node that has failed. The hop-based protocols center their flooding at the proxy source and proxy sink and thus have a lower probability of finding an alternate route around the failed node. We have summarized the packet reception data in Table 7.3. It contains the average number of packets delivered by each protocol over all values of the mean time between failures. LRDD-GF performs best, followed by LRDD-ID. The hop-based protocols perform incrementally better with increasing hop size, and standard diffusion performs worst.

Figure 7.15: Disconnection probability versus number of nodes (1 flow)

Flooded Packets

The next metric we evaluated was the total number of flooded packets transmitted. This metric gauges the overhead associated with the algorithms. This number includes the additional reconnect interests and exploratory data created by LRDD; thus, it measures the cost incurred by reactive repair. Figure 7.18 summarizes the average number of flooded packets for each value of the mean time between failures. In this case, LRDD-GF is the worst performer since it produces the largest number of reconnect packets and does not attempt to restrict their transmission. The next worst performers are the hop-based local flooding algorithms, in decreasing order of hop size. ID-based flooding has performance very similar to 3-hop flooding. Directed diffusion is, naturally, the best performer since it transmits no additional reconnect packets.

Figure 7.16: Disconnection probability versus number of nodes (2 flows)

Table 7.4 shows the average number of flooded packets across all values of the mean time between failures. The same trends are apparent as in Figure 7.18.
Diffusion has the lowest overhead and LRDD-GF has the highest. Notice that LRDD-ID narrowly outperforms LRDD-Hop3 in terms of the overall average.

Figure 7.17: Packets delivered for each repair and flooding algorithm

Energy

Next, we evaluated LRDD in terms of energy consumption. We computed the average energy consumed per node over the simulation time. Figure 7.19 contains the results across varying values of the failure parameter for each algorithm. These results necessarily correlate with the total flooded packets results discussed in the previous section. DD has the lowest energy consumption because it transmits the fewest packets. LRDD-GF performs worst, since it transmits the largest number of additional reconnect packets. The hop-based protocols perform slightly worse than the ID-based protocol except for a hop size of 3, where their performance is almost identical.

Table 7.3: Average packets delivered for all values of the failure parameter

    Repair and Flooding Algorithm    Packets Delivered
    DD                               213.10
    LRDD-GF                          237.18
    LRDD-ID                          233.62
    LRDD-Hop3                        216.47
    LRDD-Hop4                        219.03
    LRDD-Hop5                        221.14

Table 7.4: Average flooded packets for all values of the failure parameter

    Repair and Flooding Algorithm    Flooded Packets
    DD                               32088
    LRDD-GF                          63183
    LRDD-ID                          35664
    LRDD-Hop3                        35944
    LRDD-Hop4                        37682
    LRDD-Hop5                        39827

Table 7.5 shows the average energy consumption for each protocol over all failure scenarios. Diffusion has the lowest, followed by LRDD-Hop3 and LRDD-ID. LRDD-Hop4 and LRDD-Hop5 have slightly higher energy consumption, and LRDD-GF has the highest.

Normalized Overhead

In order to understand the trade-offs associated with the additional overhead created by LRDD, we have computed several derived metrics from the previously discussed measurements. First, we calculate the total number of flooded packets per data packet delivered. This essentially normalizes the additional cost by the additional benefit. Figure 7.20 shows the number of flooded packets per data packet successfully delivered to the sink.
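The two derived metrics are simple cost/benefit ratios. A minimal sketch of the computation (the helper name is illustrative, and the reported tables may apply additional scaling not shown here):

```python
def normalized_overhead(flooded_packets, energy_per_node, packets_delivered):
    """Normalize repair cost by repair benefit.

    Returns (flooded packets per data packet delivered,
             energy per node per data packet delivered),
    mirroring the two derived metrics described in the text.
    """
    if packets_delivered == 0:
        # No benefit: any overhead is effectively infinite cost.
        return float("inf"), float("inf")
    return (flooded_packets / packets_delivered,
            energy_per_node / packets_delivered)
```

Normalizing this way is what lets a low-overhead protocol such as standard diffusion remain competitive despite its lower raw delivery count.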
LRDD-GF has the worst performance. This means that the excessive flooding costs far outweigh the additional packets delivered.

Figure 7.18: Total flooded packets for each repair and flooding algorithm

The hop-based versions of LRDD perform better than LRDD-GF, but not as well as LRDD-ID and standard diffusion. LRDD-ID outperformed standard directed diffusion at the lowest values of the failure parameter (2.5 and 5), but not at the larger values. Surprisingly, diffusion performs quite well despite its low packet delivery rate. Its strong performance is due to its low overhead. LRDD-ID had superior performance at high failure rates (low failure parameter values), but at lower failure rates, the benefit of route repair did not outweigh the cost of flooding. However, if recovery time is considered, diffusion's performance would not be strong since it only periodically repairs broken paths. On average, diffusion will take 30 seconds to repair a broken path (since each refresh cycle is 60 seconds). In contrast, LRDD recovers from failures in 2-3 seconds.

Figure 7.19: Average energy consumed per node for each repair and flooding algorithm

Table 7.6 summarizes the normalized flooded packet overhead for all values of the failure parameter. LRDD-ID and DD have very similar performance (0.4% difference). The hop-based versions of LRDD have slightly greater cost (11%-17% worse than directed diffusion). LRDD-GF has the worst performance with almost twice the number of flooded packets per data packet delivered. Lastly, we computed the average energy consumed per node per data packet delivered.
Like the previous metric, energy consumed per data packet gives an understanding of the trade-offs associated with LRDD. In this case, we can evaluate the cost in terms of energy for the additional data packets delivered by LRDD.

Table 7.5: Average energy consumed per node for all values of the failure parameter

    Repair and Flooding Algorithm    Energy Consumed Per Node (Joules)
    DD                               10.19
    LRDD-GF                          13.93
    LRDD-ID                          10.80
    LRDD-Hop3                        10.72
    LRDD-Hop4                        11.07
    LRDD-Hop5                        11.40

Table 7.6: Average flooded packets per data packet for all values of the failure parameter

    Repair and Flooding Algorithm    Flooded Packets Per Data Packet
    DD                               45549
    LRDD-GF                          80247
    LRDD-ID                          45776
    LRDD-Hop3                        50300
    LRDD-Hop4                        51828
    LRDD-Hop5                        54220

Figure 7.21 illustrates the average energy consumed per node per data packet delivered to the sink. Generally, the worst performer was LRDD-GF since its overhead was so high. At the highest failure rate (lowest failure parameter value), however, it gave the second best performance behind LRDD-ID. Overall, however, the best performers were LRDD-ID and DD. Once again, LRDD-ID bested DD at the two highest failure rates (failure parameter values 2.5 and 5) but not at the other rates. LRDD-Hop3 gave strong performance at all but the lowest value of the failure parameter; the larger hop radii performed worse, however. To summarize, LRDD-ID had significantly better normalized energy consumption than diffusion at the higher failure rates and only slightly worse performance at the lower failure rates.

Figure 7.20: Total flooded packets per data packet delivered for each repair and flooding algorithm

LRDD-Hop3 was a close third place, followed by the other hop-based protocols. Table 7.7 contains the average energy consumed per data packet across all failure rates. LRDD-ID is the best performer, followed by DD. The hop-based protocols are next best in increasing hop size. LRDD-GF is unequivocally the worst. To summarize, LRDD significantly improves the delivery rate in the presence of node failures; however, a price must be paid for this improved behavior.
We have developed several simple localized flooding techniques to reduce the cost of the protocol overhead. Our results show that the ID-based algorithm has the lowest overhead. On average, the ID-based localized flooding gave the best or near-best performance when normalized with respect to data packets delivered. The hop-based flooding also improves packet delivery rates, but at a slightly higher cost than LRDD-ID. LRDD-GF offers the best packet delivery rates, but at an unreasonably high cost due to global flooding. Overall, LRDD offers improved delivery rates with acceptable overhead.

Figure 7.21: Average energy consumed per data packet delivered for each repair and flooding algorithm

Table 7.7: Average energy consumed per data packet delivered for all values of the failure parameter

    Repair and Flooding Algorithm    Energy Consumed Per Data Packet
    DD                               14.64
    LRDD-GF                          17.59
    LRDD-ID                          13.94
    LRDD-Hop3                        15.18
    LRDD-Hop4                        15.36
    LRDD-Hop5                        15.62

7.4 Real-Time Communication Mechanism (RTDD)

To evaluate the effectiveness of RTDD, we measured the on-time delivery ratio of packets received at the sinks. RTDD shows significant improvement over standard diffusion in terms of timely delivery during congestion. In this section, we describe the experimental setup and compare the on-time delivery rates for directed diffusion and RTDD.

7.4.1 Simulation Setup

We tested RTDD using the bottleneck topologies shown in Figures 7.22 and 7.23. In these topologies, two or three data flows shared the same three bottleneck nodes. The bottleneck topology was chosen in order to exaggerate the effects of congestion. The metric used to evaluate RTDD was delivery ratio. Since late packets were actively dropped, packet delivery ratio corresponds to on-time delivery ratio.
The sources ran a constant bit rate application that generated one 1024-byte packet every T milliseconds, where T is a constant interval parameter summed with a 20% jitter term (T = t + j). In Topology 1, t ∈ {38, 48, 54, 60, 69, 89, 125}, and in Topology 2, t ∈ {40, 45, 50, 75, 100, 125, 150, 200}. In both topologies, j ∈ [-0.2t, 0.2t]. We introduced the jitter term j in order to introduce randomness into the arrival rate. The source data rates r correspond to sending one 1024-byte packet every t milliseconds on average. For the tested values of t in Topology 1, r ∈ {8.2, 11.5, 14.8, 17.1, 19.0, 21.3, 26.9} KBps. For the values of t in Topology 2, r ∈ {5.1, 6.8, 8.2, 10.2, 13.7, 20.5, 22.8, 25.6} KBps. Each simulation was run for 1000 simulated seconds. We ran each experiment 30 times to gain statistical confidence in the results. We report the average and the standard error of the 30 runs.

Figure 7.22: Topology 1: 2 Flows

Figure 7.23: Topology 2: 3 Flows

The RTDD filter queued all incoming packets and sent the highest priority packet every t_timer milliseconds. The t_timer parameter of the priority queue was set to 50 milliseconds for all the experiments. RTDD dropped packets that had missed their deadlines. This allows the network to avoid wasting bandwidth on packets which are already late. In order to make equivalent comparisons, we implemented a filter to delay standard directed diffusion packets the same amount as the RTDD filter delayed its packets. The delay filter was identical to the RTDD filter except that a priority of 0.0 was assigned to all packets, thus enforcing a FCFS ordering of the queue. The deadlines for Flows 1 and 2 in Topology 1 were 500 ms and 625 ms respectively. For Topology 2, the deadlines were set to 500 ms, 625 ms, and 750 ms for Flows 1, 2, and 3 respectively. The lowest value (500 ms) was selected because it provided a realistic estimate of the end-to-end delay. Thus, it was possible to meet the lowest deadline in a lightly loaded network.
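The filter behavior just described can be sketched with a priority queue. This is an illustrative model, not the actual RTDD filter code; the class and field names are assumptions, and a sequence number is used so that equal priorities (e.g. the all-0.0 delay filter) fall back to FCFS order.

```python
import heapq

class RealTimeFilter:
    """Sketch of the RTDD priority-queue filter.

    Every t_timer milliseconds the timer fires, the highest-priority
    queued packet is sent, and packets whose deadlines have passed are
    dropped.  Assigning priority 0.0 to every packet degenerates into
    the FCFS delay filter used for the standard-diffusion comparison.
    """

    def __init__(self, t_timer_ms=50):
        self.t_timer_ms = t_timer_ms
        self._heap = []   # entries: (-priority, seq, packet)
        self._seq = 0     # insertion order breaks priority ties (FCFS)

    def enqueue(self, packet, priority):
        heapq.heappush(self._heap, (-priority, self._seq, packet))
        self._seq += 1

    def on_timer(self, now_ms):
        """Called every t_timer ms; returns one packet to send, or None."""
        while self._heap:
            _, _, packet = heapq.heappop(self._heap)
            if now_ms <= packet["deadline_ms"]:
                return packet  # still on time: forward it
            # Otherwise discard it: a late packet only wastes bandwidth.
        return None
```

For example, with a flow-1 packet (priority 0.9, 500 ms deadline) and a flow-2 packet (priority 0.3, 625 ms deadline) queued, a timer firing at 100 ms sends the flow-1 packet; a timer firing at 600 ms would drop any remaining flow-1 packet as late and send the flow-2 packet.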
The other deadline values were computed as 25% and 50% more than the baseline deadline.

7.4.2 Simulation Results

Since RTDD dropped packets that were late, the delivery ratio represents the percentage of packets that were delivered to the sink on time. Packets that were not delivered were either dropped due to their lateness or lost. The overwhelming majority of the packets not delivered were actively dropped because of their lateness. Thus, the delivery ratio measures the effectiveness of the prioritization in helping packets meet their deadlines. Figures 7.24 and 7.25 show the delivery ratios of Topology 1 for Flows 1 and 2 respectively for increasing source transmission rates. The error bars represent one standard error. Notice in Figure 7.24 that at the lowest data rate (8 KBps) all seven protocols delivered over 85% of the packets on time. Past this rate, the performance of standard directed diffusion sharply dropped to almost 0%. For Flow 1, SVM and DVM maintained high delivery ratios until the transmission rates exceeded 19 KBps. Among the time-based protocols, the dynamic variants (DAT and DRT) generally performed better than their static counterparts in the intermediate region (from 11 KBps to 19 KBps). At the two highest data transmission rates, none of the protocols were able to successfully deliver packets because congestion was so high.

Figure 7.24: Delivery Ratio of Flow 1 (Topology 1)

Figure 7.25 shows the delivery ratio for Flow 2, the low priority flow. RTDD essentially delays packets from this flow to give preference to the high priority packets in Flow 1 (Figure 7.24). Notice that standard directed diffusion and SVM have the worst performance, closely followed by SAT and SRT. These results are expected for the static protocols since packet priorities are set once at the source. The dynamic protocols, however, update the priority of the packets at each hop.
This allows initially low priority packets to be given greater preference if they are excessively penalized. From 11 KBps to 17 KBps, DVM is the best performer, but at the higher data rates DAT and DRT deliver more packets on time. At such high data rates, however, only about 5-10% of the packets can be delivered successfully.

Figure 7.26 shows the average delivery ratio of both flows in Topology 1. All of the versions of RTDD provide a significant improvement over standard diffusion in the intermediate region (11 KBps to 19 KBps). The performance of the RTDD protocols falls into three classes. DVM is the best performer, with a 5-15% advantage over the dynamic time-based protocols (DAT and DRT). Although DVM outperforms the dynamic time-based algorithms, SVM did not outperform the static time-based protocols (SAT and SRT). Their average performance was not appreciably different (except at 19 KBps, where SVM did 9% better).

Notice that the performance of the dynamic protocols gracefully degrades with increasing congestion while the static algorithms have a much sharper drop in performance. Also interesting is the performance similarity among the static protocols. SRT can perform just as well as SAT and nearly as well as SVM. Thus, we can achieve acceptable performance without location knowledge or time synchronization. The dynamic case is not quite as encouraging, however. Without location knowledge, DAT and DRT perform significantly worse than DVM.

Figure 7.25: Delivery Ratio of Flow 2 (Topology 1)

In summary, dynamic prioritization outperforms static, and dynamic, location-based prioritization is preferable to dynamic, time-based. Figures 7.27-7.29 depict the delivery ratios for Flows 1-3 for Topology 2. As expected, the greatest performance improvement occurs in the high priority flow. As in Topology 1, RTDD significantly outperforms standard directed diffusion at all but the highest and lowest data rates (congestion levels).
For Flow 1 (Figure 7.27), the location-based algorithms (SVM and DVM) outperform the time-based protocols by 10-20% in the intermediate region (8 KBps to 20 KBps). The time-based protocols perform very similarly in this region, delivering 65-85% of the packets on time.

Figure 7.26: Average Delivery Ratio of Flows 1 and 2 (Topology 1)

Figure 7.28 shows the performance of Flow 2, the middle priority flow. Again, diffusion has the worst performance, sharply dropping at 7 KBps. DVM is the overall best performer across all data rates. SVM has strong performance at the lower data rates (7 KBps to 13 KBps). The time-based protocols have almost identical performance from 7 KBps to 13 KBps, but DAT and DRT provide better delivery ratios than their static counterparts at the higher data rates.

Figure 7.27: Delivery Ratio of Flow 1 (Topology 2)

Figure 7.29 summarizes the performance of the low priority flow (Flow 3). Directed diffusion and SVM had almost identical performance. DVM also had poor performance on this flow. Interestingly, the time-based versions of RTDD performed better than SVM and DVM. The dynamic time-based protocols (DAT and DRT) returned the best delivery ratios.

Figure 7.30 shows the average delivery ratio of all three flows in Topology 2. The superior performance of RTDD versus standard diffusion is clearly shown. Among the RTDD protocols, DVM maintains a small but consistent performance advantage. SVM narrowly edges out SAT and SRT.

Figure 7.28: Delivery Ratio of Flow 2 (Topology 2)

As in Topology 1, SAT and SRT have no appreciable performance difference, implying that relative time differences can safely be used instead of absolute time differences. Likewise, DAT and DRT have similar performance to each other and slightly better performance than the static algorithms. Interestingly, the performance difference between the static and dynamic protocols is relatively small.
This suggests that static protocols, despite their simplicity, are capable of providing good delivery rates during moderate levels of congestion. Furthermore, the small performance advantage of the location-based protocols implies that time-based protocols may be used in networks without localization capabilities. This significantly reduces the hardware and software requirements of the system.

Figure 7.29: Delivery Ratio of Flow 3 (Topology 2)

Figure 7.30: Average Delivery Ratio of Flows 1, 2, and 3 (Topology 2)

Chapter 8

Conclusions and Future Work

We have proposed, implemented, and evaluated three network services for directed diffusion. These network services improve and augment the standard directed diffusion protocol. They improve flooding efficiency, provide localized route repair, and support real-time packet delivery. The primary contributions of our work are:

- The implementation and evaluation of passive clustering for directed diffusion.
- The design, implementation, and evaluation of a reactive and localized route repair mechanism for directed diffusion.
- The design, implementation, and evaluation of a real-time communication protocol for directed diffusion that requires neither location knowledge nor time synchronization.

The three network services support the overall goals of sensor networks. Flooding efficiency and localized repair promote energy conservation by reducing unneeded transmissions. The reactive route repair mechanism improves the robustness of the network by efficiently adapting to node failure. Both PCDD and LRDD improve the scalability of the sensor network since they reduce the cost and scope of flooding. RTDD gives application developers greater control over time-critical communication, easing application development. All three network services shield high-level applications from the complexities of the underlying sensor network.
By leveraging the strengths of diffusion and minimizing its weaknesses, the network services significantly improve its utility as a general purpose routing protocol.

One of the most interesting directions for future research is in considering the possible interactions among the network services. For example, LRDD could use the clusters created by PCDD to limit the flooding of repair packets. In this way, repair packets would be restricted to the clusters adjacent to the failed node. The problem with this approach is that the PC clusters may be too small to allow for successful repair.

Another possible interaction is between LRDD and RTDD. If RTDD is extended to include a congestion detection algorithm, LRDD could be used to find routes around congested regions. Repair would be performed proactively in order to improve the throughput of slow links with the same mechanism used to reactively repair node failure. The challenge of this modification is in handling the additional congestion produced by LRDD's localized flood. This mechanism will, at least temporarily, create more congestion in order to reduce congestion. Depending on the saturation of the network and the length of the new route, LRDD may not be able to improve on-time packet delivery performance. In fact, LRDD's proactive repair process may make the network congestion worse.

Finally, RTDD and PCDD could interact with each other to compose better clusters. If congestion data is maintained at each node by RTDD, then it could be used to influence the creation of clusters. A node prone to congestion might refrain from becoming a cluster head in order to reduce its workload. Although providing slight improvement, this strategy requires significant overhead in terms of state information stored by each node. Thus, the benefits do not seem to justify the cost in this instance of interaction.

Our three network services provide significant enhancements to directed diffusion in several respects.
We have increased the flooding efficiency of diffusion by augmenting it with PCDD. LRDD improves the robustness of diffusion in the face of node failure without excessive flooding overhead. We have also enhanced diffusion by adding a distance- and deadline-aware real-time communication protocol. By leveraging the strengths of diffusion and minimizing its weaknesses, the network services significantly improve the utility of directed diffusion as a routing protocol.