Scalable, Self-Healing, and Real-Time Network Services for Directed Diffusion

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. This thesis does not include proprietary or classified information.

Kenan L. Casey

Certificate of Approval:
Kai H. Chang, Professor, Computer Science and Software Engineering
Alvin S. Lim (Chair), Associate Professor, Computer Science and Software Engineering
Homer Carlisle, Associate Professor, Computer Science and Software Engineering
Joe Pittman, Interim Dean, Graduate School

A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the degree of Master of Science. Auburn, Alabama, May 10, 2007.

Permission is granted to Auburn University to make copies of this thesis at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
Date of Graduation

Vita

Kenan Luke Casey, son of Jerry and Paula Casey, was born January 8, 1982, in Louisville, Kentucky. He graduated from Jeffersonville High School as salutatorian in 2000. He earned his Bachelor's degree in Computer Science and Mathematics from Freed-Hardeman University, Henderson, Tennessee, in 2004.

Thesis Abstract

Master of Science, May 10, 2007 (B.S., Freed-Hardeman University, 2004)
145 Typed Pages
Directed by Alvin S. Lim

Directed diffusion is a data-centric publish-subscribe routing protocol for sensor networks. We have designed, implemented, and evaluated three network services which build on the strengths of directed diffusion and lessen its weaknesses.
The system architecture emphasizes efficient communication, local route repair, and real-time response. We evaluate the performance of our improved diffusion in terms of routing overhead, delivery effectiveness, and deadline achievement. Our results demonstrate the benefits of the network services in all three respects. We increase the efficiency of flooding, improve packet delivery rates in the presence of node failure, and decrease the number of packets that miss their deadlines.

Acknowledgments

I would like to express my appreciation to Dr. Alvin Lim for the guidance he has provided throughout my study at Auburn. I would also like to express my gratitude to the advisory committee members, Dr. Kai Chang and Dr. Homer Carlisle. I am especially thankful to Dr. Gerry Dozier for his advice and encouragement. I would like to thank the Department of Education for its support through the GAANN Fellowship. This fellowship was instrumental in supporting my research. Several fellow students have made significant contributions to this research, including Raghu Neelisetti and Philip Sitton. I am thankful for their help and their friendship. Above all, I am grateful to my wife, Ashley, whose love and support have made this work possible. I thank her for her patience and encouragement during the long research process. Her companionship is my greatest joy.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style file aums.sty.

Table of Contents

List of Figures
1 Introduction
2 Background
   2.1 Sense-and-Respond Systems
   2.2 Directed Diffusion
      2.2.1 Directed Diffusion Architecture
      2.2.2 Strengths of Diffusion
      2.2.3 Weaknesses of Diffusion
      2.2.4 Route Repair Mechanisms
      2.2.5 Routing and Filter API
   2.3 Distributed Services
      2.3.1 Lookup Service
      2.3.2 Composition Service
      2.3.3 Adaptation Service
   2.4 ISEE
      2.4.1 Isview
      2.4.2 Ismanage
      2.4.3 Isplugin
      2.4.4 Isproxy
3 Motivations and Applications
   3.1 Motivations
      3.1.1 Efficient Flooding
      3.1.2 Route Repair
      3.1.3 Real-Time Communication
   3.2 Applications
      3.2.1 Emergency Medical Response
      3.2.2 Business Alerts
      3.2.3 Meteorological Command and Control
4 Related Work
   4.1 Efficient Flooding
      4.1.1 Heuristic-based
      4.1.2 Topology-based
   4.2 Route Repair
      4.2.1 WAR
      4.2.2 ADMR
      4.2.3 SWR
      4.2.4 SHORT
      4.2.5 RDMAR
      4.2.6 ABR
      4.2.7 TORA
      4.2.8 AODV-LRQT
      4.2.9 PATCH
   4.3 Real-Time Communication
      4.3.1 RAP
      4.3.2 SPEED
5 Network Services Architecture
   5.1 Clustering Mechanism
      5.1.1 Passive Clustering Overview
      5.1.2 Protocol Details
      5.1.3 Adaptation to Diffusion
   5.2 Repair Mechanism
      5.2.1 Break Detection
      5.2.2 Break Localization
      5.2.3 Localized Gradient Repair
   5.3 Real-Time Communication Mechanism
      5.3.1 DD-RAP
      5.3.2 SVM-t
      5.3.3 DVM-t
6 Network Services Design
   6.1 Design Principles
      6.1.1 Energy-efficient
      6.1.2 Scalable
      6.1.3 Localized
      6.1.4 Distributed
      6.1.5 Real-Time
      6.1.6 Reactive
   6.2 Design Decisions
      6.2.1 Clustering Mechanism
      6.2.2 Repair Mechanism
      6.2.3 Real-time Communication Mechanism
7 Network Services Implementation
   7.1 Clustering Mechanism
      7.1.1 Passive Clustering Attribute
      7.1.2 Passive Clustering Filters
   7.2 Repair Mechanism
      7.2.1 DD-Repair Attribute
      7.2.2 DD-Repair Filters
      7.2.3 Local Flooding
   7.3 Real-Time Communication Mechanism
      7.3.1 DD-RAP Attributes
      7.3.2 DD-RAP Filters
      7.3.3 Prioritized Queue
8 Performance Evaluation
   8.1 Clustering Mechanism
      8.1.1 Experiment Setup
      8.1.2 Flooding Efficiency
      8.1.3 Delivery Effectiveness
   8.2 Repair Mechanism
      8.2.1 Experiment Setup
      8.2.2 Repair Effectiveness
      8.2.3 Repair Latency
      8.2.4 Packet Overhead
   8.3 Real-Time Communication Mechanism
      8.3.1 Experiment Setup
      8.3.2 On-Time Delivery Ratio
9 Conclusions and Future Work
Bibliography

List of Figures

2.1 Distributed Service Framework
2.2 ISEE Architecture Overview
3.1 Global Route Repair
3.2 Ideal Local Route Repair
3.3 DART 2 System Architecture
4.1 Regional Flooding (Region Filter)
4.2 Relative Distance Micro-Discovery
4.3 RAP Communication Architecture
4.4 SPEED Architecture
5.1 Layered Architecture
5.2 Detailed Architecture
5.3 Example PC Topology
5.4 Full Gateway
5.5 Distributed Gateway
5.6 PC Transition State Diagram
5.7 Local Repair for Directed Diffusion
8.1 Number of interest messages versus number of nodes for transmission radius (r)
8.2 Number of exploratory data messages versus number of nodes for transmission radius (r)
8.3 Number of interest messages versus number of nodes (n) and transmission radius (r)
8.4 Number of exploratory data messages versus number of nodes (n) and transmission radius (r)
8.5 Average percent flooding reduction for each network size (n)
8.6 Delivery ratio versus number of nodes for transmission radius (r)
8.7 Delivery ratio for each repair/flooding algorithm
8.8 Latency of repair for each repair/flooding algorithm
8.9 Overhead (Interests + Exploratory Data) for each repair/flooding algorithm
8.10 Normalized Overhead (Overhead Packets/Diffusion Overhead Packets) for each repair/flooding algorithm
8.11 Overhead Packets/Data Packet for each repair/flooding algorithm
8.12 DD-RAP network topology
8.13 On-time delivery ratios of Flow 1 (high priority) versus data rate (r) for SVM and DVM
8.14 On-time delivery ratios of Flow 2 (low priority) versus data rate (r) for SVM and DVM
8.15 Percent improvement of on-time delivery ratio for SVM
8.16 Percent improvement of on-time delivery ratio for DVM

Chapter 1 Introduction

In 1991, Mark Weiser challenged computer scientists to consider a new model of computing, a model in which computers are interwoven into the "fabric of everyday life until they are indistinguishable from it" [1]. Weiser's vision, now called ubiquitous computing, emphasizes intuitive interaction with pervasive computer resources. Proactive computing is a branch of ubiquitous computing which focuses on autonomous and environmentally-based devices. Three fundamental goals for proactive computing have been proposed: get physical, get real, and get out [2]. The first goal is to connect computers to the physical world. Such computers are typically small sensors and actuators which collect data over a large physical environment and communicate it over a wireless network. The second goal emphasizes the need for real-time performance. Proactive computing systems should be faster than real-time so that feedback can be provided to automated control loops for future predictions. The final goal is to remove humans from the interactive loop in order to allow for faster-than-human response times. This change essentially shifts computation from human-centered to human-supervised. Sensor networks have emerged in response to these lofty goals.
Sensor networks are large-scale, wireless networks of resource-constrained sensor devices. They most clearly support the first goal of proactive computing, connecting the physical and virtual worlds through sensors. In reference to the second and third goals, sensor networks allow for rapid and automated response to environmental stimuli. They allow faster-than-human response by taking humans out of the loop.

Sensor devices, called nodes, consist of a processor, environmental sensors, a wireless communication device (usually a radio), and a power source (usually a battery). Current sensor devices are the size of a quarter, but the goal is to further reduce their size and cost. The SmartDust project [3], for example, envisions devices one cubic millimeter in size and capable of being suspended in air. A wide range of applications for sensor networks has been proposed. Examples include environmental monitoring, military surveillance, and medical monitoring.

Sensor networks are most noticeably differentiated from other types of networks by their large size, limited energy source, and dynamic nature. Sensor deployment is almost always ad hoc due to the sheer number of sensors. Potential deployment methods include being launched by artillery shelling, scattered by ground vehicles, or dropped by airplanes. The small size of sensor devices puts severe limits on the amount of available energy, so all operations must be energy-efficient. Sensor networks are characterized by robustness in handling the inevitable failures which occur in the field. The system must continue to function after devices fail, die, or are destroyed.

While a great deal of sensor network research has been conducted, the inherent characteristics of sensor networks provide ample challenges for system researchers. Three of the most compelling challenges of sensor networks are their immense scale, their resource scarcity, and their dynamic topologies.
Sensor networks aim to include tens of thousands to millions of nodes. To support networks of this size, communication mechanisms must be incredibly scalable. The hardware of sensor devices also provides significant challenges. Sensor devices have very limited processing, storage, and communication capabilities. Perhaps most critical is the limited energy budget available to sensor nodes. Since batteries may not be replaced in the field, all sensor network algorithms must conserve energy whenever possible. Consequently, networking protocols must maintain high computation and communication efficiency. Communication is particularly expensive in terms of energy (transmitting 1 bit consumes as much power as executing 800-1000 instructions [4]), so minimizing the number of transmissions (and receptions) is a paramount goal. Another challenge is the dynamic nature of sensor network topologies. Typically, sensor networks are deployed in an ad hoc fashion (e.g., by being dropped from a plane). The network must configure itself and maintain a working configuration in the presence of adverse environmental effects as well as node failure. In the face of anticipated node failure, the network should continue to function normally by taking advantage of node density.

The overall goal of our research is to mask the fundamental challenges of the sensor network from high-level applications and application developers. Ideally, the application programmer should not know about the scale, energy, or dynamics of the sensor network. The lower-layer protocols should transparently support any size network, adapt to changing energy levels, and perform dynamic reconfiguration in response to topology changes. Applications should not have to worry about such details. To address this issue, we propose three mechanisms which shield the application from the network-level challenges. First, we present an efficient flooding scheme to increase the scalability of the network.
Secondly, we address the dynamic nature of sensor networks through a route repair algorithm which reconfigures the network after node failure. Lastly, we propose a protocol for real-time communication which gives application developers control over data flow priorities so that time-critical messages can be delivered satisfactorily. Together, the three network services compose a new layer in the network protocol stack which applications can easily, even transparently, utilize to gain improved network performance. The network services reside slightly above the network layer but have access to the inner workings of the routing protocol. The low-level implementation of these services provides significant benefits to all the layers above the network level (i.e., transport and application). By taking advantage of the network services, any software running at higher layers will be capable of increased scalability, reactive self-healing, and real-time communication.

Chapter 2 gives background information about directed diffusion and the distributed services framework, technologies which underlie and precede our research. In Chapter 3, we discuss the motivations for our network services and describe relevant applications of sense-and-respond systems. We give an overview of research related to our three protocols in Chapter 4 and describe the architecture of the system in Chapter 5. In Chapter 6 we discuss the design principles that guided the development of the system. Chapters 7 and 8 explain the implementation and performance of the network services. We conclude our research and discuss future work in Chapter 9.

Chapter 2 Background

One relevant application area for our network services is sense-and-respond (S&R) problems. S&R problems include any problem in which data must be acquired, analyzed, and acted upon in a relatively short amount of time.
This chapter gives some background information on sense-and-respond systems and describes the fundamental network components underlying our network services. We first introduce a general architecture for sense-and-respond systems and identify how our network services fit into this taxonomy. We then give an introduction to the directed diffusion routing protocol and the distributed services framework built on top of diffusion. Finally, we summarize ISEE, a runtime framework for sensor network execution and monitoring which was used to evaluate our network enhancements.

2.1 Sense-and-Respond Systems

Chandy [5] presents a high-level summary of sense-and-respond (S&R) systems in the context of IT issues. On the most basic level, sense-and-respond systems simply sense the environment and respond appropriately. Rules in S&R systems can be defined using when-then semantics. The when-clause corresponds to the detection of an event, and the then-clause describes the execution of the response. The cost of an S&R system can be broken into three parts: the cost of false positives, the cost of false negatives, and the incremental costs of running the system. Chandy [5] outlines several important characteristics of S&R applications in order to categorize the application space. He presents the following S&R application properties:

- Response Type: Is the response human-centered or automated?
- Response Time: Is the response executed in minutes or sub-seconds?
- Event Rate: Are events generated a few per second or thousands per second?
- Condition Complexity: Are when-clauses based on a single event or a history of events? Are when-clauses based on a single stream or the fusion of multiple streams?
- Data Structure: Is the data structured in well-defined schema or generally unstructured?
- Query Structure: Are queries structured or unstructured?

Our network services support S&R systems of fairly high complexity according to this taxonomy.
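The when-then rule semantics described above can be sketched in code. This is a minimal illustration of the idea only; the Event and Rule types and all names are our own, not from Chandy or this thesis.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A sensed event: a named reading with a value (names illustrative).
struct Event {
    std::string type;
    double value;
};

// An S&R rule: the when-clause detects an event, the then-clause
// (here `respond`) executes the response.
struct Rule {
    std::function<bool(const Event&)> when;
    std::function<void(const Event&)> respond;
};

// Dispatch one event against a rule set; returns how many rules fired.
int dispatch(const Event& e, const std::vector<Rule>& rules) {
    int fired = 0;
    for (const auto& r : rules) {
        if (r.when(e)) {
            r.respond(e);
            ++fired;
        }
    }
    return fired;
}
```

A condition over a history of events or fused streams, as in the more complex categories above, would replace the single-event predicate with one that closes over buffered past events.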
We support automated responses with response times on the order of sub-seconds. Events may be generated at high rates and may compose several distinct streams of historical data that must be fused. The only exception to the trend of higher complexity is that our system utilizes structured data and structured queries.

2.2 Directed Diffusion

Directed diffusion [6, 7] is a sensor network routing protocol based on the publish-subscribe communication model. Its development was guided by the principle of data-centric routing. This is in contrast to traditional address-centric protocols, which route packets based on host information. Diffusion also emphasizes in-network processing and localized interactions in order to promote energy efficiency and scalability.

2.2.1 Directed Diffusion Architecture

Protocol Overview

The objective of directed diffusion is to perform multipoint-to-multipoint communication using named data. All routing is based on named data in the form of attribute-value tuples rather than on host information as in address-centric protocols (e.g., IP [8], DSR [9], or AODV [10]). Diffusion is also characterized by its emphasis on localized routing. Each node stores routing information only about its immediate (i.e., 1-hop) neighbors; no global information is required. Diffusion sets up subscriptions by flooding interest messages throughout the network. Nodes with matching data publish it by flooding exploratory data messages back through the network to the interested node. The subscribing node then reinforces its fastest neighbor by sending it a reinforcement message. Subsequent nodes recursively forward the reinforcement message to their fastest neighbor until the lowest-latency path back to the publishing node has been reinforced. After a reinforced path has been established, reinforced data messages flow from publisher to subscriber via unicast or efficient multicast. After the two-phase setup algorithm is complete, data is, in essence, "pulled"
from source to sink along the reinforced gradients.

Protocol Details

Diffusion uses four types of messages to establish paths within a network. A sink node subscribes to a data flow by flooding the network with interest messages that name the type of data the sink wants to receive. Intermediate nodes store the interest and record the neighbor from which it was sent. This saved path leading to the sink is known as a gradient. Nodes with data matching the interest will publish it by transmitting exploratory data along the gradients previously created. These publishing nodes are known as source nodes. When the exploratory data arrives at the sink, the sink will reinforce its single fastest neighbor (i.e., the neighbor that delivered the first exploratory data message) by sending it a reinforcement message. Any node receiving a reinforcement message will, in turn, reinforce its fastest upstream neighbor until a reinforced path all the way back to the source is established. Slower paths may be negatively reinforced to remove them from the gradient tables. Subsequent data emanating from the source, known as reinforced data, will be unicast or multicast over the reinforced path to the sink. This two-phase process results in the creation of a multipoint-to-multipoint distribution tree.

2.2.2 Strengths of Diffusion

Data-Centricity

Directed diffusion has generally proven to be well-suited to sensor networks. Its data-centric model is more appropriate for many sensor network applications: it performs more efficiently and provides more useful services for many types of sensor network applications (e.g., query processing). The use of named data provides an energy-efficient mechanism for routing data, which avoids the unnecessary complexity of host information. Since data is the primary concern, it makes sense to use it as the routing criterion instead of host address, a property largely unimportant in sensor networks.
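To make the data-centric idea concrete, interest matching against named data can be sketched as follows. This is a standalone simplification, not the ISI implementation: real diffusion attributes carry typed values and match operators, whereas here matching is plain string equality.

```cpp
#include <cassert>
#include <map>
#include <string>

// Named data as attribute-value tuples (simplified to string keys and
// string values for this sketch).
using AttrVec = std::map<std::string, std::string>;

// An interest matches a data message when every attribute the interest
// names appears in the data with the same value; the data may carry
// extra attributes the interest does not mention.
bool matches(const AttrVec& interest, const AttrVec& data) {
    for (const auto& kv : interest) {
        auto it = data.find(kv.first);
        if (it == data.end() || it->second != kv.second) {
            return false;
        }
    }
    return true;
}
```

Routing decisions in diffusion hinge on exactly this kind of comparison between an incoming message's attributes and stored interests, with no host addresses involved.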
Reactive Nature

The reactive nature of diffusion also supports the goal of energy efficiency. Directed diffusion belongs to a class of protocols that creates routes in response to the application's needs instead of proactively finding routes in advance. The reactive property is particularly relevant to sensor networks since they may remain inactive for long periods of time waiting for events of interest.

Localized Interactions

One of the most notable characteristics of diffusion is its fully localized nature. Nodes only require knowledge about their 1-hop neighbors to create gradients. No global information, whether end-to-end or multi-hop, is needed for routing. This means that the routing table scales with the number of neighbors as opposed to the total number of nodes in the network. Node density, not network size, is the determining factor in the gradient table size. This improves the scalability of the protocol with respect to space complexity. The localized nature of diffusion also simplifies the routing protocol since global information is not required to make routing decisions.

Aggregation

Another strength of the diffusion protocol is its support for in-network processing, or aggregation. Since each node understands the attribute-value tuples that compose a data message, any intermediate node may perform data aggregation. While this essentially pushes application-level data down to the network layer, the resulting benefits are significant. Aggregation essentially trades latency for reduced communication overhead by consolidating the data from multiple packets into one packet. It has been demonstrated that significant energy savings (up to 42% in [7]) can be achieved by the use of aggregation in diffusion.

Multipoint-to-Multipoint Links

A final strong point for diffusion is its support for various types of communication paradigms.
Unlike other network protocols which only provide unicast (1-to-1) capabilities, diffusion inherently supports multicast (1-to-N) communication. Additionally, diffusion's publish-subscribe model can create gather distribution trees (N-to-1) and multipoint-to-multipoint (N-to-M) dissemination paths. Other protocols may support these modes of communication with high-level protocols, but diffusion's relatively simple architecture inherently supports all four classes of links.

2.2.3 Weaknesses of Diffusion

Flooding

Perhaps the greatest weakness of directed diffusion is its reliance on flooding for path creation and maintenance. In the case of standard diffusion (two-phase pull), both interest and exploratory data messages are flooded throughout the network whenever gradients are established. This results in a huge amount of network traffic every time a new path is established or an old path is refreshed.

Limited Scalability

The cost of flooding rises with the size of the network, so scalability is significantly affected. Diffusion does not effectively support networks with multiple hundreds of nodes because of the huge overhead imposed by flooding across so many nodes. Although its localized nature supports scalability, diffusion's dependence on flooding severely limits the practical size of a network.

Latency of Global Repair

To deal with broken paths, diffusion periodically re-creates routes by performing the same procedure used to initially find routes: global flooding. This mechanism is described in Section 2.2.1. Since diffusion handles failure and mobility with global repair mechanisms, the costs incurred for repair are significant. To reduce the energy costs of global repair, the path maintenance mechanism is performed on a relatively infrequent schedule (e.g., every 60 seconds). In the worst case, data may be delayed an entire refresh interval before a broken path is repaired.
While this may be adequate for some applications, it may be completely unacceptable for others, e.g., time-critical and real-time systems.

Lack of Real-Time Communication Support

Many applications have time deadlines which should be recognized by the network and handled differently. Directed diffusion routes packets in a first-come, first-served fashion; no preference is given to higher-priority flows. While not an inherent weakness, diffusion's lack of support for real-time communication is a deficiency nonetheless.

2.2.4 Route Repair Mechanisms

Standard directed diffusion includes two mechanisms for path repair. Diffusion was designed to handle path failure primarily by the periodic re-creation of gradients using a global mechanism. The designers of diffusion also mention a local repair procedure, but fail to adequately deal with the route repair problem. This section summarizes the costly global repair mechanism and the primitive local repair algorithm proposed by the original designers of the diffusion protocol.

Global Gradient Repair

The standard and currently implemented method of handling broken links in diffusion is global gradient repair. In this method, the sink periodically refreshes the path to the source by flooding the network with interest messages. The source also periodically floods exploratory data back to the sink in order to reinforce the path that is currently fastest. Although this mechanism provides optimal paths, it comes at a high cost in terms of energy and latency. Global repair results in network-wide flooding of interest and exploratory data messages. To lessen this cost, global repair is performed at a fairly infrequent interval. As a result, links may remain broken for an entire gradient refresh interval. This means that a data flow may fail to report an event for 60 seconds (by default) in the worst case, since no effort is made to repair the link until the next gradient refresh.
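The worst-case figure follows directly from the refresh schedule, as this small sketch shows. The 60-second default interval is from the text; the uniform-break assumption behind the expected value is ours.

```cpp
#include <cassert>

// With global gradient repair every refresh_interval seconds, a link
// that breaks t seconds after the last refresh stays broken for
// (refresh_interval - t) seconds. Worst case: the break occurs
// immediately after a refresh (t == 0), giving a full-interval delay.
double repair_delay(double refresh_interval, double time_since_refresh) {
    return refresh_interval - time_since_refresh;
}

// If break times are uniform over the interval (our assumption, not
// stated in the thesis), the expected delay is half the interval.
double expected_repair_delay(double refresh_interval) {
    return refresh_interval / 2.0;
}
```

With the 60-second default, a break just after a refresh waits the full 60 seconds, and a uniformly random break waits 30 seconds on average; shortening the interval lowers this delay but floods the network proportionally more often.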
Local Gradient Repair

In [11], the designers of diffusion describe a local gradient repair strategy for the protocol. In this method, intermediate nodes may participate in reinforcement. When a node detects degradation in link quality, it can discover a new path to the source and negatively reinforce the degraded path. The problem with this scheme, as pointed out by the authors, is that every node downstream from the break will attempt to find a new path, resulting in a significant waste of resources. The authors suggest that the first nodes after the break "interpolate" data events so that downstream nodes continue to "perceive" a high-quality path and do not initiate unnecessary local repair.

This method is far from ideal. It can hardly be considered local repair if intermediate nodes upstream from the break forward reinforcement messages all the way back to the source. On average, the intermediate node must reinforce half the distance between source and sink. In the worst case, the break would be one hop away from the sink, and the so-called "local" repair would be only one hop different from global repair. Furthermore, without some mechanism on the part of the first node downstream from the failure, all the downstream nodes will perform local repair. The interpolation mechanism proposed to solve this problem is somewhat nebulous and largely unspecified.

2.2.5 Routing and Filter API

The Information Sciences Institute (ISI) at the University of Southern California has implemented directed diffusion and defined an API for developing applications based on the protocol [12]. Additionally, the implementation includes a filter API which allows modification and extension of the network protocol. User-level programs use the routing API to publish and subscribe to data flows in the network. System-level changes can be accomplished with the filter API. ISI diffusion supports three communication models: two-phase pull, one-phase pull, and one-phase push.
In two-phase pull, the standard model, subscribers flood interests throughout the network, and publishers flood exploratory data back to subscribers. The lowest-latency path over which exploratory data arrives at the subscriber is reinforced. The one-phase pull communication model eliminates the flooding of exploratory data by implicitly reinforcing the fastest path on which interest messages are received. In this case, exploratory data is never flooded. In both pull models, data is conceptually "pulled" from publisher to subscriber, i.e., the subscriber initiates the data flow. In contrast, the one-phase push model reverses the roles of subscriber and publisher: publishers initiate the transmission of data, and subscribers respond by sending reinforcement messages. In this model, publishers periodically push exploratory data throughout the network. If a subscriber is interested in the exploratory data, it reinforces a path back to the publisher to continue receiving data messages. Interest messages are never flooded in this communication paradigm.

Named data is supported through vectors of attribute pairs, each of which has a unique name (i.e., key) and a data value. These attribute vectors are the basis of diffusion's data-centric routing. The directed diffusion routing API provides access to the publish/subscribe network routing model and allows implementation of network modifications. The routing API allows user-level programs to make subscriptions for data and publications of data. The diffusion protocol is implemented in two programs: the diffusion core and the gradient filter. While the core handles the system's basic functionality, the bulk of the actual network protocol is handled by the gradient filter. The ISI diffusion routing API calls are described below. For a complete description, consult [12].

Routing API

- static NR* NR::createNR()
  This static function returns a pointer to an NR object, which encapsulates all network routing mechanisms and thus provides access to the directed diffusion stack. The routing and filter API consists of the methods exposed by the NR object.

- handle NR::subscribe(NRAttrVec* subAttrs, const NR::Callback* callback)
  - subAttrs: A pointer to a vector containing elements describing subscription information.
  - callback: A pointer to a callback object to be invoked when the subscription attributes are fulfilled.

- handle NR::unsubscribe(handle subscription_handle)
  - subscription_handle: A handle associated with a subscription, returned by the subscribe method.

- handle NR::publish(NRAttrVec* pubAttrs)
  - pubAttrs: A pointer to a vector containing elements describing the data to be sent.

- int NR::unpublish(handle publish_handle)
  - publish_handle: A handle associated with a publication, returned by the publish method.

- int NR::send(handle publish_handle, NRAttrVec* sendAttrs)
  - publish_handle: The handle associated with the data being published.
  - sendAttrs: A pointer to a vector of data attributes and the original elements describing the publication.

Filter API

A filter is a process that resides between the directed diffusion stack and the user-level applications. Filters are software modules that allow network developers to influence routing and data processing. Representative uses of filters include routing, in-network aggregation, caching, debugging, and monitoring [12]. Filters have greater access to low-level message contents (e.g., last hop and next hop) than diffusion applications. The filter API facilitates the development of filters. It is composed of the following methods; again, refer to [12] for a complete explanation.

- handle addFilter(NRAttrVec* filterAttrs, int16_t priority, FilterCallback* callback)
  - filterAttrs: A pointer to a vector containing the filter attributes.
  - priority: A unique filter priority between 2 and 253.
The higher the number, the higher the priority.
  - callback: A pointer to the object to be executed when an incoming packet matches the filter.

- int NR::removeFilter(handle filter_handle)
  - filter_handle: The handle of the filter returned by addFilter.

- void sendMessage(Message* msg, handle h, u_int16_t priority)
  - msg: The pointer to the message to send along to the next filter.
  - h: The handle to the filter returned by addFilter.
  - priority: An optional argument used to modify the filter matching scheme.

2.3 Distributed Services

In order to support dynamic, self-organizing sensor networks, three fundamental services for sensor networks have been proposed: a lookup service, a composition service, and an adaptation service [13]. These services shield higher-level services and applications from the difficulties inherent in the dynamic nature of sensor networks. Essentially, the three services mask the dynamic and distributed nature of sensor networks. Representative functionality provided by the distributed services includes remote service execution, dynamic task group creation, and fault tolerance. Figure 2.1 illustrates the overall system design using the distributed service framework.

Figure 2.1: Distributed Service Framework

2.3.1 Lookup Service

The lookup service stores information about services offered in the network and distributes this information to interested nodes. Applications utilize the lookup service through the lookup service API, which exposes the following methods:

- service_register(char* serviceType, char* serviceName, NRAttrVec* serviceAttrs)
  - serviceType: String specifying the generic service type.
  - serviceName: String specifying the unique service name.
  - serviceAttrs: An STL vector of attributes containing service information, e.g., location, address, interface definition, or mobile code.

- service_deregister(char* serviceType, char* serviceName)
  - serviceType: String specifying the generic service type to deregister.
  - serviceName: String specifying the unique service name to deregister.

- lookup_service(char* serviceType, char* serviceName, NRAttrVec* inputAttrs, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type to look up.
  - serviceName: String specifying the unique service name to look up.
  - inputAttrs: An STL vector of attributes containing the predicate for matching service providers, e.g., the number of service providers to return.
  - outputAttrs: An STL vector of attributes containing the results of the lookup. This may include a list of service providers along with application-specific information.

- service_exec(char* serviceType, char* serviceName, NRAttrVec* inputAttrs, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type to execute.
  - serviceName: String specifying the unique service name to execute.
  - inputAttrs: An STL vector of attributes containing the input to the remote service execution, including the name of the remote function and its input parameters.
  - outputAttrs: An STL vector of attributes containing the results of the remote service execution.

- service_call(char* serviceType, char* serviceName, NRAttrVec* inputAttrs, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type to execute.
  - serviceName: String specifying the unique service name to execute.
  - inputAttrs: An STL vector of attributes containing the input to the remote service execution, including the name of the remote function and its input parameters.
  - outputAttrs: An STL vector of attributes containing the results of the remote service execution.

Applications that provide services register them with the lookup server using the service_register() method. Correspondingly, services can be deregistered with service_deregister() when they are no longer available.
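The register/lookup semantics can be modeled conceptually with a small in-memory registry (a Python sketch of the behavior only; the real interface is the C++ API above, and all names and data here are illustrative):

```python
# Toy model of the lookup service's registry semantics (illustrative only).
registry = {}

def service_register(service_type, service_name, attrs):
    """Record a provider's attributes under its (type, name) key."""
    registry[(service_type, service_name)] = attrs

def service_deregister(service_type, service_name):
    """Remove a provider that is no longer available."""
    registry.pop((service_type, service_name), None)

def lookup_service(service_type, service_name):
    """Return the registered attributes, or None if no provider matches."""
    return registry.get((service_type, service_name))

service_register("temperature", "sensor-42", {"location": (10, 20)})
print(lookup_service("temperature", "sensor-42"))  # {'location': (10, 20)}
service_deregister("temperature", "sensor-42")
print(lookup_service("temperature", "sensor-42"))  # None
```

The real service additionally matches providers against predicate attributes and can return multiple candidates; the dictionary above only captures the exact-key case.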
The lookup_service() call is invoked by applications that want to find details about a service in the network, e.g., interface or location information. Lookup clients may also request a list of service providers from the lookup server in order to select the most appropriate provider (e.g., the nearest geographically). After finding a service, a client application invokes the service_exec() method to execute a service remotely. The service_call() method of the lookup service encapsulates lookup_service() and service_exec() in one call. The lookup service is completely described in [14].

2.3.2 Composition Service

Many sensor network applications involve groups of nodes coordinating to solve one problem. Consequently, the task of organizing nodes into appropriate clusters becomes crucial. The distributed composition service composes and maintains dynamic task groups. This simplifies the development and execution of collaborative sensor network applications. The composition service defines a connector abstraction which encapsulates the underlying link characteristics of a data flow. Extensive cluster information is maintained by the composition service in order to provide clients with an accurate picture of the organization of the network. Clusters may be formed ahead of time or dynamically, based on events detected in the network. The composition service is accessed using the simple API shown below.

- join_group(char* serviceType, int minGroupSize, int Xspan, int Yspan, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type of the task group.
  - minGroupSize: Minimum requested group size.
  - Xspan: Location bounds for the group in the horizontal direction.
  - Yspan: Location bounds for the group in the vertical direction.
  - outputAttrs: An STL vector of attributes containing the results of the group insertion request. This may include the group ID, the group leader ID, and the IDs of other members of the group.
- leave_group(char* serviceType, int groupID, NRAttrVec* outputAttrs)
  - serviceType: String specifying the generic service type of the task group.
  - groupID: Group ID of the task group that the node is leaving.
  - outputAttrs: An STL vector of attributes containing the results of the group deletion request. This may include the success or failure of the operation as well as application-specific information.

Nodes not currently belonging to a group are assigned to one when join_group() is called. Cluster assignment is based on (1) location, so that clusters are composed of nodes geographically near each other, and (2) group size, so that clusters have a similar number of members. The leave_group() method removes a node from a cluster. The composition service API allows applications to access the cluster creation and management services provided by the composition service. A complete specification of the composition service can be found in [15] [16].

2.3.3 Adaptation Service

In the most general sense, an adaptation service is a system that modifies its behavior based on changes to its environment [17]. The adaptation service for distributed sensor networks allows applications to adapt to changes in network and service configuration. It is organized into three phases: change awareness, consensus, and system behavior modification. During the change awareness phase, the environment is monitored for possible changes. Affected units communicate with the adaptation service in the consensus phase to determine an appropriate adaptation action. In the last phase, the adaptation service implements the behavior changes determined in the consensus stage. The adaptation service for sensor networks specifically handles changes such as the failure of nodes and services. The goal of the adaptation service is to mask failures and other network changes from higher-level services in order to improve fault tolerance and dynamic reconfiguration.
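The three phases can be sketched as a simple control loop (an illustrative Python sketch; the phase names come from the text, but the functions and data below are hypothetical):

```python
# Illustrative three-phase adaptation loop: awareness -> consensus -> modification.
def detect_changes(observed, expected):
    """Change awareness: report nodes that were expected but not observed."""
    return sorted(expected - observed)

def reach_consensus(changes):
    """Consensus: affected units agree on an action (here, a trivial fixed rule)."""
    return {node: "reroute-around" for node in changes}

def apply_actions(routes, actions):
    """Behavior modification: drop failed nodes from the routing state."""
    return {dst: [n for n in path if n not in actions]
            for dst, path in routes.items()}

expected = {"n1", "n2", "n3"}
observed = {"n1", "n3"}            # n2 has failed
routes = {"sink": ["n1", "n2", "n3"]}

failed = detect_changes(observed, expected)
actions = reach_consensus(failed)
routes = apply_actions(routes, actions)
print(routes)  # {'sink': ['n1', 'n3']}
```

In the real service, consensus is a distributed negotiation among affected units rather than the single local rule shown here.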
Further details regarding the adaptation service can be found in [17].

2.4 ISEE

Mark Ivester developed a runtime framework for the execution and monitoring of sensor network services [18]. The Interactive Sensor network Execution Environment (ISEE) allows for the creation, configuration, and monitoring of emulated and real sensor networks. We used ISEE to configure and control many of our experimental networks. ISEE is a graphical execution environment which facilitates sensor network experiment repeatability and real-time visualization. The control and access environment provides remote execution, logging, interaction, and analysis facilities independent of the implementation of the sensor network. The runtime framework allows for simple extensibility, scenario creation, and experiment repeatability. In combination with the distributed service framework, ISEE allows reactive and straightforward developer interaction with sensor network experiments. ISEE is composed of four modules: Isview, Ismanage, Isplugin, and Isproxy, as shown in Figure 2.2.

Figure 2.2: ISEE Architecture Overview

2.4.1 Isview

The Isview module handles user interaction and network visualization. It allows the user to create and configure sensor networks through a graphical interface. Services, including but not limited to those described in Section 2.3, may be started on any subset of the network. Additionally, Isview displays state information retrieved through the Ethernet back channel. Application state, debug information, and load statistics may be monitored.

2.4.2 Ismanage

The Ismanage module manages runtime plugins and communication with the sensor network. At its core, Ismanage sends commands to and receives state information from nodes in the sensor network. This module maintains consistent state information regarding links, nodes, and processes. Control information is sent to the nodes, and feedback is received from the network by Ismanage.
Another important responsibility of Ismanage is the dynamic loading and management of plugins.

2.4.3 Isplugin

To increase the utility of ISEE, custom plugins, called Isplugins, can be created to handle new sensor network services and protocols. Plugins extend the capabilities of Isview and Ismanage. In addition to supporting new communication modes, plugins can also modify the visualization properties of ISEE.

2.4.4 Isproxy

Isproxy is the module that manages remote execution of commands on sensor nodes. Fundamentally, Isproxy interfaces sensor nodes to the ISEE framework so that they can be remotely controlled through the graphical user interface. Commands issued by Ismanage are sent to Isproxy and executed on the node. Typical actions invoked by Isproxy include starting, stopping, and monitoring processes.

Chapter 3

Motivations and Applications

In this chapter we discuss the motivations for our enhancements to directed diffusion. We also present several existing and potential applications of S&R sensor networks.

3.1 Motivations

Although directed diffusion is generally well suited to sensor networks, it has several apparent weaknesses. Its reliance on flooding incurs a significant penalty on communication and energy efficiency. Furthermore, diffusion handles node failure rather poorly with its use of periodic global flooding. Finally, diffusion lacks any support for time-critical message delivery. We propose three network services to augment and extend directed diffusion to address these issues. First, we propose an efficient flooding scheme to increase the scalability of the network. Second, we address the dynamic nature of sensor networks through a route repair algorithm that reconfigures the network after node failure. Last, we present a protocol for real-time communication which gives application developers control over data flow priorities so that time-critical messages can be delivered satisfactorily.
In the subsequent sections we give specific motivations for each of these mechanisms.

3.1.1 Efficient Flooding

Flooding is a packet delivery process that delivers a packet to every connected node in the network [19]. Typically, flooding requires each node to rebroadcast every flooded packet so that every node is guaranteed to receive the message at least once. This type of flooding has been called blind flooding [20] and simple flooding [21]. Efficient flooding algorithms attempt to reduce the number of redundant packet transmissions through the use of heuristics or information about the network topology. Such techniques trade algorithm complexity for communication efficiency.

Flooding has frequently been used in both proactive and reactive routing protocols. Proactive routing protocols often rely on flooding for route advertisement. In this case, every node in the network must know the current status of a link in order to maintain correct routing tables. Efficient flooding techniques are often utilized by link-state protocols, since extensive neighbor information is gathered and propagated by the routing protocol [19]. By using this topology information, a node can forward flooded packets to the minimum set of neighbors needed to guarantee global reception. In reactive protocols, which are generally more appropriate for sensor networks, the principal reason for flooding is route discovery. Unlike proactive protocols, reactive protocols typically use blind flooding. Reactive protocols must flood initial route creation packets because no prior topology information is known. The path to an unknown host is found by flooding the network in search of the destination. Directed diffusion, as discussed in Section 2.2.3, relies strongly on flooding. Interests and exploratory data are flooded during the gradient setup phase. After routes are discovered, diffusion performs route maintenance using the same two-phase flooding mechanism.
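The redundancy of blind flooding is easy to see in a toy simulation: on a small graph, having every node rebroadcast once shows how many receptions are duplicates (everything here, including the example topology, is illustrative):

```python
from collections import deque

def blind_flood(adj, source):
    """Every node rebroadcasts once; count nodes reached vs. total receptions."""
    seen = {source}
    queue = deque([source])
    receptions = 0
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:   # one broadcast reaches all neighbors
            receptions += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen), receptions

# A moderately dense 6-node topology.
adj = {
    0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 3, 4],
    3: [0, 2, 5], 4: [1, 2, 5], 5: [3, 4],
}
reached, receptions = blind_flood(adj, 0)
print(reached, receptions)  # (6, 18): only 5 of 18 receptions are first-time deliveries
```

The total reception count equals the sum of node degrees, so the denser the network, the larger the fraction of receptions that are redundant; this is the effect efficient flooding schemes target.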
Due to its simplicity, blind flooding has often been implemented in reactive protocols [6] [9]. Although this naive approach to flooding simplifies the design, it results in inefficiency, since nodes may rebroadcast packets that all their neighbors have already received. In blind flooding, a node may receive duplicate copies of the same flooded packet from multiple neighbors. The performance of blind flooding is inversely related to the average number of neighbors (the neighbor degree) per node. As the neighbor degree increases, blind flooding results in more redundant packets at the network layer, a greater probability of collision at the MAC layer, and greater congestion of the wireless medium at the physical layer [22]. In dense networks it is not necessary for every node to forward each flooded packet. If only a subset of the nodes is chosen as relays, the flooding efficiency can be significantly improved. The underlying problem is to select a dominating set of nodes to relay flooded packets. More formally, we wish to find the minimal subset of forwarding nodes sufficient to deliver a flooded packet to every other node in the system [19]. This is typically accomplished with algorithms that use topology information or heuristics to identify nodes that should forward packets. We have implemented efficient flooding through a clustering technique that uses ongoing traffic to create a clustered structure. This structure allows nodes that may safely drop flooded packets to identify themselves, thereby reducing the number of unnecessary retransmissions.

3.1.2 Route Repair

Because of the dynamic nature of sensor networks, node and link failures are expected occurrences. When links fail, the routing protocol may attempt to recover from the breakage in one of two ways: end-to-end error recovery or local error recovery.
End-to-end repair protocols initiate the recovery process either by explicitly alerting the source node of the problem with a negative acknowledgment or by implicitly informing the source through the absence of a positive acknowledgment. In the former case, the intermediate node that detects the break sends a negative acknowledgment to the source node, reporting the link failure. In the latter case, the node that detects the break takes no action, so that the sender times out while waiting for a positive acknowledgment. Diffusion essentially uses the implicit approach in that the protocol defines no explicit error message, depending instead on periodic route re-discovery to repair broken links.

The problem with end-to-end repair is the high cost of network-wide flooding. This has serious implications for the performance of a system in terms of scalability, energy consumption, and latency. Protocols which depend on global error recovery mechanisms do not scale well with network size. Moreover, since every node must forward the flooded packet, each node consumes energy repairing a route which may be very distant. The latency of end-to-end repair mechanisms also suffers, since routes are completely rediscovered from the source to the sink. Figure 3.1 illustrates the high cost of global route repair in two-phase pull directed diffusion. Notice the global flood of interest and exploratory data messages in both directions. Also notice that the repaired path is only a few hops different from the original data path.

Figure 3.1: Global Route Repair

When a node relatively far away from the source fails, it makes little sense to involve the source (and other distant nodes) in the error recovery process. Ideally, only nodes around the link failure should be involved in the repair process.
In cases where the repaired path is only a few hops different from the original path, as in the previous figure, the localized approach greatly reduces the overhead associated with repair. Figure 3.2 shows an ideal case for local route repair in which distant nodes are not involved in the recovery process at all. Nodes in the immediate vicinity of the break participate in the repair algorithm. Note that neither the source nor the sink is involved in the recovery process.

Figure 3.2: Ideal Local Route Repair

The advantages of the local repair approach include increased scalability, increased flooding efficiency, and decreased repair latency. Since only a portion of the nodes are involved in local repair, the protocol scales more gracefully with network size and consumes less overall system energy. The localized nature of the recovery also lends itself to faster route repair, since the complete (source-to-sink) route does not have to be traversed. In their analytical comparison of local and end-to-end recovery mechanisms, Aron and Gupta [23] show that with end-to-end recovery the probability of successfully delivering a packet on the first attempt rapidly degrades with increasing network size (and path length), while the performance of local recovery improves with longer routes. Additionally, the amount of resources consumed per packet is several orders of magnitude larger for end-to-end repair than for local repair. In summary, local repair improves the scalability, efficiency, and latency of a network protocol.

3.1.3 Real-Time Communication

The general objective of sensor networks is distributed micro-sensing, i.e., to sense, monitor, and control physical environments. Examples include acoustic surveillance systems for monitoring homes, biometric sensors which detect harmful bio-agents in airports, and stress sensors which monitor the structural integrity of buildings during earthquakes.
In many applications, data messages have time constraints in the form of end-to-end deadlines specifying an upper bound on the communication delay of a packet. Application-specific deadlines, for example, allow packets associated with important events to take priority over periodic monitoring packets. A surveillance system may require the position of an intruder to be reported to the command center within 15 seconds of detection, while a bio-detection system at an airport may need to respond within 1 second. Data in monitoring systems often has an interval of validity, after which it is no longer useful. The validity interval of a home surveillance system, for example, will be much shorter than that of a temperature monitoring system, since the presence of an intruder is of great interest immediately after detection but may be useless 30 minutes later.

To support this time-critical property of physical environments, sensor network protocols must implement real-time communication mechanisms. The goal of real-time communication is to minimize the number of packets which miss their end-to-end deadlines. This is typically accomplished by prioritizing packets by their deadlines so that more urgent messages are sent first. Essentially, less urgent packets are delayed so that more urgent packets can reach their destinations on time. The challenge for real-time communication over sensor networks is their multi-hop nature. Although it is reasonable to give priority to packets with shorter deadlines, prioritization should also account for packets that are farther away from their destination. In order to meet its deadline, a packet with a relatively long deadline may need to be prioritized higher than a packet with a shorter deadline if the long-deadline packet must travel twice as many hops.
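One common way to combine the two factors is to rank packets by the velocity they must sustain to arrive on time, i.e., remaining hops divided by remaining time, so that a far-away packet with a generous deadline can outrank a nearby packet with a tight one. A toy sketch (not the specific protocol adopted in this thesis; all numbers are illustrative):

```python
def required_velocity(hops_remaining, time_remaining):
    """Hops per second the packet must sustain to meet its deadline."""
    return hops_remaining / time_remaining

# (name, hops to destination, seconds until deadline)
packets = [("near-tight", 2, 1.0), ("far-loose", 10, 2.0)]

# Serve the packet with the highest required velocity first.
ordered = sorted(packets, key=lambda p: required_velocity(p[1], p[2]),
                 reverse=True)
print([name for name, _, _ in ordered])  # ['far-loose', 'near-tight']
```

Here the "far-loose" packet needs 5 hops/s while the "near-tight" packet needs only 2 hops/s, so distance-aware prioritization reverses the ordering a deadline-only scheme would choose.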
To address this problem, we have extended an existing real-time communication protocol, which considers both distance and deadline, to directed diffusion. Our system allows application developers to attach deadline information to a data flow and have confidence that the network will maximize the number of packets which meet their deadlines.

3.2 Applications

The emergence of pervasive computing has expanded the frontier for S&R systems. Although the S&R architecture is broadly applicable to various fields, its potential utility has not been fully realized. We present a summary of the applications of previous S&R systems and propose several new areas well suited to the S&R paradigm.

3.2.1 Emergency Medical Response

One very compelling application of S&R systems is in the field of healthcare. The authors of [24] have developed a sensor-based emergency medical response system. At the lowest level, sensors worn by patients report vital signs and location information to local command centers (e.g., ambulances) via 802.15.4. From there, information is forwarded over a cellular or satellite link to a global command center where the data is managed as a web service. The goal of the system is to provide greater situational awareness about patient condition and arrival time so that doctors can make more informed decisions. Our real-time communication model is particularly relevant to this type of application. Since information about vital signs is being communicated, timely response of the system is critical to the care of patients. Given the large size of many hospitals, efficient flooding is also an important mechanism for satisfactory performance of S&R systems in healthcare.

3.2.2 Business Alerts

Many business applications for S&R systems have also been proposed [25] [26] [27]. The authors of [25] describe a unified event stream processing system to monitor, analyze, and detect critical business events.
The objective is to improve efficiency, reduce cost, and enable the enterprise to react quickly. The Event Stream Processor calculates metrics based on event messages received from a variety of sources, such as complaint databases or real-time production statistics. A domain expert creates rules based on the metrics to provide alerts for important business situations. Specific applications of this system include inventory replenishment, product behavior, and retail performance. Our networking improvements are also beneficial in the business domain. The real-time communication mechanism is well suited to the time-critical nature of many phenomena that demand immediate response.

3.2.3 Meteorological Command and Control

S&R systems have also been developed for hazardous weather detection. Our enhanced communication protocols are especially relevant to meteorological applications, where sensor failure is expected. The route repair mechanism allows the system to perform in challenging environments such as tornadoes and tsunamis. The redundant nature of the sensor network is used to overcome unavoidable node failure. Efficient flooding reduces congestion due to network flooding; this is relevant in the context of energy efficiency and network latency. Since hazardous weather detection systems will be constantly in use but infrequently activated, it is important to efficiently utilize the energy resources of the sensor devices. Efficient flooding saves energy by reducing redundant packet transmissions. This section describes research in the areas of tornado detection and tsunami detection.

Tornado Detection

NetRad [28] is a Distributed Adaptive Collaborative Sensing (DACS) system for early tornado detection. The goal of NetRad is to detect a tornado within 60 seconds of formation and track its centroid within a 60-second temporal region. NetRad is composed of a dense network of low-powered radars that collaborate to accurately predict tornado formation.
The radars report atmospheric readings to the System Operations and Control Center (SOCC), where a merged version of the data is analyzed to detect meteorological features. Radars are re-tasked every 30 seconds based on the utility of the features identified. Thus, sensing can be focused on the areas nearest to recently detected meteorological features so that a better picture of the tornado can be obtained.

The NetRad system is somewhat different from the S&R sensor network system we propose. Unlike our S&R system, NetRad uses a high-bandwidth, highly structured, wired network to connect the sensors. Our system is designed to communicate over relatively low-speed, ad-hoc, wireless networks. We also emphasize fast response times on the order of subseconds, as opposed to the 30-second turnaround time of NetRad.

Tsunami Detection

Our S&R system has specifically been applied to the tsunami detection problem. In this section we describe the current tsunami warning system used by the United States and a modified system which we have proposed for better detection of and response to this type of disaster.

Current System

The current tsunami warning system is composed of ten buoys in the Pacific and five in the Atlantic/Caribbean. The Deep-ocean Assessment and Reporting of Tsunamis (DART) project is maintained by the National Oceanic and Atmospheric Administration and serves as part of a tsunami warning system for the United States [29] [30] [31]. Figure 3.3 illustrates the architecture of the current system. The DART stations consist of two parts: an anchored seafloor bottom pressure recorder called a tsunameter and a companion moored surface buoy. The tsunameter detects subtle pressure changes which indicate tsunami waves. An acoustic modem transmits data from the tsunameter to the buoy, which then relays the information via satellite to land-based warning centers. The goal of the DART system is to provide accurate and early warning for tsunamis.
This includes avoiding false alarms and unnecessary evacuations.

Proposed System

Our proposed system is similar to the current system, but incorporates two major modifications. First, we suggest using a multi-hop sensor network connected by radio frequency (RF) links instead of a single-hop satellite communication scheme. Although this modification necessitates a greater number of devices, it also leads to significant savings in terms of latency, energy, and cost per device. Since satellite transmission delays are avoided, the latency of communication will be improved. Shorter-range radio broadcasts will result in longer battery life and greater sensor lifetime. Improved latency and energy efficiency represent significant advantages of our design. The use of a sensor network will also result in greater accuracy of detection with respect to resolution, allowing for more localized tsunami detection and prediction. Secondly, we propose the addition of a tsunami mitigation system that responds to tsunamis. We assume it is feasible to build an artificial barrier reef strong enough to withstand the force of a tsunami and reduce its strength before it hits the shoreline. The barrier may be engaged (or fired) when a tsunami event is detected and disengaged otherwise. Although such a barrier system would be expensive to construct, the cost may be justified for protecting particularly valuable areas (e.g. nuclear reactors). The proposed system is similar to the Thames Barrier, a series of movable gate structures which protect London from tidal floods [32], and the gate system currently being developed to shield the city of Venice from dangerously high tides [33] [34] [35]. Although the focus of our work is on the networking and analysis aspects of the system, we have also given some consideration to the design of the barrier system. We envision a defense network inspired by the concept of air bags for automobiles.
The basic idea is to create a series of artificial islands which impede the progress of the wave. We propose an underwater deployment of inflation devices which are connected by wire to ballasts on the seafloor. When given the command to fire, the barriers will inflate and float to the surface while remaining tethered to the ballast. We propose using a layered series of barriers that successively attenuate the wave. Our proposed barrier system is similar to the breakwater coastal defense systems described in [36] [37]. By using advanced prediction techniques, we hope to engage the series of barriers that most effectively obstructs the wave.

Figure 3.3: DART 2 System Architecture

Chapter 4 Related Work

4.1 Efficient Flooding

Due largely to its simplicity, blind flooding is prevalent among reactive protocols; however, because blind flooding often results in duplicate packet delivery, many resources can be conserved if a more intelligent approach to flooding is utilized. Efficient flooding techniques use topological information or heuristics to decrease the number of redundant packets transmitted during a flood. There are a variety of approaches to this problem, but we divide them into the two types proposed in [20]: heuristic-based and topology-based. Heuristic-based protocols use some sort of rule to determine whether a flooded packet should be forwarded. The topology-based algorithms make use of connectivity information to deduce which nodes need to act as packet forwarders and which nodes can drop flooded messages. We further divide the heuristic-based protocols into four subcategories and the topology-based protocols into three subcategories. In the following sections we examine each family of efficient flooding protocols and describe a few of their most important examples.

4.1.1 Heuristic-based

One approach to efficient flooding is to use heuristics to reduce the number of rebroadcasts.
Heuristics based on probabilities, counters, distance, and location have been proposed [22] [38]. The main advantage of the heuristic approach is its simplicity. The primary disadvantage is the challenge of appropriately setting the parameters of the heuristic.

Probabilistic

The probabilistic scheme [22] is similar to flooding, except that nodes rebroadcast flooded packets according to some pre-determined probability p. Thus, nodes will forward flooded packets with probability p and drop flooded packets with probability 1 − p. In blind flooding, p is always 100%. In sufficiently dense networks, lower forwarding probabilities may be used without adversely affecting delivery effectiveness. In sparse networks, however, a greater probability is required for every node to receive the message. The Fireworks protocol [39] slightly modifies the simple probabilistic scheme just described. When nodes receive a flooded packet, they broadcast it to all their neighbors with probability p and unicast it to c of their N neighbors (c < N) with probability 1 − p. This modification allows for greater delivery effectiveness and finer-grained control than the naive probabilistic scheme.

Counter-based

The counter-based approach [22] is another simple heuristic used to limit the forwarding of flooded packets. A counter c keeps track of the number of times a redundant broadcast message is received during some time interval. A counter threshold C is chosen as the maximum number of times a redundant message may be received before the rebroadcast is suppressed. Whenever c ≥ C, the rebroadcast is inhibited. If the threshold has not been exceeded, the packet will be forwarded. The compelling features of this approach are its simplicity and adaptability to varying network densities. In dense areas of the network, some nodes will refrain from rebroadcasting, while in sparse regions all nodes may forward the message.
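A minimal sketch of these two heuristics follows; this is our own illustration, and the function names and default parameter values are assumptions, not values taken from [22]:

```python
import random

def probabilistic_forward(p=0.6):
    """Probabilistic scheme: rebroadcast with fixed probability p.
    Blind flooding is the special case p = 1.0."""
    return random.random() < p

def counter_based_forward(duplicates_heard, threshold=3):
    """Counter-based scheme: suppress the rebroadcast once the same
    flooded packet has been overheard threshold (C) or more times
    during the assessment interval."""
    return duplicates_heard < threshold
```

In a dense neighborhood `duplicates_heard` quickly reaches the threshold and the node stays silent; in a sparse one the counter rarely fires, so the node forwards, which mirrors the adaptability noted above.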
Distance-based

The distance-based scheme uses the relative distance between nodes as the criterion for deciding whether or not to rebroadcast. If two nodes are relatively close to each other, then the coverage area of another rebroadcast will largely overlap with the original broadcast region, and few new nodes will receive the message. However, if a node receives a message from a distant node, a rebroadcast will, for the most part, cover a different region (and therefore should reach a different set of nodes). The obvious disadvantage of this scheme is the requirement of distance-estimation capabilities between nodes. Ni et al. [22] claim that this may be achieved without a Global Positioning System (GPS) by the use of signal strength.

Location-based

As an improvement on the distance-based scheme, a location-based approach has also been proposed [22]. In this scheme, exact (i.e. GPS) location is used to compute a precise calculation of the additional coverage area provided by a rebroadcast. When a node sends a flooded packet, it adds its own location to the packet header. Upon reception of a flooded packet, a node calculates the additional coverage obtained by a rebroadcast and rebroadcasts if this value exceeds a coverage threshold. Otherwise, the packet is dropped. Another location-based approach to efficient flooding is regional flooding [18]. In this scheme, the flooding of route discovery packets in directed diffusion is limited to a region encompassing both the source and sink. By adding the locations of the source and sink to flooded messages, packets can easily be dropped when they travel outside the region defined around the two nodes. For example, two circular regions with radii slightly greater than half the distance between the source and sink may be defined around the source and sink, as shown in Figure 4.1. This scheme has been implemented in previous research [18] as a filter using the directed diffusion filter API.
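The two-circle region test can be sketched as follows; this is our own illustration, not the filter code from [18], and the function name and slack factor are assumptions:

```python
import math

def in_region(node, source, sink, slack=1.1):
    """Regional flooding check: forward a flooded packet only if this
    node lies inside one of two circles centered at the source and sink,
    each with radius slightly greater than half the source-sink distance.
    slack (> 1) is an assumed tuning factor, not a value from [18]."""
    radius = slack * math.dist(source, sink) / 2
    return (math.dist(node, source) <= radius or
            math.dist(node, sink) <= radius)
```

A node receiving a flooded interest would evaluate this test against the locations carried in the packet header and silently drop the packet when it returns False.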
Figure 4.1: Regional Flooding (Region Filter)

Another location-based approach is the Geographic Adaptive Fidelity algorithm (GAF) [40]. The primary objective of GAF is to identify "equivalent" nodes so that some of them can be put in an energy-conserving sleep mode. From a routing perspective, nodes are equivalent when a constant routing fidelity can be maintained with only one representative node awake. Node equivalence is determined using a virtual grid which overlays the network. The grid is created with dimensions such that nodes in adjacent grid cells are able to communicate with each other. Since only one node per grid cell must be awake to maintain connectivity, nodes in the same cell are equivalent. Thus, all the nodes except one may enter a sleep state. Although not specifically an efficient flooding technique, GAF is designed to save energy throughout every phase of the routing protocol, not just route discovery.

4.1.2 Topology-based

In contrast to heuristic-based efficient flooding protocols, topology-based algorithms exploit topological information about the network to identify the best set of forwarding nodes. Most topology-based schemes use Hello message exchanges to collect topological information from neighboring nodes. Other algorithms construct a source tree rooted at the source node and restrict leaf nodes from forwarding flooded messages. A third topology-based approach involves grouping nodes into clusters in which only one member, the cluster head, is responsible for forwarding flooded packets to other cluster members.

Neighbor Knowledge-based

The neighbor knowledge-based schemes use 1- or 2-hop neighbor topology information to build a well-covered mesh of forwarding nodes. Perhaps the simplest neighbor knowledge method is Flooding with Self Pruning [41]. This protocol requires 1-hop neighbor information to be exchanged with periodic Hello packets. Each broadcast packet contains a list of the sending node's 1-hop neighbors in the header.
If the receiving node cannot reach any additional nodes by a rebroadcast (i.e., its 1-hop neighbors are a subset of the set of nodes listed in the received packet), it will refrain from rebroadcasting the packet. More advanced protocols utilize 2-hop information to achieve greater flooding efficiency. The Scalable Broadcast Algorithm (SBA) [42] uses 2-hop neighbor information and the identity of the previous hop to determine if any new nodes will be reached by rebroadcasting. This may easily be determined since 2-hop connectivity is known. Dominant Pruning [41], like SBA, uses 2-hop neighbor information to make forwarding decisions. Unlike SBA, however, Dominant Pruning requires forwarding nodes to explicitly choose which of their 1-hop neighbors will be forwarding nodes. The protocol uses a greedy set cover algorithm to recursively choose the 1-hop neighbors which cover the most 2-hop neighbors until all the 2-hop neighbors are reached. Multipoint Relaying (MPR) [43] is similar to Dominant Pruning in that upstream nodes explicitly notify a subset of their 1-hop neighbors to rebroadcast, but it differs in the way these forwarding nodes are selected. The only nodes allowed to rebroadcast a packet are those chosen as Multipoint Relays (MPRs). In turn, MPRs must choose a subset of their 1-hop neighbors as MPRs. The algorithm for choosing MPRs is outlined below:

1. Select every 1-hop neighbor that is the only way to reach some 2-hop neighbor.
2. Select the 1-hop neighbor which covers the most uncovered 2-hop neighbors.
3. Repeat step 2 until all 2-hop neighbors are covered.

A protocol similar to MPR is Span. Span [44] selects a set of coordinating (forwarding) nodes that covers all 2-hop neighbors but does so in a different fashion. If a node detects insufficient neighboring coordinator nodes, it will proactively declare itself to be a coordinator to alleviate the shortage.
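Returning to MPR, the three selection steps can be rendered as a small greedy sketch. This is our own illustration, not code from [43]; the function name and data layout are assumptions:

```python
def select_mprs(one_hop, two_hop_of):
    """Greedy MPR selection (illustrative sketch, not code from [43]).
    one_hop: set of this node's 1-hop neighbor IDs.
    two_hop_of: dict mapping each 1-hop neighbor to the set of strict
    2-hop neighbors it covers."""
    uncovered = set().union(*two_hop_of.values()) if two_hop_of else set()
    mprs = set()
    # Step 1: 1-hop neighbors that are the sole cover of some 2-hop node.
    for target in set(uncovered):
        coverers = [n for n in one_hop if target in two_hop_of.get(n, set())]
        if len(coverers) == 1:
            mprs.add(coverers[0])
    for m in mprs:
        uncovered -= two_hop_of.get(m, set())
    # Steps 2-3: repeatedly add the neighbor covering the most uncovered nodes.
    while uncovered and (one_hop - mprs):
        best = max(one_hop - mprs,
                   key=lambda n: len(two_hop_of.get(n, set()) & uncovered))
        if not two_hop_of.get(best, set()) & uncovered:
            break  # remaining 2-hop nodes unreachable via 1-hop neighbors
        mprs.add(best)
        uncovered -= two_hop_of[best]
    return mprs
```

For instance, if neighbor b alone covers 2-hop node y, step 1 selects b immediately, and the greedy loop only runs if 2-hop nodes remain uncovered.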
A node's aggressiveness in becoming a coordinator is directly related to the number of neighbors it connects and its current energy level. Like GAF (Section 4.1.1), Span attempts to construct a forwarding backbone so that other (redundant) nodes can enter an energy-saving sleep mode.

Source-Tree-based

The source-tree approach involves creating a source tree with the maximal number of leaf nodes [45] [46]. The Adaptive Link-State Protocol (ALP) [45] is an example of a source-tree-based protocol for flooding efficiency. Unlike traditional link-state protocols, ALP does not require the state of each link to be flooded to the entire network. It uses a tree structure to disseminate link-state information to only those links along paths used to reach destinations. Another source-tree-based protocol is Topology Broadcast Based on Reverse-Path Forwarding (TBRPF) [46]. TBRPF is a proactive, link-state routing protocol for mobile ad-hoc networks. A node rebroadcasts a flooded packet only if it is not a leaf node in the spanning tree formed by the minimum-hop paths from all nodes to the source node. Constructing and maintaining the tree requires significant overhead. Nodes update their tree status with each received packet and also periodically perform blind flooding.

Cluster-based

A third topology-based approach to efficient flooding is to divide the network into a clustered structure. A representative from each group serves as a cluster head. Nodes belonging to two or more clusters at the same time are called gateways. Other nodes are called ordinary nodes. A cluster is defined by the transmission radius of the cluster head. Efficient flooding is achieved by restricting rebroadcasting to non-ordinary nodes (cluster heads and gateways). The general concept of network clustering was first introduced by Ephremides et al. as the linked cluster algorithm (LCA) [47].
Cluster heads are usually elected using the Lowest ID algorithm (LID) or the Highest Degree algorithm (HD) [48]. In general, there are two approaches to clustering: active and passive.

Active

In active clustering, clusters are created whether or not data transmissions are occurring. Typically, non-trivial computation and communication overhead is required to identify cluster heads and gateways. Each node must broadcast its ID (or degree) for cluster head election and compare it to the IDs (or degrees) of all of its neighbors. Periodic packet exchanges are often used to maintain the clustered structure. Communication is also necessary for gateway selection. To increase flooding efficiency, the number of potential gateways must be reduced using some gateway selection mechanism. The Flooding Gateway Selection protocol (FGS) [49], for example, selects the best gateways using a greedy set cover algorithm and then explicitly notifies the forwarding gateways of their status.

Passive

Another approach to clustering is to form groups passively from ongoing data traffic. Passive clustering (PC) is a cluster formation mechanism designed to increase flooding efficiency in mobile and ad-hoc networks [50], [19]. The protocol reactively constructs and adaptively maintains a clustered architecture to increase flooding efficiency. Unlike the active clustering protocols described previously, PC achieves flooding reduction on the fly, without explicit signaling and cluster setup packets. PC piggybacks cluster status information on data packets and constructs the cluster structure as a by-product of ongoing user traffic. PC uses the piggybacked cluster information to deduce the role of a node as one of three states: cluster head, gateway, or ordinary node. Cluster heads broadcast flooded packets to their neighboring nodes. Gateway nodes forward flooded packets between cluster heads.
Ordinary nodes drop all flooded packets (their neighbors have already received the packet from a cluster head or gateway). By utilizing ongoing packets to share cluster information instead of explicit control messages, PC significantly reduces communication overhead and the latency of cluster setup [19]. We describe passive clustering more thoroughly in Section 5.1.

4.2 Route Repair

In this section we summarize several protocols proposed to handle route repair in mobile ad-hoc and sensor networks. Since our emphasis is on route repair, we describe the portion of each protocol that performs path repair and omit general routing aspects of the protocol whenever possible.

4.2.1 WAR

Aron and Gupta propose the Witness-Aided Routing protocol (WAR) [23], which is very similar to DSR but incorporates local correction mechanisms to recover from route failures. One of the primary goals of WAR is to avoid costly end-to-end error recovery through local repair mechanisms. WAR differs from DSR in two respects: unidirectional routing and error handling. First, WAR uses witness hosts to perform promiscuous route maintenance. Witness hosts are essentially routers that act on behalf of other nodes when they detect possible packet loss. For example, if host X sends to host Y and is overheard by W1 and W2, then W1 and W2 are witness hosts. If W1 and W2 do not hear Y forward the packet to host Z (the next hop), one of them will attempt to deliver the packet to host Z. Secondly, WAR uses a localized error recovery mechanism to repair broken paths without involving the source. When a link error occurs, the node upstream from the break broadcasts a copy of the original message with a recovery flag set. Like many other repair methods, WAR's route recovery message is constrained to a hop-limited region around the repair initiator. The hop limit, denoted as the Recovery Depth, is appended to the packet.
To successfully repair the link, route recovery messages must find a path to a downstream node that was on the original path. Downstream nodes can identify themselves by searching for their own IDs in the route listed in the packet header. Analytical results show that as network size and route length increase, the performance of end-to-end error recovery mechanisms degrades rapidly [23]. Hence, the local repair mechanisms of WAR support greater network scalability.

4.2.2 ADMR

The Adaptive Demand-Driven Multicast Routing (ADMR) protocol [51] is chiefly concerned with delivering packets to a multicast group in an on-demand fashion (instead of continuously maintaining the multicast group structure). ADMR routes messages through a tree structure from the source (root) to each group member (leaf or branch). When forwarding nodes or receiving members become disconnected from the multicast forwarding tree, ADMR invokes a local subtree repair algorithm to detect and repair the path. Each node maintains a disconnection timer for each group based on the inter-packet arrival time. If no packet is received within this time interval, ADMR assumes disconnection has occurred. Nodes that detect disconnection initiate local repair by sending a Repair Notification packet to nodes below them in the subtree (downstream). After sending a Repair Notification, nodes wait for a repair-delay period of time. If a Repair Notification is received within this interval, the node cancels its local repair (since it is further downstream from the break). The node whose parent has actually failed will receive no Repair Notification, so it identifies itself by this procedure and initiates local repair by flooding a hop-limited Reconnect packet in the neighborhood around itself. When nodes along the original path receive Reconnect packets, they forward them without incrementing the hop count. Thus, Reconnect packets can travel back to the source along the original path.
If the original path is found by the hop-limited flood, the source will respond with a Reconnect Reply packet, which will be unicast back to the repair node along the reverse of the path the Reconnect message took. In this way, a path from the source (root) to the destination around the broken link is reconstructed.

4.2.3 SWR

A Single path With Repair routing scheme (SWR) is proposed in [52]. This protocol is motivated by the desire to avoid source-initiated path repair. In SWR, the pivot node, located immediately upstream from the break, searches for alternate paths around the broken link. It does so by broadcasting a Help Request (HREQ) to its neighbors. Upon reception of an HREQ message, nodes use previously stored information about the topology of the network to create an alternate path around the failed node with Help Response (HREP) packets. SWR maintains topological knowledge of the network through cost information transmitted in each data packet. This procedure is repeated recursively upstream toward the source if the original pivot node is unable to find an alternate path.

4.2.4 SHORT

A framework of Self-Healing and Optimizing Routing Techniques (SHORT) is presented in [53]. Unlike other repair mechanisms, SHORT attempts to find new routes constantly, even when connectivity is present. In some sense, it attempts to heal broken links before they break by constantly searching for better paths. SHORT can be applied on top of existing mobile ad-hoc routing protocols (e.g. DSR or AODV) to increase performance by optimizing existing routes when a better local sub-path becomes available. Two algorithms are described which optimize proactive repair based on path length (Path Aware-SHORT) or energy level (Energy Aware-SHORT). The primary goal of this approach is to discover short-cut routing paths. Short-cuts result from the mobility of nodes in the ad hoc network.
Although connectivity may still be intact, the shortest path from source to destination may change after its initial discovery. SHORT continually seeks out and takes advantage of such short-cuts. This is accomplished through the use of a hop count (HC) field on each packet. Nodes maintain a hop comparison array for each data flow to identify short-cuts. The addition of SHORT improved the performance of DSR and AODV in terms of both path optimality and message delivery rate [53].

4.2.5 RDMAR

Relative Distance Micro-discovery Ad Hoc Routing (RDMAR) [54] localizes the flooding of route discovery queries by estimating the relative distance between the source and destination. This approach to limiting route discovery and repair floods is the distinctive characteristic of RDMAR. By assuming a maximum velocity and an average transmission range for all mobile nodes, RDMAR calculates the maximum number of hops separating two nodes, i.e., their relative distance. RDMAR requires each node to maintain a routing table with information about every other host in the network. This includes the following fields:

- Default Router - Neighbor indicating the next hop for this destination host.
- Relative Distance - Estimate of the relative distance (in hops) to this destination host.
- Time of Last Update - Time the last update was received from this destination host.
- Route Timeout - Time remaining before the route is considered invalid.
- Route Active - Flag indicating whether the route is currently active.

RDMAR uses the Time of Last Update to compute a time interval of uncertainty designated as tmotion. Assuming a velocity Micro Velocity and a transmission range Micro Range, the source and destination nodes can estimate their minimum and maximum radius of movement during the time period tmotion, as shown in Figure 4.2. This distance is divided by the Micro Range to compute the number of hops that should be traversed in the route discovery flood.
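A rough numeric sketch of how such a hop estimate might be computed is given below. This is our own illustration under the stated assumptions, not code from [54]; in particular the factor of two for mutual movement, the function name, and the parameter names are ours:

```python
import math

def micro_discovery_ttl(t_motion, micro_velocity, micro_range, last_distance_hops):
    """Estimate the TTL (in hops) for a hop-limited route discovery flood
    (illustrative sketch, not code from [54]).
    t_motion: seconds since the last update from the destination.
    micro_velocity: assumed maximum node velocity (m/s).
    micro_range: assumed average transmission range (m).
    last_distance_hops: last known relative distance in hops."""
    # Assumption: both endpoints may have moved up to v * t, so the
    # separation can grow by at most 2 * v * t meters, i.e. by
    # ceil(2vt / range) additional hops.
    extra_hops = math.ceil(2 * micro_velocity * t_motion / micro_range)
    return last_distance_hops + extra_hops
```

With a 100 m range, a 5 m/s maximum velocity, and a 10 s old update, a node last known to be 3 hops away would be flooded for at a TTL of 4 hops under these assumptions.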
RDMAR utilizes the same Micro-Discovery mechanism for route repair, except that it is initiated by an intermediate node instead of the source. Upon detection of a link failure, an intermediate node will flood route requests to the destination using relative distance information from its routing table to set the number of hops appropriately. In this way, route requests are restricted to the region between the intermediate node and the destination.

Figure 4.2: Relative Distance Micro-Discovery

4.2.6 ABR

Associativity-Based Routing (ABR) [55] [56] defines a routing metric called the degree of association stability, which is used to make routing decisions. Association stability reflects the connection stability of one node with respect to another node over time and space. The destination examines the association stability of potential routes and chooses the one which is most stable and contains the fewest hops. ABR is divided into three phases: route discovery, route re-construction, and route deletion. The second phase, route re-construction, deals with route repair in a manner very similar to SWR (Section 4.2.3). The node immediately upstream from a break broadcasts local (hop-limited) queries that perform partial route discovery. This process is repeated recursively upstream toward the source. The backtracking process is discontinued if the node currently performing the repair is more than half the distance (in hops) from the destination to the source. In this case, global route discovery is performed instead of localized repair.

4.2.7 TORA

TORA (Temporally-Ordered Routing Algorithm) [57] is a source-initiated routing protocol based on the concept of link reversal. Routes are created using a height metric to establish a Directed Acyclic Graph (DAG) rooted at the destination. Links are assigned a direction (either upstream or downstream) based on their relative heights. In order to create this structure, nodes need information about their 1-hop neighbors.
TORA incorporates a route maintenance mechanism that supports localized route repair. The node immediately upstream from the link failure generates a new reference level, which is propagated by neighboring nodes and causes them to adjust to the new height. When a node has no downstream links, it reverses the direction of its links. This link reversal propagates upstream until a new route to the destination can be found.

4.2.8 AODV-LRQT

Pan et al. [58] present an extension to the local repair mechanism of the Ad hoc On-Demand Distance Vector (AODV) routing protocol called AODV-LRQT. The modified protocol limits the extent to which the repair mechanism is applied along two network dimensions: depth and breadth. Both approaches limit the scope and cost of flooding. Decreasing the breadth means limiting the number of times a node can forward a repair route request. This is similar to the counter-based efficient flooding technique discussed in Section 4.1.1. To reduce the depth of repair, route repair packets are given a hop count, or time to live (TTL), which localizes the scope of potential new paths to some n-hop neighborhood around the broken link. These mechanisms are implemented using two algorithms: repair quota and adaptive TTL. Each node has a repair quota (RQ) to control the breadth of repairs. When the RQ has been met, a node will no longer forward route request packets. Depth reduction is implemented with an adaptive TTL. The algorithm assumes knowledge of the network topology and transmission range in order to find a hop count which is half the length of the longest path in the network. Simulation showed that, in AODV, constraining the breadth (repair quota) of route request floods provided a greater performance improvement than constraining the depth (adaptive TTL) [58].

4.2.9 PATCH

Proximity Approach To Connection Healing (PATCH) [59] is another local recovery mechanism proposed for DSR.
It aims to reduce control overhead and achieve fast, localized recovery. When an intermediate node detects a broken link, it floods a local recovery request in the two-hop region around itself, looking for downstream nodes on the path (i.e. those closer to the destination). It does so by including in the packet header a list of all the nodes between the intermediate node and the destination. If one of the nodes included in the header receives the local recovery request, it sends a local recovery reply to the source so that the new route will be used in future communication. If no route is found within some recovery interval, the source will initiate the end-to-end error recovery process.

4.3 Real-Time Communication

4.3.1 RAP

RAP is a real-time communication architecture for large-scale wireless sensor networks proposed in [60]. The goal of RAP is to provide a scalable and lightweight communication service that maximizes the number of packets meeting end-to-end deadlines. The network stack for the RAP protocol suite is shown in Figure 4.3.

Figure 4.3: RAP Communication Architecture

On the top layer, a query-event service allows application developers to easily monitor events and submit queries to the sensor network. Queries are registered beforehand and triggered when an event occurs. When an event that matches the attributes of interest occurs in the geographic area of interest, a message stamped with the timing constraint is sent to the base station at the registered location. Queries are written and registered using the following API:

- query(attributes, area, timing constraints, base station location)
  - attributes - Attributes of interest.
  - area - Geographic area of interest.
  - timing constraints - Timing constraints for the event, i.e., the end-to-end deadline.
  - base station location - Location of the base station which will receive event data.
- register event(event, area, query)
  - event - Name of the event.
  - area - Geographic area of interest.
  - query - Query associated with the event.

The Location-Addressed Protocol is a transport layer similar to UDP except that it uses location information instead of IP addresses to identify hosts. Routing is handled by Geographic Forwarding (GF), a simple but robust network protocol which forwards packets based on location information. GF greedily forwards packets to the neighbor closest to the packet's destination. GPSR is used to route packets around the perimeter of a void region. The heart of RAP is Velocity Monotonic Scheduling (VMS), the layer that provides support for real-time communication. VMS is the packet scheduling policy that determines the order in which incoming packets are forwarded. Typically, ad hoc networks forward packets in FCFS order, but this policy performs poorly in networks where data flows have different end-to-end deadlines. In contrast, RAP prioritizes packets based on their "local urgency." VMS considers both the temporal deadline and the geographic distance when scheduling packets. Thus, VMS is both deadline-aware and distance-aware: packets with shorter deadlines and packets with longer distances to the destination receive higher priorities. VMS defines the velocity of a packet as the quotient of the distance to the destination and the time deadline. By assigning priorities based on velocity, VMS is able to accurately quantify the "urgency" of a packet and thereby meet more deadlines. Two priority assignment policies are defined in VMS: static velocity monotonic (SVM) and dynamic velocity monotonic (DVM). SVM calculates a fixed velocity once at the source before the packet is transmitted. SVM computes the velocity using Equation 4.1, where p_src = (x_src, y_src) and p_dst = (x_dst, y_dst) are the locations of the source and destination respectively, ||·|| is the distance between two points, and T_deadline is the end-to-end deadline.
V = ||p_src − p_dst|| / T_deadline    (4.1)

DVM re-computes the velocity at each intermediate hop using Equation 4.2, where T_i represents the time elapsed for the packet to reach intermediate hop i. Initially, T_i = T_0 = 0 and p_i = p_src at the source node. By updating the priority at each node, a packet that is progressing more slowly than its requested velocity may dynamically increase its priority. Likewise, a packet traveling more quickly than its requested velocity may be slowed down to give way to more urgent packets.

V = ||p_i − p_dst|| / (T_deadline − T_i)    (4.2)

To implement VMS, the network must use a prioritized queue. RAP prioritizes at two levels to maximize performance: prioritization based on velocity is performed at the network layer and at the MAC layer. The network layer places packets in a queue ordered by velocity (higher velocity first). Additionally, RAP extends the IEEE 802.11 MAC protocol to adapt the wait time (DIFS) and backoff window (CW) based on the priority of the packet. RAP has been shown to significantly reduce the deadline miss ratio when compared to traditional FCFS mechanisms. When compared to DSR over standard IEEE 802.11, RAP reduced the deadline miss ratio from 90.0% to 17.9% [60]. Despite its simplicity, RAP significantly increases real-time performance.

4.3.2 SPEED

SPEED [61] is another real-time protocol for sensor networks. SPEED employs feedback control and stateless algorithms to support soft real-time communication. The protocol's design also emphasizes load balancing, localized behavior, and minimized dependence on the MAC layer. Like RAP, SPEED uses a location-based routing protocol to forward packets. The organization of the SPEED protocol suite is illustrated in Figure 4.4. At the top layer, SPEED provides an API that supports three types of communication modes: unicast, area-multicast, and area-anycast.
While unicast follows the familiar one-to-one communication model, multicast and anycast are somewhat unconventional modes based on geographic constraints. Area-multicast delivers a packet to each node in a circular region defined by a center position and radius. Area-anycast is designed for applications in which it is sufficient for one node in some area to respond for that region.

Figure 4.4: SPEED Architecture

Like other geographic routing protocols, every node in SPEED participates in a periodic beacon exchange with its neighbors to share location information. Two on-demand beacons, a delay estimate beacon and a backpressure beacon, are also employed by the protocol. By timestamping each data packet and ACK, the single-hop delay to each neighbor can be calculated. Instead of using queue size to gauge congestion, SPEED uses this single-hop delay as a metric to approximate load in the network. Routing in SPEED is handled by Stateless Non-deterministic Geographic Forwarding (SNGF), a modified version of simple Geographic Forwarding (GF). SNGF defines the forwarding candidate set, FS_i, of node i as the set of neighbors of node i that are closer to the destination. The relay speed to node j, Speed_i^j, is computed by dividing the gain in distance toward the destination achieved by forwarding to node j by the estimated time delay to node j. Formally,

Speed_i^j = (||p_dest - p_i|| - ||p_dest - p_j||) / HopDelay_i^j (4.3)

where p_dest is the location of the destination, p_i is the location of node i, ||·|| is the distance between two points, and HopDelay_i^j is the estimated delay from node i to node j. Packets are forwarded to nodes in FS_i based on their relay speed. The node with the maximum relay speed is chosen as the next hop if its relay speed is greater than S_setpoint, a system parameter. Otherwise, the packet is forwarded to the neighbor with the highest relay speed only probabilistically. If the packet is not forwarded, backpressure rerouting is initiated.
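The relay-speed selection of Equation 4.3 can be sketched in a few lines. This is a minimal illustration, not SPEED's actual implementation; the function names, tuple-based positions, and dictionary-based neighbor table are assumptions made for the example.

```python
import math

def dist(p, q):
    # Euclidean distance between two (x, y) positions
    return math.hypot(p[0] - q[0], p[1] - q[1])

def forwarding_set(p_i, neighbors, p_dest):
    # FS_i: neighbors of node i that are strictly closer to the destination
    return {j: p_j for j, p_j in neighbors.items()
            if dist(p_j, p_dest) < dist(p_i, p_dest)}

def relay_speed(p_i, p_j, p_dest, hop_delay_ij):
    # Equation 4.3: distance gained toward the destination by forwarding
    # to neighbor j, per unit of estimated single-hop delay to j
    return (dist(p_dest, p_i) - dist(p_dest, p_j)) / hop_delay_ij
```

A node would compute `relay_speed` for each member of its forwarding set and prefer the neighbor with the largest value, subject to the S_setpoint threshold described above.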
This algorithm works with the neighborhood feedback loop (NFL) to adapt the network layer to congestion. When congestion is detected by the NFL, backpressure beacons are sent upstream to find routes around the congested area. SPEED uses this same mechanism to discover routes around network voids. Although SPEED is somewhat more complex than RAP, it provides several advanced real-time and congestion-avoidance features not offered by RAP. In some sense, SPEED is a heavyweight approach to real-time communication while RAP is a more lightweight protocol.

Chapter 5
Network Services Architecture

Directed diffusion serves as the baseline network protocol for our S&R system. We have developed three improvements for diffusion to lessen its weaknesses and enhance its strengths. We first describe in detail the passive clustering algorithm for efficient flooding and its adaptation to diffusion. Next, we explain a route repair mechanism for directed diffusion which emphasizes localization of repair. Thirdly, we outline the architecture of a real-time communication protocol for diffusion which prioritizes data flows. The specific purpose of the network services is to improve the efficiency, robustness, and timeliness of delivery for directed diffusion. More generally, the purpose of the enhanced communication services is to enable developers to easily utilize the power of the distributed sensor environment without its inherent complexities. The improved network communication mechanisms allow the system to function in challenging environments which may previously have hindered functionality. Passive clustering reduces congestion resulting from network flooding. Flooding reduction is especially relevant in the context of energy efficiency and network latency. The route repair mechanism is crucial for reliable operation of the system in the face of link failures. It provides an efficient method of route healing to cope with node failure.
Also critical to the performance of the system is timely response to detected events. To this end, our real-time communication mechanism performs packet prioritization and thereby reduces the number of packets that miss deadlines. The real-time communication protocol is based on RAP [60], but has been adapted to the two-phase pull model of directed diffusion. It has also been extended to support location-unaware networks. The distributed services described in Section 2.3, the lookup service, composition service, and adaptation service, may also be used in the S&R architecture for greater system usability and reliability. Figure 5.1 shows a layered overview of the S&R architecture.

Figure 5.1: Layered Architecture

Note that the distributed services reside above the network services proposed in this thesis. The distributed services can take advantage of the lower-level services provided by the network services (i.e., efficient flooding, route repair, and real-time communication). A more detailed illustration of the architecture of the system is shown in Figure 5.2. This diagram shows the interactions among the distributed services and among the network services. Also illustrated is the abstraction of the underlying routing protocol. Although applications and distributed services see diffusion as the routing entity, in actuality the diffusion core, gradient, and all three network services are working to deliver packets. Thus, the network services transparently benefit applications.

Figure 5.2: Detailed Architecture

5.1 Clustering Mechanism

As a result of the high cost of flooding and diffusion's strong reliance on it, we chose to implement a clustering mechanism for efficient flooding. We selected passive clustering (PC) for this purpose because its design objectives are very compatible with diffusion. In Section 6.2.1, we discuss the specific reasons for this choice.
In this section, we give an overview of passive clustering and describe the detailed workings of the protocol. Finally, we describe the minor modifications necessary to adapt passive clustering to diffusion.

5.1.1 Passive Clustering Overview

The distinctive characteristic of passive clustering is its use of on-going data traffic to initiate cluster formation and communicate cluster-related information among the nodes. Using promiscuous packet reception, nodes gather cluster status information about all their 1-hop neighbors and adjust their own cluster state accordingly. Passive clustering creates clusters by assigning one of three states to a node. A node may be a cluster head (CH), a gateway (GW), or an ordinary node (OR). CHs serve as leader nodes for their clusters and forward flooded packets to each member of the cluster. GWs connect two or more CHs together, thus serving as cluster relays. ORs receive flooded packets from CHs but do not forward them to their neighbors. Passive clustering results in a clustered structure similar to the topology shown in Figure 5.3. The circles represent cluster boundaries. Corresponding to the three node states, PC defines three fundamental rules for operation: first cluster head declaration wins, the gateway selection heuristic, and ordinary nodes drop flooded packets.

Figure 5.3: Example PC Topology

First Declaration Wins

Perhaps the most distinctive PC rule is its cluster head election rule. PC selects cluster heads by allowing the first node that declares itself CH to become CH. This is known as the first declaration wins rule. If a node has not heard from another CH, it claims itself to be CH and rules the rest of the nodes in its radio range. The first declaration wins rule provides several advantageous properties. Unlike many conventional clustering schemes, PC requires no waiting period or neighbor-checking requirement to elect a CH. A node becomes CH as soon as it declares itself to be CH.
The first declaration wins approach is also less likely to result in chain re-clustering [50]. In traditional clustering protocols, when two cluster heads move within transmission range of each other, one of them must defer to the other. This can trigger cluster head changes that propagate throughout the network [62]. The first declaration wins rule reduces this effect. Thirdly, PC creates a clustering that more closely resembles the data flow in the network. By creating cluster heads based on traffic, PC achieves a better correlation between traffic flow and the resulting clustered topology.

Gateway Selection Heuristic

If too many nodes become gateways, the flooding efficiency decreases since such a large number of nodes forward flooded data. If too few nodes become gateways, network connectivity may be adversely affected. The goal is to choose a sufficient number of gateways to preserve connectivity, but very few more. To make this trade-off dynamically, PC defines a gateway selection heuristic. The gateway selection heuristic limits the number of nodes that become gateways without breaking the passive nature of PC. A node becomes a gateway according to the number of cluster heads and gateways it has overheard. Whenever a non-cluster head node hears a packet from a cluster head or gateway, the node becomes a gateway if Equation 5.1 is true. Otherwise, the node will become an ordinary node.

α · num(GW) + β > num(CH) (5.1)

Note that in Equation 5.1, num(GW) is the number of neighbors known to be gateways, num(CH) is the number of neighbors known to be cluster heads, and α and β are tunable parameters (α, β ≥ 0). The values of α and β should be chosen based on factors such as channel quality, noise level, and traffic patterns [19]. They may be locally adjusted to provide better adaptability and flexibility. This gateway selection procedure is fully distributed and requires only local information. It relies on overheard packets instead of active packet exchanges (e.g.,
cluster head-list exchanges). The disadvantage of this approach is that network connectivity may be affected by poorly chosen parameter values. If the parameters are too aggressive in reducing the number of gateways, the topology may be partitioned.

Ordinary Nodes Drop Flooded Packets

The behavior of ordinary nodes lies at the heart of flooding reduction. A node that is neither a cluster head nor a gateway becomes an ordinary node. When a node identifies itself as an ordinary node, it no longer forwards flooded packets. All flooded packets received by an ordinary node can safely be dropped because neighboring nodes will have already received the packet (either from a CH or a GW). Hence, flooding efficiency depends on the number of ordinary nodes that can be found. In a sufficiently dense network, a relatively large number of ordinary nodes should be found.

5.1.2 Protocol Details

PC defines seven clustering states for nodes: two internal states and five external states. Internal states are entered when a packet is received and serve as a tentative role for the node while the packet is being processed. The two internal states are Gateway-Ready (GW READY) and Cluster Head-Ready (CH READY). Nodes enter external states when a packet is sent. These are externally visible states which are communicated to neighboring nodes and used in making adjustments of node state. The external states are Initial (IN), Cluster Head (CH), Full Gateway (FULL GW), Ordinary Node (OR), and Distributed Gateway (DIST GW).

Initial

On startup, nodes enter the Initial state. A node in Initial state does not belong to any cluster. Nodes move from the Initial state to one of the two internal states when a packet is received. PC maintains soft-state clusters using an implicit timeout scheme. If a node does not receive any packets within the time interval t_clustertimeout, it reverts back to Initial state. Upon future reception of packets, the node will restart the clustering algorithm.
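The gateway selection heuristic, which the state descriptions below repeatedly invoke, reduces to a one-line predicate. This is a minimal sketch in Python, mirroring Equation 5.1 exactly as stated in the text; the function name and argument names are illustrative.

```python
def becomes_gateway(num_gw, num_ch, alpha, beta):
    # Equation 5.1, as stated in the text: a non-cluster-head node that
    # overhears a packet from a CH or GW becomes a gateway when the
    # inequality holds; otherwise it becomes an ordinary node.
    # num_gw and num_ch count 1-hop neighbors known to be gateways and
    # cluster heads; alpha and beta are tunable parameters (>= 0).
    return alpha * num_gw + beta > num_ch
```

Because the decision uses only locally overheard state, each node can evaluate it independently on every received packet, preserving the passive nature of PC.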
Cluster Head-Ready

Cluster Head-Ready is the internal state of potential cluster heads. A node in Initial state changes its state to CH READY only when it has received a packet from a non-CH node. Simultaneous cluster head declarations can sometimes occur due to topology changes and packet delay. To resolve these conflicts, PC uses a Lowest ID (LID) algorithm to select the node with the lowest ID to serve as cluster head. If a cluster head receives a packet from another cluster head with a lower ID, it gives up its role and enters the Gateway-Ready state. The node which loses the LID competition sends a CH Give-Up message to its neighbors informing them of the change in status.

Gateway-Ready

The Gateway-Ready state is the internal state of potential gateways (both full and distributed). A node enters GW READY state when it receives a packet from a CH. A GW READY node will become a gateway or an ordinary node depending on the gateway selection heuristic. A node will change from Gateway-Ready to FULL GW or DIST GW if an insufficient number of its neighbors are already gateways.

Cluster Head

The Cluster Head state is the external state of cluster heads. A node enters the CH state from the CH READY state if it wins the Lowest ID competition or if it has not overheard any other cluster heads.

Full Gateway

Nodes in the Full Gateway state directly connect two cluster heads. A full gateway is thus a member of two clusters. Full gateways announce the IDs of the two CHs that they connect. A node that is reachable from two CHs may declare its role as FULL GW only if it has not heard from another FULL GW node announcing the same pair of IDs. In this way, gateways also follow the first declaration wins rule. If two nodes concurrently declare themselves as FULL GW for the same pair of CHs, the node with the lowest ID will win and the loser will become an ordinary node. Full gateways are in the external state FULL GW. Figure 5.4 illustrates a full gateway.
Figure 5.4: Full Gateway

Distributed Gateway

A node that is in the Distributed Gateway state connects two clusters, but is not directly connected to one of the cluster heads. A distributed gateway is composed of two nodes working together to serve as a gateway between two clusters. This allows cluster heads that are two hops apart to be connected. Figure 5.5 illustrates a distributed gateway. Notice both the full and distributed gateways in the example topology in Figure 5.3.

Figure 5.5: Distributed Gateway

Ordinary Node

Ordinary nodes are nodes that do not forward packets. A node enters the OR state from the GW READY state based on the gateway selection heuristic. A node changes from GW READY to OR if enough of its neighbors are gateways as determined by the gateway selection heuristic. When a node can hear from more than two CHs and every pair of announced CHs is connected by a FULL GW, it will become an OR.

Incoming Packet Processing

The transition state diagram shown in Figure 5.6 summarizes the passive clustering algorithm. When a packet is received at a node, the state of the node will be changed to one of the two internal states: CH READY or GW READY. An Initial node will change its state to CH READY if the packet is from a non-CH (1). If the packet is from a gateway or an initial node, the node will enter the GW READY state (5). CHs revert to CH READY (3), and GWs (9) and ORs (8) revert to GW READY upon each incoming packet. The receiving node always updates its neighbor lists based on the state of the sending node, which is contained in the message. By stepping back to the GW READY state (8, 9, and 11), nodes can adjust their state dynamically. For example, if the number of gateways has changed, an ordinary node may promote itself to gateway. When the soft state of PC expires, nodes in each state revert back to Initial (12, 13, 14, and 15). As a result, new clusters may be composed in response to subsequent packet flooding.
This potential for re-structuring is beneficial for sensor networks since cluster head responsibilities will be handled by different nodes throughout the life of the system, thus distributing energy consumption.

Figure 5.6: PC Transition State Diagram

Outgoing Packet Processing

When a packet is ready to be sent, the node changes its PC state from an internal state to one of the five external states. If it has not heard from any other CHs, a candidate CH (CH READY) will change its state to CH (2) when it has packets to send. Otherwise it will change to GW READY. Nodes in the GW READY state will become FULL GW, DIST GW, or OR. If the number of known CHs is greater than one and there is no gateway connecting any two of the CHs, the node becomes a FULL GW (7). If a DIST GW has announced another cluster not known by the current node and no FULL GW connects them, then the node becomes a DIST GW (10). Otherwise, it becomes OR (6). If the number of known CHs equals one, then the node becomes either a DIST GW or an OR. If a node hears a DIST GW announce a CH other than its CH, or if there is no DIST GW in the cluster, it will become a DIST GW (10). Otherwise, the node will become an ordinary node (6).

5.1.3 Adaptation to Diffusion

PC state information must be appended to all flooded packets. To apply PC on a diffusion network, the state information is added to interests and exploratory data. Recall that diffusion creates gradients and periodically refreshes them. After this initial setup phase, flooding is no longer necessary since data flows over reinforced paths. Since PC piggybacks cluster status information only on flooded packets, PC creates the clustered structure during the initial setup phase of the gradient cycle. For the remainder of the cycle, no PC state information is transmitted, so PC is essentially inactive. PC is only active during periodic gradient refresh, when diffusion floods the network with interests and exploratory data.
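The piggybacking rule above can be sketched in a few lines. This is an illustrative sketch only: the message-type names and dictionary-based packet layout are assumptions for the example, not the actual diffusion packet format.

```python
# Diffusion message types that are flooded and therefore carry PC state.
# Type names and the dict-based packet layout are illustrative.
FLOODED_TYPES = {"INTEREST", "EXPLORATORY_DATA"}

def attach_pc_state(packet, pc_state):
    # PC piggybacks cluster state only on flooded packets; data on
    # reinforced paths carries no PC fields, so PC is effectively
    # inactive between gradient refreshes.
    if packet["type"] in FLOODED_TYPES:
        packet["pc_state"] = pc_state
    return packet
```

This also makes the activity pattern of PC explicit: cluster state circulates only while interests and exploratory data are being flooded.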
Since PC affects the route setup, it is possible for suboptimal routes to be discovered. The structured topology created by PC may exclude the optimal route between source and sink. The end-to-end route from source to sink will only include nodes identified as cluster heads and gateways; no ordinary nodes will be on this path. This potential for suboptimal routes is one of the inherent trade-offs of PC as compared to global flooding. PC will improve flooding efficiency but may result in longer routes.

5.2 Repair Mechanism

To deal with the shortcomings in diffusion's ability to adapt to failure and mobility, we have developed a mechanism to efficiently handle route repair. We call our local repair protocol for directed diffusion DD-Repair. Our solution emphasizes truly localized repair in order to reduce latency and energy expenditure. Its basic structure is similar to the local repair algorithm used in ADMR [51], but its methods noticeably differ from ADMR since it is tailored to directed diffusion. We divide the local repair problem into three phases: break detection, break localization, and localized gradient repair. We describe how DD-Repair handles each of these phases in this section.

5.2.1 Break Detection

The first step in adapting to node failure or mobility is detecting the link breakage. This may be accomplished in a variety of ways. We assume that appropriate algorithms may be used to reliably detect a link breakage. Our focus is on handling the break after it is detected, not adaptively detecting path breakage. For our experiments, we used a fixed event rate known a priori to detect breaks. We identified a link as broken when no data was received from a flow after an entire event interval had elapsed.

5.2.2 Break Localization

Once a break has been detected, the break localization phase begins. The goal of this phase is to identify the node immediately downstream from the broken path.
This node will initiate the localized gradient repair algorithm described in the next section. All intermediate nodes that detect a break will send a repair notification message to their 1-hop downstream neighbors along the gradients for the missing data. Every intermediate node downstream from the break should send a repair notification, and every intermediate node except the one nearest to the break should also receive a repair notification. If a node does not receive a repair notification within t_repair seconds after sending one, it must be the nearest node, so it initiates the localized gradient repair.

5.2.3 Localized Gradient Repair

The heart of DD-Repair is the localized gradient repair phase. The goal of this phase is to find and create new gradients in the area near the break by using the same basic mechanisms as global gradient repair. We first describe several potential mechanisms for restricting flooding to a localized region and then explain the inner workings of the repair process itself. Figure 5.7 illustrates the break localization and localized gradient repair phases of the algorithm.

Figure 5.7: Local Repair for Directed Diffusion

Local Flooding

In order to restrict the flooding required to find new paths, we limit the packet forwarding in one of several ways. Possible methods include simple hop-limited flooding or limiting the reconnect packets to the 1-hop neighbors of the failed node. A more complex approach is to make use of the clusters created by passive clustering to restrict flooding. The most common approach to localized flooding among repair protocols is a simple hop-limited flood. In this approach, a hop count (or time-to-live) field is incremented on each hop. When the hop count exceeds a threshold value, the flooded packets are dropped. Another straightforward approach is to limit the flooding to 1-hop neighbors of the failed node.
This can be easily implemented by including the ID of the failed node on the repair packets and restricting flooding to nodes that have overheard packets from that failed node. A third and more sophisticated strategy is to limit flooding to nodes in the one or two clusters around the failed node. The trade-offs associated with each of these approaches are discussed in Section 6.2.2.

Reconnect Interests

The first step in localized gradient repair is the local flooding of reconnect interest messages. These interest messages for the broken data flow are essentially searching for nodes upstream from the break which are still receiving data. Reconnect interests are only transmitted in a limited neighborhood (as defined by the localized flooding algorithm) around the node nearest to the break (identified during the localization phase). This originator node acts like the sink in global gradient repair. For this reason, we label it the proxy sink in local gradient repair. Nodes outside the area determined by the local flooding algorithm will drop reconnect interests, enforcing the region boundaries. Reconnect interests are flooded in the region near the break in search of upstream nodes still connected to the data flow.

Reconnect Exploratory Data

In response to reconnect interests, upstream nodes on the data path that are still receiving data transmit reconnect exploratory data messages to neighbors that sent reconnect interests. Reconnect exploratory data is exploratory data that is sent back to the proxy sink. In order to perform the most localized repair, the node directly upstream from the break should be found. This node will act like the source in global gradient repair, so we call it the proxy source in local gradient repair. To achieve this behavior, reconnect exploratory data messages are only sent from nodes that have not overheard reconnect exploratory data. Thus, the first node to send reconnect exploratory data will become the proxy source.
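The proxy-source election rule above amounts to a three-part predicate evaluated locally at each candidate node. This is a minimal sketch under the assumption that each node tracks these three booleans; the function and argument names are illustrative, not part of the DD-Repair implementation.

```python
def sends_reconnect_exploratory(on_original_path, still_receiving_data,
                                overheard_reconnect_exploratory):
    # A node answers a reconnect interest with reconnect exploratory data
    # only if it lies on the original path, is still receiving data for
    # the broken flow, and has not already overheard another node's
    # reconnect exploratory data.  The first node to answer thereby
    # becomes the proxy source; later candidates suppress themselves.
    return (on_original_path and still_receiving_data
            and not overheard_reconnect_exploratory)
```

The suppression term is what makes the election first-come-first-served: once any on-path node replies, its transmission is overheard and silences the others.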
In summary, only nodes along the original path can send reconnect exploratory data. The node nearest upstream to the break should be identified as the proxy source since it should be the first node along the existing path to receive reconnect interests. The proxy source will send reconnect exploratory data back to the interest initiator (the proxy sink).

Reconnect Reinforcement

In global gradient repair, the sink reinforces the fastest path over which the exploratory data is received. Correspondingly, in the case of local repair, the proxy sink sends reconnect reinforcement messages to the first neighbor that delivers reconnect exploratory data. In turn, this neighbor node will reinforce its fastest neighbor, and so on, all the way back to the proxy source. While this path is probably not optimal, it serves as a temporary fix until the next global gradient refresh. Since the search for the new path was restricted to a relatively small region, it is almost identical to the original optimal path found by diffusion except for the few hops where the repair was performed. As a result, the repaired path should provide a reasonably low-latency route from source to sink.

5.3 Real-Time Communication Mechanism

To support timely communication, we have developed a real-time communication service for directed diffusion based on RAP [60]. Recall from Section 4.3.1 that the RAP architecture defines a five-layer network stack. Many of the layers in the RAP network stack can be handled natively by diffusion; we have extended diffusion to provide the remaining layers. The top layer of RAP, a query-event service which allows events to be registered and triggered, is inherently provided by the data-centric, publish-subscribe nature of diffusion. Diffusion operates without a transport layer, so RAP's Location-Addressed Protocol is unnecessary in diffusion. Diffusion also handles the responsibilities assigned to Geographic Forwarding in RAP since diffusion routing is based on attribute vectors.
Although routes in diffusion are more costly to establish due to global flooding, no location information is required. RAP, in contrast, requires each node to know its own location, requiring costly GPS hardware on every node. The core component of RAP is Velocity Monotonic Scheduling (VMS), the distance-aware and deadline-aware scheduling policy. VMS prioritizes messages by computing a velocity based on the packet's distance to its destination and its temporal deadline. Although packets are prioritized at the MAC and network levels in RAP, our implementation only performs prioritization at the network layer.

5.3.1 DD-RAP

We have implemented both static (SVM) and dynamic (DVM) velocity monotonic scheduling policies on top of diffusion. In addition, we have extended the protocol to compute priority without requiring each node to know location information. Instead, we use time information to estimate distance. We denote our time-based scheduling policies SVM-t and DVM-t. Our real-time mechanism, DD-RAP, can use any of the four algorithms to compute priority: SVM, DVM, SVM-t, or DVM-t. If location information is known by each node, SVM or DVM may be used to prioritize packets in the same fashion as RAP. The sink adds its location and the deadline of the data flow to outgoing interest packets. For SVM, when the source receives an interest message, it extracts the latitude, longitude, and deadline from the packet and calculates the velocity according to Equation 4.1. In DVM, each node along the reinforced path updates the priority of the packet based on its own location and the elapsed time of the packet as defined in Equation 4.2.

5.3.2 SVM-t

If location information is not available, a time-based approach is used to estimate the distance from source to sink. The time delay between source and sink can be estimated without introducing significant communication overhead since two-way message exchanges are necessary for route creation in diffusion.
Instead of appending location information to outgoing messages, the sink node adds timestamps to reinforcement packets. When reinforcement messages are received at the source, the time delay between when the sink sent the reinforcement and when the source received it is computed. We assume this time difference is proportional to the distance from source to sink and that links are symmetric (i.e., the time delay from sink to source and from source to sink is the same). We also assume time synchronization among the nodes. The primary disadvantage of using time to gauge distance is that congestion delay will erroneously increase the distance estimate. End-to-end delay will increase due to congestion, implying a greater distance between source and sink. To compensate for this, we store a running minimum time difference for use in priority calculation. The time difference used in computing priority is the smallest end-to-end delay recorded up to this point in time. As in RAP, the priority of data packets may be calculated in two ways: statically (SVM-t) or dynamically (DVM-t). In the static case, priority is computed at the source and is never modified by intermediate nodes. In DVM-t, intermediate nodes dynamically re-calculate the priority at each hop. In SVM-t, the priority is computed according to Equation 5.2.

P = (t_source - t_sink) / T_deadline (5.2)

Note that t_sink is the time the reinforcement packet was sent from the sink, t_source is the time the reinforcement message was received at the source, and T_deadline is the deadline of the data flow in units of time (not a timestamp). Instead of a velocity, the time-based approach results in a priority value which is a ratio of times.

5.3.3 DVM-t

In the dynamic case, the priority of a packet will be re-calculated at each hop. Data messages must be timestamped by the source so that intermediate nodes can calculate the elapsed time for a given packet.
To estimate the distance to the sink, the elapsed time is subtracted from the total end-to-end time found using the same procedure as SVM-t. The elapsed time is also subtracted from the deadline to gauge the deadline urgency of the message. DVM-t calculates priority using Equation 5.3.

P = (t_source - t_sink - T_elapsed) / (T_deadline - T_elapsed) (5.3)

As in SVM-t, t_sink is the time the reinforcement packet was sent from the sink, t_source is the time the reinforcement message was received at the source, and T_deadline is the deadline of the data flow in units of time. T_elapsed is the time elapsed in sending the packet to the current hop and is computed as T_elapsed = t_now - t_data, where t_now is the current time and t_data is the time the data packet was sent from the source. The numerators in Equations 5.2 and 5.3 correspond to distance awareness, and the denominators encapsulate deadline awareness. The time difference t_source - t_sink gives an estimate of the time required for a packet to travel from source to sink. In DVM-t, the elapsed time T_elapsed is subtracted from the end-to-end transmission time in order to estimate the time (and distance) from the current node to the sink. In each gradient refresh cycle, a new reinforcement packet will be received by the source. The source will update its values of t_source and t_sink if the time difference is less than the previous difference. The difference t_source - t_sink will thus be the running minimum of all calculated time delays. Taking the minimum time difference each cycle will reduce the effects of congestion on the time delay. Note that this approach assumes a static (non-mobile) network. Our scheme achieves the same basic goals as RAP, packet prioritization based on both distance and deadline, but does so without the strong localization requirement.
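The SVM-t and DVM-t calculations, including the running-minimum delay update performed on each gradient refresh, can be sketched compactly. This is a minimal illustration under the stated assumptions (synchronized clocks, static network); the class and attribute names are ours, not part of the DD-RAP code.

```python
class TimeBasedPriority:
    # Sketch of SVM-t / DVM-t priority computation (Equations 5.2, 5.3).
    # All times are seconds on synchronized clocks, as the scheme assumes.

    def __init__(self, t_deadline):
        self.t_deadline = t_deadline   # end-to-end deadline (a duration)
        self.min_delay = float("inf")  # running min of t_source - t_sink

    def on_reinforcement(self, t_sink, t_source):
        # Each gradient refresh delivers a new sink timestamp; keeping the
        # smallest observed delay damps congestion-induced inflation.
        self.min_delay = min(self.min_delay, t_source - t_sink)

    def svm_t(self):
        # Equation 5.2: static priority, computed once at the source.
        return self.min_delay / self.t_deadline

    def dvm_t(self, t_now, t_data):
        # Equation 5.3: recomputed at each hop; t_data is the source
        # timestamp carried on the data packet.
        t_elapsed = t_now - t_data
        return (self.min_delay - t_elapsed) / (self.t_deadline - t_elapsed)
```

Note how the dynamic form raises the priority of a packet whose remaining deadline slack shrinks faster than its estimated remaining distance, matching the behavior described for DVM.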
Although our location-free design requires more communication overhead than RAP, the additional cost is minimal since the timestamp information is piggybacked on the standard packets involved in two-phase directed diffusion path discovery. No additional packets are required, only a slightly increased packet size. This trade-off may be advantageous for applications in which location information is not available. While time synchronization is a non-trivial task, it can be accomplished without additional hardware and, therefore, may be more desirable than equipping each node with an expensive GPS receiver. RAP also assumes time synchronization in order to compute elapsed time in DVM. Thus, our time-based distance estimation scheme eliminates the strong location-knowledge requirement of RAP without adding significant overhead.

Chapter 6
Network Services Design

6.1 Design Principles

In designing the sense-and-respond architecture, we were guided by several principles. Our primary design goals were energy efficiency, scalability, localization, distributability, real-time communication, and reactivity. In this section, we describe each of these design goals and explain their implications for our design decisions.

6.1.1 Energy-efficient

Power consumption is of paramount importance in sensor networks since devices usually cannot easily be recharged. Consequently, our design incorporates features that promote energy efficiency. The primary purpose of passive clustering and the local flooding involved in DD-Repair is to save energy by minimizing communication. Communication is the primary energy expense in sensor networks, so reduced transmissions equate to saved energy. Passive clustering reduces the number of messages involved in route creation, and local repair reduces the number of messages required for route maintenance. The design of DD-RAP also emphasizes energy efficiency in that no new packets are required.
Data transfer for setting up the real-time protocol is piggybacked on top of the standard two-way message exchange used by diffusion for route discovery/refresh. All three protocols work toward the goal of energy-efficiency.

6.1.2 Scalable

Another general goal of our system and all sensor networks is scalability. Protocols should scale up to networks with large numbers of nodes. The reliance of standard diffusion on global flooding greatly limits the scalability of the protocol. Passive clustering and localized flooding both address the goal of scalability by reducing unnecessary transmissions. As the network size increases, these unneeded communications lead to network congestion. With sufficiently large networks, flooding-induced congestion can completely cripple the network. The flooding reduction mechanisms support the goal of scalability.

6.1.3 Localized

One of the best properties of directed diffusion is its completely localized nature. We wanted to preserve this characteristic in the design of our protocols because of its inherent advantages. Localized protocols maintain information only about one-hop neighbors. This significantly reduces the amount of state information that must be stored and hence increases the scalability of the algorithm. The principle of localization also tends to decrease the complexity of the algorithm, simplifying design and implementation. Since the protocols are deeply integrated into diffusion, they necessarily inherit many of the localized aspects of diffusion. Passive clustering exhibits this localized nature in that nodes choose their state based only upon the states of their one-hop neighbors. DD-Repair localizes its repair work to the area immediately surrounding the failed node. Thus, the advantages of localized interactions are preserved by our algorithms.

6.1.4 Distributed

The distributed nature of sensor networks is one of their most challenging and powerful characteristics.
The goal is to fully distribute the algorithms over all the nodes in the network. This is in contrast to typical algorithms which utilize the client-server model. Our protocols address this challenge by distributing tasks among all the nodes. This is particularly evident in passive clustering, where a clustered structure is created dynamically according to the first-declaration-wins rule. Cluster formation is completely distributed. In the local repair protocol, any node may potentially detect and repair a breakage. In this sense, all nodes act as adaptation servers, forming a completely distributed adaptation service. The design of our protocols emphasizes distributed computation.

6.1.5 Real-Time

Timely communication is an important goal for many applications. Our extension of the RAP protocol to diffusion aims to achieve the goal of real-time communication. The prioritized queue gives preference to more urgent packets, helping them meet their deadlines. The real-time protocol also allows for hard deadlines by dropping packets whose deadline has been exceeded. This further reduces congestion and gives other packets a greater chance of meeting their deadlines.

6.1.6 Reactive

Our final design goal is to develop reactive protocols and mechanisms. In general, reactive protocols are more suitable for sensor networks than proactive protocols. In energy-constrained systems, it makes little sense to continuously maintain routes if no data is being transferred over those routes. This is especially relevant in sensor systems which infrequently detect events of interest. Passive clustering and the local repair protocol particularly incorporate reactive features in their design. Passive clustering reactively creates clusters in response to, and through the use of, ongoing packet transmissions. The local repair algorithm responds to broken links by reactively repairing them.
6.2 Design Decisions

6.2.1 Clustering Mechanism

The compatibility of passive clustering and diffusion was recognized by Handziski et al. [63], who first implemented passive clustering over directed diffusion. Passive clustering is especially well-suited to sensor networks due to its localized, reactive, and fully distributed nature. All state changes are made using local knowledge gained by overhearing neighboring nodes. Passive clustering's reactive cluster formation is well-matched to diffusion's reactive publish-subscribe model. Furthermore, the fully distributed nature of passive clustering maps nicely to any sensor network routing protocol, especially diffusion. No client/server message exchanges are needed. Passive clustering does not need periodic messages, but instead takes advantage of existing packets. Finally, the protocol is very resource-efficient regardless of the size or density of the network.

6.2.2 Repair Mechanism

Break Detection

While break detection was not the focus of our research, we considered several potential methods for detecting a broken link. One approach is to calculate an expected time for data events to be received from each neighbor. The source may monitor its output and calculate an outgoing event rate for each gradient. It will then append this event rate to each data packet. Intermediate nodes need only store the data rate of the packet and the time it was received. If the interval has been exceeded, a break is detected. Another strategy for break detection is for intermediate nodes to calculate expected receive times based on incoming packet reception rates. Event intervals are calculated by monitoring the input from each neighbor at intermediate nodes. The advantage of this approach is that it is completely localized and distributed. Another possibility is to use physical layer metrics (e.g., signal strength or SNR) to detect failing links.
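The interval-based timeout check described above can be sketched as follows. The function name and the tunable threshold factor are illustrative assumptions, not the thesis API:

```cpp
#include <cassert>

// Hedged sketch of interval-based break detection: the link to a neighbor
// is flagged as broken when no packet has arrived within `factor` times
// the expected event interval tr (all values in the same time unit).
bool link_broken(double now, double last_rx_time, double tr, double factor) {
    return (now - last_rx_time) > factor * tr;
}
```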
Since we did not have access to physical layer metrics, we used the constant event rate method. Given event rate Tr, we detected link breakage when a packet was not received within 1.5 × Tr. More complex event interval determination algorithms should be investigated for production systems.

Break Localization

Break localization may be carried out differently depending on the break detection scheme used. If physical layer metrics are used to identify breakages, then our break localization scheme may be completely unnecessary. Our design assumes no access to physical layer information and no explicit network layer acknowledgments. Instead, our scheme utilizes an implicit timeout procedure to detect link failure. The advantage of this approach is that it may be broadly used regardless of physical or MAC layer differences. Since no physical or MAC layer assumptions are made, our repair protocol will support a wide variety of architectures.

Local Gradient Repair

In order to restrict the flooding required to find new paths during local gradient repair, we restrict packet forwarding in one of several ways. Flooding boundaries may be found using the clusters created through the passive clustering protocol, a simple n-hop flood, or the failed node ID. These mechanisms restrict the flooding of interests and exploratory data to a limited area. Another mechanism for local flooding is to append a hop radius to the message and drop it when the hop count reaches zero. Finally, flooding may be restricted to the one-hop neighbors of the failed node, i.e., only nodes that have heard from the failed node will forward the interests and exploratory data. Our design separates the local flooding algorithm from the repair protocol. By decoupling the local flooding mechanism from the repair protocol, we gain several benefits. First, from a software engineering perspective, the modularized software is easier to write, debug, and maintain.
Secondly, our repair algorithm can be paired with any local flooding strategy. Thus, our design may easily be extended by more complex local flooding algorithms. Thirdly, the decoupling allows multiple local flooding algorithms to be used simultaneously. Multiple algorithms could be stacked on top of each other or used in different regions of the same network. For example, an algorithm could be adaptively chosen based on local conditions, e.g., network density or congestion.

6.2.3 Real-time Communication Mechanism

The design of the real-time communication protocol is directly modeled after RAP. The SVM and DVM scheduling policies from RAP have been ported to diffusion with no change in design. The two time-based scheduling policies were inspired by RAP, but involved moderate redesign.

Prioritized Queue

The prioritized queue is the essential component of DD-RAP. We recognized three options for the implementation of a prioritized queue in the diffusion API.

1. Delay-Based: Set event delay times in proportion to priority levels.
2. Block-Based: Prioritize all events that have expired during the receiving period.
3. Timer-Based: Send the highest priority packet after the expiration of a recurring send timer.

The delay-based scheme uses priority to compute the delay time associated with a particular packet. The delay time is inversely proportional to the priority. Hence, high priority packets have a lower delay while low priority packets are assigned a longer delay time. The second option is to prioritize blocks of packets that have expired timers. Diffusion switches from receiving to sending when there are no packets to receive. It then processes all expired timers, i.e., all packets in the event queue with expired send times. This block of packets is usually sent in the order it was received. Using the block-based approach, the block of expired packets will instead be sent in priority order.
Finally, the timer-based approach uses a timer to repeatedly send one packet every ttimer milliseconds. When packets are received, they are added to a queue in priority order. In a separate thread, a recurring timer sends the highest priority packet each time the timer expires. The delay-based approach has one major drawback: it delays low priority packets even when there are no high priority packets to be sent. Hence, low priority packets are penalized unnecessarily and experience greater latency as a result. The block-based scheme also presents significant difficulties. It prioritizes within very small blocks of packets because the network so frequently switches between send and receive mode. This leads to a very small granularity of prioritization which may be almost imperceptible to the application. The prioritization occurs at such a minute level that it provides no benefit whatsoever. Because of these problems, we have chosen to implement the timer-based algorithm. While this results in the addition of a timeout parameter, it is superior to the other two approaches since it effectively prioritizes packets without unnecessarily delaying low priority packets. We describe the implementation of this approach in more detail in Section 7.3.3.

SVM-t

SVM-t, as described in Section 5.3.2, computes a priority based on the deadline as given by the application and the distance from source to sink as estimated by communication delay. Although several message exchanges occur in diffusion route creation, we choose to measure the time delay of the reinforcement message as it is sent from sink to source. There are several benefits of this choice. First, it conserves bandwidth by appending timestamps only to non-flooded packets. If interests or exploratory data were used to measure the time delay, then the timestamps would have to be flooded to the entire network. In a dense network, flooding may cause congestion and thereby increase the end-to-end delay.
Secondly, the use of reinforcements as timing messages simplified the implementation since only one type of packet needed to be processed differently. The use of reinforcement messages assumes symmetry of the end-to-end path. We assume the time required for reinforcement messages to travel from sink to source is the same as for exploratory data messages to travel from source to sink. This assumption is warranted since symmetric links are already assumed by diffusion.

DVM-t

DVM-t re-calculates the priority of a packet at each intermediate node based on the elapsed time (Section 5.3.3). As the most complicated case, DVM-t posed the most design options. The primary problem was in calculating the time to the sink (the numerator in Equation 5.3). We considered two approaches: stateful and stateless. In the stateful approach, each intermediate node along the path from sink to source must maintain a time value for every data flow passing through it. The value may be computed by storing tnow − tsink for each reinforcement message received. These values should be indexed by data flow. In the stateless approach, all the information needed to compute priority is stored within each data packet. The time to the sink is estimated by subtracting the elapsed time from the total end-to-end time, tsource − tsink − Telapsed. End-to-end time is computed using the same procedure as SVM-t, and elapsed time is measured with a timestamp on each outgoing data message. The stateful approach is more complex and has greater storage requirements per node. However, it provides a better estimate of the time to the sink. The stateless algorithm is simpler to implement and more scalable but may be less accurate. We chose the stateless approach due to its simplicity and scalability. It provides sufficient accuracy, especially over longer durations. These issues may be further investigated in future work.
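The running-minimum update of the end-to-end delay estimate, performed by the source each gradient refresh cycle as described in Chapter 5, can be sketched as follows (names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Hedged sketch of the per-refresh-cycle delay update: the source keeps
// the running minimum of (tsource - tsink) across gradient refresh cycles
// to damp the effect of transient congestion on the delay estimate.
double update_delay_estimate(double current_min, double t_source, double t_sink) {
    double sample = t_source - t_sink;
    return (sample < current_min) ? sample : current_min;
}
```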
Chapter 7 Network Services Implementation

7.1 Clustering Mechanism

We implemented passive clustering using the Filter API provided by ISI diffusion [12]. This required the creation of a new diffusion attribute and a program with two filters. The attribute was used to relay information about the passive clustering state of each node among neighboring nodes. The filters intercept packets and update state information based on the PC state of the previous hop.

7.1.1 Passive Clustering Attribute

In order to communicate information about the passive clustering state of each node, we defined the class PCInfo_t to encapsulate all the passive clustering status information, including state, node ID, and cluster information. The PCInfo_t class is defined and explained below.

class PCInfo_t {
    int nodeID;
    int state;
    int CH1;
    int CH2;
    timeval ts;
    void fromAttr(NRSimpleAttribute<void *> *attr);
};

- nodeID: the diffusion ID of the node
- state: an integer representing the external passive clustering state of the node
- CH1: the ID of the node's primary cluster head
- CH2: the ID of the node's secondary cluster head; for CH and OR nodes, CH2 = -1
- ts: a timestamp corresponding to the last received message from the node
- fromAttr(NRSimpleAttribute<void *> *attr): converts attribute attr to a PCInfo_t object

To transport the PC information in diffusion packets, we pack the PCInfo_t object into a PC attribute and push the attribute onto the attribute vector of the message. The PC attribute is defined as follows:

#define PC_STATE_KEY 5000
NRSimpleAttributeFactory<void *> PCAttrFactory(PC_STATE_KEY, NRAttribute::BLOB_TYPE);

Upon reception of packets with this attribute, nodes unpack the data into a PCInfo_t object using its fromAttr() method. Nodes maintain a table containing PC information about each of their neighbors. The neighbor state table is updated on the reception of every flooded packet. The neighbor table is implemented in the class PCNodeList as shown below.
The PCNodeList class stores information about the passive clustering state of neighboring nodes as an STL list of PCInfo_t objects. The core functionality provided by the class includes adding/updating neighbors as well as counting the number of neighbors in a particular state. The PCNodeList class also has helper methods that analyze the neighbor list to determine appropriate state changes.

class PCNodeList {
    list<PCInfo_t> neighbors;

    /* Core methods */
    void update(int nodeID, PCInfo_t pcInfo);
    int count(int state);

    /* Helper methods */
    bool anyTwoCHsNotConnected(int &ch1, int &ch2);
    bool remoteDistGWexists(int ch1, int &ch2);
};

- update(int nodeID, PCInfo_t pcInfo): adds or updates the PC information of node nodeID with PC status pcInfo in the neighbor list
- count(int state): returns the number of nodes in state state in the neighbor list
- anyTwoCHsNotConnected(int &ch1, int &ch2): returns true if any pair of known CHs is not connected by a GW, and false if all pairs are connected; if a disconnected pair exists, ch1 and ch2 are set to their IDs
- remoteDistGWexists(int ch1, int &ch2): returns true if a remote distributed gateway for cluster head ch1 exists; if true, ch2 is set to the cluster ID of the remote gateway

7.1.2 Passive Clustering Filters

The PCInfo_t and PCNodeList classes are used extensively in the implementation of the passive clustering filter program. Our implementation utilized two filters in one program to intercept messages before and after the gradient filter. In the pre-gradient filter, nodes update their internal state to CH_READY or GW_READY according to the state of their neighbors. If no other CHs have been overheard, a node will enter CH_READY. Otherwise, it will enter GW_READY. Algorithm 1 summarizes the internal state update process, which is executed when packets are received.
Algorithm 1: Incoming Packet Processing for Passive Clustering
Input: Message, Neighbor State Array
Output: Internal State

switch Message.State do
    case INITIAL:
        if num(CH) = 0 then InternalState = CH_READY
        else InternalState = GW_READY
    case CH:
        if myID < Message.ID then InternalState = CH_READY
        else InternalState = GW_READY
    case FULL_GW:
        InternalState = GW_READY
    case DIST_GW:
        InternalState = GW_READY
    case CH_READY:
        InternalState = GW_READY
end

In the post-gradient filter, the external state of the node is determined and added to the outgoing packet. Nodes in CH_READY become CHs if they have not heard from any other CHs. Nodes in GW_READY enter FULL_GW, DIST_GW, or OR depending on the number of CHs and GWs among their neighbors. Algorithm 2 concisely summarizes the outgoing packet processing.

Algorithm 2: Outgoing Packet Processing for Passive Clustering
Input: Internal State, Neighbor State Array
Output: External State

if InternalState = CH_READY then
    if num(CH) = 0 then ExternalState = CH
    else InternalState = GW_READY
if InternalState = GW_READY then
    if num(CH) > 1 then
        if any two CHs are not connected by any known gateway then ExternalState = FULL_GW
        else ExternalState = OR
    else if num(CH) = 1 then
        if a DIST_GW has announced a CH other than my CH then ExternalState = DIST_GW
        if no DIST_GW exists for my CH then ExternalState = DIST_GW
        else ExternalState = OR
    else ExternalState = CH

If the external state has been determined to be OR, then the interest or exploratory data packet will be dropped. This is accomplished in the Filter API by simply not forwarding the message back to the diffusion core. Thus, only nodes in the states CH, DIST_GW, and FULL_GW are allowed to forward interests and exploratory data. Nodes in OR will receive and process packets, but not forward them. Note that Algorithm 2 and the gateway selection heuristic (Equation 5.1) represent different implementations of the same mechanism.
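Algorithm 1 can be rendered as a compact C++ function. The enum and function below are illustrative only, with num(CH) passed in as a precomputed count from the neighbor state table:

```cpp
#include <cassert>

// Hedged C++ rendering of Algorithm 1 (incoming packet processing).
// State names follow the thesis; OR is the ordinary-node state.
enum PCState { INITIAL, CH, FULL_GW, DIST_GW, CH_READY, GW_READY, OR };

// numCH: number of cluster heads currently in the neighbor state table.
PCState next_internal_state(PCState msgState, int msgID, int myID, int numCH) {
    switch (msgState) {
    case INITIAL:
        return (numCH == 0) ? CH_READY : GW_READY;
    case CH:
        // First-declaration-wins tie-break on node ID.
        return (myID < msgID) ? CH_READY : GW_READY;
    case FULL_GW:
    case DIST_GW:
    case CH_READY:
    default:
        return GW_READY;
    }
}
```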
Our implementation, although more complex, requires no tuning of the weighting parameters in Equation 5.1.

7.2 Repair Mechanism

The path repair mechanism, DD-Repair, is implemented using new attributes and the Filter API. DD-Repair is one program, but the local flooding mechanism is implemented separately. This decoupling allows any local flooding algorithm to be used in combination with the core repair protocol. In this section, we describe the additional attributes used by DD-Repair and explain the details of its implementation.

7.2.1 DD-Repair Attributes

We define two new attributes that correspond to the new messages introduced by our protocol. The Repair Notification attribute marks a data packet as a repair notification. The Reconnect attribute is added to reconnect interests and reconnect exploratory data to differentiate them from standard route discovery packets created by diffusion. The new attributes are listed below.

#define REPAIR_NOTIFICATION_KEY 4500
#define RECONNECT_KEY 4501

NRSimpleAttributeFactory<char *> RepairNotificationAttr(REPAIR_NOTIFICATION_KEY, NRAttribute::STRING_TYPE);
NRSimpleAttributeFactory<int> ReconnectAttr(RECONNECT_KEY, NRAttribute::INT32_TYPE);

7.2.2 DD-Repair Filters

The majority of local repair is implemented in one program with two filters: a pre-gradient filter and a post-gradient filter. The pre-gradient filter handles the bulk of the work. It sets data timeouts after each data packet is received. If the data timeout expires without receiving a new data packet, repair notifications are sent one hop downstream, and a repair notification timer is set. If it expires and no repair notification is received, the node becomes the proxy sink and floods interest messages with a reconnect attribute set to the failed node ID. The post-gradient filter maintains a list of upstream neighbors and downstream neighbors in order to detect link breakages.
This filter also drops reconnect exploratory data at the proxy sink and reconnect reinforcements at the proxy source. Interests and exploratory data are normally created by the diffusion routing (dr) object. To support reactive repair, we added the following two new methods to the dr class:

- int reconnectPublish(NRAttrVec *attrs)
- int reconnectSubscribe(NRAttrVec *attrs)

These methods allow us to send reconnect packets when a break is detected. The DD-Repair filter at the proxy sink invokes reconnectSubscribe() to send reconnect interests, and the proxy source invokes reconnectPublish() to transmit reconnect exploratory data in response to reconnect interests. The proxy sink sends a reconnect reinforcement message to the first neighbor that sent it a reconnect exploratory data message. Algorithm 3 summarizes the logic involved in the pre-gradient filter.

Algorithm 3: Pre-Gradient Processing for DD-Repair

if Repair Notification then
    Update state as not proxy sink
else if Data Message then
    Update receive data time
    Reset expected data timer
else if Reconnect Interest then
    if on original data path and still receiving data then
        Become proxy source
        Flood reconnect exploratory data
    else
        Forward reconnect interest
else if Reconnect Exploratory Data then
    if proxy sink then
        if reconnect exploratory data not yet received then
            Send reconnect reinforcement
    else if reconnect interests received then
        Forward reconnect exploratory data
    else
        Drop reconnect exploratory data
else
    Forward message

7.2.3 Local Flooding

DD-Repair makes no attempt to restrict the flooding of reconnect packets. To handle this task, we have written two additional programs. The node ID filter creates a list of neighbors from which a node has received any message. It drops flooded reconnect messages that contain a reconnect attribute with a failed node ID not in the list of known neighbors. This essentially restricts reconnect messages to the one-hop neighbors of the failed node.
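The node ID filter's forwarding decision can be sketched as follows. The class and method names are hypothetical stand-ins for the actual filter code:

```cpp
#include <cassert>
#include <set>

// Hedged sketch of ID-based local flooding: a node forwards a reconnect
// message only if it has previously overheard the failed node, i.e., only
// one-hop neighbors of the failure participate in the reconnect flood.
struct NodeIDFilter {
    std::set<int> heardFrom;   // IDs of all neighbors overheard so far

    void overheard(int nodeID) { heardFrom.insert(nodeID); }

    bool shouldForward(int failedNodeID) const {
        return heardFrom.count(failedNodeID) > 0;
    }
};
```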
In a sufficiently dense network, a path back to the data flow may be found within this one-hop neighbor group. In sparse networks, ID-based local flooding may restrict flooding to such a degree that repair is impossible. A second simple method to limit the flooding of reconnect messages is to limit the number of hops a packet may travel. The hop filter appends a hop attribute containing the maximum number of hops that the packet may travel. Packets from foreign hosts are dropped if the decremented hop count reaches zero. Otherwise, messages are forwarded with a decremented hop count. This approach works in sparse networks as long as a sufficiently large initial hop count is used. The main disadvantage of the hop-count algorithm is the difficulty of selecting an appropriate value for the maximum number of hops. In dense networks, a large hop count may result in wide local flooding, which is very expensive. Perhaps in the future, the hop count could be set adaptively based on the connectedness of a node. Thus, highly connected nodes would use smaller hop counts (or the ID-based algorithm) and nodes with very few neighbors would use larger hop counts.

7.3 Real-Time Communication Mechanism

Like the other network services, the real-time communication protocol, DD-RAP, was implemented in the diffusion API using attributes and filters.

7.3.1 DD-RAP Attributes

We defined several new attributes for DD-RAP. These include attributes for deadline, priority, DVM, SVM-t, and DVM-t. SVM needs only a priority value, so no explicit attribute is necessary. To utilize DD-RAP, applications simply add a deadline attribute to their publication definition. The deadline is an integer representing the deadline in milliseconds. The definition of the deadline attribute is shown below.

#define TIME_DEADLINE_KEY 7001
NRSimpleAttributeFactory<int> TDeadlineAttr(TIME_DEADLINE_KEY, NRAttribute::INT32_TYPE);

Several other attributes are necessary for the functioning of DD-RAP itself.
We defined one attribute for each variant of DD-RAP. These attributes encapsulate the data that must be communicated by each version of the protocol. DD-RAP uses the state information in each version's attribute to compute the priority and write it to the priority attribute. Thus, the priority attribute is used by all four versions of DD-RAP. These DD-RAP attribute definitions are shown below.

#define PRIORITY_KEY 7002
#define DVM_KEY 7003
#define SVM_T_KEY 7004
#define DVM_T_KEY 7005

/* SVM, DVM, SVM-t, and DVM-t */
NRSimpleAttributeFactory<float> PriorAttr(PRIORITY_KEY, NRAttribute::FLOAT32_TYPE);
/* DVM */
NRSimpleAttributeFactory<void *> DVMAttr(DVM_KEY, NRAttribute::BLOB_TYPE);
/* SVM-t */
NRSimpleAttributeFactory<void *> SVMtAttr(SVM_T_KEY, NRAttribute::BLOB_TYPE);
/* DVM-t */
NRSimpleAttributeFactory<void *> DVMtAttr(DVM_T_KEY, NRAttribute::BLOB_TYPE);

As the simplest case, SVM requires only one floating point number to be calculated at the source and sent to the destination; no state information is needed. The other three protocols (DVM, SVM-t, DVM-t) use the priority attribute to store the calculated priority but also require another attribute to communicate state information. The DVM attribute contains the time the data packet was sent from the source. The SVM-t attribute holds a structure with the two timestamps used to compute the end-to-end delay. The DVM-t attribute contains three timestamps: the two end-to-end delay timestamps and the time the data packet was sent. The definitions of the structures for SVM-t and DVM-t are shown below.

typedef struct RAP_SVM_T {
    EventTime ts_src_expdata;
    EventTime ts_snk_reinforcement;
} RAPSvmT;

typedef struct RAP_DVM_T {
    EventTime ts_src_expdata;
    EventTime ts_snk_reinforcement;
    EventTime ts_src_data;
} RAPDvmT;

7.3.2 DD-RAP Filters

Similar to passive clustering and the repair protocol, DD-RAP is implemented using a pre-gradient filter and a post-gradient filter. Each variant essentially performs the same three steps.
1. Add DD-RAP information to packets at the sink.
2. Extract DD-RAP information from packets at the source.
3. Compute priority and append it to outgoing data packets.

We explain the differing details of each version in the following subsections.

SVM

In SVM, the sink adds its location information (latitude and longitude) to outgoing interest packets. The source extracts the location information and uses it along with the deadline supplied by the application to compute the priority according to Equation 4.1. This priority is stored in the priority attribute by the source and used to prioritize the data packet at each intermediate hop.

DVM

In DVM, the sink adds its location information to outgoing interest packets. The source extracts the location information and uses it along with the deadline supplied by the application to compute the priority according to Equation 4.2. The source also timestamps the packet so that intermediate nodes can compute the elapsed time. Upon receiving a data packet, intermediate nodes extract the location of the sink and the timestamp. They use this information, along with their own location, to update the priority, again using Equation 4.2.

SVM-t

In SVM-t, the source adds a timestamp to outgoing exploratory data and the sink adds a timestamp to outgoing reinforcement messages. When the source is ready to send data, it computes the difference of these two timestamps and divides it by the deadline to find the priority (Equation 5.2). The priority is stored in the priority attribute and used at each intermediate hop for queue prioritization.

DVM-t

In DVM-t, the source adds a timestamp to outgoing exploratory data and the sink adds a timestamp to outgoing reinforcement messages. The source appends a timestamp to the data packet corresponding to the time it was sent. The time difference and deadline are used to compute the initial priority, which is written to the priority attribute.
Intermediate nodes subtract the elapsed time from the end-to-end delay and from the deadline to update the priority at each hop (Equation 5.3).

7.3.3 Prioritized Queue

We implemented the priority queue using the timer-based approach introduced in Section 6.2.3. The implementation involved changing the typical behavior of diffusion filters, which forward messages from the receive thread. Instead, we added each received message to a priority queue. The queue was implemented as a linked list of PriorityQueueEvent objects as defined below.

class PriorityQueueEvent {
public:
    Message *msg;
    handle h;
    double priority;
    PriorityQueueEvent *next;
};

The queue itself has two member variables: a pointer to the head of the queue and a mutex for thread safety. Each data message received by the receive thread of the DD-RAP filter was added to the queue using the insert() method. The run() method of the DD-RAP filter acted as the sending thread. It simply dequeued the first element in the priority queue (the highest priority message) and sent it. The mutex was necessary to lock the queue for insert and dequeue operations since they were performed by different threads. The definition of the PriorityQueue class is shown below. Besides the basic insert and dequeue operations, we also wrote several utility methods: isEmpty(), print(), and length().

class PriorityQueue {
    PriorityQueueEvent *head_;
    pthread_mutex_t mut;
public:
    PriorityQueue();
    void insert(Message *msg, int handle, double priority);
    PriorityQueueEvent *dequeue();
    bool isEmpty();
    void print();
    int length();
};

After the highest priority packet has been sent, the send thread sleeps for ttimer milliseconds before sending another packet. The value of ttimer determines the granularity of prioritization and affects the maximum bandwidth of the network. Larger values result in greater amounts of prioritization at the cost of throughput.
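A simplified, self-contained sketch of the insert and dequeue behavior is shown below. It uses an std::list and an int payload in place of the hand-rolled linked list of Message pointers, and omits the mutex to stay single-threaded:

```cpp
#include <cassert>
#include <list>

// Hedged sketch of the DD-RAP priority queue logic: insert() keeps events
// sorted by descending priority and dequeue() pops the highest-priority
// event. The real queue stores Message pointers and guards both
// operations with a pthread mutex, since they run in different threads.
struct SimplePriorityQueue {
    struct Event { int msg; double priority; };
    std::list<Event> events;

    void insert(int msg, double priority) {
        std::list<Event>::iterator it = events.begin();
        while (it != events.end() && it->priority >= priority) ++it;
        Event e = { msg, priority };
        events.insert(it, e);
    }

    bool dequeue(int &msg) {
        if (events.empty()) return false;
        msg = events.front().msg;
        events.pop_front();
        return true;
    }
};
```

In the real filter, the receive thread would call insert() and the recurring send timer would call dequeue() once every ttimer milliseconds.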
Smaller values of ttimer allow more packets to be sent, but also reduce the amount of prioritization possible. An appropriate value should be set based on the bandwidth needed by the application.

Chapter 8 Performance Evaluation

8.1 Clustering Mechanism

We evaluated our implementation of the passive clustering protocol using two metrics: flooding efficiency and delivery effectiveness. The primary purpose of passive clustering is to increase the efficiency of flooding. Our performance results show significant improvement over standard diffusion in this respect. A secondary benefit of passive clustering is increased delivery effectiveness. The packet delivery rates of diffusion may be adversely affected by the congestion caused by flooding. Since passive clustering reduces flooding-induced congestion, delivery rates can be improved by its use. In this section, we explain the experimental setup used to evaluate passive clustering and then describe its performance with respect to the two metrics.

8.1.1 Experiment Setup

To test the performance of passive clustering, we generated topologies with n nodes randomly dispersed over a 100 m × 100 m field. The nodes had a transmission radius r which determined the connectedness of the topology. The number of nodes n ∈ {25, 50, 75, 100, 150, 200}, and the transmission radius r ∈ {10, 20, 30, 40} m. We simulated directed diffusion and passive clustering for 600 seconds on the same random topology for each of the 24 combinations of n and r. A constant bit rate sender application sent one 1024 B packet each second to one receiver application for the entirety of the simulation time.

8.1.2 Flooding Efficiency

Primarily, passive clustering provides improved flooding performance by reducing the number of redundant transmissions. To evaluate flooding efficiency, we measured the total number of interest and exploratory data packets transmitted during the simulation.
The total number of interest packets transmitted during the simulation is plotted against the number of nodes (n) for varying values of r in Figure 8.1. Passive clustering provided a significant improvement over standard diffusion in every case. Note that for r = 10 (Figures 8.1(a) and 8.2(a)), none of the topologies were connected except for n = 200. Likewise, in Figures 8.1(b) and 8.2(b) the topology generated with n = 25 was not connected.

Similarly, the exploratory data packets also show a significant reduction because of passive clustering. Figure 8.2 illustrates the improvement in flooding efficiency gained by passive clustering. A more complete picture emerges when the results are plotted in three dimensions. Figure 8.3 shows the number of interest packets transmitted versus n and r. Again, notice that passive clustering is always more efficient than standard diffusion. Figure 8.4 shows the exploratory data messages plotted against n and r.

Passive clustering provides dramatically increased flooding efficiency, on the order of 50% fewer flooded packets over all network sizes and transmission radii. On average, the number of interest packets was reduced by 55% and the number of exploratory data messages was reduced by 59%. This flooding reduction directly correlates to energy savings since fewer transmissions mean less power is consumed.

(Figure 8.1: Number of interest messages versus number of nodes for transmission radius (r); panels (a) r = 10, (b) r = 20, (c) r = 30, (d) r = 40)

Figure 8.5 shows the average flooding reduction percentage of interests and exploratory data for each network size (n). Reduction is computed as (NDD − NPC) / NDD, where NDD is the number of packets required for directed diffusion and NPC is the number of packets transmitted by passive clustering.

8.1.3 Delivery Effectiveness

A secondary benefit of passive clustering is increased delivery effectiveness in large, highly-connected networks.
To measure delivery rates, we computed the delivery ratio, i.e., the number of packets received versus the number of packets sent. We included a sequentially increasing number in each packet and compared the sequence numbers received to the last number sent. This was necessary because diffusion delivers duplicate data packets, making it impossible to compare the raw number of data packets sent and received.

(Figure 8.2: Number of exploratory data messages versus number of nodes for transmission radius (r); panels (a) r = 10, (b) r = 20, (c) r = 30, (d) r = 40)

The delivery ratio for varying numbers of nodes n at each value of r is shown in Figure 8.6. Generally, passive clustering was at least equivalent to diffusion and often performed better. A performance advantage was present for highly-connected topologies because of congestion: the effects of flooding-induced congestion are more prevalent in highly dense networks. Since passive clustering reduces the number of flooded packets, it reduces congestion and allows more data packets to be delivered. For topologies with fewer than 100 nodes, the difference in performance between diffusion and passive clustering was not significant. However, in very dense networks (n ∈ {150, 200} and r ∈ {30, 40}) passive clustering performed appreciably better. The performance difference was 5–10%.

(Figure 8.3: Number of interest messages versus number of nodes (n) and transmission radius (r))

(Figure 8.4: Number of exploratory data messages versus number of nodes (n) and transmission radius (r))

Passive clustering returned delivery ratios on par with standard diffusion and incrementally improved delivery rates in dense networks. Thus passive clustering provides significant advantages in terms of both flooding efficiency and delivery effectiveness. These results imply greater potential scalability of the network and greater robustness of performance.
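The duplicate-aware delivery ratio described above can be sketched by counting distinct sequence numbers rather than raw packet totals. The assumption that sequence numbers start at 1, so the last number sent equals the number of packets sent, is ours for this sketch.

```cpp
#include <set>
#include <vector>

// Delivery ratio = distinct sequence numbers received / packets sent.
// Diffusion can deliver the same data packet more than once, so
// duplicates are collapsed by inserting sequence numbers into a set.
double deliveryRatio(const std::vector<int> &receivedSeqs, int lastSent) {
    if (lastSent <= 0) return 0.0;
    std::set<int> distinct(receivedSeqs.begin(), receivedSeqs.end());
    return (double)distinct.size() / (double)lastSent;
}
```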
Our results corroborate previous work on passive clustering over diffusion [63] in terms of improved flooding efficiency and delivery effectiveness.

8.2 Repair Mechanism

We evaluated the performance of DD-Repair by measuring the delivery ratio for data packets and the overhead associated with the local flooding. Since diffusion has no reactive repair mechanism, our addition provided significant gains in terms of delivery effectiveness. The cost of repair was generally low given the restricted flooding strategies we employed.

(Figure 8.5: Average percent flooding reduction for each network size (n))

We explain the experimental scenario and discuss the performance of DD-Repair in terms of delivery effectiveness, repair time, and overhead.

8.2.1 Experiment Setup

To test DD-Repair we used a constant bit rate application that sent one 1024 B packet each second to one receiver application. We used topologies with n nodes where n ∈ {16, 64, 100}. Each value of n was tested on topologies with an average number of neighbors d, where d ∈ {4, 6}. We measured the delivery ratio (number of data packets received / number of data packets sent) over a period of 60 seconds. This time period was chosen to correspond with one gradient refresh cycle. Halfway through this interval (t = 30 seconds), we killed all processes running on one intermediate node in the data flow. Since data packets were sent at a rate of one per second, the delivery ratio directly measures the time required to repair the broken link. In the case of diffusion, no packets were delivered after the breakage since its repair mechanism does not execute until the end of the gradient refresh cycle.

(Figure 8.6: Delivery ratio versus number of nodes for transmission radius (r); panels (a) r = 10, (b) r = 20, (c) r = 30, (d) r = 40)

We also measured the network traffic involved in each run of the simulation. We were specifically interested in the flooded packets (interests and exploratory data) since they are most relevant to gradient repair.
8.2.2 Repair Effectiveness

To gauge the effectiveness of repair, we measured the delivery effectiveness in the face of node failure. We killed one intermediate node on a one-to-one data flow and calculated the number of data packets dropped during route repair. Figure 8.7 shows the delivery ratio for each repair/flooding algorithm. Standard directed diffusion corresponds to periodic repair with global flooding (PR-GF). Reactive repair (RR) means that the protocols use the break detection mechanism. The possible strategies for flooding include global flooding (GF), hop-based (Hop), and ID-based (ID).

(Figure 8.7: Delivery ratio for each repair/flooding algorithm)

As expected, the reactive repair strategies performed significantly better than the periodic approach of standard diffusion. The global flooding and hop-based reactive repair were able to successfully repair the path and, as a result, were both able to deliver about 95% of the packets. They have similar delivery ratios because they both found a node on the original data path still receiving data (a proxy source). The ID-based flooding algorithm prevented successful path repair on degree 4 topologies because it overly restricted the flooding of reconnect packets and thus was unable to find a proxy source. It returned delivery ratios similar to periodic repair since it reverted to that mode of repair when reactive repair failed. In the degree 6 topologies, the ID-based algorithm was able to reactively repair the path since the failed node had more neighbors and more paths to potential proxy sources.

As far as delivery effectiveness is concerned, global flooding is the safest option since it will always find a path to the data flow. Its flooding cost, however, is the most expensive. The hop-based approach trades repair probability for flooding overhead. The overhead costs of each algorithm are discussed in Section 8.2.4.
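As a hedged sketch of the hop-based restriction, a reconnect packet can carry a hop counter that each forwarding node increments, rebroadcasting only while the counter is below a limit; this keeps the repair flood local to the failed node's neighborhood. The field and function names are illustrative, not DD-Repair's actual packet format.

```cpp
// Hop-limited local flooding sketch (illustrative names, not the
// DD-Repair wire format).
struct ReconnectPkt {
    int hops;      // hops traveled so far
    int hopLimit;  // maximum hops allowed for the local flood
};

// Returns true if this node should rebroadcast the packet, and
// increments the hop counter for the outgoing copy. Once the limit
// is reached, the flood stops, bounding the repair overhead.
bool shouldRebroadcast(ReconnectPkt &p) {
    if (p.hops >= p.hopLimit) return false;  // flood boundary reached
    p.hops += 1;
    return true;
}
```

A larger hop limit raises the chance of reaching a proxy source at the cost of a wider flood, which is exactly the trade-off between the Hop and GF strategies discussed above.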
8.2.3 Repair Latency

The latency of reactive repair is directly connected to the repair effectiveness. We calculated the amount of time required to successfully repair the path after intermediate node failure. The latency of repair for each repair/flooding algorithm is shown in Figure 8.8. Not surprisingly, the reactive protocols performed much better than periodic repair. They were able to repair the broken path within about 2–3 seconds of the breakage. The only exception was ID-based flooding in the degree 4 topologies, where reactive repair was unsuccessful due to the low level of connectivity. In these scenarios, the network was repaired with the periodic repair mechanism. For the most part, however, reactive repair was able to decrease the repair time by a factor of 10, from around 30 seconds to about 3 seconds.

(Figure 8.8: Latency of repair for each repair/flooding algorithm)

8.2.4 Packet Overhead

To understand the cost associated with the local flooding schemes, we measured the number of flooded packets transmitted during the simulation. The primary overhead of the repair algorithm is the additional reconnect interests and exploratory data. Figure 8.9 shows the total interests and exploratory data for each topology.

(Figure 8.9: Overhead (interests + exploratory data) for each repair/flooding algorithm)

To understand the overhead relative to standard diffusion, Figure 8.10 shows the overhead of each repair algorithm normalized with respect to diffusion. Figure 8.10 represents the number of flooded packets sent by each protocol for each flooded packet sent by diffusion (PR-GF) during one 60-second gradient refresh interval. Notice that the reconnect interests and exploratory data add less than 10% overhead compared to diffusion. With this additional overhead, significantly more data packets can be delivered.
To show the relationship between overhead and delivery ratio, we computed the ratio of overhead packets to data packets received for each algorithm. This is shown in Figure 8.11. We divided the total number of interests and exploratory data by the number of data messages successfully delivered to the sink. Smaller bars represent a lower delivery cost for data messages. Note that all three reactive repair algorithms outperform diffusion. The ID-based local flooding mechanism sometimes had the same cost as diffusion because it was unable to repair the path for low density topologies. Hop-based local flooding was the best overall performer in that it always repaired the path with relatively few overhead packets.

(Figure 8.10: Normalized overhead (overhead packets / diffusion overhead packets) for each repair/flooding algorithm)

(Figure 8.11: Overhead packets per data packet for each repair/flooding algorithm)

8.3 Real-Time Communication Mechanism

To evaluate the effectiveness of DD-RAP, we measured the on-time delivery ratio of packets received at the sinks. DD-RAP shows significant improvement over standard diffusion in terms of timely delivery during congestion. In this section, we describe the experimental setup and compare the on-time delivery rates for diffusion and DD-RAP.

8.3.1 Experiment Setup

We tested DD-RAP using the topology shown in Figure 8.12. We created two data flows that shared the same five bottleneck nodes. The bottleneck topology was chosen in order to exaggerate the effects of congestion. The two sources ran a constant bit rate application which generated one 1024 byte packet every T milliseconds, where T = t + j with t ∈ {25, 50, 75, 100, 125} and j ∈ [−0.2t, 0.2t]. We introduced the jitter term j in order to create random queue lengths. Source 1 sent packets with a deadline of 115 milliseconds to Sink 1, while Source 2 gave packets a deadline of 165 milliseconds to reach Sink 2.
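The jittered generation period just described can be sketched as follows; the use of rand() is illustrative rather than the simulator's random number generator.

```cpp
#include <cstdlib>

// Jittered inter-packet period: T = t + j, with j drawn uniformly
// from [-0.2 t, 0.2 t], so T always lies in [0.8 t, 1.2 t].
double jitteredPeriod(double t) {
    double u = (double)std::rand() / RAND_MAX;  // uniform in [0, 1]
    double j = (2.0 * u - 1.0) * 0.2 * t;       // uniform in [-0.2 t, 0.2 t]
    return t + j;
}
```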
Since the distance between sources and sinks was the same for both flows, the deadline differentiated the high priority flow (115 msec) from the low priority flow (165 msec).

(Figure 8.12: DD-RAP network topology)

In order to make an equivalent comparison, we wrote a filter to delay standard diffusion packets by the same amount as the DD-RAP filter delayed its packets. The delay filter was identical to the DD-RAP filter except that a priority of 0.0 was assigned to all packets, thus enforcing a FIFO ordering of the queue. The ttimer parameter of the priority queue was set to 50 milliseconds for all the experiments.

We evaluated DD-RAP at five different source data rates. The source data rate is the rate at which each source generated data packets. We denote this parameter as r, where r ∈ {8, 10, 14, 20, 41} KB/s. These r values correspond to sending one 1024 byte packet every t milliseconds, where t ∈ {125, 100, 75, 50, 25}, respectively. We ran each experiment 30 times to gain statistical confidence in the results. We report the average of the 30 runs along with the standard error.

8.3.2 On-Time Delivery Ratio

The main performance metric relevant to a real-time communication protocol is the on-time delivery ratio. This metric represents the ratio of packets that are received before their deadlines to the total number of packets received by the sink. It measures the effectiveness of the prioritization in helping packets meet their deadlines.

SVM

DD-RAP performed significantly better than standard directed diffusion for every tested source data rate. The performance difference was small at the lowest and highest data rates but very appreciable between these extremes. At low data generation rates there is little congestion, so neither flow has trouble meeting its deadline. At the highest data rate, congestion delays are so great that the packets miss their deadlines despite prioritization. In the middle region, congestion exists but does not dominate.
As Figures 8.13 and 8.14 illustrate, DD-RAP (SVM) delivers significantly more packets on time in this middle region (10–20 KB/s) than diffusion. Figure 8.13 shows the on-time delivery ratio for Flow 1, the high priority flow, and Figure 8.14 shows the on-time delivery ratio for Flow 2, which has a lower priority. The error bars in the plots show the standard error over the 30 independent runs.

Figure 8.15 shows the percent improvement of SVM over directed diffusion with respect to on-time delivery ratio. Notice that the improvement for Flow 1 decreased as the data rate increased, while that of Flow 2 remained near 60%. This occurred because Flow 1 had a smaller deadline, and thus DD-RAP had little flexibility in delivering these packets. As the congestion delay increased, the possibility of delivering low deadline packets decreased despite the efforts of DD-RAP.

(Figure 8.13: On-time delivery ratios of Flow 1 (high priority) versus data rate (r) for SVM and DVM)

(Figure 8.14: On-time delivery ratios of Flow 2 (low priority) versus data rate (r) for SVM and DVM)

DVM

Like SVM, DVM consistently outperformed directed diffusion at every data rate (Figures 8.13 and 8.14). The performance difference also followed a similar trend: DVM performed best relative to diffusion in the middle region (10–20 KB/s). Notice that DVM was able to deliver 80–95% of the packets for both flows over this range of data rates. This occurs for the same reasons as for SVM: congestion delays exist but are not impossible to overcome at these data rates.

Figure 8.16 shows the relative improvement of DVM over standard diffusion. DVM consistently performed 70–100% better than diffusion. This means that DVM delivered almost twice as many packets on time.

Not surprisingly, DVM returned the best performance overall. Although their performance was similar at the lowest two data rates, DVM significantly outperformed SVM at the faster rates.
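DVM's advantage stems from the per-hop priority update described earlier (intermediate nodes subtract elapsed time from the end-to-end delay and deadline, Equation 5.3). As a rough sketch of that idea, not Equation 5.3 verbatim, an intermediate node can charge the actual elapsed time against the packet's remaining deadline while crediting only the budgeted per-hop delay against its remaining expected delay; a packet delayed longer than budgeted loses slack and becomes more urgent. All names and the slack formulation here are assumptions of this illustration.

```cpp
// Illustrative per-hop state for a real-time packet (hypothetical
// fields, not the DD-RAP header).
struct RtPacket {
    double deadline;       // remaining deadline (ms)
    double expectedDelay;  // remaining expected end-to-end delay (ms)
};

// Update at an intermediate node after the packet spent `actualElapsed`
// ms since the previous hop, where `budgetedHopDelay` ms was expected
// for that hop. Returns the new priority as remaining slack:
// a smaller slack means a more urgent packet.
double updatePriority(RtPacket &p, double actualElapsed,
                      double budgetedHopDelay) {
    p.deadline -= actualElapsed;         // real time is always charged
    p.expectedDelay -= budgetedHopDelay; // expected progress per hop
    return p.deadline - p.expectedDelay; // remaining slack
}
```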
The reason for this behavior is that DVM dynamically updates the priority at each hop, thus adaptively calculating priority. DVM gives packets that have experienced longer delays along the way a chance to change their priority and compensate for lost time. Conversely, packets ahead of schedule yield to more urgent packets. In SVM, priorities are set at the source and never modified. As a result, all Flow 1 packets receive the same preferential treatment by intermediate nodes. DVM gives higher preference to packets that have been delayed more: packets in Flow 1 will be given different priorities depending on their elapsed time. This dynamic adaptation allows DVM to deliver more packets on time.

(Figure 8.15: Percent improvement of on-time delivery ratio for SVM)

(Figure 8.16: Percent improvement of on-time delivery ratio for DVM)

Chapter 9 Conclusions and Future Work

We have proposed, implemented, and evaluated three network services for directed diffusion. These network services improve and augment the standard directed diffusion protocol. They improve flooding efficiency, provide localized route repair, and support real-time packet delivery. The primary contributions of our work are outlined below.

• Implementation of passive clustering for directed diffusion in the ISI filter API
• Design and implementation of a route repair mechanism for directed diffusion which supports localized and reactive route repair
• Extension of RAP to directed diffusion and the elimination of its location-knowledge requirement

These three network services support the overall goals of sensor networks. Flooding efficiency and localized repair promote energy conservation by reducing unneeded transmissions. Our results demonstrated that passive clustering can decrease the number of flooded packets by a factor of two or more. The reactive route repair mechanism improves the robustness of the network by efficiently adapting to node failure.
The repair protocol delivers a significantly greater number of packets than diffusion. Both passive clustering and DD-Repair improve the scalability of the sensor network since they reduce the cost and scope of flooding. The real-time protocol, DD-RAP, gives application developers greater control over time-critical communication, easing application development. All three network services shield high level applications from the complexities of the underlying sensor network.

There are many possibilities for future research in the area of network services for diffusion. One of the most interesting directions is to consider the possible interactions among the services. For example, the repair protocol could use the clusters created by passive clustering to limit the flooding of repair packets. In this way, repair packets would be restricted to the clusters adjacent to the failed node. Another possible interaction is between DD-Repair and DD-RAP. If DD-RAP is extended to include a congestion detection algorithm, DD-Repair could be used to find routes around congested regions. Repair would be performed proactively in order to improve the throughput of slow links with the same mechanism used to repair node failure. Finally, DD-RAP and passive clustering could interact to compose better clusters. If congestion data is maintained at each node by DD-RAP, then it could be used to influence the creation of passive clusters. A node prone to congestion might refrain from becoming a cluster head in order to reduce its workload. Clearly, the interaction among the services could be very beneficial for the network.

We also hope to investigate other efficient flooding protocols besides passive clustering. The probabilistic and counter-based schemes seem particularly well-suited to the localized nature of diffusion since they do not assume 2-hop neighbor information. If location is known, the location-based efficient flooding approach may be useful.
The flooding with self-pruning algorithm would also be easy to implement over diffusion.

A very interesting future research area is the application of DD-Repair to mobile networks. Directed diffusion has traditionally been limited to static networks. The periodic repair of standard diffusion cannot adapt routes broken by mobility. The reactive repair provided by DD-Repair may overcome this issue by quickly repairing paths broken by node mobility.

Another possible improvement is to eliminate the global time synchronization requirement of DD-RAP. This might be accomplished by using round-trip times to estimate distance instead of one-way time delays. Since diffusion utilizes a two-way route setup process, this could easily be implemented. More advanced time synchronization protocols could also be incorporated into DD-RAP to provide more accuracy. We also plan to evaluate the SVM-t and DVM-t protocols in the future.

Our three network services provide significant enhancements to directed diffusion in several respects. We have increased the flooding efficiency of diffusion by augmenting it with passive clustering. The local repair algorithm improves the robustness of diffusion in the face of node failure without excessive flooding. We have enhanced diffusion by adding a distance- and deadline-aware real-time communication protocol. By leveraging the strengths of diffusion and minimizing its weaknesses, the network services significantly improve the utility of the routing protocol.

Bibliography

[1] Mark Weiser. The computer for the twenty-first century. Scientific American, pages 94–104, September 1991.

[2] David Tennenhouse. Proactive computing. Commun. ACM, 43(5):43–50, 2000.

[3] B. Warneke, M. Last, B. Liebowitz, and K. S. J. Pister. Smart dust: communicating with a cubic-millimeter computer. Computer, 34(1):44–51, January 2001.

[4] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David E. Culler, and Kristofer S. J. Pister.
System architecture directions for networked sensors. In Architectural Support for Programming Languages and Operating Systems, pages 93–104, 2000.

[5] K. Mani Chandy. Sense and respond systems. In 31st Annual International Conference of the Association of System Performance Professionals, December 2005.

[6] Chalermek Intanagonwiwat, Ramesh Govindan, and Deborah Estrin. Directed diffusion: a scalable and robust communication paradigm for sensor networks. In Mobile Computing and Networking, pages 56–67, 2000.

[7] John S. Heidemann, Fabio Silva, Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, and Deepak Ganesan. Building efficient wireless sensor networks with low-level naming. In Symposium on Operating Systems Principles, pages 146–159, 2001.

[8] J. Postel. Internet Protocol. RFC 791 (Standard), September 1981. Updated by RFC 1349.

[9] David B. Johnson and David A. Maltz. Dynamic source routing in ad hoc wireless networks. In Imielinski and Korth, editors, Mobile Computing, volume 353. Kluwer Academic Publishers, 1996.

[10] C. Perkins. Ad hoc on demand distance vector (AODV) routing, 1997.

[11] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva. Directed diffusion for wireless sensor networking. IEEE/ACM Transactions on Networking, 11(1):2–16, Feb. 2003.

[12] F. Silva, J. Heidemann, and R. Govindan. Network routing application programmer's interface, 2002.

[13] Alvin Lim. Distributed services for information dissemination in self-organized sensor networks. Journal of the Franklin Institute, 338:707–727, 2001.

[14] Xuan Yu. Implementation of lookup service protocols for self-organizing sensor networks. Master's thesis, Auburn University, 2001.

[15] Qiao Shen. Distributed composition service for self-organizing sensor networks. Master's thesis, Auburn University, 2002.

[16] Udadyan Naik. Implementation of distributed composition service for self-organizing sensor networks. Master's thesis, Auburn University, 2005.

[17] Ye Wang.
A dynamic adaptation service framework in self-organizing sensor networks. Master's thesis, Auburn University, 2002.

[18] Mark Ivester. Interactive and extensible runtime framework for execution and monitoring of sensor network services. Master's thesis, Auburn University, 2005.

[19] Taek Jin Kwon, M. Gerla, V. K. Varma, M. Barton, and T. R. Hsing. Efficient flooding with passive clustering: an overhead-free selective forward mechanism for ad hoc/sensor networks. Proceedings of the IEEE, 91(8):1210–1220, Aug. 2003.

[20] Yungjung Yi, M. Gerla, and Taek Jin Kwon. Efficient flooding in ad hoc networks: a comparative performance study. In IEEE International Conference on Communications (ICC '03), volume 2, pages 1059–1063, 11-15 May 2003.

[21] Brad Williams and Tracy Camp. Comparison of broadcasting techniques for mobile ad hoc networks. In MobiHoc '02: Proceedings of the 3rd ACM international symposium on Mobile ad hoc networking & computing, pages 194–205, New York, NY, USA, 2002. ACM Press.

[22] Sze-Yao Ni, Yu-Chee Tseng, Yuh-Shyan Chen, and Jang-Ping Sheu. The broadcast storm problem in a mobile ad hoc network. In MobiCom '99: Proceedings of the 5th annual ACM/IEEE international conference on Mobile computing and networking, pages 151–162, New York, NY, USA, 1999. ACM Press.

[23] D. Aron and Sandeep K. S. Gupta. Analytical comparison of local and end-to-end error recovery in reactive routing protocols for mobile ad hoc networks. In MSWIM '00: Proceedings of the 3rd ACM international workshop on Modeling, analysis and simulation of wireless and mobile systems, pages 69–76, New York, NY, USA, 2000. ACM Press.

[24] Nada Hashmi, Dan Myung, Mark Gaynor, and Steve Moulton. A sensor-based, web service-enabled, emergency medical response system. In EESR '05: Proceedings of the 2005 workshop on End-to-end, sense-and-respond systems, applications and services, pages 25–29, Berkeley, CA, USA, 2005. USENIX Association.

[25] Mitchell A.
Cohen, Jakka Sairamesh, and Mao Chen. Reducing business surprises through proactive, real-time sensing and alert management. In EESR '05: Proceedings of the 2005 workshop on End-to-end, sense-and-respond systems, applications and services, pages 43–48, Berkeley, CA, USA, 2005. USENIX Association.

[26] S. Kapoor, K. Bhattacharya, S. Buckley, P. Chowdhary, M. Ettl, K. Katircioglu, E. Mauch, and L. Phillips. A technical framework for sense-and-respond business management. IBM Systems Journal, 44(1):5–24, 2005.

[27] S. Haeckel. Adaptive Enterprise: Creating and Leading Sense-and-Respond Organizations. Harvard Business School Press, Cambridge, MA, 1999.

[28] Michael Zink, David Westbrook, Sherief Abdallah, Bryan Horling, Vijay Lakamraju, Eric Lyons, Victoria Manfredi, Jim Kurose, and Kurt Hondl. Meteorological command and control: an end-to-end architecture for a hazardous weather detection sensor network. In EESR '05: Proceedings of the 2005 workshop on End-to-end, sense-and-respond systems, applications and services, pages 37–42, Berkeley, CA, USA, 2005. USENIX Association.

[29] C. Meinig, S. E. Stalin, A. I. Nakamura, F. Gonzalez, and H. G. Milburn. Technology developments in real-time tsunami measuring, monitoring and forecasting. In Oceans 2005 MTS/IEEE, Washington, D.C., September 2005.

[30] C. Meinig, S. E. Stalin, A. I. Nakamura, and H. B. Milburn. Real-time deep-ocean tsunami measuring, monitoring, and reporting system: the NOAA DART II description and disclosure. Technical report, NOAA, 2005.

[31] F. I. Gonzalez, E. N. Bernard, C. Meinig, M. Eble, H. O. Mofjeld, and S. Stalin. The NTHMP tsunameter network. Natural Hazards, 35(1):25–39, 2005.

[32] Donna Casey. The Thames Barrier: flood defence for London. Website, http://www.environment-agency.gov.uk/regions/thames/323150/335688/341764/, 2006.

[33] Nova: Sinking city of Venice. Website, http://www.pbs.org/wgbh/nova/venice/gates.htm, October 2002.

[34] Josh McHugh. The lost city of Venice. Wired, 11(8), August 2003.

[35] Hoang H.
Tran, Paolo Sammarco, and Chiang C. Mei. Subharmonic resonance of Venice gates in waves. Journal of Fluid Mechanics, 349, 1997.

[36] Lee E. Harris. Combined recreational amenities and coastal erosion protection using submerged breakwaters for shoreline stabilization. Technical report, Florida Institute of Technology, September 2005.

[37] Lee Harris. Breakwater wave attenuation.

[38] Yu-Chee Tseng, Sze-Yao Ni, and En-Yu Shih. Adaptive approaches to relieving broadcast storms in a wireless multihop mobile ad hoc network. IEEE Transactions on Computers, 52(5):545–557, 2003.

[39] L. Orecchia, A. Panconesi, C. Petrioli, and A. Vitaletti. Localized techniques for broadcasting in wireless sensor networks. In DIALM-POMC '04: Proceedings of the 2004 joint workshop on Foundations of mobile computing, pages 41–51, New York, NY, USA, 2004. ACM Press.

[40] Ya Xu, John S. Heidemann, and Deborah Estrin. Geography-informed energy conservation for ad hoc routing. In Mobile Computing and Networking, pages 70–84, 2001.

[41] Hyojun Lim and Chongkwon Kim. Multicast tree construction and flooding in wireless ad hoc networks. In MSWIM '00: Proceedings of the 3rd ACM international workshop on Modeling, analysis and simulation of wireless and mobile systems, pages 61–68, New York, NY, USA, 2000. ACM Press.

[42] Wei Peng and Xi-Cheng Lu. On the reduction of broadcast redundancy in mobile ad hoc networks. In MobiHoc '00: Proceedings of the 1st ACM international symposium on Mobile ad hoc networking & computing, pages 129–130, Piscataway, NJ, USA, 2000. IEEE Press.

[43] Amir Qayyum, Laurent Viennot, and Anis Laouiti. Multipoint relaying: An efficient technique for flooding in mobile wireless networks. Research Report RR-3898, INRIA, February 2000.

[44] Benjie Chen, Kyle Jamieson, Hari Balakrishnan, and Robert Morris. Span: An energy-efficient coordination algorithm for topology maintenance in ad hoc wireless networks.
In Mobile Computing and Networking, pages 85–96, 2001.

[45] Scalable link-state internet routing. In ICNP '98: Proceedings of the Sixth International Conference on Network Protocols, page 52, Washington, DC, USA, 1998. IEEE Computer Society.

[46] R. G. Ogier et al. Topology dissemination based on reverse-path forwarding (TBRPF). http://www.sri.com/esd/projects/tbrpf/docs/draft-ietf-manet-tbrpf-11.txt. Online.

[47] A. Ephremides, J. E. Wieselthier, and D. J. Baker. A design concept for reliable mobile radio networks with frequency hopping signaling. Proceedings of the IEEE, 75(1):56–73, 1987.

[48] Chunhung Richard Lin and Mario Gerla. Adaptive clustering for mobile wireless networks. IEEE Journal on Selected Areas in Communications, 15(7):1265–1275, 1997.

[49] K. Mase, Y. Wada, N. Mori, K. Nakano, M. Sengoku, and S. Shinoda. Flooding schemes for a universal ad hoc network. In Industrial Electronics Society, 2000. IECON 2000. 26th Annual Conference of the IEEE, volume 2, pages 1129–1134, Nagoya, Japan, 2000.

[50] M. Gerla, T. Kwon, and G. Pei. On demand routing in large ad hoc wireless networks with passive clustering, 2000.

[51] Jorjeta Jetcheva and David B. Johnson. Adaptive demand-driven multicast routing in multi-hop wireless ad hoc networks. In Proceedings of the Second Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc 2001), pages 33–44. ACM, Oct 2001.

[52] D. Tian and N. D. Georganas. Energy efficient routing with guaranteed delivery in wireless sensor networks. IEEE Wireless Communications and Networking, 3:1923–1929, 2003.

[53] C. Gui and P. Mohapatra. A self-healing and optimizing routing technique for ad hoc networks, 2003.

[54] George Aggelou and Rahim Tafazolli. RDMAR: A bandwidth-efficient routing protocol for mobile ad hoc networks. In WOWMOM, pages 26–33, 1999.

[55] C. Toh. Associativity-based routing for ad-hoc mobile networks.

[56] E. Royer and C. Toh.
A review of current routing protocols for ad-hoc mobile wireless networks, 1999.

[57] Vincent D. Park and M. Scott Corson. A highly adaptive distributed routing algorithm for mobile wireless networks. In INFOCOM '97: Proceedings of the Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Driving the Information Revolution, page 1405, Washington, DC, USA, 1997. IEEE Computer Society.

[58] M. Pan, Sheng-Yan Chuang, and Sheng-De Wang. Local repair mechanisms for on-demand routing in mobile ad hoc networks. In Dependable Computing, 2005. Proceedings. 11th Pacific Rim International Symposium on, 2005.

[59] Genping Liu, Kai-Juan Wong, Bu-Sung Lee, Boon-Chong Seet, Chuan-Heng Foh, and Lijuan Zhu. PATCH: A novel local recovery mechanism for mobile ad-hoc networks.

[60] C. Lu, B. Blum, T. Abdelzaher, J. Stankovic, and T. He. RAP: A real-time communication architecture for large-scale wireless sensor networks, 2002.

[61] Tian He, J. A. Stankovic, Chenyang Lu, and T. Abdelzaher. SPEED: a stateless protocol for real-time communication in sensor networks. In Distributed Computing Systems, 2003. Proceedings. 23rd International Conference on, pages 46–55, 19-22 May 2003.

[62] Y. Chen, A. Liestman, and J. Liu. Clustering algorithms for ad hoc wireless networks, 2004.

[63] Vlado Handziski, Andreas Koepke, Holger Karl, Christian Frank, and Witold Drytkiewicz. Improving the energy efficiency of directed diffusion using passive clustering. In EWSN 2004, volume 2920 of LNCS, pages 172–187, 2004.