Adaptive TCP Flow Control for Improving Performance of Mobile Cloud Clusters

by

Seungbae Lee

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 4, 2014

Keywords: mobile device, mobile ad hoc cloud, Hadoop mobile cluster, distributed computing performance, analytic dataflow, TCP flow control

Copyright 2014 by Seungbae Lee

Approved by

Alvin Lim, Chair, Associate Professor of Computer Science and Software Engineering
David Umphress, Associate Professor of Computer Science and Software Engineering
Sanjeev Baskiyar, Associate Professor of Computer Science and Software Engineering
Xiao Qin, Associate Professor of Computer Science and Software Engineering

Abstract

Significant innovations in mobile technologies are enabling mobile users to make real-time actionable decisions based on balancing opportunities and risks in order to take coordinated actions with others in the workplace. This requires a new distributed analytic framework that collects relevant information from internal and external sources, performs real-time distributed analytics, and delivers a critical analysis to any user at any place in a given time frame through the use of mobile devices such as smartphones and tablets. This work discusses the benefits and challenges of building mobile cloud clusters from recent mobile devices for distributed analytic applications by showing their feasibility with the MapReduce framework, and also investigates performance issues of Hadoop mobile clusters by conducting extensive experiments using typical Hadoop benchmarks; the newest release of the Hadoop software framework with its enhancements is ported to the latest Android-based mobile devices through a mobile virtualization technique.
In addition, this work develops a MapReduce simulator based on the ns-2 network simulator to comprehensively examine the performance and efficiency of mobile cloud clusters in extensive operating environments, which makes it possible to identify critical performance issues under different cluster (or workload) scales, dynamic node mobility, and various wireless channel conditions.

From the performance analysis, this work identifies TCP (Transmission Control Protocol) communication problems resulting from the distinct traffic patterns of the MapReduce-based Hadoop distributed framework and proposes adaptive TCP flow control algorithms for improving the performance of mobile cloud clusters. The overall computing power of the mobile cluster is no longer significantly bounded by the processing capabilities of each individual mobile node, as mobile devices have been constantly enhanced; instead, the mobile cluster has limitations on interchanging large amounts of analytical data among mobile devices and on monitoring the real-time status of cluster nodes through timely state updates, which result in significant delays in processing time with corresponding performance degradation. This work proposes an algorithm for cross-layer TCP flow control and dynamic network resource scheduling to avoid frequent overflows of the MAC (Media Access Control) layer transmit queue on the mobile nodes, which interrupt the long-lived analytical data streams required for the partition and aggregation workflow of distributed analytic frameworks. It controls TCP's packet transmission based on the queueing level and implements dynamic resource scheduling for incoming and outgoing frames to minimize the queueing delay and stabilize the queueing level. In the evaluation test, the aggregate throughput of peer-to-peer TCP connections was significantly improved without incurring any throughput collapse.
In order to prevent TCP receive buffer overflows on the controller node due to the use of TCP push packets, which cause many interruptions in the reception of the latest status updates and progress reports from its worker nodes, this work also introduces another algorithm for mitigating the adverse effects of the TCP push flows that cause excessive transmissions from fast congestion window growth and frequent RTO (retransmission timeout) underestimation. It moderates the congestion window outburst and skips the RTO estimation using RTT (round-trip time) updates from the TCP push streams when the buffer overflow is detected. In the Hadoop TeraSort benchmark test, there were marked decreases in the frequency of receive buffer overflows and TCP packet retransmissions, and the overall processing time could be shortened accordingly.

Table of Contents

Abstract
List of Figures
List of Tables
1 Introduction
  1.1 Recent mobile trends in the workplace
    1.1.1 Mobile device capabilities
    1.1.2 Mobility in the workplace
    1.1.3 Actionable analytics
  1.2 Different types of mobile cloud computing
    1.2.1 Remote and local cloud services
    1.2.2 Ad hoc cloud and other concepts
2 Background and related work
  2.1 Overview of Apache Hadoop for cloud clusters
    2.1.1 Apache Hadoop
    2.1.2 MapReduce programming model
    2.1.3 Hadoop distributed filesystem
    2.1.4 MapReduce task execution
  2.2 Overview of TCP flow control for performance analysis
    2.2.1 Sliding window flow control
    2.2.2 Packet loss detection mechanisms
    2.2.3 Congestion window dynamics
  2.3 Related studies on mobile cloud clusters
    2.3.1 Implementation of mobile ad hoc cloud
    2.3.2 Network problems of traditional cloud
    2.3.3 MapReduce performance prediction using simulation
3 Understanding performance issues of Hadoop mobile clusters
  3.1 Hadoop benchmarks for performance evaluation
  3.2 Assumptions on mobile cloud clusters
  3.3 Performance experiments of Hadoop mobile clusters
    3.3.1 Experimental setup
    3.3.2 I/O performance of mobile nodes
    3.3.3 Performance of WordCount workload
    3.3.4 Performance of TeraSort workload
    3.3.5 Performance of scale testing
  3.4 Performance simulations of Hadoop mobile clusters
    3.4.1 MRPerf simulator for MapReduce
    3.4.2 Implementation and validation of MapReduce simulator
    3.4.3 Performance of scale testing
    3.4.4 Performance over different radio propagations
  3.5 Performance issues of Hadoop mobile clusters
4 Problem statements and research questions
  4.1 Limitations on TCP performance over mobile devices
  4.2 Problems of using mobile devices for mobile cloud
  4.3 Research questions
5 Adaptive TCP flow control for mobile clusters
  5.1 Queueing level control for transmit queue overflow
    5.1.1 Analysis of MAC-layer transmit queue overflow
    5.1.2 Transmit queueing level control algorithm
  5.2 TCP push flow control for receive buffer overflow
    5.2.1 Analysis of TCP receive buffer overflow
    5.2.2 TCP push flow control algorithm
6 Evaluation of proposed solutions
  6.1 Transmit queueing level control algorithm
    6.1.1 Performance improvement of peer-to-peer data transfer
    6.1.2 Performance improvement in data aggregation
  6.2 TCP push flow control algorithm
    6.2.1 Packet analysis of Hadoop master running TeraSort
    6.2.2 Performance improvement of TeraSort workload
7 Conclusion
  7.1 Summary
  7.2 Discussion
Bibliography

List of Figures

1.1 CPU performance improvement of Apple and Samsung's smartphones
1.2 Different types of mobile cloud computing
2.1 Data flows in MapReduce task execution
2.2 Sliding window based TCP data transfer process
2.3 Upper bound of TCP congestion window growth
3.1 Experimental mobile cluster using Google NEXUS 7
3.2 Mobile virtualization for MapReduce implementation
3.3 Network and filesystem throughput of mobile nodes
3.4 Network utilization of Hadoop mobile cluster with WordCount workload
3.5 Resource utilization of MapReduce nodes running WordCount tasks
3.6 Network utilization of Hadoop mobile cluster with TeraSort workload
3.7 Resource utilization of MapReduce nodes running TeraSort tasks
3.8 Cluster size scaling of WordCount and TeraSort experiments
3.9 Data block size scaling of TeraSort experiments
3.10 Input size scaling of WordCount experiments
3.11 Architecture of MRPerf simulator
3.12 Screenshot of MapReduce simulation
3.13 Validation of MapReduce simulation with TeraSort workload
3.14 Cluster size scaling of TeraSort simulations
3.15 Input size scaling of TeraSort simulations
3.16 Simulation area scaling with node mobility
3.17 MapReduce simulations with different radio propagation models
5.1 TCP packet transmitting process of mobile nodes
5.2 Data flows between master and slave nodes
5.3 Comparison of Hadoop data and heartbeat flow
5.4 TCP packet receiving process of master node
6.1 Evaluation setup for transmit queueing level control
6.2 TCP performance of mobile nodes with different MAC queue sizes
6.3 TCP performance with modified network stack
6.4 TCP throughput of original and modified network stack
6.5 Resource utilization of master node running TeraSort workload
6.6 Traffic patterns of cluster nodes running TeraSort workload
6.7 TCP packet analysis of master node
6.8 Zero Window advertisements of original and modified TCP
6.9 Performance improvement of experimental mobile cluster

List of Tables

1.1 Internal hardware sensors of Samsung Galaxy S4 smartphone
3.1 Hadoop benchmarks and their characteristics
3.2 Hardware and software specifications of experimental nodes
3.3 Comparison of Java and Dalvik bytecode
3.4 Hadoop configurations of the experimental mobile cluster
3.5 Simulation configurations of the Hadoop mobile cluster
5.1 Parameters for transmit queue overflow analysis
5.2 Variables for transmit queueing level control algorithm
5.3 Parameters for receive buffer overflow analysis
5.4 Variables for TCP push flow control algorithm

Chapter 1
Introduction

Many IT industry analysts predict that mobile growth is one of the most significant trends for the forthcoming years. In recent years, mobile technologies have matured and become suitable for a wider range of uses, which creates new opportunities and demands for emerging platforms and services including mobile computing, hybrid cloud, and actionable analytics.

According to International Data Corporation (IDC) estimates, the recent surge in demand for mobile systems will continue to lead market growth of smartphones and tablet computers, surpassing PC sales [1]. Gartner, Inc. also forecasts sales of 1.9 billion mobile phones in 2014, a 5 percent increase from 2013. In 2014, the worldwide tablet market is forecast to grow 47 percent, with lower average selling prices attracting new users, while worldwide shipments of traditional PCs are forecast to decline 7 percent from 2013 [2].

The rising demand for mobile devices and mobility-related services has led smartphones and tablets to become far more powerful. Mobile devices with a quad-core 1.9 GHz processor and 2 GB of memory are already widely available. Even octa-core processors and 3 GB memory modules for mobile devices are planned for release at the time of writing this dissertation. Along with enhancements in battery capacity and network capability, mobile devices are now capable of sharing their resources for distributed processing of critical data analytics as resource providers of cloud computing.

1.1 Recent mobile trends in the workplace

With the increasing popularity of mobile devices, a growing number of organizations are adopting Bring Your Own Devices (BYOD) programs [3].
With BYOD, workers bring their own mobile devices to their workplace and use those devices to access privileged information and run applications of their organization. This provides a great opportunity for improving productivity by accelerating the speed of decision-making and problem-solving and by allowing more flexibility.

[Figure 1.1: CPU performance improvement of Apple's and Samsung's smartphones: CPU benchmark rating by release date for the Apple iPhone series (3G through 5S, 2007-2014) and the Samsung Galaxy S series (S through S4, 2010-2014)]

1.1.1 Mobile device capabilities

The strong demand for diverse mobile devices and applications has led smartphones and tablets to offer the latest advanced features. Mobile platforms are leveraging multi-core processor architecture to dramatically increase processing power at a low cost. Manufacturers are introducing high-speed memory and increasing storage capacity for mobile devices. Moreover, advances in battery capacity and power saving techniques enable mobile devices to support large complex computations and long-running processes and to provide more reliable high-speed wireless connectivity with more optional features, including 4G LTE, Wi-Fi, Bluetooth, and Near Field Communication (NFC).
Figure 1.1 shows the CPU performance improvement (CPU benchmark results from PassMark [4]) of two flagship smartphone lines from Apple and Samsung Electronics, the iPhone and Galaxy S series, which run the mobile operating systems Apple iOS and Google Android, respectively.

Table 1.1: Internal hardware sensors of Samsung Galaxy S4 smartphone

Sensor               Feature
Accelerometer        Detects the mobile phone movement state based on three axes
Barometer            Identifies the atmospheric pressure at the user's current location
Geomagnetic sensor   Detects magnetic field intensity based on three axes
Gesture sensor       Recognizes the user's hand movements using infrared rays
Gyro sensor          Detects the mobile phone rotation state based on three axes
Hall sensor          Recognizes whether the cover is open or closed
Proximity sensor     Recognizes whether the mobile phone is located near the user by using infrared rays
RGB light sensor     Measures the red, green, blue, and white intensity of the light source
Temperature sensor   Checks the temperature level
Humidity sensor      Detects the humidity level

Furthermore, recent mobile devices provide innovative visual interaction through the use of advanced high-resolution touchscreens. They also integrate a variety of constantly improving sensors, including microphones, image sensors, 3-axis accelerometers, gyroscopes, atmospheric pressure sensors, digital compasses, optical proximity sensors, ambient light sensors, humidity sensors, and touch sensors, as detailed in Table 1.1. These cutting-edge touchscreens and sensors enable mobile devices to monitor the operating environment in real time and adapt to the situation accordingly.

1.1.2 Mobility in the workplace

Today, people are connected in more ways than ever before. By using mobile devices, they no longer need to sit in front of desktop computers at the office or at home in order to search for information and communicate with other people. People are creating new connections anywhere, anytime, and on any device.
Mobility is having a huge impact on the way people work by saving time and resources and by opening new opportunities for innovation. A recent IDC report shows the world's mobile worker population will reach 1.3 billion, representing 37.2% of the total workforce, by 2015 [5]. The number of mobile workers in the U.S. will grow from 182.5 million in 2010 to more than 212.1 million by 2015. The increasingly mobile and remote workforce is driving organizations to support a wide range of mobile applications and services, which enables workers to proactively detect and collect more information from internal and external sources by using mobile devices and to perform real-time analytics for rapid decisions, thus improving collaboration and productivity. A Gartner report shows that more than 40 percent of all enterprise web applications will support mobile environments by 2014 and that 90 percent of companies will support corporate applications on mobile devices by 2014 [6].

1.1.3 Actionable analytics

Conventional explanatory analytics usually focused on what happened in the past. Such analytics may be an outdated and ineffective approach to offering the timely, accurate, and actionable insights needed for distributed decision making and coordinated action planning today. What is happening now? What is going to happen in the future? The ability to answer these questions in real time or near real time can provide a competitive advantage. Recent advances in mobile technologies enable mobile users to collaborate with their team members through coordinated actions by balancing opportunities and risks. These actions can be generated by ad hoc distributed analytics that may consist of simulation, prediction, and/or optimization. This capability leads to a great opportunity for reducing cost while improving outcomes through more flexible decision-making that can be optimized for a specific scenario at a certain time and place [6].
1.2 Different types of mobile cloud computing

The increasing number of mobile applications requires more complex processing and more operational data. These applications may include real-time mobile analytics that enhances situational awareness, risk assessment, distributed decision making, coordinated action planning, team collaboration, and instant responsiveness.

[Figure 1.2: Different types of mobile cloud computing: (a) remote cloud, (b) local cloud, (c) ad hoc cloud, (d) personal cloud]

Despite the increasing use of mobile devices, exploiting their full potential is difficult due to inherent problems such as resource limitations (e.g., low computational capability and battery capacity) and frequent disconnections caused by mobility. Mobile cloud computing can solve these problems cost-effectively by utilizing computing and storage capabilities from remote resource providers or other mobile devices. There are several approaches to mobile cloud computing with different concepts and configurations [7, 8]. Figure 1.2 illustrates the different types of mobile cloud computing.

1.2.1 Remote and local cloud services

Mobile cloud computing commonly means running a mobile application (e.g., Google Maps) on a remote resource-rich server (e.g., Google Cloud Platform) while the mobile device acts like a thin client connecting to the remote server through Wi-Fi or 3G/4G wireless networks. Mobile users can access cloud systems using a web browser or mobile application regardless of their location or device type. Other examples include mobile commerce (m-commerce), mobile learning (m-learning), mobile healthcare (m-health), etc. Although mobile cloud applications that connect to a remote infrastructure are becoming popular, they can perform well only under reliable connectivity.
It is not practical to assume high-speed connections, seamless handovers, and fast responses on mobile devices, as mobile environments are subject to a high probability of significant disruptions to network services due to mobility, where fixed infrastructures are frequently unavailable and network partitions are common.

The Cloudlet proposed in [9] is another approach to mobile cloud computing. Mobile users usually run client-server software hosted at a remote location, and high network latency (or low network bandwidth) makes this insufficient for real-time applications. To cope with this issue, the authors introduced the concept of a local cloud: trusted, resource-rich computers in the near vicinity of the mobile user (e.g., near or co-located with wireless access points). Mobile users can rapidly instantiate custom virtual machines (VMs) on the cloudlet running the required software in a thin-client fashion. The cloudlet can be deployed in common areas, such as public offices, airports, shopping malls, etc.

1.2.2 Ad hoc cloud and other concepts

Another approach is to consider other mobile devices as resource providers of cloud computing by forming a mobile ad hoc network, as in [10]. Reliable access to remote resources is the first challenge in mobile environments since the resources are commonly distributed across a variety of remote sources. Thus, clustering with nearby mobile devices promises faster connectivity and better availability. This work primarily focuses on this type of mobile cloud computing, where the remote resources are mobile and available only within the range of wireless transmission. The collective resources of the various mobile devices in a local area, and other stationary devices if available, are utilized for cloud computing in the mobile ad hoc cloud. As a result, it can effectively support user mobility, collective sensing, and distributed applications.
On the other hand, a mobile device that initiates cloud computing in the ad hoc cloud needs to dynamically take advantage of mobile cloud resources depending on the requirements of its workload, because the ad hoc resources and operating environments of the mobile cloud are subject to change. Hence, monitoring and scheduling of available cloud resources is one of the most critical capabilities for the success of the mobile ad hoc cloud.

In addition to the approaches above, there are other concepts and models for implementing cloud services using mobile devices. For example, personal cloud and private cloud are operated solely for a single person and a single organization, respectively; public cloud is open for public use; and community cloud shares resources between several organizations from a specific community with common concerns [11]. Although there may be little or no difference between those cloud architectures, they have substantially different security considerations for resources and applications. Furthermore, Gartner identified hybrid cloud as one of the top strategic trends for 2014; it is composed of some combination of personal, private, public, and community cloud services, offering the benefits of multiple deployment models [12].

The rest of this dissertation is organized as follows. Chapter 2 introduces the background and work related to this study. Chapter 3 then describes the details of our experimental setup for mobile cloud clusters and conducts a performance analysis to identify performance problems of mobile MapReduce applications. Chapter 4 presents the problem statements and research questions of this study. Then Chapter 5 proposes adaptive TCP flow control algorithms for mobile clusters with an analysis of TCP performance problems, and Chapter 6 provides evaluation results for the proposed solutions. Finally, Chapter 7 concludes this dissertation with a summary of the previous chapters and a discussion of the main contributions and further research ideas.
Chapter 2
Background and related work

This chapter provides background information about the distributed analytic framework for mobile cloud clusters and its communication protocol, and also presents summaries of previous studies related to this work.

2.1 Overview of Apache Hadoop for cloud clusters

When reviewing multiple distributed analytic frameworks, this work found that Apache Hadoop [13] can provide a good starting point for implementing mobile cloud clusters since it supports cost-effective and high-performance analysis of large volumes of unstructured data on a set of commodity hardware. This section summarizes the main ideas of the MapReduce programming model and describes the working mechanisms of the Hadoop distributed filesystem and MapReduce task execution in the Hadoop framework.

2.1.1 Apache Hadoop

Apache Hadoop is an open-source framework that uses a simple distributed processing model based on Google MapReduce [14] and the Google file system (GFS) [15]. It effectively handles massive amounts of information by either transforming it into a more useful structure and/or format, or extracting valuable analytics from it. Hadoop runs on any machines equipped with lower-cost processors and storage, and automatically recovers from hardware and software failures by providing fault tolerance mechanisms. Hence, Hadoop is more cost-effective for handling large unstructured data sets than conventional data mining approaches, offering great scalability and high availability.

2.1.2 MapReduce programming model

MapReduce is a fundamental programming model in the Hadoop architecture for processing large volumes of data. MapReduce applications with a parallel approach utilize a scale-out architecture that makes use of inexpensive commodity servers configured as a cluster, which allows users to analyze terabytes or petabytes of data in a fast and reliable way.
To take advantage of the parallel processing that MapReduce provides, users need to submit a job with the desired computations for a series of MapReduce tasks. MapReduce operates by dividing the processing into two phases: the Map phase and the Reduce phase. Each phase has key-value pairs as input and output, and the data formats can be specified according to the application. The user implements two functions: the Map function and the Reduce function. The Map function reads each input key-value pair and produces a list of intermediate key-value pairs. The Reduce function takes in all intermediate key-value pairs, grouped by identical key, and generates a final set of key-value pairs. Both the input and output key-value pairs in the Map and Reduce phases are stored in a distributed filesystem.

The Hadoop platform that runs MapReduce applications automatically parallelizes the execution, coordinates network communication, and ensures fault tolerance, which spares users from concerns about parallelization and hardware failures in a large-scale distributed environment. In addition, the Hadoop framework tries to assign Map and Reduce tasks to cluster nodes where the data to be processed is stored, alleviating loads on network bandwidth and preventing unnecessary network transfers, which is critical for high performance in data-intensive computing.

2.1.3 Hadoop distributed filesystem

Hadoop implements a distributed, scalable, and portable filesystem, called the Hadoop distributed filesystem (HDFS), designed for storing large files with streaming access to data and optimized for data-intensive workloads such as MapReduce, running across clusters of commodity hardware. The data in the Hadoop cluster is broken down into small, fixed-size pieces, called blocks; the default size of a block is 64 MB. The files in HDFS are split into block-sized chunks, which are stored as independent units.
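The Map and Reduce phases described in Section 2.1.2 can be illustrated with a minimal, single-process word-count sketch. This is only an analogy to Hadoop's behavior, not its Java API: the sort-and-group step here stands in for the shuffle that the framework performs between the two phases, and all names are our own.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    # Map: emit an intermediate (word, 1) pair for each word in the input record.
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce: aggregate all intermediate values that share the same key.
    yield (word, sum(counts))

def run_job(records):
    # Shuffle stand-in: sort intermediate pairs so identical keys are adjacent,
    # then group them per key before handing each group to the Reduce function.
    intermediate = sorted(
        (pair for line in records for pair in map_fn(line)),
        key=itemgetter(0),
    )
    result = {}
    for key, group in groupby(intermediate, key=itemgetter(0)):
        for out_key, total in reduce_fn(key, (count for _, count in group)):
            result[out_key] = total
    return result

print(run_job(["to be or not", "to be"]))  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

In the real framework the intermediate pairs are spread across many nodes, but the contract is the same: every value sharing a key reaches exactly one Reduce invocation.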
HDFS has two types of instance in a master-slave pattern: a single NameNode (the master) and a number of DataNodes (the slaves). The NameNode manages the filesystem namespace, which maintains the filesystem tree and the metadata for all the files and directories in the tree. The NameNode also tracks the DataNodes on which all the blocks for a given file are located. DataNodes are the worker nodes of the filesystem. They store and retrieve blocks at the request of clients or the NameNode through remote procedure calls (RPC), and periodically report their recent status updates to the NameNode along with the metadata of their current blocks.

When an HDFS client wants to read a file, it first contacts the NameNode to obtain the location information of the data blocks comprising the file and then retrieves block contents from the DataNodes closest to the client. When writing data, the client also requests the NameNode to schedule a set of DataNodes to store the block replicas for fault tolerance and then writes data to those DataNodes in a pipeline-like fashion. Since the current architecture has a single NameNode per cluster, and the NameNode keeps all the namespace and block locations in memory, the size of the NameNode's heap limits the number of files as well as the number of addressable blocks. This also limits the total cluster storage that the NameNode can support.

2.1.4 MapReduce task execution

A typical Hadoop cluster consists of a single master node and many slave nodes. An entire MapReduce computation is called a job, and the execution of a Map or Reduce function on a slave is called a task. A server process running on the master node, named JobTracker, coordinates jobs on a cluster of slave nodes. The JobTracker is responsible for accepting jobs from clients, dividing the input data (i.e., a file) into many splits (i.e., blocks), and assigning the blocks to Map tasks that are processed concurrently on multiple slave nodes.
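Since the JobTracker assigns one Map task per input block, the number of Map tasks follows directly from the input size and the block size (the 64 MB default from Section 2.1.3). A small sketch, under the simplifying assumption that every split is exactly one block; the function name is our own:

```python
DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # HDFS default block size: 64 MB

def num_map_tasks(input_size_bytes, block_size_bytes=DEFAULT_BLOCK_SIZE):
    # One Map task per block-sized chunk: ceiling division of the
    # input size by the block size (at least one task for any input).
    return max(1, -(-input_size_bytes // block_size_bytes))

mb = 1024 * 1024
print(num_map_tasks(1000 * mb))  # a 1000 MB file spans 16 blocks -> 16 Map tasks
```

This is why block size is a tuning knob for cluster parallelism: halving it doubles the number of Map tasks for the same input.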
Each slave node runs a client process called TaskTracker that manages the execution of the tasks assigned to the node. Each TaskTracker has a fixed number of slots for executing tasks; the number of slots on each cluster node may differ, depending on the node's hardware capacity. After a MapReduce job is submitted, the number of Map tasks is determined by the total number of data blocks split from the input files. Each Map task reads a block from HDFS, parses it into key-value pairs, and applies the Map function to each pair. Once the intermediate key-value pairs are generated on the local disk by the Map tasks, the intermediate pairs are partitioned using a scheme that assigns pairs with identical keys to the same Reduce task. After that, the JobTracker sends the locations of the pairs to the corresponding Reduce tasks. MapReduce guarantees that the intermediate input to every Reduce task is sorted by key. The process by which the system performs the sort and transfers the Map outputs to the Reduce tasks as inputs is known as the shuffle. After a Reduce task has read all the intermediate pairs, it runs the Reduce function and writes the output pairs to a final output file in HDFS. The data flows in MapReduce execution, consisting of the Map, Shuffle, and Reduce phases, are illustrated in Figure 2.1.

Figure 2.1: Data flows in MapReduce task execution

This section omits some details of configuration parameters, performance metrics, and analytic algorithms in the Hadoop architecture, which can be found in many excellent publications including tutorials, academic papers, and technical reports. Some of them (e.g., [16], [17], [18], and [19]) are referenced in this work.

2.2 Overview of TCP flow control for performance analysis

Most current distributed systems, including the Hadoop analytic framework, extensively employ the Transmission Control Protocol (TCP) for reliable communications between cluster nodes.
Since TCP is optimized for accurate delivery rather than timely delivery, TCP on mobile devices may frequently incur significant delays while waiting for retransmissions of lost or damaged packets and rearrangements of out-of-order packets on wireless communication links, as detailed in [20]. This section presents an overview of TCP flow control mechanisms to provide a background for the performance analysis of mobile cloud clusters.

2.2.1 Sliding window flow control

TCP provides reliable data transfer with its flow control and congestion control mechanisms between applications on two hosts in the network. Since a TCP sender with a large amount of data to transmit may try to send it faster than the TCP receiver can receive and process it reliably, the TCP standard [21] describes a sliding window based flow control mechanism. The TCP sender first buffers all data (in a TCP transmit buffer) before transmission, assigning a sequence number to each buffered byte. Pieces of the buffered data are continuously formed into TCP packets, each of which carries the sequence number of its first data byte. Then a portion (window) of the packets that are ready to send is transmitted to the receiver using the IP protocol. As soon as the sender receives delivery confirmation for at least one transmitted packet, it transmits a new portion of packets; the window slides along the sender's buffer as shown in Figure 2.2a. Since packet transfers over bad network conditions may not be reliable, the receiver is required to respond with an acknowledgment as it receives the data, while the sender keeps a record of each packet it transmits. The sender also maintains a retransmission timer, and retransmits a packet if the timer expires before the data has been acknowledged. Although sliding window based flow control is relatively simple, it has conflicting objectives.
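The sliding-window transfer described above can be sketched as follows. This is an illustrative model with a lossless channel and cumulative acknowledgements, not a real TCP implementation:

```python
# Minimal sketch of sliding-window transfer: the sender may have at most
# `window` unacknowledged packets outstanding; each cumulative ACK slides
# the window forward over the buffered data.
def sliding_window_send(data, window, channel):
    base = 0          # oldest unacknowledged sequence number
    next_seq = 0      # next sequence number to transmit
    acks = []
    while base < len(data):
        # transmit every packet that fits in the current window
        while next_seq < len(data) and next_seq < base + window:
            channel.send(next_seq, data[next_seq])
            next_seq += 1
        # a cumulative ACK advances base past all acknowledged data
        base = channel.recv_ack()
        acks.append(base)
    return acks

class PerfectChannel:
    """Lossless channel that cumulatively acknowledges all sent packets."""
    def __init__(self):
        self.highest = -1
    def send(self, seq, _byte):
        self.highest = max(self.highest, seq)
    def recv_ack(self):
        return self.highest + 1

acks = sliding_window_send(b"abcdefgh", window=3, channel=PerfectChannel())
# with a window of 3 over 8 bytes, the ACKs advance as [3, 6, 8]
```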
To maximize the throughput of a TCP flow, the size of the sliding window should also be maximized. On the other hand, if the sliding window is too large, there is a high probability of packet loss because the network and the receiver have resource limitations; reducing packet losses therefore requires minimizing the sliding window. Hence, finding a value for the sliding window parameter (usually referred to as the congestion window size) that provides good throughput yet does not overwhelm the network bandwidth or the receiver capacity is a critical problem in TCP communications. The TCP flow control mechanism also integrates the receive window concept, which is designed for the receiver to share information about its available receive buffer with the sender. In each TCP segment, the receiver specifies the amount of additionally acceptable data (in bytes) in the receive window field. The sender can send only up to that amount of data before it must wait for an acknowledgment and window update from the receiver. Figure 2.2b illustrates this mechanism. When establishing a connection, the receiver informs the sender about the available buffer size for incoming packets (in the example shown, the receiver's window reported initially is 7). The sender transmits a portion (window) of data packets. This portion must not exceed the receiver's window and may be smaller if the sender is not willing (or ready) to send a larger portion. If the receiver is unable to process data as fast as the sender generates it, the receiver reports decreasing values of the window (3 and 0 in the example). This induces the sender to shrink the sliding window. As a result, the TCP transmission eventually synchronizes with the receiver's processing rate, as demonstrated in Figure 2.3.
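The shrinking-window behavior in the example above (advertised windows of 7, then 3, then 0) can be sketched as follows. The model is illustrative: when a zero window is advertised, it sends a single probe and assumes the receiver then reopens the window.

```python
# Sketch of receive-window-limited sending: the sender never transmits more
# than the advertised window, and on a Zero Window it falls back to a
# persist-timer probe until the receiver reopens the window.
def send_with_rwnd(data, advertised_windows, reopened_window):
    sent, probes = 0, 0
    for rwnd in advertised_windows:
        if rwnd == 0:
            probes += 1            # persist timer fired: send a small probe
            rwnd = reopened_window # probe forces a fresh window advertisement
        burst = min(rwnd, len(data) - sent)
        sent += burst
        if sent == len(data):
            break
    return sent, probes

sent, probes = send_with_rwnd(b"x" * 12, [7, 3, 0], reopened_window=4)
# 7 + 3 = 10 bytes, then Zero Window -> one probe, the window reopens to 4,
# and the remaining 2 bytes are sent.
assert (sent, probes) == (12, 1)
```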
Figure 2.2: Sliding window based TCP data transfer process: (a) sliding window control; (b) receive window control

Figure 2.3: Upper bound of TCP congestion window growth: (a) network limit; (b) receiver limit

When a receiver advertises a window size of 0, called a Zero Window (i.e., the receiver cannot accept any more data at the moment) as shown in Figure 2.2b, the sender stops transmitting data and starts a persist timer. The persist timer protects TCP from the deadlock that could arise if a subsequent window update from the receiver were lost: without it, the sender could wait indefinitely for a new window update and never transmit further data. When the persist timer expires, the TCP sender attempts recovery by transmitting a small packet (called a Zero Window probe) so that the receiver is forced to respond with another acknowledgement containing the receive window update.

2.2.2 Packet loss detection mechanisms

Early and accurate detection of packet loss is one of the core mechanisms in TCP congestion and flow control. Most TCP control proposals gradually increase the utilization of network resources up to the limit where packet loss starts to be detected, at which point they reduce the transmission rate, retransmit the lost packet, and begin another phase of rate control (e.g., rate increases). TCP detects a packet loss through two critical mechanisms: retransmission timeout and duplicate acknowledgement [21]. The first mechanism defines the retransmission timeout (RTO), whereby TCP detects a packet loss when the retransmission timer expires.
Although this mechanism reliably detects all losses, the detection is necessarily time consuming, since the retransmission timer must be set high enough to avoid unnecessary timeouts caused by transient network conditions; the minimum time in which a loss can be detected is the round-trip time (RTT), and the RTO must be greater than the RTT. If the RTO value is overestimated, the TCP packet loss detection mechanism becomes very conservative, and the performance of individual flows may be significantly degraded. In the opposite case, when the RTO value is underestimated, the loss detection mechanism may cause unnecessary retransmissions, wasting the available network resources and aggravating the network congestion level. To counter erroneous RTO estimates, the RTT variance estimation algorithm [22] alleviates the overestimation problem by establishing a fine-grained upper bound for the RTO, and the exponential RTO backoff algorithm [23] mitigates the underestimation problem by doubling the RTO value on each retransmission occurrence. The second mechanism assumes that if TCP receives a few duplicate acknowledgements of a packet then the packet was lost; here TCP counts the number of acknowledgements with the same sequence number to detect a packet loss. When the probability of packet duplication (and packet reordering) in the network is negligible, duplicate acknowledgments can be considered a reliable loss indication. Thus, the sender can retransmit a lost packet faster without waiting for the corresponding retransmission timer to expire.

2.2.3 Congestion window dynamics

The initial TCP standard lacks any means to adjust the transmission rate to the actual capacity of the network.
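The two remedies mentioned above can be sketched together: a smoothed RTT/variance estimator that bounds the RTO, and exponential backoff on each retransmission. The constants below are the commonly standardized ones (alpha = 1/8, beta = 1/4, multiplier K = 4, 1-second lower bound); this is an illustrative model, not a kernel implementation:

```python
# Sketch of RTO estimation with RTT variance tracking and exponential backoff.
class RtoEstimator:
    def __init__(self):
        self.srtt = None    # smoothed RTT
        self.rttvar = None  # smoothed RTT variance
        self.rto = 1.0      # initial RTO in seconds (illustrative)

    def on_rtt_sample(self, rtt):
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        # RTO bounded by smoothed RTT plus 4x variance, with a 1 s floor
        self.rto = max(self.srtt + 4 * self.rttvar, 1.0)

    def on_retransmission_timeout(self):
        self.rto *= 2  # exponential backoff mitigates underestimation

est = RtoEstimator()
est.on_rtt_sample(0.1)           # srtt=0.1, rttvar=0.05 -> rto floored at 1.0
est.on_retransmission_timeout()  # backoff doubles the RTO to 2.0
assert est.rto == 2.0
```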
When a TCP sender (or many senders) transmits more data than the available network bandwidth can carry, congestion collapse easily occurs in the form of queueing delay, packet loss, or the blocking of new connections, resulting in a substantial reduction in network throughput. To resolve this congestion collapse problem, a number of solutions have been proposed. Most of them share the same idea of utilizing a network-aware rate limiting mechanism along with the receiver-based flow control. For this purpose the congestion window mechanism was introduced, in which the TCP sender determines (or estimates) the number of data packets that the network can accept for delivery without becoming overloaded. If the receiver does not have any resource limitations, the congestion window limit can be considered an indication of the maximum capacity of the connection. On the other hand, when the flow control limit (i.e., the receive window) is less than the congestion control limit (i.e., the congestion window), the former is the real bound on outstanding data packets. The graphs in Figure 2.3 show two cases of congestion window dynamics; the left graph shows the dynamics when the network cannot deliver any more data at the transmitting rate, and the right graph represents the case when the receiver cannot process further data at the receiving rate [20]. One of the earliest solutions to the congestion problem in TCP operation was proposed in [22]. The solution is based on the original TCP specification and includes a number of algorithms to avoid network congestion, the most important being the Slow Start and Congestion Avoidance algorithms. These provide two slightly different distributed peer-to-peer mechanisms that allow a TCP sender to detect available network resources and adjust the transmission rate of the TCP flow to the limit determined.
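The relationship between the two limits described above can be stated in one line: the sender's effective window is the smaller of the congestion window (network limit) and the receive window (receiver limit). A sketch with illustrative values:

```python
# The effective TCP send limit is the minimum of the congestion window
# (what the network can carry) and the receive window (what the receiver
# can buffer).
def effective_window(cwnd, rwnd):
    return min(cwnd, rwnd)

assert effective_window(cwnd=20, rwnd=8) == 8  # receiver is the bottleneck
assert effective_window(cwnd=5, rwnd=8) == 5   # network is the bottleneck
```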
Assuming the probability of random packet corruption during transmission is negligible, the sender can attribute all packet losses to congestion. Furthermore, the reception of any acknowledgement packet is an indication that the network can accept and deliver a new packet. Thus, the sender can transmit at least the amount of data that has just been acknowledged without expecting to cause any congestion. This balancing of incoming and outgoing packets is called the packet conservation principle, which is the basic concept behind both the Slow Start and Congestion Avoidance algorithms. In the Slow Start algorithm, the reception of an acknowledgement packet indicates the availability of network resources for transmitting double the amount of data (i.e., multiplicative increase) that has been acknowledged by the receiver. In other words, instead of growing linearly, the congestion window grows exponentially (i.e., quite aggressively). If a packet loss is detected (i.e., the network is experiencing congestion because all network resources have been utilized), the congestion window is reset to its initial value (e.g., the maximum segment size) to ensure the release of network resources. The other algorithm for improving TCP effectiveness in networks with resource limitations is Congestion Avoidance, which combines linear growth of the congestion window with a multiplicative reduction when congestion takes place. Compared with Slow Start, this algorithm is much more conservative in its response to acknowledgements of transmitted packets and to the detection of packet losses. Instead of doubling, the congestion window increases by one (i.e., additive increase) only if all data packets have been successfully delivered during the last RTT. After a loss is detected, the algorithm cuts the congestion window size by half (i.e., multiplicative decrease).
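The combined Slow Start and Congestion Avoidance dynamics described above can be sketched as a simple per-RTT model. This is an illustration in units of segments, using a Tahoe-style simplification in which a loss halves the threshold and restarts Slow Start; real TCP grows the window per acknowledgement rather than per RTT tick:

```python
# Sketch of congestion-window evolution: exponential growth in Slow Start up
# to a threshold, additive increase per RTT in Congestion Avoidance, and
# multiplicative decrease of the threshold on loss.
def evolve_cwnd(events, ssthresh=8):
    cwnd = 1
    trace = []
    for event in events:
        if event == "ack":                # all segments in the last RTT acked
            if cwnd < ssthresh:
                cwnd *= 2                 # Slow Start: multiplicative increase
            else:
                cwnd += 1                 # Congestion Avoidance: additive increase
        elif event == "loss":
            ssthresh = max(cwnd // 2, 2)  # multiplicative decrease of threshold
            cwnd = 1                      # restart from one segment
        trace.append(cwnd)
    return trace

trace = evolve_cwnd(["ack", "ack", "ack", "ack", "ack", "loss", "ack", "ack"])
# 1 doubles to 2, 4, 8, then grows additively to 9, 10; a loss sets
# ssthresh to 5 and resets cwnd to 1, after which it doubles to 2, then 4.
assert trace == [2, 4, 8, 9, 10, 1, 2, 4]
```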
Figure 2.3a demonstrates the behavior of the Slow Start and Congestion Avoidance algorithms over a bandwidth-limited network. A number of proposals to improve various aspects of TCP performance (e.g., the effective use of available network resources) have been presented over the past 20 years, including mechanisms to probe the available network resources, estimate the network congestion state, and detect packet loss under different assumptions about the network environment. Moreover, some proposals contain algorithms to improve the poor utilization of error-prone wireless and high-speed wired networks. For example, TCP Westwood+ [24] proposes a bandwidth estimation technique for wireless networks with random loss, and TCP CUBIC [25] proposes a congestion control algorithm that scales well in high-speed networks with long latency. The current Linux kernel supports some of these proposals, and users can choose a suitable algorithm for a particular network connection; TCP CUBIC is the default in Linux kernel 2.6.19 and above. However, there are not yet practical guidelines or performance criteria for selecting a congestion control algorithm.

2.3 Related studies on mobile cloud clusters

Many researchers have identified key attributes, technologies, and challenges that distinguish cloud computing from traditional computing paradigms [26, 27, 28, 29, 30]. Put briefly, cloud computing provides reliable, customizable, and dynamic computing environments with Quality of Service (QoS) guarantees for end-users [31]. Also, many studies have examined mobile cloud services on the Internet, as summarized in [7] and [8]. This work pays particular attention to the performance of the mobile ad hoc cloud, where ad hoc networks of mobile devices themselves work as resource providers of the cloud. In this type of cloud, the workload and data reside on individual mobile devices rather than on remote resource servers.
2.3.1 Implementation of mobile ad hoc cloud

Hyrax [32, 33] explores the feasibility of using a cluster of mobile phones as resource providers by porting Apache Hadoop to Android smartphones. As a sample application, they present a simple distributed mobile multimedia search and sharing program. However, their performance evaluations of the Hadoop mobile cluster are limited, since they completed only a partial implementation of the Hadoop architecture, in which many core features had to be removed due to difficulties and obstacles in the Hadoop migration. Even the major controllers of the Hadoop framework, such as the JobTracker for MapReduce and the NameNode for HDFS, are not installed on the mobile node. A similar approach to implementing the Hadoop framework on mobile devices is found in [34]. Serendipity [35, 36] discusses the challenges of remote computing using mobile devices and introduces a framework that enables a mobile computation initiator to use remote computational resources available on mobile devices. They implement an actual prototype on Android devices and evaluate their system using two sample applications: a face detection application and a speech-to-text application. However, no performance comparison with existing distributed frameworks is made. Another study, Cuckoo [37], proposes a computation offloading framework for Android smartphones and illustrates its utility with an application for multimedia content analysis. In short, several studies on ad hoc cloud frameworks for mobile devices have been conducted by implementing only part of an existing distributed analytic framework or by proposing a customized framework similar to an existing one [38]. Furthermore, the previous studies are mostly evaluated using just one or two domain-specific applications and fail to provide comparative analysis of their performance and efficiency.
To the best of our knowledge, there has been no comparable framework and performance analysis for practical mobile cloud clusters running distributed analytic applications. Although this work mostly focuses on the performance of practical distributed analytics on mobile cloud clusters in terms of job processing time and response time, other work concentrates on mobile devices' energy efficiency, which is a key aspect of enabling data analysis and mining on mobile devices. For example, energy-aware scheduling over the mobile cluster to optimize energy utilization should be taken into account for mobile distributed analytics.

2.3.2 Network problems of traditional cloud

This work also focuses on reliable data communications between mobile devices for analytical data transfers in the workflow of distributed analytics, under the limitations of TCP performance over wireless links. To find the best way to control data flows on mobile devices for improving the performance of mobile cloud clusters, it is necessary to review previous solutions for the typical datacenter cloud. In datacenter networks, there have been many proposals to solve typical network problems in many-to-one communication patterns, known as TCP Incast, where traffic bursts overload the switch buffers, leading to a significant increase in queueing delay and a decrease in TCP throughput. These problems affect the performance of cloud computing frameworks in which distributed processing cannot progress until all parallel threads in a stage complete [39]. Examples of such frameworks include distributed filesystems, web search, and other applications with partition or aggregation workflows [40, 41, 42].
The traditional solutions to TCP Incast include modifying network parameters [41, 43] or TCP congestion/flow control algorithms in the Link and Transport network layers [44, 45], which may involve customized network designs for efficient switching [40, 42], and optimizing application data transfer patterns to mitigate TCP congestion [46, 42]. However, since TCP Incast originally arises in switch-based network topologies using TCP, most of these solutions may not apply directly to the mobile network issues that arise in wireless environments with different characteristics.

2.3.3 MapReduce performance prediction using simulation

Simulation has been widely used for performance prediction and characterization; it can simplify the research process by skipping the intricate steps of experimental setup and configuration and can significantly reduce the experiment time. Since MapReduce has been adopted as a preferred framework for data-intensive computing, there have been many efforts toward developing MapReduce simulators over the past few years to address the performance analysis of scheduling algorithms. Several MapReduce simulators are available, such as Mumak [47], MRSim [48], and MRPerf [49]. Mumak uses job traces of real-world workloads as input to estimate job execution time, throughput, etc. MRSim is another simulator based on discrete event simulation, which can predict job performance as well as hardware utilization. The third simulator, MRPerf, uses information about node specifications, cluster topology, data layout, and job descriptions as inputs and generates a detailed phase-level execution trace that provides the job execution time, the amount of data transferred, and a timeline of each phase of the job execution [19].
MRPerf is uniquely based on the popular ns-2 network simulator [50], which models task communications over networks with varying cluster configurations and different network topologies to simulate the correct network behavior of the real system. Although this work primarily performs actual experiments using Hadoop clusters consisting of practical mobile devices to identify performance issues of mobile cloud clusters, it also develops a MapReduce simulator based on an existing simulator (MRPerf on ns-2) to address dynamic node mobility under various wireless environments and capture more detailed performance aspects.

Chapter 3
Understanding performance issues of Hadoop mobile clusters

MapReduce is the fundamental software programming model in the Hadoop architecture, which performs distributed processing of large data sets on a computing cluster. A single large workload (job) is divided or mapped into smaller sub-workloads (tasks) to be processed in parallel. The results from the sub-workloads are merged, condensed, and reduced to produce the final result. Both input and output are stored on the nodes throughout the cluster in the distributed filesystem. Numerous factors can affect the performance of a Hadoop cluster. Typical performance factors such as workload type, cluster size, input/output data size, and the characteristics of the computing nodes (e.g., CPU, memory, and I/O resources) have significant impacts on the processing time. In addition, the network is also a critical factor in Hadoop performance, since the nodes are interconnected through the network in order to transfer data for distributed processing during one or more phases of MapReduce execution. This chapter examines the performance of MapReduce in practical mobile cluster setups using Hadoop benchmarks and identifies its critical performance issues in mobile operating environments.
3.1 Hadoop benchmarks for performance evaluation

A benchmark provides a method of comparing the performance of various subsystems across different system architectures and mimics a particular type of workload on a component or system. In order to perform extensive performance analysis of Hadoop mobile clusters, this work chooses workloads from a benchmark suite provided by [51]. It contains typical Hadoop workloads with options for input/output configurations (e.g., data size, compression method, etc.). Table 3.1 lists its benchmark workloads with their characteristics. All the workloads are implemented with the Hadoop MapReduce framework and are capable of performing a variety of data-intensive computations such as sorting, I/O operations, web search, and machine learning. The Sort, WordCount, and TeraSort benchmarks for micro-benchmarking are currently included in Hadoop software releases and are the three most popular benchmarks widely used by Hadoop developers and engineers. Sort and WordCount are representative of a large group of real-world MapReduce applications that extract interesting information from large input data. The Sort benchmark simply uses the MapReduce framework to sort the input directory into the output directory, where the inputs and outputs must be sequence files with keys and values. The WordCount benchmark reads text files and counts how often words occur. Each Map task takes a line as input, breaks it into words, and then generates a key-value pair of <word, 1>. The Reduce task sums the counts for each word and generates a single key-value pair per word. The TeraSort benchmark sorts data at terabyte scale and has been used in many cluster competitions among distributed computing platforms to demonstrate their performance and efficiency. The TestDFSIO benchmark, also included in the Hadoop releases, is another microbenchmark for I/O performance, which performs parallel file read and write operations in separate Map tasks.
The output of each Map task is used for collecting statistics about the file just processed. The statistics are accumulated in a Reduce task to generate a summary of the average I/O throughput of the distributed filesystem (i.e., HDFS).

Table 3.1: Hadoop benchmarks and their characteristics
Microbenchmarks:
- Sort: I/O bound
- WordCount: CPU bound
- TeraSort: Map CPU bound, Reduce I/O bound
- TestDFSIO: I/O bound
Applications, web search:
- Nutch indexing: Map CPU and I/O bound, Reduce I/O bound
- Page rank: CPU bound
Applications, machine learning:
- K-means clustering: I/O bound (first Map: CPU bound)
- Bayesian classification: CPU bound in iteration, I/O bound in clustering
All workloads follow the data flow pattern data, Map, data, Reduce, data, except TestDFSIO, which follows data, Map, Reduce. Note: in the original table, font size denotes the relative data sizes of the input, intermediate, and output data.

The Nutch Indexing and PageRank benchmarks, both open-source applications, represent web search applications. The Nutch Indexing benchmark, which comes from the Apache Nutch project, crawls web links from root URLs and converts the link information into inverted index files with MapReduce tasks. The PageRank benchmark is composed of a chain of Hadoop jobs calculating the rank of each web page according to the number of reference links. The K-means Clustering and Bayesian Classification benchmarks are included as machine learning applications. The K-means clustering benchmark takes in numerical vectors in an n-dimensional space representing the features of the objects to be clustered.
The algorithm randomly chooses k points in the vector space as the initial cluster centers and then iteratively recalculates the center of each cluster with MapReduce jobs until the centers no longer change or the maximum number of iterations is reached. The Bayesian Classification benchmark involves four chained MapReduce jobs that extract labels from input text, compute the Term Frequency-Inverse Document Frequency (TF-IDF) of each feature in each label, and then perform the weighting and normalization.

3.2 Assumptions on mobile cloud clusters

This work performs actual experiments and runs simulations based on the following assumptions about the basic, common configurations of practical mobile cloud clusters for ad hoc analytics. Future work, however, will consider more extensive scenarios that include dynamic node mobility and complicated analytical workloads over various mobile environments.

- Mobile devices may process computational workloads that exceed their capability by offloading portions of the workload to remote resources for distributed execution.
- All mobile devices are capable of sharing their computing resources and behave in a collaborative and trustworthy manner.
- Clustering with nearby mobile devices to build a mobile ad hoc cloud provides faster connectivity and better availability, because the actual connectivity with typical remote cloud infrastructures may be intermittent and unpredictable due to the mobility of mobile devices.
- Mobile nodes belonging to a cluster communicate only with adjacent nodes within their communication range in a wireless single-hop network. The data transmissions of multiple nodes might interfere with each other due to overlapping communication ranges on a shared channel.
- The mobile cluster runs a single workload at a time, either transforming unstructured input data into a more useful structure without adding new data, or extracting small but valuable analytics from the input data.
- The amount of intermediate and output data generated by mobile devices depends on the type of workload.

3.3 Performance experiments of Hadoop mobile clusters

This section describes the practical experimental setup for Hadoop mobile clusters and presents the experimental results for performance analysis.

3.3.1 Experimental setup

In the experiments, this work measured the performance of Hadoop clusters using Android-based mobile platforms, including smartphones (e.g., Samsung Galaxy S2 and Google Galaxy NEXUS), media players (e.g., Samsung Galaxy Player), and tablets (e.g., Samsung Galaxy Tab and Google NEXUS 7), under extensive distributed configurations. This chapter presents experimental results from one of those cluster setups, which consists of eleven NEXUS 7 tablets developed by Google in conjunction with Asus. Figure 3.1 displays the experimental mobile cluster with Google NEXUS 7 tablets. The experimental platform, the NEXUS 7, is the first tablet in the Google Nexus series that runs the Android operating system. The Nexus 7 features a 7-inch display, an NVIDIA Tegra 3 quad-core processor, 1 GB of memory, and 16 GB of internal storage, and incorporates built-in Wi-Fi, Bluetooth, and NFC connectivity [52]. The tablet runs the latest Android operating system (version 4.2.2, nicknamed Jelly Bean) and the Hadoop stable release (version 1.1.2) with the Oracle JDK (version 1.6) at the time of this writing.

Figure 3.1: Experimental mobile cluster using Google NEXUS 7

The detailed specifications of the experimental platforms are listed in Table 3.2. All platforms are reliably interconnected with a Wi-Fi based wireless access point, an Asus RT-N66U, in IEEE 802.11n [53] infrastructure mode. Android is a mobile operating system designed for smartphones and tablets, which uses a virtual machine on a customized embedded Linux system as its runtime environment to run mobile applications.
The virtual machine provides an isolated environment for code execution, where an application with a malicious piece of code cannot directly affect the system (i.e., the core OS is kept from being corrupted). Thus, it makes the system more stable and reliable. In addition, it allows for cross-compatibility, as its applications can be executed on any mobile platform that provides the virtual machine. Android applications are usually written in the Java language and are executed in the Dalvik virtual machine (DVM), which is substantially different from the classical Java virtual machine (JVM) [54]. The DVM is developed by Google and optimized for the characteristics of mobile operating systems (especially for the Android platform). The differences between Java and Dalvik bytecode, based on the JVM and DVM respectively, are summarized in Table 3.3.

Table 3.2: Hardware and software specifications of experimental nodes
- Platform: Google NEXUS 7
- CPU: NVIDIA Tegra 3 quad-core processor (1.7 GHz single / 1.6 GHz quad)
- Memory: 1 GB RAM
- Storage: 16 GB NAND flash
- Network: Wi-Fi 802.11 b/g/n, Bluetooth, NFC
- Mobile OS: Android 4.2, Jelly Bean (Build number: JDQ39)
- Kernel: Linux 3.1.10
- Linux extension: Ubuntu 12.04 for ARM
- JVM: JDK 1.6.0_32 (Oracle Java SE for Embedded 6u32, ARMv7 Linux)
- Hadoop: 1.1.2 stable release
- Resource monitoring: Sysstat 10.0.3-1 stable version

Porting the Hadoop framework to the Android operating system was a significant challenge at the early stage of this work. Since Android employs the Dalvik virtual machine to support its mobile applications, the Hadoop software framework, which is based on a specific Java virtual machine, is not fully compatible with the Android runtime environment. Hence, Hadoop can be ported either by converting its JVM-based source code and libraries to DVM-compatible ones or by installing a specific JVM recommended by the Hadoop project to run the original Hadoop software.
Most of the previous work [32, 33, 34] had difficulties with rewriting Hadoop code for Android. They implemented only a small number of Hadoop functions by removing many core features that are incompatible with the Dalvik environment. In contrast to these earlier approaches, this work successfully installed the Oracle JVM, a base platform for fully running the Hadoop framework on Linux-based operating systems, by adding a Linux extension [55], Ubuntu 12.04, to the Android Linux kernel as illustrated in Figure 3.2; this kind of approach is called mobile virtualization, in which virtualization technology enables multiple operating systems or virtual machines to run simultaneously on a mobile device [56]. It was carefully ensured that there was no degradation of the hardware performance or adverse effect on Android operations. As a result, the experimental mobile cluster runs all the existing and emerging features of the Hadoop architecture, including MapReduce 2.0, also known as YARN [16].

Table 3.3: Comparison of Java and Dalvik bytecode
- Application structure: Java bytecode consists of one or more .class files, one file per class; Dalvik has a single .dex file containing all classes.
- Register architecture: the JVM is stack-based, pushing local variables onto a program stack for manipulation; the DVM is register-based, assigning local variables to any of the 2^16 available registers and manipulating registers directly.
- Instruction set: the JVM has 200 opcodes, with a dozen opcodes dedicated to moving elements between the stack and local variable tables; the DVM has 218 opcodes with longer instructions, since most of them contain the source and destination addresses of registers.
- Constant pool structure: the JVM replicates elements in constant pools within the multiple .class files (e.g., referrer and referent method names); the DVM uses a single pool that all classes simultaneously reference and eliminates some constants by inlining their values directly into the bytecode.
- Ambiguous primitive types: JVM variable assignments distinguish between integer (int) and single-precision floating-point (float) constants and between long integer (long) and double-precision floating-point (double) constants; DVM variable assignments (int/float and long/double) use the same opcodes for integers and floats, i.e., the opcodes are untyped beyond specifying precision.
- Null references: the JVM has a null type; the DVM does not specify a null type, instead opting to use a zero-value constant.
- Object references: the JVM uses typed opcodes for the comparison of object references (if_acmpeq and if_acmpne) and for null comparison of object references (ifnull and ifnonnull); the DVM uses a more simplistic integer comparison between two integers, and a comparison of an integer and zero, respectively.
- Storage of primitive types in arrays: the JVM uses defined, unambiguous opcodes; the DVM uses ambiguous opcodes to store and retrieve elements in arrays of primitive types (e.g., aget for int/float and aget-wide for long/double).

Figure 3.2: Mobile virtualization for MapReduce implementation

The experimental mobile cluster that runs the Hadoop software is composed of a single master node and ten slave nodes, which are configured with the default values for the system parameters of the Android OS and Hadoop framework. The master node coordinates the slave nodes to get the workload done, and the slaves run the sub-workloads, Map and Reduce tasks, assigned by the master node. The usage of computing and networking resources on each node is carefully monitored with a performance monitoring tool, Sysstat. To investigate node behavior in the Hadoop workflow, two typical workloads, WordCount and TeraSort, are tested with the associated Hadoop benchmark tools on the mobile cluster.
WordCount: this workload counts the occurrence of each word in the input data sets generated by the Hadoop RandomTextWriter tool. It represents a workload that extracts small but valuable analytics from the input data.

TeraSort: this workload sorts the input data sets generated by the Hadoop TeraGen tool into a predefined order. It represents a workload that transforms unstructured source data into a more useful structure or format without adding new data.

Table 3.4: Hadoop configurations of the experimental mobile cluster

  Number of slave nodes    1 to 10
  Available Map slots      2 per node
  Available Reduce slots   2 per node
  Workloads                WordCount, TeraSort (and TestDFSIO for the I/O test)
  Data block size          1, 2, 4, 8, 16, 32, 64 MB
  Input data size          1 to 10 GB
  DFS replication          1

Note: MapReduce slots define the maximum number of Map and Reduce tasks that can run in parallel on a cluster node. The number of slots on each cluster node may differ, depending on the node's hardware capacity.

The input and output data usually need to be replicated to a small number of physically separate nodes (typically three) to insure against data block corruption and hardware failure. However, this work disables the replication of both input and output data in the experiments to concentrate on the core behaviors of the MapReduce workflow. The details of the configuration metrics are listed in Table 3.4.

3.3.2 I/O performance of mobile nodes

Before analyzing the performance of the Hadoop framework on the mobile cloud cluster, the experimental cluster nodes are tested to examine their robustness, availability, and error handling under a heavy load and to investigate which resources (e.g., CPU, memory, filesystem, network, etc.) may affect the performance of MapReduce applications.
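As a toy illustration of the WordCount workflow described above, the following sketch (plain Python, not the Hadoop API; the function names and sample records are invented for illustration) runs the map, shuffle, and reduce steps in-process:

```python
from collections import defaultdict

# Toy sketch of the MapReduce WordCount workflow: map emits (word, 1)
# pairs, the shuffle step groups pairs by key, and reduce sums the
# counts per word.

def map_phase(line):
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

records = ["mobile cloud cluster", "mobile analytics"]
pairs = [kv for line in records for kv in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'mobile': 2, 'cloud': 1, 'cluster': 1, 'analytics': 1}
```

In the real cluster the shuffle step is the one that crosses the wireless network, which is why WordCount's small intermediate output (counts rather than raw text) keeps its Shuffle-phase network utilization low.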
In the stress testing, remarkable performance characteristics of the filesystem and network I/O are observed. The Hadoop TestDFSIO benchmark, which tests the I/O performance of the distributed filesystem by sampling the number of bytes read/written at fixed time intervals, is utilized to measure the actual HDFS read/write speed (in a single-node cluster setup), and the Iperf network performance measurement tool, which generates constant TCP or UDP traffic flows, is employed to compute the actual network throughput (between two cluster nodes).

Figure 3.3: Network and filesystem throughput of mobile nodes (wireless TCP throughput: 2.13 MB/s; wireless UDP throughput: 2.32 MB/s; Gigabit Ethernet bandwidth: 119.21 MB/s; filesystem read rate: 27.12 MB/s; filesystem write rate: 9.36 MB/s; SATA II read/write speed: 285/275 MB/s)

Figure 3.3 displays throughput measurements of the filesystem and network in the load tests. The result shows that the network and filesystem I/O of mobile devices are far slower than those of commodity servers with one or two wired Gigabit connections and SATA II internal storage; 10-Gigabit Ethernet and SATA III storage are already common. In particular, the available network throughput of mobile nodes is much lower than that of the typical Hadoop clusters presented in [18]. The network speed is much slower than the data transfer rates of the internal storage as well. Since the actual effect of network bandwidth on Hadoop performance is relatively small in conventional Hadoop setups with high-speed wired connectivity, not much attention has been paid to Hadoop operations under the network bandwidth constraint that is critical for reliable data transfers. Consequently, the performance of mobile cloud clusters may be strongly influenced by the characteristics of the wireless links in the operating environments.
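A back-of-the-envelope calculation using the rates measured in Figure 3.3 shows why the wireless link dominates: moving a single 64 MB HDFS block takes far longer over the wireless TCP path than through local storage. The dictionary below simply restates the figure's measurements; the 64 MB block size is Hadoop's default.

```python
# Estimated time to move one 64 MB HDFS block over each medium,
# using the throughput values (MB/s) measured in Figure 3.3.

rates_mb_s = {
    "wireless TCP (mobile)": 2.13,
    "filesystem write (mobile)": 9.36,
    "filesystem read (mobile)": 27.12,
    "Gigabit Ethernet (server)": 119.21,
}

block_mb = 64
for medium, rate in rates_mb_s.items():
    print(f"{medium}: {block_mb / rate:.1f} s per 64 MB block")
```

At 2.13 MB/s the wireless transfer of one block takes roughly half a minute, an order of magnitude longer than reading the same block from the mobile node's flash storage.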
Although the computing capabilities of cluster nodes are a significant performance factor, each node also needs the capability to read and write large amounts of data to and from the distributed filesystem that is implemented on remote nodes. In wireless networks with relatively low network bandwidth, the time required to transfer data blocks can contribute significantly to the total processing time, even though the distributed computing power generally decreases the amount of time required for job completion.

3.3.3 Performance of WordCount workload

The WordCount workload, which counts the occurrence of each word in the input data sets, produces small final output. The Map phase is generally computation intensive compared to the other phases. Network utilization is low in the Shuffle phase, in which the Map tasks transfer their output (i.e., intermediate results) to the Reduce task as input, because the Map output is a small subset of the large input data set in this kind of workload. Figure 3.4 shows the network utilization with MapReduce task progress of the WordCount workload that starts with 1 GB input data. In the workload, 20 Map tasks corresponding to the 1 GB input size are equally distributed over 10 slave nodes. One node is chosen to run the single Reduce task that produces the final output. Figure 3.5 displays resource utilization on two typical slave nodes; the Map node runs only two of the 20 Map tasks, and the Reduce node runs both its Map tasks and the additional Reduce task. Figure 3.4 contains an aggregate data traffic pattern received from all nodes running Map tasks, denoted by the solid line, and a single data flow transmitted by a typical Map node, denoted by the dashed line. The graph shows two bursts of received traffic since each node finishes its two assigned Map tasks one at a time and transmits each intermediate result immediately to the single node running the Reduce task.
Figure 3.4: Network utilization of the Hadoop mobile cluster with the WordCount workload (1 GB task progress, and data flows on the Map and Reduce nodes)

Although Hadoop has the ability to process multiple tasks simultaneously within resource bounds, the experimental nodes run tasks sequentially due to lack of memory (see Figure 3.5). This explains the separated bursts of traffic and the corresponding delays in the Map and Reduce progress. The network bandwidth is saturated during each burst, but each burst lasts only a short period of time since the output of the Map tasks is very small.

3.3.4 Performance of TeraSort workload

The TeraSort workload, which sorts input data sets, generates a large amount of intermediate data in the Map phase, which needs to be transmitted to the Reduce task over the network to produce the final output. Both the Map and Reduce phases are commonly computation and I/O intensive. Network utilization is very high in the Shuffle phase because the output of the Map tasks has the same size as the input data sets in this workload.
Figure 3.5: Resource utilization (CPU, memory, and storage I/O over run time) of MapReduce nodes running WordCount tasks: (a) Map node, (b) Reduce node

Figure 3.6: Network utilization of the Hadoop mobile cluster with the TeraSort workload (1 GB task progress, and data flows on the Map and Reduce nodes)

Figure 3.6 shows the network utilization with MapReduce task progress of the TeraSort workload initialized with 1 GB input data. The configuration is identical to the WordCount workload; 20 Map tasks are equally distributed over 10 slave nodes, and one node runs the single Reduce task. The resource utilization of two different slave nodes is detailed in Figure 3.7 in the same way as in the WordCount performance analysis. Figure 3.6 illustrates a large volume of aggregate traffic made up of data flows transmitted at the same time by multiple nodes, because the entire input data needs to be shuffled to the single node running the Reduce task. The network bandwidth is saturated while the output of all Map tasks is being transferred. This traffic pattern increases the possibility of packet loss, resulting in throughput reduction and fluctuating performance; a significant number of TCP packets are dropped during the Shuffle phase.
Consequently, the Map tasks finish relatively quickly, but the Reduce task makes slow progress since it spends a great deal of time receiving the large input data (i.e., the output of the Map tasks) and processing the entire data set.

Figure 3.7: Resource utilization (CPU, memory, and storage I/O over run time) of MapReduce nodes running TeraSort tasks: (a) Map node, (b) Reduce node

Figure 3.8: Cluster size scaling of the WordCount and TeraSort experiments (job completion time vs. number of nodes running tasks, 1 GB input)

3.3.5 Performance of scale testing

This section examines the effects of scaling up the cluster size, data block size, and input data size, which represent the variability in configuring the mobile ad hoc cloud. First, an optimally configured cluster generally has the ability to improve performance by scaling up the cluster size. Figure 3.8 shows the results from the experiments, which are intended to verify how the cluster size affects the performance of the mobile distributed framework. The job completion time of two typical workloads, WordCount and TeraSort, with 1 GB input data is measured as the number of slave nodes participating in the cluster gradually increases.
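A first-order model helps explain why the two workloads scale so differently: the compute portion shrinks with the number of nodes, while the shuffle portion is bounded below by the data volume divided by the shared wireless bandwidth. The sketch below is illustrative only; the compute-time and shuffle-size constants are invented, and only the 2.13 MB/s bandwidth comes from the earlier throughput measurements.

```python
# Toy scaling model (illustrative constants, not measured values):
# total time = compute_time / n_nodes + shuffle_bytes / shared_bandwidth.
# The shuffle term does not shrink with n, so shuffle-heavy workloads
# (TeraSort-like) plateau while compute-heavy ones (WordCount-like) scale.

def job_time(compute_s, shuffle_mb, n_nodes, bandwidth_mb_s=2.13):
    return compute_s / n_nodes + shuffle_mb / bandwidth_mb_s

for n in (1, 5, 10):
    wc = job_time(compute_s=1500, shuffle_mb=10, n_nodes=n)    # tiny shuffle
    ts = job_time(compute_s=1500, shuffle_mb=1024, n_nodes=n)  # ~1 GB shuffle
    print(f"n={n:2d}  WordCount-like {wc:7.1f} s  TeraSort-like {ts:7.1f} s")
```

Under this model, adding nodes keeps shrinking the WordCount-like time, while the TeraSort-like time quickly flattens at the shuffle bound, which is the qualitative behavior seen in Figure 3.8.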
Figure 3.9: Data block size scaling of TeraSort experiments (job completion time and number of Map tasks vs. data block size, 1 GB input on 10 nodes)

As indicated in Figure 3.8, increasing the number of nodes considerably decreases the job completion time of the WordCount workload. On the other hand, in the cluster scaling with the TeraSort workload, the increase in cluster size does not lead to a significant decrease in job completion time because the performance of the mobile cluster is bounded by the time taken for the entire input data to be shuffled over the limited network bandwidth, which is also highly variable. Second, the unit of input for a Map task is a data block of the input file. A single large input file is split into many blocks which are distributed over the nodes in the Hadoop cluster. The size of a data block stored in the Hadoop filesystem is large (64 MB by default) compared to the block size in traditional filesystems (normally 512 bytes). By making a block large enough, the data transfer time from the disk becomes significantly larger than the time required to seek the start of the block. Thus, the transfer operation of a large file made of multiple blocks becomes faster by minimizing the seek time [16].

Figure 3.10: Input size scaling of WordCount experiments (job completion time vs. input data size, 10 nodes)

What is the effect of the data block size in wireless configurations, where one or more phases of MapReduce transfer a considerable number of data blocks over wireless links with low throughput? A previous study, Hyrax [32], suggested the use of a small block size in consideration of the lengthy transfer time and delay of large blocks in the wireless network. However, it did not provide any comparative measurements to validate the suggested value.
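The block-size trade-off can be quantified with simple arithmetic: each Map task consumes one block, so halving the block size doubles the number of tasks and the associated scheduling and seek overhead. A sketch for a 1 GB input (illustrative arithmetic; actual task counts also depend on input-split boundaries):

```python
import math

# Block-size / task-count trade-off for a 1 GB input: each Map task
# processes one block, so small blocks multiply the number of tasks.

input_mb = 1024
for block_mb in (1, 2, 4, 8, 16, 32, 64):
    n_tasks = math.ceil(input_mb / block_mb)
    print(f"block {block_mb:2d} MB -> {n_tasks:4d} Map tasks")
```

With 1 MB blocks, a 1 GB input spawns over a thousand Map tasks, each carrying its own scheduling, startup, and seek cost, which matches the degradation at small block sizes in Figure 3.9.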
To determine an appropriate data block size for the Hadoop mobile cluster, the job completion time of the I/O-intensive TeraSort workload with 1 GB input data is measured as the data block size gradually increases. Contrary to expectations, Figure 3.9 displays performance degradation at small data block sizes. A Map task handles one data block of input at a time. If the data block is very small (i.e., there are a large number of data blocks), more Map tasks are required to process the data blocks, as also shown in the figure. This imposes an inefficient data access pattern by causing frequent seeks to retrieve each small block. Furthermore, resources become scarce for an excessive number of Map tasks. Hence, the configuration parameters for the mobile cluster should be carefully determined by taking into account various other performance aspects.

Finally, Figure 3.10 demonstrates the impact of input data size on the job completion time of the WordCount workload as the size of the input data increases. The larger the input data, the longer it takes to process the workload and produce the output result. Meanwhile, a problem was encountered when plotting the same measurements for the TeraSort workload because its performance is extremely variable and unreliable due to an increasing number of task failures (caused by task response timeouts and intermittent node disconnections) and re-runs. This work identifies the cause of the failures in the following chapters.

3.4 Performance simulations of Hadoop mobile clusters

Comprehensively evaluating the performance in extensive operating setups is important for understanding the performance issues of mobile cloud clusters and the overall efficiency of the distributed system. However, it becomes increasingly harder to evaluate and repeat every possible configuration as the scale of the cluster grows before gaining insight into the system performance.
For example, it is necessary to coordinate the movement of a large number of cluster nodes and to observe performance constraints under various wireless communication conditions, but performing the actual experiments involves both time and cost. For similar reasons, many researchers have proposed simulation-based solutions to evaluate and optimize MapReduce systems while saving cost and time. This work also shares the concern for simulation approaches and develops a MapReduce simulator for mobile cloud clusters based on an existing simulator, MRPerf. Using the simulator, extensive MapReduce simulations are conducted to identify further performance issues under different MapReduce scales, node mobility, and wireless channel conditions.

Figure 3.11: Architecture of the MRPerf simulator

3.4.1 MRPerf simulator for MapReduce

MRPerf [49] is one of the earlier simulator tools for the MapReduce data processing framework, which provides a fine-grained simulation to capture various performance aspects of MapReduce execution phases and predict application performance. Since MRPerf is based on the popular ns-2 network simulator [50], it models inter- and intra-rack task communications over the networks to simulate correct network behavior. Some of the important motivations behind MRPerf were to examine the performance of MapReduce applications under varying cluster configurations, different network topologies, different data placement algorithms, and different task schedulers. MRPerf assumes that a node's resources are equally shared among the tasks assigned concurrently to the node, and the simulator does not model OS-level asynchronous prefetching. Figure 3.11 shows the high-level architecture of MRPerf. The input configurations for initializing the simulator are defined in a set of files and processed by different processing modules. The ns-2 driver module provides the interface for network simulation.
Similarly, the simple disk module that models the storage I/O can be extended to interface with other disk simulators. All the modules are driven by the MapReduce Heuristics module (MRH), which simulates Hadoop's behavior. To perform a simulation, MRPerf first reads all the configuration parameters and instantiates the required number of simulated nodes arranged in the specified topology. The MRH then schedules tasks to the nodes based on the specified scheduling algorithm. This enables each node to run its assigned tasks, which in turn creates network traffic (modeled through ns-2) as the nodes interact with each other. Thus, a simulated MapReduce workload is created.

Figure 3.12: Screenshot of MapReduce simulation

3.4.2 Implementation and validation of MapReduce simulator

This work selects MRPerf as the base simulator for implementing MapReduce simulation for mobile cloud clusters since it is open source and supports various cluster configurations and network topologies by utilizing the network simulator. MRPerf was originally developed to investigate the impact of network topologies on the performance of Hadoop clusters that consist of commodity servers.

Figure 3.13: Validation of the MapReduce simulation with the TeraSort workload (job completion time vs. number of nodes running tasks, experiment and simulation)

All the network topologies (e.g., Star, Double rack, Tree, and DCell) represent switch-based wired network setups, and all the cluster configurations are based on node characteristics (e.g., CPU, memory, disk, etc.) from server platforms. Thus, the mobile cluster's characteristics and behaviors under wireless ad hoc configurations need to be integrated into the MRPerf framework. Implementing a mobile cluster on MRPerf involves many tasks. First, a careful analysis of the MRPerf architecture and source code was conducted.
Second, the cluster configuration parameters for the mobile clusters, such as hardware capacity and Hadoop/MapReduce setup, were carefully tuned. Finally, additional features and exception handling for node mobility were implemented. Figure 3.12 displays a sample screenshot of the MapReduce simulator for mobile cloud clusters, where 20 mobile nodes are moving in a simulation area while running a MapReduce application; a black circle denotes the master node, blue and green circles denote slave nodes with and without running tasks, and a red circle denotes a malfunctioning node.

Table 3.5: Simulation configurations of the Hadoop mobile cluster

  Simulation area (m × m)    50 × 50, 100 × 100, 200 × 200, 300 × 300, 400 × 400, 500 × 500
  Radio propagation model    FreeSpace, Shadowing, TwoRayGround
  Node mobility              No mobility (static nodes), Random Waypoint
  Number of slave nodes      1 to 40
  Available Map slots        2 per node
  Available Reduce slots     2 per node
  Workloads                  TeraSort
  Data block size            64 MB
  Input data size            1 to 10 GB
  DFS replication            1

For the validation tests, this work compares data collected from real cluster configurations running the 1 GB TeraSort workload with data observed in MapReduce simulation runs with the same workload. Each cluster node has wireless single-hop connections and is stationary during the tests, which allows the validation tests to consistently evaluate the performance of the cluster under reliable connectivity. The job completion time is measured as the number of cluster nodes gradually increases. Figure 3.13 shows the results from the actual experiments as well as the simulation runs. As indicated in the figure, the MapReduce simulator is able to predict the job completion time within the margin of error. Through a series of validation tests, this work finds that the simulator is capable of reproducing the MapReduce performance of mobile clusters fairly accurately.
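The overall shape of such a simulator can be caricatured as a discrete-event loop that pops timestamped events in order. The sketch below is a deliberately minimal stand-in, not MRPerf's actual implementation (which drives ns-2 and models disks, topologies, and schedulers in detail); all timing constants are invented:

```python
import heapq

# Minimal discrete-event sketch of a MapReduce simulation loop.
# Events are (time, description) pairs popped in timestamp order;
# each node runs its share of Map tasks in sequential waves, then
# the shuffle and reduce stages follow.

def simulate(n_nodes, n_map_tasks, map_s=60.0, shuffle_s=120.0, reduce_s=90.0):
    events, clock = [], 0.0
    waves = -(-n_map_tasks // n_nodes)  # ceiling division: Map waves per node
    heapq.heappush(events, (waves * map_s, "maps done"))
    while events:
        clock, what = heapq.heappop(events)
        if what == "maps done":
            heapq.heappush(events, (clock + shuffle_s, "shuffle done"))
        elif what == "shuffle done":
            heapq.heappush(events, (clock + reduce_s, "reduce done"))
    return clock  # simulated job completion time

print(simulate(n_nodes=10, n_map_tasks=20))  # 2 waves: 120 + 120 + 90 = 330.0
```

Even this caricature reproduces the key structural feature: adding nodes reduces only the Map-wave term, while the shuffle and reduce terms set a floor on the simulated completion time.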
3.4.3 Performance of scale testing

Using the MapReduce simulation, this section investigates the effects of scaling up the cluster size, input data size, and simulation area size to provide more insight into the mobile ad hoc cloud. The details of the configuration metrics are listed in Table 3.5. First, Figure 3.14 displays the result of simulations designed to verify how the cluster size affects the performance of I/O-intensive MapReduce applications.

Figure 3.14: Cluster size scaling of TeraSort simulations (job completion time vs. number of nodes running tasks, 10 GB input)

The job completion time of the TeraSort workload with 10 GB input data is measured as the number of cluster nodes increases up to 40. As indicated in the figure, increasing the number of nodes decreases the job completion time of the TeraSort workload, but it does not lead to a significant decrease since the performance is bounded by the time taken for the large amount of intermediate data to be shuffled over the wireless communication links, as discussed in the earlier section. Second, Figure 3.15 demonstrates the impact of input data size on the job completion time of the TeraSort workload in the 10-node cluster setup as the size of the input data increases up to 10 GB. The result shows that the execution time is proportional to the size of the input data. Note that the actual performance of the TeraSort workload in large-scale input data setups was unmeasurable because unexpected system operation problems were frequently encountered before obtaining a final result, and the performance measures were highly variable during the experiments using real mobile devices.
Thus, the MapReduce simulator can be a useful tool for predicting the performance trend while avoiding temporarily unobservable faults from other performance factors that are not relevant to the MapReduce execution.

Figure 3.15: Input size scaling of TeraSort simulations (job completion time vs. input data size, 10-node cluster)

Lastly, Figure 3.16 shows the effect of the size of the simulation area. In the simulation, the mobile cluster with 10 nodes runs the 1 GB TeraSort workload. The effective communication range of wireless cluster nodes is set to around 100 meters in the simulation, considering the actual radio coverage of mobile devices. As a result, cluster nodes are frequently disconnected in a large simulation space due to range-limited communication, which means that the Hadoop cluster experiences numerous task timeout events as timely task monitoring is unavailable, resulting in job execution failure. Interestingly, the simulation result shows that it is possible to run MapReduce applications with node mobility by adjusting Hadoop configuration parameters (e.g., the heartbeat interval and task timeout period), which enables the intermittently connected mobile nodes to perform cooperative processing in the Hadoop framework.

Figure 3.16: Simulation area scaling with node mobility (job completion time vs. simulation area, with and without Random Waypoint mobility)

3.4.4 Performance over different radio propagations

Wireless network simulation is an important tool for evaluating wireless protocols and wireless applications before they evolve towards real-world implementation. Many studies have used network simulators, such as ns-2, for the simulation of wireless networks.
The ns-2 simulator supports different Physical-layer models (e.g., radio propagation), MAC-layer models (e.g., the IEEE 802.11 standard), network routing protocols (e.g., AODV, DSR, DSDV, and OLSR), and their extensions (or modifications). Since the MapReduce simulator exploits ns-2 for simulating application performance over different network configurations, it is possible to apply various wireless network models and protocols in the MapReduce simulation. Wireless networks are much more complicated to analyze than wired networks. Their characteristics and conditions may change rapidly and randomly. There are also significant differences between simple wireless paths with line of sight (LOS) and those with obstacles (e.g., vehicles and buildings) between the sender and receiver. To investigate the impact of various wireless channel conditions, this work compares the performance of the Hadoop mobile cluster under different propagation models. Network simulations generally consider two channel models: large-scale and small-scale propagation models. Large-scale propagation models predict the mean signal strength for an arbitrary transmitter-receiver separation distance to estimate the radio coverage area of the transmitter. On the other hand, small-scale models characterize the rapid fluctuations of the received signal strength over very short travel distances or short time durations. Due to the multipath propagation of radio waves, movements of the receiver can have large effects on the received signal strength. This work tests three frequently used radio propagation models included in the ns-2 network simulator for the mobile cluster simulation. First, the FreeSpace model assumes that the received power depends only on the transmitted power, the antenna gains, and the distance between the transmitter and the receiver: a radio wave that moves away from the sender has to cover a larger area, so the received power decreases with the square of the distance.
Second, the Shadowing model assumes that the average received signal power decreases logarithmically with distance, where a Gaussian random variable is added to this path loss to account for environmental influences at the transmitter and the receiver. Third, the TwoRayGround model assumes that the received energy is the sum of the direct line-of-sight path and the path including one reflection on the ground between the sender and the receiver. Figure 3.17 demonstrates the result of simulation runs (where 10 cluster nodes run the 1 GB TeraSort workload in two simulation area sizes) with different propagation models, which indicates that radio wave propagation has a strong impact on the job completion time.

Figure 3.17: MapReduce simulations with different radio propagation models (job completion time in 50² and 100² square-meter areas)

Actual mobile devices are equipped with heterogeneous wireless adapters and antennas, which means that mobile nodes have different transmit and receive capabilities, and their mobility induces complicated changes to the communication coverage. Thus, the received signal strength of mobile devices may vary over the time periods of communication (i.e., data transfers). The Shadowing model simulates this case, and the simulation result shows that the variation in the job completion time increases in comparison to the FreeSpace model. Furthermore, it is also observed that the multipath propagation of radio waves simulated by the TwoRayGround model has a significant effect on the performance of the mobile cluster, especially in a larger simulation space (i.e., an increased communication range), since the propagation pattern causes serious communication problems such as jitter and ghosting.
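The three models correspond to standard received-power formulas (the textbook forms used by ns-2; the transmit power, antenna gains, wavelength, antenna heights, path-loss exponent, and shadowing deviation below are illustrative values, not the simulation's configuration):

```python
import math, random

# Received-power sketches for the three ns-2 propagation models.
# FreeSpace (Friis): Pr = Pt*Gt*Gr*lambda^2 / ((4*pi)^2 * d^2)
# TwoRayGround:      Pr = Pt*Gt*Gr*ht^2*hr^2 / d^4
# Shadowing:         mean log-distance path loss plus a Gaussian term.

def free_space(pt, d, gt=1.0, gr=1.0, lam=0.125):        # Pr ~ 1/d^2
    return pt * gt * gr * lam**2 / ((4 * math.pi)**2 * d**2)

def two_ray_ground(pt, d, gt=1.0, gr=1.0, ht=1.5, hr=1.5):  # Pr ~ 1/d^4
    return pt * gt * gr * ht**2 * hr**2 / d**4

def shadowing_loss_db(d, d0=1.0, beta=2.7, sigma_db=4.0):
    # Path loss in dB: deterministic log-distance mean + random shadowing.
    return 10 * beta * math.log10(d / d0) + random.gauss(0.0, sigma_db)

# Beyond a crossover distance the two-ray model decays as 1/d^4,
# shrinking the usable communication range much faster than 1/d^2.
for d in (10.0, 100.0, 300.0):
    print(f"d={d:5.0f} m  FreeSpace={free_space(0.1, d):.3e} W  "
          f"TwoRay={two_ray_ground(0.1, d):.3e} W")
```

The 1/d⁴ decay of TwoRayGround relative to FreeSpace's 1/d² is consistent with the simulated performance gap widening in the larger simulation areas of Figure 3.17, and the Gaussian term in the Shadowing model is what produces the run-to-run variance noted above.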
3.5 Performance issues of Hadoop mobile clusters

In the performance analysis it is found that the overall computing power of the mobile cluster is no longer significantly bounded by the internal resource capabilities of each individual node, since mobile devices have constantly enhanced their resources and processing power. On the other hand, this work identifies distinct performance problems in MapReduce analytics on Hadoop mobile clusters, which come in the form of longer job completion times or frequent task failures from task response timeouts and node disconnections. In distributed systems, where a controller usually makes control decisions with limited information from remote components, a timeout control provides a key mechanism through which the controller can infer valuable information about unobservable states and events in the system when direct feedback is either impossible or costly [57]. The timeout control is configured using a timer which expires after a timeout threshold. This defines an expected time by which a specific event should occur. If no information arrives within this period, a timeout event occurs and the controller triggers the corresponding reactions. In fact, timeout control is an integral component for building a reliable distributed system. The Hadoop distributed system also adopts timeout control for both job scheduling and progress monitoring. A MapReduce job initiates long-lived batch tasks running on slave nodes, which usually take a few minutes or hours. Because of the significant length of the runtime, it is important for the master node to get feedback on how the job is progressing in a timely fashion. This enables the master to keep track of task status and to restart failed or slow tasks.
If a slave (task) fails by crashing or running very slowly, for example, it stops sending (or sends intermittently) current status and progress updates, called heartbeats, to the master; the master then marks the slave (task) as failed after the timeout threshold, which is 10 minutes by default [16]. In the previous experiments and simulations, frequent timeout occurrences (task failures) with corresponding performance degradation were observed in the Hadoop mobile clusters while running the I/O-intensive TeraSort workload with large input data. The problems can be summarized as follows: First, the job execution time is sensitive to slow-running tasks, as only one slow task makes the time significantly longer. When a mobile node running Map tasks has significant delays in transmitting a large amount of intermediate results to Reduce tasks through wireless connections (i.e., tasks are running slower than expected due to the lengthy transfer time of the Shuffle phase), the master launches other, equivalent tasks as backups instead of diagnosing and fixing the slow-running tasks. The slow-running (or hanging) tasks are considered failed and are automatically killed after the timeout period. The master also tries to avoid rescheduling the tasks on the slave node where they have previously failed. Second, depending on the size of the cluster, the master node has high resource requirements, as it manages the cluster resources, schedules all user jobs, and holds the block metadata of the distributed filesystem. On a busy cluster running a heavy workload, the master uses more memory and CPU resources. Thus, a master node based on a mobile device is subject to resource scarcity and bottlenecks in processing received data in a timely fashion.
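The heartbeat-timeout mechanism can be sketched in a few lines (illustrative Python, not Hadoop's JobTracker code; only the 10-minute default threshold comes from the text, and the slave names are invented):

```python
# Minimal sketch of heartbeat-based failure detection: the master
# marks a slave failed once no heartbeat has arrived within the
# timeout threshold (10 minutes by default in Hadoop).

TIMEOUT_S = 600.0  # Hadoop's 10-minute default threshold

class Master:
    def __init__(self):
        self.last_heartbeat = {}   # slave id -> time of last heartbeat

    def heartbeat(self, slave, now):
        self.last_heartbeat[slave] = now

    def failed_slaves(self, now):
        return [s for s, t in self.last_heartbeat.items()
                if now - t > TIMEOUT_S]

master = Master()
master.heartbeat("slave-1", now=0.0)
master.heartbeat("slave-2", now=0.0)
master.heartbeat("slave-1", now=500.0)   # slave-2 goes silent
print(master.failed_slaves(now=700.0))   # ['slave-2']
```

The sketch makes the failure mode on wireless links visible: a slave that is merely delayed (its heartbeats lost or late because the channel is saturated by the Shuffle phase) is indistinguishable from one that has crashed, so the master declares it failed and triggers costly reruns.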
When the master has not received an expected progress update from a slave node for the timeout threshold, it arranges for all the Map tasks that were scheduled on the failed node, whether completed or not, to be rerun, since intermediate output residing on the node may not be accessible to the Reduce tasks. Consequently, these failures and reactions lead to a significant increase in job execution time. Therefore, it is critical to mitigate the effect of timeout occurrences in Hadoop mobile clusters, where the chance of particular node failures and communication problems is comparatively high.

Chapter 4
Problem statements and research questions

Most current distributed systems, including Hadoop, employ the Transmission Control Protocol (TCP) for reliable communications between cluster nodes. The performance of mobile distributed processing largely relies on how effectively each mobile device exploits the available network resources through TCP connections, as outlined earlier. Despite advances in mobile technologies, mobile devices still face significant limitations on transmitting and receiving the reliable TCP data streams required to avoid any interruptions while performing distributed analytics.

4.1 Limitations on TCP performance over mobile devices

Mobile devices use a wireless channel as a transmission medium. Unlike wired networks, the time-varying condition of the wireless channel is a dominant cause of packet loss. TCP proposals mostly designed for wired networks are unable to react adequately to packet loss due to channel noise, fading, or interference, since they assume the only source of packet loss is congestion [20]. The random packet loss in the wireless channel makes it difficult for mobile nodes using one of those proposals (e.g., TCP CUBIC [25] in Android OS based on the Linux kernel) to estimate the available channel bandwidth and achieve optimal TCP throughput.
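The misinterpretation of random channel loss as congestion can be illustrated with a toy additive-increase/multiplicative-decrease (AIMD) model. This is a deliberately simplified sketch, not the actual CUBIC algorithm: each random loss halves the congestion window even though no congestion exists, so the average throughput drops well below the link capacity.

```python
import random

def aimd_throughput(loss_prob, rounds=10000, capacity=100, seed=1):
    """Toy AIMD model: grow the window by 1 each round, halve it on a loss.
    Every loss is treated as a congestion signal, even if it is random
    channel loss. Returns the average window, a proxy for throughput."""
    rng = random.Random(seed)
    cwnd, total = 1.0, 0.0
    for _ in range(rounds):
        if rng.random() < loss_prob:
            cwnd = max(1.0, cwnd / 2)        # loss interpreted as congestion
        else:
            cwnd = min(capacity, cwnd + 1)   # additive increase up to capacity
        total += cwnd
    return total / rounds
```

With no loss the average window approaches the link capacity; with even a 5% random loss rate the achieved value collapses, although the channel itself is never congested.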
In addition, most wireless protocols allow wireless devices to share the same channel through contention-based media access control (MAC), which includes processes for initiating a new transmission, determining the channel state (e.g., available or unavailable), and managing retransmissions in the event of a busy channel or data loss, and which has several limitations. If many nodes attempt to transmit data at the same time, for example, a substantial number of collisions may occur and result in lowering the available bandwidth. Without pre-coordination, it is hard to prioritize data flows and prevent unfair transmissions. Not many studies have been made on the TCP performance of mobile distributed analytic applications under these practical constraints.

The IEEE 802.11 standard for WLANs [58] defines several Physical-layer (PHY) data rates (e.g., most recent mobile devices supporting IEEE 802.11n [53] use eight data rates: 6.5, 13, 19.5, 26, 39, 52, 58.5, and 65 Mbps) to provide more robust communication by falling back to a lower rate in the presence of a high noise level, where a rate adaptation algorithm in the MAC layer makes a runtime prediction of changes in the channel condition and selects the most appropriate PHY rate. Although the PHY rate change is critical to TCP performance, the cross-layer interaction between TCP flow control and MAC rate adaptation is yet to be thoroughly investigated [59]. For example, a problematic issue arises when the rate adaptation algorithm aggressively and rapidly reduces the PHY rate due to short-term degradation of channel quality (e.g., 65 → 52 Mbps: 20% decrease; 52 → 26 Mbps: 50%). TCP reacts to the sudden PHY rate reduction but needs a substantial amount of settling time to converge to a stable rate by updating its congestion window size corresponding to the PHY rate after detecting packet losses.
In the event that frequent rate changes occur in the PHY layer, it is hard to utilize the available bandwidth to the fullest extent using TCP. Moreover, the TCP performance may drastically deteriorate if inappropriate PHY rates are selected unnecessarily.

4.2 Problems of using mobile devices for mobile cloud

Some low-cost smartphones and tablets continue to have resource limitations compared to traditional PCs and laptops in spite of the advances made in their hardware capabilities. Their wireless capability, in particular, is limited by several factors, including power-saving operations (which may result in lower communication quality and intermittent connectivity), form factor constraints (which involve challenges in antenna implementation and placement), and minimal production costs (which bring about small network buffer/queue sizes due to low memory capacity), all of which subject them to throughput reduction and fluctuating performance in wireless communications [60]. Moreover, when an application on the receiver is not able to process incoming TCP packets as fast as senders transmit them due to a lack of processing resources, the receiver sets the TCP flow limit by reporting a decreased receive window size (e.g., no buffer space available in the worst case). As a result, the sender's transmission will eventually be bounded by the receiver's processing rate. Thus, the processing capability of a mobile device potentially becomes a significant performance factor (i.e., a TCP transmit rate bound) in TCP data communications when the mobile device experiences resource scarcity in processing data.

Furthermore, most mobile devices are generally optimized to improve receive performance, which can be seen when looking into the mobile OS kernel and network drivers.
For example, mobile devices have an asymmetric resource scheduling (or distribution) scheme for transmitting and receiving data, where the mobile kernel allocates more resources to speed up processing of data frames on arrival, and only the minimum number of frames necessary to acknowledge the received frames is scheduled for transmission while receiving data. In addition, the mobile OS does not alert the user to runtime errors of its wireless kernel or hardware faults, nor does it display information about the internal problems directly, which makes it difficult to identify critical performance problems and improve the performance of mobile applications. Besides, it is not an easy task to customize the OS kernel and wireless driver of mobile devices for variable operating environments, even though the mobile OS is open-source. Hence, the network (or application) performance observed on mobile devices may not be optimal, and it is hard to find out the performance bound.

4.3 Research questions

This work studies the advantages and challenges of utilizing mobile devices for distributed analytics by showing its feasibility and conducting performance analysis. The empirical study based on experiments and simulations focuses on how to build the mobile ad hoc cloud by clustering with nearby mobile devices to reliably support practical distributed analytics. The following questions are addressed in this work:

- Is a typical distributed software framework for cloud computing capable of effectively supporting mobile distributed analytics?

- What are the problems of implementing the distributed analytic framework on mobile devices?

- How can the performance of distributed analytics be evaluated on practical mobile clusters?

- What are the limitations in enabling mobile devices to offload portions of the workload to remote computing resources and share their resources for distributed processing?
- How efficiently can the controller node initiate distributed analytics using dynamic mobile cloud resources under the time-varying operating environment?

- In what ways is the mobile cluster able to mitigate the effect of frequent task failures, usually caused by hardware/software faults (or slow-running tasks) and communication problems, while supporting large complex computations and long-running processes for distributed analytics?

- How can reliable data communications between mobile devices for analytical data transfers in the workflow of distributed analytics be guaranteed under the limitations of TCP performance over wireless links?

- What is the best way to control TCP flows on mobile devices to improve the performance of mobile distributed analytics?

To resolve these questions, this work has developed a testbed of mobile distributed platforms and a simulator for mobile cloud clusters, conducting performance analysis to identify performance issues as presented in the previous chapter, and it proposes adaptive TCP flow control algorithms for enhanced analytic performance, validating the proposed solutions through extensive experiments in the following chapters.

Chapter 5
Adaptive TCP flow control for mobile clusters

Through the performance analysis of Hadoop mobile clusters, this work has focused on two major traffic patterns: 1) between Map and Reduce tasks and 2) between master and slave nodes. It identifies two performance issues that cause frequent performance degradation in mobile distributed analytics with MapReduce. Specifically, overflows of the MAC-layer transmit queue on the slave nodes interrupt long-lived analytical data flows required for data aggregation, and overflows of the TCP receive buffer on the master node prevent timely updates of task progress and resource status for job monitoring and scheduling. This chapter introduces adaptive TCP flow control algorithms for mitigating these performance issues.
5.1 Queueing level control for transmit queue overflow

Most of the previous TCP proposals for both wired and wireless environments (e.g., [24] and [25]) neglect a major influence of MAC-layer packet loss due to transmit queue overflows. Neither do they consider any round-trip time (RTT) variations due to queueing delays, transmission rate changes, and retransmissions (transmit retries) in the MAC layer, while emphasizing the effect of packet loss resulting from network congestion or channel fading. They commonly assume a MAC-layer transmit queue with infinite capacity and zero delay (i.e., no queue dynamics). In practice, the queue's behavior on mobile devices can have a significant impact on TCP performance.

5.1.1 Analysis of MAC-layer transmit queue overflow

Once a TCP connection is established, TCP decides its data rate based on Transport-layer information only. If the sender's congestion window is not full and the receiver has a sufficient receive buffer for the connection, TCP packets are continuously injected into a transmit queue of the lower network layer (e.g., the MAC-layer transmit queue) until the window size is fully used or there are no more packets in the Transport layer. Meanwhile, the limitations on TCP performance over mobile devices (e.g., time-varying channel condition, contention-based channel access, PHY transmit rate change, power-saving operation, etc.) can prevent mobile nodes from sending out packets at the same rate as the TCP packet injection (i.e., the inflow of the transmit queue). Thus, the queueing delay for transmission increases and the queued packets can exceed the limited capacity of the transmit queue. With a simple queue management algorithm (e.g., tail drop), when the queue is filled to its maximum capacity, newly arriving packets are dropped until the queue has enough space to accept incoming packets.

Figure 5.1: TCP packet transmitting process of mobile nodes
The loss of packets causes the TCP sender to move into congestion avoidance phases, which reduces the throughput of the TCP session.

Figure 5.1 illustrates how mobile nodes process TCP transmit packets. Consider a wireless single-hop connection consisting of a set of nodes. When one node transfers large data to the other node using TCP, the Transport-layer TCP of the sender releases packets to the MAC-layer transmit queue at the TCP transmit rate R_tx^tcp(t), which corresponds to the inflow rate of the queue at time t. If there is an influx of packets in the queue, the MAC protocol sends out packets at the Physical-layer transmit rate R_tx^phy(t) selected by a rate adaptation algorithm. Table 5.1 presents the input parameters of the TCP transmit analysis.

Table 5.1: Parameters for transmit queue overflow analysis

  Notation       Parameter
  R_tx^tcp(t)    TCP transmit rate at time t
  R_tx^phy(t)    Physical-layer transmit rate at time t
  R*_tx(t)       Actual throughput of wireless link at time t
  P_tx^ch        Probability of channel loss
  P_tx^q         Probability of queueing loss
  Q_tx^mac(t)    Size of MAC transmit queue at time t
  q_max          Maximum queue capacity

Let R*_tx(t) be the actual throughput of the wireless link, which is equal to or lower than the available network bandwidth because some frames damaged or lost due to the varying channel condition, with a packet loss probability P_tx^ch, may need to be retransmitted. And let Q_tx^mac(t) denote the size of the transmit queue (i.e., the number of packets in the queue).
Using the notations above, the following differential equation representing the queue's behavior can be derived at a given time t:

  dQ_tx^mac(t)/dt = { 0,                           if R_tx^tcp(t) ≤ R_tx^phy(t)
                    { R_tx^tcp(t) − R_tx^phy(t),   otherwise                       (5.1)

Thus, R*_tx(t) is given by

  R*_tx(t) = { R_tx^phy(t) − P_tx^ch · R_tx^phy(t),   if Q_tx^mac(t) > 0
             { R_tx^tcp(t) − P_tx^ch · R_tx^tcp(t),   if Q_tx^mac(t) = 0           (5.2)

As R_tx^tcp(t) is bounded by R_tx^phy(t), which adapts to the channel condition through rate adaptation, both R_tx^tcp(t) and R_tx^phy(t) should converge to R*_tx(t) in the stable state:

  R_tx^tcp(t) ≈ R_tx^phy(t) ≈ R*_tx(t)                                             (5.3)

Table 5.2: Variables for transmit queueing level control algorithm

  Identifier     Variable
  tx_qlen        Size of transmit queue
  rx_qlen        Size of receive queue
  fc_threshold   Threshold for queue inflow control
  tx_queueing    Queueing status (set true when queueing data; otherwise false)
  tx_bound       Max transmit frames in scheduling
  max_frames     Max limit of processing frames in one scheduling
  rx_pending     Receive status (set true when receiving data; otherwise false)

In reality, there is a high probability of packet loss caused by the poor quality of the wireless channel in the mobile environment. When the rate adaptation algorithm aggressively reduces R_tx^phy(t) due to a short-term increase of channel loss, TCP reacts to the sudden PHY rate reduction but needs a significant amount of settling time to find the stable rate R*_tx(t) by updating the congestion window corresponding to R_tx^tcp(t) many times. During this time lag, R_tx^tcp(t) exceeds R_tx^phy(t). Thus, the number of packets filling the queue increases:

  dQ_tx^mac(t)/dt = R_tx^tcp(t) − R_tx^phy(t) > 0                                  (5.4)

Moreover, mobile nodes share the same channel based on contention-based access. When the channel is busy as many nodes attempt to communicate, the window of transmit opportunity (i.e., the transmit time slot) for each node narrows.
If such delays in queueing and transmitting last during an interval (t, t+Δ) and the transmit queue has a limit q_max on its capacity, a queue overflow that may lead to packet loss from TCP retransmission timeout (RTO), with a probability P_tx^q, will occur as follows:

  ∫ from t to t+Δ of (dQ_tx^mac(t)/dt) dt ≥ q_max                                  (5.5)

Algorithm 1: Transmit queueing level control

   1  if tx_qlen > 0 then
   2      if tx_queueing = true then              /* TCP flow control */
   3          if tx_qlen ≥ fc_threshold then
   4              Stop TCP packet injection;
   5          end
   6      else
   7          if tx_qlen < fc_threshold then
   8              Restart TCP packet injection;
   9          end
  10      end
  11      if rx_pending = false then              /* MAC resource scheduling */
  12          tx_bound ← max_frames;
  13      else
  14          if rx_qlen = 0 then
  15              tx_bound ← max_frames / 2;
  16          end
  17      end
  18  end

5.1.2 Transmit queueing level control algorithm

To avoid the transmit queue overflow, TCP first needs to regulate its outflow rate (i.e., the inflow rate of the transmit queue) according to the available queue space (i.e., the queueing level). As the current network protocol architecture, which follows strict layering principles, does not provide any interface for coordination, interaction, or optimization between network protocols of different layers, a modification of the existing layered protocols is necessary to implement cross-layer flow control between the Transport and MAC layers.

In addition, mobile devices are usually optimized to improve receive performance, where more resources are statically assigned to make the processing of received frames faster. If there is no incoming data frame, however, all available resources can be used to transmit packets, reducing the queueing delay. Even when there are incoming frames, the resources for processing receive and transmit data can be dynamically scheduled according to the size of the data to be processed. Algorithm 1 presents the steps of the cross-layer flow control and dynamic resource scheduling for stabilizing the queueing level, which are performed on the TCP sender node.
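The flow-control branch of Algorithm 1 can be sketched as a small discrete-time simulation. This is an illustrative model rather than the kernel implementation; the packet counts and rates are arbitrary. Injection stops once the queue length reaches fc_threshold and resumes when the queue drains below it, so the queue never overflows.

```python
def simulate_queueing_control(in_rate, out_rate, steps, q_max=256, fc_threshold=192):
    """Discrete-time model of the MAC transmit queue under cross-layer flow control.
    in_rate / out_rate: packets per step injected by TCP / drained at the PHY rate.
    Returns (peak queue length, packets dropped by tail drop)."""
    qlen, peak, dropped, injecting = 0, 0, 0, True
    for _ in range(steps):
        if injecting:
            accepted = min(in_rate, q_max - qlen)  # tail drop beyond q_max
            dropped += in_rate - accepted
            qlen += accepted
        peak = max(peak, qlen)
        qlen = max(0, qlen - out_rate)             # MAC drains the queue
        injecting = qlen < fc_threshold            # stop/restart TCP injection
    return peak, dropped
```

With the threshold in place (e.g., in_rate=20, out_rate=10) the queue oscillates around fc_threshold with no tail drops; setting the threshold equal to q_max, i.e., no flow control, reproduces the overflow losses analyzed above.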
Table 5.2 defines variables for monitoring the queue usage, thresholds for initiating the flow control, and parameters for scheduling network resources. If the transmit queue fills and the queue length exceeds the predefined threshold, TCP stops injecting packets into the queue. When the queueing level stabilizes, TCP restarts queueing. Then, it checks the receiving data flows. If there is no overhead in processing incoming frames, it uses more resources for transmitting the packets in the queue.

5.2 TCP push flow control for receive buffer overflow

Previous research has mostly focused on the partition and aggregation pattern of distributed systems in data center networks, where the input data is partitioned and sent to worker nodes, and the output data of each worker node is transmitted to an aggregation node; however, another traffic pattern, between the controller and worker nodes, also has a significant impact on the performance of the distributed computing system. Even if the worker nodes complete all assigned sub-workloads quickly, the controller's final response to the user's workload request can be delayed if there is a bottleneck in reporting sub-workload completion and transmitting metadata for the final result to the controller. Thus, the response time is substantially affected by the network performance between the controller and worker nodes as well.

5.2.1 Analysis of TCP receive buffer overflow

Every slave node composing the Hadoop cluster transmits two types of packets (heartbeat packets for quick status updates and data packets for operational data transfers) to the master node, the only controller of the Hadoop framework. For the MapReduce operation, each node runs a simple loop that periodically sends a heartbeat packet to the master.

Figure 5.2: Data flows between master and slave nodes
The heartbeat packets contain timely information about the node state (e.g., statistics of resource utilization) and the progress of running tasks, which enables the master to monitor task status and restart a failed or slow task rapidly. The master combines these updates to produce a global view of the status of all the jobs being run and their sub-tasks. There are also other sporadic operational data transfers in this traffic pattern, using data packets for task assignments and task completion reports. In addition, each node communicates with the master using the same kinds of packets for the distributed filesystem (i.e., HDFS) operation, as depicted in Figure 5.2.

The Hadoop software architecture distinctively implements the heartbeat updates using the TCP push function to force an immediate data transfer (without data buffering). When this function is invoked, TCP creates a packet that contains the application data and transmits it right away with the PSH control flag bit set to 1. The PSH flag informs the receiving host that the data should be pushed up to the receiving application immediately. This function was originally designed for time-sensitive TCP connections (e.g., Telnet, HTTP GET requests, etc.) and is also useful for TCP-based real-time applications. Figure 5.3 illustrates how the data flow of Hadoop with normal TCP packets differs from the heartbeat flow with TCP push packets.

Meanwhile, current window-based TCP congestion/flow control algorithms (e.g., [25], [24]) do not effectively deal with a fast data flow consisting of TCP push packets. Because TCP push packets, unregulated by the typical window-based (or buffer-based) control scheme, have a high processing priority, a data flow with excessive TCP push packets can exacerbate the congestion level of the TCP connection or the processing overhead of inflowing packets stored in TCP receive buffers.
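The immediate-transfer behavior of such a heartbeat can be approximated from user space as follows. This is a hypothetical sketch, not Hadoop's implementation (Hadoop's heartbeat is a Java RPC), and the PSH flag itself is set by the kernel, not the application; disabling Nagle's algorithm with TCP_NODELAY merely keeps the small status packet from being buffered before transmission. The length-prefixed framing is an illustrative choice so the receiver can split updates on the stream.

```python
import json
import socket
import time

def send_heartbeat(sock, node_id, running_tasks):
    """Send one heartbeat-style status update for immediate delivery.
    The kernel sets PSH on the segment completing this send; TCP_NODELAY
    prevents Nagle's algorithm from delaying the small packet."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    payload = json.dumps({
        "node": node_id,
        "timestamp": time.time(),
        "tasks": running_tasks,   # e.g., per-task progress fractions
    }).encode()
    # 4-byte length prefix so the master can delimit updates on the stream
    sock.sendall(len(payload).to_bytes(4, "big") + payload)
```

A slave-side loop would call send_heartbeat every few seconds, matching the periodic update pattern described above.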
For example, TCP push packets account for about 30 percent of the traffic to the master node coordinating 10 slave nodes in the Hadoop cluster running the TeraSort workload with 1 GB input data; the master receives around 30 TCP push packets per second. Thus, the data flows between the master and slave nodes have a non-negligible effect on the performance.

Figure 5.3: Comparison of Hadoop data and heartbeat flow ((a) data flow; (b) heartbeat flow)

In the performance analysis, frequent TCP receive buffer overflows on the master node are observed (i.e., the master node advertises a TCP zero receive window) while running the Hadoop benchmarks. When a receiver (i.e., the master node) is unable to process the TCP packets in the receive buffer quickly, the receiver reports decreasing values of the receive window, and the sender (i.e., a slave node) transmits only a portion of data that does not exceed the receive window. If TCP Zero Window is reported by the receiver, the sender immediately stops its transmission on the corresponding TCP session. Thus, every Zero Window advertisement incurs a substantial throughput penalty and transmission delay on the TCP connection due to the significant amount of time taken for the receive window to be re-opened to the normal window size [60].

Figure 5.4: TCP packet receiving process of master node

Figure 5.4 demonstrates how the master node processes TCP receive packets. Consider one of the TCP sessions between the master and slave nodes. The master receives packets from a slave at the rate R*_rx(t), and the application on the master processes the packets at the rate R_rx^app(t). When the master has a bottleneck in processing the received packets, the TCP receive buffer begins to fill. Let rwnd_i(t) denote the size of TCP receive buffer i at time t. Table 5.3 presents the input parameters of the TCP receive analysis.

Table 5.3: Parameters for receive buffer overflow analysis

  Notation      Parameter
  R*_rx(t)      Physical-layer receive rate at time t
  R_rx^app(t)   Application's processing rate at time t
  P_rx^b        Probability of packet loss due to buffer overflow
  rwnd_i(t)     Size of TCP receive buffer i at time t
  rwnd_max      Maximum buffer capacity

The following equation describing the behavior of the receive buffer can be derived at a given time t. If there are no substantial MAC-layer receive buffering delays,

  drwnd_i(t)/dt = { 0,                         if R*_rx(t) ≤ R_rx^app(t)
                  { R*_rx(t) − R_rx^app(t),    otherwise                   (5.6)

Since a master node based on a mobile device is subject to resource scarcity while managing its slave nodes, monitoring job progress, and updating the entire image of the distributed filesystem, there is a higher possibility of processing bottlenecks. For example, the master of Hadoop runs two major controller modules, JobTracker for MapReduce and NameNode for HDFS, and maintains at least 2n concurrent receive buffers for the TCP connections between the master node, with its 2 controllers, and the n slave nodes (as well as the same number of TCP transmit buffers for bidirectional TCP flows), as depicted in Figure 5.4. In addition, each slave node periodically sends heartbeats that need to be processed as soon as they arrive; TCP pushes them to the application immediately. The TCP push packets aggravate the processing overhead while the master is busy handling many requests. Consequently, R*_rx(t) frequently exceeds R_rx^app(t) on the master node.
Thus, the receive buffer size (the number of TCP data packets to be processed in the receive buffer) increases:

  drwnd_i(t)/dt = R*_rx(t) − R_rx^app(t) > 0                               (5.7)

When such a bottleneck in processing received packets lasts during the interval (t, t+Δ) and the TCP receive buffer has a limit rwnd_max on its capacity, a receive buffer overflow that leads to packet loss from TCP retransmission timeout (RTO), with a probability P_rx^b, may occur as follows:

  ∫ from t to t+Δ of (drwnd_i(t)/dt) dt ≥ rwnd_max                         (5.8)

As a result, the receiver's processing rate R_rx^app(t) eventually slows down the sender's transmission rate R_tx^tcp(t) and limits the actual throughput R*_tx(t) of the TCP connection. From (5.3), we have

  R_tx^tcp(t) ≈ R_tx^phy(t) ≈ R*_tx(t) ≈ R*_rx(t) ≈ R_rx^app(t)           (5.9)

5.2.2 TCP push flow control algorithm

To avoid the receive buffer overflow on the master node, which results in interruption of the reception of the latest status updates and task reports, a modification of the TCP congestion control is needed, because the current TCP implementation does not include any performance consideration or sophisticated exception handling for the TCP push function. This work has reviewed the Linux kernel code for TCP, focusing on TCP push packet handling, and identified a problem in round-trip time (RTT) estimation, which is essential for TCP congestion window control and retransmission timeout (RTO) estimation.

The TCP sender measures the RTT of the network path, which includes the propagation delays of the network channels, queueing delays of network devices, and processing delays at the receiver. The TCP implementation estimates the RTT of the path using new RTT samples and updates the RTO based on the estimated RTT. If the RTO is underestimated due to incorrect RTT estimates, TCP packets can be retransmitted excessively, which also initiates TCP congestion avoidance procedures unnecessarily. Thus, the RTT measurement is one of the most important parameters for TCP performance.
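The standard RTT/RTO estimator (Jacobson's algorithm with the RFC 6298 constants, sketched below in Python as an illustration) shows why short push-stream samples are harmful: feeding it artificially short RTT samples drags the smoothed RTT, and therefore the RTO, downward.

```python
class RttEstimator:
    """RTT estimation and RTO computation per RFC 6298 (illustrative sketch)."""

    def __init__(self, rto_min=1.0):
        self.srtt = None       # smoothed round-trip time
        self.rttvar = None     # round-trip time variance
        self.rto_min = rto_min
        self.rto = rto_min

    def sample(self, rtt):
        """Fold one RTT measurement into SRTT/RTTVAR and update the RTO."""
        if self.srtt is None:                 # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:                                 # alpha = 1/8, beta = 1/4
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        self.rto = max(self.rto_min, self.srtt + 4 * self.rttvar)
        return self.rto
```

If normal data packets on the path take about 0.8 s but a burst of push-stream ACKs contributes many 0.05 s samples, the RTO collapses well below the RTT of the normal packets, so those packets are retransmitted spuriously, which is exactly the underestimation problem identified above.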
In the Hadoop cluster, the master node responds (acknowledges) to each heartbeat (a TCP push packet) using the same type of packet, i.e., every heartbeat call incurs a bidirectional TCP push stream. As a TCP push packet is passed directly to the application and its acknowledgment is transmitted without buffering on the receiver, the TCP push stream has a shorter processing delay than the normal TCP data stream, as shown in Figure 5.3. If the short RTTs of the push stream are frequently used as samples for RTT estimation, excessive retransmissions of many normal TCP packets will occur after the RTO is underestimated according to the incorrect RTT estimates. Also, the fast acknowledgements of the push packets increase the sender's congestion window quickly. As a result, the sender (i.e., a slave node) transmits too many packets, overwhelming the available network bandwidth and/or the processing capability of the receiver (i.e., the master node).

Table 5.4: Variables for TCP push flow control algorithm

  Identifier         Variable
  fc_mode            Flow control status (set true when enabled; otherwise false)
  zero_window        TCP Zero Window reception (set true when advertised by master; otherwise false)
  psh_acked          TCP push ACK reception (set true when acknowledged; otherwise false)
  delayed_ack        TCP delayed ACK status (set true when enabled; otherwise false)
  cwnd               TCP congestion window
  packets_in_flight  Unacknowledged TCP data in flight

Algorithm 2 includes the TCP flow control procedure for mitigating receive buffer overflows on the master node, which is performed on the slave nodes. Table 5.4 defines variables for monitoring the Zero Window events, thresholds for initiating the flow control, and parameters for the TCP control. If Zero Window is advertised by the master node, the slave node initiates the TCP push flow control mode.
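The per-ACK behavior of this mode can be sketched as follows. This is an illustrative model of the control flow, not the actual kernel patch; the connection-state fields and the update_rtt callback are hypothetical stand-ins for the kernel's receive-path steps.

```python
from dataclasses import dataclass

@dataclass
class TcpState:
    cwnd: int                 # congestion window (segments)
    packets_in_flight: int    # unacknowledged data in flight
    fc_mode: bool = False     # push flow control enabled?

def on_ack(state, psh_acked, zero_window, update_rtt):
    """Per-ACK processing under the push flow control scheme.
    update_rtt(sample_ok) stands in for the kernel's RTT/RTO/ATO update path."""
    if zero_window:                       # master advertised Zero Window
        state.fc_mode = True              # initiate the flow control mode
    if state.fc_mode and psh_acked:
        update_rtt(False)                 # skip RTT update and RTO/ATO estimation
        # clamp cwnd to the data actually in flight to stop the window outburst
        state.cwnd = min(state.cwnd, state.packets_in_flight)
    else:
        update_rtt(True)                  # normal RTT sampling
    return state
```

In this sketch, push-stream acknowledgements no longer contaminate the RTT estimate once a Zero Window event has been seen, and the congestion window is moderated to the actual amount of outstanding data.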
In this mode, the TCP control skips the specific receive steps of the RTT update and the RTO estimation that use RTT samples from TCP push streams; it also bypasses the step for delayed ACK timeout (ATO) estimation if delayed ACKs are enabled. Then, it moderates the congestion window by taking the smaller of the congestion window size and the amount of unacknowledged data in flight (i.e., the actual bandwidth limit of the connection), in order to prevent an outburst from the fast window growth caused by the quick acknowledgements of the push packets. This algorithm is implemented in the TCP protocol of the experimental platform.

Algorithm 2: TCP push flow control

   1  if zero_window = true then                /* Initiate flow control */
   2      fc_mode ← true;
   3  end
   4  if fc_mode = true then                    /* TCP push flow control */
   5      if psh_acked = true then
   6          Skip RTT update;
   7          Skip RTO estimation;
   8          if delayed_ack = true then
   9              Skip ATO estimation;
  10          end
  11          cwnd ← min(cwnd, packets_in_flight);
  12      end
  13  end

Chapter 6
Evaluation of proposed solutions

This chapter presents the performance evaluation of the two proposed solutions: the transmit queueing level control algorithm for avoiding MAC-layer transmit queue overflows on the slave nodes, and the TCP push flow control algorithm for mitigating TCP receive buffer overflows on the master node.

6.1 Transmit queueing level control algorithm

To avoid frequent MAC-layer transmit queue overflows on the mobile cluster nodes, which interrupt the long-lived analytical data flows required for the partition and aggregation workflow of distributed processing frameworks, Algorithm 1, performed on the TCP sender, includes the cross-layer flow control for TCP packet injection and dynamic network resource scheduling for incoming and outgoing frames to minimize the queueing delay and stabilize the queueing level. The algorithm is implemented in the TCP and MAC protocols of the Linux-based Android OS kernel on the experimental mobile platform (Google Nexus 7).
6.1.1 Performance improvement of peer-to-peer data transfer

First, the TCP communication performance between two mobile devices was measured in order to clarify the cause of performance degradation during large data transfers. One device was configured as a transmitter and the other as a receiver. A network performance measurement tool, Iperf, was employed to investigate communication performance by generating constant TCP traffic. In addition, several scripts were developed to control continuous Iperf test runs and collect performance measurements (e.g., TCP throughput) by configuring the Iperf test parameters according to each experimental setup. The scripts also collected TCP and link statistics (e.g., TCP congestion window, transmit queue size, transmission rate, retransmissions, and transmission failures) from the Linux kernel and its wireless driver. As the performance of a wireless link is usually affected by the surrounding channel conditions, the experimental results may exhibit variations. To eliminate this effect, the communication performance was measured at a 10-foot range with an unobstructed line of sight, where all measurements reported negligible differences. Figure 6.1 presents the evaluation setup for the transmit queueing level control.

Figure 6.1: Evaluation setup for transmit queueing level control

Figure 6.2 indicates how the frequent transmit queue overflows affect the TCP performance between two mobile nodes. Three different queue capacities of the transmitter with the original protocol stack were tested: a smaller queue of 128 packets, the default of 256 packets, and a larger queue of 512 packets. It is found that queue overflows are observed even with the larger queue, and they cause frequent TCP throughput collapses with constant adjustments of the TCP congestion window due to the transmit queueing losses. Figure 6.3 also demonstrates how the proposed algorithm improves the TCP performance.
The same transmit queue sizes are applied to the transmitter with the modified protocol stack. As the algorithm implements TCP flow control and dynamic resource scheduling for reliable data communications, the two mobile nodes achieve better TCP throughput with a stable congestion window at all of the different queue capacities.

[Figure 6.2: TCP performance of mobile nodes with different MAC queue sizes. Three panels (max queue: 128, 256, and 512 packets), each plotting the Iperf TCP throughput (Mbps), the TCP congestion window (cwnd and ssthresh, in segments), and the transmit queue length (packets) over a 60-second run.]

[Figure 6.3: TCP performance with modified network stack. Same three-panel layout as Figure 6.2.]

[Figure 6.4: TCP throughput of original and modified network stack (Mbps) over a 100-second run.]

Note that the transmit queue size should be optimized with other communication protocols in use (e.g., UDP) taken into account, even though a small queue size works well with the adaptive TCP flow control algorithm.

6.1.2 Performance improvement in data aggregation

Next, three mobile devices were configured to generate aggregate data traffic similar to the data flow pattern between the Map and Reduce nodes; see Figure 6.1. Two devices simultaneously transmit a constant TCP stream to one device. To compare the TCP performance of the original and modified protocol stacks, one transmitter was configured with the original stack and the other with the modified stack that implements the algorithm. The TCP performance is measured in the same way as in the prior experiments using the Iperf scripts.

Figure 6.4 shows a significant improvement in TCP performance. During the 100-second test with the original stack, zero throughput due to transmit queue overflows is observed twice, at 13 and 75 seconds. In contrast, the transmitter with the modified stack sends TCP data reliably without throughput collapse, as indicated in the figure, and its average throughput (9.82 Mbps) is significantly higher than that of the original stack (6.18 Mbps).

Although the proposed algorithm mitigates the performance problem, the critical values for the flow control threshold (or the transmit queue size) and the resource scheduling parameters should be carefully determined in consideration of other performance factors (e.g., memory utilization, network conditions, and other communication protocols) and the operating conditions (or environments).
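The cross-layer control evaluated above, which throttles TCP based on the MAC-layer transmit queue level, can be sketched as follows. This is a minimal illustration under assumed threshold ratios; the class name, thresholds, and hysteresis scheme are hypothetical, not the kernel implementation measured in these experiments.

```python
class TransmitQueueController:
    """Hypothetical sketch of cross-layer TCP flow control driven by the
    MAC-layer transmit queue level (not the dissertation's kernel code)."""

    def __init__(self, max_queue=256, high_ratio=0.75, low_ratio=0.25):
        self.max_queue = max_queue
        self.high = int(max_queue * high_ratio)   # pause TCP above this level
        self.low = int(max_queue * low_ratio)     # resume TCP below this level
        self.paused = False

    def on_queue_level(self, qlen):
        """Return 'pause' / 'resume' / 'ok' for the current queue length,
        using hysteresis so the control decision does not oscillate."""
        if not self.paused and qlen >= self.high:
            self.paused = True
            return "pause"      # throttle TCP before the queue overflows
        if self.paused and qlen <= self.low:
            self.paused = False
            return "resume"     # backlog drained; let TCP transmit again
        return "ok"
```

With a 256-packet queue and these assumed ratios, TCP transmission would be paused at 192 queued packets and resumed once the backlog drains below 64, keeping the queue level away from overflow while avoiding starvation of the link.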
Thus, future work should continue to refine the algorithms, validating them through extensive experiments using various benchmarks and sample applications.

6.2 TCP push flow control algorithm

In order to prevent TCP receive buffer overflows on the master node caused by the use of TCP push packets, which interrupt the reception of the latest status updates and task reports from the slave nodes, Algorithm 2, performed on the slave nodes, includes TCP push flow control for mitigating the side effects of TCP push flows, namely excessive transmissions caused by fast congestion window growth and recurrent RTO underestimation. It moderates the congestion window outburst and bypasses the receive steps of the RTO estimation that use RTT updates from the TCP push streams when a buffer overflow is detected. The algorithm is implemented in the Linux TCP protocol of the experimental Android platform.

6.2.1 Packet analysis of Hadoop master running TeraSort

For the evaluation of the algorithm, the TeraSort benchmark with 1 GB of input data is executed in the same configuration as the previous performance analysis, where the mobile cluster running the Hadoop software consists of a single master node and ten slave nodes configured with the default values for the system parameters of the Android OS and the Hadoop framework. The usage of computing and networking resources on each node is monitored with the performance monitoring tool during the experiments.

[Figure 6.5: Resource utilization of master node running TeraSort workload: CPU usage (%), memory usage (%), and storage I/O rate (KB/s, read and write) over the run time.]

[Figure 6.6: Traffic patterns of cluster nodes running TeraSort workload: receive and transmit data rates of the master node (KB/s) and of the Map and Reduce nodes (MB/s) over the run time.]

[Figure 6.7: TCP packet analysis of master node: 1 GB TeraSort task progress (Map and Reduce), TCP push flows on the master node (received and transmitted packets), and Zero Window advertisements over the run time.]

First, slave nodes with the original TCP protocol are used in the mobile cluster in order to investigate the master node's behavior under the CPU- and I/O-bound TeraSort testing. As the master node, based on a mobile device, is subject to resource scarcity in managing all the cluster nodes, monitoring their task progress and resource usage, and updating the entire image of the distributed filesystem, high memory usage and steady storage utilization of the master node are commonly observed, as shown in Figure 6.5. Figure 6.6 also displays the master's incessant short-lived traffic pattern (i.e., TCP push flows) in contrast to the slave nodes running Map and Reduce tasks during the experiment. In addition, Figure 6.7 includes the TCP packet analysis of the master node with the task progress, which indicates that the master node processes around 30 incoming and outgoing TCP push packets (i.e., heartbeats and acknowledgements) every second. As a result, the master node at times receives more heartbeat and data packets than its processing capability can handle and frequently advertises a Zero Window to halt TCP transmission; see Figure 6.7.

[Figure 6.8: Zero Window advertisements of original and modified TCP over a 100-second run.]
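The push flow control evaluated next builds on the standard RTO computation. The sketch below illustrates the bypass idea against a simplified RFC 6298-style estimator: RTT samples from push streams are ignored while the peer advertises a Zero Window, so quickly acknowledged push packets cannot drag the RTO estimate down. The class and its interface are illustrative assumptions, not the kernel patch used in these experiments.

```python
class RtoEstimator:
    """Simplified RFC 6298-style RTO estimator sketching the push flow
    control idea: RTT samples from TCP push streams are ignored while the
    peer's receive buffer overflow (Zero Window) is detected, avoiding
    recurrent RTO underestimation from quickly acknowledged push packets.
    A sketch only; the dissertation's version lives in the Linux TCP stack."""

    ALPHA, BETA = 1 / 8, 1 / 4   # standard smoothing gains (RFC 6298)

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0           # initial RTO, seconds

    def sample(self, rtt, from_push_stream=False, peer_zero_window=False):
        # Bypass the estimator for push-stream samples during overflow.
        if from_push_stream and peer_zero_window:
            return self.rto
        if self.srtt is None:    # first RTT measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:                    # subsequent measurements
            self.rttvar = (1 - self.BETA) * self.rttvar \
                          + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(1.0, self.srtt + 4 * self.rttvar)  # 1 s floor
        return self.rto
```

In this sketch, a burst of millisecond-scale RTT samples from heartbeat acknowledgements leaves the smoothed RTT and RTO untouched whenever the master is advertising a Zero Window, while ordinary data-stream samples continue to update the estimator.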
6.2.2 Performance improvement of TeraSort workload

Next, slave nodes on which the modified TCP protocol is implemented are deployed on the mobile cluster in order to evaluate the proposed algorithm. Since it prevents the RTT update and RTO estimation using the RTT samples of TCP push streams when the master node advertises a Zero Window, and moderates the TCP congestion window to avoid an outburst from the fast growth caused by the quick acknowledgements of the push packets, there are marked decreases in the Zero Window advertisements (i.e., receive buffer overflow occurrences) of the master node, as shown in Figure 6.8.

[Figure 6.9: Performance improvement of experimental mobile cluster: TCP transmissions (packets), TCP retransmission rate (%), Zero Window advertisements (count), and job completion time (s) for the original and modified protocol stacks.]

When all the cluster nodes implement the TCP push flow control algorithm along with the transmit queueing level control algorithm, it is observed that the TCP retransmission rate of the slave nodes (as well as the total number of transmitted packets) is reduced, and the overall job completion time is also expected to be shortened; see Figure 6.9.

Even though the proposed algorithm mitigates the performance problem, a significant number of buffer overflows still occur on the master node. Therefore, future work should continue to improve the proposed algorithm. For example, adaptive buffer management for many concurrent receive buffers (e.g., twice the number of slaves for the Hadoop master) is required to receive more incoming packets, considering the slow application processing rate under the resource scarcity of mobile devices.
As analyzed in this study, frequent overflows of the MAC-layer transmit queue and the TCP receive buffer on the mobile cluster nodes are the two most significant performance issues; they interrupt the reliable interchange of analytical data required for mobile distributed analytics. Since a higher frequency of overflows leads to a large number of under-utilization periods and fluctuations in TCP performance, the Hadoop mobile cluster may not be able to complete a submitted job within the expected time frame, reporting a significant number of response timeouts in the running tasks. Hence, a better TCP flow control algorithm for mobile distributed analytic applications should be proposed in order to resolve these performance problems and achieve the desired performance over variable mobile environments and critical performance limitations.

Chapter 7
Conclusion

This chapter concludes this dissertation with a summary of the previous chapters and a discussion of the main contributions and suggestions for further research.

7.1 Summary

In this work, the benefits and challenges of employing practical mobile devices for implementing a mobile ad hoc cloud were analyzed. The current trends in mobile technologies and mobile services were presented along with the different concepts of mobile cloud computing in Chapter 1. Overviews of the Hadoop distributed analytic framework and TCP flow control for performance analysis were introduced in Chapter 2, where prior efforts to develop mobile cloud services using mobile devices were also discussed, along with other useful techniques and approaches. In order to understand the performance issues of mobile distributed analytic applications, experimental performance studies were conducted by developing a test bed of mobile cloud clusters with the Hadoop analytic framework in Chapter 3; furthermore, simulation studies were conducted by developing the MapReduce simulator based on a network simulator to address extensive mobility scenarios and operating setups.
Moreover, this work identified performance issues through performance analysis. Despite advances in mobile technologies, mobile devices still contain significant imperfections in transmitting and receiving the reliable data streams required to avoid interruptions while performing distributed processing, which stem from the limitations of TCP performance over wireless networks and the problems of using mobile devices with resource constraints. The problem statements and the main goal of this study can be found in Chapter 4, where the research questions were presented. In order to address these performance issues, this work proposed two adaptive TCP flow control algorithms for mitigating the transmit queue overflows and the receive buffer overflows of mobile devices, with an analysis of TCP performance problems over the Hadoop MapReduce framework, in Chapter 5. Finally, the presented algorithms were evaluated in Chapter 6, and it was shown that the proposed solutions improve TCP performance and shorten the overall MapReduce processing time.

7.2 Discussion

The main contributions of the present work can be summarized as follows. First, this work discussed the benefits and the challenges of exploiting mobile devices for distributed cloud applications, showing the feasibility with the Hadoop analytic framework, and also investigated critical performance issues for reliable data communications between mobile devices in the workflow of distributed computing. Second, unlike earlier approaches, this work examined the performance of Hadoop mobile clusters by performing extensive experiments using typical Hadoop benchmarks representing CPU-, memory-, and/or I/O-intensive applications. The newest release of the Hadoop software framework, with its enhancements, was entirely ported to the latest Android-based mobile devices through mobile virtualization.
Third, this work developed the MapReduce simulator based on the ns-2 network simulator in order to comprehensively evaluate the performance and efficiency of mobile cloud clusters in extensive operating environments, which made it possible to identify further performance issues under different cluster (or workload) scales, dynamic node mobility, and various wireless channel conditions. Lastly, this work analyzed the TCP performance problems resulting from the distinct traffic patterns of the MapReduce-based Hadoop distributed framework and proposed adaptive TCP flow control algorithms for improving the performance of mobile cloud clusters by mitigating the effects of the frequent transmit queue and receive buffer overflows of mobile devices.

In order to improve the presented algorithms for reliable data communications between mobile cluster nodes, future work should enable the algorithms to dynamically adjust the values of the performance parameters for TCP flow control (e.g., the threshold for queue inflow control) and network resource scheduling (e.g., the number of transmit frames in scheduling) according to the operating conditions, although the current algorithms with predefined values mitigated the performance problems in the experimental setups. In addition, an adaptive buffer management technique for the many concurrent TCP receive buffers of the controller node is necessary to accommodate more incoming packets, considering the slow application processing rate of mobile devices with small buffer capacity. It is also important that the algorithms be refined and validated through extensive experiments and simulations using other distributed analytic frameworks and various real-world applications.
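As one speculative illustration of the dynamic parameter adjustment suggested above, the flow control threshold could be tuned with an AIMD-style rule: back off quickly after observed transmit queue overflows and relax gradually during overflow-free intervals. All function and parameter names below are hypothetical and not taken from this work.

```python
def adapt_threshold(threshold, overflows, max_queue,
                    decrease=0.5, increase=8, floor=16):
    """Hypothetical AIMD-style rule for tuning the flow control threshold:
    halve it after an interval with transmit queue overflows, otherwise
    raise it additively, clamped between a floor and the queue capacity."""
    if overflows > 0:
        threshold = int(threshold * decrease)   # back off fast on overflow
    else:
        threshold = threshold + increase        # probe for more headroom
    return max(floor, min(threshold, max_queue))
```

Run once per monitoring interval, such a rule would let the controller settle near the largest threshold the current operating conditions sustain without overflow, instead of relying on a single predefined value.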
While this work mostly focused on the performance of practical distributed analytic applications on mobile cloud clusters in terms of job processing time (or completion time) and response time, other studies have paid attention to the energy efficiency of mobile nodes, which is another key performance factor for enabling data analysis and mining on mobile devices. Reducing energy consumption is one of the most important design aspects for small form-factor mobile platforms, such as smartphones and tablets. Therefore, energy-aware scheduling over the mobile cluster for optimizing the energy utilization of cluster nodes should be taken into account for reliable mobile distributed analytics.