ENERGY AWARE TASK SCHEDULING ON HETEROGENEOUS SYSTEMS

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Rabab Farouk Abdel-Kader

Certificate of Approval:

Sanjeev Baskiyar, Chair, Associate Professor, Computer Science and Software Engineering
Cheryl Seals, Assistant Professor, Computer Science and Software Engineering
Levent Yilmaz, Assistant Professor, Computer Science and Software Engineering
George T. Flowers, Interim Dean, Graduate School

ENERGY AWARE TASK SCHEDULING ON HETEROGENEOUS SYSTEMS

Rabab F. Abdel-Kader

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
December 17, 2007

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
Date of Graduation

DISSERTATION ABSTRACT
ENERGY AWARE TASK SCHEDULING ON HETEROGENEOUS SYSTEMS
Rabab F. Abdel-Kader
Doctor of Philosophy, December 17, 2007
(M.S., Tuskegee University, 2002)
(B.S., Suez Canal University, 1998)
128 Typed Pages
Directed by Sanjeev Baskiyar

We consider the problem of scheduling directed a-cyclic task graphs (DAGs) on heterogeneous distributed processor systems with the twin objectives of minimizing finish time and energy consumption. Previous scheduling heuristics have assigned DAGs to processors to minimize the overall run time of applications. However, many new applications on embedded systems, such as high-performance DSP in image processing, multimedia, and wireless security, create a strong need for scheduling algorithms that lower energy consumption and yet attain good finish times. In this research, we employ dynamic voltage scaling (DVS) within the scheduling heuristics to achieve the twin objectives. The processors used can run at different discrete operating voltages; whenever they idle due to task dependencies, processors can scale down their voltages and slow down in order to reduce energy consumption. Specifically, we combine Decisive Path Scheduling (DPS) and Heterogeneous N-Predecessor Duplication (HNPD) with DVS. Using simulations, we show average energy consumption reductions of 40% over DPS and 28% over HNPD. The simulations used a large number of randomly generated DAGs with various characteristics as well as DAGs of real-world problems. Energy savings increased with an increasing number of nodes or increasing Communication to Computation Ratios (CCR), whereas they decreased with increasing parallelism (out-degree) or an increasing number of available processors. Increasing the number of nodes increases task dependencies and thus idle times, and when CCR increases, processors are idle longer due to communication between tasks; our algorithms used such idle times to achieve energy savings. An increase in out-degree resulted in smaller average energy savings, because a larger out-degree allows more processors to run in parallel, reducing idle times.
Style manual: IEEE Standard. Software used: Microsoft Word 2007, Microsoft Excel 2007, Microsoft Visual Studio 2003, and C++.

TABLE OF CONTENTS

LIST OF FIGURES ... ix
LIST OF TABLES ... xiii
1. INTRODUCTION ... 1
1.1 Contributions ... 3
2. RELATED WORK ... 5
2.1 Power Estimation Techniques ... 6
2.2 Power Optimization Techniques ... 8
2.3 Software power optimization on system level ... 9
2.4 Scheduling to lower energy consumption ... 10
3. DYNAMIC VOLTAGE SCALING ... 15
3.1 Power Management Implementation ... 19
4. DIRECTED A-CYCLIC GRAPH ... 21
4.1 Performance ... 24
5. SCHEDULING ALGORITHMS ... 28
5.1 EADAGS algorithm ... 28
5.2 EAGS-D algorithm ... 37
5.3 Voltage scaling strategies ... 46
6. RESULTS AND DISCUSSION ... 51
6.1 Results for random DAGs ... 51
6.1.1 Results for EADAGS ... 52
6.1.2 Results for EAGS-D ... 60
6.2 Results for real world problems ... 68
6.2.1 Gaussian elimination ... 68
6.2.2 Molecular dynamic code ... 79
6.2.3 Fast Fourier transforms ... 88
6.2.4 Sieve of Eratosthenes ... 98
7. CONCLUSIONS ... 108
BIBLIOGRAPHY ... 110

LIST OF FIGURES

3.1. Different operating rates for the same task ... 17
3.2. Voltage scaling graph ... 19
4.1. A sample DAG ... 21
5.1. EADAGS algorithm ... 36
5.2. Timeline for machine k ... 36
5.3. EAGS-D algorithm ... 45
5.4. Timeline for machine k (case 1) ... 45
5.5. Timeline for machine k (case 2) ... 46
5.6. Timeline for machine k (case 3) ... 46
5.7. Different finish times with different execution rates ... 49
6.1. Average power savings with respect to number of nodes ... 53
6.2. Average power savings with respect to CCR ... 55
6.3. Average power savings with respect to PNR ... 56
6.4. Average power savings with respect to shape parameter ... 57
6.5. Average power savings with respect to out-degree ... 59
6.6. Average power savings with respect to number of nodes ... 61
6.7. Average power savings with respect to CCR ... 63
6.8. Average power savings with respect to PNR ... 64
6.9. Average power savings with respect to shape parameter ... 65
6.10. Average power savings with respect to out-degree ... 67
6.11. Gaussian elimination task graph ... 71
6.12. Average power savings for Gaussian elimination algorithm with EADAGS with different number of processors ... 73
6.13. Average power savings for Gaussian elimination algorithm with EADAGS with different CCR values ... 73
6.14. Average power savings for Gaussian elimination algorithm with EAGS-D with different number of processors ... 76
6.15. Average power savings for Gaussian elimination algorithm with EAGS-D with different CCR values ... 77
6.16. Average power savings for Gaussian elimination algorithm with EADAGS and EAGS-D with voltage scaling levels ... 78
6.17. Directed acyclic graph (DAG) for a molecular dynamics code ... 80
6.18. Average power savings for molecular dynamic code with EADAGS with different number of processors ... 82
6.19. Average power savings for molecular dynamic code with EADAGS with different CCR values ... 83
6.20. Average power savings for molecular dynamic code with EAGS-D with different number of processors ... 85
6.21. Average power savings for molecular dynamic code with EAGS-D with different CCR values ... 86
6.22. Average power savings for molecular dynamic code with EADAGS and EAGS-D with voltage scaling levels ... 88
6.23. The generated DAG for FFT with four points ... 90
6.24. Average power savings for FFT with EADAGS with different number of processors ... 92
6.25. Average power savings for FFT with EADAGS with different CCR values ... 93
6.26. Average power savings for FFT with EAGS-D with different number of processors ... 95
6.27. Average power savings for FFT with EAGS-D with different CCR values ... 96
6.28. Average power savings for FFT with EADAGS and EAGS-D with voltage scaling levels ... 97
6.29. Sieve of Eratosthenes task graph for N=32 ... 99
6.30. Average power savings for different number of processors for Sieve of Eratosthenes with EADAGS ... 101
6.31. Average power savings for different CCR values for Sieve of Eratosthenes with EADAGS ... 102
6.32. Average power savings for Sieve of Eratosthenes with EAGS-D with different number of processors ... 104
6.33. Average power savings for Sieve of Eratosthenes with EAGS-D with different CCR values ... 105
6.34. Average power savings for Sieve of Eratosthenes with EADAGS and EAGS-D with voltage scaling levels ... 107

LIST OF TABLES

3.1. The relationship between clock frequency, supply voltage, and power dissipation of Transmeta's Crusoe TM5400 microprocessor ... 18
5.1. EADAGS algorithm ... 32
5.2. EAGS-D algorithm ... 40
6.1. Makespan and average power savings with respect to number of nodes for EADAGS ... 54
6.2. Makespan and average power savings for different CCR values for EADAGS ... 55
6.3. Makespan and average power savings for different shape parameters for EADAGS ... 58
6.4. Makespan and average power savings with respect to out-degree for EADAGS ... 59
6.5. Makespan and average power savings with respect to number of nodes for EAGS-D ... 61
6.6. Makespan and average power savings for different CCR values for EAGS-D ... 63
6.7. Makespan and average power savings for different shape parameters for EAGS-D ... 66
6.8. Makespan and average power savings with respect to out-degree for EAGS-D ... 67
6.9. Gaussian elimination algorithm ... 69
6.10. Makespan and average power savings for different number of processors for Gaussian elimination with EADAGS ... 74
6.11. Makespan and average power savings with respect to CCR for Gaussian elimination with EADAGS ... 75
6.12. Makespan and average power savings for different number of processors for Gaussian elimination with EAGS-D ... 77
6.13. Makespan and average power savings with respect to CCR for Gaussian elimination with EAGS-D ... 78
6.14. Makespan and average power savings for different number of processors for molecular dynamic code with EADAGS ... 82
6.15. Makespan and average power savings with respect to CCR for molecular dynamic code with EADAGS ... 84
6.16. Makespan and average power savings for different number of processors for molecular dynamic code with EAGS-D ... 85
6.17. Makespan and average power savings with respect to CCR for molecular dynamic code with EAGS-D ... 87
6.18. Makespan and average power savings with respect to number of processors for FFT with EADAGS ... 92
6.19. Makespan and average power savings with respect to CCR for FFT with EADAGS ... 94
6.20. Makespan and average power savings for different number of processors for FFT with EAGS-D ... 96
6.21. Makespan and average power savings with respect to CCR for FFT with EAGS-D ... 97
6.22. Makespan and average power savings for different number of processors for Sieve of Eratosthenes with EADAGS ... 101
6.23. Makespan and average power savings for different CCR values for Sieve of Eratosthenes with EADAGS ... 103
6.24. Makespan and average power savings for different number of processors for Sieve of Eratosthenes with EAGS-D ... 104
6.25. Makespan and average power savings for different CCR values for Sieve of Eratosthenes with EAGS-D ... 106

CHAPTER 1
INTRODUCTION

We consider scheduling on heterogeneous distributed computing systems interconnected by high-speed networks. Such systems are promising for fast processing of computationally intensive applications with diverse computation needs. One of the challenges in heterogeneous computing is to develop scheduling algorithms that assign the tasks of applications to processors [Reut97]. Therefore, researchers have proposed many static, dynamic, and even hybrid algorithms to minimize the execution time of applications running on a heterogeneous system [Iver98][Kwok99][Radu00][Seig97][Topc99][Wang97].

Another challenge facing distributed computing is energy consumption [Dong01]. There are many applications which require both low finish time and low energy consumption. Energy consumption is a major issue in many real-time distributed embedded systems. Furthermore, most applications running on an energy-limited system inherently constrain the finish time. Low-cost, low-energy sensor networks composed of Smart Dust motes [Smar06] are examples of such systems, and new wireless communication systems are expected to evolve using this technology. These networks are distributed networks operating under energy constraints, also called power-aware distributed systems (PADS). Hence there is a need for scheduling algorithms which would effectively reduce the overall energy consumed and yet attain the best possible finish time.

We consider the problem of scheduling a directed a-cyclic task graph (DAG) on a heterogeneous distributed processor system with the twin objectives of minimizing finish time and energy consumption. The DAG structure is important as it occurs in many regular and irregular applications, in forms such as Cholesky factorization, LU decomposition, Gaussian elimination, FFT, Laplace transforms, and instruction-level parallelism. Low-energy schedules can help run such applications in multi-hop sensor radio networks.

Traditionally, priority has been on performance, and consequently the supply voltage has been set at the maximum allowable level based on device breakdown potentials to enable fast operation. However, applications may not require the maximum achievable speed at all times. The top energy consumers in a computer system [Kump94] are the display (68%), disk (20%), and CPU (12%). There seems little that can be done to minimize screen energy consumption, beyond employing a screen saver and relying on hardware improvements. Disk energy consumption is minimized by spinning down the disk when it has been inactive. However, in the future, we may well see ubiquitous computing devices with neither disks nor conventional displays. For such devices, minimizing the energy consumed by the CPU will be critical if the replacements of disks and displays consume relatively smaller fractions of total energy.
A study by Argonne National Laboratory indicated that a 2.5-petaflop supercomputer, made of over a hundred thousand CPUs, would be available by 2010. The study predicted that such a system would cost $16 million and would require 8 megawatts of power to operate, at a cost of about $8 million per year. Hence, high energy prices and rising concerns about the environmental impact of electronic systems highlight the importance of incorporating low-energy design schemes at all levels of such systems. Current microprocessors from AMD, Intel, and Transmeta allow the speed of the microprocessor to be set dynamically. Another study, by NASA in 1998, predicted an energy need of 25 megawatts for Japan's NEC Earth Simulator, which is capable of executing 40 Tflops. That amount increased to 100 megawatts in a more recent study; that much power is enough to light 1.6 million 60-watt light bulbs, the lighting requirement of a small city. Reducing the energy consumption of system components decreases operating expenses. Assuming a rate of $100 per megawatt-hour, a Pflops machine consuming 100 megawatts of power would cost $10,000 per hour, approximately $85 million a year. These estimates do not include air-cooling expenses, which are commonly 40% of a system's operating cost. For such systems, even a small reduction in overall energy consumption would significantly impact operational costs.

1.1 Contributions

In this dissertation we present two algorithms for scheduling directed acyclic graphs on a distributed computing system for low energy. The first proposed algorithm combines Decisive Path Scheduling (DPS) with dynamic voltage scaling for the twin objectives of low energy consumption and minimum execution time; we call it Energy Aware DAG Scheduling (EADAGS). The second proposed algorithm combines Heterogeneous N-Predecessor Duplication (HNPD) with dynamic voltage scaling to minimize both finish time and consumed energy; we identify that algorithm as Energy Aware Graph Scheduling with Duplication (EAGS-D). In both cases, an initial schedule is first computed using either DPS or HNPD for minimum finish time; then the amount of consumed energy is estimated, and the voltage scaling algorithm is simulated to minimize the consumed energy without affecting the finish time.

The remainder of this dissertation is organized as follows. Chapter 2 describes different types of scheduling on both homogeneous and heterogeneous systems, energy estimation and energy optimization techniques, and prior work on scheduling for low energy. Chapter 3 explains the concept of dynamic voltage scaling and how it can be used to reduce the energy consumed in computing systems. Chapter 4 defines the DAG and explains some of the definitions and terminology used by the scheduling algorithms. Chapter 5 introduces both scheduling algorithms presented in this work, EADAGS and EAGS-D. Chapter 6 presents the simulation and analysis of results. Finally, conclusions and suggestions for future work are presented in Chapter 7.

CHAPTER 2
RELATED WORK

Traditional scheduling algorithms did not consider energy consumption; instead, they focused on performance or fairness. Recently, low-energy system design has gained significant attention, largely due to demands from the portable electronics industry. System design for low energy is also very important for other industries such as automotive, telecommunications, and information technology.
This is because low-energy designs can offer significant reductions in system packaging costs and improvements in reliability.

Two main design aspects of scheduling are how to build the scheduling queue and how to choose the best processor. List and cluster scheduling are the primary techniques to schedule tasks on heterogeneous systems. In list scheduling, tasks are ordered in a scheduling queue based on the priority assigned to free tasks. List scheduling algorithms have been shown to have good cost-performance trade-offs. Cluster scheduling involves merging nodes/paths to form clusters that can be scheduled on the same processor so as to get closer to the objectives of schedule length, number of processors, etc.

Several algorithms for static scheduling on heterogeneous multiprocessor systems are available: Dynamic Level Scheduling (DLS), Generalized Dynamic Level Scheduling (GDLS), Best Imaginary Level (BIL), Mapping Heuristics (MH), Heterogeneous Earliest Finish Time (HEFT), Task Duplication Scheduling (TDS), Static Task Duplication Scheduling (STDS), Fast Critical Path (FCP), and Fast Load Balancing (FLB). Among the above, TDS and STDS employ task duplication to suppress communication, whereas the others do not. A brief description of these algorithms is available in [Topc02]. Heterogeneous N-Predecessor Duplication (HNPD) is based on DPS but adds task duplication. HNPD was proven to outperform two of the best existing heuristics, Heterogeneous Earliest Finish Time (HEFT) and Static Task Duplication Scheduling (STDS), in terms of finish time and the number of processors employed over a wide range of parameters.

Low-energy scheduling research can be classified into two major categories: (1) energy estimation techniques (energy models) and (2) energy optimization techniques.

2.1 Power Estimation Techniques

Existing energy estimation methodologies can be classified based on their level of abstraction, namely instruction level, architecture level, and gate level.

- Instruction level: one application is embedded processing systems, as presented by Nikoladis in [Niko02], which takes an assembly or machine-level program as input and gives an estimate of the energy consumed by that specific program on a specific processing system. This provides an accurate estimate of energy consumption even in the presence of instantaneous energy supply variation.
- Architecture level: provides cycle-by-cycle energy consumption data of the architecture on the basis of the instruction/data flow stream. At the architecture level, a technique presented by Landman uses a black-box energy model for architecture-level components to estimate the energy consumed while preserving the accuracy of gate- or circuit-level estimation [Land94].
- Gate level: Ishehara summarized and compared different techniques for energy estimation and showed that gate-level estimation of energy consumption is the most accurate [Ishe96]. The techniques for energy estimation at the gate level and lower levels of abstraction can be classified into:
  o Simulation-based techniques: the earliest proposed techniques monitored both the supply voltage and current waveforms. These techniques were too slow to handle very large circuits, so other techniques were introduced that assume the supply and ground voltages are constant and estimate only the supply current waveform.
  o Probabilistic techniques: most probabilistic techniques are applicable to combinational circuits only. In these techniques, user-supplied input signal probabilities are propagated into the circuit. To achieve this, special models for the components have to be developed and stored in a module library.
  o Statistical techniques: they do not require any specialized model for the components. The idea is to simulate the circuit with randomly generated input vectors until the energy converges to the average energy. The convergence is tested by statistical mean estimation techniques, as in the sketch after this list.
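To make the statistical approach concrete, the following minimal C++ sketch (not part of the original work; the "circuit" here is a stand-in random model) draws random input vectors, tracks the running mean of the simulated power, and stops once a 95% confidence interval is tight enough:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>

// Stand-in for a gate-level simulation: returns the power drawn by the
// circuit for one random input vector. Here it is just a random draw.
double simulate_power(std::mt19937& rng) {
    std::normal_distribution<double> d(5.0, 1.5);  // hypothetical circuit
    return std::max(0.0, d(rng));
}

int main() {
    std::mt19937 rng(42);
    double sum = 0.0, sumSq = 0.0;
    const double z = 1.96, tol = 0.01;             // 95% confidence, 1% error
    long n = 0;
    do {
        double p = simulate_power(rng);
        sum += p; sumSq += p * p;
        ++n;
        if (n < 30) continue;                      // warm up before testing
        double mean = sum / n;
        double var = (sumSq - n * mean * mean) / (n - 1);
        double halfWidth = z * std::sqrt(var / n);
        if (halfWidth < tol * mean) {              // converged to average power
            std::printf("mean power %.3f after %ld vectors\n", mean, n);
            break;
        }
    } while (true);
}

The stopping rule is exactly the "statistical mean estimation" test described above: simulation ends when the estimated mean is unlikely to move by more than the chosen tolerance.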
In these techniques user supplied input signal probabilities are propagated into the circuit. To achieve this, special models for the components have to be developed and stored in a module library. o Statistical techniques: they do not require any specialized model for the components. The idea is to simulate the circuit with randomly generated input vectors until energy converges to the average energy. The convergence is tested by statistical mean estimation techniques. 2.2 Power Optimization Techniques Competition is driving the requirement for energy optimization. Systems are designed with low energy consumption as one of the important criteria. Energy optimization can be achieved through both hardware and software. ? Hardware Optimization: o Behavior level: transformations, scheduling, resource allocation, etc. o Architecture level: low energy flip-flops, low energy adder, etc. o Circuit level: low energy circuits. ? Software Optimization o Instruction level: low energy compiling, low energy instruction scheduling. 9 o System level: Dynamic energy management, low energy memory management, etc. 2.3 Software energy optimization on system level There are two techniques that can reduce energy consumption on system level scheduling: Dynamic Energy Management (DPM) and Dynamic Voltage Scaling (DVS). The DPM technique dynamically reconfigures an electronic system by reducing number of active components and/or load on such components while providing services. DPM is used in various forms usually in portable devices. However, the complexity of interfacing heterogeneous components has limited designers to simple solutions in DPM. An example of a simple policy, mostly applied to laptops and PDA, is a timeout policy, which turns off a component after a fixed inactivity period, under the assumption that it is highly likely that a component remains idle if it has been idle for the timeout period. The fundamental premise for the applicability of DPM is that systems (and their components) experience non-uniform workloads during operation time. Such an assumption is valid for most systems, both when considered in isolation and when systems are connected via internet. A second assumption of DPM is that it is possible to predict, with a certain degree of confidence, the fluctuations of workload. The analytical process of prediction should not consume significant energy. Typically, a energy manager (PM) implements a control procedure based on some observations and/or assumptions on the workload. In DVS technique, computation and communication tasks are run at reduced voltages and clock frequencies which fill idle periods but reduce energy dissipation 10 while providing required performance. The key idea of DVS is to dynamically scale the supply voltage of CPU while meeting total computation time and/or throughput. It is a trade-off between processor speed and energy consumption which is especially useful in real-time systems. The energy consumption of a processor running at high speed and high voltage is much larger than running at low speed and low voltage. For example, reducing the supply voltage from 5 V to 3.3 V in some cases has reduced energy by 56% [Pouw01]. DVS essentially fills the slack times by elongated computation or communication times. There are two types of slack times: Worst Slack Time (WST) and Workload- Variation Slack time (WVST). WST results from low processor utilization. WVST occurs due to execution time variations caused by data-dependent computation. 
2.4 Scheduling to lower energy consumption

For uniprocessor real-time systems, many schemes have been proposed to manage energy consumption. Mosse et al. [Moss00] proposed and analyzed several schemes to dynamically adjust processor speed for slack reclamation. They used a compiler to assist the operating system in changing the CPU operating levels to reduce energy consumption.

Weiser [Weis94] discussed several methods for varying the clock speed dynamically under control of the operating system. He showed that by adjusting the clock speed at a fine grain, substantial CPU energy can be saved with a limited impact on performance. Chandrakasan [Chan96] has shown that for periodic tasks, a few voltage/frequency levels are sufficient to achieve almost the same energy savings as infinite voltage/speed levels. Yang [Yang01] proposed a two-phase scheduling scheme, containing a design-time scheduler and a runtime scheduler, that minimizes energy consumption while meeting timing constraints; by choosing different scheduling options at compile time they achieved 20-40% average energy savings. Zhang and Chen [Zhan02] proposed priority-based task mapping and scheduling for a fixed task graph, applying earliest-deadline-first scheduling and formulating the voltage scaling problem as an integer programming problem. They showed that their framework can slow 8% of cycles in a very short time.

The main concern in DVS is to utilize slack time as much as possible, making the resulting energy consumption as low as possible. As noted above, WST results from processor utilization that is less than 100%; low processor utilization is the common case even if all tasks exhibit their worst-case execution times. WVST occurs due to execution time variations caused by data-dependent computation. WST can be estimated from the scheduling results before task execution, whereas WVST can be known only after actual task execution. Shin et al. [Shin01] proposed low-energy, priority-based scheduling which consists of two parts: an off-line component which determines the minimum processor speed while guaranteeing the deadlines of all tasks, and an online component which dynamically varies the processor speed to utilize both WST and WVST.

Shang et al. [Shan03] proposed history-based DVS for interconnection networks. Their technique leverages network history to predict future network needs, judiciously controlling the frequency (and voltage) of links to track actual network utilization. Such mechanisms resulted in 46% average energy savings at the cost of a 15.2% increase in network latency and a 2.5% decrease in network throughput.

Theoretical investigations of speed scaling algorithms were initiated by Yao, Demers, and Shenker [Yao95]. Yao et al. propose formulating speed scaling problems as scheduling problems. They assumed that each task has a release time when it arrives into the system, an amount of work that must be performed to complete the task, and a deadline that specifies the time by which the task should be completed. A schedule specifies which task to run at each time, and at what speed that task should be run. In some settings, for example the playing of a video or other multimedia presentation, there may be natural deadlines for the various tasks imposed by the application.
In other settings, the system may impose deadlines to better manage tasks or to ensure a certain quality of service for each task. They studied the problem of minimizing the total energy used subject to the deadline feasibility constraints.

A few other works have considered energy savings in addition to performance. Lu, Benini, and De Micheli [Lu00] presented a greedy online scheduling algorithm to facilitate energy management for multiple devices. They ordered the execution of tasks so that devices can have continuous long idle periods during which they can be shut down, achieving average energy savings of 33%.

Mishra et al. [Mish03] proposed two novel techniques for energy management in distributed systems. The first is a static technique which uses a greedy algorithm to manage energy in the presence of parallelism. The second technique uses task reallocation, enhancing the first algorithm by allowing out-of-order execution where preemption is allowed. Their techniques saved an average of 10-20% more energy than a simple static energy management technique.

Shiue and Chakrabarti [Shiu00] presented polynomial-time algorithms for (i) resource-constrained scheduling and (ii) latency-constrained scheduling for the case when the resources operate at multiple voltages. Both scheduling schemes try to reduce the overall energy consumption. The resource-constrained scheme balances the conflicting requirements of reducing latency and maximally utilizing resources operating at reduced voltages. The latency-constrained scheme assigns as many nodes as possible to resources operating at low voltages without violating the timing constraint.

Kirovski and Potkonjak [Kiro97] developed a system-level approach for energy minimization of cost-constrained hard real-time designs. The approach simultaneously optimizes all three degrees of freedom for energy minimization, namely switching activity, effective capacitance, and supply voltage.

Srivastava [Sriv96] described three broad architectural approaches for energy-efficient programmable computation: predictive shutdown, concurrency-driven supply voltage reduction, and switching activity reduction. A significant reduction in energy consumption can be obtained by employing these architectural techniques. They showed that an aggressive shutdown strategy based on a predictive technique can reduce energy consumption by a large factor compared to straightforward conventional schemes in which the shutdown decision is based solely on a predetermined idle-time threshold. An example where these techniques apply is the combination of parallelism-driven voltage reduction with switching activity reduction to increase the energy efficiency of memory operations when the access pattern is sequential in nature. Such sequential access patterns occur, for example, when fetching video data from a frame buffer memory or when writing a page of virtual memory back to disk. Instead of accessing data from memory serially, several words can be read at once and the memory can be clocked at a lower rate for the same throughput. If the serial implementation runs at a supply of 3 V to meet a given throughput, the parallel version can run at a supply voltage of 1.3 V while meeting the same throughput requirement.

Im and Ha [ImHa04] proposed energy-efficient real-time multi-task scheduling using buffers with DVS. They saved an average energy of 44% with reasonable machine specifications. The buffers increase CPU utilization by averaging the workload. Their technique was designed for multimedia applications where a slight buffering delay is tolerable.
CHAPTER 3
DYNAMIC VOLTAGE SCALING

For most energy-conscious designs, a major source of energy savings is voltage scaling, which scales the operating voltages of processors and the corresponding maximum clock speeds. The dominant source of energy consumption in digital CMOS circuits is the dynamic power dissipation P, characterized by

P ∝ C V^2 f

where C is the effective switching capacitance, V is the supply voltage, and f is the clock speed [Burd96]. Since power varies linearly with the clock speed and with the square of the voltage, adjusting the voltage can result in significant energy reductions, at least in theory. However, reducing the supply voltage requires a corresponding decrease in clock speed and an increase in task execution latency. The settling time of a gate grows as the voltage drops: the lower the voltage across the gate, the longer the gate takes to stabilize. To lower the voltage and still operate correctly, the clock rate must be lowered first; when raising the clock rate, the voltage must be increased first. Given that the voltage and the cycle time of a chip can be adjusted together, it should be clear that a lower-voltage, slower-clock chip will dissipate less energy per cycle. If the voltage level can be reduced linearly as the clock rate is reduced, the energy savings per instruction will be proportional to the square of the voltage reduction. Of course, for a real chip it may not be possible to reduce the voltage linearly with the clock reduction. However, if it is possible to reduce the voltage at all by running slower, there will be a net energy savings per cycle.

Currently, manufacturers do not test and rate their chips across a smooth range of voltages. However, some data is available for chips at a set of voltage levels. For example, a Motorola CMOS 6805 microcontroller is rated at 6 MHz at 5.0 Volts, 4.5 MHz at 3.3 Volts, and 3 MHz at 2.2 Volts, a close-to-linear relationship between voltage and clock rate. Thus there is seemingly no technical objection to designing a variable-voltage system, provided that the input reference voltage to the processor's voltage regulator is a digital word writable by the processor. The other important factor is the time it takes to change the voltage. The main time cost is for the converter or regulator to ramp the supply voltage up or down; the ramping time is determined by the time constants of the converter. The switching frequency of voltage regulators is on the order of 200 kHz [Weis94], so we speculate that it will take a few tens of microseconds to boost the voltage on the chip. Moreover, the CPU should be able to continue working during a voltage ramp, and ramping should not have any substantial energy cost.

Finally, why run slower? Suppose a task has a deadline in 100 milliseconds, but will take only 50 milliseconds of CPU time to complete when running at full speed. A normal system would run at full speed for 50 milliseconds and then idle for 50 milliseconds (assuming there were no other ready tasks), as in Figure 3.1. During the idle time the CPU can be stopped altogether by putting it into a mode that wakes up upon an interrupt, such as from a periodic clock or an I/O completion.

Figure 3.1. Different operating rates for the same task
Now, compare this to a system that runs the task at half speed so that it completes just before its deadline. If it can also reduce the voltage by half, the task will consume one quarter of the energy of the normal system, even taking into account stopping the CPU during the idle time. This is because the same number of cycles is executed in both systems, but the modified system reduces energy use by reducing the operating voltage. Another way to view this is that idle time represents wasted energy, even if the CPU is stopped.

The relationship between clock speed, supply voltage, and power dissipation for Transmeta's Crusoe TM5400 microprocessor, as reported in its data sheet [Tiwa96], is shown in Table 3.1. For a program running for a time duration T, the total energy consumption E is approximately E = P_avg × T, where P_avg is the average power consumed.

Researchers have proposed many ways of determining an "appropriate" operating clock rate. The basic idea behind these energy-saving approaches is to slow down the tasks as much as possible without violating the deadline. This "just-in-time" strategy can be illustrated through a voltage scheduling graph as in Figure 3.2 [Much97]. In a voltage scheduling graph, the X-axis represents time and the Y-axis represents processor speed. The total amount of work for a task is defined by the area of the task "box." For example, task 1 in Figure 3.2 has a total workload of 8,000 cycles. By "stretching" it out all the way to the deadline without changing its area, we are able to decrease the CPU speed from 600 MHz down to 400 MHz. As a result, 23.4% of total (CPU) energy may be saved on a Crusoe TM5400 processor.

Table 3.1. The relationship between clock frequency, supply voltage, and power dissipation of Transmeta's Crusoe TM5400 microprocessor

Figure 3.2. Voltage scheduling graph: (a) original schedule, (b) voltage-scaled schedule

3.1 Power management implementation

There are several ways of achieving energy reduction. In this section, we study the main approaches.

3.1.1 Simple shutdown

Shutting off power to a component is a radical solution that eliminates all sources of energy dissipation, including leakage. The energy consumption of idle processors can be avoided by powering off the unit. This radical solution requires controllable switches. An advantage of this approach is its wide applicability to all kinds of electronic components. A major disadvantage is the wake-up or recovery time, because the processor operation must be reinitialized.

3.1.2 Multiple and variable power supplies

Dynamic power management is also applicable to processors that are not idle but whose performance requirement varies with time. The implementation can then be based on slowdown, achieved by lowering the supply voltage to the point where the machine becomes performance-critical. Dynamically varying supply voltages may be quantized [Chan96], and thus restricted to a finite number of values, or may take values in a continuous range. In the former case it is possible to identify a finite number of power states for the system. A sketch of choosing among quantized voltage levels appears below.
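The following C++ fragment is a minimal sketch, not from the dissertation; the voltage/frequency table is the Motorola 6805 rating quoted earlier, and the task figures are hypothetical. It picks the lowest quantized level that still finishes a known cycle count by its deadline, falling back to the fastest level otherwise:

#include <cstdio>

struct Level { double volts; double mhz; };

// Quantized operating points (Motorola CMOS 6805 ratings quoted in the text),
// ordered from slowest/lowest-energy to fastest.
const Level levels[] = { {2.2, 3.0}, {3.3, 4.5}, {5.0, 6.0} };

// Pick the slowest level that still completes `cycles` within `deadlineMs`.
Level pickLevel(double cycles, double deadlineMs) {
    for (const Level& l : levels) {
        double runtimeMs = cycles / (l.mhz * 1e6) * 1e3;
        if (runtimeMs <= deadlineMs) return l;     // just-in-time: stretch to deadline
    }
    return levels[2];                              // cannot meet deadline; run flat out
}

int main() {
    // Hypothetical task: 300,000 cycles due in 100 ms (50 ms at full speed).
    Level l = pickLevel(300000.0, 100.0);
    std::printf("chosen level: %.1f V @ %.1f MHz\n", l.volts, l.mhz);
    // Energy per cycle scales roughly as V^2, so 3 MHz @ 2.2 V uses about
    // (2.2/5.0)^2 = 19% of the per-cycle energy of 6 MHz @ 5.0 V.
}

For the example task, the chooser stretches execution to exactly fill the deadline at 2.2 V and 3 MHz, which is the "just-in-time" strategy of Figure 3.2 applied to discrete levels.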
3.1.3 OnNow approach

The OnNow approach uses power-management hardware to put the PC into a low-power sleep state instead of shutting it down completely, so that the system can quickly resume working [Micr04]. While in the sleep state, the PC's processor is not executing code and thus no work is being accomplished for the user. However, events from both hardware devices (such as a modem ring or a network request) and the real-time clock can be enabled to wake the system up. Each device in the system has its own power states, and these are independently managed by the device driver (or other policy owner) while the system is in the working state. The device's policy integrates any particular application's needs with device capabilities and other operating system information to conserve energy without adversely affecting the work that the user is doing.

CHAPTER 4
DIRECTED A-CYCLIC GRAPH (DAG)

We define a DAG as an a-cyclic graph with nodes representing tasks and edges representing execution precedence between tasks. A weight is associated with each node and edge: the node weight represents the task execution time, and the edge weight represents the communication time between connected tasks. This communication time is zero if the tasks are executed on the same processor. Each DAG has a root node, which is a node with no incoming edges, and a sink node, which is a node with no outgoing edges.

Figure 4.1. A sample DAG

The DAG structure occurs in many regular and irregular applications such as Cholesky factorization, LU decomposition, Gaussian elimination, FFT, Laplace transforms, and instruction-level parallelism. Along the lines of [Bask03], a DAG is represented by the tuple G = (V, E, M, T, C, P) where:

- V is the set of n nodes.
- E is the set of e edges between the nodes; E(n, c) is an edge between nodes n and c.
- M is a set of m machines or processors.
- T is the set of costs T(n, k), where T(n, k) represents the computation time of task n on machine k.
- C is the set of costs C(n, c), where C(n, c) represents the communication cost associated with the edge E(n, c). Since intra-processor communication is insignificant compared to inter-processor communication, C(n, c) is taken to be zero if n and c are executed on the same processor.
- P is the set of costs P(n, k), where P(n, k) represents the power consumed when task n is executed on processor k.

The length of a path is defined as the sum of the node and edge weights on that path. Node n is a predecessor of node c if there is a directed edge originating at n and ending at c; in Figure 4.1, node 1 is a predecessor of nodes 2, 3, 4, and 7. Likewise, node s is a successor of node n if there is a directed edge originating at n and ending at s; in Figure 4.1, node 6 is a successor of nodes 1, 2, and 3. We further define pred(n) as the set of all predecessors of n and succ(n) as the set of all successors of n; for example, pred(6) = {1, 2, 3} and succ(6) = {8}. An ancestor of node n is any node c contained in pred(n), or any node a that is also an ancestor of some node c contained in pred(n).

The earliest execution start time of node n on processor k is denoted EST(n, k); likewise, the earliest execution finish time of node n on processor k is denoted EFT(n, k). EST(n) and EFT(n) represent the earliest start time and the earliest finish time upon any processor, respectively. R_nk is defined as the earliest time at which processor k will be ready to begin executing task n. We can define these terms mathematically as follows:

EST(n, k) = max{ R_nk, max_{c ∈ pred(n)} [ EFT(c, m) + C(c, n) ] },
    where EFT(c, m) = EFT(c), and C(c, n) = 0 when k = m,
EFT(n, k) = T(n, k) + EST(n, k),
EST(n) = min_{k ∈ M} EST(n, k),
EFT(n) = min_{k ∈ M} EFT(n, k).

The maximum clause finds the latest time at which a predecessor's data will arrive at processor k; if the predecessor finishes on a processor other than k, the communication cost must also be included in this time. In other words, the earliest start time of task n on processor k, EST(n, k), is the maximum of the time at which processor k becomes available and the time at which the last message arrives from any of the predecessors of n.
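As a concrete rendering of these definitions, the sketch below is illustrative; the graph representation and the processor-ready bookkeeping are assumptions, not the dissertation's implementation. It computes EST and EFT for one node given its already-scheduled predecessors:

#include <algorithm>
#include <vector>

struct Pred {
    double eft;     // EFT(c): scheduled finish time of predecessor c
    int proc;       // processor that c was scheduled on
    double comm;    // C(c, n): edge cost from c to n
};

// EST(n, k) = max( R_nk, max over predecessors of EFT(c) + C(c, n) ),
// with C(c, n) = 0 when the predecessor ran on the same processor k.
double est(double readyTime, int k, const std::vector<Pred>& preds) {
    double t = readyTime;                       // R_nk
    for (const Pred& p : preds) {
        double comm = (p.proc == k) ? 0.0 : p.comm;
        t = std::max(t, p.eft + comm);
    }
    return t;
}

// EFT(n, k) = T(n, k) + EST(n, k); EFT(n) minimizes this over all k in M.
double eft(double execTime, double readyTime, int k,
           const std::vector<Pred>& preds) {
    return execTime + est(readyTime, k, preds);
}

int main() {
    std::vector<Pred> preds = { {10.0, 0, 4.0}, {12.0, 1, 7.0} };  // toy data
    // Node n on processor 0 (ready at t = 9): the second predecessor pays
    // communication. EST = max(9, 10, 12 + 7) = 19, so EFT = 19 + 5 = 24.
    double f = eft(5.0, 9.0, 0, preds);
    return f > 0 ? 0 : 1;
}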
The main goal of any scheduling technique is to minimize the makespan of the DAG. The makespan is defined as the time at which all nodes finish executing; in our case, the makespan equals EFT(y), where y is the exit node of the graph. In Figure 4.1 the makespan is EFT(8).

The critical path (CP) is the longest path from an entry node to an exit node. The critical path excluding communication cost (CPEC) is the longest path from an entry node to an exit node, not including the communication cost of any edge traversed. In our work we assume that each task's mean execution cost across all processors is used to calculate the CP, while each task's minimum execution cost on any processor is used to calculate the CPEC. The top distance of a given node is the longest distance from an entry node to the node, excluding the computation cost of the node itself. The bottom distance of a given node is the longest distance from the node to an exit node. Again, each task's mean execution cost is used to calculate the top distance and the bottom distance. The bottom distance is also referred to as the upper rank or blevel. The Decisive Path (DP) of a node is defined as the node's top distance plus its bottom distance; the DP is defined for every node in the DAG. The critical path, CP, is then the largest DP of an exit node.
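The top and bottom distances admit a simple recursion over the DAG. The sketch below is illustrative C++; the adjacency-list layout and mean-cost arrays are assumptions. The decisive path of each node is then just the sum of the two distances:

#include <algorithm>
#include <vector>

struct Dag {
    int n;                                        // number of nodes
    std::vector<std::vector<int>> preds, succs;   // adjacency lists
    std::vector<std::vector<double>> comm;        // comm[u][v]: edge cost, 0 if none
    std::vector<double> meanCost;                 // mean execution cost per node
};

// Top distance: longest entry-to-node distance, excluding the node's own cost.
// Assumes `order` is a topological order of the nodes.
std::vector<double> topDistance(const Dag& g, const std::vector<int>& order) {
    std::vector<double> top(g.n, 0.0);
    for (int v : order)
        for (int u : g.preds[v])
            top[v] = std::max(top[v], top[u] + g.meanCost[u] + g.comm[u][v]);
    return top;
}

// Bottom distance (blevel): longest node-to-exit distance, including own cost.
std::vector<double> bottomDistance(const Dag& g, const std::vector<int>& order) {
    std::vector<double> bot(g.n, 0.0);
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
        int v = *it;
        bot[v] = g.meanCost[v];
        for (int s : g.succs[v])
            bot[v] = std::max(bot[v], g.meanCost[v] + g.comm[v][s] + bot[s]);
    }
    return bot;
}

// DP(v) = top[v] + bot[v]; the CP length is the largest DP of an exit node.
int main() {
    Dag g{2, {{}, {0}}, {{1}, {}}, {{0, 3}, {0, 0}}, {5, 7}};  // 0 -> 1, comm 3
    std::vector<int> order = {0, 1};
    auto top = topDistance(g, order), bot = bottomDistance(g, order);
    // DP(1) = top 8 + bottom 7 = 15, the length of the only path.
    return (top[1] + bot[1] == 15.0) ? 0 : 1;
}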
4.1 Performance

Our algorithms schedule a directed acyclic weighted task graph on a bounded number of heterogeneous processors with the twin objectives of minimizing the energy consumed and minimizing the finish time. In other words, tasks arrive with given execution times and need to meet execution deadlines while minimizing the consumed energy. Using simulations, we evaluated the performance of EADAGS and EAGS-D. The first test suite consists of random directed a-cyclic graphs. The input parameters used to generate the graphs were as follows (a sketch of such a generator appears at the end of this section):

- Number of nodes (tasks) in the graph, n.
- Shape parameter of the graph, α. If α = 1.0, the graph is balanced. A DAG with high parallelism can be generated by selecting α >> 1, whereas α << 1 generates a long DAG with a small degree of parallelism.
- Out-degree of a node, out-degree, represents the average number of outgoing edges from each node. Each node's out-degree is randomly generated from a uniform distribution with mean equal to out-degree.
- Communication to Computation Ratio, CCR, is the ratio of the average communication cost to the average computation cost. If a DAG's CCR is less than 1, it is a computation-intensive application; if its CCR is much greater than 1, it is communication-intensive.
- Computation range, β, represents the range of computation costs on processors. A high β causes a significant difference in a node's computation costs among processors, whereas a low β means that the expected execution times of a node on any processor are almost equal.
- Processor to node ratio, PNR, represents the availability of processors with respect to the number of nodes. A PNR of 100% means the number of processors equals the number of nodes.

In generating random DAGs, the parameters were varied as follows:

n = {10, 20, 40, 60, 80, 100, 500, 1000}
CCR = {0.1, 0.5, 1, 5, 10}
α = {0.5, 1, 2}
out-degree = {1, 2, 3, 4, 5, 100}
β = {0.1, 0.25, 0.5, 0.75, 1.0}
PNR = {25%, 50%, 100%}

These values produced 10,800 DAGs, which were run with both presented algorithms, EADAGS and EAGS-D. The second test suite used to evaluate the performance of EADAGS and EAGS-D consisted of task graphs of real-world problems, specifically Gaussian elimination [Wu90], molecular dynamic code [Chun92], the Sieve of Eratosthenes, and the Fast Fourier Transform (FFT). For these problems the shape of the DAG is fixed, and the only parameters we varied were the number of available processors and the CCR.

Processors were assumed to have three different operating voltage levels and five operating strategies, based on the Motorola CMOS 6805 microcontroller, which is rated at 6 MHz at 5.0 Volts, 4.5 MHz at 3.3 Volts, and 3 MHz at 2.2 Volts. The first operating voltage is 5 V; when using this voltage, an idle processor shuts down. We refer to this operating voltage as 5V/off. This level is used for reference only, since it has physical limitations. The other two operating voltages are 2.2 V and 3.3 V, which slow the processor during task execution and during the processor's idle times.
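To make the parameterization concrete, here is a compact generator sketch. This is an illustration only; the dissertation's actual generator is not specified beyond the parameters above, and the layering scheme and uniform draws below are assumptions:

#include <cmath>
#include <random>
#include <vector>

struct Edge { int from, to; double cost; };

struct RandomDag {
    std::vector<double> meanCost;   // mean computation cost per node
    std::vector<Edge> edges;
};

// Generate a layered random DAG from n, shape alpha, mean out-degree,
// CCR, and computation range beta.
RandomDag generate(int n, double alpha, double outDeg, double ccr,
                   double beta, std::mt19937& rng) {
    RandomDag g;
    const double meanComp = 50.0;                         // arbitrary scale
    std::uniform_real_distribution<double> u(0.0, 1.0);

    // Height ~ sqrt(n)/alpha: larger alpha gives a shorter, wider DAG.
    int height = std::max(1, (int)std::round(std::sqrt((double)n) / alpha));
    std::vector<int> layerOf(n);
    for (int v = 0; v < n; ++v) layerOf[v] = (v * height) / n;

    for (int v = 0; v < n; ++v) {
        // Cost drawn in [mean*(1 - beta/2), mean*(1 + beta/2)]: beta spreads costs.
        g.meanCost.push_back(meanComp * (1.0 + beta * (u(rng) - 0.5)));
        // Roughly outDeg edges per node to nodes in later layers.
        int d = (int)std::round(2.0 * outDeg * u(rng));   // uniform, mean = outDeg
        for (int e = 0; e < d; ++e) {
            int w = v + 1 + (int)(u(rng) * (n - v - 1));
            if (w < n && layerOf[w] > layerOf[v])
                g.edges.push_back({v, w, meanComp * ccr * (0.5 + u(rng))});
        }
    }
    return g;
}

int main() {
    std::mt19937 rng(7);
    RandomDag g = generate(100, 1.0, 3.0, 0.5, 0.5, rng);
    return g.meanCost.size() == 100u ? 0 : 1;
}

Note how each parameter maps to one decision: alpha sets the height, beta the spread of node costs, CCR the scale of edge costs relative to node costs, and out-degree the expected edge count per node.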
CHAPTER 5
SCHEDULING ALGORITHMS

Many algorithms for scheduling DAGs in a distributed computing environment have been proposed, as mentioned in Chapter 2. However, the problem of discovering the schedule that gives the minimum finish time is NP-complete. Among these scheduling algorithms, some perform better than others in terms of finish time, or makespan. In this work, two new algorithms for scheduling DAGs on distributed computing systems are presented: Energy Aware DAG Scheduling (EADAGS) and Energy Aware Graph Scheduling with Duplication (EAGS-D).

5.1 EADAGS algorithm

EADAGS combines Decisive Path Scheduling (DPS) with DVS to minimize both finish time and energy consumption. DPS [Park97] was chosen because it is one of the most efficient algorithms. EADAGS consists of two phases. In the first phase, after DPS is run on the DAG to provide a low finish time, the energy consumed is estimated for all processors. In the second phase, voltage scaling is applied during slack times to reduce energy while maintaining the schedule length.

EADAGS transforms a DAG into one with a single entry node and a single exit node, if it is not so already. This transformation is accomplished by adding a dummy entry node and/or exit node with zero costs. Next, the top and bottom distances of each node are calculated, using the mean computation cost of each node. After building the DP for each node, EADAGS begins creating the scheduling queue, ScheduleQ, in a top-down fashion, starting with the DAG's entry node and traversing down the CP (which is the DP of the exit node). Nodes are prioritized based on the lengths of their DPs. The priorities are decided as follows: EADAGS puts the CP nodes into ScheduleQ in ascending order of their top distances. A node is added to the queue only if all its predecessors have been added; if not, EADAGS attempts to schedule its predecessors first. The first predecessors added to the queue are those included in the node's DP; the others are sorted and added to ScheduleQ in increasing top distance.

Next, EADAGS assigns the tasks in ScheduleQ to processors. At each step of the assignment, the selected processor is the one providing the earliest finish time for the task under consideration, taking into account all the communications from the task's parents. If the EFT of the exit node is larger than the sum of all the computation costs of the nodes on the best processor, EADAGS assigns all nodes to that processor and exits. The time complexity of the first phase of EADAGS is O(n^2).

Next, EADAGS computes the consumed energy. The total energy consumed when no voltage scaling is used, E_1, is calculated as:

E_1 = T × Σ_{k ∈ M} P_1(k),    where P_1(k) = (1/2) V_1^2 f

and where:
- P_1(k) is the power consumed by processor k ∈ M,
- f is the operating frequency of machine k,
- V_1 is the operating voltage of machine k,
- T is the makespan.

In the second phase of EADAGS, voltage scaling is applied to all processors during their idle times by reducing the execution rate to f_2, achieved by lowering the voltage to a predetermined level V_2. Such voltage scaling is applied to a task only if slowing its execution would not increase the makespan. We also reduce the voltage level of processors during all remaining slack times. The total energy consumed after applying voltage scaling is E_2 = Σ_{k ∈ M} E_2(k), where:

E_2(k) = (T_1 f_1 V_1^2 + T_2 f_2 V_2^2) / 2

represents the energy consumed by processor k ∈ M when voltage scaling is used (f_1 being the operating frequency at V_1 and f_2 the frequency at V_2),

T_1 = Σ_i T(i) + Σ_{i,j} C(i, j)

is the total task and communication time when operating at V_1, and where:
- T(i) is the computation time of task i on the chosen processor,
- C(i, j) is the communication cost between tasks i and j if i and j are not scheduled on the same processor,
- T_2 is the total time processor k operates at V_2 (including idle times during which the processor operates at V_2).

A non-blocking send protocol is assumed, in which only the sending processor has to process the communication, while the receiving processor has a buffer to receive all transmitted data without interrupting its job. The difference between E_1 and E_2 represents the energy that can be saved; the percentage average energy savings is

(E_1 − E_2) / E_1 × 100.
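As a worked check of these formulas (all numbers below are hypothetical and the schedule is reduced to a single processor), the fragment computes E_1, E_2, and the percentage savings for a processor whose busy time is T_1 and whose remaining makespan is spent scaled down:

#include <cstdio>

int main() {
    // Hypothetical single-processor schedule, in consistent units.
    const double T  = 100.0;                 // makespan
    const double V1 = 5.0,  f1 = 6.0e6;      // full-speed level
    const double V2 = 3.3,  f2 = 4.5e6;      // scaled level
    const double T1 = 60.0;                  // busy (task + comm) time at V1
    const double T2 = T - T1;                // idle time, spent scaled at V2

    // E1: the whole makespan billed at full voltage, P1 = 0.5 * V1^2 * f1.
    double e1 = T * 0.5 * V1 * V1 * f1;
    // E2(k) = (T1*f1*V1^2 + T2*f2*V2^2) / 2, per the text.
    double e2 = (T1 * f1 * V1 * V1 + T2 * f2 * V2 * V2) / 2.0;

    std::printf("savings = %.1f%%\n", 100.0 * (e1 - e2) / e1);
    // ~26.9%: idle time spent at 3.3 V/4.5 MHz instead of 5 V/6 MHz.
}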
A high-level description of EADAGS appears in Table 5.1.

Table 5.1. EADAGS algorithm

Let G represent a DAG
Let M be the set of m processors in the system

EADAGS
  Transform G to a DAG with a single entry node and a single exit node
  Compute DP of each node n ∈ G  // DP of the exit node is the critical path, CP
  Fill ScheduleQ with nodes  // starting from the entry node, traversing CP in increasing top-distance
  while ScheduleQ ≠ ∅ do
    i ← head(ScheduleQ)
    Schedule i on the processor p ∈ M that provides the earliest finish time of i
    Remove i from ScheduleQ
  end while
  if scheduling all nodes on the fastest processor provides a shorter makespan, do so and discard the prior schedule
  T ← makespan
  Total energy consumed before voltage scaling E1 = T·V1²·f/2
  Total energy consumed when employing voltage scaling E2 = ScaledEnergy()
end EADAGS

ScaledEnergy()
  // Returns the total energy consumption on all processors when voltage scaling has been applied
  for each processor p ∈ M do
    for each node n ∈ G scheduled on p do  // traverse first scheduled to last
      if executing n at the scaled voltage fits within the next slack then
        Scale down the operating voltage during the execution of n
      end if
    end for
    Energy consumed by processor p = sum of energy consumed by all nodes scheduled on p
  end for
  E = sum of energy consumed by all processors
  return E
end ScaledEnergy

Figure 5.1 lists the notations used by EADAGS (Figure 5.1.a) and the procedures of EADAGS (EADAGS, addQ, StartTime, and ScaledEnergy).

Let
- G represent a DAG
- y ∈ G be the exit node of G
- M be the set of m processors in the system
- Rk represent the ready time of machine k
- rn represent the ready time of node n
- f be the frequency of operation
- sik represent the start time of node i on machine k
- succ(n) represent the list of all successor nodes of node n ∈ G
- pred(n) represent the list of all predecessor nodes of node n ∈ G
- ScheduleQ be the queue of tasks in order of execution
- T(i, k) represent the execution time of node i ∈ G on machine k
- C(n, c) represent the communication cost from node n to node c
- EFT(i, k) represent the earliest finish time of node i ∈ G on machine k
- EFT(i) represent the scheduled finish time of node i ∈ G
- EFT2(i) represent the finish time of node i with the scaled-down voltage
- EST(i) represent the scheduled start time of node i ∈ G
- T represent the makespan of G
- V1 be the voltage of operation
- V2 be the scaled-down voltage of operation
- T1(n) and T2(n) represent the execution time of any node n ∈ G before and after voltage scaling, respectively
- E(k, n) represent the energy consumed by processor k to execute node n
- E1 represent the total energy consumption before voltage scaling
- E2 represent the total energy consumption after scaling down the voltage
- E1(k) and E2(k) represent the energy consumed by processor k ∈ M before and after voltage scaling, respectively

Figure 5.1.a. Notations for EADAGS

EADAGS
  Transform G to a DAG with a single entry node and a single exit node
  Compute DP for each node n ∈ G  // O(n²); DP of the exit node is the critical path, CP
  // Fill ScheduleQ with the nodes of CP in increasing top-distance
  ScheduleQ ← ∅
  for each node n ∈ CP do  // O(n²), traversed in increasing top-distance
    ScheduleQ = addQ(n)
  end for
  // Schedule nodes in ScheduleQ on processors
  while ScheduleQ ≠ ∅ do  // O(n)
    Pick the head node i of ScheduleQ
    for each processor k ∈ M do  // O(m)
      sik = StartTime(i, k)
      EFT(i, k) = sik + T(i, k)
    end for
    EFT(i) = min over k ∈ M of EFT(i, k)
    Schedule i on the processor p ∈ M that gave the minimum earliest finish time
    Remove i from ScheduleQ
  end while
  if EFT(y) ≥ min over k ∈ M of Σ over i ∈ G of T(i, k) then
    Schedule all nodes on the processor p ∈ M that provided the minimum
  end if
  T ← EFT(y)  // makespan
  E1 = T·V1²·f/2
  E2 = ScaledEnergy()
end EADAGS

Figure 5.1.b. EADAGS procedure

addQ(n)
  // Adds the unvisited parents of node n ∈ G, then n, to ScheduleQ; returns ScheduleQ
  for each parent b of n not visited do  // in decreasing DP
    addQ(b)
  end for
  Add n to ScheduleQ and mark it visited
  return ScheduleQ
end addQ

Figure 5.1.c. addQ procedure

StartTime(node n, machine k)
  // Returns the earliest available start time of node n ∈ G on machine k ∈ M
  snk ← Rk
  for each parent b of n do  // O(n)
    snk = max(snk, EFT(b) + C(b, n))  // if b is scheduled on k, C(b, n) = 0
  end for
  return snk
end StartTime

Figure 5.1.d. StartTime procedure

ScaledEnergy()
  // Returns the total energy consumption after applying voltage scaling
  for each processor k ∈ M do  // O(m)
    for each node n ∈ G scheduled on k do  // O(n), from first to last scheduled
      T2(n) = (V1²/V2²)·T1(n)
      EFT2(n) = snk + T2(n)
      if EFT2(n) + C(n, c) < min EST(c) for each c ∈ succ(n) then
        // if c is scheduled on k, C(n, c) = 0
        EFT(n) = EFT2(n)  // update EFT(n)
        Let i be the node scheduled immediately after n on k
        // use the scaled voltage, see Figure 5.2
        E(k, n) = T2(n)·V2² + (EST(i) − EFT(n))·V2²
      else
        // original voltage for execution and scaled voltage during idle time, see Figure 5.2
        E(k, n) = T1(n)·V1² + (EST(i) − EFT(n))·V2²
      end if
    end for
    E2(k) = Σ over n of E(k, n)
  end for
  E2 = Σ over k ∈ M of E2(k)
  return E2
end ScaledEnergy

Figure 5.1.e. ScaledEnergy procedure

Figure 5.2. Timeline for machine k: node n followed by node i, executed either at V1 for T1(n) (finishing at EFT(n)) or at V2 for T2(n) (finishing at EFT2(n)), with the remaining time before EST(i) spent at the scaled voltage.
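As an illustration of the ScaledEnergy accounting above, here is a hedged C++ sketch of the per-node decision for one processor's schedule, following Figure 5.2. The Node fields fold the pseudocode's communication term into a single bound, and endOfSchedule closes the last node's trailing gap; both are assumptions for the sketch, not the dissertation's code:

#include <vector>

// One scheduled node on machine k, as used in ScaledEnergy (Figure 5.1.e):
// its start time s_nk, its unscaled execution time T1(n), and the earliest
// moment any successor needs its data (min over c of EST(c) - C(n, c)).
struct Node {
    double start;       // s_nk
    double t1;          // T1(n)
    double childBound;  // latest finish time that delays no successor
};

// Energy of one processor's schedule per Figure 5.2: run a node at V2 when
// the stretched execution still meets every successor, otherwise at V1;
// the idle gap up to the next node's start is always spent at V2.
double scaledEnergy(const std::vector<Node>& sched,
                    double v1, double v2, double endOfSchedule) {
    double e = 0.0;
    for (size_t i = 0; i < sched.size(); ++i) {
        const Node& n = sched[i];
        double t2 = (v1 * v1) / (v2 * v2) * n.t1;      // T2(n)
        bool scaled = (n.start + t2 < n.childBound);   // fits the slack?
        double eft = n.start + (scaled ? t2 : n.t1);   // EFT(n)
        double exec = scaled ? t2 * v2 * v2 : n.t1 * v1 * v1;
        double nextStart = (i + 1 < sched.size()) ? sched[i + 1].start
                                                  : endOfSchedule;
        e += exec + (nextStart - eft) * v2 * v2;       // E(k, n)
    }
    return e;
}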
5.2 EAGS-D algorithm

Energy Aware Graph Scheduling with Duplication (EAGS-D) combines HNPD and dynamic voltage scaling. The Heterogeneous N-Predecessor Duplication (HNPD) algorithm combines the techniques of insertion-based list scheduling with multiple task duplication to minimize schedule length. HNPD was shown to outperform two of the best existing heuristics, Heterogeneous Earliest Finish Time (HEFT) and Static Task Duplication Scheduling (STDS), in terms of finish time and the number of processors employed, over a wide range of parameters [Bask03].

EAGS-D works as follows. EAGS-D assigns tasks to the best available processor according to the order of tasks in a scheduling queue called ScheduleQ. EAGS-D assigns the highest priority to critical path nodes (CPNs) and then to those predecessors of CPNs that include the CPN in their DP. Among these predecessors it gives higher priority to nodes with higher DP values, because nodes with higher DP values are likely to be on longer paths. In the order of tasks in ScheduleQ, EAGS-D uses EFT(n) to select the processor for each task n. As it is an insertion-based algorithm, it calculates the ready time of any machine k ∈ M, Rk, to be the earliest idle time slot large enough to execute T(n, k). In other words, it looks for a possible insertion between two already scheduled tasks on the given processor without violating precedence relationships.

Once tasks have been assigned to processors, EAGS-D attempts to duplicate the predecessors of the tasks. Predecessors are selected for duplication from most favorite to least and by descending top-distance. The goal of duplicating predecessors is to decrease the length of time for which a node awaits data, by making use of the processor's slack time. Predecessors of node n are duplicated on processor k, the processor on which node n is scheduled. If a duplication results in a lower finish time, the duplication is retained; otherwise it is discarded. If there is idle time between the recently assigned task n and the preceding task on processor k, EAGS-D attempts to duplicate each predecessor j. If j is not already scheduled on processor k, it is duplicated if EFT(j, k) is less than EFT(j) + C(j, n). The duplication is retained if EFT(n, k) decreases; otherwise, it is discarded. The same duplication procedure is repeated for each predecessor in order of most favorite to least. After EAGS-D attempts to duplicate each predecessor, it recursively attempts to duplicate the predecessors of any duplicated tasks. Duplication continues recursively until no further duplication is possible.
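A minimal C++ sketch of the duplication acceptance test just described: a predecessor j is duplicated on n's processor k only if its local copy would finish before its remote data would arrive, and the copy is kept only if n's finish time actually improves. The inputs are plain times read off the current schedule; the names are illustrative, not the dissertation's code:

// Decision core of EAGS-D duplication for one predecessor j of task n
// on machine k.
struct DuplicationCheck {
    double eftJonK;     // EFT(j, k): finish time of a copy of j inserted on k
    double eftJ;        // EFT(j): finish time of j on its current machine
    double commJN;      // C(j, n); 0 if j and n share a machine
    double eftNbefore;  // EFT(n, k) before the duplication
    double eftNafter;   // EFT(n, k) recomputed with the copy in place
};

// Returns true when the duplication should be retained.
bool keepDuplication(const DuplicationCheck& d) {
    // Duplicate only if the local copy beats the remote data arrival,
    if (d.eftJonK >= d.eftJ + d.commJN) return false;
    // and keep it only if n's finish time actually improves.
    return d.eftNafter < d.eftNbefore;
}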
Then, EAGS-D computes the consumed energy. The total energy consumed when no voltage scaling is used, $E_1$, is first calculated exactly as for EADAGS:

$E_1 = T \sum_{k \in M} P_1(k)$, where $P_1(k) = \frac{1}{2} V_1^2 f$

where:
- $P_1(k)$ is the amount of power consumed by processor $k \in M$,
- $f$ is the operating frequency of machine $k$,
- $V_1$ is the operating voltage of machine $k$,
- $T$ is the makespan.

In the second phase of EAGS-D, voltage scaling is applied to all processors during their idle times by reducing the execution rate to $f_2$, which is done by lowering the voltage to a predetermined level $V_2$. Such voltage scaling is applied to a task only if slowing its execution would not increase the makespan. We also reduce the voltage level of processors during all remaining slack times. The total energy consumed after applying voltage scaling is $E_2 = \sum_{k \in M} E_2(k)$, where:

- $E_2(k) = \frac{T_1 f_1 V_1^2 + T_2 f_2 V_2^2}{2}$ represents the energy consumed by processor $k \in M$ when voltage scaling is used,
- $T_1 = \sum_i T(i) + \sum_{i,j} C(i, j)$ is the total task and communication time when operating at $V_1$,
- $T(i)$ is the computation time of task $i$ on the chosen processor,
- $C(i, j)$ is the communication cost between tasks $i$ and $j$ if $i$ and $j$ are not scheduled on the same processor,
- $T_2$ is the total time processor $k$ operates at $V_2$ (including idle times during which the processor operates at $V_2$).

A non-blocking send protocol has been assumed, in which only the sending processor has to process the communication while the receiving processor has a buffer to receive all transmitted data without interrupting its job. The difference between $E_1$ and $E_2$ represents the energy that could be saved. The percentage average energy savings $= \frac{E_1 - E_2}{E_1} \times 100$. A high-level description of EAGS-D appears in Table 5.2.

Table 5.2. EAGS-D algorithm

Let G represent a DAG
Let M be the set of m processors in the system

EAGS-D
  Transform G to a DAG with a single entry node and a single exit node
  Compute DP of each node n ∈ G  // DP of the exit node is the critical path, CP
  Fill ScheduleQ with nodes  // starting from the entry node, traversing CP in increasing top-distance
  while ScheduleQ ≠ ∅ do
    i ← head(ScheduleQ)
    Schedule i on the processor p ∈ M that provides the earliest finish time of i
    Remove i from ScheduleQ
    Duplicate predecessors of i on p if doing so results in a shorter schedule
    // duplicate executions are performed within slacks of the schedule of p
  end while
  if scheduling all nodes on the fastest processor provides a shorter makespan, do so and discard the prior schedule
  T ← makespan
  Total energy consumed before voltage scaling E1 = T·V1²·f/2
  Total energy consumed when employing voltage scaling E2 = ScaledEnergy()
end EAGS-D

ScaledEnergy()
  // Returns the total energy consumption on all processors when voltage scaling has been applied
  for each processor p ∈ M do
    for each node n ∈ G scheduled on p do  // traverse first scheduled to last
      if executing n at the scaled voltage fits within the next slack then
        Scale down the operating voltage during the execution of n
      end if
    end for
    Energy consumed by processor p = sum of energy consumed by all nodes scheduled on p
  end for
  E = sum of energy consumed by all processors
  return E
end ScaledEnergy
Figure 5.3.a lists the notations used by EAGS-D, while Figures 5.3.b, 5.3.c, 5.3.d, 5.3.e, 5.3.f, and 5.3.g show the detailed procedures: EAGS-D, addQ, StartTime, DuplicatePred, Duplicate, and ScaledEnergy, respectively.

Let
- G represent a DAG
- y ∈ G be the exit node of G
- M be the set of m processors in the system
- Rk represent the ready time of machine k
- rn represent the ready time of node n
- f be the frequency of operation
- sik represent the start time of node i on machine k
- succ(n) represent the list of all successor nodes of node n ∈ G
- pred(n) represent the list of all predecessor nodes of node n ∈ G
- ScheduleQ be the queue of tasks in order of execution
- T(i, k) represent the execution time of node i ∈ G on machine k
- C(n, c) represent the communication cost from node n to node c
- EFT(i, k) represent the earliest finish time of node i ∈ G on machine k
- EFT(i) represent the scheduled finish time of node i ∈ G
- EFT2(i) represent the finish time of node i with the scaled-down voltage
- EST(i) represent the scheduled start time of node i ∈ G
- T represent the makespan of G
- V1 be the voltage of operation
- V2 be the scaled-down voltage of operation
- T1(n) and T2(n) represent the execution time of any node n ∈ G before and after voltage scaling, respectively
- E(k, n) represent the energy consumed by processor k to execute node n
- E1 represent the total energy consumption before voltage scaling
- E2 represent the total energy consumption after scaling down the voltage
- E1(k) and E2(k) represent the energy consumed by processor k ∈ M before and after voltage scaling, respectively

Figure 5.3.a. EAGS-D notations

EAGS-D
  Transform G to a DAG with a single entry node and a single exit node
  Compute DP for each node n ∈ G  // O(n²); DP of the exit node is the critical path, CP
  // Fill ScheduleQ with the nodes of CP in increasing top-distance
  ScheduleQ ← ∅
  for each node n ∈ CP do  // O(n²), traversed in increasing top-distance
    ScheduleQ = addQ(n)
  end for
  // Schedule nodes in ScheduleQ on processors
  while ScheduleQ ≠ ∅ do  // O(n)
    Pick the head node i of ScheduleQ
    for each processor k ∈ M do  // O(m)
      sik = StartTime(i, k)
      EFT(i, k) = sik + T(i, k)
    end for
    EFT(i) = min over k ∈ M of EFT(i, k)
    Schedule i on the processor p ∈ M that gave the minimum earliest finish time
    Remove i from ScheduleQ
    // Duplicate predecessors of i on p if it results in a shorter schedule
    DuplicatePred(i, pred(i), p)
  end while
  if EFT(y) ≥ min over k ∈ M of Σ over i ∈ G of T(i, k) then
    Schedule all nodes on the processor p ∈ M that provided the minimum
  end if
  T ← EFT(y)  // makespan
  E1 = T·V1²·f/2
  E2 = ScaledEnergy()
end EAGS-D

Figure 5.3.b. EAGS-D procedure

addQ(n)
  // Adds the unvisited parents of node n ∈ G, then n, to ScheduleQ; returns ScheduleQ
  for each parent b of n not visited do  // in decreasing DP
    addQ(b)
  end for
  Add n to ScheduleQ and mark it visited
  return ScheduleQ
end addQ

Figure 5.3.c. addQ procedure

StartTime(node n, machine k)
  // Returns the earliest available start time of node n ∈ G on machine k ∈ M
  snk ← Rk
  for each parent b of n do  // O(n)
    snk = max(snk, EFT(b) + C(b, n))  // if b is scheduled on k, C(b, n) = 0
  end for
  return snk
end StartTime

Figure 5.3.d. StartTime procedure

DuplicatePred(node n, pred(n), machine k)
  for each q ∈ pred(n) do  // in order from most favorite to least
    Duplicate(n, q, k)
    if q was duplicated then
      DuplicatePred(q, pred(q), k)
    end if
  end for
end DuplicatePred

Figure 5.3.e. DuplicatePred procedure
Duplicate(node n, node i, machine k)
  // Attempts to duplicate node n, a predecessor of node i, on machine k, which executes node i
  if n has not been scheduled on k then
    rnk = StartTime(n, k)
    // Attempt to insert node n in the first available slack in the schedule of processor k ∈ M
    for each node j already scheduled on k ∈ M do  // scan from last to first scheduled
      if j ∈ pred(n) then
        if j is the first node scheduled on k then
          if EST(j, k) ≥ rnk + T(n, k) then  // see Figure 5.4
            snk = rnk
          end if
        else
          Let p be the node scheduled immediately prior to j on k
          if (EST(j, k) − EFT(p, k) ≥ T(n, k)) and (EFT(p, k) ≥ rnk) then
            snk = EFT(p, k)  // see Figure 5.5
          end if
        end if
        EFT(n, k) = snk + T(n, k)
      end if
    end for
    // If EFT(i) does not decrease due to the duplication, discard the duplication
    if EFT(n, k) < min over succ(n) of (EFT(n) + C(n, succ(n))) then
      Insert and schedule n on k
      Recalculate EFT(i, k)
      if EFT(i, k) improves then
        Keep n scheduled on k
      else
        Discard the duplication of node n on k
      end if
    end if
  end if
end Duplicate

Figure 5.3.f. Duplicate procedure

ScaledEnergy()
  // Returns the total energy consumption after applying voltage scaling
  for each processor k ∈ M do  // O(m)
    for each node n ∈ G scheduled on k do  // O(n), from first to last scheduled
      T2(n) = (V1²/V2²)·T1(n)
      EFT2(n) = snk + T2(n)
      if EFT2(n) + C(n, c) < min EST(c) for each c ∈ succ(n) then
        // if c is scheduled on k, C(n, c) = 0
        EFT(n) = EFT2(n)  // update EFT(n)
        Let i be the node scheduled immediately after n on k
        // use the scaled voltage, see Figure 5.6
        E(k, n) = T2(n)·V2² + (EST(i) − EFT(n))·V2²
      else
        // original voltage for execution and scaled voltage during idle time, see Figure 5.6
        E(k, n) = T1(n)·V1² + (EST(i) − EFT(n))·V2²
      end if
    end for
    E2(k) = Σ over n of E(k, n)
  end for
  E2 = Σ over k ∈ M of E2(k)
  return E2
end ScaledEnergy

Figure 5.3.g. ScaledEnergy procedure

Figure 5.4. Timeline for machine k (case 1): node n is inserted before j, the first node scheduled on k, when EST(j, k) ≥ rn + T(n, k).

Figure 5.5. Timeline for machine k (case 2): node n is inserted in the slack between p and j when the slack is at least T(n, k) and EFT(p, k) ≥ rn.

Figure 5.6. Timeline for machine k: node n executed at V1 for T1(n) or at V2 for T2(n), with the remaining time before EST(i) spent at the scaled voltage.

5.3 Voltage scaling strategies

Executing any task at a slower rate requires a longer time to finish, and since we are dealing with DAGs, in which tasks depend on other tasks, changing the finish time of a task may affect the start times of its successors, resulting in a change of the schedule makespan. In our algorithms we want to make use of the processors' slack time to save energy without changing the makespan produced by the original scheduling algorithms.

Processors were assumed to have three different operating voltage levels based on the Motorola CMOS 6805 microcontroller, which is rated at 6 MHz at 5.0 Volts, 4.5 MHz at 3.3 Volts, and 3 MHz at 2.2 Volts. We tested five different strategies:

1. 5V/off: execute tasks at the regular voltage level and turn processors off during their idle time. This technique is used for comparison purposes only, since turning processors off each time they incur slack has physical limitations.
2. 2V during idle: execute tasks at the regular execution rate and lower the processor's voltage level to 2V only during its idle time.
3. 3.3V during idle: execute tasks at the regular execution rate and lower the processor's voltage level to 3.3V during all its idle time.
4. 2V scale: check whether the slack time is long enough to scale down the execution rate of a task to 2V without altering the finish time. If so, execute the task at the 2V rate and also lower the voltage during the remaining idle time. If not, execute the task at the regular voltage and lower the voltage during the idle time.
5. 3.3V scale: check whether the idle time is long enough to lower the execution rate of a task to 3.3V without affecting the makespan. If so, execute the task at the 3.3V rate and also lower the voltage during the remaining idle time. If not, execute the task at the regular voltage and lower the voltage during the idle time.

5.3.1 Voltage scaling algorithm

The task in Figure 5.7 has a finish time of t1 when executed at V1 and a finish time of t2 when executed at V2. The task can be executed at the lower voltage level only if t2 < min(start time of all its children). For the task in Figure 5.7(a), the execution rate can be slowed down to V2, reducing the amount of energy consumed; the task in Figure 5.7(b) cannot be scaled down to V2, since the new finish time would be greater than the earliest child's start time, so the task must be executed at the original voltage level and the processor's voltage is scaled down only during the idle time x. A sketch of this feasibility test appears at the end of this section.

Figure 5.7. Different finish times with different execution rates: (a) the scaled finish time t2 precedes the children's earliest start time; (b) t2 exceeds it, leaving only the idle time x for scaling.

Although we chose to do the energy scaling as a second phase after the scheduling of all tasks, it could have been done in parallel with the scheduling phase, resulting in the same schedule and the same amount of energy savings. This can be explained as follows: each task is scheduled on the processor that results in the earliest finish time. The finish time of each node on each processor is determined by the first available start time of that node on that processor and the execution time of the node on that specific processor, taking into consideration the finish times of its predecessors and all communication costs between the task and any other node. Its successors are then scheduled based on the same criteria. So, immediately after all of a task's children have been scheduled on the appropriate processors, the decision to scale that task down can be made based on the finish time of the task and the start times of its successors. A task can be scaled down only if its finish time when executed at the lower voltage level is earlier than the start times of all its successors. Executing the task at the scaled-down voltage level does not affect the scheduling decisions for the next nodes, since it only affects the finish time of the scaled-down task. The advantage of doing it as a second phase is that we do not have to check whether all of a task's successors have been scheduled before deciding whether to scale that task down, resulting in a lower-complexity algorithm.
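A minimal C++ sketch of the per-task test behind the 2V scale and 3.3V scale strategies; the time-stretch factor follows the dissertation's T2 = (V1²/V2²)·T1 relation, and the names are illustrative:

// Decide the execution voltage of one task under a "scale" strategy.
// start: scheduled start time; t1: execution time at the regular
// voltage v1; childMinStart: earliest start time over all children.
// Returns the voltage the task should run at.
double chooseTaskVoltage(double start, double t1, double childMinStart,
                         double v1, double vScaled) {
    // Stretched execution time at the scaled voltage, T2 = (V1^2/V2^2)*T1.
    double t2 = (v1 * v1) / (vScaled * vScaled) * t1;
    // Scale only if the stretched task still finishes before every child
    // needs its data; otherwise keep the regular voltage (the processor
    // is still scaled down during the remaining idle time).
    return (start + t2 < childMinStart) ? vScaled : v1;
}

For the 2V scale strategy, vScaled would be 2.0, and for 3.3V scale it would be 3.3; in either case the remaining idle time up to the next task is spent at the scaled voltage.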
CHAPTER 6
RESULTS AND DISCUSSION

The experimental simulation estimates the energy gains of using voltage scaling techniques when scheduling DAG tasks on multiple processors, on top of two of the best known high-performing scheduling algorithms, DPS and HNPD. As stated in Chapter 5, processors were assumed to have three different operating voltage levels based on the Motorola CMOS 6805 microcontroller, which is rated at 6 MHz at 5.0 Volts, 4.5 MHz at 3.3 Volts, and 3 MHz at 2.2 Volts [Weis94]. The first operating voltage was 5V; when using this voltage, if the processor becomes idle, it shuts down. We will refer to this operating voltage as 5V/off. This level is used for reference only since it has physical limitations. The other two operating voltages are 2V and 3.3V, which slow the processor during task execution. The algorithms were tested using random DAGs and DAGs of real-world problems; specifically, we tested on DAGs for molecular dynamics code, Gaussian elimination, the Sieve of Eratosthenes, and the Fast Fourier Transform.

6.1 Results for random DAGs

A set of random DAGs was generated to evaluate the voltage scaling technique. Those DAGs had different input parameters (number of nodes, communication to computation ratio, shape parameter, out-degree, and computation range), varied as follows:

n = {10, 20, 40, 60, 80, 100, 500, 1000}
CCR = {0.1, 0.5, 1, 5, 10}
α = {0.5, 1, 2}
out-degree = {1, 2, 3, 4, 5, 100}
β = {0.1, 0.25, 0.5, 0.75, 1.0}
PNR = {25%, 50%, 100%}

The above values produce 10,800 DAGs, which were repeated for both presented scheduling algorithms, EADAGS and EAGS-D.

6.1.1 Results for EADAGS

The amount of energy consumed was measured for DPS and EADAGS for different test sets of random DAGs. Test sets were created by combining results from DAGs with similar properties, such as the number of nodes or the CCR. The amount of energy consumed was measured for five different strategies: 5V/off, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale (explained in Chapter 5).

The first test set was obtained by combining DAGs with respect to number of nodes. The energy savings were averaged over DAGs with varying CCR, α, β, out-degree, and PNR. Figure 6.1 shows the average energy saved by EADAGS over DPS with respect to number of nodes. The percentage of energy savings increased with increasing number of nodes: average energy savings were 30% for a DAG of 10 nodes and gradually increased to 46% for 1000-node DAGs. Larger numbers of nodes increase the chance of dependency between tasks, causing a slight increase in processor wait time and accordingly increasing the energy savings. Table 6.1 lists the makespan and the average energy savings with respect to number of nodes. Average energy savings range between 30% and 46% for the 5V/off technique, 26% and 46% when 2V during idle is used, 24% and 46% when using 3.3V during idle, 29% and 46% when 2V scale is used, and 28% and 46% when 3.3V scale is used.

Figure 6.1. Average energy savings with respect to number of nodes

Table 6.1. Makespan and average energy savings with respect to number of nodes

Nodes  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
10     174.7     30.28%   26.53%          24.09%            29.98%    28.84%
20     175.27    34.32%   32.6%           31.48%            33.63%    32.45%
40     321.67    40.2%    39.18%          38.51%            40.12%    39.57%
60     404.36    41.2%    40.54%          40.11%            40.93%    40.47%
80     525.5     43.14%   42.56%          42.19%            42.91%    42.51%
100    674.65    43.8%    43.39%          43.12%            43.64%    43.35%
500    835.29    45.94%   45.85%          45.8%             45.82%    45.28%
1000   1127      46.29%   46.25%          46.22%            46.88%    46.03%

The second test set combines DAGs with respect to CCR. The average energy savings were averaged over DAGs with varying n, α, β, out-degree, and PNR. In Figure 6.2 the average energy savings are plotted with respect to CCR. The average energy savings increased with increasing CCR. When CCR increases, processors incur longer idle times due to communication between tasks. Our algorithm was able to use such idle times to achieve energy savings.
Table 6.2 shows the makespan and the average energy savings for five different CCR values. The average energy savings over DPS range from 35% for CCR = 0.1 to 51% for CCR = 10 with the 5V/off technique. Savings are smaller for 2V during idle; they range between 25% and 38%. Savings are smaller still for 3.3V during idle: 19% for CCR = 0.1 and 30% for CCR = 10. Average energy savings are 28% for CCR = 0.1 and 44.7% for CCR = 10 with 2V scale, while for 3.3V scale the savings are 22.8% for CCR = 0.1 and 40% for CCR = 10.

Figure 6.2. Average energy savings with respect to CCR

Table 6.2. Makespan and average energy savings for different CCR values

CCR  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.1  117       35.17%   25.53%          19.27%            28.33%    22.87%
0.5  124       36.93%   26.95%          20.45%            29.42%    24.77%
1    133.89    37.25%   27.2%           20.66%            29.69%    25.06%
5    209.89    45.84%   34.07%          26.42%            41.98%    38.05%
10   280       51.4%    38.52%          30.15%            44.73%    40.11%

The third test set combines DAGs with respect to processor to node ratio, PNR. The average energy savings were averaged over randomly generated DAGs with varying n, CCR, α, β, and out-degree. Figure 6.3 shows the average energy savings with respect to PNR; they decrease with increasing PNR. This is because an increasing number of processors allows more parallel task executions, thus minimizing the wait times. The average energy savings measured were 47% for PNR = 25% with the 5V/off technique, 41% when processors operate at 2V during idle, 38% if they operate at 3.3V when idle, 42% when 2V scale is used, and 39% when 3.3V scale is used. But for PNR = 100% the average energy savings for all five strategies are almost equal, at about 33%, which is due to the significant decrease in processor wait time.

Figure 6.3. Average energy savings with respect to PNR

Another test set combined DAGs with respect to the shape parameter α. The average energy savings were computed over 36,000 randomly generated DAGs with varying n, PNR, CCR, β, and out-degree. In Figure 6.4 the average energy savings for the five operating strategies are plotted with respect to α. The results indicate that the overall energy savings marginally increased with increasing α. Increasing α increases parallelism in the DAG, resulting in more idle time for the processors due to task dependencies; this time can be used to reduce the consumed energy. The average energy savings were measured as 24% for α = 0.5, 25% for α = 1, and 25.5% for α = 2.

Figure 6.4. Average energy savings with respect to α

Table 6.3. Makespan and average energy savings for different shape parameters

α    Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.5  110.87    34.19%   24.75%          18.62%            28.47%    22.54%
1    189.87    34.38%   24.9%           18.74%            29.01%    23.71%
2    219.4     35.22%   25.58%          19.31%            29.33%    24.37%

The last test set was for out-degree. The average energy savings were averaged over 21,600 randomly generated DAGs with varying n, α, β, CCR, and PNR.
Figure 6.5 shows the average energy savings with respect to the values of out-degree. The results indicate that an increase in out-degree results in smaller average energy savings, because a larger out-degree allows more processors to run in parallel. Table 6.4 lists the makespan and average energy savings with respect to out-degree. The amount of energy that could be saved ranges between 44% and 13% for processors using 5V/off, 32% and 8.5% for 2V during idle, 25% and 5% for 3.3V during idle, 36% and 10% when 2V scale is used, and 29% and 6% for 3.3V scale.

Figure 6.5. Average energy savings with respect to out-degree

Table 6.4. Makespan and average energy savings with respect to out-degree

Out-degree  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
1           233.97    44.24%   32.79%          25.35%            36.29%    29.44%
2           321.67    42.67%   31.54%          24.3%             34.07%    26.36%
3           166.17    32.55%   23.44%          17.52%            26.18%    21.54%
4           194.29    29.53%   21.02%          15.5%             23.37%    19.28%
5           203.6     16.17%   10.33%          6.54%             14.71%    9.39%
100         157.13    13.83%   8.47%           4.98%             10.23%    6.41%

6.1.2 Results for EAGS-D

The amount of energy consumed was measured for HNPD and EAGS-D for different test sets of random DAGs. Test sets were created by combining results from DAGs with similar properties, such as the number of nodes or CCR. The amount of energy consumed was measured for the five different strategies explained before: 5V/off, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale.

The first test set was obtained by combining DAGs with respect to number of nodes. The energy savings were averaged over DAGs with varying CCR, α, β, out-degree, and PNR. Figure 6.6 shows the average energy saved by EAGS-D over HNPD with respect to number of nodes. The percentage of energy savings increased with increasing number of nodes: average energy savings were 15% for a DAG of 10 nodes and gradually increased to 30% for 1000-node DAGs. Larger numbers of nodes increase the chance of dependency between tasks, causing a slight increase in processor wait time and accordingly increasing energy savings. Average energy savings range between 15% and 30% for processors using the 5V/off technique, 13% and 30% for 2V during idle, 12% and 30% when using 3.3V during idle, 14% and 31% for 2V scale, and 12% and 30% for 3.3V scale.

Figure 6.6. Average energy savings with respect to number of nodes

Table 6.5. Makespan and average energy savings with respect to number of nodes

Nodes  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
10     164.3     15.24%   13.36%          12.13%            14.61%    12.75%
20     169.02    15.65%   14.79%          14.05%            15.31%    14.55%
40     318.07    26.02%   25.36%          24.85%            15.75%    15.57%
60     416.31    28.81%   28.33%          27.99%            28.61%    28.27%
80     537.6     30.43%   30%             29.41%            30.26%    31.07%
100    668.29    29.88%   29.6%           29.8%             29.77%    30.58%
500    799.83    30.67%   30.66%          30.3%             31.89%    30.78%
1000   1345.7    30.88%   30.78%          30.55%            31.09%    30.89%

The second test set combines DAGs with respect to CCR. The average energy savings were averaged over DAGs with varying n, α, β, out-degree, and PNR. In Figure 6.7 the average energy savings are plotted with respect to CCR. The average energy savings increased with increasing CCR.
When CCR increases, processors incur longer idle times due to communication between tasks, and our algorithm is able to use such idle times to achieve energy savings. The average energy savings over HNPD listed in Table 6.6 range from 8% for CCR = 0.1 to 43% for CCR = 10 with the 5V/off technique. Savings are smaller for 2V during idle; they range between 4% and 39%. Savings are smaller still for 3.3V during idle: 2% for CCR = 0.1 and 32% for CCR = 10. Savings are higher for 2V scale than for 2V during idle; they range from 6% for CCR = 0.1 to 50% for CCR = 10. Finally, the savings are 4% for CCR = 0.1 and 48% for CCR = 10 with 3.3V scale.

Figure 6.7. Average energy savings with respect to CCR

Table 6.6. Makespan and average energy savings for different CCR values

CCR  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.1  108.44    8.1%     4.25%           2.37%             6.92%     4.41%
0.5  119.11    13.11%   8.81%           5.62%             10.33%    9.38%
1    140.56    24.38%   21.03%          18.8%             21.67%    16.01%
5    206.78    40.58%   33.86%          30.74%            48.67%    47.03%
10   261.67    43.08%   38.98%          32.6%             50.71%    48.42%

The third test set combines DAGs with respect to processor to node ratio, PNR. The average energy savings were averaged over randomly generated DAGs with varying n, CCR, α, β, and out-degree. Figure 6.8 shows the average energy savings with respect to PNR; they decrease with increasing PNR. This is because an increasing number of processors allows more parallel task executions, thus minimizing the wait times. The average energy savings measured were 28% for PNR = 25% with the 5V/off technique, 27% if processors operate at 2V during idle, 27% for 3.3V during idle, and 27% if they operate at either 2V scale or 3.3V scale. But for PNR = 100% the average energy savings for all five strategies were almost equal, at about 25%, which is due to the significant decrease in processor wait time.

Figure 6.8. Average energy savings with respect to PNR

Another test set combined DAGs with respect to the shape parameter α. The average energy savings were computed over 36,000 randomly generated DAGs with varying n, PNR, CCR, β, and out-degree. In Figure 6.9 the average energy savings for the five operating strategies are plotted with respect to α. The results indicate that the overall energy savings marginally increased with increasing α from 0.5 to 1 and significantly decreased when α = 2. Increasing α increases parallelism in the DAG, resulting in more idle time for the processors due to task dependencies; but when α = 2 all this idle time is used by the processors to duplicate tasks, so the average energy savings decreased, which was not the case with EADAGS. The average energy savings were measured as 18% when α = 0.5, 22% when α = 1, and 3% when α = 2, as in Table 6.7.

Figure 6.9. Average energy savings with respect to α
Table 6.7. Makespan and average energy savings for different shape parameters

α    Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.5  97.73     19.75%   18.44%          17.59%            19.33%    18.01%
1    190.6     23.82%   22.62%          21.84%            22.89%    22.15%
2    218.73    3.38%    3.3%            3.25%             3.41%     3.33%

The last test set combines DAGs with respect to out-degree. The average energy savings were averaged over 21,600 randomly generated DAGs with varying n, α, β, CCR, and PNR. Figure 6.10 shows the average energy savings with respect to the values of out-degree. The results indicate that an increase in out-degree results in smaller average energy savings, because a larger out-degree allows more processors to run in parallel. The amount of energy that could be saved ranged between 25% and 10% for processors using the 5V/off technique, 23% and 7% for 2V during idle, 20% and 5% for 3.3V during idle, 24% and 9% for 2V scale, and finally 21% and 6% when 3.3V scale is used, as shown in Table 6.8.

Figure 6.10. Average energy savings with respect to out-degree

Table 6.8. Makespan and average energy savings with respect to out-degree

Out-degree  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
1           158.9     25.44%   23.45%          20.74%            24.72%    21.64%
2           218.06    22.68%   21.04%          19.35%            22.19%    20.38%
3           144.93    21.09%   18.44%          16.84%            19.07%    17.33%
4           164.03    19.33%   16.17%          13.97%            17.41%    14.86%
5           138.27    12.16%   10.86%          9.46%             12.03%    10.82%
100         106.71    9.67%    7.36%           5.47%             9.11%     6.71%

Regardless of the changing parameters, the average energy savings for EADAGS are higher than the average energy savings measured for EAGS-D. That is due to the task duplication in EAGS-D. EAGS-D attempts to duplicate the predecessors of tasks to decrease the length of time for which a node awaits data, by making use of the processor's idle time; the same task may therefore be executed on several processors to reduce the makespan. And since we use the processor's idle time to save energy, reduction of that time reduces the average energy that can be saved.

6.2 Results for real world problems

To evaluate the performance of the proposed algorithms, we used task graphs of four real-world problems: Gaussian elimination [Wu90], molecular dynamics code [Chun92], Fast Fourier Transform [Chun92], and the Sieve of Eratosthenes [Bask00].

6.2.1 Gaussian elimination

In mathematics, Gaussian elimination (or Gauss-Jordan elimination), named after Carl Friedrich Gauss and Wilhelm Jordan, is an algorithm in linear algebra for determining the solution of a system of linear equations, for determining the rank of a matrix, and for calculating the inverse of an invertible square matrix [Corm90]. Gaussian elimination is a systematic application of elementary row operations to a system of linear equations in order to convert the system to upper triangular form; once the coefficient matrix is in upper triangular form, back substitution is used to find a solution. The general procedure is summarized in Table 6.9; a code sketch of these steps follows the table.

Table 6.9. Gaussian elimination algorithm steps

1. Write the augmented matrix for the system of linear equations.
2. Use elementary row operations on the augmented matrix [A|b] to transform A into upper triangular form. If a zero is located on the diagonal, switch rows until a nonzero is in that place; if you are unable to do so, stop, as the system has either infinitely many or no solutions.
3. Use back substitution to find the solution of the problem.
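A minimal C++ sketch of these steps on an augmented matrix, assuming a square system with a unique solution; this illustrates the numerical algorithm of Table 6.9, not the dissertation's task graph:

#include <vector>
#include <cmath>
#include <utility>
#include <stdexcept>

// Solve A x = b by Gaussian elimination with row swapping (Table 6.9).
// a is the n x (n+1) augmented matrix [A|b]; returns the solution x.
std::vector<double> gaussianSolve(std::vector<std::vector<double>> a) {
    const size_t n = a.size();
    for (size_t k = 0; k < n; ++k) {
        // Step 2: if the diagonal entry is zero, switch rows.
        size_t piv = k;
        while (piv < n && std::fabs(a[piv][k]) < 1e-12) ++piv;
        if (piv == n) throw std::runtime_error("no unique solution");
        std::swap(a[k], a[piv]);
        // Eliminate the entries below the pivot (update rows).
        for (size_t i = k + 1; i < n; ++i) {
            double factor = a[i][k] / a[k][k];
            for (size_t j = k; j <= n; ++j) a[i][j] -= factor * a[k][j];
        }
    }
    // Step 3: back substitution on the upper triangular system.
    std::vector<double> x(n);
    for (size_t i = n; i-- > 0; ) {
        double s = a[i][n];
        for (size_t j = i + 1; j < n; ++j) s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
    return x;
}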
The computational complexity of Gaussian elimination is O(n³); that is, the number of operations required is approximately proportional to n³ for a matrix of size n × n. The DAG for the Gaussian elimination algorithm for n = 3, n = 4, and n = 5 is shown in Figure 6.11, where n is the matrix size. Each Tk,k represents a pivot column operation and each Tk,j an update operation. The total number of tasks in the graph is (n² + n − 2)/2, where n is the size of the matrix; a small sketch verifying this count follows Figure 6.11.

In the simulation, a matrix of size 8 × 8 was used to evaluate EADAGS. Since the structure of the graph is fixed, only the number of processors and the CCR values were varied. For a matrix of size 8 the total number of tasks in the graph is 35, and the largest number of tasks at a single level is 7, so the number of processors is bounded by 7. CCR values were 0.1, 0.5, 1.0, 5.0, and 10. In this experiment, since the same operation is executed on every processor and the same information is communicated from one processor to another, a uniform computation cost for all tasks and an equal communication cost for all communication links were assumed.

Figure 6.11. Gaussian elimination task graph: (a) matrix of size 3, (b) matrix of size 4, (c) matrix of size 5
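As a quick check of the task-count formula above, a small C++ sketch that enumerates the nodes of the Gaussian elimination task graph (pivot tasks Tk,k and update tasks Tk,j, as in Figure 6.11) and compares the total with (n² + n − 2)/2; everything beyond that naming is illustrative:

#include <cstdio>
#include <initializer_list>

// Count the tasks of the Gaussian elimination DAG of Figure 6.11:
// each elimination step k = 1..n-1 contributes one pivot task T(k,k)
// and update tasks T(k,j) for j = k+1..n.
int countTasks(int n) {
    int tasks = 0;
    for (int k = 1; k < n; ++k) {
        ++tasks;                                   // pivot task T(k,k)
        for (int j = k + 1; j <= n; ++j) ++tasks;  // update task T(k,j)
    }
    return tasks;
}

int main() {
    // Matches (n*n + n - 2) / 2: n = 5 gives 14, and n = 8 gives the
    // 35 tasks used in the simulation.
    for (int n : {3, 4, 5, 8})
        std::printf("n = %d: %d tasks (formula: %d)\n",
                    n, countTasks(n), (n * n + n - 2) / 2);
    return 0;
}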
6.2.1.1 Results for EADAGS

Figure 6.12 and Figure 6.13 show the average energy savings using EADAGS over DPS with respect to number of processors and CCR values. Figure 6.12 shows an increase in the average energy savings with increasing number of processors. This is because at each level only a certain number of tasks can be executed at the same time, so increasing the number of available processors produces more idle time, thus increasing the energy savings. The average energy savings measured were 32% for 2 processors using the 5V/off technique, 27% if processors operate at 2V during idle, 18% if processors use 3.3V during idle, 29% for 2V scale, and 19% for 3.3V scale. When three processors are used, the average energy savings were 52% for the 5V/off technique, 43% and 29% when processors are scaled down during their idle time to 2V and 3.3V respectively, 45% when processors use 2V scale, and 30% for 3.3V scale. For four processors the average energy savings were 63%, 53%, 36%, 54%, and 36% for 5V/off, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively. For five processors the average energy savings were 69%, 57%, 38%, 58%, and 39% for the same five strategies; for 6 processors they were 74%, 62%, 42%, 63%, and 42%; and finally, with 7 processors, 77%, 64%, 43%, 65%, and 44%.

Figure 6.12. Average energy savings for Gaussian elimination algorithm with EADAGS with different numbers of processors

Figure 6.13. Average energy savings for Gaussian elimination algorithm with EADAGS with different CCR values

Figure 6.13 plots the average energy savings with respect to different CCR values. The average savings increased with increasing CCR. The average energy savings over DPS range from 52% for CCR = 0.1 to 74% for CCR = 10 with the 5V/off technique. Savings are smaller for 2V during idle; they range between 44% and 62%. Savings are smaller still for 3.3V during idle: 29% for CCR = 0.1 and 41% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over DPS range from 45% for CCR = 0.1 to 62% for CCR = 10, and from 30% for CCR = 0.1 to 42% for CCR = 10, respectively. Table 6.10 and Table 6.11 list the makespan and the average energy savings for different numbers of processors and different CCR values for the EADAGS algorithm.

Table 6.10. Makespan and average energy savings for different numbers of processors for Gaussian elimination with EADAGS

Processors  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
2           820       32.19%   27.04%          18.17%            28.97%    19.46%
3           824.4     52.1%    43.77%          29.41%            45.15%    30.34%
4           860       63.91%   53.69%          36.07%            54.75%    36.79%
5           784.8     68.19%   57.28%          38.49%            58.22%    39.12%
6           851.2     74.63%   62.69%          42.12%            63.44%    42.63%
7           831.2     77.3%    64.93%          43.63%            65.61%    44.08%

Table 6.11. Makespan and average energy savings with respect to CCR for Gaussian elimination with EADAGS

CCR  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.1  381.33    52.69%   44.26%          29.74%            45.94%    30.87%
0.5  421.67    54.19%   45.52%          30.58%            47.06%    31.62%
1    480       56.41%   47.38%          31.84%            48.76%    32.76%
5    1063.33   69.55%   58.42%          39.25%            59.06%    39.68%
10   1796.67   74.11%   62.25%          41.83%            62.63%    42.08%

6.2.1.2 Results for EAGS-D

The same numbers of processors and CCR values were tested in the second part of this experiment to evaluate EAGS-D. Figure 6.14 shows the average energy savings for EAGS-D with respect to number of processors. The average energy savings measured were 21% for 2 processors with the 5V/off technique, 17% when processors operate at 2V during idle, 11% if they operate at 3.3V during idle, 20% for 2V scale, and 13% for 3.3V scale. These values increase gradually with increasing number of processors, due to increasing idle time. The average energy savings were 69%, 58%, 38%, 59%, and 39% when 7 processors are used, for the 5V/off technique, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively.

Figure 6.14. Average energy savings for Gaussian elimination algorithm with EAGS-D with different numbers of processors

Figure 6.15 plots the average energy savings with respect to different CCR values. The average savings increase with increasing CCR. The average energy savings over HNPD range from 33% for CCR = 0.1 to 69% for CCR = 10 with the 5V/off technique. Savings are smaller for the 2V during idle operating voltage; they range between 27% and 58%.
Savings are smaller still for the 3.3V operating voltage: 18% for CCR = 0.1 and 39% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over HNPD range from 30% for CCR = 0.1 to 59% for CCR = 10, and from 20% for CCR = 0.1 to 40% for CCR = 10, respectively. Table 6.12 and Table 6.13 list the makespan and the average energy savings for different numbers of processors and different CCR values for the EAGS-D algorithm.

Figure 6.15. Average energy savings for Gaussian elimination algorithm with EAGS-D with different CCR values

Table 6.12. Makespan and average energy savings for different numbers of processors for Gaussian elimination with EAGS-D

Processors  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
2           484       21.15%   17.76%          11.94%            20.48%    13.76%
3           442.4     32.93%   27.66%          18.59%            29.79%    20.02%
4           432       45.68%   38.37%          25.78%            40.04%    26.91%
5           432       56.78%   47.69%          32.05%            49.03%    32.95%
6           432       63.98%   53.74%          36.11%            54.86%    36.86%
7           432       69.13%   58.07%          39.02%            59.02%    39.66%

Table 6.13. Makespan and average energy savings with respect to CCR for Gaussian elimination with EAGS-D

CCR  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.1  303.67    33.21%   27.9%           18.74%            30.03%    20.17%
0.5  305       33.59%   28.22%          18.96%            30.33%    20.38%
1    350       42.13%   35.39%          23.78%            37.27%    25.04%
5    580       62.75%   52.67%          35.39%            53.82%    36.16%
10   673.33    69.73%   58.58%          39.36%            59.58%    40.03%

Figure 6.16. Average energy savings for Gaussian elimination algorithm with EADAGS and EAGS-D for the five voltage scaling strategies

Figure 6.16 shows the average energy savings for both EADAGS and EAGS-D with respect to the five voltage scaling strategies tested for the Gaussian elimination algorithm with a matrix of size 8. The overall average energy savings for EAGS-D are lower than the average energy savings measured for EADAGS. That is due to the nature of the EAGS-D algorithm, which uses task duplication to minimize the makespan; duplication uses a large portion of the processor's idle time, and that reduces the energy savings.

6.2.2 Molecular dynamics code

Figure 6.17 represents the DAG of a molecular dynamics code as given in [Chun92]. We used this graph to evaluate the performance of EADAGS and EAGS-D. Since the graph has a fixed structure and a fixed number of nodes, the only parameters that could be varied were the number of processors and the CCR values. Since there are at most seven tasks at any level in Figure 6.17, the number of processors was bounded by seven. The amount of energy consumed was measured for the five different voltage scaling strategies explained earlier. We assumed that the computation costs of the nodes are not all equal, and that the communication costs are likewise not equal for all links, since the task computed at each node and the data communicated from one node to another differ. Five values of CCR were used in our experiments: 0.25, 0.5, 1, 5, and 10.

Figure 6.17. Directed a-cyclic graph (DAG) for a molecular dynamics code
6.2.2.1 Results for EADAGS

Figure 6.18 and Figure 6.19 show the average energy savings for EADAGS with respect to number of processors and CCR values, respectively. Figure 6.18 shows an increase in the average energy savings with increasing number of processors. The average energy savings measured were 22% for 2 processors with the 5V/off technique, 18% when processors operate at 2V during idle, 12% if they operate at 3.3V during idle, 19% for 2V scale, and 13% for 3.3V scale. When three processors are used, the average energy savings were 50% for the 5V/off technique, 40% when processors are scaled down to 2V during idle, 29% for 3.3V during idle, 44% for 2V scale, and 30% for 3.3V scale. For four processors, the average energy savings were 52%, 43%, 29%, 44%, and 30% for 5V/off, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively; for five processors, 59%, 49%, 33%, 50%, and 33%; for six processors, 63%, 52%, 35%, 53%, and 36%; and finally, for seven processors, 67%, 57%, 38%, 57%, and 38%. The makespan and the average energy savings for different numbers of processors are listed in Table 6.14.

Figure 6.18. Average energy savings for molecular dynamics code with EADAGS with different numbers of processors

Table 6.14. Makespan and average energy savings for different numbers of processors for molecular dynamics code with EADAGS

Processors  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
2           740.8     22.08%   18.55%          12.46%            19.79%    13.30%
3           859.2     52.42%   44.04%          29.59%            44.82%    30.11%
4           67.2      52.19%   43.84%          29.46%            44.64%    30.00%
5           663.2     59.19%   49.72%          33.41%            50.41%    33.87%
6           634       63.07%   52.98%          35.6%             53.61%    36.02%
7           631.2     67.09%   57.09%          38.36%            57.64%    38.73%

Figure 6.19 plots the average energy savings with respect to different CCR values. The average savings increased with increasing CCR. The average energy savings over DPS range from 45% for CCR = 0.1 to 64% for CCR = 10 with the 5V/off technique. Savings are smaller for 2V during idle; they range between 38% and 54%. Savings are smaller still for 3.3V during idle: 25% for CCR = 0.1 and 36% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over DPS range from 39% for CCR = 0.1 to 54% for CCR = 10, and from 26% for CCR = 0.1 to 36% for CCR = 10, respectively. The makespan and the average energy savings for different CCR values are listed in Table 6.15.

Figure 6.19. Average energy savings for molecular dynamics code with EADAGS with different CCR values
Table 6.15. Makespan and average energy savings with respect to CCR for molecular dynamics code with EADAGS

CCR  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.1  391.33    45.99%   38.63%          25.96%            39.72%    26.69%
0.5  411.67    46.21%   38.81%          26.08%            39.85%    26.78%
1    453.33    48.29%   40.56%          27.25%            41.50%    27.89%
5    853.33    59.11%   49.65%          33.36%            50.17%    33.71%
10   1396.67   64.5%    54.18%          36.41%            54.51%    36.63%

6.2.2.2 Results for EAGS-D

Figure 6.20 and Figure 6.21 show the average energy savings for EAGS-D with respect to number of processors and CCR values, respectively. Figure 6.20 shows an increase in the average energy savings with increasing number of processors. The average energy savings measured were 18% for 2 processors using the 5V/off technique, 15% when processors operate at 2V during idle, 10% if they operate at 3.3V during idle, 17% for 2V scale, and 11% for 3.3V scale. The average energy savings increase to 61%, 51%, 34%, 52%, and 35% when 7 processors are used, for the 5V/off technique, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively. The makespan and the average energy savings for different numbers of processors are listed in Table 6.16.

Figure 6.20. Average energy savings for molecular dynamics code with EAGS-D with different numbers of processors

Table 6.16. Makespan and average energy savings for different numbers of processors for molecular dynamics code with EAGS-D

Processors  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
2           566.4     18.87%   15.85%          10.65%            17.44%    11.71%
3           488.8     27.58%   23.17%          15.57%            24.48%    16.45%
4           467.2     41.6%    34.95%          23.48%            36.02%    24.20%
5           459.2     52.96%   44.49%          29.89%            45.38%    30.49%
6           445.2     58.15%   48.85%          32.82%            49.64%    33.35%
7           438.4     61.72%   51.85%          34.84%            52.56%    35.31%

Figure 6.21 plots the average energy savings with respect to different CCR values. The average savings increased with increasing CCR. The average energy savings over HNPD range from 24% for CCR = 0.1 to 74% for CCR = 10 with the 5V/off technique. Savings are smaller for 2V during idle; they range between 20% and 62%. Savings are smaller still for 3.3V during idle: 13% for CCR = 0.1 and 41% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over HNPD range from 22% for CCR = 0.1 to 62% for CCR = 10, and from 15% for CCR = 0.1 to 42% for CCR = 10, respectively. The makespan and the average energy savings for different CCR values are listed in Table 6.17.

Figure 6.21. Average energy savings for molecular dynamics code with EAGS-D with different CCR values

Table 6.17. Makespan and average energy savings with respect to CCR for molecular dynamics code with EAGS-D

CCR  Makespan  5V/off   2V during idle  3.3V during idle  2V scale  3.3V scale
0.1  299.33    25.37%   21.31%          14.32%            22.73%    15.27%
0.5  321.67    30.13%   25.31%          17.01%            26.64%    17.90%
1    360       38.28%   32.16%          21.61%            33.35%    22.41%
5    506.67    49.51%   41.59%          27.94%            42.46%    28.53%
10   900       74.12%   62.26%          41.83%            62.75%    42.16%
Figure 6.22 shows the average energy savings for both EADAGS and EAGS-D with respect to the five voltage scaling strategies tested for the molecular dynamics code. The overall average energy savings for EAGS-D are lower than the average energy savings measured for EADAGS. That is due to the nature of the EAGS-D algorithm, which uses task duplication to minimize the makespan; duplication uses a large portion of the processor's idle time, and that reduces the energy savings.

Figure 6.22. Average energy savings for molecular dynamics code with EADAGS and EAGS-D for the five voltage scaling strategies

6.2.3 Fast Fourier Transform (FFT)

FFT is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. FFTs are of great importance to a wide variety of applications, from digital signal processing, to solving partial differential equations, to algorithms for multiplying large integers. The DFT and FFT are used to generate the frequency analysis of a discrete non-periodic signal. The computation of the DFT is complicated; it involves many additions and multiplications of complex numbers. Even a simple eight-sample signal requires 49 complex multiplications and 56 complex additions to work out the DFT. At this level it is still manageable; however, a realistic signal could have 1024 samples, which requires over 2,000,000 complex multiplications and additions (a quick check appears at the end of this subsection). The FFT is a simpler way of laying out the computation, and it is much faster for larger numbers of samples. The idea behind the FFT is the divide-and-conquer approach: break the original N-point sample into two (N/2)-point sequences, because a series of smaller problems is easier to solve than one large one. The DFT requires (N−1)² complex multiplications and N(N−1) complex additions, as opposed to the FFT's approach, which requires only 1 multiplication and 2 additions per step, plus the recombination of the points, which is minimal [Corm90].

The recursive, one-dimensional FFT task graph for 4 data points is shown in Figure 6.23 [Chun92]. The FFT algorithm consists of two parts: the recursive calls and the butterfly operations. The task graph in Figure 6.23 can accordingly be divided into two parts: the tasks above the dashed line are the recursive-call tasks, while the tasks below the dashed line are the butterfly-operation tasks.

Figure 6.23. The generated DAG for FFT with four points (tasks T1 through T15)

We used this task graph to evaluate the performance of EADAGS and EAGS-D. Since the graph has a fixed structure and a fixed number of nodes, the only parameters we changed were the number of processors and the CCR values. Since there are at most four tasks at any level in Figure 6.23, the number of processors was bounded to four, starting with 2 processors in the system and incrementing by 1 up to 4. Each path from the entry node to an exit node is a critical path, since the computation costs of tasks at any level are equal and the communication costs of all edges between two consecutive levels are equal. The amount of energy consumed was measured for the five different strategies listed before.
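As a quick check of the operation counts quoted above, a small C++ sketch evaluating the DFT cost formulas, (N−1)² complex multiplications and N(N−1) complex additions, for the two sample sizes mentioned (illustrative only):

#include <cstdio>
#include <initializer_list>

int main() {
    // DFT cost per the text: (N-1)^2 complex multiplications and
    // N(N-1) complex additions.
    for (long n : {8L, 1024L}) {
        long mults = (n - 1) * (n - 1);
        long adds  = n * (n - 1);
        // N = 8 gives 49 and 56; N = 1024 gives about 2.1 million
        // operations in total, which the FFT avoids.
        std::printf("N = %ld: %ld multiplications, %ld additions\n",
                    n, mults, adds);
    }
    return 0;
}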
The recursive, one-dimensional FFT task graph for 4 data points is shown in Figure 6.23 [Chun92]. The FFT algorithm consists of two parts: the recursive calls and the butterfly operations. The task graph in Figure 6.23 can be divided into two parts: the tasks above the dashed line are the recursive-call tasks, while the tasks below the dashed line are the butterfly-operation tasks.

Figure 6.23. The generated DAG for FFT with four points

We used this task graph to evaluate the performance of EADAGS and EAGS-D. Since the graph has a fixed structure and a fixed number of nodes, the only parameters we changed were the number of processors and the CCR values. Since there are at most four tasks at any level in Figure 6.23, the number of processors was bounded to four, starting with 2 processors and going up to 4, incrementing by 1. Each path from the entry node to an exit node is a critical path, since the computation costs of tasks at any level are equal and the communication costs of all edges between two consecutive levels are equal. The amount of energy consumed was measured for the five strategies listed before.

6.2.3.1 Results for EADAGS

Figure 6.24 shows the average energy savings for EADAGS with respect to the number of processors. It shows a decrease in the average energy savings with increasing number of processors. This is because increasing the number of processors allows several parallel task executions, thus minimizing the wait times which were used by our algorithm to save energy. The average energy savings measured were 34% for 2 processors using the 5V/off technique, 29% when processors operate at 2V during idle, 19% if they operate at 3.3V during idle, 30% for 2V scale, and 20% for 3.3V scale. When three processors are used, the average energy savings were 51% for the 5V/off technique, 42% for 2V during idle, 27% for 3.3V during idle, 44% for 2V scale, and 29% for 3.3V scale. For four processors, the average energy savings are 53%, 45%, 31%, 47%, and 33% for the 5V/off, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively. Table 6.18 lists the average energy savings and the makespan for EADAGS over DPS for different numbers of processors.

Figure 6.24. Average energy savings for FFT with EADAGS with different number of processors

                         Percentage of energy savings
Processors   Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
2            233         34.97 %   29.37 %           19.74 %             30.08 %     20.43 %
3            296         55.89 %   46.95 %           31.54 %             44.02 %     29.11 %
4            232.8       51.32 %   43.11 %           28.96 %             47.66 %     33.59 %
Table 6.18. Makespan and average energy savings with respect to number of processors for FFT with EADAGS

Figure 6.25 plots the average energy savings with respect to different CCR values. The average savings increased with increasing CCR. When CCR increases, processors incur longer idle times due to communication between tasks. Our algorithm was able to use such idle times to achieve energy savings. The average energy savings over DPS ranges from 23% for CCR = 0.1 to 77% when CCR = 10 for the 5V/off technique. Savings are smaller for 2V during idle; they range between 19% and 64%. Savings are even smaller for 3.3V during idle; they are 12% for CCR = 0.1 and 43% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over DPS range from 23% for CCR = 0.1 to 64% when CCR = 10, and from 16% for CCR = 0.1 to 43% for CCR = 10, respectively.

Figure 6.25. Average energy savings for FFT with EADAGS with different CCR values

Table 6.19 lists the average energy savings and the makespan for EADAGS over DPS for different CCR values.

             Percentage of energy savings
CCR    Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
0.1    155.33      25.30 %   21.25 %           14.28 %             23.13 %     16.23 %
0.5    163.33      29.95 %   25.16 %           16.90 %             25.32 %     16.67 %
1      180         36.50 %   30.67 %           20.61 %             34.67 %     23.89 %
5      340         66.42 %   55.79 %           37.48 %             55.58 %     38.81 %
10     540         78.79 %   66.19 %           44.47 %             64.22 %     43.32 %
Table 6.19. Makespan and average energy savings with respect to CCR for FFT with EADAGS

6.2.3.2 Results for EAGS-D

Figure 6.26 and Figure 6.27 show the average energy savings for EAGS-D with respect to number of processors and CCR values respectively. Figure 6.26 shows an increase in the average energy savings with increasing number of processors. The average energy savings measured were 18% for 2 processors for the 5V/off technique, 14% for 2V during idle, 9% for 3.3V during idle, 15% for 2V scale, and 10% for 3.3V scale.
When three processors are used, the average energy savings were 39% for the 5V/off technique, 33% for 2V during idle, 21% for 3.3V during idle, 34% for 2V scale, and 22% for 3.3V scale. For four processors, the average energy savings were 49%, 41%, 27%, 42%, and 29% for the 5V/off, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively.

Figure 6.26. Average energy savings for FFT with EAGS-D with different number of processors

Figure 6.27 plots the average energy savings with respect to different CCR values. The average savings increased with increasing CCR. When CCR increases, processors incur longer idle times due to communication between tasks. Our algorithm was able to use such idle times to achieve energy savings. The average energy savings over HNPD ranges from 16% for CCR = 0.1 to 61% when CCR = 10 for the 5V/off technique. Savings are smaller for 2V during idle; they range between 13% and 51%. Savings are even smaller for 3.3V during idle; they are 9% for CCR = 0.1 and 34% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over HNPD range from 15% for CCR = 0.1 to 51% when CCR = 10, and from 11% for CCR = 0.1 to 35% for CCR = 10, respectively. Tables 6.20 and 6.21 list the makespan and average energy savings for EAGS-D over HNPD for different numbers of processors and for different CCR values, respectively.

Figure 6.27. Average energy savings for FFT with EAGS-D with different CCR values

                         Percentage of energy savings
Processors   Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
2            218.4       18.09 %   14.88 %           9.34 %              15.82 %     10.90 %
3            208         39.96 %   33.25 %           21.68 %             34.18 %     22.21 %
4            199.2       49.93 %   41.62 %           27.31 %             42.15 %     29.23 %
Table 6.20. Makespan and average energy savings for different numbers of processors for FFT with EAGS-D

             Percentage of energy savings
CCR    Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
0.1    136         16.52 %   13.88 %           9.33 %              15.45 %     11.74 %
0.5    146.67      22.71 %   19.08 %           12.82 %             20.05 %     13.46 %
1      160         27.78 %   23.33 %           15.67 %             24.21 %     16.56 %
5      300         51.48 %   41.64 %           24.70 %             42.53 %     26.09 %
10     300         61.48 %   51.64 %           34.70 %             51.48 %     35.91 %
Table 6.21. Makespan and average energy savings with respect to CCR for FFT with EAGS-D

Figure 6.28 shows the average energy savings for both EADAGS and EAGS-D with respect to the five voltage scaling strategies tested for a four-point FFT DAG. The overall average energy savings for EAGS-D is lower than the average energy savings measured for EADAGS. This is due to the nature of the EAGS-D algorithm, which uses task duplication to minimize the makespan; duplication consumes a large portion of the processors' idle time and reduces the amount of energy savings.

Figure 6.28. Average energy savings for FFT with EADAGS and EAGS-D with voltage scaling levels

6.2.4 Sieve of Eratosthenes

The Sieve of Eratosthenes is a method of identifying all prime numbers in a sequence of numbers up to a certain N. A prime number is a natural number greater than 1 that can be divided only by itself and by 1, while a composite number n is a natural number that can be divided by a number less than n and greater than 1.
The Sieve of Eratosthenes identifies all prime numbers up to a given number N as follows [Corm90]:
1. Write down all numbers 1, 2, 3, ..., N. We will eliminate composites by marking them. Initially all numbers are unmarked.
2. Mark number 1 as special (it is neither prime nor composite).
3. Set k = 1. While k is less than the square root of N, do this:
   a. Find the first number in the list greater than k that has not been identified as composite (the first number found is 2) and call it m. Mark the numbers 2m, 3m, 4m, ... as composites. (Thus in the first run we mark all even numbers greater than 2. In the second run we mark all multiples of 3 greater than 3.)
   b. m is a prime number; put it on your list.
   c. Set k = m and repeat.
4. Put the remaining unmarked numbers in the list of prime numbers.
A short implementation sketch of these steps follows.
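The four steps above translate directly into code; the following C++ sketch is an illustration only, since our simulator schedules the task graph of Figure 6.29 rather than executing the sieve.

// Illustrative C++ transcription of the four steps above.
#include <cstdio>
#include <vector>

std::vector<int> sieve(int n) {
    std::vector<bool> composite(n + 1, false);  // steps 1-2: all unmarked
    for (int m = 2; m * m <= n; ++m) {          // step 3: while m <= sqrt(N)
        if (composite[m]) continue;             // 3a: first unmarked number
        for (int j = 2 * m; j <= n; j += m)     //     mark 2m, 3m, 4m, ...
            composite[j] = true;
    }
    std::vector<int> primes;                    // steps 3b and 4: unmarked
    for (int i = 2; i <= n; ++i)                // numbers are the primes
        if (!composite[i]) primes.push_back(i);
    return primes;
}

int main() {
    std::vector<int> p = sieve(32);             // N = 32, as in Figure 6.29
    for (std::size_t i = 0; i < p.size(); ++i)
        std::printf("%d ", p[i]);               // prints 2 3 5 7 ... 31
    std::printf("\n");
    return 0;
}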
The Sieve of Eratosthenes algorithm can be represented by a task graph (DAG). The DAG for the Sieve of Eratosthenes for N = 32 is shown in Figure 6.29 [Bask00].

Figure 6.29. Sieve of Eratosthenes task graph for N = 32

In the simulation, we used the graph of the Sieve of Eratosthenes for a sequence of 32 numbers (N = 32) to test both EADAGS and EAGS-D. Since the structure of the graph is fixed, only the number of processors and the CCR values were changed. For a sequence of 32 numbers, the total number of tasks in the graph is 32 and the largest number of tasks at a single level is 16, so the number of processors was bounded to 8. CCR took five different values: 0.1, 0.5, 1.0, 5.0, and 10. The amount of energy consumed was measured for the five operating strategies explained earlier in Chapter 5.

6.2.4.1 Results for EADAGS

Figure 6.30 shows a decrease in the average energy savings with increasing number of processors for the Sieve of Eratosthenes with EADAGS over DPS. This is because increasing the number of processors allows several parallel task executions, thus minimizing the wait times, which are used by our algorithm to save energy. The average energy savings measured were 23% for 2 processors using the 5V/off technique, 19% for 2V during idle, 13% for 3.3V during idle, 21% for 2V scale, and 14% for 3.3V scale. When three processors are used, the average energy savings were 41% for the 5V/off technique, 33% for 2V during idle, 22% for 3.3V during idle, 35% for 2V scale, and 25% for 3.3V scale. For eight processors, the average energy savings are 65%, 54%, 36%, 56%, and 37% for the 5V/off, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively. Table 6.22 lists the makespan and the average energy savings for each number of processors tested.

Figure 6.30. Average energy savings for different number of processors for Sieve of Eratosthenes with EADAGS

                         Percentage of energy savings
Processors   Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
2            502.4       23.57 %   19.73 %           13.26 %             21.19 %     14.89 %
3            472         41.35 %   33.14 %           22.37 %             35.58 %     25.82 %
4            384.8       43.12 %   34.22 %           24.34 %             37.18 %     27.62 %
5            382.4       53.78 %   45.18 %           30.36 %             47.97 %     32.20 %
6            382.4       61.49 %   51.65 %           34.70 %             54.98 %     35.91 %
7            382         62.89 %   54.19 %           36.16 %             56.91 %     38.22 %
8            391.2       65.26 %   54.82 %           36.83 %             56.33 %     37.83 %
Table 6.22. Makespan and average energy savings for different numbers of processors for Sieve of Eratosthenes with EADAGS

Figure 6.31 plots the average energy savings with respect to CCR. The average energy savings increased with increasing CCR. When CCR increases, processors incur longer idle times due to communication between tasks. Our algorithm is able to use such idle times to achieve energy savings. The average energy savings over DPS ranges from 29% for CCR = 0.1 to 77% when CCR = 10 for the 5V/off technique. Savings are smaller for 2V during idle; they range between 24% and 65%. Savings are even smaller for 3.3V during idle; they are 16% for CCR = 0.1 and 43% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over DPS range from 31% for CCR = 0.1 to 65% when CCR = 10, and from 20% for CCR = 0.1 to 45% for CCR = 10, respectively. Table 6.23 lists the average energy savings and the makespan for EADAGS over DPS for different CCR values.

Figure 6.31. Average energy savings for different CCR values for Sieve of Eratosthenes with EADAGS

             Percentage of energy savings
CCR    Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
0.1    233.71      29.69 %   24.53 %           16.05 %             31.01 %     20.84 %
0.5    244.29      35.23 %   29.50 %           19.75 %             33.69 %     21.29 %
1      268.57      41.57 %   33.49 %           23.46 %             34.19 %     23.97 %
5      505.71      66.74 %   56.40 %           38.47 %             57.28 %     39.77 %
10     817.14      77.81 %   65.30 %           43.70 %             65.34 %     45.84 %
Table 6.23. Makespan and average energy savings for different CCR values for Sieve of Eratosthenes with EADAGS

6.2.4.2 Results for EAGS-D

Figure 6.32 and Figure 6.33 show the average energy savings for EAGS-D with respect to the number of processors and CCR values respectively for the Sieve of Eratosthenes DAG. Figure 6.32 shows an increase in the average energy savings with increasing number of processors. The average energy savings measured were 12% for 2 processors using the 5V/off technique, 10% for 2V during idle, 6% for 3.3V during idle, 13% for 2V scale, and 8% for 3.3V scale. The average energy savings increase to 63%, 53%, 35%, 54%, and 37% when 8 processors are used for the 5V/off technique, 2V during idle, 3.3V during idle, 2V scale, and 3.3V scale respectively. The makespan and the average energy savings for different numbers of processors are listed in Table 6.24.

Figure 6.32. Average energy savings for Sieve of Eratosthenes with EAGS-D with different numbers of processors

                         Percentage of energy savings
Processors   Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
2            416         12.27 %   10.75 %           6.71 %              13.16 %     8.81 %
3            354.4       26.93 %   22.56 %           16.72 %             24.63 %     18.49 %
4            314.4       38.49 %   31.78 %           20.84 %             33.84 %     21.67 %
5            304         45.34 %   37.39 %           26.35 %             39.45 %     27.76 %
6            288.8       51.67 %   43.03 %           28.59 %             44.10 %     29.99 %
7            284         60.37 %   50.71 %           34.07 %             52.74 %     35.60 %
8            276.8       63.65 %   53.47 %           35.92 %             54.30 %     37.79 %
Table 6.24. Makespan and average energy savings for different numbers of processors for Sieve of Eratosthenes with EAGS-D

Figure 6.33 plots the average energy savings with respect to different CCR values. The average savings increased with increasing CCR. The average energy savings over HNPD ranges from 29% for CCR = 0.1 to 62% when CCR = 10 for the 5V/off technique.
Savings are smaller for 2V during idle; they range between 25% and 51%. Savings are even smaller for 3.3V during idle; they are 16% for CCR = 0.1 and 35% for CCR = 10. For 2V scale and 3.3V scale, the average energy savings over HNPD range from 30% for CCR = 0.1 to 51% when CCR = 10, and from 21% for CCR = 0.1 to 35% for CCR = 10, respectively. The makespan and the average energy savings for different CCR values are listed in Table 6.25.

Figure 6.33. Average energy savings for Sieve of Eratosthenes with EAGS-D with different CCR values

             Percentage of energy savings
CCR    Makespan    5V/off    2V during idle    3.3V during idle    2V Scale    3.3V Scale
0.1    236         29.94 %   25.23 %           16.59 %             30.15 %     21.21 %
0.5    242.86      34.05 %   28.02 %           18.98 %             31.77 %     21.03 %
1      257.14      38.66 %   32.50 %           21.80 %             32.32 %     21.71 %
5      342.86      48.32 %   40.72 %           27.77 %             41.28 %     29.37 %
10     520         62.38 %   51.87 %           35.74 %             51.71 %     35.34 %
Table 6.25. Makespan and average energy savings for different CCR values for Sieve of Eratosthenes with EAGS-D

Figure 6.34 shows the average energy savings for both EADAGS and EAGS-D with respect to the five voltage scaling strategies tested for the Sieve of Eratosthenes. The overall average energy savings for EAGS-D is lower than the average energy savings measured for EADAGS. This is due to the nature of the EAGS-D algorithm, which uses task duplication to minimize the makespan; duplication consumes a large portion of the processors' idle time and reduces the amount of energy savings.

Figure 6.34. Average energy savings for Sieve of Eratosthenes with EADAGS and EAGS-D with voltage scaling levels

CHAPTER 7
CONCLUSIONS

We have proposed two new scheduling algorithms, EADAGS and EAGS-D, which aim to minimize finish time as well as energy consumption through the use of dynamic voltage scaling. The results were based on a two-part software simulation study. The first part consists of a large set of randomly generated DAGs with various characteristics such as number of nodes, CCR, shape parameter, processor-to-node ratio, and out-degree. For each parameter the results were averaged across all other variables. A total of 10,800 randomly generated DAGs were used to evaluate each algorithm. The second part of the simulation contained DAGs for real-world problems, namely Gaussian elimination, molecular dynamics code, the Sieve of Eratosthenes, and the fast Fourier transform. These DAGs have specific structures, so the number of nodes, shape parameter, and out-degree are fixed. We tested both EADAGS and EAGS-D with different numbers of available processors and different CCR values. The results from the randomly generated DAGs showed that the EADAGS algorithm yielded an average energy saving of 40% over simple DPS, while the EAGS-D algorithm yielded a 28% reduction in energy, which is less than that for EADAGS due to the task duplication involved in EAGS-D. For the second test set, Gaussian elimination achieved average energy savings of 44% with EADAGS and 37% with EAGS-D. For the molecular dynamics code problem, the average energy savings with EADAGS and EAGS-D were 46% and 38% respectively. For FFT, the average energy savings measured were 42% and 33% for EADAGS and EAGS-D respectively.
For the Sieve of Eratosthenes, the average energy savings for EADAGS and EAGS-D were 40% and 36% respectively. The effect of different DAG characteristics on the amount of energy savings is summarized next. For both EADAGS and EAGS-D, the amount of energy savings increased with the number of nodes, because additional task dependencies create more idle time. The rate of increase is lower for EAGS-D than for EADAGS due to task duplication. An increase in CCR resulted in an increase in the average energy savings for both EADAGS and EAGS-D: when CCR increases, processors incur longer idle times due to communication between tasks, and both algorithms were able to use such idle times to achieve energy savings. The results also showed that the overall energy savings increased marginally with increasing shape parameter. Increasing the shape parameter increases parallelism in the DAG, resulting in more idle time for the processors due to task dependencies; this time was used by both EADAGS and EAGS-D to reduce the energy consumed. The last parameter tested was out-degree. An increase in out-degree resulted in smaller average energy savings, because a larger out-degree allows more processors to run in parallel, reducing the idle time on all processors and hence the energy that can be saved. Future work can involve applying the voltage scaling technique to other scheduling algorithms, seeking optimal solutions, especially for the real-world problems, and physically implementing the algorithms to save energy.

BIBLIOGRAPHY

[Bask00] S. Baskiyar, "Scheduling DAGs on message passing m-processor systems," IEICE Transactions on Information and Systems, vol. E-38-D, July 2000.
[Bask03] S. Baskiyar and C. Dickinson, "Scheduling Directed A-cyclic Graphs on a Bounded Set of Heterogeneous Processors Using Task Duplication," LNCS, vol. 2913, pp. 259-267, Springer-Verlag, 2003.
[Burd96] T. Burd and R. Brodersen, "Processor Design for Portable Systems," Journal of VLSI Signal Processing, 13(2-3), pp. 203-222, 1996.
[Chae04] C. Im and S. Ha, "Dynamic Voltage Scaling for Real-time Multi-task Scheduling using Buffers," Proc. ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 88-94, 2004.
[Chan96] A. Chandrakasan, V. Gutnik, and T. Xanthopoulos, "Data Driven Signal Processing: An Approach for Energy Efficient Computing," International Symposium on Low Power Electronics and Design, pp. 347-352, Aug. 1996.
[Choi04] K. Choi, R. Soma, and M. Pedram, "Dynamic Voltage and Frequency Scaling based on Workload Decomposition," Proceedings of the 2004 International Symposium on Low Power Electronics and Design, August 2004.
[Chun92] Y. Chung and S. Ranka, "Applications and Performance Analysis of a Compile-Time Optimization Approach for List Scheduling Algorithms on Distributed Memory Multiprocessors," Proc. Supercomputing, pp. 512-521, Nov. 1992.
[Corm01] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, "Introduction to Algorithms," MIT Press, 2001.
[Dong01] J. J. Dongarra and D. W. Walker, "The Quest for Petascale Computing," IEEE Transactions on Computing in Science and Engineering, pp. 32-39, May 2001.
[Gebo96] C. H. Gebotys and R. J. Gebotys, "Power-Minimization in Heterogeneous Processing," Proceedings of the 29th Hawaii International Conference on System Sciences (HICSS'96), vol. 1, pp. 330-337, January 1996.
[ImHa04] C. Im and S. Ha, "Dynamic Voltage Scaling for Real-time Multi-task Scheduling using Buffers," Proc.
ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 88-94, 2004.
[Iver98] M. A. Iverson and F. Ozguner, "Dynamic Competitive Scheduling of Multiple DAGs in a Distributed Heterogeneous Environment," Proceedings of the 1998 Workshop on Heterogeneous Processing, pp. 70-78, March 1998.
[Khok93] A. Khokhar, V. K. Prasanna, M. E. Shaaban, and C. Wang, "Heterogeneous Computing: Challenges and Opportunities," IEEE Transactions on Computers, vol. 26, pp. 18-27, June 1993.
[Kiro97] D. Kirovski and M. Potkonjak, "System-level Synthesis of Low-power Hard Real-time Systems," Proceedings of the 34th Annual Conference on Design Automation, 1997.
[Kump94] K. Li, R. Kumpf, P. Horton, and T. Anderson, "A Quantitative Analysis of Disk Drive Power Management in Portable Computers," Proc. Winter 1994 USENIX Conference, pp. 279-292, Jan. 1994.
[Kwok99] Y.-K. Kwok and I. Ahmad, "Link Contention-Constrained Scheduling and Mapping of Tasks and Messages to a Network of Heterogeneous Processors," Proceedings of the 1999 International Conference on Parallel Processing, pp. 551-558, September 1999.
[Lu00] Y. H. Lu, L. Benini, and G. De Micheli, "Low-Power Task Scheduling for Multiple Devices," International Workshop on Hardware/Software Codesign, pp. 39-43, May 2000.
[Mart02] S. Martin, K. Flautner, T. Mudge, and D. Blaauw, "Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Lower Power Microprocessors under Dynamic Workloads," Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design, November 2002.
[Micr04] "OnNow Power Management Architecture for Applications," available at http://www.microsoft.com/hwdev/desinit/onnowapp.HTM
[Mish03] R. Mishra, N. Rastogi, D. Zhu, D. Mossé, and R. Melhem, "Energy Aware Scheduling for Distributed Real-Time Systems," Proc. Int'l Parallel and Distributed Processing Symposium, pp. 9-16, April 2003.
[Much97] S. Muchnick, "Advanced Compiler Design and Implementation," Morgan Kaufmann Publishers, Inc., 1997.
[Pouw01] J. Pouwelse, K. Langendoen, and H. Sips, "Dynamic Voltage Scaling on a Low-Power Microprocessor," Proc. of the 7th Annual International Conference on Mobile Computing and Networking, pp. 251-259, July 2001.
[Radu00] A. Radulescu and A. J. C. van Gemund, "Fast and Effective Task Scheduling in Heterogeneous Systems," 9th Heterogeneous Computing Workshop, pp. 229-239, May 2000.
[Reut97] C. Reuter, M. Schwiegershausen, and P. Pirsch, "Heterogeneous Multiprocessor Scheduling and Allocation using Evolutionary Algorithms," Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 294-303, July 1997.
[Seig97] M. Tan, H. J. Siegel, J. K. Antonio, and Y. A. Li, "Minimizing the Application Execution Time Through Scheduling of Subtasks and Communication Traffic in a Heterogeneous Computing System," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 8, pp. 857-870, August 1997.
[Shan03] L. Shang, L.-S. Peh, and N. K. Jha, "Dynamic Voltage Scaling with Links for Power Optimization of Interconnection Networks," Proc. of the 9th International Symposium on High-Performance Computer Architecture, pp. 91-102, February 2003.
[Shin01] D. Shin, S. Lee, and J. Kim, "Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications," IEEE Design and Test of Computers, vol. 18, pp. 20-30, March 2001.
[Shiu00] W. T. Shiue and C. Chakrabarti, "Low-Power Scheduling with Resources Operating at Multiple Voltages,"
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 6, pp. 536-543, June 2000.
[Smar06] http://www.bsac.eecs.berkeley.edu/archive/users/warneke-brett/SmartDust/, accessed February 2006.
[Sriv96] M. Srivastava, A. Chandrakasan, and R. Brodersen, "Predictive System Shutdown and Other Architectural Techniques for Energy Efficient Programmable Computation," IEEE Transactions on VLSI Systems, vol. 4, pp. 42-55, March 1996.
[Tiwa94] V. Tiwari, S. Malik, and A. Wolfe, "Power Analysis of Embedded Software: A First Step toward Software Power Minimization," Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design, November 1994.
[Tiwa96] V. Tiwari, S. Malik, A. Wolfe, and M. Lee, "Instruction Level Power Analysis and Optimization of Software," Journal of VLSI Signal Processing, 13(2/3), pp. 1-18, 1996.
[Topc99] H. Topcuoglu, S. Hariri, and M. Y. Wu, "Task Scheduling Algorithms for Heterogeneous Processors," Proceedings of the 8th Heterogeneous Computing Workshop, pp. 3-14, April 1999.
[Topc02] H. Topcuoglu, S. Hariri, and M. Y. Wu, "Performance-effective and Low Complexity Task Scheduling for Heterogeneous Computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, March 2002.
[Wang97] L. Wang, H. J. Siegel, V. P. Roychowdhury, and A. A. Maciejewski, "Task Matching and Scheduling in Heterogeneous Computing Environments Using a Genetic-Algorithm-Based Approach," Journal of Parallel and Distributed Computing, vol. 47, pp. 8-22, November 1997.
[Weis94] M. Weiser, B. Welch, A. Demers, and S. Shenker, "Scheduling for reduced CPU energy," Proc. 1st USENIX Symposium on Operating Systems Design and Implementation, pp. 13-23, Nov. 1994.
[Yang01] P. Yang, C. Wong, P. Marchal, F. Catthoor, D. Desmet, D. Verkest, and R. Lauwereins, "Energy-Aware Runtime Scheduling for Embedded-Multiprocessor SOCs," IEEE Design and Test of Computers, vol. 18, no. 5, pp. 46-58, Sept./Oct. 2001.
[Zhan02] Y. Zhang, X. S. Hu, and D. Z. Chen, "Task Scheduling and Voltage Selection for Energy Minimization," Design Automation Conference, pp. 183-188, June 2002.