Cooling Hadoop: Temperature Aware Schedulers in Data Centers

Kulkarni, Sanjay

Metadata Field	Value	Language
dc.contributor.advisor	Qin, Xiao
dc.contributor.advisor	Lim, Alvin
dc.contributor.advisor	Biaz, Saad
dc.contributor.author	Kulkarni, Sanjay
dc.date.accessioned	2013-01-07T20:47:33Z
dc.date.available	2013-01-07T20:47:33Z
dc.date.issued	2013-01-07
dc.identifier.uri	http://hdl.handle.net/10415/3464
dc.description.abstract	The amount of unstructured data on the Internet, also known as “Big Data,” is growing every day. Because Big Data is unstructured, large-scale distributed batch-processing infrastructures such as Hadoop are used instead of traditional databases. Hadoop is an open-source framework that uses the MapReduce programming model to process large data sets, and its true power lies in running on a cluster of machines in a data center. Hadoop's master-slave architecture lets the master node control the slave nodes that store and process the data. When a client application submits a job to Hadoop, the scheduler on the master node assigns tasks to every available slave so that the job is processed in parallel. Many existing Hadoop schedulers do not consider the nature of the job, the workload, or the power and temperature distribution in the data center, all of which are critical to extending device lifetime and cutting cooling costs; cooling accounts for about 25% of the total investment in a data center. Based on a thorough investigation of Hadoop's existing schedulers, we propose two new thermal-aware schedulers that assign tasks so as to balance outlet temperature across all nodes and reduce air-conditioning costs in the data center. The first is a dynamic scheduler, which schedules a job based on the CPU and disk temperature and utilization feedback reported by all slave nodes at run time. The second is a static scheduler, which assigns tasks to slaves based on CPU and disk temperature and stored job information. Both schedulers are implemented on top of Hadoop's FIFO scheduler. We test our schedulers and the FIFO scheduler by running a set of standard Hadoop benchmark applications, such as WordCount, DistributedGrep, and PI, at different temperature thresholds, utilization thresholds, and cluster sizes. 
The experimental results show that our schedulers achieve an average outlet temperature reduction of 2 degrees Celsius over the default FIFO scheduler, saving about 15% of cooling cost with little performance overhead.	en_US
dc.rights	EMBARGO_NOT_AUBURN	en_US
dc.subject	Computer Science	en_US
dc.title	Cooling Hadoop: Temperature Aware Schedulers in Data Centers	en_US
dc.type	thesis	en_US
dc.embargo.length	NO_RESTRICTION	en_US
dc.embargo.status	NOT_EMBARGOED	en_US
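
The abstract describes the two schedulers only at a high level. The sketch below is a minimal Python illustration of the kind of decision rules it outlines, assuming each slave's heartbeat reports CPU and disk temperature and utilization. The names NodeStatus, Thresholds, dynamic_assign, and static_pick_node, the threshold values, and the "cpu"/"disk" job-profile labels are hypothetical; they are not taken from the thesis or from Hadoop's API, and this is not the author's implementation, which is built on Hadoop's FIFO scheduler.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStatus:
    """Hypothetical heartbeat feedback from one slave node."""
    name: str
    cpu_temp: float   # degrees Celsius
    disk_temp: float  # degrees Celsius
    cpu_util: float   # fraction in [0, 1]
    disk_util: float  # fraction in [0, 1]


@dataclass
class Thresholds:
    """Hypothetical per-cluster limits a node must stay under to get work."""
    max_cpu_temp: float
    max_disk_temp: float
    max_cpu_util: float
    max_disk_util: float


def node_is_cool(s: NodeStatus, t: Thresholds) -> bool:
    """True only if every reported reading is below its threshold."""
    return (s.cpu_temp < t.max_cpu_temp and s.disk_temp < t.max_disk_temp
            and s.cpu_util < t.max_cpu_util and s.disk_util < t.max_disk_util)


def dynamic_assign(task_queue: deque, heartbeat: NodeStatus,
                   t: Thresholds) -> Optional[str]:
    """Dynamic rule (assumed): on a heartbeat, hand out the next task in
    FIFO order only if the reporting node is below all thresholds;
    otherwise return None and leave the queue untouched so the node cools."""
    if task_queue and node_is_cool(heartbeat, t):
        return task_queue.popleft()
    return None


def static_pick_node(job_profile: str, nodes: List[NodeStatus]) -> NodeStatus:
    """Static rule (assumed): use a stored job profile to choose the node
    whose stressed component (CPU or disk) is currently coolest."""
    if job_profile == "cpu":
        return min(nodes, key=lambda n: n.cpu_temp)
    if job_profile == "disk":
        return min(nodes, key=lambda n: n.disk_temp)
    raise ValueError(f"unknown job profile: {job_profile!r}")


if __name__ == "__main__":
    limits = Thresholds(max_cpu_temp=60.0, max_disk_temp=45.0,
                        max_cpu_util=0.9, max_disk_util=0.9)
    queue = deque(["map-0", "map-1"])
    hot = NodeStatus("slave-1", 72.0, 40.0, 0.95, 0.30)
    cool = NodeStatus("slave-2", 48.0, 38.0, 0.40, 0.20)
    print(dynamic_assign(queue, hot, limits))         # None
    print(dynamic_assign(queue, cool, limits))        # map-0
    print(static_pick_node("cpu", [hot, cool]).name)  # slave-2
```

Returning no task to a hot node, rather than reassigning work elsewhere, keeps the FIFO job order intact and simply lets that node cool before it receives more tasks; whether the thesis handles hot nodes exactly this way is an assumption.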

Files in this item

Name:: Cooling Hadoop-Temperature Aware Schedulers in Data Centers.pdf.txt
Size:: 117.5Kb

Name:: Cooling Hadoop-Temperature Aware Schedulers in Data Centers.pdf
Size:: 1.159Mb

Show simple item record