dc.contributor.advisor Qin, Xiao
dc.contributor.advisor Lim, Alvin
dc.contributor.advisor Biaz, Saad
dc.contributor.author Kulkarni, Sanjay
dc.date.accessioned 2013-01-07T20:47:33Z
dc.date.available 2013-01-07T20:47:33Z
dc.date.issued 2013-01-07
dc.identifier.uri http://hdl.handle.net/10415/3464
dc.description.abstract The amount of unstructured data on the Internet, also known as “Big Data”, is growing every day. Because Big Data is unstructured, a large-scale distributed batch-processing infrastructure such as Hadoop is used instead of traditional databases. Hadoop is an open-source framework that uses the MapReduce programming model to process large data sets. Hadoop's true power lies in running on a cluster of machines in a data center. Hadoop's master-slave architecture lets the master node direct the slave nodes to store and process the data. When a client application submits a job to Hadoop, the scheduler on the master node schedules tasks on every available slave to process the job in parallel. Many existing Hadoop schedulers do not consider the nature of the job, the workload, or the power and temperature distribution in the data center, all of which are critical to extending device life and cutting cooling costs, which account for about 25% of total investment in data centers. Based on a thorough investigation of Hadoop's existing schedulers, we propose two new thermal-aware schedulers that schedule tasks to balance the outlet temperature across all nodes and reduce AC costs in the data center. The first is a dynamic scheduler, which schedules a job based on the CPU and disk temperature and utilization feedback reported by all slave nodes at run time. The second is a static scheduler, which assigns tasks to slaves based on CPU and disk temperature and stored job information. Both schedulers are implemented on top of Hadoop's FIFO scheduler. We test our schedulers and the FIFO scheduler by running a set of standard Hadoop benchmark applications, such as WordCount, DistributedGrep, and PI, at different temperature thresholds, utilization thresholds, and cluster sizes. The experimental results show that our schedulers achieve an average outlet temperature reduction of 2 degrees Celsius over the default FIFO scheduler, which saves about 15% of cooling cost with little performance overhead. en_US
dc.rights EMBARGO_NOT_AUBURN en_US
dc.subject Computer Science en_US
dc.title Cooling Hadoop: Temperature Aware Schedulers in Data Centers en_US
dc.type thesis en_US
dc.embargo.length NO_RESTRICTION en_US
dc.embargo.status NOT_EMBARGOED en_US

