Cooling Hadoop: Temperature Aware Schedulers in Data Centers
Type of Degree: Thesis
The amount of unstructured data on the Internet, also known as “Big Data”, is growing every day. Because Big Data is unstructured, a large-scale distributed batch-processing infrastructure such as Hadoop is used instead of traditional databases. Hadoop is an open-source framework that uses the MapReduce programming model to process large data sets. Its true power emerges when it runs on a cluster of machines in a data center. Hadoop's master-slave architecture lets the master node direct the slave nodes to store and process the data. When a client application submits a job to Hadoop, the scheduler on the master node assigns tasks to every available slave so that the job is processed in parallel. Many existing Hadoop schedulers do not consider the nature of the job, the workload, or the power and temperature distribution in the data center. These factors are critical to extending device lifetime and cutting cooling costs, which account for about 25% of the total investment in data centers. Based on a thorough investigation of Hadoop's existing schedulers, we propose two new thermal-aware schedulers that schedule tasks to balance the outlet temperature across all nodes and reduce air-conditioning costs in the data center. The first is a dynamic scheduler, which schedules a job based on the CPU and disk temperature and utilization feedback reported by all slave nodes at run time. The second is a static scheduler, which assigns tasks to slaves based on CPU and disk temperature and stored job information. Both schedulers are implemented on top of Hadoop's FIFO scheduler. We test our schedulers and the FIFO scheduler by running a set of standard Hadoop benchmark applications (WordCount, DistributedGrep, and PI) under different temperature thresholds, utilization thresholds, and cluster sizes. The experimental results show that our schedulers lower the average outlet temperature by 2 degrees Celsius relative to the default FIFO scheduler, a reduction that saves about 15% of the cooling cost with little performance overhead.
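The abstract does not spell out the dynamic scheduler's selection rule, so the following is only a minimal sketch of one way threshold-based, feedback-driven assignment on top of a FIFO queue could look. Every name here (NodeStatus, pick_node, schedule_fifo) and both default thresholds (60 degrees Celsius, 80% utilization) are hypothetical illustrations, not the thesis's implementation or part of Hadoop's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NodeStatus:
    """Hypothetical run-time feedback reported by one slave node."""
    name: str
    cpu_temp_c: float   # CPU temperature in degrees Celsius
    disk_temp_c: float  # disk temperature in degrees Celsius
    cpu_util: float     # CPU utilization, 0.0 to 1.0
    disk_util: float    # disk utilization, 0.0 to 1.0
    free_slots: int     # task slots currently available


def pick_node(nodes: List[NodeStatus],
              max_temp_c: float = 60.0,   # assumed threshold
              max_util: float = 0.8       # assumed threshold
              ) -> Optional[NodeStatus]:
    """Return the coolest node that is under both thresholds, or None."""
    eligible = [
        n for n in nodes
        if n.free_slots > 0
        and max(n.cpu_temp_c, n.disk_temp_c) <= max_temp_c
        and max(n.cpu_util, n.disk_util) <= max_util
    ]
    if not eligible:
        return None
    # Prefer the node whose hottest component is coolest; break ties by name.
    return min(eligible, key=lambda n: (max(n.cpu_temp_c, n.disk_temp_c), n.name))


def schedule_fifo(tasks: List[str],
                  nodes: List[NodeStatus],
                  max_temp_c: float = 60.0,
                  max_util: float = 0.8) -> List[Tuple[str, str]]:
    """Assign tasks in arrival order; stop when no node qualifies."""
    assignments = []
    for task in tasks:
        node = pick_node(nodes, max_temp_c, max_util)
        if node is None:
            break  # FIFO order: remaining tasks wait for the next feedback round
        node.free_slots -= 1
        assignments.append((task, node.name))
    return assignments
```

In this sketch a hot or heavily loaded node is simply skipped until its reported temperature and utilization fall back under the thresholds, which is one plausible way to even out outlet temperatures. A static variant could replace the live readings with stored per-job information.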