This Is Auburn: Electronic Theses and Dissertations


RTAH: Resource and Thermal Aware Hadoop


Metadata Field | Value | Language
dc.contributor.advisor | Xiao, Qin |
dc.contributor.author | Gautam, Dudeja |
dc.date.accessioned | 2014-07-02T19:34:52Z |
dc.date.available | 2014-07-02T19:34:52Z |
dc.date.issued | 2014-07-02 |
dc.identifier.uri | http://hdl.handle.net/10415/4225 |
dc.description.abstract | The amount of unstructured data on the Internet, also known as Big Data, is growing every day. Because Big Data is unstructured, a large-scale distributed batch-processing infrastructure such as Hadoop is used instead of traditional databases. Hadoop is an open-source framework that uses the MapReduce programming model to process large data sets. Hadoop's true power emerges when it runs on a cluster of machines in a data center. Its master-slave architecture lets the master node direct the slave nodes to store and process data. When a client application submits a job to Hadoop, the scheduler on the master node schedules tasks on every available slave so that the job is processed in parallel. Many existing Hadoop schedulers do not consider the workload distribution, its thermal impact, or the overall heat distribution in the data center. This leads to an uncontrolled rise in temperature and, in turn, to heavy power expenditure on cooling, which now accounts for about 25% of the total investment in data centers. With cooling costs of large-scale data centers increasing exponentially, thermal management must be adequately addressed. Recent work has identified heat recirculation within the data center as one of the critical reasons for the temperature rise: the temperature of a server i rises not only because of its own workload but also because of the workload of its neighboring servers. Based on a thorough investigation of Hadoop's available schedulers, we propose a new resource- and thermal-aware scheduler that schedules tasks to minimize the peak inlet temperature across all nodes, reducing the power consumed by air-conditioning units and, ultimately, cooling costs in the data center. The proposed dynamic scheduler schedules a job based on current CPU and disk utilization, the number of running tasks, and the feedback given by all slave nodes at run time. | en_US
dc.rights | EMBARGO_NOT_AUBURN | en_US
dc.subject | Computer Science | en_US
dc.title | RTAH: Resource and Thermal Aware Hadoop | en_US
dc.type | thesis | en_US
dc.embargo.length | NO_RESTRICTION | en_US
dc.embargo.status | NOT_EMBARGOED | en_US
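
The scheduling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes a simple linear heat-recirculation model in which each node's inlet temperature is the supply temperature plus a weighted sum of all nodes' power draw, and it greedily picks the eligible node that yields the lowest peak inlet temperature. The names `NodeStatus`, `inlet_temps`, and `pick_node`, the recirculation matrix, the per-task power estimate, and the utilization and task thresholds are all hypothetical and do not appear in the record.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical run-time feedback reported by one slave node."""
    cpu_util: float       # fraction of CPU in use, 0.0 to 1.0
    disk_util: float      # fraction of disk bandwidth in use, 0.0 to 1.0
    running_tasks: int    # tasks currently running on the node
    power_watts: float    # current power draw of the node


def inlet_temps(supply_temp, recirc, powers):
    """Assumed linear model: inlet_i = supply + sum_j recirc[i][j] * power_j."""
    return [
        supply_temp + sum(coeff * p for coeff, p in zip(row, powers))
        for row in recirc
    ]


def pick_node(nodes, recirc, supply_temp, task_watts,
              max_util=0.9, max_tasks=4):
    """Return the index of the eligible node whose selection gives the
    lowest peak inlet temperature, or None if no node is eligible.

    The thresholds max_util and max_tasks are illustrative assumptions.
    """
    powers = [n.power_watts for n in nodes]
    best, best_peak = None, float("inf")
    for i, node in enumerate(nodes):
        # Resource-aware filter: skip nodes that are already saturated.
        if (node.cpu_util >= max_util
                or node.disk_util >= max_util
                or node.running_tasks >= max_tasks):
            continue
        # Thermal-aware step: estimate the peak inlet temperature
        # if the task's extra power were added to this node.
        trial = powers.copy()
        trial[i] += task_watts
        peak = max(inlet_temps(supply_temp, recirc, trial))
        if peak < best_peak:
            best, best_peak = i, peak
    return best
```

Under these assumptions, a master node would call `pick_node` each time a task becomes ready, passing the latest feedback from the slaves. The off-diagonal entries of the recirculation matrix capture how a neighbor's workload raises a server's inlet temperature.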
