Taming the Scientific Big Data with Flexible Organizations for Exascale Computing

Tian, Yuan

Metadata Field	Value	Language
dc.contributor.advisor	Yu, Weikuan
dc.contributor.advisor	Klasky, Scott
dc.contributor.advisor	Qin, Xiao
dc.contributor.advisor	Umphress, David
dc.contributor.advisor	Mao, Shiwen
dc.contributor.author	Tian, Yuan
dc.date.accessioned	2012-07-31T21:04:03Z
dc.date.available	2012-07-31T21:04:03Z
dc.date.issued	2012-07-31
dc.identifier.uri	http://hdl.handle.net/10415/3295
dc.description.abstract	Over the last five years, supercomputers have evolved at an unprecedented rate as High Performance Computing (HPC) continues to progress towards exascale computing in 2018. These systems enable scientists to simulate scientific processes with high fidelity at large scale and, consequently, often produce complex data whose size is also increasing exponentially. However, growth within the computing infrastructure is significantly imbalanced: dramatically increasing computing power is accompanied by only slowly improving storage systems. This discordant progress among computing power, storage, and data has created a severe I/O bottleneck for the advancement of scientific computing. While intensive research on next-generation storage is under way, a revolutionary upgrade to current back-end storage systems is not foreseeable in the near future. As a result, applications are becoming more reliant on I/O software, in the hope of alleviating the performance bottleneck by driving the storage system at its full speed. Efficient I/O for scientific big data is crucial for a successful transition of HPC into exascale. However, providing high-performance I/O at the software layer is nontrivial. The large volume and high complexity of scientific data, together with the mismatch between its organization and that of the underlying storage system, pose grand challenges for I/O software design. This dissertation investigates the characteristics of scientific data and storage systems as a whole, and explores opportunities to drive I/O performance for petascale computing and prepare it for exascale. To this end, a set of flexible data organization and management techniques is introduced to address the I/O challenges from five directions, namely system-wide data concurrency, in-node data organization, complex I/O patterns, time-dimension analytics, and asynchronous compression. 
For these purposes, four key techniques are designed to exploit the capability of the back-end storage system to process and store scientific big data with fast and scalable I/O performance. It has been shown that these techniques can enhance the I/O performance and scalability of the end-to-end data flow in real-world scientific applications. This work also contributes to the broader effort toward scalable data management techniques as high performance computing progresses into exascale.	en_US
dc.rights	EMBARGO_NOT_AUBURN	en_US
dc.subject	Computer Science	en_US
dc.title	Taming the Scientific Big Data with Flexible Organizations for Exascale Computing	en_US
dc.type	dissertation	en_US
dc.embargo.length	MONTHS_WITHHELD:6	en_US
dc.embargo.status	EMBARGOED	en_US
dc.embargo.enddate	2013-01-31	en_US

Files in this item

Name: thesis-final.pdf
Size: 2.926 MB