This Is Auburn: Electronic Theses and Dissertations

Developing a Data Provenance System with Version Control Using Blockchain

Date

2021-08-04

Author

Mukhopadhyay, Ujan

Type of Degree

PhD Dissertation

Department

Computer Science and Software Engineering

Abstract

The origin and integrity of data are of great importance. If the history of the data is not preserved, there is no way to ascertain its integrity: whether it was modified; if so, who modified it, how often, and in what way; and whether copies of it have been made. Even in a closed system, the integrity of the data might be compromised, leaving room for the modifier to repudiate the change. Few systems provide data provenance as a service. Of those that do, none is large-scale, robust, or commercially available, and all are limited in performance. Git provides version control but is not equipped to provide data provenance, and its use cases are limited to code and documents. In this thesis, we propose a distributed system that combines Git and Blockchain to provide secure data provenance with version control. In this system, multiple users create, access, and modify files that reside in a separate database. Every change is documented in the Blockchain, and every version of a file is stacked in the database.
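
The idea in the abstract's last two sentences, a separate store that keeps every version of a file and an append-only ledger that records every change, can be illustrated with a minimal sketch. This is not the dissertation's implementation: it is a single-process stand-in with no Git, no network, and no consensus protocol, and the names `ProvenanceStore`, `commit`, and `verify` are hypothetical, chosen only for illustration. A plain SHA-256 hash chain stands in for the Blockchain here.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


GENESIS_HASH = "0" * 64  # assumed placeholder for the first record's predecessor


@dataclass
class ProvenanceStore:
    """Hypothetical stand-in: a version store plus a hash-chained change log."""

    versions: dict = field(default_factory=dict)  # file name -> list of contents
    chain: list = field(default_factory=list)     # ordered change records

    @staticmethod
    def _record_hash(record: dict) -> str:
        # Hash every field except the record's own hash, in a stable key order.
        body = {k: v for k, v in record.items() if k != "hash"}
        return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

    def commit(self, user: str, name: str, content: bytes, timestamp: float) -> dict:
        """Stack a new version of a file and append a record of the change."""
        history = self.versions.setdefault(name, [])
        history.append(content)
        record = {
            "index": len(self.chain),
            "user": user,
            "file": name,
            "version": len(history),
            "content_hash": sha256_hex(content),
            "timestamp": timestamp,
            "prev_hash": self.chain[-1]["hash"] if self.chain else GENESIS_HASH,
        }
        record["hash"] = self._record_hash(record)
        self.chain.append(record)
        return record

    def verify(self) -> bool:
        """Check the chain links, each record's hash, and each stored version."""
        prev = GENESIS_HASH
        for record in self.chain:
            if record["prev_hash"] != prev:
                return False
            if record["hash"] != self._record_hash(record):
                return False
            stored = self.versions[record["file"]][record["version"] - 1]
            if sha256_hex(stored) != record["content_hash"]:
                return False
            prev = record["hash"]
        return True
```

In this sketch, each record names the user who made the change, so a later denial can be checked against the log, and altering a stored version or an earlier record causes `verify` to return `False`. A real deployment would additionally need signed records and a distributed ledger so that no single party can rewrite the log.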