Heuristics in Distributing Data and Parity with Distributed Hash Tables
Type of Degree: Master's Thesis
Computer Science and Software Engineering
We compare several methods of distributing data and error-correcting codes across distributed hash tables. We focus on how distributed hash tables scale, and on which methods move the least data while maintaining an even distribution. A common technique is to apply erasure coding and store the pieces of each file on separate hardware. This approach makes the placement of each piece dependent on earlier placements. We identify several rules that, when applied to standard methods, dramatically reduce the amount of data moved while scaling. Even though CRUSH includes these heuristics, we found that tweaking its approach allowed it to migrate less data when the cluster layout changes.
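To illustrate why placing erasure-coded pieces on separate hardware makes each placement depend on earlier ones, here is a minimal sketch in Python. It is not the thesis's method and not CRUSH; it assumes a plain consistent-hashing ring with virtual nodes, and the names HashRing, place, and moved_fraction are hypothetical. Each piece goes to the next node clockwise on the ring that does not already hold a piece of the same file, so the choice for piece i depends on where pieces 0..i-1 landed.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring, used only for illustration."""

    def __init__(self, nodes, vnodes=64):
        # Each node appears at `vnodes` points on the ring to even out load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]
        self.node_count = len(set(nodes))

    def place(self, file_id, pieces):
        """Return `pieces` distinct nodes for the data and parity pieces of a file."""
        if pieces > self.node_count:
            raise ValueError("more pieces than nodes")
        i = bisect.bisect_right(self._keys, _hash(file_id))
        chosen = []
        while len(chosen) < pieces:
            node = self._points[i % len(self._points)][1]
            # Skip nodes that already hold a piece of this file: this is
            # where a placement becomes dependent on earlier placements.
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


def moved_fraction(before, after, files, pieces):
    """Fraction of pieces whose node differs between two cluster layouts."""
    moved = 0
    for file_id in files:
        old = before.place(file_id, pieces)
        new = after.place(file_id, pieces)
        moved += sum(1 for a, b in zip(old, new) if a != b)
    return moved / (len(files) * pieces)
```

Calling moved_fraction with a ring of six nodes and a ring of seven nodes gives a rough measure of how much data a scaling event would migrate under this simple scheme. Because a change in one piece's node can shift every later piece of the same file, the measured movement can exceed what independent placement would suggest, which is the kind of effect the heuristics above aim to limit.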