Secure File Assignment in Heterogeneous Distributed Systems by Yun Tian A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Auburn, Alabama May 4, 2013 Keywords: Security, Performance, Heterogenous, Distributed Systems Copyright 2013 by Yun Tian Approved by Xiao Qin, Associate Professor of Computer Science and Software Engineering Drew Hamilton, Professor of Computer Science and Software Engineering Cheryl Seals, Associate Professor of Computer Science and Software Engineering Abstract There is a growing demand for large-scale distributed storage systems to support re- sourcesharing andfault tolerance. Althoughheterogeneity issues ofdistributed systems have been widely investigated, little attention has yet been paid to security solutions designed for distributed systems with heterogeneous vulnerabilities. This fact motivates us to investigate the topic of secure file assignment in heterogeneous distributed systems. Firstly we propose a secure fragment allocationscheme called S-FAS to improve security ofadistributedsystemwherestoragesiteshaveawidevarietyofvulnerabilities. IntheS-FAS approach, we integrate file fragmentation with the secret sharing technique in a distributed storage system with heterogeneous properties in vulnerability. Storage sites in distributed systems arecategorized intoavarietyofdi?erent types ofstoragenodebasedonvulnerability characteristics. Given a file and a distributed system, S-FAS allocates fragments of the file to as many di?erent types of nodes as possible in the system. Data confidentiality is preserved because fragments of a file are allocated to multiple storage nodes. We develop storage assurance and dynamic assurance models to evaluate quality of security o?ered by S-FAS. Analysis results show that fragment allocations made by S-FAS lead to enhanced security because of the consideration of heterogeneous vulnerabilities in distributed storage systems. In order to consider performance while providing higher quality of security for large scale distributed systems with heterogeneous features, we develop a Secure Allocation Pro- cessing (SAP) algorithm for the S-FAS scheme to improve the security level and consider its performance using the heterogeneous features of a large distributed system. To improve the security, the design of SAP is guided by the experimental results from S-FAS; to improve performance, we not only consider the heterogeneity of the storage nodes and the whole system, but also the heterogeneous features of the requests. The SAP allocation algorithm ii considers load balancing, delayed e?ects caused by the workload variance of many consecu- tive requests, and the heterogeneous features (such as CPU speed and network bandwidth) of the storage nodes in the system. In order to use practical implementations to demonstrate the ideas on actual systems with real-world applications, we developed a prototype using the multi-threading technique and C language for the S-FAS scheme with the SAP algorithm to guide the file allocation. The prototype is built in the distributed cluster environment with heterogeneous storage nodes, in which the Network File System (NFS) and Linux are installed. We did some experiments on system throughput and testing against real world traces. The evaluation results show that the proposed solution can not only improve the security level, but also im- prove thethroughputandperformanceofthedistributed storagesystems with heterogeneous vulnerabilities by using the multi-thread technique. To further explore thesecurity solution while considering system availability, we propose a solution called Reef by integrating fragment replication into the proposed S-FAS and SAP solution for distributed systems with heterogeneous features. The Reef scheme is extended based on the S-FAS scheme. In the proposed Reef scheme we consider the system failure modecaused byhardwarediversity when categorizing thestoragenodesintodi?erent groups. The storage assurance model for Reef is developed to evaluate the security quality o?ered by Reef when all fragments have the same replication degrees. Then we developed a secure fragment replication allocation process algorithm called R-SAP illustrating how to use the proposed Reef scheme. The evaluation results show that the proposed Reef scheme and R-SAP algorithm can improve both availability and security for distributed storage systems with heterogeneous vulnerabilities. iii Acknowledgments I would like to acknowledge and thank the many people whom, without their support, this work would not have been possible. First of all, I would like to express my deep and sincere gratitude to my advisor, Dr. XiaoQin. Withouthiswideknowledge, detailedandconstructiveguidance, generoussupport and warm encouragement, I would not have completed my PhD study and this dissertation research would have never been possible. His passionate attitude towards research and wonderful personality will have a remarkable influence on my entire career. I warmly thank Dr. Drew Hamilton and Dr. Cheryl Seals for their valuable advice on my dissertation. The extensive discussions and their insightful comments have significantly helped in improving the quality of this dissertation. I also wish to express my warm and sincere thanks to Dr. Shiwen Mao for serving as the outside reader and proofreading my dissertation. I have been working in a great research group. I would like to thank the group members: Xiaojun Ruan, Jiong Xie, Zhiyang Ding, Shu Yin, Yixian Yang, Jianguo lu, James Majors, Ji Zhang, Xunfei Jiang, Sanjay Kulkarni, Ajit Chavan, and Tausif Muza?ar who have helped me a lot in my research and study. Working with them is beneficial and pleasant. I also owe much gratitude to Dr. Daniela Marghitu, Dr. Kai Chang, and Dr. David Umphress for shaping my graduate student career for the better. I would also like to ac- knowledge the e?erts of Ms. Michelle Wheeles, Ms. Jo Ann Lauraitis, and Ms. Carol Lovvorn in helping me keep my school and immigration paper work in order. Furthermore, I have received warm friendship, generous help and patient mentoring from some other students also from the Department of Computer Science and Software iv Engineering. I would like to specially thank Qing Yang, Chengjun Wang, Shaoen Wu and Haiquan Chen for their help during my study life at Auburn. I owe my loving thanks to my husband Yuyun Zhan for his sincere understanding, considerate caring, always support and unconditional love. It would have been impossible for me to complete this dissertation without his encouragement and support. Last but not least, I am infinitely grateful to my mother, Ailian Wei and my father, Shouzhi Tian. They provided me with unconditional love, advice, and support. My par- ents have presented me with admirable role models of diligence, wisdom and perseverance. Despite our long distance relationship, my sisters, brothers and all the relatives in our big family overwhelmed me with more love and support throughout my years in college and graduate school than what I could have ever deserved. Especially thanks go to my sister for her unwavering support, mentoring and encouragement. She always has more reasons to appreciate life than to complain no matter whatever happens. v Table of Contents Abstract........................................... ii Acknowledgments...................................... iv ListofFigures ....................................... x ListofTables........................................ xv 1 Introduction...................................... 1 1.1 ProblemStatement................................ 1 1.1.1 SecurityofDistributedSystems..................... 2 1.1.2 HeterogeneityTrends........................... 4 1.1.3 Heterogeneous Vulnerabilities . ..................... 5 1.1.4 Another Two Design Goals: High Performance and Availability . . . 6 1.2 ResearchScope.................................. 7 1.3 Objectives..................................... 10 1.4 DissertationOrganization ............................ 11 2 LiteratureReview................................... 15 2.1 SecurityTechniquesforDistributedSystems.................. 15 2.2 FragmentationTechniques............................ 16 2.3 SecretSharing................................... 16 2.4 LoadBalancingtoImproveSystemPerformance................ 17 2.5 Replication Scheme for High Performance and Availability .......... 18 2.6 Observations.................................... 19 2.7 ComparisonofOurWorkwithExistingSolutions ............... 19 3 SecureFragmentAllocationSchemeS-FAS..................... 20 3.1 SystemandThreatModel............................ 22 vi 3.1.1 SystemModel............................... 22 3.1.2 ThreatModel............................... 25 3.2 S-FAS:ASecureFragmentAllocationScheme................. 26 3.2.1 Heterogeneity in the Vulnerability of Data Storage . .......... 26 3.2.2 AMotivationalExample......................... 26 3.2.3 DesignoftheS-FASScheme....................... 28 3.3 StaticandDynamicAssuranceModels ..................... 31 3.3.1 StaticStorageAssuranceModel..................... 31 3.3.2 DynamicAssuranceModel. ....................... 33 3.4 EvaluationofSystemAssurance......................... 34 3.4.1 ImpactofHeterogeneityonStorageAssurance............. 35 3.4.2 ImpactofSystemSizeonStorageAssurance.............. 36 3.4.3 ImpactofSizeofServerGroupsonStorageAssurance......... 37 3.4.4 Impact of Number n ofFileFragmentsonStorageAssurance..... 38 3.4.5 Impact of Threshold m onStorageAssurance ............. 39 3.4.6 Impact of P L onDynamicAssurance .................. 39 3.4.7 Impact of q onDynamicAssurance................... 42 3.5 ChapterSummary ................................ 42 4 SecureAllocationProcessing(SAP)AlgorithmforS-FAS............. 44 4.1 MotivationfortheSecureAllocationProcessing(SAP)Algorithm...... 44 4.2 FactorsA?ectingPerformanceandSecurity .................. 46 4.3 SAPAllocationAlgorithm............................ 49 4.4 AnAllocationExampleforSAP......................... 53 4.5 ChapterSummary ................................ 54 5 S-FASPrototype................................... 57 5.1 PrototypeDesign................................. 57 5.1.1 ArchitectureandModules ........................ 60 vii 5.1.2 DataFlow................................. 65 5.2 Experimental Evaluation on System Throughput ................ 66 5.2.1 EvaluationMethodology......................... 66 5.2.2 ExperimentResults............................ 67 5.3 ReadandWritePerformance........................... 71 5.4 ChapterSummary ................................ 76 6 Reef-A Replication Solution to Improve Availability ................ 77 6.1 Improve Availability in S-FAS Solution ..................... 78 6.1.1 Availability in Distributed System .................... 78 6.1.2 Make Heterogeneity valuable to Availability .............. 79 6.2 SystemModelandArchitecture......................... 80 6.2.1 ReplicationSchemeforHeterogeneousDistributedSystems...... 81 6.2.2 AMotivationExample.......................... 83 6.3 Reef:AFragmentReplicationSchemeforS-FAS................ 87 6.3.1 DesignGoalofReef............................ 87 6.3.2 DesignoftheReefScheme........................ 89 6.4 StaticAssuranceModelforReef......................... 92 6.5 EvaluationofSystemAssurance......................... 95 6.5.1 ImpactofReplicationDegreeonStorageAssurance.......... 95 6.5.2 ImpactofSystemSizeonStorageAssurance.............. 96 6.5.3 Impact of Number n ofFileFragmentsonStorageAssurance..... 97 6.6 R-SAP:TheAllocationAlgorithmforReef................... 99 6.6.1 Factors A?ecting Security, Availability and Performance ....... 100 6.6.2 TheDesignofR-SAP........................... 100 6.7 ChapterSummary ................................ 104 7 FutureResearchPlan................................. 105 viii 7.1 Task1: Dynamic Replica Reallocation in Heterogeneous Distributed Systems BasedonReefScheme .............................. 106 7.2 Task2: Considering Energy as a System Resource to Design Optimal Dynamic ReplicaReallocationAdaptiveSchemes..................... 106 7.3 Task3: Resource Scheduling and Management Considering Heterogeneity Na- tureinCloudComputing............................. 107 8 Conclusion....................................... 108 8.1 DissertationSummary .............................. 108 8.1.1 S-FAS:ASecureFragmentAllocationScheme............. 108 8.1.2 SAP:AnSecureFragmentAllocationProcessModule......... 109 8.1.3 A Prototype: The Implementation of S-FAS scheme and SAP Algorithm109 8.1.4 Reef: An Replication Scheme to Improve Availability ......... 109 8.2 Contributions................................... 110 Bibliography ........................................ 112 ix List of Figures 1.1 The Architecture of The Core Work ........................ 12 3.1 A distributed storage system is comprised of a set of cluster storage subsystems. Multiple fragments of a file can be stored either in storage nodes within a single cluster storage subsystem or in nodes across multiple cluster storage subsystems. SeeFig.3.2fordetailsonaclusterstoragesubsystem. .............. 22 3.2 A cluster storage subsystem consists of a number of storage nodes and a gate- way. Storage nodes are divided into di?erent server-type groups, each of which represents a level of security vulnerability. ..................... 23 3.3 A distributed storage system contains 16 storage nodes, which are divided into 4 server-type groups (or server groups for short), i.e., T 1 , T 2 , T 3 ,andT 4 .Servers in each group have the same level of security vulnerability. . . .......... 27 3.4 Possible insecure file fragment allocation decision made using a hashing function (see Eq. 11 in [82]): Server set 1 handles fragment f a , server set 2 handles fragment f b , and server set 3 handles fragment f c . Server set 1 contains storage nodes r 1 , r 4 , r 7 , r 1 0, r 1 3, and r 1 6; server set 2 contains storage nodes r 2 , r 5 , r 8 , r 11 ,andr 14 ; and server set 3 contains storage nodes r 3 , r 6 , r 9 , r 12 ,andr 15 . It is possible that fragments f a , f b ,andf c may be allocated to storage nodes that belong to the same server-type group. For example, the three fragments are respectively stored on nodes r 4 , r 8 ,andr 12 , which share the same vulnerability in server group T 4 . Rather than three attacks, one successful attack against server group T 4 allows unauthorized users to access the three fragments of file F.... 29 x 3.5 Heterogeneous system and homogeneous system using secret sharing scheme. In all the four test cases, N is set to 60. K issetto1,4,5,and6,respectively. When K is1,thereisonlyoneservergroupinthesystem............. 35 3.6 The impact of the system size N on storage assurance. .............. 36 3.7 The impact of server-group size on data storage assurance. The server-group size means the number of storage nodes in a server-type group. Note that the storage nodes within a server group share the same level of vulnerability. The server-groupsizevariesfrom12to18withanincrementof1. .......... 37 3.8 The impact of the number n of fragments of a file on storage assurance. The number n of fragments increases from 11 to 20. The parameters k and N are set to3and75,respectively................................ 38 3.9 ImpactofP L -theprobabilitythatafragmentmightbeintercepted byanattacker during the fragment?s transmission through an insecure link. P L is varied from 0 to 8?10 ?3 by increments of 1?10 ?3 . Threshold m isvariedfrom7to10.... 40 3.10 Impactofq -thenumberq offragmentstransmittedtoandfromastoragecluster. q is chosen from 0 to 6 with an increment of 1. Threshold m is set from 7 to 10) 41 4.1 The SAP fragment Partition and Sorting Step. The chosen secret sharing scheme (m,n)isemployed................................... 50 4.2 The SAP Data Flow. ................................. 51 4.3 AllocationLoadSample............................... 53 4.4 AlocationResult ................................... 55 xi 5.1 System architecture.The storage server manages metadata. The storage nodes are logically grouped into di?erent server types. Client nodes can directly access storageserversthroughthenetworkconnection................... 58 5.2 The Prototype design.The upper part of this figure represents the disks of a storage server where the storage node configuration file, file configuration file, and fragment configuration file are stored. The middle part is the memory part of the storage server node, where the key allocating module parts such as the SAP algorithm, multi-thread writing, and the three active link lists are stored. The three active lists records the most updated information for storage nodes, stored files and fragments in the system. The lower layer in the figure represent theclientnodesinthesystem. ........................... 59 5.3 The architecture of the system.The storage server manages metadata (e.g., con- figuration files for server nodes, stored files and fragments. The Storage node also runs the NFS).The storage nodes are logically grouped into di?erent server types (server nodes of the same server type share the similar vulnerabilities.) Client nodes can directly access storage servers through the network interconnect. . . . 61 5.4 The allocating module. The allocating module runs on the storage server node and directly communicate with the client nodes. The upper part of this figure represents the disks of the storage node where the server configuration file, file configuration file and the fragment configuration file are stored. The middle part is the memory part of the storage server node, where the key allocating module parts such as the SAP algorithm, multi-thread writing and the three active link lists are stored. The three active lists records the most updated information for server, stored file and fragments in the system. The lower layer in the figure representtheclientnodesinthesystem....................... 62 xii 5.5 The reads and writes/tracing module.The reads and writes/tracing module runs at the storage server node. The SAP reading and/or writing part deal with the requestsfromclientsortracerecords. ....................... 64 5.6 Impact of file size on system throughput. ...................... 68 5.7 Impact of fragment number on system throughput. ................. 68 5.8 File Size Impact on Throughput of Systems Including Di?erent Number of Stor- ageNodes....................................... 69 5.9 File Size and Number of Storage Nodes Impact on System Throughput(1). . . . 70 5.10 File Size and Number of Storage Nodes Impact on System Throughput(2) . . . 71 5.11 File Size Impact on Read-Only Trace Processing Time. .............. 72 5.12 File Size Impact on Write-Only Trace Processing Time. .............. 73 5.13 File Size Impact on read and Write Trace Processing Time. ........... 74 5.14 Fragment Number Impact on Read-Only Trace Processing Time. ......... 74 5.15 Fragment Number Impact on Write-Only Trace Processing Time. ........ 75 5.16 Fragment Number Impact on Read and Write Trace Processing Time. ...... 75 6.1 A distributed storage system 2 contains 25 storage nodes, which are categorized into 5 server-type groups (or server groups for short), i.e., T 1 ,T 2 ,T 3 ,T 4 ,andT 5 . Servers in each group have the same level of security vulnerability. ........ 85 6.2 Possible insecure file fragment allocation ...................... 86 6.3 Replication Degree Impact on Assurance ...................... 96 xiii 6.4 System Size Impact on Assurance ......................... 98 6.5 Number of Fragments Impact on Assurance .................... 99 6.6 Creating a Three-dimension Decreasing Ordered Replica List ........... 102 6.7 Data Flow of the R-SAP Algorithm ......................... 103 xiv List of Tables 3.1 Notationusedinthesystemandmodels....................... 24 4.1 NotationusedintheSAPAlgorithm......................... 47 4.2 FeaturesofExampleFiles.............................. 54 4.3 FeaturesofaHeterogeneousStorageSystemExample............... 54 4.4 AllocationResultbytheSAPAlgorithm...................... 56 6.1 Notationusedinthesystemandmodels....................... 81 6.2 NotationusedintheSAPAlgorithm......................... 101 xv Chapter 1 Introduction With the wide use and development of distributed computing, parallel computing and cloud computing, the security issues for such applications have emerged as primary and even bottlenecks for systems dealing with large amount of sensitive data. With the increasing of the storage node number in distributed storage systems, the heterogeneous feature is becoming more common. There are a lot of solutions exist to solve the security problems based on di?erent aspects of the systems. Although heterogeneity issues of distributed systems have been widely investigated, little attention has yet been paid to security solutions designed for distributed storage systems with heterogeneous vulnerabilities. Based on the heterogeneous trend of distributed systems, we believe that there are methods to make use of heterogeneity of such systems to improve system security while considering system performance and availability. The objective of this dissertation work is to investigate how to use heterogeneity among distributed storage nodes to improve system security and at the same time to improve system performance and availability. This chapter first presents the problem statement in Section 1.1. In Section 1.2, we describe the scope of this research. Section 1.3 highlights the main objectives of this dissertation, and Section 1.4 outlines the dissertation organization. 1.1 Problem Statement In this section, we start with an overview of the security problem and techniques of distributed systems. In Section 1.1.2, we describe the heterogeneity trend of larger and scalable distributed storage systems. Section 1.1.3 specifically discusses the heterogeneous 1 vulnerabilities of distributed systems. Then, we present our another two design goals(except security) of performance and availability in distributed storage systems in Section 1.1.4. 1.1.1 Security of Distributed Systems Distributed systems which provide information sharing and large scale data storage, have enormous impact on our society. The security demand for such distributed systems is becoming increasingly important. Theconsequences ofthefailuresofthedistributed systems can have high cost, from financial resources loss to even human lives loss. Modern large scale distributedsystemsservices mustprovideguaranteesforprotectingservices againstmalicious threats [99]. However, distributed systems are more vulnerable to threats than centralized systems, since it is di?cult to control processing activities of the distributed systems and information can be accessed over networks [142]. A variety of factors have impact on distributed system security including network topol- ogy, storage nodes, data placement, distributed system physical security environment, dis- tributed system management structure, and interactions between di?erent security mecha- nisms [88]. Except the popular and traditional security techniques like authentication [135] and access control [107] [86], many security approaches have been developed to improve se- curity level of distributed systems corresponding to the above mentioned influential factors on security. Some of the security techniques are based on the system architecture level. For example, Benson, G. and Appelbe, W. et al. proposed a hierarchical model for dis- tributed system security [18]; Naqvi, S. and Riguidel, M. developed a security architecture for heterogeneous distributed computing systems [87]; Soshi, M. and Maekawa, M. proposed a security architecture for open distributed systems [120]. Some security techniques are de- veloped by building secure system models. For example, Noeparast, E.B. and Banirostam, T. proposed a cognitive model of immune system for increasing security in distributed sys- tems [90]; Ching Lin and Varadharajan, V. developed a hybrid trust model for enhancing 2 security in distributed systems [73]; Varadharajan, V. and Black, S. proposed amultilevel security model for a distributed object-oriented system [130]. Intrusion detection isa security technique toimprove system security by identification of unauthorizeduse,orintrusionattemptincomputernetworkenvironment[113][113][146][3][103]. Intrusion detection is one of the most common security measures that enterprises use to protect themselves from malware, worms and all other types of cyber attacks. Voko- rokos, L. and Chovanec, M. et al. proposed a technique of security distributed intrusion detection system based on multisensor fusion [132]. Their approach is based on behav- ior prediction of attacker and legitimate user of computer network and they suggests use of customized DIDS (distributed intrusion detection systems), which combines distributed monitoring using external sensors and centralized data analysis for more accurately iden- tification of security events. Many intrusion detection methods are based on di?erent agents[28][57][26][152][136][111][75][58]. Avarietyofotherintrusiondetectiontechniques have been proposed [83] [83] [49] [110] [104] [123]. Intrusion detection is not enough in many cases. It must have some idea of the attack signature before it can defend against it in order for an intrusion detection system (IDS) to be e?ective. Networks remain especially vulnerable to new forms of attacks, since it is impossible to have a signature for an attack that hasn?t been seen yet. In addition, there are 120K malware incidents identified per day by these tools, with 5 - 20 new malware strains missed every day. With all these new threats, not to mention the highly targeted Advanced Persistent Threats, IDS is simply incapable of protecting the distributed system enough. Fault tolerance is another important and common technique to improve distributed system security and availability. It is the property that enables a distributed system to continue operating properly in the event of the failure of (or one or more faults within) some of its components [143] [51] [126] [105]. A variety of techniques providing fault tolerance for distributed systems have been proposed [150] [147] [40] [77] [14] [55] [65]. Replication is one of the methods to provide fault tolerance [31] [114] [119] [50]. 3 Although these techniques are able to provide a certain level of security for distributed systems, the conventional security techniques lack the ability to express heterogeneity in security services [142]. With the increasing of size and scalability modern large distributed systems, heterogeneity is a more and more obvious trend. 1.1.2 Heterogeneity Trends A distributed computing system is a collection of independent computers linked by a computer network that appear to the users of the system as a single coherent system [2]. Cluster computing is a type of distributed computing. A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system [1]. Currently distributed cluster computing is becoming more popular in a lot of application fields both in academic and industry such as bioinformatics, weather forecasting, social networks and other web-based applications. One of the advantages for distributed computing systems is that they can easily grow in- crementally by adding more machines to the systems as requirements on processing power grow. The large data and high demand drive the sizes of distributed systems to increase very quickly and to become more scalable and heterogeneous. When the sizes of distributed cluster computing systems grow, the heterogeneous fea- tures [4] also grow such as available bandwidth, processor speed, disk capacity, security, failure rate, and pattern of failures among the storage nodes and network condition. On the other hand heterogeneous features of the di?erent applications that run on such systems are increasing at the same time. The data for di?erent applications may have di?erent size, access rate, and di?erent quality of security and performance requirements. 4 1.1.3 Heterogeneous Vulnerabilities Heterogeneous distributed systems have been applied to security sensitive applications, such as banking systems and digital government, which require new approaches to secu- rity[142]. Heterogeneityissuesofdistributedsystemshavebeenwidelyinvestigated[19][84][15] [85] [41]. There are a lot of factors that a?ect distributed system security both in hardware and software [89]. The traditional security techniques for distributed systems include ac- cess control, security threat detection, authentication, authorization, and fault tolerance et al..There is an increasing demand to develop large-scale distributed storage systems support- ing data-intensive services that provide resource sharing and fault tolerance. The confiden- tiality of security-sensitive files must be preserved in modern distributed storage systems, because distributed systems are exposed to an increasing number of attacks from malicious users [101]. Although there exist many security techniques and mechanisms (for example, [81] and [148]), itisquite challenging to secure datastored in distributed systems. In general, security mechanisms need to be built for each component in a distributed system, then a secure way of integrating all the components in the system must be implemented. It is critical and important to maintain the confidentiality of files stored in a distributed storage system when malicious programs and users compromise some storage nodes in the system. The file fragmentation technique is often used in many distributed and parallel systems to improve availability and performance. Several file fragmentation schemes have been pro- posed to achieve high assurance and availability in a large distributed system [82][137]. In real-worlddistributedsystems, thefragmentationtechnique isusuallycombined with replica- tion to achieve better performance at the cost of increased security risk to data stored in the systems. A practical distributed system normally contains multiple heterogeneous servers providing services with various vulnerabilities. Unfortunately, the existing fragmentation algorithms do not take the heterogeneity issues into account. 5 In addition to cryptographic systems, secret sharing is an approach to providing data confidentiality by distributing a file among a group of n storage nodes, to each of which a fragment of the file is allocated. The file can be reconstructed only when a su?cient number (e.g., more than k) of the fragments are available to legitimate users. Attackers are unable to reconstruct a file using the compromised fragments, if a group of servers are compromised and fewer than k fragments are disclosed. In a large-scale distributed system, di?erent storage sites have a variety of ways to protect data. The same security policy may be implemented in various mechanisms. Data encryption schemes may vary; even with the same encryption scheme, key lengths may vary across the distributed system. The above mentioned factors can contribute to di?erent vulnerabilities among storage sites. Although security mechanisms deployed in multiple storage sites can be implemented in a homogeneous way, di?erent vulnerabilities may exist due to heterogeneities in computational units. There are a bunch of work and research have been done on distributed system secu- rity [64] [52] [60] [67] [16] [102]. However little attention has been paid to security solutions designed for distributed storage systems by making use of the heterogeneous vulnerabilities. This problem motivates us to focus on heterogeneity issues concerning security mechanisms of distributed storage systems. 1.1.4 Another Two Design Goals: High Performance and Availability The design goals of distributed systems include high security, performance, availability, and the like. Many solutions have been proposed to achieve these goals in distributed sys- tems. In the previous section, we summarized the current security techniques for distributed systems. In this part, we introduce the past and current research on high performance and availability in distributed systems. 6 A handful studies has been done to improve performance for various types of distributed systems to support di?erent applications [125] [43] [23] [133] [76] [23] [140]. The heteroge- neous features of distributed systems have been well investigated to improve performance. For example, Jiong et al. proposed a solution to improve MapReduce performance through data placement in heterogeneous Hadoop clusters. [140] Reliability and availability are another two very important requirements for high quality service of distributed systems. There are a wide range of approaches to improving system reliability [33] [21] and availabil- ity [92] [13] [7] [53]. Data replication techniques are very popular solutions in distributed systems to improve system availability. Many studies have been carried out to investigate datareplication techniques andtheir applications [29][78][153][100][106][20]. Indata repli- cation scheme there is a very important parameter called replication degree which indicates the number of replicated copies for each piece of data. The appropriate value of replica- tion degree is very important from the availability?s perspective. If too many replica copies are created in distributed system, the security risk is increased due to increased chances of being be attacked by hackers. On the other hand, increasing replica numbers significantly consume distributed system resources including storage space, power and maintenance cost. A few studies have focused on determining the best replication degree for distributed sys- tems [154] [94] [96]. A lot of work has been done to improve security, performance, or availability of high- quality online services. Unfortunately, there isvery littlework that hasbeen done toimprove security by making use of heterogeneity of distributed systems through file assignment. 1.2 Research Scope To address the above mentioned limitations, we propose a file fragmentation and al- location approach to improving assurance, scalability and performance of a heterogeneous distributed system. 7 Westarttoaddresssecurityheterogeneityissuesbydividingstorageserversintodi?erent server-type groups. Each server type represents a level of security vulnerability. In a server- type group, storage servers with the same vulnerability share the same weakness that allows attackerstoreducetheservers? informationassurance. Althoughitmaybedi?culttoclassify all servers in a system into a large number of groups, a practical way of identifying server typesistoorganizethesewithsimilarvulnerabilitiesintoonegroup. Ifoneormorefragments ofa filehave been compromised, it isstill very hard fora malicious user to reconstruct thefile from the compromised fragments. Our solution is di?erent from those previously explored, becauseourapproach utilizes heterogeneousfeaturesregardingvulnerabilities amongservers. To evaluate our method for fragment allocations, we develop static and dynamic assur- ancemodelstoquantifytheassuranceofaheterogeneousdistributedstoragesystemhandling data fragments. Experimental results show that increasing heterogeneity levels can improve file assurance in a distributed storage system. We investigate the possible parameters that influence both the system performance and the proposed S-FAS scheme. There are three aspects in a distributed storage system that influence its security and performance, workload, storage nodes and network interconnects. There are di?erent elements of each aspect of the system. We extracted the possible key elements of each aspect by analyzing of the proposed S-FAS scheme. Based on the both performance and security analysis, we developed a secure allocating processing (SAP) algorithm for the S-FAS scheme to both improve the security level and consider system performance by using theheterogeneous featureof largedistributed systems. The SAP allocation algorithm considers load balancing, delays caused by the workload vari- ance of many consecutive requests, and the heterogeneous nature of storage nodes in a system. In order to use practical implementations to demonstrate the ideas on actual systems with real-world applications, we developed a prototype using the multi-threading technique and C language for the S-FAS scheme with the SAP algorithm to guide the file allocation. 8 The prototype is built in the distributed cluster environment with heterogeneous storage nodes, in which the Network File System (NFS) and Linux are installed. We did some experiments on system throughput and testing against real world traces. The evaluation results show that the proposed solution can not only improve the security level, but also im- prove thethroughputandperformanceofthedistributed storagesystems with heterogeneous vulnerabilities by using the multi-thread technique. To further explore the security solution while considering system availability, we pro- pose Reef, which extends S-FAS by incorporating replicas of file fragments. Reef is designed to improve the distributed storage system security and availability by integrating the frag- mentation technique, secret sharing, and fragment replication. Reef considers heterogeneity features of distributed storage systems during the replica placement phase. The Reef scheme is an extension of the S-FAS scheme. The system model for Reef is similar to that of S-FAS except that Reef address the system failures mode and aims to improve system reliability in addition to security. We build a static assurance model to quantitatively evaluate the system assurance for the Reef scheme. We also developed a replica allocation process algo- rithm called R-SAP to demonstrate how does the proposed Reef scheme work. In the Reef design, we addressed the distributed system security, performance, as well as availability. To evaluate the assurance provided by Reef, we studied the impacts caused by replication degree, system size, and the number of fragments. In this dissertation study, we only focus on the static replica placement solution. Dy- namic replica reallocation schemes are essential to achieve high performance and availability of distributed systems, especially for internet based applications and services. In a dynamic wide-area environment, client access patterns, network conditions, andservice characteristics are constantly changing. We plan to study and propose a dynamic replica reallocation ap- proach for heterogeneous distributed system by extending Reef to address the heterogeneous vulnerabilities in the large scale distributed systems. 9 1.3 Objectives The following are five main objectives that we plan to realize with this study: 1. We aim to address the heterogeneous vulnerability issue by categorizing storage nodes of a distributed system into di?erent server-type groups based on their vulnerabilities. Eachserver-typegroup-representingalevelofvulnerability-willcontainstoragenodes with the same security vulnerability. We will propose a secure fragmentationallocation scheme called S-FAS to improve security of a distributed system where storage nodes have a wide variety of vulnerabilities. 2. We plan to develop the storage assurance and dynamic assurance models to quan- tify information assurance and to evaluate the proposed S-FAS scheme. We will find principles to improve assurance levels of heterogeneous distributed storage systems. The principles will be general guidelines to help designers achieve a secure fragment allocation solution for distributed systems. 3. We plan to develop a secure allocating processing (SAP) algorithm to improve security andsystem performance byconsidering theheterogeneous featureof a largedistributed system. 4. In order to conduct the performance analysis for the S-FAS scheme and SAP allocat- ing algorithm, we will develop a prototype for the S-FAS scheme and SAP algorithm. We will also implement the prototype and conduct some experiments on the through- put of the proposed scheme and algorithm. We will do some experiments on system throughput and testing against real world traces. 5. Last, to further explore the security solution while considering system availability, we will propose Reef, which extends S-FAS by incorporating replicas of file fragments. Reef is aimed to improve the distributed storage system security and availability by integrating the fragment replication technique, secret sharing, fragment replication. 10 Reef will consider heterogeneity features of distributed storage systems during the replica placement phase. The Reef scheme will be an extension of the S-FAS scheme. The system model for Reef should be similar to that of S-FAS except that Reef address the system failures mode and is aimed to improve system reliability in addition to security. We plan to build a static assurance model to quantitatively evaluate the system assurance for the Reef scheme. We will also develop a replica allocation process algorithm called R-SAP to demonstrate how does the proposed Reef scheme work. In the Reef design, we plan to address the distributed system security, performance, as well as availability. To evaluate the assurance provided by Reef, we will study the impacts caused by replication degree, system size, and the number of fragments. The architecture of the core work of this dissertation is outlined in Fig. 1.1. Our original motivation and objective for this research are security techniques for distributed heteroge- neous systems. Performance and availability are another two very important desired pros- perities in distributed systems. Security, performance and availability prosperities usually conflict with each others. We discover that by making use of the heterogeneous features in distributed systems we can improve not only security, but also performance and availability, so we further address the performance and availability issues based on our proposed security solution. 1.4 Dissertation Organization This dissertation is organized in the following manner: Chapter 2 reviews related work and presents the comparison of our work with existing solutions. Chapter 3 describes S-FAS - a secure fragmentation allocation scheme by make use of the heterogeneous feature among storage nodes in distributed systems. In the S-FAS approach, we integrate file fragmentation with the secret sharing technique in a distributed storage system with heterogeneous vulnerabilities. Storage sites in a distributed systems are 11 Figure 1.1: The Architecture of The Core Work 12 classified into a variety of di?erent server types based on vulnerability characteristics. Given a file and a distributed system, S-FAS allocates fragments of the file to as many di?erent types of nodes as possible in the system. Data confidentiality is preserved because fragments of a file are allocated to multiple storage nodes. Storage and dynamic assurance models are developed to evaluate the quality of security o?ered by S-FAS. Analysis results show that fragment allocations made by S-FAS lead to enhanced security because of the consideration of heterogeneous vulnerabilities in distributed storage systems. Chapter 4 presents a Secure Allocation Processing (SAP) algorithm for the S-FAS scheme to improve the security level and consider its performance using the heterogeneous featuresofalargedistributedsystem. Toimprovethesecurity, thedesignofSAPisguidedby the experimental results from S-FAS; to improve performance, we not only consider the het- erogeneity of the storage nodes and the whole system, but also the heterogeneous features of the requests. The SAP allocation algorithm considers load balancing, delayed e?ects caused by the workload variance of many consecutive requests, and the heterogeneous features (such as CPU speed and network bandwidth) of the storage nodes in the system. Chapter 5 presents a prototype using the multi-threading technique and C language for the S-FAS scheme with the SAP algorithm to guide the file allocation. The prototype is built in the distributed cluster environment with heterogeneous storage nodes, in which the Network File System (NFS) and Linux are installed. Some experiments on system through- put and testing against real world traces are presented. The evaluation results show that the proposed solution can not only improve the security level, but also improve the throughput and performance of the distributed storage systems with heterogeneous vulnerabilities by using the multi-thread technique. Chapter 6 proposed a solution called Reef to improve the distributed storage system security and availability by integrating the fragmentation technique, secret sharing, and fragment replication. Reef considers heterogeneity features of distributed storage systems during the replica placement phase. The Reef scheme is an extension of the S-FAS scheme. 13 The system model for Reef is similar to that of S-FAS except that Reef address the system failuresmodeandaimsto improve system reliability inadditiontosecurity. Webuildastatic assurance model to quantitatively evaluate the system assurance for the Reef scheme. We also developed a replica allocation process algorithm called R-SAP to demonstrate how does the proposed Reef scheme work. In the Reef design, we addressed the distributed system security, performance, as well as availability. To evaluate the assurance provided by Reef, we studied the impacts caused by replication degree, system size, and the number of fragments. Chapter 7 presents the future work directions based on the ideas contained in the dis- sertation. Chapter 8 summarizes the main contributions of this dissertation. 14 Chapter 2 Literature Review In this chapter, we summarize the previous literatures that are most relevant to the research in terms of security, performance, and availability in distributed cluster computing systems. Section 2.1 introduces related work on security solutions in distributed systems; Section 2.2 presents the related work on fragmentation techniques in distributed storage systems; Section 2.3 introduces the secret sharing security solution; Section 2.4 presents the related work on load balancing for high performance distributed cluster storage systems; Section 2.5 presents the related work on replication scheme to improve performance and availability for distributed systems. Section 2.6 presents our observations from the existing solutions. Section 2.7 presents the comparison of our work with the existing solutions. 2.1 Security Techniques for Distributed Systems Muchresearchhasbeenperformedtoimprovesecurityofdistributedandhigh-performance computing systems such as Grids. For example, Pourzandi et al. proposed a structured se- curity approach that incorporates both distributed authentication and distributed access control mechanisms [101]. Intrusion detection techniques have been widely used to provide basic assurance of security in distributed systems. However, most intrusion detection techniques areinadequate to protect data stored in distributed systems [63]. One of the most e?ective approaches to improving information assurance in distributed systems is intrusion tolerance [35] [122][137]. To enhance security assurance, researchers have developed a range of intrusion-tolerant tools and mechanisms. The fragmentation technique summarized below is one of the intrusion tolerance methods that can be used in combination with intrusion detection techniques. 15 2.2 Fragmentation Techniques Afragmentationtechniquepartitionsasecuritysensitivefileintomultiplefragmentsthat aredistributedacrossdi?erent storageservers inadistributedsystem. Alotoffragmentation schemes have been proven to be valuable tools for improving the security of data stored in distributed systems (see [66]). Many fragmentation approaches aim to improve availability and performance of distributed systems by applying data replication methods. For example, Dabek et al. developed a wide-area cooperative storage system in which a fragmentation scheme was implemented to improve availability and to facilitate load balancing [32]. Although combining a fragmentation and replication scheme can enhance performance and availability, data replications may increase security risks due to an increasing number of file fragments handled by distributed storage servers. A file is more likely to be compromised when more replications of the file are stored in distributed storage servers. Existing file fragmentation technologies are inadequate in addressing the issue of het- erogeneous vulnerabilities in large-scale distributed systems. Our preliminary results show that security can be improved in a distributed storage system when a fragmentation scheme incorporates the heterogeneous-vulnerability feature with our S-FAS scheme. 2.3 Secret Sharing Secret sharing?independently invented by Shamir and Blakley?is a method of distribut- ing a secret among a group of participants, each of which is allocated a share of the secret. The secret can be successfully reconstructed only when a su?cient number of shares are collected and combined [98][112]. Shamir proposed the (k,n) secret sharing scheme that divides data D into n pieces in such a way that D can be easily reconstructed from any k pieces. If fewer than k pieces are disclosed, no one can reconstruct D from the revealed pieces. 16 The secret sharing scheme has been extended and employed in di?erent application do- mains [116]. For example, Bigrigg et al. proposed an architecture called PASIS for secure storage systems. The PASIS architecture integrates the secret sharing scheme with informa- tion dispersal to improve security, integrity and availability [138][149]. In a storage system with PASIS, even if an attacker compromises a limited (i.e., fewer than the threshold) sub- sets of storage nodes, the confidentiality of data stored in the system is still preserved. The aforementioned secret-sharing solutions designed for distributed storage systems ignore the issue of heterogeneous vulnerabilities. This fact motivates us to extend the secret sharing scheme by considering heterogeneity of vulnerabilities in the context of distributed storage systems. 2.4 Load Balancing to Improve System Performance We can classify existing load balancing approaches into di?erent categories such as static and dynamic or homogenous and heterogeneous. The focus of homogeneous load balancing schemes is to improve the performance of homogeneous parallel and distributed systems. On the other hand, heterogeneous load balancing approaches attempt to boost the performance of heterogeneous clusters, which comprise a variety of nodes with di?erent performance characteristics in computing power, memory capacity, and disk speed. A static load balancing scheme for Computational Fluid Dynamics simulations on a network of het- erogeneous workstations has been studied by Chronopoulos et al [27]. Their load-balancing algorithm takes both the CPU speed and memory capacity of workstations into account. To dynamically balance computational loads in a heterogeneous cluster environment, Cap and Strumpen explored heterogeneous task partitioning and load balancing. Xiao et al [139]. have investigated an approach that considers both system heterogeneity and e?ective usage of memory resources, thereby minimizing both CPU idle time and the number of page faults in heterogeneous systems. 17 2.5 Replication Scheme for High Performance and Availability Data replication is a fundamental technique to increase data availability, reliability and robustness in distributed systems. A number of factors including replication degree, replica placement and maintenance strategies have impact on the e?ect of data replication. There is a lot of work have been done studying the impact from replication degree [72], replica place- ment [45] [9] [115] [131], and replication maintenance [155]strategies and overhead [145] [46]. However, data replication brings the issue of data consistency and many strategies have been proposed to solve the data consistency issue caused by data replication [6] [8] [5] [144] [12]. High performance is always highly demand in distributed systems. Many factors influ- ence performance of a distributed system including quality of hardware, software, network connectivity, data management and scheduling strategies, and etc.. High performance has been investigated from various aspects [25] [24] [109] [118]. Data replication is one of the approaches that not only improve availability, but also improve performance of a distributed system. Many techniques have been developed to boost the performance of distributed systems where data replication is applied [91] [34] [38] [151]. Availability is usually measured as a factor of the reliability of a distributed systems. Availability is one of the desired properties [62] and di?erent approaches have been pro- posed to improve availability in distributed systems [54] [36] [42] [10] [128] [95]. Except data replication technique, sophisticated management, load balancing and recovery techniques are needed to achieve high availability amidst an abundance of failure sources that include software, hardware, network connectivity, and power issues [44]. Data placement and repli- cation strategies are among the top list of multiple design choices while data replication is chosen to improve availability of a distributed system [44]. A variety of research has been done to study e?cient data placement [97] and replication strategies [127] [61]. Data replication can improve availability, but it brings the issue of data consistency. A lot of work have been done to study the tradeo? between availability and data consistency. However, there is very little work has been done to study the tradeo? between availability 18 and security while replication scheme is used. The more replicas of a file or fragment are stored, the bigger the risk is that the file or fragment is compromised by hackers. In this dissertation, we will address the tradeo? issue between availability and security due to data replication in heterogeneous distributed systems. 2.6 Observations We observe that vulnerabilities of storage nodes in a distributed system are more het- erogeneous due to the following four main reasons. First, storage nodes have di?erent ways to protect data. Second, a security policy can be implemented in a variety of mechanisms. Third, the key length of an encryption scheme may vary across multiple storage nodes. Fourth, heterogeneities exist in computational units of storage sites. There is very little work has been done to boost security by making use of heterogeneous features in distributed systems. We believe that future security mechanisms for distributed systems must be aware of vulnerability heterogeneities. Withthepreviouslymentionedlimitationsofexistingtechniquesdesignedfordistributed systems, we propose our investigation results and solution for distributed systems with het- erogeneous systems in this dissertation. 2.7 Comparison of Our Work with Existing Solutions Our proposed security schemes, assurance evaluation models, secure file allocation algo- rithms and prototype in this dissertation are di?erent from existing solutions, because our approach aim to incorporate the heterogeneous vulnerabilities of distributed systems into file fragment allocation and secret sharing. Our solution captures heterogeneous features of the nodes regarding vulnerabilities to improve security. At the same time, our techniques also consider to boost performance and availability of distributed systems. 19 Chapter 3 Secure Fragment Allocation Scheme S-FAS In previous two chapters we present the introduction of this dissertation and the related work. In this Chapter we introduce our first work of a fragment allocation scheme called S-FAS in to improve security of a distributed system where storage sites have a wide variety of vulnerabilities. Distributed storage systems are becoming ubiquitous because of the large amount of data required for search engines, multimedia websites, and data-intensive high-performance computing [39] [48]. These distributed storage systems typically are at high risk and inef- ficient concerning the high confidential and performance requirements of the storage data. Security isoneofthekey qualities thatmost customers careabout. Withoutacertainlevel of security a storage system is useless for a lot of applications of high confidential requirements. With more and more personal, commercial, governmental and scientific data needing to be stored in distributed systems, it is important to enhance the security level of distributed storage systems. A handful of traditional or novel techniques developed to improve security in storage systems include authentication, authorization, fragmentation techniques, secret sharing, era- sure coding. These security techniques can significantly enhance the assurance level of the distributed system. With the increasing of the storage node number in distributed storage systems, the heterogeneity feature is becoming more common.Vulnerabilities of storage nodes in a dis- tributed system are heterogeneous in nature due to the following four main reasons. First, storage nodes have di?erent ways to protect data. Second, a security policy can be imple- mented in a variety of mechanisms. Third, the key length of an encryption scheme may 20 vary across multiple storage nodes. Fourth, heterogeneities exist in computational units of storage sites. We believe that future security mechanisms for distributed systems must be aware of vulnerability heterogeneities. Although heterogeneity issues of distributed systems have been widely investigated, little attention has yet been paid to security solutions de- signed for distributed storage systems with heterogeneous vulnerabilities. Since the existing security techniques developed for distributed systems are inadequate for distributed systems with heterogeneity in vulnerabilities, the focus of this study is heterogeneous vulnerabilities in largescale distributed storage systems. The goal of our research is to develop a Secure Fragment Allocation Scheme (S-FAS) to fully make use of the feature of heterogeneous vulnerabilities among large scale distributed storage systems where storage sites have a widevariety of vulnerabilities. In the S-FAS approach, we integrate file fragmentation with the secret sharing technique in a distributed storage system with heterogeneous vulnerabilities. Storage sites in a distributed systems are classified into a variety of di?erent server types based on vulnerability characteristics. Given a file and a distributed system, S-FAS allocates fragments of the file to as many di?erent types of nodes as possible in the system. Data confidentiality is preserved because fragments of a file are allocated to multiple storage nodes. We develop storage assurance and dynamic assurance models to evaluate the quality of security o?ered by S-FAS. Analysis results show that fragment allocations made by S-FAS lead to enhanced security because of the consideration of heterogeneous vulnerabilities in distributed storage systems. The rest of the chapter is organized as follows: We discuss in Section 3.1 the system and threat model. Section 3.2 describes the design of the S-FAS Scheme. In Section 3.3 we presents the static and dynamic assurance models for S-FAS.Then, Section 3.4 shows assurance evaluationresults ofS-FAS.Finally, Section 6.7concludesthechapter and presents our future research directions. 21 Cluster Storage 1 Cluster Storage 3 Cluster Storage 2 Cluster Storage 4 Cluster Storage 5 Figure 3.1: A distributed storage system is comprised of a set of cluster storage subsystems. Multiple fragments of a file can be stored either in storage nodes within a single cluster storage subsystem or in nodes across multiple cluster storage subsystems. See Fig. 3.2 for details on a cluster storage subsystem. 3.1 System and Threat Model We firstly outline the system and threat models that capture main characteristics of distributed storage systems. The system model is used as a basis to design the S-FAS fragmentation allocation scheme, whereas the threat model helps us identify vulnerabilities and certain potential attacks in distributed storage systems. 3.1.1 System Model The S-FAS fragmentation allocation scheme was designed for a distributed storage sys- tem (see Fig. 3.1) where each storage site is a cluster storage subsystem. Di?erent cluster storage subsystems may be connected within some subnetworks to form a larger scale dis- tributed storage sysytem. 22 Master node or Gateway of the cluster storage Figure 3.2: A cluster storage subsystem consists of a number of storage nodes and a gateway. Storage nodes are divided into di?erent server-type groups, each of which represents a level of security vulnerability. Fig. 3.2 depicts a cluster storage subsystem, which consists of a number of storage nodes and a gateway. Considering heterogeneous vulnerability in large-scale storage systems, we divide storage nodes into di?erent server-type groups, each of which represents a level of security vulnerability . Before presenting details on the system model, let us summarize all notations used throughout this chapter in Table 3.1. In this study, we consider a distributed storage system containing L cluster storage subsystems, i.e., R 1 ,R 2 ,...,R L . Cluster storage subsystems R i consists of H i storage nodes, i.e., R i = {r i1 ,r i2 ,...,r iH i }. All the storage nodes connected in cluster R i have heterogeneous vulnerabilities. Since all the nodes, including a master node, are fully connected in a cluster storage subsystem, we model the topology of a cluster storage system as a general graph. Cluster 23 Table 3.1: Notation used in the system and models. Notations Meaning N Number of server nodes in the system U The whole system considered L Number of subsystems in the whole system H i Number of server nodes in the subsystem i F A file stored in the system F i Fragment i of file F T j Server type j in the system K The total number of server types S j The size of a certain server type in a cluster m Threshold for the secret sharing scheme n The total number of fragments for each file in the secret sharing scheme R i Cluster storage subsystem r ij Node j in subsystem i X The event that a set of storage nodes is chosen to be attacked Y The event that if X occurs, at least m fragments can be compromised using the same attack method. Z The event of a successful attack to a certain fragment of a file V The event file F is compromised under one attack method P N The successful probability of an attack on a node P f The successful probability to compromise a fragment in a compromised node P(X) The probability of event X occurring P(Y) The probability of event Y occurring P(Z) The probability of event Z occurring P(V) The probability of event V occurring ? An allocation mapping of file F SA(?) The storage assurance of an allocation mapping ? of file F DA(?) The dynamic assurance of an allocation mapping ? of file F q Number of fragments needed to reconstruct a file transmitted from outside of the subsystem g Number of fragments compromised out of the q fragments transmitted from outside of the subsystem P L The probability that a fragment is intercepted during its transmission P D The probability that a file F is intercepted because of the compromised transmitted fragments 24 storage subsystem R i has a gateway, which hides the cluster?s internal architecture from users by forwarding file requests to storage nodes. Data in cluster storage subsystem R i can be accessed through its master node. When a read request is submitted to cluster R i , the master node is responsible for reconstructing file fragments and returning the file to users. When a write request of a file is issued, the master node updates all the fragments of the file. Legitimate users access cluster storage subsystems through master nodes; malicious users may bypass the master nodes to access storage nodes without being authorized. See Section 3.1.2 below for details on the threat model. 3.1.2 Threat Model It is not reasonable to assume that if a malicious user breaks into a storage node, fragments of a file stored on the node are thereby compromised. Normally, a malicious user needs two steps to compromise fragments of a file stored on a server. First, the malicious user must successfully attack the server. Second, fragments are retrieved by the malicious user. Let P N be the probability that a storage server is successfully attacked; let P f be the probability that authorized users retrieve fragments stored on the server, provided that the server has been compromised. We define event Z as a successful attack on a fragment (i.e., unauthorized disclosure of the fragment). Since the above two consecutive attack steps are independently, the probability that event Z occurs is a product of probability P N and probabilityP f . Thus, theprobabilitythatafragmentisdisclosed toanunauthorizedattacker can be expressed as: P(Z)=P N ?P f . (3.1) In a dynamic allocation environment, a malicious user can use a compromised node to collect other needed fragments of the file when the fragments are passing through the compromised node. 25 If encryption keys are disclosed to attackers, unauthorized interceptions of encrypted files stored on the attacked node may occur. Given two storage nodes with di?erent vulner- abilities, successful attacks of the nodes are not correlated. This statement is true for many potential threats, because compromising one storage node does not necessarily lead to the successful attack of the second one. 3.2 S-FAS: A Secure Fragment Allocation Scheme In this section, we first outline the motivation for addressing the heterogeneity issues in the vulnerability of distributed storage systems. Next, we describe a security problem addressed in this study. Last, we present a secure fragment allocation scheme called S-FAS for distributed storage systems. 3.2.1 Heterogeneity in the Vulnerability of Data Storage Since the existing security techniques developed for distributed systems are inadequate for distributed systems with heterogeneity in vulnerabilities, the focus of this study is hetero- geneous vulnerabilities in large-scale distributed storage systems. Vulnerabilities of storage nodes in a distributed system are heterogeneous in nature due to the following four main rea- sons. First, storage nodes have di?erent ways to protect data. Second, a security policy can be implemented in a variety of mechanisms. Third, the key length of an encryption scheme may vary across multiple storage nodes. Fourth, heterogeneities exist in computational units of storage sites. We believe that future security mechanisms for distributed systems must be aware of vulnerability heterogeneities. 3.2.2 A Motivational Example If the above heterogeneous vulnerability features are not incorporated into fragment allocation schemes for distributed storage systems, a seemingly secure fragment allocation 26 :Server Group 1 1 T 1 95 13 :Server Group 3 3 T :Server Group 4 4 T :Server Group 2 2 T 2 14106 3 11 157 4 16128 Figure 3.3: A distributed storage system contains 16 storage nodes, which are divided into 4 server-type groups (or server groups for short), i.e., T 1 , T 2 , T 3 ,andT 4 . Servers in each group have the same level of security vulnerability. decision can lead to a breach of data confidentiality. The following motivational example illustrates a security problem caused by ignoring vulnerability heterogeneities. LetusconsiderafileF withthreepartitionedfragments: f a , f b ,andf c ,andadistributed storage system (see Fig. 3.3) that contains 16 storage nodes divided into 4 server-type groups (or server groups for short), i.e., T 1 , T 2 , T 3 ,andT 4 . Storage nodes in each server group o?er similar services with the same level of vulnerability. In this example, server group T 1 consists of nodes r 1 ,r 5 ,r 9 ,r 13 , i.e., T 1 = {r 1 ,r 5 ,r 9 ,r 13 }. Similarly, we define the other three server groups as: T 2 = {r 2 ,r 6 ,r 10 ,r 14 }, T 3 = {r 3 ,r 7 ,r 11 ,r 15 },andT 4 = {r 4 ,r 8 ,r 12 ,r 16 }. Fig. 3.4 shows that it is possible to make insecure fragment allocation decisions that do not take vulnerability heterogeneity into account. The decision made using a hashing function (see Eq. 11 in [82]) randomly allocates the three fragments of file F to three di?erent nodes, each of which belongs to one of the three server sets illustrated in Fig. 3.4. Forexample, thethreefragmentsf a , f b ,andf c arestoredonnodesr 1 , r 6 ,andr 8 , respectively. 27 This fragment allocation happens to be a good solution, because r 1 , r 6 ,andr 8 have di?erent vulnerabilities as the three nodes belong to di?erent server groups (i.e., T 1 , T 2 ,andT 4 ). A malicious user must launch three successful attacks (one for each server group) in order to compromise all three fragments. The above fragment allocation scheme fails to address the threat described in previous described Threat Model. This is because an attacker can first retrieve one fragment of F by compromising a single node, then the attacker simply waits for the other two fragments to be passed through the compromised node. To solve this security problem, Zanin et al. developed a static algorithm to decide whether a particular storage node is authorized to handle a file fragment of F [149]. Zanin?s algorithm can generate an insecure fragment allocation because heterogeneous vulnerabilities are not considered. For example, the three fragments are respectively stored on nodes r 4 , r 8 ,andr 12 , which share the same vulnerability in server group T 4 (see Fig. 3.4). Rather than three attacks, one successful attack against server group T 4 allows unauthorized users to access the three fragments of file F.Two other insecure fragment allocations are: (1) allocating f a , f b , f c to nodes r 1 , r 5 ,andr 9 , respectively; and (2) allocating f a , f b , f c to nodes r 7 , r 11 and r 15 , respectively. These three fragmentallocationdecisionsareunacceptable, becausethefragmentsareassignedtoagroup of storage nodes with the same vulnerability, meaning that an attacker who comprised one node within a group can easily compromise the other nodes in the group. The attacker can reconstruct F from f a , f b ,andf c stored on the comprised server group. 3.2.3 Design of the S-FAS Scheme Tosolvetheabovesecurityproblem, wehavetoincorporatevulnerability heterogeneities into fragment allocation schemes. Specifically, we design a simple yet e?cient approach to allocating fragments of a file to storage nodes with various vulnerabilities. Since allocating fragments of a file into di?erent storage clusters can degrade performance, our S-FAS scheme attempts to allocate fragments to storage nodes within a cluster. If the number of nodes 28 Server Set1 That Handles Fragment Server Set2 That Handles Fragment Server Set3 That Handles Fragment 1 1374 1610 a f b f 5 118 142 c f 93 15126 Figure 3.4: Possible insecure file fragment allocation decision made using a hashing function (see Eq. 11 in [82]): Server set 1 handles fragment f a , server set 2 handles fragment f b , and server set 3 handles fragment f c . Server set 1 contains storage nodes r 1 , r 4 , r 7 , r 1 0, r 1 3, and r 1 6; server set 2 contains storage nodes r 2 , r 5 , r 8 , r 11 ,andr 14 ; and server set 3 contains storage nodes r 3 , r 6 , r 9 , r 12 ,andr 15 . It is possible that fragments f a , f b ,andf c may be allocated to storage nodes that belong to the same server-type group. For example, the three fragments are respectively stored on nodes r 4 , r 8 ,andr 12 , which share the same vulnerability in server group T 4 . Rather than three attacks, one successful attack against server group T 4 allows unauthorized users to access the three fragments of file F. 29 with di?erent vulnerabilities cannot meet the aforementioned criterion, file fragments must be allocated across multiple clusters. To improve the assurance of a distributed storage system while maintaining high I/O performance, each cluster storage subsystem has to be built with high vulnerability heterogeneity. This causes the fragments of a file to be less likely distributed across multiple storage clusters. Because of the following two reasons, the S-FAS scheme can significantly improve data security when fragments are stored in a large-scale distributed storage system. First, S- FAS integrates the fragmentation technique with secret sharing. Second, S-FAS addresses the issue of heterogeneous vulnerabilities when file fragments are allocated to a distributed storage system. The S-FAS scheme makes fragment allocation decisions by following the four policies below: ? Policy 1: Allthestoragenodesinadistributedstoragesystemareclassified intomulti- ple server-type groups (server group for short) based upon their various vulnerabilities. Each server group consists of storage nodes with the same vulnerability level. ? Policy 2: To improve security of a distributed storage system, S-FAS allocates frag- ments ofa filetostoragenodesbelonging to asmanydi?erent server groupsaspossible. In doing so, it is impossible to compromise the file?s fragments using a single successful attack method. ? Policy 3: The fragments of a file are trying to be allocated to nodes with a wide range of vulnerability levels all within a single cluster storage subsystem. The goal of this policy is to improve performance of the storage system by making the fragments less likely to be distributed across multiple clusters. ? Policy 4: The (m,n) secret sharing scheme is integrated with the S-FAS allocation mechanism. 30 If a file?s fragment-allocation decisions are guided by the above four policies, successful attacks against less than m server groups have little chance to gain unauthorized accesses of files stored in a distributed system. In other words, if the number of compromised fragments of a file is less than m, attackers are unable to reconstruct the file from the fragments that are accessed by the unauthorized attackers. The S-FAS scheme can improve information assurance of files stored in a distributed storage system without enhancing confidentiality services deployed in cluster storage subsystems of the distributed system, because S-FAS is orthogonal to security mechanisms that provide confidentiality for each server group in a dis- tributed storage system. Thus, S-FAS can be seamlessly integrated with any confidentiality service employed in distributed storage systems in order to o?er enhanced security services. 3.3 Static and Dynamic Assurance Models We developed assurance models to quantitatively evaluate the security of a heteroge- neous distributed storage system in which S-FAS handles fragment allocations. 3.3.1 Static Storage Assurance Model For encrypted files, their encryption keys are partitioned and allocated using the same strategy that handle file fragments. Once a storage node in set U is compromised, file fragments andencryption key fragmentsstored onthe nodearebothbreached. If a malicious userwantstocrackafile,atleastm nodes within U must be successfully attacked. We first investigate the probability that a file is compromised using one attack method. Let X be the event that a set of storage nodes is chosen to be attacked. Let Y be the event that if X occurs, at least m fragments can be compromised using the same attack method. As we already defined, event Z represents a successful attack to a certain fragment of a file. Applying the multiplication principle, we calculate the probability that V -aneventthat 31 file F is compromised under one attack - occurs as: P(V) = k summationdisplay j=1 P(X)P(Y)P(Z) (3.2) where P(X), P(Y)andP(Z) are probabilities that events X, Y and Z occur when the total number of di?erent server-type groups (server group for short) is K. The probability P(V) is proportional to probability P(Z), which largely depends on the quality of security mechanisms deployed in the storage system, as well as the attacking skills of hackers. Note that when k equals 1, there is no vulnerability di?erence among storage nodes. Supposing that all the fragments of a file can be compromised using one successful attack method, the probability that Y occurs becomes 1. Then, we can express P(V)as: P(V) = k summationdisplay j=1 P(X)P(Z) (3.3) Let S j be the number of storage nodes in server type T j set and N be the total number of nodes in a distributed system. The probability that nodes in set T j are randomly attacked can be derived as P(X) = S j N . Probability P(Y) in Eq. 3.2 can be calculated as follows: P(Y)= n summationdisplay i=m C i S j C n?i N?S j C n N ,(j =2,...K) (3.4) where C n N isthe totalnumber of possibilities ofallocating fragments of a file, and the product of C i S j and C n?i N?S j is the number of possibilities that a file is compromised using a successful attack method which means at least m (It may be m+1,m+2,..., n) fragments of the file are compromised. To simplify the model, one may assume that security mechanisms and attacking skills have nosignificant impactsoninformation assuranceoftheentire distributed storagesystem. This assumption is reasonable because of two factors. First, S-FASis independent of security 32 mechanisms that provide confidentiality for server groups in a distributed storage system. Second, if empirical studies can provide values for probability P(Z), the probability P(V) can be derived from P(Z) and the model (see Eq. 3.4) that calculates P(Y). Since the study of the distribution of P(Z) is not within the range of this work, in this paper the impact of probability P(Z)onP(V) is ignored by setting the value of P(Z)to1. Now we can derive Eq. 3.2 from Eq. 3.4 as below: P(V) = K summationdisplay j=1 parenleftBigg S j N P(Z) n summationdisplay i=m C i S j C n?i N?S j C n N parenrightBigg (3.5) The confidentiality of file F is assured if F is not compromised. Thus, we can derive the assurance SA(?) of the storage system from Eq. 3.5 as: SA(?)=1?P(V) =1? K summationtext j=1 parenleftbigg S j N P(Z) n summationtext i=m C i S j C n?i N?S j C n N parenrightbigg (3.6) 3.3.2 Dynamic Assurance Model. During read and write operations, some fragments of a file may be transmitted among di?erent storageclusters orsubnetworks. Weassume thatdatatransmissions within acluster are secure, while connections among clusters and subnetworks may be insecure. Let P L be the probability that a fragment isintercepted during its transmission on an insecure link. We consider acommoncaseinwhichsomefragmentsoffileF areallocatedoutsideacluster. The probability P D that a fragment of F is intercepted during its transmission can be expressed as: P D = ? 1 ? 2 P L +? 3 [1?P L ]P L (3.7) where ? 1 = 1 indicates that connections among storage clusters are insecure and ? 1 =0 means the connections are secure. ? 2 = 1 indicates that fragments are transferred among 33 di?erent clusters, otherwise ? 2 = 0. Similarly, ? 3 = 1 means that fragments are transmit- ted across di?erent subnetworks. When ? 1 , ? 2 ,and? 3 equal to 0, there is no fragment transmission risk. If q fragments need to be collected outside a cluster processing read/write operations, then probability P q (g) that g out of q fragments are intercepted can be expressed as: P q (g) = C g q P D g (1?P D ) q?g (3.8) Now we model the dynamic assurance of an allocation mapping ? of file F. For simplic- ity, let us focus on a time period during which there is only one attempt to attack storage nodes where F is stored. During this time period, we assume that only one read or write operation is issued to access F. There are two cases where file F can be compromised. First, a malicious user can reconstruct F from m compromised fragments using the same attack method. Second, although less than m fragments are compromised, other g fragments are intercepted during their transmissions. Hence, we can derive the dynamic assurance DA(?) from the storage risk (see Eq. 3.5) and the transmission risk (see Eq. 3.8), as shown here: DA(?)=1? parenleftBigg P (V)+ parenleftBigg q summationtext g=(m?i) P q (g) parenrightBigg K summationtext j=1 parenleftbigg S j N ? m?1 summationtext i=0 C i S j C n?i N?S j C n N parenrightbigg parenrightBigg (3.9) 3.4 Evaluation of System Assurance The assurance models described in Section 3.3 indicate that system assurance is a?ected by the number K of storage types, the number N of storage nodes in the system, and the number S j of nodes in the jth storage type. In addition, threshold m and the number n of fragments in a file also have an impact on system assurance. Now, we quantitatively evaluate the impacts of these factors on the information assurance of distributed storage systems. We first obtain a comprehensive evaluation of S-FAS in terms of data storage assurance and 34 1 2 3 4 5 6 7 8 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 threshold m storage assurance The impact of the number of server types K on security assurance N=60, K=1 N=60, K=4 N=60, K=5 N=60, K=6 Figure 3.5: Heterogeneous system and homogeneous system using secret sharing scheme. In all the four test cases, N is set to 60. K is set to 1, 4, 5, and 6, respectively. When K is 1, there is only one server group in the system. dynamic assurance of S-FAS. We compare our approach with a traditional fragment alloca- tion scheme that does not consider vulnerability heterogeneities. We evaluated a distributed storage system with the threshold value m. The default number n of fragments of a file is set to 12 and S j = N K for all j from 1 to K. 3.4.1 Impact of Heterogeneity on Storage Assurance If all storage nodes in the evaluated distributed system are identical in terms of vul- nerability, the probability that fragments of a file can be compromised using one successful attack method is 1. Fig. 3.5 shows the impact of the number K of storage types on system assurance. Results plotted in Fig. 3.5 suggest that for a distributed system with homoge- neous vulnerability, threshold m has no impact on system assurance. When it comes to a 35 45 50 55 60 65 70 0.4 0.5 0.6 0.7 0.8 0.9 1 the size of the system N storage assurance The impact of the size of the system N on storage assurance k=3, n=12, m=4 k=3, n=12, m=5 k=3, n=12, m=6 k=3, n=12, m=7 k=3, n=12, m=8 Figure 3.6: The impact of the system size N on storage assurance. distributed system with heterogeneous vulnerabilities, the system assurance increases signif- icantly with the increasing values of K and threshold m (see Fig. 3.5). Such a trend implies that a high heterogeneity level of vulnerability gives rise to high confidentiality assurance. 3.4.2 Impact of System Size on Storage Assurance To quantify the impact of system size N on data assurance of a file stored in the system, we gradually increase system size from 45 to 70 by increments of 5. We keep k at 3andalsovarym from 4 to 8. Fig. 3.6 reveals that the storage assurance of the system is not very sensitive to the system size, indicating that storage assurance largely depends on the vulnerability heterogeneity level rather than system size. Thus, large-scale distributed storagesystemswithlowlevelsofvulnerabilityheterogeneitiesmaynothavehigherassurance than small-scale distributed systems. These results suggest that one way to improve system assurance is to increase vulnerability heterogeneity while increasing the scale of a distributed 36 12 13 14 15 16 17 18 0.4 0.5 0.6 0.7 0.8 0.9 1 number of each server type?assume each type has the same number of servers s t orage assurance The impact of the size of each type Sj on assurance. k=3, n=12, m=4 k=3, n=12, m=5 k=3, n=12, m=6 k=3, n=12, m=7 Figure 3.7: The impact of server-group size on data storage assurance. The server-group size means the number of storage nodes in a server-type group. Note that the storage nodes within a server group share the same level of vulnerability. The server-group size varies from 12 to 18 with an increment of 1. storage system. A high heterogeneity level in vulnerability helps in increasing threshold m, making it harder for attackers to compromise multiple server groups and reconstruct files. 3.4.3 Impact of Size of Server Groups on Storage Assurance Fig. 3.7 illustrates the impact of server-group size on data storage assurance. Note that the server-group size is the number of storage nodes in a server-type group, in which all the storage nodes share the same level of vulnerability. We vary the server-group size from 12 to 18 with an increment of 1. We observe from Fig. 3.7 that when threshold m is small (e.g, m = 4), the assurance of systems with large server-group sizes is slightly higher than that of systems with small server-group sizes. Interestingly, the opposite is true when the threshold m is large (e.g, m>4). Given a fixed number of storage nodes in a distributed storage 37 11 12 13 14 15 16 17 18 19 20 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 the number of fragments n of a file in the used scheme storage assurance Storage Assurance of System with Different Scheme number n k=3, N=75, m=4 k=3, N=75, m=5 k=3, N=75, m=6 k=3, N=75, m=7 Figure 3.8: The impact of the number n of fragments of a file on storage assurance. The number n of fragments increases from 11 to 20. The parameters k and N are set to 3 and 75, respectively. system, increasing the server-group size can decrease the number of server groups, which in turn tends to reduce vulnerability heterogeneity. The results shown in Fig. 3.7 match the results in the previous experiments in which a low level of vulnerability heterogeneity (or larger server-group sizes ) results in degraded storage assurance. 3.4.4 Impact of Number n of File Fragments on Storage Assurance Fig.3.8illustratestheimpactofthenumber noffragmentsofafileonstorageassurance. In this experiment, we increase the number n of fragments from 11 to 20 and measured data storageassurance using our model. The parameters k andN areset to3 and 75, respectively. We also vary threshold m from 4 to 7. Results depicted in Fig. 3.8 confirm that the system assurance is reduced with the increasing value of fragment number n. The results indicate that a large number of file fragments leads to low data storage assurance of the file. This 38 assurance trend is reasonable because more fragments are likely to be allocated to storage nodes with the same vulnerability. If one storage node is compromised by an attacker, fragments stored on nodes with the same vulnerability can also be collected by the attacker, who is more likely to be able to reconstruct the file from the disclosed fragments. In addition, Fig. 3.8 shows that increasing the value of threshold m can improve storage assurance. This pattern is consistent with the results obtained in the previous experiments. 3.4.5 Impact of Threshold m on Storage Assurance Figs.3.5-3.8clearlyshowtheimpactofthresholdmonstorageassuranceofadistributed system. More specifically, regardless of other system parameters, the storage assurance always goes up with the increasing threshold value m. The results indicate that the more fragments an attacker needs in order to reconstruct a file, the higher data storage assurance can be preserved for the file in distributed storage systems. These results suggest that to improve data storageassurance of a file, oneneeds to partitionthefile andallocatefragments insuchaway thatanattacker must compromise moreserver groups(thebest case ismserver groups) in order to reconstruct the file. 3.4.6 Impact of P L on Dynamic Assurance Now we are in a position to evaluate dynamic assurance of distributed storage systems. The three parameters ? 1 , ? 2 ,and? 3 in Eq. 3.7 have an important impact on dynamic assur- ance because these parameters indicate whether there is risk during fragment transmissions. Please refer to Sections 6.5.1 to 3.4.5 for details on the impacts of a set of parameters on data storage assurance. P L - the probability that a fragment might be intercepted by an attacker during the fragment?s transmission through an insecure link - has a noticeable impact on dynamic assurance of a distributed storage system provided that threshold m is small (e.g., smaller than 9). Fig. 3.9 shows the dynamic assurance of a distributed system when P L is varied 39 0 1 2 3 4 5 6 7 8 x 10 ?3 0.99 0.992 0.994 0.996 0.998 1 1.002 PL??probability of a fragment is intercepted in network dynamic assurance Dynamic Assurance with Different Network Security Level N=72, k=4, Sj=18, n=12, q=4, m=7 N=72, k=4, Sj=18, n=12, q=4, m=8 N=72, k=4, Sj=18, n=12, q=4, m=9 N=72, k=4, Sj=18, n=12, q=4, m=10 Figure 3.9: Impact of P L - the probability that a fragment might be intercepted by an attacker during the fragment?s transmission through an insecure link. P L is varied from 0 to 8?10 ?3 by increments of 1?10 ?3 . Threshold m is varied from 7 to 10 40 0 1 2 3 4 5 6 0.99 0.992 0.994 0.996 0.998 1 1.002 q??the number of fragments transmitted across different storage clusters to serve a request dynamic assurance The Impact of q on Dynamic Assurance N=72, k=4, Sj=18, n=12, Pl=0.004, m=7 N=72, k=4, Sj=18, n=12, Pl=0.004, m=8 N=72, k=4, Sj=18, n=12, Pl=0.004, m=9 N=72, k=4, Sj=18, n=12, Pl=0.004, m=10 Figure 3.10: Impact of q -thenumberq of fragments transmitted to and from a storage cluster. q is chosen from 0 to 6 with an increment of 1. Threshold m is set from 7 to 10) from 0 to 8?10 ?3 by increments of 1?10 ?3 . We also vary threshold m (i.e., m is varied from 7 to 10) to evaluate the sensitivity of dynamic assurance on parameter P L under di?erent threshold m. Fig. 3.9 demonstratively confirms that when threshold m is equal to or smaller than 8, a large value of P L results in low dynamic assurance of the system. The results are expected since a high value of P L means that the transmitted fragments are likely to be intercepted by an attacker. Once the attacker has collected enough fragments of a security-sensitive file, the file could be reconstructed. When threshold m is larger than 8, the dynamic assurance is not noticeably sensitive to the probability P L that a fragment is compromised during its network transfer. 41 3.4.7 Impact of q on Dynamic Assurance Like parameter P L ,thenumberq of fragments transmitted to and from a storage cluster also has an impact on the dynamic assurance of a distributed storage system. Intuitively, Fig. 3.10 shows that when the number of fragments of a file that must be transmitted through insecure links is increasing, the dynamic assurance of the file drops. Interestingly, when threshold m is larger than 8, the dynamic assurance becomes very insensitive to the number q of fragments. This observation suggests that when the threshold is small, the S-FAS fragment allocation scheme must pay particular attention to lower the value of q in order to maintain a high dynamic assurance level. In addition, we observe from Fig. 3.10 that dynamic assurance is always lower than the corresponding storage assurance (where q=0 in Fig. 3.10). This trend is always true because in a dynamic environment, file fragments have to be transmitted through insecure network links where malicious users may intercept the fragments in order to reconstruct files. 3.5 Chapter Summary It is critical to maintain theconfidentiality of files stored in adistributed storagesystem, even when some storage nodes in the system are compromised by attackers. In recognizing thatstoragenodesinadistributed systemhaveheterogeneousvulnerabilities, weinvestigated a secure fragment allocation scheme by incorporating secret sharing and heterogeneous vul- nerability to improve security of distributed storage systems. We addressed the security heterogeneity issue by categorizing storage servers into di?er- ent server-type groups (or server group for short), each of which represents a level of security vulnerability. With heterogeneous vulnerabilities in place, we developed a fragment alloca- tion scheme called S-FAS to improve security of a heterogeneous distributed system. S-FAS allocates fragments of a file in such a way that even if attackers compromised a number of server groups but fewer than k fragments are disclosed, the file cannot be reconstructed from the compromised fragments. 42 To evaluate the S-FAS scheme, we built the static and dynamic assurance models in order to quantify the assurance of a heterogeneous distributed storage system processing file fragments. We developed a SAP file allocation algorithm based on the analysis of the assurance model as well as the proposed S-FASscheme. In order to measure the performance of the S-FAS scheme and the algorithm, we built a prototype in a real-world distributed storage system. We demonstrated how S-FAS incorporates the vulnerability heterogeneity feature into file fragment allocation for distributed storage systems. Experimental results show that increasing heterogeneity levels can improve file assurance in a distributed storage system. The experimental results of our prototype implementation o?er us inspiration on how to use S-FAS to e?ciently improve security and performance in distributed storage systems with heterogeneous vulnerabilities. There are three future research directions of this study. First, we will make an e?ort to improve the performance of the SFAS fragment allocation scheme in a heterogeneous distributed system. Second, we will integrate the data replication technique with S-FAS to enhance reliability and performance of the fragment allocation scheme for distributed systems. Third, we will implement a distributed storage system prototype where S-FAS is deployed. In this prototype, we will evaluate performance of S-FAS in a real-world system. 43 Chapter 4 Secure Allocation Processing (SAP) Algorithm for S-FAS In previous chapter we present a fragment allocation scheme called S-FAS to improve security of a distributed system where storage sites have a wide variety of vulnerabilities. However the S-FAS scheme mainly focus on security considerations, the heterogeneous fea- tures can also be leveraged to improve performance. In thisChapter wedevelop a secure allocatingprocessing (SAP) algorithmfortheS-FAS scheme to improve the security level and consider its performance using the heterogeneous feature of a large distributed system. The SAP allocation algorithm considers load balanc- ing, delayed e?ects caused by the workload variance of many consecutive requests, and the heterogeneous feature of the storage nodes in the system. The rest of the chapter is organized as follows: We discuss in Section 4.1 the motivation to develop the SAP algorithm. Section 4.2 describes the factors that a?ect performance and security in distributed systems. Section 4.3describes astaticallocationalgorithmintegrated with the S-FAS scheme. In Section 4.4, a sample allocating process is illustrated. Section 5.4 summarizes this chapter and outlines some future work directions. 4.1 Motivation for the Secure Allocation Processing (SAP) Algorithm Therearetradeo?samongdesired features(e.g., security andperformance)ofdistributed storage systems. Most existing solutions improve the security of the systems at the cost of system performance. However, both high security and performance are among the top client desired features in many widely deployed data centers operated by Google, Amazon and Yahoo, etc.. The exploration and development of solutions that can improve not only system security, but also system performance is in high demand. 44 In our previous research [124] we proposed a scheme, S-FAS, to address security het- erogeneity issues by dividing storage servers into di?erent server groups. We focus on the development of a file fragmentation and allocation approach to improving the assurance and scalability of a heterogeneous distributed system. If one or morefragments of a file have been compromised, it is still very hard for a malicious user to reconstruct the file fromthe compro- mised fragments. Our solution is di?erent from previous approaches, because ours captures heterogeneous features regarding to vulnerabilities among servers. In a server group, storage servers with the same vulnerability share the same weakness allowing attackers to reduce the servers? information assurance. Although it may be impractical to classify all servers in a system into a large number of groups, a reasonable way of identifying server types is to organize servers with similar vulnerabilities into one group. We built static and dynamic assurance models to evaluate the security level of our S-FAS scheme. Analysis results show that the system assurance level can be improved by the S-FAS scheme compared with the traditional methods without the consideration of heterogeneous features. Thetimecosttoreconstruct afilefromitsfragmentsisobviouslygreaterthanthatofthe non-fragmentation storage method; and performance degradation becomes inevitable [129]. Inthisresearchweinvestigateimpactofsecurefragmentationschemeonsystemperformance. Similarresearch workonfileassignment canbefoundintheliterature[71][30]. Thesestudies showed that not only the queuing cost, and file heat of the file (represents the product of file access rate and the access service time) [30] [108], but also the variance of service time can influence theperformanceofhomogeneous systems where each site hasthesameperformance characteristics. Inourinvestigated heterogeneousdistributed systems, thevarianceinservice time caused by heterogeneous servers with di?erent performance characteristics has a large influence on the overall system performance. We investigate the possible parameters that influence both the system performance and the proposed S-FAS scheme. There are three aspects to a distributed storage system that influence its security and performance, workload, storage nodes and network interconnects. 45 There are di?erent elements of each aspect of the system. We extracted the possible key elements of each aspect by analyzing of the proposed S-FAS scheme. Based on the both performance and security analysis, we developed a secure allocat- ing processing (SAP) algorithm for the S-FAS scheme to both improve the security level and consider system performance by using the heterogeneous feature of large distributed systems. The SAP allocation algorithm considers load balancing, delays caused by the workload variance of many consecutive requests, and the heterogeneous nature of storage nodes in a system. We developed a prototype of S-FAS using the multi-threading technique to implement the SAP algorithm to guide file allocations. The experiment results show that the proposed security solution can not only improve security, but also improve the throughput of a distributed storage system with heterogeneous vulnerabilities by virtue of the multi-threading. In this study, we made the following contributions: ? First, we develop a secure allocating processing (SAP) algorithm to improve security andsystem performance byconsidering theheterogeneous featureof a largedistributed system. ? Second, in order to conduct the performance analysis for the S-FAS scheme and SAP allocating algorithm, we developed a prototype for our S-FAS scheme. ? Third, weimplemented theprototypeandconducted someexperiments onthethrough- put of the proposed scheme and algorithm. ? Fourth, we evaluated our solution by implementing some real world traces. 4.2 Factors A?ecting Performance and Security There are three main sources a?ecting the processing delay of a request from a client: workload, storage nodes, and network interconnects. 46 Before presenting details on the SAP algorithm, let us summarize all notations used throughout this section in Table 4.1. Table 4.1: Notation used in the SAP Algorithm. Notations Meaning S the size of file F u server type in the system S i the size of server type i in a cluster |F| the total number of files |F i | the total number of fragments in file |F i | Fij the jth fragment of the ith file Burden ij the work load that the jth fragment of the ith file brings to the node where it is stored DataSize ij the jth fragment size of the ith file b N Bandwidth of the connected network of the Nth node ? ij the access rate of the jth fragment of ith file N uv the vth node of server type u u,v= 0,1... I ij Decreasing sorted list of fragments to be allocated (Fragments of the same file are consecutive and belong to the same row) TLoad(N uv ) total available load of node N uv CLoad(N uv ) current load of node N uv RLoad(N uv ) N number of element vector recording the total available storage capacity for all nodes CurrentN[u] thecurrentavailablenodeinthe uth type of servers LoadBN uv the most load that shoud be assigned to node N uv Concerning the workload, the service time, S i ,foreachfileF i , and the access rate of this file, A i , are the two main characteristics that a?ect the performance on the file. These two characteristics are usually combined in a joint metric called ?heat? to evaluate the active rate of a file [68]. The service time is directly influenced by file size and two parameters-m and n-in the threshold (m, n). Since we consider the performance of heterogeneous systems, 47 service time S i for file F i is not fixed, because di?erent allocation nodes are chosen to store the fragments of the file. In order to fully benefit from the performance capabilities of a homogeneous distributed or parallel system, one has to insure that the load must be uniformly distributed among all storage nodes. Otherwise, some disks may become performance bottleneck, severely increasing the response time of requests, and reducing the overall system throughput [68]. As such reducing the time cost on network and slave nodes may be achieved by minimizing the utilization of each node and by minimizing the variance of service times among all the nodes. Most of the current published work on this topic concentrated on minimizing the nodes utilization by balancing load across all disks while ignoring the minimization of the varianceofservicetimeinthecontext ofheterogeneousdistributed systems. Whenfragments of a wide variety of sizes are intermixed on each node, small fragment requests are likely to wait for large fragment requests accesses that were queued ahead of the smaller ones. This is ine?cient, especially when the load is heavy and the queuing delays dominate the response time. Thus, in addition to load balancing, the performance of a distributed system can be improved by reducing the variance of service times among fragment storage nodes. System network interconnects also a?ect performance of heterogeneous distributed systems. Thus, our file fragment allocation algorithm has to take the network delays into account. Experimental results described in section show that diversity of a system with heteroge- neous features can possibly increase the system security. To make use of the heterogeneous nature of vulnerabilities in storage nodes, the SAP algorithm should allocate the fragments of a file to as many types of storage nodes as possible to improve the system security. While considering the impact of di?erent bandwidths of networks, we define the coe?- cient w N (See weight in Eq. 4.1). Equation 4.1 is used to determine load to be distributed to the Nth nodeconcerning thenetwork speed variance of di?erent storagenode. This equation helps to guarantee that storage nodes connected by fast networks handle more I/O loads. 48 w N = b N summationtext N i=1 b N (4.1) Based on workload analysis, we use burden(see Eq. 6.7 below) to evaluate the workload that a fragment brings to the storage node that stores the fragment. Workload ij = ? ij ?DataSize ij (4.2) In order to balance the loads imposed on di?erent nodes in the system, we distribute the workload expressed by Eq. 6.8 where a number of fragments of multiple files are stored in node N. WorkloadN = w N ? i=d summationdisplay i=1 j=|F i | summationdisplay j=1 ? ij ?DataSize ij (4.3) 4.3 SAP Allocation Algorithm We developed a static allocation algorithm integrated with the S-FAS scheme. we focus on the static algorithm rather than its dynamic counterpart, because we plan to consider the movement of data in our future work. Initially, all nodes within the same server type are listed in a decreasing order of pro- cessing speed under a certain workload. All files are sorted in list I in a descending order of fragment sizes. Then, the nodes are allocated to the file fragments in order to optimize both performance and security. Fig. 4.2) shows the data flow of the SAP algorithm. The algorithm includes the partitioning and sorting part(see Fig. 4.1), and fragment allocation steps(see Algorithm 1). 49 Inputs Output Divide each file into n fragments for all the |F| files Compute the burden for all fragments Sort fragments in decreasing order based on their burdens (Fragments of the same file are consecutive and belong to the same row) Decreasing sorted two-dimension list Iij of fragments based on their burdens of the input files 1. |F| number of files 2. Secret sharing scheme (m, n) Figure4.1: The SAP fragment Partition and Sorting Step. The chosen secret sharing scheme (m, n) is employed. 50 Start Decreasing Sorted Two- Dimension Fragment List Server Configuration File SAP- Check for the Available Server for Fragment Distribute to the Chosen Server Yes No Break No Is This the Last Fragment in List ? Yes End ij I ij I ij I ij I Figure 4.2: The SAP Data Flow. 51 Algorithm 1 SAP Allocating Process Step: Input: I ij CLoad (N uv ) TLoad(N uv ) Step1. Sort all nodes in a two-dimension list in decreasing order based on their vulnera- bility type and processing speed. Step2. Allocate each fragment in list I to servers i=1 for u =0andu= y, we have the following further reasoning and expression of T(x,y) (we do not give the general formula when t