A Space-Time Separated and Jointly Evolving Relationship-Based Network Access and Data Protection System with NP- complete Defenses by David Charles Last A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy Auburn, Alabama May 5, 2013 Keywords: computer, network, security Copyright 2013 by David Charles Last Approved by Chwan-Hwa "John" Wu, Chair, Professor of Electrical Engineering John A. Hamilton, Jr., Professor of Computer Science and Software Engineering Shiwen Mao, Professor of Electrical Engineering Darrel Hankerson, Professor of Mathematics Xiao Qin, Professor of Computer Science and Software Engineering ii Abstract Attacks on information networks have been increasing in frequency and success in recent years. Attack methods are becoming increasingly sophisticated, and network defense systems have not kept pace. IDS and IPS systems utilizing signature- and statistics-based methods are not agile enough for today's environment. This paper presents an alternative solution; the Intrusion-resilient, Denial-of-Service resistant, Agent-assisted Cybersecurity system (IDACS). IDACS utilizes the concept of a space-time separated and jointly-evolving relationship to provide network defenses that can defend against zero- day and metamorphic attacks. IDACS provides network security in three key areas: attack detection and prevention, digital forensics to identify the origin of the attack, and deep protection of at-rest encrypted data in case of a successful network breach. IDACS combines these three aspects into a complex space-time relationship that provides mutual reinforcement between these aspects. A mathemtical analysis of IDACS reveals that several facets of its network defense are NP-complete, presenting a potential attacker with an incredibly complex problem to solve. Multiple simulations of a fielded IDACS system demonstrate the high attack detection rate, network traitor identification rate, and data protection capabilities provided by this system. iii Acknowledgements I would like to thank my professor, Dr. Chwan-Hwa "John" Wu, for his inspiration, dedication, and encouragement over the course of my PhD studies. Without his help and guidance, I would not be where I am today. I would also like to thank my committee members for their efforts and their advice concerning my research. I would like to thank my wife for her love, support, and patience through this entire process; she has made great sacrifices to support me as I pursue my dreams. I would also lik to thank the Department of Defense SMART scholarship program, which has funded me in this research over the course of my PhD studies. Finally, I would like to thank my Creator God for His blessings of life, health, intelligence, and opportunity that have brought me to this point. iv Table of Contents Abstract .............................................................................................................................................................................................. ii Acknowledgements ........................................................................................................................................................................... iii List Of Tables .................................................................................................................................................................................... vi List Of Figures .................................................................................................................................................................................. vii Table 1. Summary of Notation ......................................................................................................................................................... ix 1 Introduction.............................................................................................................................................................................. 1 2 Related Work........................................................................................................................................................................... 4 3 IDACS Network Description and Security ............................................................................................................................... 5 3.1 Introduction .................................................................................................................................................................... 5 3.2 IDACS Description ......................................................................................................................................................... 5 3.2.1 Definitions for IDACS Elements and Operations ................................................................................................. 5 3.2.2 IDACS System Components ................................................................................................................................ 6 3.2.3 IDACS Client-side Authentication and Access Control ........................................................................................ 8 3.2.4 IDACS Network-side Authentication and Access Control .................................................................................. 14 3.3 Proofs for Theorem 1 and Theorem 2 ......................................................................................................................... 19 3.3.1 Proofs ................................................................................................................................................................. 19 3.3.2 Constraints on NP-completeness ...................................................................................................................... 21 3.3.3 Implications for IDACS ....................................................................................................................................... 23 3.4 Simulations .................................................................................................................................................................. 24 3.4.1 Simulation Setup ................................................................................................................................................ 24 3.4.2 Simulation Parameters ....................................................................................................................................... 26 3.4.3 Attack Detect Rate ............................................................................................................................................. 26 4 IDACS Network Attack Detection, Prevention, and Traceback ............................................................................................. 29 4.1 Introduction .................................................................................................................................................................. 29 4.2 Attack Vectors ............................................................................................................................................................. 29 4.3 Attack Detection and Prevention ................................................................................................................................. 30 4.4 Attack Traceback ......................................................................................................................................................... 33 4.4.1 Log Correlation to Identify Attacking Clients ...................................................................................................... 35 4.4.2 Log Correlation to Identify Traitor Client Botnet ................................................................................................. 37 4.4.3 PID/OTP Correlation to Identify Client-Side Security Items ............................................................................... 39 v 4.4.4 TK Correlation to ID Traitor SA/SSAs ................................................................................................................ 41 4.5 Simulations .................................................................................................................................................................. 42 4.5.1 Attack Traceback Time Simulations ................................................................................................................... 42 4.5.2 Attack Detection, Traceback, and Remediation Simulations ............................................................................. 44 5 IDACS Protection of Encrypted Data .................................................................................................................................... 51 5.1 Introduction .................................................................................................................................................................. 51 5.2 Space-Time Protection of Encrypted Data .................................................................................................................. 51 5.2.1 Separation of Encryption Keys ........................................................................................................................... 51 5.2.2 Separation of Encrypted Data ............................................................................................................................ 52 5.2.3 Data Segmentation ............................................................................................................................................ 56 5.2.4 Mathematical Proofs and Analysis ..................................................................................................................... 61 5.3 Simulations .................................................................................................................................................................. 66 5.3.1 Simulation Parameters ....................................................................................................................................... 66 5.3.2 Controlled vs. Runaway simulations .................................................................................................................. 68 5.3.3 Data Protection Results ..................................................................................................................................... 69 5.3.4 Leakage Detection Results ................................................................................................................................ 70 6 Implementation ...................................................................................................................................................................... 72 7 Conclusion............................................................................................................................................................................. 76 8 References ............................................................................................................................................................................ 77 9 Appendix A ............................................................................................................................................................................ 79 vi List Of Tables Table 1. Summary of Notation ......................................................................................................................................................... ix Table 2. IDACS System Elements ................................................................................................................................................... 7 Table 3. P-valueT for each test ....................................................................................................................................................... 23 Table 4. Average time to for permutation orderings of Seed? to generate OTPs and PIDs at 106 permutations per second ........ 24 Table 5. Traceback functions and what they detect....................................................................................................................... 34 Table 6. Comparison of segmented vs. non-segmented data encryption for file of length x. ........................................................ 58 Table 7. Comparison of performance of segmented vs. non-segmented File Directory Trees containing x data files .................. 60 Table 8. Comparison of security of segmented vs. non-segmented File Directory Trees containing x data files .......................... 60 Table 9. P-valueT for NIST tests for ?matched? ciphertext fragments and ?mismatched? ciphertext fragments ............................. 66 Table 10. Types of Network Attacks Defeated by IDACS .............................................................................................................. 76 vii List Of Figures Fig. 1. CB and door states are used to calculate the matching between challenges and responses .............................................. 2 Fig. 2. Relationships between built-in modules of IDACS ................................................................................................................ 3 Fig. 3. IDACS elements ................................................................................................................................................................... 7 Fig. 4. IDACS Network Access Control top-level view ................................................................................................................... 14 Fig. 5. Procedure to calculate a single OTP? at Cust? ................................................................................................................... 14 Fig. 6. Mutual authentication in authentication chain ..................................................................................................................... 18 Fig. 7. Cascading the run_auth_chain() algorithm ......................................................................................................................... 18 Fig. 8. (a) Directed graph representing seed reassembly problem, and (b) solution to the MWDPSL problem, N=4.................... 21 Fig. 9. (a)Graph representing memory reassembly problem, b = 4 (b)solution to the MWPSL problem, N = 4 (W(e) not shown) . 21 Fig. 10. Graph problem for uniform probability relationship (e.g. true Random Number Generator) .............................................. 22 Fig. 11. Graph problem for highly correlated relationship (e.g. poorly designed hash function) ..................................................... 22 Fig. 12. Proportion of Passing Data Samples for Each Test ........................................................................................................... 23 Fig. 13. Simulation Network ........................................................................................................................................................... 25 Fig. 14. Attack Detect Ratio vs. Network Size and Number of SAs ?fully compromised? ............................................................... 27 Fig. 15. Attack Detect Ratio vs. Network Size and Number of SA/SSAs ?fully compromised? ...................................................... 27 Fig. 16. Direct Attack ...................................................................................................................................................................... 30 Fig. 17. Chained Botnet Attack ...................................................................................................................................................... 30 Fig. 18. Illustration of Claim 1 ........................................................................................................................................................ 31 Fig. 19. Illustration of Claim 2 ........................................................................................................................................................ 31 Fig. 20. Illustration of Claim 3 ........................................................................................................................................................ 32 Fig. 21. Block diagram of "report_and_trace_attack()" .................................................................................................................. 35 Fig. 22. Attack packet traceback to identify root attacker .............................................................................................................. 36 Fig. 23. Traitor Client botnet detection using Security Log correlation ........................................................................................... 38 Fig. 24. Space-separated combinations of seeds used to calculate PID? ...................................................................................... 40 Fig. 25. Mutual Authentication in authentication chain .................................................................................................................... 42 Fig. 26. Cascading the run_auth_chain() algorithm ....................................................................................................................... 42 Fig. 27. Attack Traceback Time ..................................................................................................................................................... 43 Fig. 28. Attack Traceback Time, 1 SSA vs. 2 SSAs ...................................................................................................................... 44 Fig. 29. Botnet Detection Time, 1 SSA vs. 2 SSAs ........................................................................................................................ 44 Fig. 30. Percentage of IDACS Network Machines controlled by Attacker over the Scenario 1 simulation .................................... 48 Fig. 31. Percentage of IDACS Network Machines controlled by Attacker over the Scenario 2 simulation .................................... 48 Fig. 32. Percentage of IDACS Network Machines controlled by Attacker over the Scenario 3 simulation .................................... 49 Fig. 33. Percentage of IDACS Customers controlled by Attacker over the simulation .................................................................. 50 Fig. 34. Average Number of Successful Illegal Datacenter Access Attempts over the Simulation Time Period ........................... 50 Fig. 35. Xbits removed from the cryptographic keys ...................................................................................................................... 52 Fig. 36. Xslices removed from Client-side ciphertext and stored in IDACS datacenter ................................................................. 53 Fig. 37. Multiple layers of encryption ............................................................................................................................................. 53 Fig. 38. Using F-box(Offset) and F-box(XLth) transforms to remove Xslices ................................................................................ 54 Fig. 39. Data segmentation for a single file .................................................................................................................................... 57 Fig. 40. Encrypted File Directory Tree segmented into zones ....................................................................................................... 59 Fig. 41. Retrieving a single data file from the File Directory Tree .................................................................................................. 59 Fig. 42. Xslice/Data Block splitting problem ................................................................................................................................... 62 Fig. 43. Xbits splitting problem ....................................................................................................................................................... 62 Fig. 44. Splitting problem represented in terms of graph theory. ................................................................................................... 62 Fig. 45. Maximum Weight Path. ..................................................................................................................................................... 62 Fig. 46. Graph with uniform edge weight distribution ..................................................................................................................... 63 Fig. 47. Graph with non-uniform edge weight distribution ............................................................................................................... 63 Fig. 48. Proportion of passing NIST tests ....................................................................................................................................... 65 Fig. 49. Simulation File Directory Tree setup ................................................................................................................................. 67 viii Fig. 50. Percentage of Active SAs/SSAs that are Traitors in IDACS, Contained vs. Runaway Botnet .......................................... 68 Fig. 51. Percentage of File Directory Tree stolen over simulation period ...................................................................................... 69 Fig. 52. Average percentage of Data File Segments stolen for a Contained Botnet ..................................................................... 70 Fig. 53. Percentage of Data File Segments stolen for a Single Runaway Botnet .......................................................................... 70 Fig. 54. Histogram for when Data File Segments were stolen vs. when they were detected Stolen; Contained Botnet ............... 71 Fig. 55. Histogram for when Data File Segments were stolen vs. when they were detected stolen; Runaway Botnet ................. 71 Fig. 56. Percentage of all Clients ever turned Traitor that are detected, Contained vs. Runaway Botnet ...................................... 71 Fig. 57. IDACS Implementation Tested Setup ............................................................................................................................... 72 Fig. 58. Java CLI implementations. ................................................................................................................................................ 73 Fig. 59. A demonstration of the implementation. ........................................................................................................................... 74 Fig. 60. BlackBerry implementation of IDACS encryption and distributed storage. ....................................................................... 75 Fig. 61. NP-complete reduction path ............................................................................................................................................. 79 ix Table 1. Summary of Notation Symbol Name Type Description SA? Security Agent Location Network-side authentication machine SSA? Super Security Agent Location Network-side authorization machine DB? Database Location Network-side data storage machine User? User Human Human User of the IDACS network Client? Client computer Location Client-side computer (laptop, smartphone, etc.) Badge? User Badge Location Client-side smartcard security badge Pwd? User Password Location Client-side password PIN? Badge PIN State Client-side PIN entered into User Badge UA? User Agent virtual location Small software application downloaded from IDACS Network to Client computer to perform security operations Cust? Customer State Combination of User?, Client?, Badge?, Pwd?, PIN?, and UA?, authorized to access the ID CS Network Seed?, ??????? , ??????? , ??????? ?PIN?, ??????? ?Badge? Seed State Cryptographic seed stored on Client?, Badge?, SA?, or SSA? or derived from Pwd? or PIN? S, SClient?, S ????????? Super-state State Represents a combination of states Ticket? Client Security Ticket State Data structure used to send, authenticate, and authorize a data or service request from Cust? to IDACS Network Req? Merchandise Request State Data request that specifies target data and desired operation OTP?, ???????? One-Time Password (OTP) State Used to authenticate Cust? with all SA? in ???? PID?, ???????? Pseudo-ID (PID) State Used to authorize Cust? and Req? with ???? and ?????? N Authentication Chain Length Lenght of Authentication Chain, i.e. how many SAs and SSAs are in the approach and return authentication chains Key(A, B) Shared cryptographic key State Cryptographic key shared between locations A and B XV1 Xchain value State Cryptographic hash value calculated for authentication between machines in ???? and ?????? TK2 Network Security Ticket State IDACS network message containing Ticket? and XV values \TK2\ Packet record State Log record of the critical attributes of an IDACS network message F-box(Lookup) Lookup transform Transform Based on certain inputs (super-states), returns a particular ordered set of seeds from a location or state F-box(Concat) Concatenate transform Transform Concatenates a set of objects F-box(Hash) Hash transform Transform Performs a cryptographic hash on the inputs F-box(Next) Next-SA-SSA transform Transform Calculates the next SA or SSA in the authentication chain F-box(Insert) Insert Log Record transform Transform Inserts a packet record \TK\ for received network message TK into a location's security logs F-box(Rtrv) Retrieve Log Record transform Transform Retrieves a packet record from a location's security logs based on specified search criteria F-box(Mrand) Random transform Transform Returns a random byte string F-box(Offset) Data Block Offset transform Transform Returns the length of the next Data Block F-box(XLth) Xslice Length transform Transform Returns the length of the next Xslice F-box(SString) Substring transform Transform Returns a substring of the input string F-box(Encrypt) Encrypt transform Transform Encrypts the input data with the input key A?B Notation Data block A is stored at location/state B A ? B: C Notation Location A sends message C to location B ?={E1, E2, ?} Notation Notation indicating a set of objects 1 1 Introduction As the world proceeds farther into the Information Age, the need to protect sensitive information is increasing quickly. Many organizations are storing their information in centralized secure datacenters [1]. However, news stories over the past few years from Google, RSA, PlayStation, and others continue to remind us that information/cyber security technology has not kept pace with the capabilities of attackers. Much of the research to date in improving network defenses has focused on detecting and preventing ?incorrect? network and database access [2] [3] [4] [5] [6], and allowing all other access to proceed through rigorous access control [7] [8] [9] [10] [11]. However, this security technology is unlikely to be able to defend against zero-day attacks and mutating malware [12], since these attacks present new footprints that have not yet been classified as malicious. In order to deal with these emerging threats, the research presented in this paper approaches the problem from a different angle: mathematically define ?correct? network access behavior for protected information and services, and block all other behavior. The mathematically-governed access behaviors provide sufficient complexity to be unpredictable to attackers, but are easily verified by the security system. This design should provide three mathematically-related capabilities; rigorous but fast network access control, efficient real-time forensics capabilities, and further protection of at-rest data in case of a network breach. The mathematical design that provides this level of protection is based on the theory of the Space-Time Separated and Jointly Evolving relationship. This theory calls for space-time evolving relationships between authentication credentials, file/database systems, and protected data in the realms of space and time in order to render the breaking of the access control system mathematically infeasible. Furthermore, this space-time separated and evolving relationship is encoded into network application layer packets, and becomes a means for rapidly tracing attacks back to the source attacker, thus providing real-time forensics capability. The relationship also determines the storage locations of protected data (e.g. in a cloud) and authentication credentials (e.g. on security tokens) in a time-evolving manner so that it becomes infeasible for attackers to decode the dynamic relationships. Hence, three distinct capabilities (or modules) of a security system are described by a single principle of the space- time separated and evolving relationship. A simple example helps explain the space-time separated and jointly evolving relationship concept. Consider a military-style restricted area that uses a challenge-response system to unlock the doors to restricted areas. Any user U must carry an electronic codebook CB which contains a list of challenges and responses. This list is generated from a few cryptographic seeds unique to U as well as the U?s PIN, the state of CB, and the state of the current door (Fig. 1). Each door 2 between restricted areas presents U with a challenge code; U must use CB to locate the corresponding response code to open the door. As soon as U opens the door, the state of the door and the state of CB are pseudo-randomly changed with forward secrecy, resulting in a new challenge-response list if U attempts to re-open this door. Additionally, the cryptographic seeds associated with U residing on CB are changed at regular time intervals (e.g. ?t = 1 minute) with forward secrecy; these changes are propagated to all doors in the system. Each door presents U with multiple challenges, and multiple doors must be opened to access different restricted areas. Additionally, the system maintains logs of the histories of the CB state, the door states, and the challenge-response pairs; if an attacker A attempts to open a door using an older, stolen challenge-response pair, the door system can compare this pair to all previous challenge-response pairs to trace back where and when A stole this pair, thus identifying security breaches in the system. Fig. 1. CB and door states are used to calculate the matching between challenges and responses This paper presents the Intrusion-resilient, Denial-of-Service resistant, Agent-assisted Cybersecurity system (IDACS). IDACS relies on the concept of the space-time separated and jointly evolving relationship to achieve a high level of security in computer and information networks. This is accomplished in three ways. First, the space-time separated and evolving relationship is used as the basis for the IDACS Network Access Control protocol. By using multiple (space-separated) time- evolving items for identifying an information or service access, e.g. file name and user ID, IDACS can efficiently allow legal access and block illegal access to the IDACS network. Second, the mathematical properties of the space-time separated and evolving relationship of the IDACS Network Access Control protocol provide a number of built-in forensics capabilities. Attacks by unauthorized users can be detected, blocked, traced back to the origin of the attack, and analyzed to determine what authentication items have been compromised, all in a very quick and efficient manner using the properties of this relationship. Third, IDACS uses the space-time separated and evolving relationship to protect at-rest encrypted data stored on network- connected devices (in the cloud or on PCs or mobile devices such as tablets or smartphones). IDACS uses jointly space- 3 separated and time-evolving storage to store critical pieces of at-rest ciphertext in the IDACS network so that reassembling and decrypting the ciphertext without access to the distributed pieces spread in the cloud is mathematically infeasible. The space- time separated and evolving relationship aspect of authentication seeds is transparent to legitimate users, but it presents an insurmountable barrier to attackers due to the NP-completeness of generating authentication credentials as well as the encoded file/database systems using space/time-varying IDs, locations, and protections. Additionally, this relationship aspect of authentication seeds and states contributes to the speed of the IDACS forensics capabilities. These three built-in modules of IDACS are all interconnected using the space/time relationship and all provide mutual support for each other, as indicated in Fig. 2. The ?Access Control? module will be discussed in ?3 IDACS Network Description and Security?, the ?Forensics? module will be discussed in ?4 IDACS Network Attack Detection, Prevention, and Traceback?, and the ?Distribution? module will be discussed in ?5 IDACS Protection of Encrypted Data?. Additionally, the current implementation of IDACS will be discussed in ?6 Implementation?. Fig. 2. Relationships between built-in modules of IDACS 4 2 Related Work The research presented in the paper focuses on the detection and prevention of unauthorized network and data access. Due to the current security environment, this area has been the focus of much research. Recent research has especially focused on Denial of Service (DoS) attack detection [13] [14] [15] [16] [17] and botnet attack detection [18] [19] [20]. Until now, Intrusion Detection and Prevention systems have been using two major methods for attack detection: signatures and statistics, which both focus on the characteristics of illegal activity. In today?s threat environment, however, new threats are cropping up at a prodigious rate, and many attacks go undetected by signature-based methods [21]. Researchers have begun talking of a need for a paradigm shift in the field of intrusion detection. The research presented in this paper presents such a paradigm shift; it applies the idea of space-time separated and jointly evolving relationships to mathematically define and permit only "correct" network access. Network authentication and authorization are spread across space (multiple network authentication points) and time (time-evolving authentication credentials) with joint evolution between the space and time parameters in order to make ?correct? network behavior and its history mathematically infeasible for an attacker to reconstruct. While some have previously addressed the idea of space/time relationships [22] [23], this research will use the realms of space and time jointly in previously unconsidered ways. This paper also demonstrates the real-time forensics for attack traceback capabilities and the attack report correlation and aggregation capabilities of the proposed system. While digital forensics [24] [25] [26] [14] and attack report correlation [27] [28] have been discussed in other research, space/time relationships have not been previously leveraged to provide speed and accuracy and avoid ambiguity in the manner presented in this paper. Additionally, the concept of distributed data storage has been addressed at length in prior research. However, it is typically focused on scalability [29] and redundancy for integrity and availability [30] [31]. Little work has been done in the area of distributed storage for security purposes. 5 3 IDACS Network Description and Security 3.1 Introduction With the rise of the Internet and computer networks, network security has become increasingly important. Recent stories in the news have reminded us how vulnerable network-connected computer systems are, and the frightening regularity with which they are breached. Many of these breaches are the result of the exploitation of zero-day and metamorphic attacks, using previously unseen attack vectors, or metamorphic variants of known attacks, to strike at the vulnerable underbellies of networks. There is a rising need for a type of network defense system that can detect and block new and emerging threats. The proposed solution here is to use a space-time separated and jointly evolving relationship to define ?correct? network behavior in a way that is easily verified by the network but mathematically infeasible for an attacker to replicate. This section introduces the Intrusion-resilient, Denial of Service resistant, Agent assisted Cybersecurity System (IDACS), which utilizes this space-time relationship to achieve a high level of network security. The elements and operations of IDACS will be introduced, and the mathematical properties of its defenses will be analyzed and proven. Additionally, simulations will be provide a demonstration of the real-world strength of IDACS against external attacks. 3.2 IDACS Description 3.2.1 Definitions for IDACS Elements and Operations The following definitions and notation are used as the basis for the description of the IDACS network: Definition 1 : A location is a physical device with an associated physical location. The physical device includes memory storage and data processing capabilities. A virtual location is a software object with memory storage and data processing capabilities. A virtual location is capable of residing in different physical locations. Definition 2 : A state represents the PID (Definition 3) and memory contents associated with a piece of data that can change over time. It can also represent the memory contents of a physical location. The relationship between states and locations is detailed in Definition 17. Definition 3 : A location or state may be represented by a permanent, well-protected ID, or by a time-changing Pseudo-ID (PID) which is derived by PID(A) = hash( ID(A), crypto seeds, time-changing sequence number ) PID(A) may also be represented implicitly as A. Specific applications of PIDs are discussed in Definition 21. Definition 4 : A transform is a mathematical operation which accepts a set of states and/or locations as inputs and produces a set of states and/or locations as outputs. In this paper, all transforms are represented by the notation F-box(). In this 6 notation, the parentheses contain a number of parameters which are inputs to the transform. The first parameter defines the actual internal operation of the transform. For example, a transform that computes a cryptographic hash of the inputs would be called F-box(hash), with ?hash? being represented as Hash; the remaining parameters would detail the inputs to the hash function. output = F-box(Hash, input data) Transforms may be combined in a particular order in order to form new transforms. For example, a given transform may involve a lookup (Lookup) followed by a concatenate (Concat) of the outputs of the lookup. Transforms may be combined according to the following notation: output = F-box(Lookup?Concat, input_1, input_2, input_3) Many transforms make changes to their input superstates (e.g. SCust? as discussed in Definition 17), although these changes are abstracted in this notation. Definition 5 : Some variables are a function of other variables; that is, if the value of variable A is a function of the values of variable B and time t, then the value of A depends on the value of B at time t. This relationship is represented by the notation A:f(B, t). This relationship implies that B is the input to an F-box() that is used to calculate the value of output A. Definition 6 : A set of elements ? = {E1, E2, ? Ex} is a collection of elements. An ordered set shall be defined as a set where the ordinality (order) of the elements in the set is one of the attributes of the set. Changing the ordinality of the members of ? creates a different set ? . Therefore, if ? = {E1, E2, E3} and ? = {E3, E1, E2}, then ? ? ? . Unless specified, all sets are unordered. 3.2.2 IDACS System Components The IDACS Network previously discussed is composed of a number of network entities. These items are defined in Definition 7 to Definition 16, and they are shown graphically in Fig. 3. Given the IDACS Network, it contains the elements shown in Table 2. 7 Table 2. IDACS System Elements ..a set of ... termed ?which are There are The set of ...is represented as Definition 7 servers Security Agents (SAs) locations. q SAs SA?, ??[1, q]. SA? ???? = {SA1, SA2,?, SAq} Definition 8 servers Super Security Agents (SSAs) locations. n SSAs SSA?, ??[1, n]. SSA? ?????? = {SSA1, SSA2, ?, SSAn} Definition 9 servers Databases locations. h Databases DB?, ??[1, h] DB? ????? = {DB1, DB2, ?, DBh} Definition 10 humans Users humans. u Users, User?, ??[1, u]. User? ??????? = {User1, ?, Useru} Definition 11 Computers/Devices Clients locations. z Clients, Client?, ??[1, z]. Client? ????????? = {Client1, ?, Clientz} Definition 12 smartcards Badges locations. y Badges Badge?, ??[1, y]. Badge? ?????????? = {Badge1, ?, Badgey} Definition 13 user passwords states. ? User Passwords Pwd?, ??[1, w]. Pwd? ??????? = {Pwd1, Pwd2, ..., Pwdw} Definition 14 Badge PINs states. x Badge PINs PIN?, ??[1, x]. PIN? ?????? = {PIN1, PIN2, ..., PINx} Fig. 3. IDACS elements Property 1 : All Pwd?? ??????? and all PIN?? ?????? are stored in the brains of User?? ???????, and are not stored at any other location. However, cryptographic hashes of all Pwd?? ??????? and all PIN?? ?????? are stored at locations (SAs and SSAs) that are required to verify these Pwd? and PIN?. This space-separated relationship allows Pwd? and PIN? to be verified when they are provided by User?, while providing no useful information (due to the one-way property of the cryptographic hash) to an attacker who gains access to a location?s memory contents. Definition 15 : Given the IDACS Network, when User? seeks to use Client? to communicate with the IDACS servers at time t, Client? downloads a unique User Agent software program UA? from the network. This UA? handles all communications between Client? and the IDACS servers. UA? is considered a virtual location. UA? is a function of User?, Client?, and time, 8 thus UA?: f(User?, Client?, t). UA? is the entity that will be performing most of the operations on the client side in the IDACS Network, so the following definitions and procedures will reference a single UA?. Definition 16 : Given the IDACS Network, at time t there are c sets of User?, Client?, Badge?, Pwd?, PIN?, and UA? (denoted as {User?, Client?, Badge?, Pwd?, PIN?, UA?}) that are authorized to access the network. These combinations are termed Customers Cust?, ??[1, c]. Cust? is considered a state. Since Cust? represents a combination of the other parameters, Cust?: f(User?, Client?, Badge?, Pwd?, PIN?, UA?, t). Definition 17 : Given the locations defined in the IDACS network, some of the following definitions will depend on the state that describes the configuration and memory contents of a combination of certain locations. These states represent a combination of other states as defined in Definition 2, so they are termed super-states. The symbol S represents the super-state covering the entire IDACS system, with other symbols representing more narrowly-defined super-states that are subsets of S, e.g. SClient? represents the state of Client? in combination with UA?. SClient?: f(Client?, UA?, t) The definition of S depends mainly on the memory contents of different locations and the results of the lookup transform as defined in Definition 24. Similar notation is used for Badge?, ??????????, PIN?, ??????, Pwd?, ???????, SA?, ????, SSA?, and ??????. These super-states represent the basis of the space-time separated and jointly evolving relationship in IDACS. The locations, states, transforms, and notation that are defined here and in the following sections are summarized in Table 1 for easy reference. 3.2.3 IDACS Client-side Authentication and Access Control 3.2.3.1 IDACS Client-side Elements, Operations, and Definitions The definitions given in the previous section form the basis for the description of the IDACS system. This section will build on those definitions and expand on the Client-side operations of IDACS. It details how the IDACS Network Access Control protocol is handled for Customer authentication and authorization to allow customers to access data or services residing on a DB. Definition 18 : Given the set ???????, in order for Cust? to initiate communications with the IDACS servers (SAs and SSAs) and perform network actions (Read/Write/Execute a piece of data on DB?), Cust? must present a Client Security Ticket Ticket? to the IDACS network. Ticket? is considered a state. Ticket? is a function of both Cust? and time t; thus, Ticket?:f(Cust?, t). Ticket? is the set containing the sets ???????? and ????????, and the state Req? (all defined in the following definitions); i.e. Ticket? = { ????????, ????????, Req?}. 9 Definition 19 : Given Ticket?, it requires a Merchandise Request Req? to communicate the specifics of the desired network action. Req? is considered a state. Req? specifies the request type (Read/Write/Execute a piece of data on DB?), the unique PID for Cust?, the Content(PID?) tied to the specified data (as defined in Definition 22), and the data itself. The mechanics of the formation of Req? also depend on SCust?; Req?: f(SCust?, ????????). Definition 20 : Given Ticket?, it requires a set ???????? of q One-Time Passwords (OTP) OTP?, ??[1, q]. Since all OTP? are data structures, they are considered states. These OTP? are used for pairwise authentication between Cust? and each SA?. Each calculated OTP? is a function of the Cust? calculating it, the SA? which will be verifying it, and time t.; thus, OTP?:f(Cust?, SA?, ?, t). The set ???????? of OTP? for all SA?, which is calculated by the UA? associated with Cust? is represented as ???????? = {OTP1, OTP2, ?, OTPq}. ????????: f(Cust?, S ????, t). Algorithm 2 illustrates the procedure by which OTP? is calculated. Definition 21 : Given Ticket?, it also requires a set ???????? of r Psuedo IDs (PID) PID?, ??[1, r]. Since all PID? are data structures, they are considered states. These PID? are used for access control; they verify the identity of Cust? as well as the permissions of Cust? to perform the requested network action in Req?, and they identify the information associated with Req? residing on DB?. Each calculated PID? is a function of the associated Cust? and Req?, the index ?, and time t. Thus, PID?: f(Cust?, Req?, ?, t). The set ???????? of PID? calculated by the UA? associated with Cust? to authorize with the network is represented as ???????? = {PID1, PID2, ?, PIDr}. ????????: f(Cust?, Req?, t). Algorithm 3 illustrates the procedure by which PID? is calculated. Definition 22 : Given ????????, one of the PID? in ???????? is tied to the specific piece of data (merchandise) specified in Req?. This particular PID? is called the Content PID; it is represented by Content(PID?). The Content PID indicates the data being accessed in a Read or Execute operation, or establishes a data PID for future reference in a Write operation. Permission is granted to different Cust? to access different pieces of data residing on DB?; checking the permissions of Cust? to access a requested piece of data is part of the IDACS Network Access Control mechanism. To protect Content(PID?) for data residing in ????? from attacks against relatively less-protected SAs, the information needed to calculate Content(PID?) resides only on the relatively better-protected SSAs; only SSAs are capable of verifying Content(PID?). This is reflected in the simulations in ?3.4 Simulations?. Property 2 : Although ???????? and ???????? are calculated similarly, they serve separate functions. The elements of ???????? are used for authentication to verify the identity of Cust?, while the elements of ???????? are used for access control to verify that 10 Cust? is allowed to perform the action specified in Req? on the data specified by Content(PID?). ???????? provides space- time separated and evolving authentication (Property 3), while ???????? provides per-customer and per-data access control which enforces broader group-based access policies. Notation: When a piece of data A is stored at a location or state B at time t, this is indicated by the notation A?(B, t). However, the time parameter is often abstracted, so the notation is simplified to A?B. Definition 23 : Given ?????????, every Client? can store up to ? cryptographic seeds Seed?, ??[1, ?]. Seed? is considered to be a state. The set of all Seed??Client? is represented as ????????Client? = {Seed1?Client?, Seed2?Client?, ?, Seed??Client?}. These relationships are represented by Seed??Client?: f(Client?, t) and ????????Client?: f(Client?, t). All Badge? can also store a set ????????Badge? of Seed??Badge?, so Seed??Badge?: f(Badge?, t) and ????????Badge?: f(Badge?, t). Additionally, Seed? can be derived from Pwd? and PIN? by applying the cryptographic hash function to a concatenation of Pwd? or PIN? with pseudo-random nonces and time-evolving sequence numbers. Thus, Seed??Pwd?: f(Pwd?, t), ????????Pwd?: f(Pwd?, t), Seed??PIN?: f(PIN?, t), and ????????PIN?: f(PIN?, t). Definition 24 : Given the sets ???????, ??????, ??????????, or ?????????, each Pwd?, PIN?, Badge?, or Client? stores (q + r) ordered sets ??????? or ??????? each consisting of j seeds Seed??Pwd?, Seed??PIN?, Seed??Badge?, or Seed??Client?. Each set is needed to calculate one OTP? or one PID?, respectively. The F-box(lookup) transform takes the item type (OTP or PID), the index (? or ?), the super-state SClient?, SBadge?, SPIN?, or SPwd? (which provides the seeds and states), and SCust? (which determines the order of the seeds in different ??????? and ??????? ) as inputs; and outputs the ordered set of Seed? which corresponds to the item type and index. This transform is represented by ??????? ?Client? = F-box(Lookup, SCust?, SClient?, OTP, ?) where SClient? can be replaced by SBadge?, SPIN?, or SPwd?, and (OTP, ? ) may be replaced by (PID, ?). For some combinations of inputs, the output set may be an empty set, i.e. j = 0 and ??????? ?Client? = ?. Property 3 : The members of ??????? ?Client?, ??????? ?Badge?, etc. are not stored consecutively in their respective locations; they are stored randomly in that location?s memory. Additionally, based on the IDACS state history and nonces, their positions in memory are changing in time with forward secrecy. This provides the space-time separation and the space- time joint evolution of IDACS. Because of this, the F-box(Lookup) transform is non-trivial for an attacker to break and gives the strength to Theorem 1. 11 Definition 25 : Given a group of n generic objects O1, O2, ?, On, the F-box(concatenate) transform accepts this group of objects as input and outputs the objects concatenated into an ordered set. The generic objects may be individual objects, or they may be sets of objects. In equation notation, the ?concatenate? is represented by Concat. For example, ??????? = F-box(Concat, ??????? ?Client?, ??????? ?Badge? , ??????? ?PIN? , ??????? ?Pwd? ) ?????? = F-box(Concat, OTP1, OTP2, ?, OTPn) Definition 26 : The F-box(random) transform generates a random byte array. For this research, the array is typically 256 bits long (corresponding to the SHA-256 hash algorithm). XV1 = F-box(Mrand) Definition 27 : Given a generic set of inputs, the F-box(hash) transform applies a cryptographic hash function (e.g. SHA-256) to a byte array representation of the inputs and outputs the resulting byte array. output = F-box(Hash, inputs) A specific instance of this transform operates as follows. Given an item type OTP or PID of index ? or ?, the transform accepts the item type(OTP or PID), the index (? or ?), and the associated set of seeds i.e ??????? or ??????? as inputs. The output OTP? or PID? is calculated by applying the cryptographic hash to ??????? or ??????? combined with well-known (system-wide for IDACS and publically known) values and a time-evolving sequence number. Different but well-known values and order of the seeds are used for each OTP? or PID?; thus, each OTP? or PID? is calculated differently, but the calculation method is well-known. The time-evolving sequence number is used to accomplish anti-replay functionality of the output. OTP? = F-box(Hash, SCust?, ??????? , OTP, ?) Property 4 : Due to Property 3, the outputs of the F-box(Lookup) transform and the composition of ??????? and ??????? are drawn from space-separated elements that are time-evolving with forward secrecy. Additionally, when a OTP? or PID? is being calculated using ??????? or ??????? as inputs to F-box(Hash), a time-evolving sequence number is used as another of the inputs. As a result, the ???????? and ???????? that depend on these values are also space-time separated and jointly evolving. An attacker who intercepts ????????, ????????, or any ??????? or ??????? will be unable to use them after the sequence number or any of the Seed? have changed, as they will be invalid. Additionally, an attacker cannot use 12 an intercepted OTP? or PID? to obtain any information regarding the Seed? used to calculate them (due to the one-way property of the cryptographic hash function). Given Definition 17, Definition 24, and the related Property 3, the following Theorems may be formed. Both of these Theorems are proved in "3.3 Proofs for Theorem 1 and Theorem 2": Theorem 1 : Given the F-box(Lookup) transform, which takes as inputs (a) a super-state SClient?, SBadge?, SPIN?, or SPwd? (which contain cryptographic seeds), (b) the super-state SCust?, and (c) an OTP or PID index ((b) and (c) are used together to determine the identity and order of the seeds returned by the F-box(Lookup) transform). This transform returns an ordered subset of the seeds derived from (a). An attacker who wishes to recreate the F- box(Lookup) transform and has access to (c) and all or part of (a) but not (b) faces an NP-complete problem due to the order of the output seeds. Theorem 2: Given the IDACS system and an attacker who is trying to calculate ???????? and ???????? without access to the super-states SClient?, SBadge?, SPIN?, SPwd?, or SCust?. Such an attacker must reassemble SCust? (which contains SClient?, SBadge?, SPIN?, and SPwd?) in order to successfully calculate ???????? and ????????. It is an NP- complete problem for the attacker to reassemble SCust? or any other super-state. 3.2.3.2 IDACS Client-side Algorithms The Client-side operations for IDACS network authentication and authorization may now be detailed in terms of these definitions. Algorithm 1 outlines the entire IDACS Network Access Control procedure, and the Client-side operations are detailed in Algorithm 2 and Algorithm 3. The network-side operations will be detailed in the next section. Notation: The notation A ? B: C indicates that message C is being sent from party A to party B. In Algorithm 1, UA? first calculates the ???????? (1) and ???????? (2) needed to authenticate the data request Req? (3) with the IDACS network and packages them together into Ticket? (4). Cust? then sends Ticket? to a pre-determined SA1 (5) to begin the network access control process. The network access control module defined in Algorithm 4 uses SAs and SSAs to authenticate Ticket? as many times as necessary (as defined by the authentication chain length, N) according to the specific IDACS implementation (7 and 8) using randomly generated XV1 and XV4 values (6) for the first iteration of the module. If the network access control module fails at any time, the data request is dropped (10). After the network access control module has been run several times, the final SSA to handle Ticket? sends Req? to DB? for processing (11). Algorithm 1 (and all contained sub-algorithms) is outlined in Fig. 4. The ???????? and ???????? used in Algorithm 1 are calculated according to Algorithm 2 and Algorithm 3. 13 In Algorithm 2, if ???????? is being calculated by Cust? (1), each individual OTP? (2) is calculated by gathering the relevant seeds from the relevant Client?, Badge?, PIN?, and Pwd? (3), and hashing them together (4). The individual OTP? are concatenated together to generate the set ???????? (5). If the calculation is being performed by SA? (7), then only OTP? is needed to be authenticated rather than the entire set ????????. Thus, all of the relevant seeds are gathered and hashed together to generate OTP? (8). On the other hand, the entire set ???????? is generated by Cust? and the entire set is authenticated by each SA? or SSA? (excluding Content(PID?), which is only checked by SSA?). In Algorithm 3, Cust? gathers all of the relevant seeds from Client?, Badge?, PIN?, and Pwd? (3) and hashes them together (4) for each individual PID? (1). Any SA? or SSA? gathers the seeds and hashes them together (6) to generate each individual PID? (1). Finally, all PID? are concatenated to form ???????? (7). The procedure for calculating a single OTP? at Cust? in Algorithm 2 is outlined in Fig. 5; the procedure for calculating a single PID? at Cust? is similar. Algorithm 2. calculate_OTPs() input: location, Cust? output: ???????? or OTP? 1 if (location == SCust?) 2 for ?=1 to q 3 ???????OTP, ? = F-box(Lookup?C, SCust?, OTP, ?) 4 OTP? = F-box(Hash, SCust?, ???????OTP, ?, OTP, ?) end 5 ???????? = F-box(Concat, OTP1, OTP2, ?, OTPq) 6 return ???????? 7 else if (location == SSA?) 8 OTP? = F-box(Lookup?Hash, PID(Cust?), SSA?, OTP, ?) 9 return OTP? end Algorithm 3. calculate_PIDs() input: location, Cust? output: ???????? 1 for ?=1 to r 2 if (location == SCust?) 3 ???????PID,? = F-box(Lookup?C, SCust?, PID, ?) 4 PID? = F-box(Hash, SCust?, ???????PID,?, PID, ?) 5 else if (location == SSA? or SSSA?) 6 PID? = F-box(Lookup?Hash, PID(Cust?), Slocation, PID, ?) end end 7 ???????? = F-box(Concat, PID1, PID2, ?, PIDr) 8 return PID? Algorithm 1. IDACS Network Access Control procedure input: Cust?, S ????, S ?????, S ????, data operation (Read/Write/Execute), data, SA1, N output: Cust?, S ????, S ?????, S ???? at UA? 1 ???????? = calculate_OTPs(SCust?, Cust?) 2 ???????? = calculate_PIDs(SCust?, Cust?) 3 Req? = F-box(Concat, (Read/Write/Execute), PID(Cust?), Content(PID?), data) 4 Ticket? = F-box(Concat, ????????, ????????, Req?) 5 Cust? ? SA1 : Ticket? 6 XV1 = F-box(Mrand) XV4 = F-box(Mrand) 7 for n = 1 to N 8 { XV1, XV4, SA1, SSA2, passed } = run_auth_chain(SA1, XV1, XV4, Ticket?) 9 if (passed = false) 10 exit al orithm end end 11 SSA2 ? DB? : Req? 14 Fig. 4. IDACS Network Access Control top-level view Fig. 5. Procedure to calculate a single OTP? at Cust? 3.2.4 IDACS Network-side Authentication and Access Control 3.2.4.1 IDACS Network-side Elements, Operations, and Definitions The previous section outlined the IDACS Client-side authentication and authorization procedures for gaining access to data stored on a DB. This section will discuss the corresponding Network-side procedures that verify ???????? and ???????? in order to authenticate Cust? and grant DB access. Definition 28 : Given ???? and ?????, each pair of two machines from these sets share a cryptographic key. This key is used for encryption and cryptographic hash functions. A cryptographic key shared between machines A and B is represented as Key(A, B). These shared keys are referenced in Algorithm 4. 15 Definition 29 : Given Ticket?, ????, and ?????, Ticket? must be authenticated ( ????????) and authorized ( ????????) by multiple SA? and SSA? (space separation and redundancy) both before it reaches ???? and before it returns to Cust?. As Ticket? is passed through this authentication chain, there is also an authentication method for messages passed between the SA? and SSA? to verify the identity of the sending SA? or SSA?. Xchain Values are used in these messages for inter-machine authentication. Details on how these values are calculated are included in Algorithm 4. The notation for these values is XVA, A ? [1, 6] as described in Algorithm 4. Definition 30 : Given Ticket? which is sent by Cust? to the IDACS network, the ?????? and ?????? contained in Ticket? must be verified by an authentication chain of multiple SA? and SSA? before it is sent to DB? (as detailed in Algorithm 1 and Definition 29). The order in which SA? and SSA? verify Ticket? is pseudo-random, but calculated by the F-box(next-SA-SSA) transform. This transform accepts Ticket?, the current location SA? or SSA?, and SSA or SSSA as inputs and outputs index ? or ? for the next-hop SA or SSA (next-hop SA? ? ????, next-hop SSA? ? ?????). The transform applies F-box(Hash) to Ticket? and then calculates the Hamming distance between F-box(Hash, Ticket?)and PID(SA?) or PID(SSA?) for ??[1, q] and ??[1, n]. The index ? or ? where PID(SA?) or PID(SSA?) has the lowest Hamming Distance (excluding all SA? or SSA? already in the authentication chain) is the index of the next-hop SA or SSA. The F-box(next-SA-SSA) is represented in equation notation by ? = F-box(Next, Ticket?, SA?, S ????) PID(SA?) or PID(SSA?) are shared among all SA? or SSA?, but they are not shared with Cust?. The F-box(next-SA-SSA) transform may only occur at SA or SSA locations. Definition 31 : Given Ticket? and XV values, the complete messages passed between multiple SA? and SSA? are termed Network Security Tickets, denoted TKA, A ? [1, 5]. The details are shown in Algorithm 4. TKA is a concatenation of the relevant Ticket? and Xchain values. Definition 32 : Given network message B, any SA? or SSA? processing B will record a Security Ticket Log Record \B\ detailing the vital information regarding B (e.g. the time B was processed, the IP address of the Cust? that sent B, etc.) B may be any Ticket? or TK network messages. The logs residing on SA? or SSA? are part of SSA? or SSSA?. Definition 33 : Given Definition 32, any SA or SSA that processes a network message (e.g. TK) records a log record \TK\ using the F-box(insert-log-record) transform. This transform accepts SSA? or SSSA? and TK as inputs and outputs an updated 16 version of SSA? or SSSA? which contains \TK\. The F-box(insert-log-record) transform is represented in equation notation by SSA? = F-box(Insert, SSA?, TK) Definition 34 : Given Definition 33, any SA? or SSA? may search its own log entries for a given \TK\ that matches certain input parameters such as time, IP address of sending Cust?, etc. These input parameters are not rigidly defined, and may exist in many combinations. The F-box(retrieve-log-record) transform accepts SSA? or SSSA? and a list of conditions as inputs and outputs one or more matching log records \TK\, or ?null? if no matching records are found. This transform is represented in equation notation by \TK\ = F-box(Rtrv, SSA?, {conditions}) 3.2.4.2 IDACS Network-side Algorithms The network-side authentication and authorization process described in Algorithm 1 is carried out in the ?run_auth_chain()? function described in Algorithm 4. The initial inputs to Algorithm 4 are handled separately depending on if this is the first call of the function (Algorithm 1 (8) with n=1) or a subsequent call. For the first call of the function, SA1 has been randomly selected by Cust?, Ticket? has been sent from Cust? to SA1 (Algorithm 1 (5) connected to Algorithm 4 (2)), and XV1 and XV4 have been randomly generated by SA1 and SSA1, respectively (Algorithm 1 (6) connected to Algorithm 4 (1) and (7)). For subsequent function calls, SA1 and SSA1 in the current Algorithm 4 function call are SA2 and SSA2 from the previous Algorithm 4 function call, and Ticket? resides at SA1 as a consequence; XV1 and XV4 in the current Algorithm 4 function call are XV2 and XV5 from the previous Algorithm 4 function call, respectively (Algorithm 1 (8) connected to Algorithm 4 (1) and (3)) (see Fig. 6 and Fig. 7 for details on how consecutive calls to Algorithm 4 are linked). 17 The main body of Algorithm 4 is represented graphically in Fig. 6. First, SA1 records a security log record of Ticket? (4). Next, SA1 verifies the associated OTP? and also ???????? extracted from Ticket?; if the verification fails, then the function returns ?false? (5). The next-hop SA2 and SSA1 are determined (6), and XV2 is calculated (7). TK1 is formed and sent to SA2 (8), and TK2 is formed and sent to SSA1 (9). SSA1 mirrors this process; SSA1 records a security log record of TK2 (10) and verifies Algorithm 4. run_auth_chain() input: SA1, XV1, XV4, Ticket? output: XV2, XV5, SA2, SSA2, passed At beginning of algorithm, the following values reside at the indicated locations after the last iteration of this algorithm, or are sent to (Ticket?) or generated at (XV1 and XV4) the indicated locations during the first iteration of the algorithm 1 XV1 ? SA1 2 Ticket? ? SA1 3 XV4 ? SSA1 at SA1 4 SSA1 = F-box(Insert, SSA1, Ticket?) 5 if (check_OTP_PID(SA1,Ticket?, Cust?) == false) return (passed = false) 6 SA2 = F-box(Next, Ticket?, SA1, S ????) SSA1 = F-box(Next, Ticket?, SA1, S ?????) 7 XV2 = F-box(Hash, XV1, Key(SA1, SA2)) 8 SA1 ? SA2: TK1 = { Ticket?, XV1, XV2 } 9 SA1 ? SSA1: TK2 = { Ticket?, XV2 } at SSA1 10 SSSA1 = F-box(Insert, SSSA1, TK2) 11 if (check_OTP_PID(SSA1, Ticket? ? TK2, Cust?) == false) return (passed = false) 12 SA2 = F-box(Next, Ticket? ? TK2, SSA1, S ????) SSA2 = F-box(Next, Ticket? ? TK2, SSA1, S ?????) 13 XV3 = F-box(Hash, XV2, Key(SSA1, SA2)) XV5 = F-box(Hash, XV4, Key(SSA1, SSA2)) 14 SSA1 ? SA2: TK3 = { Ticket?, XV3, XV5 } 15 SSA1 ? SSA2: TK4 = { Ticket?, XV4, XV5 } at SA2 16 SSA2 = F-box(Insert, SSA2, TK1, TK3) 17 if (XV2 ? TK1 != F-box(Hash, XV1 ? TK1, Key(SA1, SA2)) or (XV3 ? TK3 != F-box(Hash, XV2 ? TK1, Key(SSA1, SA2)) 18 report_and_trace_attack() return (passed = false) end 19 SSA2 = F-box(Next, Ticket?, SA2, S ?????) 20 XV6 = F-box(Hash, XV5, Key(SA2, SSA2)) 21 SA2 ? SSA2: TK5 = { Ticket?, XV6 } at SSA2 22 SSSA2 = F-box(Insert, SSA2, TK4, TK5) 23 if (XV5 ? TK4 != F-box(Hash, XV4 ? TK4, Key(SSA1, SSA2)) or (XV6 ? TK5 != F-box(Hash, XV5 ? TK4, Key(SA2, SSA2) 24 report_and_trace_attack() return (passed = false) end 25 return (passed = true) 18 ???????? extracted from Ticket? which is extracted from TK2 (11). SSA1 then calculates the next-hop SSA2 and SA2 (12) and also calculates XV3 and XV5 (13). SSA1 then forms TK3 and sends it to SA2 (14) and forms TK4 and sends it to SSA2 (15). While SA1 and SSA1 verify OTP? and ????????, SA2 and SSA2 verify the Xchain values (however, SA2 and SSA2 will also be verifying OTP? and ???????? as SA1 and SSA1 in the next function call of Algorithm 4). SA2 first records security log records for TK1 and TK3 (16). Next, the relationship between XV1 and XV2 and also the relationship between XV2 and XV3 are verified (17). If the XV relationships fail verification, the authentication process is stopped (18). The next-hop SSA2 (19) and XV6 (20) are both calculated. Finally, TK5 is formed and sent to SSA2 (21). SSA2 verifies its received Xchain values in a similar fashion (22?24). If all of the authentication checks and Xchain verifications pass, the function returns successfully (25). Fig. 6. Mutual authentication in authentication chain Fig. 7. Cascading the run_auth_chain() algorithm Algorithm 5. check_OTP_PID() input: location, Ticket?, Cust? output: passed 1 if (location ????) and (OTPlocation?Ticket? != calculate_OTPs(Slocation, Cust?)) 2 report_and_trace_attack() return (passed = false) end 3 if( ???????? ? Ticket? != calculate_PIDs(Slocation, Cust?)) 4 report_and_trace_attack() return (passed = false) end 5 return (passed = true) Algorithm 5 outlines the procedure used by SAs and SSAs to verify the ???????? and ???????? contained in Ticket?. The OTP? associated with SA? is verified by each SA? (1), and the entire ???????? is verified by each SA? and SSA? (3) (excluding Content(PID?), which is only checked by SSA?). If either of these checks fail, the ?report_and_trace_attack()? function is called to 19 identify the source of error (which is assumed to be an attacker with incomplete authentication credentials). The details of this algorithm are discussed in "4.4 Attack Traceback". The following properties help to explain the operation and purpose of the Xchain values. Property 5 : The procedure outlined in Algorithm 4 provides mutually-supported authentication between the SAs and SSAs authenticating Ticket?. Fig. 6 graphically illustrates the XV portion of Algorithm 4 run_auth_chain(). During the first iteration of run_auth_chain(), XV1 and XV4 are randomly generated; in subsequent iterations, they are given the values of XV2 and XV5 from the previous iteration. SA1 calculates XV2 by hashing XV1 with Key(SA1, SA2), and sends both values to SA2. SA2 is able to verify XV2 using its own copy of Key(SA1, SA2), which verifies the identity of SA1. SA1 also sends XV2 to SSA1, which calculates XV3 by hashing XV2 with Key(SSA1, SA2) and sends it to SA2. SA2 is able to verify the SA1-SSA1 connection as well as the identity of SSA1 by verifying XV3. Similarly, SSA1 calculates XV5 by hashing XV4 with Key(SSA1, SSA2) and sends both XV4 and XV5 to SSA2 but only XV5 to SA2. SA2 calculates XV6 by hashing XV5 with Key(SA2, SSA2) and sends it to SSA2. SSA2 is then able to verify the identity of SSA1 by verifying XV5 and is able to verify the SSA1-SA2 link as well as the identity of SA2 by verifying XV6. The run_auth_chain() algorithm may be cascaded across N SAs and SSAs to accomplish the desired number of authentications (Fig. 7). In this way, the authentication chain provides mutually- supported space-separated authentication that is time-evolving between the SAs and SSAs. Property 6: When the run_auth_chain() algorithm is cascaded, XV2 and XV5 of one iteration are, in fact, the XV1 and XV4, respectively, for the next iteration; cascaded iterations of the run_auth_chain() algorithm are seamlessly integrated (Fig. 7). This is demonstrated at (8) in Algorithm 1. In this way, consecutive iterations provide mutually-connected authentication for each other. 3.3 Proofs for Theorem 1 and Theorem 2 3.3.1 Proofs Consider the following scenario: an attacker wishes to replicate the IDACS Network Access Control procedure for a legitimate Cust? in order to impersonate Cust? and gain access to Cust?'s data residing on DB?. In order to impersonate Cust?, the attacker requires correctly generated ?????? and ?????? for Cust?. To accomplish this, the attacker requires two things: a) the cryptographic seeds residing on/derived from Client?, Badge?, PIN?, and Pwd? (i.e. ??????? ?Client?, ??????? ?Badge?, ??????? ?Pwd?, and ??????? ?PIN?) and b) the order in which these seeds must be hashed to generate all OTP? and PID? (i.e., the output of the F-box(Lookup) transform in Algorithm 2 (3) and Algorithm 3 (3)). An attacker who is able to steal or clone a Client?, 20 Badge?, Pwd?, or PIN? can gain access to (a) through memory scraping or other memory access strategies. However, in order to obtain (b), the attacker needs access to SCust?, which is the critical input to the F-box(Lookup) transform. Now, SCust? is composed of SClient?, SBadge?, SPwd?, and SPIN?; the entire SCust? does not reside with any one of these locations or states. Therefore, an attacker who does not possess all four of these items cannot recreate SCust? apart from a brute-force attack; such an attacker must resort to other means to recreate (b) (i.e. the output of the F-box(Lookup) transform). The attacker faces an order-reassembly problem; this problem can be represented using graph theory. The group of seeds ??????? is represented as a set of vertices ? and a set of directed edges E, where each v ? is connected to every other v ? by a pair of directed edges e E with opposite directions. Each e E is given an associated weight W(e), 0?W(e)?1 (Fig. 8 (a)). Each e E with a tail connecting to v1 and a head connecting to v2 (v1,v2 ?), represents the possibility that v2 follows v1 in the output of the F-box(Lookup) transform, while W(e) represents the associated probability (the determination of W(e) is discussed in the following section). Presumably, the path connecting N vertices (where N is known to be the number of seeds output by the F-box(Lookup) transform) that has the highest sum W(e) of any path with N vertices will be the correct solution to the F-box(Lookup) transform (Fig. 8 (b)). The problem of finding the highest sum W(e) path is defined as the Maximum Weight Directed Path of Specified Length (MWDPSL) problem; this problem is proven NP-complete in "Appendix A". The inspiration for this proof is drawn from [29]. The NP-complete proof of the (MWDPSL) problem in turn proves Theorem 1. Consider a second scenario. The same attacker does not have access to Client?, Badge?, Pwd?, or PIN? (and therefore not SCust? either). This attacker must recreate SClient?, SBadge?, SPwd?, and SPIN? in order to gain access to both (a) and (b); therefore, the attacker must correctly reassemble the memory contents of Client? and Badge? and an analagous representation of Pwd?, or PIN? (these memory contents are the definition of SClient?, SBadge?, SPwd?, and SPIN?). Each of these items is represented by b memory locations, each of which is ? bits long; therefore, each memory location contains one of 2? possible values. This situation can be represented using an undirected ?colored? graph [30]. The possible values for a given memory location can be represented by a group of 2? vertices v of the same ?color?, v ?, and each memory location can be represented by a different ?color? group (represented as different shapes in Fig. 9). Each v ? is connected to every other v ? (except for v of the same ?color?) by an undirected edge e E, and each e E has an associated weight W(e), 0?W(e)?1. Each edge represents the possibility that the two connected v are both present in the correct reconstruction of the memory contents, and the associated W(e) represents the probability (again, the manner in which W(e) is assigned is discussed in the next section). A path connecting one v of each ?color? that has the highest sum W(e) will represent the correct reconstruction of the memory 21 contents (Fig. 9). This problem is defined as the Maximum Weight Path of Specified Length (MWPSL) problem; this problem is also proved NP-complete in ?Appendix A?. In turn, the proof of the MWPSL problem proves Theorem 2. a) b) Fig. 8. (a) Directed graph representing seed reassembly problem, and (b) solution to the MWDPSL problem, N=4 a) b) Fig. 9. (a) Graph representing memory reassembly problem, b = 4 (b) solution to the MWPSL problem, N = 4 (W(e) not shown) 3.3.2 Constraints on NP-completeness The NP-completeness put forth in Theorem 1 and Theorem 2 points to a high level of security for IDACS, since NP- completeness is associated with an exponential increase in the problem solution complexity. However, NP-completeness speaks only to the worst-case (for the attacker) situation. It may be that the problem soultion can be found with significantly less than exponential complexity. Consider Fig. 8; the difficulty in finding the MWDPSL lies in the fact that the maximum weight path is not immediately apparent. If the graph in Fig. 8 contained a few very high-weight edges and the remaining edges had lower weights, this would provide a significant portion of the solution and greatly reduce the problem complexity. In the paper which originally presented the graph method for matching fragemnts of data [32], the authors discuss how this method can be used to reassemble scattered data file fragments. They note in their work that fragments from highly-patterned data files generate graphs with a few higher weight edges, resulting in significantly reduced probelm complexity, while data with more pseudo- random qualities generated graphs with more uniformly weighted graphs, resulting in higher problem complexity. This leads to the important question, what would a graph generated by a realistic IDACS situation look like? 22 According to the research in [32], data fragments (whether file fragments in that paper or ??????? in this paper) that are more "random" will have a low correlation with each other; when analyzed for likelihood of matching, they will result in a uniform distirbution of edge weights (Fig. 10). A strong Random Number Generator would be expected to generate this type of result. However, if the data fragments are highly patterned (e.g. the outputs of a poorly-designed hash function), the analysis will result in a graph with a few high-weight edges (Fig. 11). So which of these two situations most closely matches the analysis of the IDACS ???????? Fig. 10. Graph problem for uniform probability relationship (e.g. true Random Number Generator) Fig. 11. Graph problem for highly correlated relationship (e.g. poorly designed hash function) The National Institute of Standards and Technology (NIST) has provided a battery of tests [34] that analyze the outputs of Random Number Generators (RNGs) to measure their ?randomness? by looking for patterns [35]. This battery of tests has also been used on ciphertext from various encryption algorithms to measure how closely it matches truly random data. This battery contains 15 individual tests, each of which measures different aspects of ?randomness? in a set of data. Each test, when analyzing a data sample, asks this question: ?If the algorithm that generated this data sample was truly random, what is the probability that this specific data sample could have been generated?? The test responds with a p-score for the analyzed data sample; this p-score is a probability in the range [0, 1]. NIST recommends interpreting these p-scores using a ?significance level? of 0.01; if a data sample?s p-score is above 0.01, then the data sample has passed the randomness test. Now, some data samples that are truly random will generate a failing p-score, which would be a ?false negative? for randomness; this is due to inherent weaknesses in the tests [35].There are two ways to interpret the results of these tests. The first way is to look at the proportion, or percentage, of data samples with passing p-scores. According to the parameters in [35], for a set of tests run with 1000 data samples, a truly random RNG will have a minimal proportion of 0.9805068, i.e. a minimum 98.05% pass rate. The second way is to look at the distribution of the test p-scores. For a set of truly random data samples being subjected to a test, it is expected that the p-scores of the data samples should be evenly distributed. Evenness of distribution can be measured by 23 calculating P-valueT based on the chi-square statistic for each test as discussed in [35]; if each test has P-valueT ? 0.0001, then the p-scores are considered to be evenly distributed. In order to determine the ?randomness? of ??????? that will be used in IDACS, the NIST battery of tests was applied to a number of SHA-256 cryptographic hash outputs designed to simulate the ??????? that will be used by IDACS. The battery of tests was applied to 1000 data samples of sizes dictated by the NIST battery. Of the 15 tests in the battery, two of the tests are run twice during the course of the battery. Results for both tests are reported here. Three of the tests are run a number of times; results for two randomly chosen instances of those tests are reported here. All other tests were run once, and the results are reported here. There are a total of 20 separate test results. For the first analysis, the proportions of data samples that pass each test are presented in graph form in Fig. 12. It can be seen that all tests exceed the minimum pass proportion of 0.9805068. Fig. 12. Proportion of Passing Data Samples for Each Test For the second analysis, the P-valueT for each of the tests is presented in Table 3. It can be seen that all tests exceed the minimum pass value of 0.0001. Table 3. P-valueT for each test Test # P-valueT Test # P-valueT Test # P-valueT 1 0.461612 8 0.378705 15 0.907498 2 0.328297 9 0.572847 16 0.345744 3 0.134944 10 0.873987 17 0.915241 4 0.788728 11 0.581082 18 0.078567 5 0.918317 12 0.444691 19 0.278461 6 0.605916 13 0.455937 20 0.614226 7 0.018668 14 0.052531 3.3.3 Implications for IDACS What are the implications of the previous sections for IDACS? Consider the attacker in the scenarios; an attacker who has access to cryptographic seeds but not SCust? (Theorem 1) OR no access to any memory locations at all (Theorem 2) faces the NP-complete reassembly problem. There is no known solution to these problems with a complexity polynomial to the 24 problem size (number of seeds or memory locations in the graph). A polynomial-time solution could exist for certain situations meeting special constraints; however, due to the demonstrated randomness of the ??????? used in IDACS, it is expected that the best algorithm will be of exponential complexity to the problem size. Having no special algorithms to aid him, and the attacker will be reduced to brute-force attacks. For a Theorem 1 situation, he must try every possible seed combination solution to F- box(Lookup) to guess the solution as shown in Fig. 8 (b). For a Theorem 2 situation, he must try every possible memory value to guess the solution as shown in Fig. 9 (b). Such attacks will be detected quickly, and the security log/forensics capabilities of IDACS will allow the system to identify which seeds, locations, and states have been compromised by the attacker. The attacker will be foiled even if the some (but not all) of {Client?, Badge?, PIN?, Pwd?} are stolen. Furthermore, even if the attacker is able to guess the solution, because of Property 3, the identity, memory locations, and order of the cryptographic seeds evolve in time, presenting the attacker with totally new problems as shown in Fig. 8 (a) and Fig. 9 (a). To get an idea of the real-world implications of this, consider a Theorem 1 situation where the attacker has access to all of the Seed? needed to calculate ???????? and ???????? (but without access to the F-box(Lookup) transform). For an IDACS system with a given q (number of SA?, with the same number of OTP? to be calculated) and a given r (number of PID? to be calculated) and a given number of Seed? used to calculate each OTP? and PID?, Table 4 shows how long it would take the attacker (on average), trying all possible permutations of Seed? at 106 permutations per second, to find the ordering to correctly calculate ???????? and ????????. Since Theorem 1 deals with the ordering of known Seed? and Theorem 2 deals with generating these Seed? (before ordering can even occur), it is accepted that the computation times for a Theorem 2 situation would be significantly greater than those shown in Table 4. Table 4. Average time to for permutation orderings of Seed? to generate correct OTPs and PIDs at 106 permutations per second # Seed? per OTP/PID 8 12 16 # OTPs (SAs) + # of PIDs 6 + 8 3.13 * 10174 years 4.01 * 10294 years 8.87 * 10422 years 8 + 8 6.11 * 10207 years 5.63 * 10348 years 1.36 * 10499 years 10 + 8 8.80 * 10241 years 1.59 * 10404 years 1.14 * 10577 years 3.4 Simulations 3.4.1 Simulation Setup Simulations for this research were carried out using a model of an IDACS network built using MATLAB. The simulation network was built as demonstrated in Fig. 13. The network contains a variable number of Clients up to a maximum of 102,400. The SA barrier consisted of four SAs (with the possibility for future expansion). The network contained one or two SSAs and one 25 Database (both with the possibility for future expansion). Network links were built according to the bandwidths indicated in Fig. 13, and were full-duplex. During all simulations, background traffic was introduced into the network in order to simulate normal operating conditions. It was determined that introducing network traffic on the slower network connections did not affect the simulation results (but made the simulation running time prohibitively long). Therefore, all background traffic was introduced between the SAs and the SSAs. Uniformly distributed background traffic equal to 80 Kbps/Client was divided equally between the SAs and sent from each SA to each SSA. An equal amount of traffic was also sent from each SSA to each SA. This rate of background traffic ranged from a one-way 80% load on a 10 Gbps link for a full-sized network (102,400 Clients) to a much smaller load for smaller networks (0.8% load for 1000 Clients). This rate of background traffic affected both SA/SSA security log size (which has a substantial effect on real-time forensics, which is discussed in "4.5.1 Attack Traceback Time Simulations") and packet transit time in the datacenter (due to network congestion). Additionally, realistic packet delay times for routers were obtained from router manufacturer documentation and incorporated into the simulation. Packet processing delays for Clients, SAs, SSAs, and Databases were estimated; when the IDACS prototype implementation is completed by the researchers, more precise packet processing times will be measured and incorporated into the simulation. Each simulation consisted of two phases. In the first phase, each Attacker would build a set of compromised Slaves (a botnet) gathered from a pool of vulnerable Clients. The attacker would compromise the Clients (turning them into bots) by sending a Compromise packet to each Slave candidate. During the second phase, each Attacker would send out a specified Fig. 13. Simulation Network 26 number of Read and Write attacks using a random-length Attack Chain of chained Slaves (the details of the attack scenarios used in this simulation are discussed in "4.2 Attack Vectors"). The start times for these attacks were uniformly distributed over a 20 millisecond period. These attack packets would be checked for ???????? and ???????? by the SAs and the SSAs according to normal IDACS operations. If the packet failed any of these checks, the packet would be dropped as an attack. If an attack packet was able to bypass all of the security checks and successfully carry out a Database Read or Write operation, the attack was considered successful. Background traffic was present in the network during both of these phases. 3.4.2 Simulation Parameters In the simulation, if the attack packet did not possess the proper ???????? and ????????, any SA or SSA would detect the attack 100% of the time. In reality, however, some attacks would go undetected due to various attack methods (zero-day attacks, SQL injection, buffer overflow, etc.) Therefore, SAs and SSAs in the simulation were classified as ?fully compromised? and ?partially compromised?. A ?fully compromised? SA or SSA would pass a failed ???????? or ???????? check as successful 100% of the time; this situation represents an SA or SSA that is fully controlled by an attacker. A ?partially compromised? SA or SSA would pass a failed ???????? or ???????? check 50% of the time; this represents a ?normal?, or uncontrolled by an attacker, SA or SSA. The rationale behind setting a ?partially compromised? SA or SSA to a 50% fail rate is twofold. First it simulates zero-day attacks, etc.; second, it demonstrates the strength of the IDACS system, even under ?poor? conditions and makes more visible the effect of other variables on network performance. In a realistic situation, it is expected that the failure rate of ?normal? machines will be much less than 50%. Additional probability variables were also used to govern other factors in the simulation. During chained attacks (in which an Attacker uses a chain of Slaves (bots) to launch an attack; this is discussed in "4.2 Attack Vectors"), the attacker was given an 80% probability of stealing the cryptographic seeds needed to calculate ???????? and ????????and an 80% chance of an attack chain packet (prior to the final leg of the chained attack) passing a failed permissions check based on Content(PID?). 3.4.3 Attack Detect Rate All tests were based on 1000 attacks for a given test case; 500 Read attacks and 500 Write attacks. Some tests used 1 SSA, and some used 2 SSAs. The first set of tests (Fig. 14) demonstrates the performance of the IDACS system under casualties (?fully compromised? SAs). It shows that even with multiple SAs compromised, the attack detect ratio is still very high. 1 SSA was used in these tests. 27 As expected, Fig. 14 shows that the Attack Detection Ratio is fairly constant across network sizes. However, the Attack Detection Ratio is affected by the number of ?fully compromised? SAs. When no SAs are ?fully compromised?, the system performs very well, with an average detection ratio above 99.5%. With one or two SAs ?fully compromised?, the detection ratio is still fairly high. Thus, it can be seen that the system provides an excellent defense against attacks, even under heavy casualties. The second set of tests (Fig. 15) was performed to test the system under SSA ?full compromise?. The simulated network in these tests contained two SSAs; one was ?fully compromised? and the other was ?partially compromised?. By comparing Fig. 14 and Fig. 15, it can be easily seen that the compromise of an SSA has greater effect on the Attack Detection Ratio than the compromise of an SA. This is because the simulation specified the chances of an attacker obtaining ???????? and ???????? for a Slave were fairly high (80% chance), thus making permissions checks based on the Content(PID?) (which were only performed at SSAs) the primary mode of detecting attacks. Thus, the loss of an SSA has a greater effect on system security. However, even with an SSA and up to 2 SAs ?fully compromised?, the Attack Detect Ratio was still above 94%. This test shows that protecting the SSAs should be a top priority in any implementation of this system. Fig. 15. Attack Detect Ratio vs. Network Size and Number of SA/SSAs ?fully compromised? One of the main features of IDACS is the real-time forensics capability. Through log examination and correlation, IDACS is able to trace back and correctly identify the origin of an attack, whether the attack is launched directly by the attacker or Fig. 14. Attack Detect Ratio vs. Network Size and Number of SAs ?fully compromised? 28 indirectly using a botnet of legitimate IDACS users. This simulation also addressed these capabilities; the results will be presented and discussed in ?4.5.1 Attack Traceback Time Simulations?. 29 4 IDACS Network Attack Detection, Prevention, and Traceback 4.1 Introduction In today?s network security environment, it is critically important to detect and prevent network intrusions. However, it is almost equally important to trace network attacks to their origins and identify the culprits and their methods. This allows the guilty parties to be held liable for their actions; it also allows network administrators to focus their resources once they know the weak spots in their defenses. IDACS provides real-time digital forensics capabilities that can identify network attackers as well as their collaborators, and even traitors within IDACS itself. This section will detail those capabilities possessed by IDACS and discuss how they can be used to detect, block, and trace attacks to their origins. Additionally, simulations will demonstrate the ability of IDACS to detect attacks and self-heal even when the network contains a high percentage of insider traitors. 4.2 Attack Vectors When an attacker wishes to defeat the IDACS Network Access Control Protocol in order to gain access to protected data or services residing within the IDACS datacenter, there are several general attack vectors available for this task. 1) Forge legitimate Cust? credentials (Client?, Badge?, Pwd?, and PIN?) in order to impersonate a legitimate Cust? 2) Steal/hack credentials for a legitimate Cust? 3) Hack and gain control over one or more SAs and/or SSAs in order to manipulate the authentication process Attack vector 1) requires brute-force guessing of SClient?, SBadge?, SPwd?, and SPIN?; this is infeasible according to Theorem 2. Attack vector 2) may be more effective, although the space-separation of Client?, Badge?, Pwd?, and PIN? makes it more difficult for an attacker to collect them all and acquire a botnet of complete Cust?. By using attack vector 3), an attacker can use a botnet of SAs and SSAs to bypass ???????? and ???????? checks and even manipulate the correct authentication chain path. Attack vector 3) can be accomplished by hacking loyal SAs and SSAs, turning them into traitor machines; these traitor machines possess all of the Seed? used by that particular IDACS machine. Additionally, it may be possible to use a hostile machine to impersonate, or spoof, legitimate SAs and SSAs, although a spoofed machine would not possess the Seed? associated with the legitimate machine. The most effective attack scenario combines vectors 2) and 3) in an attempt to access the IDACS datacenter. If an attacker is able to use attack vector 1) possibly combined with 3) to control a single Cust?, he will most likely attempt to access the IDACS datacenter directly (Fig. 16). However, if the attacker uses 2) to build a botnet of traitor Cust?, he may choose to send Ticket? through multiple traitor Cust? (Fig. 17). In this way, the attacker is able to accomplish several 30 objectives. First, the attacker takes advantage of the credentials owned by the traitor Cust? in order to send a legitimate Ticket?. Second, in the case that the attack is detected, he masks his identity from the IDACS forensics suite. However, even in this situation, once an attack is detected, the IDACS real-time forensics will be able to identify the attacker through the methods described in "4.4 Attack Traceback". Fig. 16. Direct Attack Fig. 17. Chained Botnet Attack By means of attack vectors 2) and 3), any Cust?, SA?, or SSA? can be turned into a traitor machine. When this happens, the machine becomes a Byzantine actor (i.e. a malicious system actor that actively works to defeat the correct operation of the system). A great deal of research has been done on the subject of Byzantine actors [34] [35] [36], and it is of great interest to be able to prove that a given system is Byzantine-resistant, able to operate correctly in the presence of a given number of Byzantine actors. 4.3 Attack Detection and Prevention The incorporation of the space-separated time-evolving relationship into the IDACS Network is based on a simple principle, which affects its real-time forensics capabilities: Principle 1: Any Cust?, SA?, or SSA? in IDACS can be hacked and turned into a traitor/Byzantine actor. Any Customer (Cust?), authenticating machine (SA? or SSA?), or real-time forensics machine (SSA?) can be turned into a traitor/Byzantine actor. This principle is the reason for the decentralized approach of separating authentication capabilities in space and time. With a design that keeps this principle in mind, IDACS is able to detect and prevent almost all illegal Ticket? that are passed to it. In fact, IDACS is demonstrably secure against any illegal Ticket? under certain conditions. The following Claims are based on a certain set of assumptions outlined below, which enforce the parameters of network communication. Assumption 1: Any Cust? can only communicate with ????. 31 Assumption 2: Any member of ???? can communicate with any Cust?, any member of ??????, and any other member of ????. Assumption 3: Any member of ?????? can communicate with any member of ???? or ?????, and any other member of ??????. Assumption 4: Any member of ????? can communicate with any member of ??????. Assumption 5: An attacker who is forming Ticket? has access to all Seed? stored on traitor SA? or SSA?. Assumption 6: A spoofed SA? or SSA? does not have access to the Seed? stored on the machine it is spoofing. Assumption 7: Any DB? that receives a Ticket? can verify whether or not the SSA? that sent it was the correct SSA? at the end of the calculated authentication chain. Assumption 8: When processing Ticket?, IDACS performs ???????? or ???????? checks on both the approach from Cust? to ?????, and on the return from ????? to Cust?. However, the authentication chain path for the return is based on the reply message Ticket'?, which is different from Ticket?. Assumption 9: Any attack Ticket? falls into one of two categories: a) contains incorrect ???????? or ????????, or b) contains correct ???????? and ????????, but is attempting to access data or service that the originating Cust? does not have permissions to access. Based on these assumptions, certain Claims about the attack detection and prevention capability of IDACS can be made. Recall that N is the Authentication Chain Length; there are N SAs and N SSAs in the approach authentication chain and N SAs and N SSAs in the return authentication path. Claim 1: A Ticket? with incorrect ???????? will be detected with up to 2N traitor SSAs and (2N - 1) traitor SAs in the approach and return authentication chain paths if the authentication chain path is not manipulated. Justification for Claim 1: According to Assumption 5, if any SA? or SSA? is a traitor, then the attacker will have access to the Seed? necessary to calculate ???????? correctly. Thus, ???????? checks will pass at each SA? or SSA? even if they are not traitors. However, if there is even one loyal SA? in the authentication chain, the attacker does not have access to the Seed? needed to calculate OTP?. This incorrect OTP? will be detected by the loyal SA?, and the attack will be detected and prevented (Fig. 18). Fig. 18. Illustration of Claim 1 Fig. 19. Illustration of Claim 2 32 However, the strength of IDACS illustrated by Claim 1 is qualified by Claim 2. Claim 2: A Ticket? with incorrect ???????? or ???????? is not guaranteed to be detected with one traitor SA and two traitor SSAs in IDACS if the authentication chain path is manipulated. Justification for Claim 2: Under certain circumstances, IDACS cannot guarantee detection of an attack Ticket? with one traitor SA and two traitor SSAs in IDACS if authentication chain path manipulation is allowed. The attacker is allowed to choose the first SA in the authentication chain, so he chooses a traitor SA. Since this SA is a Byzantine actor, it calculates the authentication chain path based on Ticket? and checks to see whether the last SSA in the authentication chain is also a traitor. If it is, then the SA passes Ticket? to this SSA, which then passes Ticket? to a DB? (Fig. 19). This action bypasses the ???????? and ???????? checks that would be performed by (potentially) loyal SAs and SSAs in the authentication chain. It is necessary for the last SSA in the authentication chain to be a traitor, because according to Assumption 7, DB? will also validate the authentication chain for the sending SSA. When DB? forms the return ticket Ticket'?, it will calculate a return authentication chain where the first SSA in the path cannot (by the rules) be the same as the last SSA in the approach path. If this SSA happens to be a traitor also, then it sends Ticket'? directly to the first traitor SA, which sends it to the attacker's Cust?. Claim 1 and Claim 2 address a) in Assumption 9; similar claims can be made to address b) in Assumption 9. Claim 3: A Ticket? with correct ???????? and ???????? but seeking to access data/services for which Cust? is not granted permissions will be detected with up to (4N - 1) traitor SAs and SSAs in the approach and return authentication chains if the authentication chain is not manipulated. Justification for Claim 3: Since Ticket? contains correct ???????? and ????????, the attack will not be detected on those grounds. However, each SA and SSA also checks the data/service targeted by Ticket? to see if Cust? has permissions on it. If only one SA or SSA in the approach or return authentication chain is loyal, then a non-permitted Ticket? will be detected, and the attack will be prevented. Fig. 20. Illustration of Claim 3 33 Claim 4: A Ticket? with correct ???????? and ???????? but seeking to access data/services for which Cust? is not granted permissions is not guaranteed to be detected with 1 traitor SA and 2 traitor SSAs in IDACS if the authentication chain is manipulated. Justification for Claim 4: The justification for Claim 4 is identical to the justification for Claim 2. There is also one final Claim that can be made regarding spoofed IDACS machines. Claim 5: A spoofed SA? or SSA? will be detected as soon as it communicates with a loyal SA? or SSA?. Justification for Claim 5: According to Assumption 6, a spoofed SA? or SSA? does not have access to the Seed? of the machine it is spoofing, including the Seed? needed to calculate XV when communicating with other SA? or SSA?. Therefore, it will be unable to correctly calculate the requisite XV; this situation will be detected immediately by a loyal SA? or SSA?. 4.4 Attack Traceback When an attack is detected by IDACS, it falls into one of several categories, with each category having corresponding root causes. If an attack is detected based on a OTP? or ???????? failure, this is because the attacker possesses an incomplete subset of the set { Client?, Badge?, Pwd?, PIN? }; additionally, if prior SAs or SSAs correctly authenticated the attack packet based on OTP? and ????????, they may be controlled or spoofed by the attacker. If the attack is detected base on an XV failure, this is because the attacker is spoofing one or more of the SA/SSAs. Each of these situations is handled differently by IDACS. When an attack is detected in Algorithm 4, the function ?report_and_trace_attack()? (Algorithm 6) is called to invoke the IDACS real-time digital forensics suite. The inputs to Algorithm 6 are ???????????, ????????????????????????, TK_A, TK_B, and current_location. ??????????? indicates the reason the attack was detected; it may contain one or more of the following values: OTP_fail, PID_fail, and XV_fail. ???????????????????????? contains a list of which (if any) of the PID? failed. TK_A and TK_B are the two TKs received by the detecting SA or SSA (if applicable). ?current_location? indicates the identity of the SA or SSA detecting the attack. When ?report_and_trace_attack()? is called, these inputs are packaged into an attack report (3). If the attack was detected by an SA (1), the report is sent to an SSA for processing (2 and 3). If the attack is detected at an SSA (4), then the attack is processed by that SSA (5). To process the attack, the SSA calls different forensics subroutines based on the reasons for the attack detection. If the attack failed due to OTP? or ???????? (6), then ?trace_attack()? is called to identify the root attacker, 34 the bot chain used in the attack, and any suspicious packet types that may have been used by the attacker to compromise other bots (7); ?identify_bots()? is called to identify possibilities for traitor Client? controlled by the attacker (8); and ?identify_compromised_items()? is called to determine which members of { Client?, Badge?, Pwd?, PIN? } have been stolen by the attacker based on correlation with ???????????????????????? (9). If an attack was detected by a failed OTP?, ????????, or XV (10), then ?identify_bad_SA_SSA()? is called to determine which SAs or SSAs (if any) are traitor or spoofed. All of these subroutines are discussed in the following sections. Algorithm 6. report_and_trace_attack() inputs : ???????????, ????????????????????????, TK_A, TK_B, current_location outputs : none 1 if ( current_location ? ???? ) 2 dest_SSA = random SSA 3 current_location ? dest_SSA: { ???????????, ????????????????????????, TK_A, TK_B, PID(current_location) } 4 else if ( current_location ? ?????? ) 5 dest_SSA = current_location end at dest_SSA: 6 if (OTP_fail ? ???????????) or (PID_fail ? ???????????) 7 {sourceTraitorCust, ??????????????, ??????????????????????????????} = trace_attack(TK_A) 8 traitorCustBotnet = identify_bots(sourceTraitorCust, ??????????????????????????????, ????, ??????) 9 traitorCustItems = identify_compromised_items( ????????????????????????, TK_A) end 10 if (XV_fail ? ???????????) or (OTP_fail ? ???????????) or (PID_fail ? ???????????) 11 traitorSAsSSAs = identify_bad_SA_SSA( ???????????, TK_A, TK_B) end Fig. 21 presents a block diagram representation of Algorithm 6, showing the relationship between int inputs, outputs and different functions that are called within the algorithm. The following table provides an overview of the traceback algorithms provided by the IDACS real-time digital forensics suite and what types of attacks they are able to detect: Table 5. Traceback functions and what they detect Traceback function What it detects trace_attack() Root traitor Cust? and other bot Cust? used in attack chain identify_bots() Attacker?s controlled botnet of traitor Cust? identify_compromised_items() Which cryptographic seeds have been leaked, and by whom { Client?, Badge?, Pwd?, PIN? } identify_bad_SA_SSA() Which SAs and SSAs are spoofed bad XV), or traitors (clearing packets with incorrect ???????? or ????????) 35 Fig. 21. Block diagram of "report_and_trace_attack()" When reading the following sections about the different digital forensics functions used by IDACS, it is important to keep the following Property in mind: Property 7 : The different design elements of IDACS (the distribution of PID seeds, the design of Xchain values, the design of security log records, etc.) are carefully crafted to facilitate the real-time digital forensics capabilities of IDACS. Therefore, IDACS is able to provide high-speed forensic services in real-time with minimal overhead. 4.4.1 Log Correlation to Identify Attacking Clients When an attack is detected by IDACS, the real-time digital forensics suite is able to trace the attack to the root attacker by correlating the security log records on IDACS machines. In a fully-realized IDACS system, all data packets (including Client- Client packets such as are used in attack chains) are required to pass through the SA barrier. Even in a less-complete IDACS system, the Client? still maintain security logs for all of the data packets they send and receive. These security logs are the key component to the attacker traceback capability. As an example, consider the detected attack packet log record on the right in Fig. 22. This record was generated for a data packet that was detected to be an attack due to OTP? or PID? failure. IDACS begins from the assumption that this packet was part of an attack chain (Fig. 17), and begins to trace the attack chain back to the root attacker. The trace is based on the log record items TIME, SOURCE/DESTINATION_IP_ADDRESS, PARENT/CURRENT_UA_PID, and CONTENT_PID. The SSA running the trace searches its own logs as well as the logs of other SSAs, SAs, and Clients (if necessary) for the ?parent? packet that directly precedes the detected packet in the attack chain. The trace searches for a packet that was logged before the attack was logged (TIME < 14830.528934) where the IP Address of the machine that sent the attack packet is the same as the destination IP address of the "parent" packet (SOURCE_IP_ADDRESS for the attack packet = DESTINATION_IP_ADDRESS for 36 the "parent" packet); the DB-side data targeted by the Content(PID?) in the ?parent? packet is the same as in the attack packet (CONTENT_PID = 34876105); and the parent UA for the detected attack packet is the same as the current UA for the ?parent? packet (PARENT_UA_PID for the attack packet = CURRENT_UA_PID for the ?parent? packet). Such a "parent" attack packet is detected (left side of Fig. 22). The investigating SSA then compares the CURRENT_UA_PID and PARENT_UA_PID in the "parent" packet record. If they are the same, then the machine at SOURCE_IP_ADDRESS is flagged as the root attacker; if they are different, the machine at SOURCE_IP_ADDRESS is flagged as a traitor Client?, and the traceback continues. Fig. 22. Attack packet traceback to identify root attacker The details of this traceback are show in the "trace_attack()" function defined in Algorithm 7. The SSA executing the traceback receives the log record \TK\ of the detected attack packet as input. The critical trace parameters are extracted from this record: the Cust? who sent TK is marked as the candidate for the attacker (1), and the time (2), Parent UA(PID?) (3), and Content(PID?) (4) are isolated. Additionally, the ?????????????? and ??????????????????????????????? outputs are initialized (5 and 6). Until the source attacker has been identified (7 and 8), the SSA initiates a search of all SSA, SA, and Client logs (10) searching for a "parent" packet logged before the current attack packet with parameters on the Destination IP address, Current UA(PID?), and Content(PID?) (9). If the F-box(Rtrv) transform (10) returns no hits on the "parent" packet (11), the traceback has failed to detect the root attacker; however, a partial list of bots used in the attack chain can be returned (12). If a "parent" packet with matching Current UA(PID?) and Parent UA(PID?) is discovered (13), then the Cust? that sent this packet is identified as the root attacker (15). Additionally, since this packet was used in an attack chain, this packet?s type is flagged as suspicious (16) and is used in Algorithm 8 to identify candidate members of the attacker?s controlled botnet. Otherwise, the Client that sent the "parent" packet is marked as one of the bots controlled by an attacker (17), the search parameters are reset based on the "parent" packet (18 to 21), this packet?s type is flagged as suspicious (16), and the search continues (8). Once the root attacker has been identified, 37 the function returns the identity of the root attacker, the bots that were identified in the attack chain, and the suspicious packet types (23). Algorithm 7. trace_attack() inputs: \TK\ outputs: sourceAttacker, ??????????????, ?????????????????????????????? 1 sourceAttacker = Cust? ? \TK\ 2 sourceT me = current time 3 sourceParentUAPID = Parent_UA(PID?) ? \TK\ 4 sourceContentPID = Content(PID?) ? \TK\ 5 ?????????????? = null 6 ??????????????????????????????? = null 7 sourceFound = false 8 while (sourceFound == false) 9 ???????????????? = {time ? \Ticket?\ < sourceTime, destination_IP ? \Ticket?\ = = sourceAttacker, Current_UA(PID?) ? \Ticket?\ == sourceParentUAPID, Content(PID?) ? \Ticket?\ == sourceContentID} 10 source\Ticket?\ = F-box(Rtrv, S ????, S ??????, ????????????????) 11 if(source\Ticket?\ == null) 12 fail; return {null, ??????????????} 13 else if((Parent_UA(PID?) ? source\Ticket?\) == (Current_UA(PID?) ? source\Ticket?\)) 14 sourceFound = true 15 sourceAttacker = Cust? ? source\Ticket?\ 16 ??????????????????????????????? = F-box(Concat, ???????????????????????????????, packet_type(source\Ticket?\)) else 17 ?????????????? = F-box(Concat, ??????????????, Cust? ? source\Ticket?\) 18 sourceAttacker = Cust? ? source\Ticket?\ 19 sourceTime = time ? source\Ticket?\ 20 sourceParentUAPID = Parent_UA(PID?) ? source\Ticket?\ 21 sourceContentID = Content(PID?) ? source\Ticket?\ 22 ??????????????????????????????? = F-box(Concat, ???????????????????????????????, packet_type(source\Ticket?\)) end end 23 return { sourceAttacker, ??????????????, ??????????????????????????????? } 4.4.2 Log Correlation to Identify Traitor Client Botnet Immediately following the call of "trace_attack()" to identify the root traitor Client and the traitor Client bots in the attack chain, the SSA running the real-time digital forensics suite will run the "identify_bots()" function to identify all traitor Client bots controlled by the attacker, even those in a dormant state. In the example shown in Fig. 22, it was seen that the root traitor Client and other traitor Clients in the attack chain used "Remote Terminal" packets to carry out attack activities. Therefore, the digital forensics suite designates "Remote Terminal" packets, especially those sent by the root Traitor Client, to be suspicious. The SSA running the real-time digital forensics suite searches through the security log records of all SAs, SSAs, and Clients to identify suspicious network traffic. In the example shown in Fig. 23, the digital forensics suite searches for "Remote Terminal" 38 packets in the security logs that were sent by the root traitor Client (SOURCE_IP_ADDRESS = 75.128.32.146). This search yields a number of Clients that are strong candidates for being Traitors. This list of potential Traitor Clients is added to those identified in the attack chain in "trace_attack()"; these Clients may be quarantined and examined in-depth until they can be healed and returned to service. It should be noted that this algorithm is capable of detecting dormant traitor Clients even beyond those that were used in the attack. Fig. 23. Traitor Client botnet detection using Security Log correlation Algorithm 8 details the "identify_bots()" algorithm. The function receives the identity of the root Traitor Client, any suspicious packet types as determined by the "trace_attack()" algorithm, and ???? and ?????? as inputs. It creates a set of parameters (1) to be the input to the F-box(Rtrv) transform; the packets sought by the F-box(Rtrv) transform must be among the suspicious packet types and must have originated from the attacker detected by "trace_attack()". The function then loops through every SA (2) and every SSA (5), searching for log records that meet the search parameters (3 and 6). For any security log records that match the conditions, the Cust? that were targeted by those packets are added to the list of possible traitor Clients controlled by the attacker (4 and 7). 39 Algorithm 8. identify_bots() inputs: attacker-Cust?, ??????????????????????????????, ???? , ?????? outputs: ??????????????????????? 1 ???????????????? = { packet type(\Ticket?\) ? ??????????????????????????????, source IP(\Ticket?\) == source IP(attacker-Cust?)} 2 for index=1 to ? 3 temp = F-box(Rtrv, SSA?, ????????????????) 4 ??????????????????????? = F-box(Concat, ???????????????????????, destination_IP ? temp) end 5 for index=1 to ? 6 temp = F-box(Rtrv, SSSA?, ????????????????) 7 ??????????????????????? = F-box(Concat, ???????????????????????, destination_IP ? temp) end 8 return ??????????????????????? 4.4.3 PID/OTP Correlation to Identify Client-Side Security Items The space-time separated and jointly evolving relationship built into IDACS can be used to assist the real-time digital forensics capabilities. The elements of ???????? are calculated based on seeds drawn from Client?, Badge?, Pwd?, and PIN?. However, not every PID? needs to draw seeds from each of these sources; the locations of seeds used to calculate individual PID? can be tailored to meet the security requirements of the system. In addition to the distributed storage of the seeds, certain bits are removed from the seeds themselves and stored in a physically different location. These bits are called Xbits; they will be discussed in more detail ?5.2.1 Separation of Encryption Keys?. The Xbits are stored on the SAs in the IDACS Network, and they must be recombined with the cryptographic seeds to correctly calculate the elements of ????????. Fig. 24 demonstrates how different PID? can be calculated using different combinations of cryptographic seeds and xbits; Type A PID? are calculated using only seeds from Client?, Type B PID? are calculated using only seeds from Client? and the associated Xbits, Type F PID? are calculated using seeds and associated Xbits from both Client? and PIN?, etc. This division accomplishes two purposes. First, separating the seeds across different locations increases the complexity for an attacker to correctly construct ????????, as discussed previously. Second, a seed separation and combination such as indicated in Fig. 24 can be used as a forensics tool. If an attack is detected by an SA or SSA due to ???????? failure, an analysis of which PID? failed and which PID? were formed correctly can indicate which Client-side items and Network-side SAs or SSAs are Traitors or have had their memory 40 compromised. For example, if a Type D PID? was formed correctly, it may be assumed that the attacker owns the seeds derived from PIN? as well as the associated Xbits; if a Type F PID? was formed correctly, it may be assumed that the seeds stored on Client? and derived from PIN? as well as the associated Xbits are owned by the attacker. Fig. 24. Space-separated combinations of seeds used to calculate PID? An example of the ?identify_compromised_items()? algorithm corresponding to Fig. 24 is shown in Algorithm 9. The Cust? that sent the detected attack packet TK1 is marked as a Traitor (1), with the Client?, Badge?, Pwd?, and PIN? contained in Cust? and the SAs storing their corresponding Xbits all possibly controlled/cloned by an attacker. Based on what type of PID? were formed correctly in the attack packet TK1, certain elements of Cust? and ???? are marked as Traitor (4 ? 11). The following properties hold true for this forensics capability: Property 8 : Due to the seed distribution and PID? formation (as discussed generically in Property 7, the checks performed in "identify_compromised_items()" are performed very quickly with very little overhead. Property 9 : The seed distribution shown in Fig. 24 is very flexible, and can be adjusted on a per-Client basis to meet the security needs of the particular Client or IDACS implementation. Algorithm 9. identify_compromised_items() inputs: ?????????????????????????, TK_A outputs: traitor_items 1 traitor_C = Cust? ? TK_A 2 traitor_items = null 3 switch (PID? ? ?????????????????????????) 4 case Type A, B, E, F, or N: traitor_items = F-box(Concat, traitor_items, Z ? compromised_C) 5 case Type C, D, E, F, or G: traitor_items = F-box(Concat, traitor_items, X ? compromised_C) 6 case Type G, H, I, J, or K: traitor_items = F-box(Concat, traitor_items, W ? compromised_C) 7 case Type J, K, L, M, or N: traitor_items = F-box(Concat, traitor_items, Y ? compromised_C) 8 case B, E, or F: traitor_items = F-box(Concat, traitor_items, SA storing xbits for Z ? traitor_C) 41 9 case D, F, or G: traitor_items = F-box(Concat, traitor_items, SA storing xbits for X ? traitor_C) 10 case I, J, or K: traitor_items = F-box(Concat, traitor_items, SA storing xbits for W ? traitor_C) 11 case K, M, or N: traitor_items = F-box(Concat, traitor_items, SA storing xbits for Y ? traitor_C) end 12 return traitor_items 4.4.4 TK Correlation to ID Traitor SA/SSAs Fig. 25 graphically illustrates the handling of the XV used in the ?run_auth_chain()? algorithm outlined in Algorithm 4. Multiple iterations of Algorithm 4 are called by Algorithm 1 (7 and 8) in order to form the complete authentication chain of SAs and SSAs, as shown in Fig. 26. The iterations are linked, with the SA2 and SSA2 of one iteration becoming the SA1 and SSA1 of the next iteration (Fig. 25 and Fig. 26). In a given iteration, SA1 and SSA1 perform OTP? and ???????? checks and generate XV, while SA2 and SSA2 verify the XV to verify the identity of SA1 and SSA1. At SA2, if the XV2 verification fails, it indicates that SA1 is being by an attacker; if the XV3 verification fails, this indicates that SSA1 is being spoofed by an attacker. In the same way, at SSA2 if the XV5 verification fails, it indicates that SSA1 is being spoofed; if XV6 fails, it indicates that SA2 is being spoofed. These relationships are discussed more extensively in ?3.2.4.2 IDACS Network-side Algorithms". Additionally, each SA2 and SSA2 in a given iteration of Algorithm 4 are the SA1 and SSA1 for the next iteration, and will also be performing OTP? and ???????? checks in the next iteration. If an SA or SSA finds that the ???????? fails the check, but a previous SA or SSA indicated that the ???????? had passed the check (by passing Ticket? along the authentication chain), then this is highly indicative that the previous SA or SSA is a traitor (having passed verifiably bad ????????). In the same way, if an SA? finds that OTP? fails the check, but a previous SA passed Ticket? along the authentication chain, this may be indicative that the previous SA is a traitor. This cannot be directly verified, since only the previous SA possesses the seeds to verify his OTP; however, the forensics engine can be configured on the assumption that it is statistically unlikely that the seeds to correctly calculate a given OTP could be obtained without obtaining the seeds for all OTPs. 42 Fig. 25. Mutual Authentication in authentication chain Fig. 26. Cascading the run_auth_chain() algorithm Due to the design of ????????, ????????, and the XV relationships, traitor or spoofed SAs or SSAs can be detected and isolated very quickly. Algorithm 10 illustrates how the digital forensics suite performs this detection. Property 10 : The relationships between the XV values (as discussed generically in Property 7) and also ???????? and ???????? are carefully designed to allow traitor or cloned SAs/SSAs to be detected quickly in real time with very little overhead. Algorithm 10. identify_bad_SA_SSA() inputs: ???????????, TK_A, TK_B o tputs: compromised_machines 1 traitor_spoofed_machines = null 2 sourceA = origin of TK_A 3 sourceB = origin of TK_B 4 if (TK_A_XV_fail ? ???????????) OR (TK_A_PID_fail ? ???????????) OR (TK_A_OTP_fail ? ???????????) 5 traitor_spoofed_machines = F-box(Concat, traitor_spoofed_machines, sourceA) end 6 if (TK_B_XV_fail ? ???????????) OR (TK_B_PID_fail ? ???????????) OR (TK_B_OTP_fail ? ???????????) 7 traitor_spoofed_machines = F-box(Concat, traitor_spoofed_machines, sou rceB) end 8 return traitor_spoofed_machines 4.5 Simulations 4.5.1 Attack Traceback Time Simulations The simulations discussed in "3.4 Simulations" were also used to simulate the attack traceback time. When an attack was detected (i.e. (4), (11), (17), or (23) in Algorithm 4), this point in time was recorded as T1. After the attack had been reported to an SSA and the traceback to identify the root attacker was completed (i.e the "trace_attack()" algorithm called at (7) 43 of Algorithm 6 completes), this point in time was recorded as T2. After the attacker's botnet was identified (i.e. the "identify_bots()" function called at (8) of Algorithm 6 completes), this point in time was recorded as T3. The statistics of interest in this situation were the (T2 - T1) time and the (T3 - T1) time. The (T2 - T1) time is termed the "Attack Traceback Time", since it represents the time it takes for the root attacker to be identified after the attack is detected. The (T3 - T1) time is termed the "Botnet Detection Time", since it represents the time it takes for the root attacker's botnet to be detected after the attack is detected. It should be noted here that the "Botnet Detection Time" will always be greater than the "Attack Traceback Time", since (T3 - T1) = (T2 - T1) + (T3 - T2). The IDACS Network measured in Fig. 27 through Fig. 29 consisted of 4 "partially compromised" SAs and either 1 or 2 "partially compromised" SSAs. Fig. 27 shows the average attack traceback time for an IDACS network with 1 SSA. Fig. 27 shows that the attack traceback for this simulated IDACS network is extremely fast, with both the root attacker and its botnet identified in less than 3.5 milliseconds even for a network of 100,000 Client Devices. Additionally, the attack traceback time grows logarithmically with the network size; this is because the simulation uses log2(x) to calculate security log search times, since there currently exist search algorithms that are better than log2(x). Because the attack traceback time is so short, an IDACS Netowork can alert a system administrator and begin network healing procedures before the attacker even realizes that the attack has been detected. Fig. 27. Attack Traceback Time An additional benefit of the IDACS system is that attack traceback time can be improved by scaling the network. Fig. 28 and Fig. 29 show the Attack Traceback Time and Botnet Detection Time for an IDACS network, one with 1 SSA, and one with 2 SSAs. These figures show that the traceback times can be dramatically improved by expanding the network side of the IDACS system; this is because the traceback duties are spread across multiple SSAs, resulting in a lighter workload for each machine. These findings provide significant incentive to expand fielded IDACS systems. 44 Fig. 28. Attack Traceback Time, 1 SSA vs. 2 SSAs Fig. 29. Botnet Detection Time, 1 SSA vs. 2 SSAs 4.5.2 Attack Detection, Traceback, and Remediation Simulations In addition to the simulations described in "3.4 Simulations", a second simulation suite performed for this research addresses the effects of the attack traceback combined with quarantine and healing for Byzantine traitor agents. Given an IDACS network under attack by parties that are able to steal Client authentication items and turn SAs and SSAs into traitors, how well will the attack traceback protect the IDACS datacenter from illegal (no permission) access? The simulation results presented here attempt to answer that question. 4.5.2.1 Assumptions for Simulation In order to fully test the capabilities of the IDACS network against real-world threats and attacks, it was necessary to construct attack scenarios based on the latest and most lethal real-world attack vectors. Therefore, these simulations were carried out under the assumption that all attempts to hack SAs and SSAs and turn them into traitors would be accomplished using zero-day attacks. Since zero-day attacks have not been previously observed, it is virtually impossible to defend against them, and they will almost always be successful. Additionally, through the use of metamorphic evolution techniques, it is possible to generate endless variants of these zero-day turn-traitor-attacks, each of which has a unique signature. This method can be used to defeat security systems that use signature-based scans to detect known turn-traitor-attacks. Since both of these attack methods are widely in use today, they will both be considered in this simulation. This simulation is based on a number of assumptions, each of which is justifiable according to real-world conditions. The first assumptions on which this simulation is based are as follows: Assumption 10: Previously unobserved zero-day turn-traitor-attacks used to gain control over network machines will require a relatively long time (weeks) for a patch that successfully secures that attack?s entry point to be issued. Assumption 11: A zero-day attack used to gain control over network machines, once detected and analyzed, can be blacklisted with a signature-scanning security system very quickly. A zero-day turn-traitor-attack or a metamorphic variation thereof, once detected and blacklisted, will be detected and blocked 100% of the time thereafter. 45 Assumption 10 is reasonable; it is seen to be true across almost all computer security vulnerabilities that are being discovered today. Assumption 11 reflects the strengths of signature-based scanners, although their strength may be slightly overstated in order to simplify this simulation. Based on Assumption 11, it follows that attacker behavior will reflect this reality. Assumption 12: A zero-day turn-traitor-attack or a metamorphic variation thereof, once detected and blocked, will not be reused by the attacker. Based on the relative importance of Cust?, ????, and ?????? in IDACS, they are accorded different levels of protection against theft (Cust?) or outside zero-day turn-traitor-attacks ( ???? and ??????). Cust? are used by human users in the field, so the elements of Cust? (Client?, Badge?, Pwd?, and PIN?) are (relatively) easy to steal, although it may be difficult to steal a complete set. On the other hand, ???? and ?????? reside inside protected network datacenters, so they are relatively more difficult to gain control over. Thus, the assumptions: Assumption 13: Completely turning a Client into a traitor (with access to Client?, Badge?, Pwd?, and PIN?) through theft or coercion is difficult. An exception would be in an active battlefield scenario, where a number of human users (soldiers) could be captured and coerced into turning over all of the elements of Cust?. Assumption 14: Cust? are easier to turn into traitor bots than SAs, and SAs are easier to turn into traitor bots than SSAs. Finally, any attacker, being intelligent and wishing to maximize his chances of success, will not launch attempts to access the IDACS datacenter until he has a certain chance of success. Thus, Assumption 15: An attacker will not launch access-DB-attacks against the IDACS datacenter until he controls a certain number of traitor Cust?, SA? and SSA?. 4.5.2.2 Simulation Parameters This simulation was implemented in MATLAB, and examined an IDACS network consisting of 500 Cust?, 40 SAs, 20 SSAs, and 10 DB (reference Fig. 3). Unlike the first simulation, this simulation did not consider the details of network transmission speeds or packet processing times. All packets are considered to be transmitted from one machine to another in one clock cycle, and all packets are processed in one clock cycle, with one clock cycle per queued packet. The simulation consists of two phases. In Phase 1, the attacker uses turn-traitor-attacks to build a botnet of traitor Cust?, SAs, and SSAs for use in IDACS access-DB-attacks. According to Assumption 15, the attacker builds a botnet consisting of traitor SAs equaling 60% of all SAs in IDACS, and traitor SSAs equaling 60% of all SSAs in IDACS (15% of the traitor SAs and SSAs were spoofed machines). Additionally, the attacker builds a botnet consisting of four traitor Cust? for each traitor SA and SSA (this number was experimentally determined to provide a sufficient number of traitor Cust? to launch a sufficient number of access-DB-attacks for the duration of the simulation). In accordance with Assumption 13, this simulation is assumed to 46 represent an active battlefield situation, so 15% of the traitor Cust? have full access to their authentication credentials (Client?, Badge?, Pwd?, and PIN?). Once a sufficient number of bots have been obtained, the attacker launches Phase 2. In Phase 2, the attacker sends a burst of a high number of access-DB-attacks. The logic behind the burst is that an attacker maximizes his chance of successful datacenter accesses if he sends them quickly; detected and prevented access-DB- attacks will result in the detection and quarantine of traitor Cust?, SAs, and SSAs. By sending a burst of access-DB-attacks, the attacker makes full use of these bots before they are detected and quarantined, and the advantage gained from Assumption 15 begins to slip away. During Phase 2, access-DB-attacks are launched at an average rate of one attempt per 15 clock cycles (the actual start times of the access-DB-attacks are randomized using a normal distribution over the complete period of Phase 2). If an illegal data center access is detected and traced to one or more traitor machines based on the methods discussed in ?4.4 Attack Traceback?, that machine is quarantined (removed from the IDACS network) and healed over a period of 100 clock cycles, and then returned to IDACS as a loyal Cust?, SA? or SSA?. During Phase 2, the attacker continues to attack Cust?, SA? and SSA? and turn them into traitors, thus replenishing the botnet even as bots are detected and quarantined. In accordance with Assumption 14, one new Cust? is turned traitor every 150 clock cycles, one new SA? is turned traitor every 300 clock cycles, and one new SSA is turned traitor every 600 clock cycles (this is on average; the actual times the machines turn traitor are randomized over the period of Phase 2 using a normal distribution). In accordance with Assumption 15, if the percentage of traitor SAs or SSAs in IDACS fell below 10%, the attacker stopped launching access-DB-attacks until both of those numbers rose above 10%. As long as there was any traitor Cust? available, access-DB-attacks would continue. All traitor Cust? without access to complete authentication credentials (Client?, Badge?, Pwd?, and PIN?) launched access-DB-attacks against data/services that particular Cust? had permissions to access (the access-DB-attack was illegal due to incorrect ??????? or ???????), but all traitor Cust? with access to complete authentication credentials launched access-DB-attacks against data/services that particular Cust? did not have permissions to access (since a traitor Cust? with access to complete authentication credentials is indistinguishable from a loyal Cust?, access of data/services for which that Cust? has correct permissions cannot be detected; therefore, this situation was not addressed in this simulation). In this simulation, the botnet-building activity of Phase 1 was compressed into a period of 100 clock cycles (in reality, this botnet building could occur in a ?low-and-slow? turn-traitor-attack strategy over the course of weeks or months). Phase 2 activity was simulated over a period of 4000 clock cycles. The length of both the approach and return authentication chains was 4 (N=4), as would be expected in a fielded IDACS implementation. 47 In order to address the question of zero-day turn-traitor-attacks with metamorphic variants, the simulation was divided into three scenarios. In Scenario 1, whenever an access-DB-attack is detected and prevented, one or more traitor Cust?, SA? or SSA? is identified. This traitor is quarantined and healed, but no attempt is made to analyze the zero-day attack used to turn that machine into a traitor. Therefore, the same zero-day turn-traitor-attack can be used again to turn other machines into traitors during Phase 2. In Scenario 2, there are 20 different zero-day turn-traitor-attacks used to turn machines into traitors. When a traitor machine is identified, the IDACS forensics suite analyzes the zero-day turn-traitor-attack that was used to turn this machine into a traitor. A signature for the zero-day turn-traitor-attack is identified and added to each Cust?, SA? and SSA??s blacklist in accordance with Assumption 11, and will not be successful in turning any more machines into traitors during Phase 2. Therefore, the attacker will stop using that zero-day turn-traitor-attack according to Assumption 12. In Scenario 3, the attacker begins with 20 different zero-day turn-traitor-attacks and 20 metamorphic variants of each zero-day attack. Each analyzed and blacklisted metamorphic turn-traitor-attack variant will no longer be used, but many other metamorphic variants will be available. In short, Scenario 1 represents the "simplified case" situation, Scenario 2 represents the "best case" situation, and Scenario 3 represents the "realistic case" situation. Results from these three Scenarios are presented in the following section. 4.5.2.3 Simulation Results Each of these three scenarios was simulated 10 times, and the results averaged together. This was done to gain a better view of broad trends and mask random variations in different simulation runs. The purpose of this simulation was to demonstrate how well IDACS could protect the datacenter from illegal access. This can be measured in two ways. First, the number of successful illegal accesses over the period of the simulation indicates the success of the IDACS defense. Second, the number of undetected traitor Cust?, SAs, and SSAs remaining in IDACS over the course of the simulation demonstrate how effectively IDACS is detecting, quarantining, and healing traitor machines. Fig. 30 shows the percentage of the active SAs and SSAs in IDACS that are traitors for Scenario 1, or the "simplified case" situation (because there is no attack analysis or blocking in this scenario). Note that this graph counts only SAs and SSAs that are "active", i.e. not under quarantine; this measurement was chosen to reflect the network situation faced by an attacker trying to pass an illegal datacenter access through IDACS. After the botnet-building period represented by the first 100 clock cycles of the simulation, access-DB-attacks commence, leading to the discovery and quarantine of traitor SAs and SSAs. Initially both traitor SAs and SSAs are detected at a high rate, leading to the increased "sanitation" of the IDACS Network. When the percentage of traitor SAs in the system drops below the 10% mark, access-DB-attacks are no longer launched until additional SAs are turned into traitors; since fewer access-DB-attacks are being launched, it leads to traitor SSAs being identified 48 at a lower rate (since the "first line" of SAs will trigger illegal access detection, leading to fewer SSAs having the opportunity to be detected), and with the slow addition of new traitor SSAs, the percentage of traitor SSAs rises to a slightly higher level. The percentage of both traitor SAs and SSAs stabilizes to a relatively low equilibrium point. Fig. 30. Percentage of IDACS Network Machines controlled by Attacker over the Scenario 1 simulation Fig. 31 shows the percentage of active SAs and SSAs that are traitors for Scenario 2, or the "best case" situation (because there is attack analysis and blocking with a limited number of unique zero-day turn-traitor-attacks, so eventually no more new SAs or SSAs can be turned into traitors). As in Scenario 1, the percentage of traitor SAs and SSAs both drop as traitors are identified and quarantined. Once all 20 unique zero-day turn-traitor-attacks have been analyzed and blocked, no new SAs or SSAs can be turned into traitors; therefore the percentage of traitor SAs and SSAs both drop to levels below the 10% cutoff, and remain level as no new traitors are added and no more traitors are identified (since access-DB-attacks are not launched below the 10% cutoff). Scenario 2 demonstrates better performance than Scenario 1; however, the scenario most reflective of a real-world situation is Scenario 3. Fig. 31. Percentage of IDACS Network Machines controlled by Attacker over the Scenario 2 simulation 49 Fig. 32 shows the percentage of active SAs and SSAs that are traitors for Scenario 3, or the "realistic case" situation (because there is attack analysis and blocking, but with metamorphic turn-traitor-attack variants, there is a large supply of new, unidentified turn-traitor-attack variants). Since this scenario gives the attacker a large number of attack variants, the performance of Scenario 3 in Fig. 32 is similar to the performance of Scenario 1 in Fig. 30. However, some detected zero-day turn-traitor-attack variants are re-used before the attacker realizes they have been detected and blocked, and are subsequently blocked by IDACS; therefore, the performance in Scenario 3 is somewhat better than Scenario 1. This can be observed in the lower equilibrium point of percentage of traitor SSAs towards the end of the simulation period. Since Scenario 3 most closely reflects the "real-world" situation faced by a fielded IDACS system, Fig. 32 demonstrates that IDACS will provide an exceptional defense against an attacker with a botnet of Byzantine SAs and SSAs at his disposal. Fig. 32. Percentage of IDACS Network Machines controlled by Attacker over the Scenario 3 simulation Fig. 33 shows the percentage of active Cust? that are controlled by the attacker across the simulation for the different Scenarios. This graph shows that Scenario 3 performs better than Scenario 1 for "sanitizing" the set of Cust?; this is because as zero-day turn-traitor-attack variants are detected and blocked, but still re-used before the attacker is aware they have been blocked, fewer new Cust? are being tuned into traitors. Additionally, since more traitor SAs and SSAs are added in Scenario 3 than in Scenario 2, more access-DB-attacks are launched, allowing more traitor Cust? to be detected in the end. This graph also demonstrates that a real-world IDACS system will have excellent performance in identifying and sanitizing traitor Cust?. 50 Fig. 33. Percentage of IDACS Customers controlled by Attacker over the simulation Fig. 34 shows the average number of access-DB-attacks that were successfully passed through IDACS with the help of traitor SAs and SSAs. This chart shows that the most successful time period for the attacker is when he has the maximum number of traitor SAs and SSAs at his disposal (as expected), but that the success rate drops to almost zero as more traitor SAs and SSAs are identified and quarantined. Since Scenario 3 is most representative of a real-world fielded IDACS system, the performance results shown in this graph are encouraging. "5.3 Simulations" will present additional simulations demonstrating how confidential data stored in IDACS can be protected even in the presence of a low number of successful illegal datacenter accesses. Fig. 34. Average Number of Successful Illegal Datacenter Access Attempts over the Simulation Time Period 51 5 IDACS Protection of Encrypted Data 5.1 Introduction In the age of WikiLeaks and the rise of insiders leaking confidential information, it is critical to provide data networks with defenses against this sort of malicious insider behavior. Additionally, with employees losing laptops and mobile devices containing proprietary information with alarming frequency, it is important to minimize the effects of memory-scraping and unauthorized information access. IDACS leverages the space-time separated and jointly-evolving relationship to defend against these types of leaks of at-rest data. It also provides detection and accountability for the sources of data leaks. By separating encrypted data into pieces that are useless by themselves and storing them in separate and time-changing locations, IDACS can greatly increase the security of stored data. This section introduces the principles and methods by which IDACS provides this data security, and it will provide proofs for the mathematical strength of these methods. Additionally, simulations will demonstrate the real-world effectiveness of such a system, even in the presence of a high number of insider traitors. 5.2 Space-Time Protection of Encrypted Data 5.2.1 Separation of Encryption Keys Definition 23 discusses the concept of cryptographic seeds Seed? and explains that Seed? are spread across different locations (i.e. Seed??Client?, ???????Client?, etc.). Algorithm 2 and Algorithm 3 demonstrate how these Seed? are collected from across these locations and combined in order to calculate ???????? and ????????. The cryptographic keys used to encrypt data residing on Client?, similarly to these Seed?, are split into pieces and spread across different locations. However, these cryptographic keys have an additional space-separation protection; certain bits are removed from these cryptographic keys and stored on the SAs in the IDACS Network. In order to decrypt data residing on Client?, these bits must be retrieved and reassembled with the cryptographic keys. These bits may be formally defined: Definition 35: The cryptographic keys used to encrypt data residing on Client? have a certain number of bits removed and stored in a different location. These bits are termed Xbits, corresponding to the relevant Cust?. When the Xbits are removed from the cryptographic keys, one bit is removed from each byte of the cryptographic key (Fig. 35). These bits are removed from pseudo-random locations in each byte; the locations from which bits are removed (and conversely, the locations where the Xbits should be reinserted when the cryptographic key is being reassembled) are calculated based on the value of SCust?; the locations of the removed Xbits will be different for each cryptographic key. As a consequence of this 52 arrangement, an attacker who manages to steal the contents of Cust? will still be unable to decrypt data residing on Client? without retrieving the corresponding Xbits from the IDACS Network side. Fig. 35. Xbits removed from the cryptographic keys Each time the cryptographic key is used for encryption or decryption, Client? reforms the cryptographic keys and calculates new Xbit locations base on the updated version of SCust?. As such, an attacker who manages to derive the Xbit insertion locations for a given cryptographic key at time t will not possess the correct Xbit insertion points at a later time t? after SCust? has been adjusted. Thus, the space-separated time-evolving relationship is used to protect the integrity of cryptographic keys. This difficulty faced by the attacker is summarized in the following Theorem, which is proved in ?5.2.4 Mathematical Proofs?. Theorem 3 : An attacker who possesses a cryptographic key and the corresponding Xbits, but does not possess the SCust? necessary to determine the Xbit insertion points, faces an NP-complete problem to determine the Xbit insertion points. 5.2.2 Separation of Encrypted Data 5.2.2.1 Concept The IDACS system is designed to store and protect data or services in the IDACS datacenter. However, this design can also be used to help protect data stored on Client devices. Data stored on Client? is encrypted using encryption keys stored across multiple locations (Client?, Badge?, Pwd?, and PIN?); this guarantees that an attacker must have access to all of these items in order to decrypt the data. Additionally, pieces of the ciphertext are removed and stored in the IDACS datacenter (Fig. 36). These pieces must be retrieved from the datacenter in order to correctly decrypt the data stored on Client?. Definition 36: When encrypted data is stored on Client?, segments of data are removed from the ciphertext and stored in a physically different location. These removed segments are called Xslices. All of the Xslices that are removed from a Client-side ciphertext are stored in the IDACS datacenter (Fig. 36). The Xslices may be stored in a single contiguous block (Storage Method 1), or they may be split and stored across multiple locations (Storage Method 2). 53 Fig. 36. Xslices removed from Client-side ciphertext and stored in IDACS datacenter Fig. 37. Multiple layers of encryption In order to decrypt data stored on Client?, one must have access to the ciphertext stored on Client?; all of the locations storing pieces of the encryption keys (Client?, Badge?, Pwd?, and PIN?); the Xbits for the encryption keys, which are stored across multiple SAs in IDACS; and the Xslices that are stored across multiple DB in the IDACS datacenter. Additionally, each time the data stored on Client? is decrypted to be viewed, the value of SCust? is updated, and the data is re-encrypted with new cryptographic keys that have new Xbits, and new Xslices are removed from the ciphertext and stored at new locations in the datacenter. By combining space-separation and time-evolving charactersitics, this IDACS encryption scheme can achieve a much higher level of security than simple encryption. As a final security measure, the encryption/Xbits/Xslices can be applied in multiple layers to protect high-sensitivity data (Fig. 37). In addition to providing more complex security, this provides additional options for space-time evolving protections. At certain time intervals, the top level of encryption can be re-processed with new encryption keys/Xbits/Xslices, while the lower levels are left untouched; in this way, the security is time-evolving with a minimum of effort. 5.2.2.2 Implementation The location and length of the Xslices in the ciphertext are pseudorandom; they are calculated based on SCust?, according to Definition 37 and Definition 38. This pseudorandomness contributes to the strength of the IDACS encryption, as addressed in Theorem 4 through Theorem 6. Definition 37 : Given SCust?, a block of ciphertext to have Xslices removed, and the PID of that data block, the F-box(data- block-offset) transform returns the length between the beginning of the block of data or the end of the previous xslice, and the beginning of the next xslice. This transform also updates SCust? so that the next call to the transform will return the length of the next sub-block. This transform must produce the same sequence of lengths for consecutive 54 transform calls for a given data block PID after SCust? has been reinitialized, so that data blocks may be disassembled and reassembled. The sub-block lengths are determined based on the cryptographic hash of secret seeds stored in SCust?. This transform is represented by local_data_block_length = F-box(Offset, SCust?) Definition 38 : Given SCust?, a block of data to have xslices removed, and the PID of that data block, the F-box(xslice-length) transform returns the length of the next xslice to be removed from the data block. This transform also updates SCust? so that the next call to the transform will return the length of the next xslice. This transform must produce the same sequence of lengths for consecutive transform calls for a given data block PID after SCust? has been reinitialized, so that data blocks may be disassembled and reassembled. The xslice lengths are determined based on the cryptographic hash of secret seeds stored in SCust?. This transform is represented by local_xslice_length = F-box(XLth, SCust?) Fig. 38. Using F-box(Offset) and F-box(XLth) transforms to remove Xslices Fig. 38 demonstrates how these transforms are used to divide the ciphertext into a block of Xslices and a block of ciphertext. This process is formalized in Algorithm 11, with the addition of another F-box transform defined in Definition 39. Finally, the entire multiple-layer encryption process illustrated in Fig. 37 is formalized in Algorithm 12, which also references Definition 40. Definition 39 : Given an input string or byte array, the F-box(substring) transform returns a substring or sub-array based on specified indices. The transform is represented by local_xslice = F-box(SString, data_block, offset, length) 55 The "data_block" parameter is the input string or byte array, the "offset" parameter is the index indicating where in "data_block" the desired substring begins, and the "length" parameter indicates the length of the desired substring. Definition 40 : Given a block of data and SCust?, the F-box(encrypt) transform encrypts the block of data using encryption keys provided by SCust? and returns the ciphertext along with the updated version of SCust?. This transform is represented by { SCust?, ciphertext } = F-box(Encrypt, SCust?, data_block) Algorithm 11. remove_xslices() inputs : SCust?, data_block outputs : SCust?, data_block?, xslices 1 pointer = index of 1st byte of data_block 2 xslices = empty string 3 data_block? = empty string 4 while (pointer < data_block.length) do 5 local_data_block_length = F-box(Offset, SCust?) 6 local_xslice_length = F-box(XLth, SCust?) 7 local_data_block = F-box(SString, data_block, pointer, local_data_block_length) 8 local_xslice = F-box(SString, data_block, [pointer + local_data_block_length], local_xslice_length) 9 xslices = F-box(Concat, xslices, local_xslice) 10 data_block? = F-box(Concat, data_block?, local_data_block) 11 pointer = pointer + local_data_block_length + local_xslice_length end Algorithm 12. encrypt_data() inputs : SCust?, data_block, num_layers outputs : SCust?, ciphertext, xslices 1 ciphertext = data_block 2 xslices = empty string 3 for index = 1 to num_layers 4 { SCust?, ciphertext } = F-box(Encrypt, SCust?, ciphertext) 5 { SCust?, ciphertext, temp_xslices } = remove_xslices(SCust?, ciphertext) 6 xslices = F-box(Concat, xslices, temp_xslices) end 56 5.2.2.3 Theoretical Implications The use of Xslices in the IDACS Client-side data encryption scheme leads to several theoretical implications which demonstrate the security of this encryption scheme. First, consider the situation where the Xslices extracted from a given piece of data?s ciphertext are stored in a contiguous block in a single location (Storage Method 1 in Fig. 36). Alternatively, when Cust? requests these Xslices from the IDACS datacenter, they are returned to Cust? in a single block of data. An attacker who manages to retrieve the ciphertext and the block of corresponding Xslices, but does not possess SCust? and thus cannot perform the F-box(Offset) or F-box(XLth) transforms in Algorithm 11, is faced with the problem of determining 1) where the ciphertext splits for the insertion of Xslices, and 2) where the contiguous block of Xslices splits into individual Xslices. The difficulty of 1) is addressed in Theorem 4, and the difficulty of 2) is addressed in Theorem 5. Theorem 4 : An attacker who possesses a ciphertext block requiring Xslice insertion, but does not possess the SCust? necessary to determine the Xslice insertion points, faces an NP-complete problem to determine the Xslice insertion points. Theorem 5 : An attacker who possesses a block of concatenated Xslices extracted from a ciphertext, but does not possess the SCust? necessary to determine the lengths of and separated individual Xslices, faces an NP-complete problem to separate the individual Xslices. A second situation, where individual Xslices are stored across multiple DB in the IDACS datacenter (Storage Method 2 in Fig. 36), presents a similar problem. Consider an attacker who is able to retrieve all the individual Xslices associated with a certain piece of Client-side data?s ciphertext. However, without access to SCust?, the attacker is unable to determine the correct order in which these Xslices should be arranged for reinsertion into the ciphertext. This problem is addressed in the following Theorem. Theorem 6 : An attacker who possesses all Xslices extracted from a ciphertext, but does not possess the SCust? necessary to determine the order in which these Xslices should be re-inserted into the ciphertext, faces an NP-complete problem to correctly order the individual Xslices. The preceding Theorems are proved in ?5.2.4 Mathematical Proofs?. 5.2.3 Data Segmentation The previous section described how Xslices are used to protect the confidentiality of encrypted Client-side data. This section will examine how the used of distributed Xslices can be combined with data segmentation gain additional security by 57 minimizing the level of decrypted data exposure and minimize the damage caused by an attacker who is able to successfully pass several illegal IDACS datacenter access requests. 5.2.3.1 Single File The standard approach to file encryption and decryption is to decrypt an entire protected file at the time of access. Unfortunately, this exposes the entire contents of the protected file to an attacker who can steal a Client? on which a currently decrypted file is being viewed. The concept of the space-time evolving relationship can be used to minimize this risk. Rather than encrypting a file in a single ?block?, the file can be divided into multiple ?segments? (e.g. one page of the file equates to one segment). A ?navigation? file associated with the encrypted data file is formed; this navigation file contains metadata regarding each segment in the data file, such as where each segment begins and ends in the ciphertext, which Xslices are used in that segment, and where those Xslices are inserted into the ciphertext (Fig. 39). When a user wants to view part of the encrypted file, the contents of the navigation file are presented as a ?Table of Contents?. The user selects the segment he wishes to view, and Cust? requests the Xslices for the specified segment, inserts them into the ciphertext, and decrypts that particular segment. The remaining segments in the file are not decrypted unless they are specifically accessed later. Fig. 39. Data segmentation for a single file Segmenting an encrypted data file in this manner enhances data security in several ways. First, in the event that a Client? being used to decrypt and view data is stolen, the amount of decrypted data residing on Client? is limited. Additionally, if an attacker is able to force a few illegal IDACS datacenter access requests through IDACS, the encrypted data that attacker can recover is limited to a few file segments rather than the same number of files. 58 Segmenting data on the single file-level provides benefits in terms of both security and performance, which are summarized in Table 6. If a single segment of a file is decrypted, the number of Xslices retrieved from the datacenter as well as the amount of decrypted plaintext exposed is less than if the encrypted file were non-segmented. Additionally, the time required to complete this operation is constant (O(1)) rather than linear (O(x)). If all segments of the file are decrypted, then there is no relative advantage over a non-segmented approach. Table 6. Comparison of segmented vs. non-segmented data encryption for file of length x. Non-segmented file Segmented file (decrypting one segment) Segmented file (decrypting entire file) Performance Time to retrieve Xslices Request Xslices: O(1) Send XSlices: O(x) Insert Xslices: O(x) Total: O(x) Request Xslices: O(1) Send XSlices: O(1) Insert Xslices: O(1) Total: O(1) Request Xslices: O(x) Send XSlices: O(x) Insert Xslices: O(x) Total: O(x) Time to decrypt O(x) O(1) O(x) Total time O(x) O(1) O(x) Security Xslices exposed O(x) O(1) O(x) Decrypted plaintext exposed O(x) O(1) O(x) The results shown in Table 6 may be used as justification for the following: Claim 6: Segmenting encrypted files and decrypting and issuing Xslices one segment at a time increases security and performance if one or a few pages are decrypted, but has no effect on security or performance if an entire file is decrypted. 5.2.3.2 File Directory Tree In the previous section, it was shown how data segmentation could be used to protect a single encrypted data file; the same concept can also be used to protect and encrypt a file directory tree. Consider the file directory tree shown in Fig. 40. The different levels of folders in Fig. 40 correspond to the ?navigation? file in Fig. 39; they are files that do not contain actual data, but only pointers to the actual data files. The actual data files that are the leaf nodes of this tree correspond to the ciphertext segments in Fig. 39. In Fig. 40, each Zone is a separate file with its own Xbits and Xslices residing in the IDACS datacenter, and each leaf node data file also contains its own Xbits and Xslices. Using this tiered encrypted File Directory Tree is ideal for a situation where encrypted data is maintained on the Client-side with Xbits and Xslices stored on IDACS Databases. Decrypting the File Directory Tree requires that Xbits and Xslices be retrieved to decrypt each successive Zone. If an attacker is able to pass a few access-DB-attacks, he may be able to gather some information on the structure of the File Directory Tree, but it may not be enough to recover any of the actual data files. Additionally, this structure obfuscates information regarding the size, 59 quantity, and organization of the data files in the File Directory Tree by minimizing the amount of file pointers and file data that are exposed during a single file access, as will be seen in the following example. Obfuscating this information also minimizes the number of targets an attacker can address. Fig. 40. Encrypted File Directory Tree segmented into zones Navigating through the encrypted File Directory Tree is similar to navigating through any file explorer program on a PC. Consider Fig. 41; a user initially possesses encrypted pointer information regarding the ?root? of the File Directory Tree. The user requests the Xbits and Xslices to decrypt the ?root? file residing on Cust?, revealing the contents of Zone 1 (Folder 1). The user then requests the Xbits and Xslices to decrypt the Folder 1 pointer file (Zone 2), revealing the children folders of Folder 1 (Folders 2, 3, and 4). The user proceeds through Zone 5 and Zone 14 to reach the target File 25. Through this process, only information regarding folders and their children along the direct path to the target (File 25) are revealed; information regarding unexplored folders is not revealed to the user. Fig. 41. Retrieving a single data file from the File Directory Tree 60 Table 7 illustrates the performance of a non-segmented File Directory Tree compared to the segmented version. The performance of the segmented version exceeds that of the non-segmented version if a single file is retrieved; however, the performance of the segmented version drops if all of the files in the Directory Tree are retrieved. For its application in IDACS, this tradeoff in performance is considered acceptable in return for the corresponding increase in security, which is demonstrated in Table 8. Table 7. Comparison of performance of segmented vs. non-segmented File Directory Trees containing x data files Non-segmented File Directory Tree Segmented File Directory Tree Time to locate a single file Request File Directory Tree: O(1) Send File Directory Tree: O(x) Decrypt File Directory Tree: O(x) Total: O(x) Request Zone: O(1) Send Zone: O(1) Decrypt Zone: O(1) Repeat for the depth of the File Directory Tree: O(log x) Total: O(log x) Time to decrypt a single file O(1) O(1) Total for a single file O(x) O(log x) Total for all files in File Directory Tree O(x) (because File Directory Tree only needs to be retrieved once) O(x log x) (because potentially entire depth of File Directory Tree must be retrieved for each file) Table 8 compares the security provided (in terms of how much file data and pointers are exposed) for non-segmented and segmented File Directory Trees. If a single file is accessed, the segmented version provides higher security by not exposing the file data and pointers for non-accessed files; of course, this advantage is lost if all of the files in the directory tree are accessed. In either case, the segmented version provides a higher level of security by forcing more authentication and permissions checks by a factor of log x. Since the user must potentially navigate the depth of the File Directory Tree for each file accessed, retrieving Xbits and Xslices from the IDACS datacenter for each zone accessed, the segmented version forces more ???????? and ???????? checks, making a traitor Cust? controlled by an attacker more likely to be detected. Table 8. Comparison of security of segmented vs. non-segmented File Directory Trees containing x data files Non-segmented File Directory Tree Segmented File Directory Tree For a single file access How many files exposed O(1) O(1) How many file pointers (leaf nodes) exposed O(x) (all file pointers exposed) O(1) (only files in folder containing target file) How many folder pointers exposed (by decrypting zones) O(x) (all folder pointers exposed) O(log x) How many authentication/ permissions checks O(1) O(log x) For all files in Directory Tree accessed How many files exposed O(x) O(x) How many file pointers (leaf nodes) exposed O(x) (all file pointers exposed) O(x) (all file pointers exposed) How many folder pointers exposed (by decrypting zones) O(x) (all folder pointers exposed) O(x) (all folder pointers exposed) How many authentication/permissions checks O(x) O(x log x) 61 The results displayed in Table 7 and Table 8 may be taken as justification for the following claims. Claim 7: Segmenting the File Directory Tree and allowing a user to decrypt one zone at a time, as compared to a File Directory Tree system that provides the entire directory tree at once, for a single file access in a tree containing x files, a) Improves the efficiency of retriving a single file from O(x) to O(log x) b) Improves security by reducing the number of exposed file pointers from O(x) to O(1) c) Improves security by increasing the number of required authentication/permissions checks from O(1) to O(log x) Claim 8: Segmenting the File Directory Tree and allowing a user to decrypt one zone at a time, as compared to a File Directory Tree system that provides the entire directory tree at once, for accessing every file in a tree containing x files, a) Reduces the efficiency of retrieving all files from O(x) to O(x log x) b) Improves security by increasing the number of required authentication checks from O(x) to O( x log x) 5.2.4 Mathematical Proofs and Analysis This section provides the mathematical proofs for Theorem 3 through Theorem 6. 5.2.4.1 Theorem 3, Theorem 4, and Theorem 5 First, a short review of the Theorems to be proved: Theorem 3 : An attacker who possesses a cryptographic key and the corresponding Xbits, but does not possess the SCust? necessary to determine the Xbit insertion points, faces an NP-complete problem to determine the Xbit insertion points. Theorem 4 : An attacker who possesses a ciphertext block requiring Xslice insertion, but does not possess the SCust? necessary to determine the Xslice insertion points, faces an NP-complete problem to determine the Xslice insertion points. Theorem 5 : An attacker who possesses a block of concatenated Xslices extracted from a ciphertext, but does not possess the SCust? necessary to determine the lengths of and separated individual Xslices, faces an NP-complete problem to separate the individual Xslices. All three cases represent a ?splitting? problem, where a block of data must be split at certain points in order to re-insert extracted information (Theorem 3 and Theorem 4) or to separate the extracted information into pieces for re-insertion (Theorem 5). In essence, the problem requires the attacker to recreate the sequence of outputs from repeated calls to the F-box(XLth) or F- box(Offset) transforms, as demonstrated in Fig. 38. Although the outputs of these transforms are calculated based on the value of SCust?, in a fielded IDACS system, the lengths of individual Xslices or the data blocks in the ciphertext between Xslices are 62 chosen from a finitely long list of known possible lengths. In the case of Xbit insertion, there are a finite number of possible insertion points in each byte for the corresponding Xbit (8 possible positions). Therefore, solving these problems requires recreating a sequence of numbers (lengths) drawn from a known, finite list. Fig. 42. Xslice/Data Block splitting problem Fig. 43. Xbits splitting problem Fig. 42 and Fig. 43 demonstrate these problems graphically. To solve these problems, one member from each column must be selected, with the final sequence of selections representing the actual division of Xslices/Data Blocks or insertion points for Xbits. These problems can be represented in terms of graph theory. Let the options in each column be represented by a set of vertices ?. Consistent with the concept of "graph coloring", each vertex in the same column is assigned the same color, with different colors assigned to each column (represented by shapes in Fig. 44). Each v ? is connected by an directed edge e, e E, to every other v in an adjacent column (in Fig. 44, only a few e are shown for the sake of simplicity). Each e E has an associated edge weight W(e), 0?W(e)?1, where W(e) represents the probability that the {v1, v2} connected by e, with their respective values and colors, are both present in a path which contains one v of each color that represents the correct sequence Xslice lengths etc. (the method for determining these weights will be discussed in ?5.2.4.3 Randomness in Xslices?) Theoretically, the path containing one v of each color that has the highest sum W(e) of all such paths (i.e the Maximum Weight Path) should represent the correct sequence of Xslice lengths etc. (Fig. 45) Fig. 44. Splitting problem represented in terms of graph theory. Fig. 45. Maximum Weight Path. 63 Now, solving the problem posed in Theorem 3, Theorem 4, or Theorem 5 is equivalent to solving this Maximum Weight Path problem. This problem may be formalized by specifying that a path of length Z (where Z is the number of columns) must be found. This is now the Maximum Weight Directed Path of Specified Length (MWDPSL) problem, which is proved NP-Complete in "Appendix A". Thus, the NP- Completeness of Theorem 3, Theorem 4, and Theorem 5 is proved. 5.2.4.2 Theorem 6 Theorem 6 : An attacker who possesses all Xslices extracted from a ciphertext, but does not possess the SCust? necessary to determine the order in which these Xslices should be re-inserted into the ciphertext, faces an NP-complete problem to correctly order the individual Xslices. The proof for Theorem 6 is identical to the proof for Theorem 1, with the Seed? in Theorem 1 replaced by the Xslices in Theorem 6. The Xslice ordering problem in Theorem 6 may also be represented by the Maximum Weight Path of Specified Length (MWPSL) problem, which is proved NP-Complete in "Appendix A". Thus, the NP-Completeness of Theorem 6 is proved. 5.2.4.3 Randomness in Xslices While Theorem 3, Theorem 4, Theorem 5, and Theorem 6 have been proved NP-complete, the value of this proof must be qualified. NP-completeness speaks only to the worst-case complexity of a given decision problem (which is that the complexity grows exponentially with the problem size); there may be other factors that can significantly reduce the complexity of a problem. If the edge weights in the graph are relatively uniform (Fig. 46), then the complexity of finding the Maximum Weight Path is close to the worst-case scenario; however, if the edge weights are not relatively uniform (Fig. 47), then a simple algorithm (or a human analyst) can significantly reduce the complexity of finding the Maximum Weight Path by picking out the high-weight edges that are more likely to be part of the solution. Fig. 46. Graph with uniform edge weight distribution Fig. 47. Graph with non-uniform edge weight distribution Consider the example of reassembling fragmented data as discussed in [31]. This paper discusses a method of reassembling fragmented data by using the Maximum Weight Path problem with the weights of edges between data fragment nodes based on recognized patterns in the data. Therefore, highly-patterned data will result in stronger pattern recognition, which will result in a graph with a few high-weight edges. Therefore, highly-patterned data will result in a Maximum Weight Path 64 reassembly problem that has a complexity significantly less than the worst-case exponential. In [31], the authors discuss this phenomenon, and show through their experiments that highly-patterned data does indeed lead to faster and more accurate file reassembly. So the question is, does the fragmented, distributed ciphertext (Xslices and their associated ciphertext) that is present in IDACS produce a uniform or non-uniform edge weight distribution in the graph? In order to answer this question, it is necessary to analyze the ?randomness? of the type of ciphertext fragments that will be present in IDACS in order to judge what type of edge weight distribution a Maximum Weight Path model applied to these fragments would generate. This analysis was performed using a software package created by the National Institute of Standards and Technology (NIST) [32], which provides a battery of tests that analyzes the outputs of Random Number Generators (RNGs) to measure their ?randomness? by looking for patterns in the outputs [33]. The battery consists of 15 individual tests, each of which measures different aspects of ?randomness? in the data. Each of these tests ask the question: ?If the algorithm that generated this data sample was truly random, what is the probability that this specific data sample could have been generated?? The individual tests respond with a p-score in the probability range [0, 1]. The NIST standard [33] recommends using a passing score, or ?significance level?, of 0.01. Some truly random data samples will fail the tests and generate a ?false positive? for randomness due to weaknesses in the test; therefore, two types of statistics are recommended for analyzing the test p-scores. The first statistic looks at the proportion, or percentage, of tests with passing p-scores According to the parameters in [33], for a set of tests run with 1000 data samples, a truly random RNG will have a minimal proportion of 0.9805068, i.e. a minimum pass rate of 98.05%. The second statistic looks at the distribution of the p-scores. For a set of truly random data samples, it is expected that the p-scores should be evenly distributed. The evenness of the distribution can be measured by calculating the P-valueT for each test based on the chi-square statistic discussed in [33]; if each test has a P-valueT ? 0.0001, then the p-scores are considered to be evenly distributed. To measure the randomness of ciphertext blocks, the NIST battery was applied to two sets of data. The first set of data consisted of samples of normal AES ciphertext blocks (representing a segment of AES ciphertext with the correct Xslice re- inserted), and the second set consisted of samples of two normal AES ciphertext blocks encrypted with two different AES keys back-to-back with each other (representing a segment of AES ciphertext with an incorrect Xslice re-inserted). This test was designed to determine whether there was any discernible difference between the ?pattern? (or randomness) of a correctly- and incorrectly- re-inserted Xslice. Both data sets consisted of 1000 samples, each of which was 106 bits (1.25 * 105 bytes) long. The samples in the first data set consisted of plaintext encrypted using AES in CBC mode, using a unique key for each sample. 65 The samples in the second data set consisted of the same plaintext encrypted using AES in CBC mode, but each sample was split into two halves, each of which was encrypted using a unique key). The NIST battery of tests consists of 15 individual tests. Two of these tests are run twice during the course of the battery; results for both of the tests are reported here. Three of the tests are run a number of times; results for two randomly selected instances of those tests are reported here. All other tests were run once; the results are reported here. In total, 20 separate test results are reported. Fig. 48 shows the proportion of passing NIST tests for the first data set (Fig. 48 (a)), which represents a ?matched? Xslice and ciphertext, and the second data set (Fig. 48 (b)), which represents a ?mismatched? Xslice and ciphertext (or two concatentated Xslices or ciphertext segments that were not adjacent in the original ciphertext). It can be seen that all but one test result pass the minimum proportion requirements; more importantly, both data sets are demonstrated to be ?random?, and there is no distinguishable difference between the proportions for the two data sets. a) b) Fig. 48. Proportion of passing NIST tests for (a) regular ciphertext (representing ?matched? ciphertext fragments) and (b) dual-AES key ciphertext (representing ?mismatched? ciphertext fragments) Table 9 lists the P-valueT for all of the NIST tests for both the ?matched? and ?mismatched" ciphertext fragments. It can be seen that all tests for both data sets pass the minimum score of 0.0001. Thus, the two data sets may be considered equally ?random?. Therefore, it can be concluded that both matching and mismatching ciphertext fragments would generate uniform edge weights in a weighted graph. This indicates that there is no discernible difference (pattern-wise) between adjacent ciphertext blocks (ciphertext joined with associated Xslices) and non-adjacent blocks (concatentated Xslices or ciphertext with Xslices removed), and that the graphs generated to solve Theorem 3 through Theorem 6 would have uniform edge weights, maximizing the effect of the NP-complete property. 66 Table 9. P-valueT for NIST tests for ?matched? ciphertext fragments and ?mismatched? ciphertext fragments Test # Matched Ciphertext Mismatched Ciphertext Test # Matched Ciphertext Mismatched Ciphertext Test # Matched Ciphertext Mismatched Ciphertext 1 0.147815 0.514124 8 0.162606 0.570792 15 0.243466 0.291249 2 0.173770 0.325206 9 0.174728 0.647530 16 0.888892 0.356948 3 0.528111 0.605916 10 0.323668 0.153763 17 0.105377 0.148760 4 0.340858 0.689019 11 0.867692 0.574903 18 0.504219 0.948298 5 0.196920 0.626709 12 0.649612 0.794391 19 0.156373 0.081013 6 0.875539 0.500279 13 0.522100 0.344048 20 0.257004 0.705466 7 0.727851 0.039073 14 0.437274 0.974555 5.3 Simulations 5.3.1 Simulation Parameters The simulations used to examine the effect of Xbits, Xslices, Single File Data Segmentation, and File Directory Tree Segmentation on increasing IDACS security used the same simulation suite discussed in ?4.5.2 Attack Detection, Traceback, and Remediation Simulations?. In order to simulate the use of Xbits, Xslices, and Data Segmentation, the simulation was expanded to make the access-DB-attacks directed towards accessing the contents of a Segmented File Directory Tree and the data files it stores. The Segmented File Directory Tree used in these simulations is illustrated in Fig. 49. Each Traitor Cust? in the simulation had legal access (permissions) to access 3 random data files in the File Directory Tree; however, no Traitor Cust? had access to the full set of authentication tokens (Client?, Badge?, Pwd?, and PIN?). Because of this, all access-DB-attacks were illegal. Each Traitor Cust? began with the contents of the File Directory Tree (minus Xslices) encrypted on his Client?, with the Xbits and Xslices necessary for decryption residing in the IDACS datacenter. Each Traitor Cust? began with a pointer to the ?root? of the File Directory Tree; he would need to retrieve the Xbits and Xslices necessary to decrypt the ?root? from the IDACS datacenter in order to access the ?root?s? contents (the pointers for the Level 1 Folders). He would then need to retrieve the Xbits and Xslices to decrypt a given Level 1 Folder from the IDACS datacenter in order to access that Level 1 Folder?s contents (the pointers to the children Level 2 Folders). A Traitor Cust? would need to repeat this process until he was able to retrieve all four Segments of a target Data File. 67 Fig. 49. Simulation File Directory Tree setup Each Traitor Cust? in IDACS possessed the same encrypted File Directory Tree; however, each Traitor Cust??s File Directory Tree was encrypted using different encryption keys, different Xslices, and different Content( ) associated with the store Xbits and Xslices. Therefore, Traitor Cust? were not able to collaborate with each other by sharing Content ( ) and thus ?skipping over? the retrieval of a given folder; all Traitor Cust? were forced to completely decrypt their own File Directory Trees. Additionally, each Traitor Cust? contained only the portions of the File Directory Tree that were ?ancestors? of the Data Files that particular Cust? had permissions to access; in this simulation, Traitor Cust? were unable to access files for which they did not have permissions. However, the group of Traitor Cust? was able to collaborate with each other in a limited way; if one Traitor Cust? had retrieved a particular Data File Segment, all other Traitor Cust? would seek to retrieve other Data File Segments, rather than pursue data that had already been successfully recovered. Additionally, the group of Traitor Cust? would give priority for retrieval attempts to the Traitor Cust? with the deepest exploration into the File Directory Tree, thus focusing the resources of Traitor SAs and SSAs towards the Traitor Cust? with the highest likelihood of successfully accessing Data File Segments. During this simulation, it was necessary to complete two separate IDACS datacenter accesses in order to decrypt a single folder/data file pointer/data file segment; one access to retrieve the Xbits, and one access to retrieve the Xslices. The length of the approach and return authentication chains was 2 (N = 2); this parameter was shortened from the previous value of 4 in order to allow more access-DB-attacks to succeed during this simulation. This simulation used the turn-traitor-attack vectors and metamorphic variations defined for Scenario 3 in ?4.5.2.2 Simulation Parameters?, with 40 attack vectors and 100 variations per attack vector (these parameters were increased in order to mask the limiting effects of these parameters from the results of this simulation). Phase 2 of the simulation began after a threshold of 90% of the SAs and SSAs had been turned into traitors, with a single Traitor Cust? for each Traitor SA or SSA. This 90% threshold is much higher than what we would expect to see in a real-world situation; however, it was set at that level for two purposes. First, the 90% threshold represents a catastrophic 68 scenario; if IDACS is capable of defending against this type of scenario, then the real-world performance is expected to be much higher. Second, it was necessary to raise the threshold to 90% in order for an appreciable number of access-DB-attacks to succeed so that the effect on the Segmented File Directory Tree could be observed. Access-DB-attacks would cease if the percentage of active SAs or SSAs that were traitors fell below 15%. Additionally, no SAs or SSAs in this simulation were spoofed; all traitor SAs or SSAs were completely functional traitors. During Phase 2 of this simulation, if a Traitor Cust? was identified, it would be quarantined and the entire File Directory Tree residing on that Cust? would be ?re-encrypted?. Therefore, if that Cust? was later turned into a Traitor, he would have none of the File Directory Tree decrypted, and would have to start again from the ?root?. However, once a Data File segment was retrieved, it was considered to be owned by the attacker regardless of whether that Cust? was detected in the future or not, and that data was added to the pool of data that had been successfully retrieved by the collaborating Traitor Cust?. 5.3.2 Controlled vs. Runaway simulations Because the threshold of 90% of SAs and SSAs turned traitor before Phase 2 of the simulation began, a unique situation presented itself. In most cases, the percentage of Traitor SAs and SSAs in IDACS would drop quickly after the start of Phase 2 of the simulation (Fig. 50), similar to the results in ?4.5.2.3 Simulation Results? with a 60% threshold. However, with a 90% threshold, in about 1 in 10 simulation runs, new Traitor SAs and SSAs would be turned more quickly than they were detected and quarantined at the beginning of Phase 2. If 100% of the SAs and SSAs in IDACS were turned Traitor, there were no loyal SAs or SSAs remaining to detect, identify, and quarantine the Traitors. In that case, IDACS was completely controlled by the attacker, and the IDACS defenses were completely nullified. In this discussion, a simulation that results in 100% Traitor SAs and SSAs will be termed a ?Runaway Botnet?, while a simulation that reduces the percentage of Traitor SAs and SSAs over time will be referred to as a ?Contained Botnet?. Fig. 50 compares the results of IDACS Network loyalty for a given Contained Network simulation run and a given Runaway Network simulation run. Fig. 50. Percentage of Active SAs/SSAs that are Traitors in IDACS, Contained vs. Runaway Botnet 69 It should be noted that Runaway Botnet simulations occurred in only 1 out of 10 simulation runs with an SA/SSA Traitor threshold of 90%; Runaway Botnets were not observed in simulations with a threshold of 80% or less. Therefore, a Runaway Botnet represents a highly unlikely, but very catastrophic situation. For the sake of completeness, the results for Runaway Botnet simulations will be included in the following discussions. 5.3.3 Data Protection Results The results of this simulation were analyzed to discover the effectiveness of the protection provided by the Segmented File Tree Directory against illegal access of the Client-side encrypted data. Fig. 51 shows the percentage of the File Directory Tree stolen by the attacker averaged across 9 Contained Botnet simulation runs, compared to the same data for a single Runaway Botnet simulation. For the Contained Botnet, a small percentage of the File Directory Tree is initially retrieved, riding on the initial high percentage of Traitor SAs and SSAs (Fig. 50). However, as Traitor SAs and SSAs are identified and quarantined, more Traitor Cust? are also identified and quarantined; as their individually encrypted File Directory Trees are re- encrypted, those previously retrieved Folders are lost, and the percentage drops. Eventually, when fewer access-DB-attacks are being made and even fewer are successful, due to a low percentage of Traitor SAs and SSAs in IDACS (compare to Fig. 32 and Fig. 34), the percentage of the File Directory Tree that is retrieved will drop to zero. This simulation demonstrates that even under extremely adverse situations, IDACS will contain and eliminate the results of an initial burst of successful illegal IDACS database accesses. However, Fig. 51 also demonstrates that in the rare, catastrophic situation of a Runaway Botnet, the Traitor Cust? become functionally equal to Loyal Cust?, and the entire File Directory Tree can be retrieved with sufficient time. Fig. 51. Percentage of File Directory Tree stolen over simulation period 70 Fig. 52 and Fig. 53 show the percentage of the total Data File Segments in IDACS that were successfully stolen by Contained Botnets and Runaway Botnets. For a Contained Botnet (which is the more realistic situation), the Traitor Cust? meet with some initial success in retrieving Data File Segments (compare to the percentage of the retrieved Data File Tree in Fig. 51). However, after IDACS becomes more successful at blocking access-DB-attacks, no more Data File Fragments are retrieved. Because a Data File Fragment, once stolen, is permanently stolen (re-encryption of a Data File Tree does not nullify the theft of a given Data File Fragment), quarantine of Traitor Cust? does not reduce the percentage of Data File Segments stolen; however, IDACS is able to limit the damage to an average of about 0.22%, or slightly less than one complete Data File Segment. Once again, however, if an attacker manages to accomplish a Runaway Botnet (the rare, less realistic situation), there is a virtual open door for the Traitor Cust? to retrieve all of the Data File Segments, limited only by time. Fig. 52. Average percentage of Data File Segments stolen for a Contained Botnet Fig. 53. Percentage of Data File Segments stolen for a Single Runaway Botnet 5.3.4 Leakage Detection Results One of the advantages of the IDACS Segmented File Directory Tree approach is that it allows previous illegal IDACS datacenter accesses to be detected. When a Traitor Cust? is detected, the IDACS forensics engine reports to the System Administrator that all Data File Segments previously retrieved by that Cust? have been stolen. It is very useful, in the aftermath of a network breach, to know where data leakages occurred, and what data was leaked. Fig. 54 and Fig. 55 demonstrate when File Data Segments were retrieved during the simulation, and when they were detected as having been stolen. Fig. 54 shows that File Data Segments stolen during the Contained Botnet simulation were detected as stolen soon after the theft. This demonstrates the strength of data leakage capabilities of IDACS. However, when the Runaway Botnet gains control of all of the SAs and SSAs in IDACS (Fig. 55), none of the stolen data is detected as such. 71 Fig. 54. Histogram for when Data File Segments were stolen vs. when they were detected Stolen; Contained Botnet Fig. 55. Histogram for when Data File Segments were stolen vs. when they were detected stolen; Runaway Botnet When a Traitor Cust? is detected to be a Traitor, this allows IDACS to hold that Cust? accountable for stealing files. Fig. 56 demonstrates the success rate for identifying every Cust? that was ever turned Traitor for the Contained and Runaway Botnet simulations. In the common real-world case (Contained Botnet), IDACS is able to identify and quarantine a large percentage of the Traitor Cust?, only failing to identify them once the percentage of Traitor SAs and SSAs drops below the threshold below which access-DB-attacks s are no longer launched (compare to Fig. 50). Thus, IDACS provides strong capabilities for identifying and holding accountable Byzantine agents in the system. For the rare case (Runaway Botnet), several Traitor Cust? are identified, but once 100% of the SAs and SSAs in IDACS are turned Traitor, there are no new detections of Traitor Cust?. Fig. 56. Percentage of all Clients ever turned Traitor that are detected, Contained vs. Runaway Botnet 72 6 Implementation At this time, the IDACS Network has been implemented by the researchers as a proof-of-concept to demonstrate the feasibility of IDACS. The current implementation focuses on the IDACS Network Protocol (which governs network messages as well as the exchange of ????????, ????????, and XV) and distributed storage. Only limited forensics capabilities have been implemented; currently, only Algorithm 9 (identifying stolen/cloned Client-side Client?, Badge?, Pwd?, and PIN?) has been implemented. Algorithm 7 (identify root attacker), Algorithm 8 (identify root attacker's botnet), and Algorithm 10 (identify controlled/spoofed SAs and SSAs) will be implemented at a later time. All IDACS Network elements (SAs, SSAs, and Databases) and the User Badge (Badge?) have been implemented in Java as Command Line Interface (CLI) programs (Fig. 58 (a) through (d)). The Client Device (Client?) has been implemented in two different ways. First, it has been implemented as a CLI program (Fig. 58 (e)). This Client Device implementation performs simple Read and Write operations to store and retrieve blocks of data on the IDACS Database; the purpose of this implementation is to test the IDACS reaction to incorrectly formed PIDs (PID?) and also compromise of the Client Device (Client?), User Badge (Badge?), User Password (Pwd?), or Badge PIN (PIN?). The second Client Device implementation is an app that runs on a BlackBerry 9800 simulator available from RIM (Fig. 58 (f)). This app encrypts a file and stores Xslices and Xbits on the DB in a distributed manner. The current implementation has been tested in the configuration shown in Fig. 57; however, the IDACS network can be scaled to any size desired (1 SSA, 2 SAs, and 1 DB minimum are required). The space separation of the Client Device (Client?) and User Badge (Badge?) is simulated by storing their respective cryptographic seeds on different Java Virtual Machines. Fig. 57. IDACS Implementation Tested Setup 73 In addition to being scalable, this implementation of IDACS uses a network communications protocol that achieves reliable delivery over UDP. Given the particulars of the IDACS algorithm and the per-message overhead ????????, ????????, and XV, the researchers believe that combining IDACS with TCP mightbe inefficient. The researchers believe that it would be more efficient to build a proprietary IDACS protocol on top of UDP. Therefore, the current implementation of IDACS is built on top of UDP; in order to compensate for the reliable delivery problem, IDACS contains built-in reliable delivery capabilities based on the concept of TCP SACKs [34]. Using this method, IDACS is able to reliably transfer large numbers of packets. (a) (b) (c) (d) (e) (f) Fig. 58. Java CLI implementations of (a) SSA, (b) SA, (c) Database, (d) User Badge , and (e) Client Device. BlackBerry app implementation of (f) Client. A demonstration of the real-time digital forensics capabilities of this implementation is shown in Fig. 59. A simulated attacker attempts a Data Write operation, having stolen the User Password and the Client Device, but not the User Badge (the Badge PIN is bundled with the User Badge in this particular implementation) (Fig. 59 (a)). Due to the missing cryptographic seeds residing on the User Badge, several of the PIDs (PID?) cannot be formed correctly; these incorrect PIDs are detected by an SA, and the attack is flagged (Fig. 59 (b)). Based on the cryptographic seed space separation and PID formation in this particular IDACS implementation, the digital forensics suite is able to determine correctly that the User Password and Client device were stolen or cloned (Fig. 59 (c)). 74 (a) (b) (c) Fig. 59. A demonstration of the implementation: (a) Attacker with access to User Password and Client Device; (b) Attacker attemps a Write Transaction and is detected by an SA due to Client PID violations; (c)digital forensics report identifies which User Authentication elements have been compromised. The BlackBerry application implements the concepts of space/time-separation and also Xbits and Xslices to protect encrypted data. When the BlackBerry application is run, it is given a file to encrypt. The application begins with a few randomly- generated cryptographic seeds that are the basis for all following actions. These cryptographic seeds are used to seed a pseudo-random process which divides the file data into pseudo-random-sized blocks and encrypts each block with a unique pseudo-random AES key. Next, the resulting ciphertext is divided into pseudo-random-sized blocks, and a certain percentage of those blocks are removed as Xslices. Xbits are also pseudo-randomly removed from these cryptographic seeds (using the User Password as a seed for the pseudo-randomness). The post-Xslice ciphertext is then divided into 1 KB blocks, which are stored in alternating data files (Fig. 59 (a)) that have random file names and random file extensions (Fig. 59 (b)). A ?pointer? file is formed which contains the names of the data files as well as the cryptographic seeds (minus the Xbits). All of this information is mixed randomly with garbage data, encrypted with the User Password, and stored in the ?pointer? file (Fig. 59 (c)). The Xslices and Xbits are sent to IDACS to be stored on a random Database (Fig. 59 (d)). Only the SSAs are able to link the User with the stored Xslices and Xbits; the Databases store no information regarding the type of the data or the owner of this data. The 75 Database stores all data simply as data; there is no indication as to whether the data is Xslices, Xbits, or another type of data. In this way, an Attacker would be forced to compromise both an SSA and the correct Database to recover the Xbits or Xslices and relate them to the correct Client Device and User. In order to decrypt the file, the User must possess the Client Device and provide the correct User Password to extract and retrieve all the relevant data to complete reassembly and decryption (Fig. 59 (e)). In this implementation, the space separation of the storage of authentication items is simulated by storing Client and User Badge data in different text files (Fig. 59 (f)); however, due to the difficulty of integrating a stand-alone User Badge program with the BlackBerry simulator, the User Badge interface and the User Badge PIN are only used in conjunction with the CLI Client, not the BlackBerry application. a) b) c) d) e) f) Fig. 60. BlackBerry implementation of IDACS encryption and distributed storage. (a) file ciphertext with Xslices removed is (b) divided and stored in multiple data files. (c) Names of data files are encrypted and stored in a pointer file. (d) Xslices and Xbits are stored on IDACS Database as pure data; no information saved on Database indicating the identity of this data. (e) Correctly reassembling all distributed and mutated pieces results in correct file decryption. (f) Distributed storage between Client and User Badge is simulated using separate text files. 76 7 Conclusion The work presented in this paper describes an integrated security system IDACS that utilizes the space-time separated and jointly evolving relationship to provide multiple layers of constantly-changing barriers that will be mathematically infeasible for attackers to predict. The implementation of these ideas in the IDACS system can successfully detect and defeat different types of network attacks, including zero-day attacks. Table 10 details several common network attacks that IDACS addresses. Mathematical analysis demonstrates that it is infeasible to recreate the IDACS authentication protocol, and simulations also reinforce the strength of these space-time relationships. Table 10. Types of Network Attacks Defeated by IDACS Attack defeated Reason Zero-day attack Network access control is mathematically defined by a space-time separated and jointly evolving relationship; zero-day attacks which can compromise hosts cannot forge such a relationship when accessing protected data; furthermore, the zero-day attack method can be captured and analyzed Denial-of Service (DoS) attack Quick stateless OTP checking allows attack packets to be quickly discarded Replay attack Time-evolution means OTPs and PIDs are true one-use items tied to packet sequence number Client-side device loss Cryptographic seeds for calculating authentication parameters are space-separated; loss of one or more devices does not allow attacker to reconstruct the space-time separated and evolving relationship SA and/or SSA hijacking Mutual support in authentication chain detects a hijacked SA or SSA SA and/or SSA memory leakage Space-time separation and evolution of cryptographic seeds means memory leakage at one or more SAs does not leak all OTP/PID seeds System downtime while waiting for network healing Space-time evolving determination of network-side authentication chain path allows real-time network healing with no network downtime by using available system migration In addition to detecting and preventing attacks, the design of IDACS also provides real-time forensics capabilities, allowing traitorous network actors to be identified quickly and accurately. Simulations demonstrate that IDACS forensics are efficient and effective. Also, IDACS uses the space-time separated and jointly-evolving relationship to protect at-rest encrypted data. Time-changing Xbits and Xslices stored across multiple locations and data segmentation provide greater security for encrypted data. Once again, mathematical analysis demonstrates the theoretical strength of this system, and simulation provides a more concrete expression of this security. IDACS implements the space-time separated and jointly-evolving relationship across multiple aspects of the system to provide a complete end-to-end network and data protection system that has strong mathematical properties. 77 8 References [1] J. Wylie, M. Bigrigg, J. Strunk, G. Ganger, H. Kiliccote and P. Khosla, "Survivable information storage systems," Computer, vol. 23, no. 8, pp. 61-68, August 2000. [2] H. Sengar, X. Wang, H. Wang, D. Wijesekera and S. Jajodia, "Online detection of network traffic anomalies using behavioral distance," Quality of Service, 2009. IWQoS. 17th International Workshop on, pp. 1-9, July 2009. [3] A. Mishra, K. Nadkarni and A. Patcha, "Intrusion detection in wireless ad hoc networks," IEEE Wireless Communications, vol. 11, no. 1, pp. 48-60, February 2004. [4] A. Bouchahda, N. L. Thanh, A. Bouhoula and F. Labbene, "Enforcing Access Control to Web Databases," in 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), 2010. [5] M. Kiani, A. Clark and G. Mohay, "Evaluation of Anomaly Based Character Distribution Models in the Detection of SQL Injection Attacks," in Third International Conference on Availability, Reliability and Security, 2008. ARES 08, 2008. [6] Y. Amir, Y. Kim, C. Nita-Rotaru and G. Tsudik, "On the performance of group key agreement protocols," ACM Trans. Inf. Syst. Secur., vol. 7, no. 3, pp. 457-488, August 2004. [7] U. Tupakula and V. Varadharajan, "On the design of Virtual machine Intrusion detection system," in Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on , 2011. [8] X. Zhao, K. Borders and A. Prakash, "Towards protecting sensitive files in a compromised system," in Security in Storage Workshop, 2005. SISW '05. Third IEEE International, 2005. [9] R. Sandhu and V. Bhamidipati, "The ASCAA Principles for Next-Generation Role-Based Access Control," in ARES 2008 - International Conference on Availabilityy, Reliability and Security, 2008. [10] T. Lodderstedt, D. Basin and J. Doser, SecureUML: A UML-Based Modeling Language for Model-Driven Security, Springer Berlin /Heidelberg, 2002. [11] C. Cachin and Chandran, N., "A Secure Cryptographic Token Interface," in Computer Security Foundations Symposium, 2009. CSF '09. 22nd IEEE, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.1048, 2009. [12] J. B. a. F. S. J. Anderson, "Inglorious Installers: Security in the Application Marketplace," in Proc. 9th Workshop Economics of Information Security, 2010. [13] D. Z. K. S. Haining Wang, "Change-point monitoring for the detection of DoS attacks," IEEE Transactions of Dependable and Secure Computing, vol. 1, no. 4, pp. 193-208, Oct-Dec 2004. [14] K. L. W. Z. Yang Xiang, "Low-Rate DDoS Attacks Detection and Traceback by Using New Information Metrics," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 426-437, June 2011. [15] Y. Kim, W. C. Lau, M. C. Chuah and H. Chao, "PacketScore: a statistics-based packet filtering scheme against distributed denial- of-service attacks," IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 2, pp. 141-155, April-June 2006. [16] X. Wang and M. Reiter, "Using Web-Referral Architectures to Mitigate Denial-of-Service Threats," IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 2, pp. 203-216, April-June 2010. [17] P. Peng, P. Ning and D. S. Reeves, "On the secrecy of timing-based active watermarking trace-back techniques," in Security and Privacy, 2006 IEEE Symposium on, 2006. [18] N. Paxton, G.-J. Ahn and B. Chu, "Towards Practical Framework for Collecting and Analyzing Network-Centric Attacks," in IEEE International Conference on Information Reuse and Integration, 2007. IRI 2007., 2007. [19] D. Whyte, P. C. v. Oorschot and E. Kranakis, "Tracking Darkports for Network Defense," in Computer Security Applications Conference, 2007. ACSAC 2007. Twenty-Third Annual, 2007. [20] A. G. Y. C. V. P. Zhichun Li, "Towards Situational Awareness of Large-Scale Botnet Probing Events," IEEE Transactions on Information Forensics and Security, vol. 6, no. 1, pp. 175-188, March 2011. [21] M. C. Y. C. M. Q. Kai Hwang, "Hybrid Intrusion Detection with Weighted Signature Generation over Anomalous Internet Episodes," IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 1, pp. 41-55, Jan-March 2007. 78 [22] K. Brasee, S. Makki and S. Zeadally, "A Novel Distributed Authentication Framework for Single Sign-On Services," in 2008 IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, Taichung, Taiwan, 2008. [23] T. Li and R. Deng, "Vulnerability Analysis of EMAP-An Efficient RFID Mutual Authentication Protocol," in Availability, Reliability and Security, 2007. ARES 2007. The Second International Conference on, 2007. [24] S. Peisert, M. Bishop, S. Karin and K. Marzullo, "Toward Models for Forensic Analysis," in Systematic Approaches to Digital Forensic Engineering, 2007. SADFE 2007. Second International Workshop on, 2007. [25] S. Kirbiz, M. Celik, A. Lemma and S. Katzenbeisser, "Watermarking of AAC Bit-streams for Forensic Tracking," in Signal Processing and Communications Applications, 2007. SIU 2007. IEEE 15th, 2007. [26] A. Sharma, Z. Kalbarczyk, R. Iyer and J. Barlow, "Analysis of Credential Stealing Attacks in an Open Networked Environment," in Network and System Security (NSS), 2010 4th International Conference on , 2010. [27] F. Valeur, G. Vigna, C. Kruegel and R. Kemmerer, "A Comprehensive approach to intrusion detection alert correlation," IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 3, pp. 146-169, July-September 2004. [28] A. Hofmann and B. Sick, "Online Intrusion Alert Aggregation with Generative Data Stream Modeling," Dependable and Secure Computing, IEEE Transactions on, vol. 8, no. 2, pp. 282-294, March-April 2011. [29] F. e. a. Chang, "Bigtable: A Distributed Storage System for Structured Data," ACM Transactions on Computer Systems (TOCS), vol. 26, no. 2, June 2008. [30] H. H. Huang and A. S. Grimshaw, "Automated performance control in a virtual distributed storage system," in GRID '08 Proceedings of the 2008 9th IEEE/ACM International Conference on Grid Computing, Washington, DC, 2008. [31] H. Weatherspoon, P. Eaton, C. Byung-Gon and J. Kubiatowicz, "Antiquity: exploiting a secure log for wide-area distributed storage," in Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 200, New York, NY, 2007. [32] K. S. a. N. Memon, "Automatic Reassembly of Document Fragments via Context Based Statistical Models," in Proceedings of the 19th annual Computer Security Applications Conference (ASAC '03), Washington, D.C., 2003. [33] M. R. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, San Fracisco, CA: Freeman, 1979. [34] July 2011. [Online]. Available: http://csrc.nist.gov/groups/ST/toolkit/mg/documentation_software.html. [35] A. R. et.al., "A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications," Gaithersburg, MD, 2010. [36] A. Maurer and S. Tixeuil, "Limiting Byzantine Influence in Multihop Asynchronous Networks," 2012 IEEE 32nd International Conference on Distributed Computing Systems (ICDCS), pp. 183-192, 2012. [37] V. Pandit, J. H. Jun and D. Agrawal, "Inherent Security Benefits of Analog Network Coding for the Detection of Byzantine Attacks in Multi-Hop Wireless Networks," 2011 IEEE 8th International Conference on Mobile Adhoc and Sensor Systems (MASS), pp. 697- 702, 2011. [38] F. Tao, Z. Bingtao and M. Jianfeng, "Security Random Network Coding Model against Byzantine Attack Based on CBC," 2011 International Conference on Intelligent Computation Technology and Automation (ICICTA), vol. 2, pp. 1178-1181, 2011. [39] K. Shanmugasundaram and N. Memon, "Automatic Reassembly of Document Fragments via Context Based Statistical Models," in Proceedings of the 19th annual Computer Security Applications Conference (ASAC '03), Washington, D.C., 2003. [40] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, October 1996. [Online]. Available: http://tools.ietf.org/html/rfc2018. [41] C.-H. '. Wu, T. Liu, C.-C. '. Huang and J. D. Irwin, "Modelling and Simulations for Identity-Based Privacy-Protected Access Control Filter (IPACF) Capability to Resist Massive Denial of Service Attacks," International Journal of Information and Computer Security, vol. 3, no. 2, pp. 195-223, 2009. 79 9 Appendix A The purpose of this section is to prove that the Maximum Weight Path of Specified Length (MWPSL) and Maximum Weight Directed Path of Specified Length (MWDPSL) decision problems are NP-complete. The process of proving a given decision problem C to be NP-complete has two steps: 1. Show that C is in NP 2. Show that every problem in NP is reducible to C in polynomial time The first step can be shown by demonstrating that a candidate solution to C can be checked for correctness in polynomial (or better) time. The second step can be shown by demonstrating that any one known NP-complete problem B is reducible to C. If one NP-complete problem B can be reduced to C, then all other NP-complete problems can be reduced to C. A problem B is reducible to problem C if there is a polynomial-time, many-one reduction from B to C; that is, there is a reduction that can transform any instance of B into an instance of C. Any algorithm that can be used to solve all instances of problem C can be used to solve all instances of problem B [30]. The process of proving that the MWPSL and MWDPSL problems (C) are NP-complete begins with a proven NP- complete problem, the Hamiltonian Path problem (B) [30]. The reduction path between the Hamiltonian Path problem (B) and the MWPSL and MWDPSL (C) problems is shown in Fig. 61. Fig. 61. NP-complete reduction path In the following sections, each of the reductions will be shown in the indicated steps. At the end, the MWPSL and MWDPSL (C) problems will be proven to be NP-complete. Starting Point: The Hamiltonian Path is NP-Complete Hamiltonian Path: Given an undirected graph G = (?, E) where ? is a set of vertices {v1, v2, ?} and every e E is an unordered set of vertices {v1, v2} called edges. Does G contain a Hamiltonian path, which is a sequence of distinct vertices from ? such that {vi, vi+1} E for 1?i in 80 G where ?=|?| such that ? ( ) , where {vi, vi+1} E? Step 1.1: Show that the Maximum Weight Hamiltonian Path is in NP. A candidate solution to this problem can be checked by tracing the path, verifying that each vertex is touched once and only once, and summing the weights of the edges in the path and checking the final sum. The candidate solution is checked in linear time. Step 1.2: Show that the Hamiltonian Path problem is reducible to the Maximum Weight Hamiltonian Path problem in polynomial time. The Hamiltonian Path problem is a special case of the Maximum Weight Hamiltonian Path problem, so the first can be reduced to the second. Create an instance of the Maximum Weight Hamiltonian Path problem. Set all W(e) = 1 for all e E and set R = (|?| - 1). This is now an instance of the Hamiltonian Path problem, and the reduction is accomplished in linear time. Result: The Maximum Weight Hamiltonian Path problem is NP-complete. Step 2: Show that the Maximum Weight Path of Specified Length (MWPSL) problem is NP-complete. Maximum Weight Path of Specified Length (MWPSL) : Given an undirected graph G = (?, E) where every e E is an unordered set of vertices {v1, v2} called edges and has a weight W(e) Q+, there is a number R Q+ and an integer N ? |?|. Is there a path P = in G such that any v ? appears at most once in P and ? ( ) , where {vi, vi+1} E? Step 2.1: Show that the MWPSL problem is in NP. A candidate solution that connects some or all of the vertices can be checked by tracing the path, verifying each vertex in the path is touched at most once, verifying that there are N vertices in the path, and summing the path edge weights and comparing the sum to R. This candidate solution is checked in linear time. Step 2.2: Show that the Maximum Weight Hamiltonian Path problem is reducible to the MWPSL problem. The Maximum Weight Hamiltonian Path problem is a special case of the MWPSL problem, so the first can be reduced to the second. Create an instance of the MWPSL problem and set N = |?|. This is now an instance of the Maximum Weight Hamiltonian Path problem; this reduction is accomplished in linear time. Result: The Maximum Weight Path of Specified Length (MWPSL) problem is NP-complete. Step 3: Show that the Maximum Weight Directed Path of Specified Length (MWDPSL) problem is NP-complete. Maximum Weight Directed Path of Specified Length (MWDPSL): Given a directed graph G = (?, E) where every e E is an ordered set of vertices {v1, v2} called arcs and has a weight W(e) Q+, there is a number R Q+ and an integer N ? |?|. Is 81 there a path P = in G such that any v ? appears at most once in P and ? ( ) , where {vi, vi+1} E? Step 3.1: Show that the MWDPSL problem is in NP. A candidate solution that connects some or all of the vertices can be checked by tracing the path, verifying each vertex in the path is touched at most once, verifying that there are N vertices in the path, and summing the path edge weights and comparing the sum to R. This candidate solution is checked in linear time. Step 3.2: Show that the MWPSL problem is reducible to the MWDPSL problem. The MWPSL problem is a special case of the MWDPSL problem, so the first can be reduced to the second. Create an instance of the MWDPSL problem corresponding to an instance of the MWPSL problem where every e E with a given W(e) in the MWPSL problem is replaced by a pair of opposite-direction directed e E in the MWDPSL problem, both with the same W(e) as in the MWPSL problem. This now equates to an instance of the MWPSL problem; this reduction is accomplished in linear time. Result: The Maximum Weight Directed Path of Specified Length (MWDPSL) problem is NP-complete. Implications for IDACS strength: The MWPSL and MWDPSL problems represent a reassembly-due-to-space-separation problem at a given instant in time. Thus, the space separation of IDACS provides the NP-completeness to the systems. However, due to the joint time-evolution of IDACS, the problem evolves into a completely new MWPSL or MWDPSL problem each time the system states change (which can occur every few seconds). Therefore, the time-evolution greatly increases the complexity of the problem.