APPLICATIONS OF GUI USAGE ANALYSIS

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Eric Shaun Imsand

Certificate of Approval:

Cheryl D. Seals, Associate Professor, Computer Science and Software Engineering
John A. Hamilton, Jr., Chair, Assistant Professor, Computer Science and Software Engineering
David A. Umphress, Associate Professor, Computer Science and Software Engineering
Joe F. Pittman, Interim Dean, Graduate School

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 10, 2008

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon request of individuals or institutions and at their expense. The author reserves all publication rights.

DISSERTATION ABSTRACT

Eric Shaun Imsand
Doctor of Philosophy, May 10, 2008
(M.S. Auburn University, 2003)
(B.S. Auburn University, 2002)
141 Typed Pages
Directed by John A. Hamilton, Jr.

In the realm of computer security, a masquerade attack is a form of attack in which the attacker deceives the victim into believing that the attacker is someone else. One particularly dangerous form of masquerade attack occurs when an attacker begins using an unattended and unlocked computer workstation.
This form of masquerade attack is particularly troubling because it requires no technical expertise to perform. Though proper adherence to organizational security policies can mitigate this risk, new technologies are needed to completely defend against this type of attack. This dissertation presents the results of a study into the potential suitability of GUI Usage Analysis as an authentication mechanism which can be used as a defense against masquerade attacks. Previous attempts at authenticating the current user of a computer system have focused on typing patterns and mouse movements. GUI Usage Analysis does not focus on the user's physical interaction with the computer system, but instead on how the user manipulates the windows, icons, menus, and pointers that comprise a graphical user interface. Results are presented showing the feasibility of employing GUI Usage Analysis as a means of authenticating the user of a computer system. Results are also presented demonstrating the effectiveness of using GUI Usage Analysis as a means of identification with the goal of identifying a potential attacker. Finally, the results obtained here are compared to other previously published masquerade detection techniques.

Style manual used: A Manual for Writers of Research Papers, Theses, and Dissertations by Kate Turabian (6th Edition)
Computer software used: Microsoft Office Enterprise 2007, Microsoft Visual Studio 2005, Matlab 7.0

TABLE OF CONTENTS

CHAPTER 1 - Introduction
CHAPTER 2 - Review of Related Work
CHAPTER 3 - Experimental Setup
CHAPTER 4 - Use of GUI Usage Analysis as an Authentication Mechanism
CHAPTER 5 - Use of GUI Usage Analysis as an Identification Mechanism
CHAPTER 6 - Comparison to Other Published Techniques
CHAPTER 7 - Potential Vulnerabilities of GUI Usage Analysis
CHAPTER 8 - Summary of Findings and Imminent Future Research
CHAPTER 9 - Future Applications of this Research
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C

LIST OF TABLES

Table 1. Terminology Used in this Dissertation but not Defined Elsewhere
Table 2. Demographic Survey Results
Table 3. Dissimilarity Analysis with a Static Attack Threshold
Table 4. Attack Detection Rate for TF-IDF Inspired with Window Size 1
Table 5. Attempted Extrapolation of Real-World Attack Detection Rates
Table 6. Dissimilarity Analysis with a Static Attack Threshold and an Event Window of 2
Table 7. Jaccard Index with a Universal Attack Threshold
Table 8. Individual Performances with a Universal Attack Threshold using Jaccard Analysis
Table 9. Individual Performance with Customized Attack Thresholds using Jaccard Index
Table 10. Comparison of Dissimilarity and Jaccard Analysis
Table 11. Impact of Vocation on Attack Detection
Table 12. Impact of Educational Achievement on Attack Detection
Table 13. Impact of Self-Reported Computer Skill on Attack Detection
Table 14. Impact of Daily Computer Usage on Attack Detection
Table 15. Impact of Computer Skill Source on Attack Detection
Table 16. Impact of Age on Attack Detection
Table 17. Successful Identification Rate by User
Table 18. Impact of Vocation on Identification Rate
Table 19. Impact of Educational Achievement on Identification Rate
Table 20. Impact of Self-Reported Computer Skill on Identification Rate
Table 21. Impact of Daily Computer Usage on Identification Rates
Table 22. Impact of Computer Skill Source on Identification Rates
Table 23. Impact of Age on Identification Rates
Table 24. Average User Rank in Identification with Neural Network
Table 25. Comparison of GUI Usage Analysis and Initial Mouse Movement Analysis
Table 26. Comparison of GUI Usage Analysis and Revised Mouse Movement Analysis
Table 27. Comparison of GUI Usage Analysis and Garg's Mouse Movement Analysis
Table 28. Comparison of all Presented Techniques
Table A1. Simulated Attack Rate for Known Users and Attackers
Table A2. Results of User Identification Experiment showing which Users were mistakenly Identified

LIST OF FIGURES

Figure 1. Spy++ User Interface
Figure 2. Message Processing Prior to "Hooking"
Figure 3. Message Processing After "Hooking"
Figure 4. Interface of the monitoring software
Figure 5. Complete Task List provided to Experimental Participants
Figure 6. Illustration of Attack Threshold
Figure 7. Relationship between False Positive Error Rates and False Negative Error Rates
Figure 8. Relationship between False Positive Error Rates and False Negative Error Rates using the Jaccard Index
Figure 9. Pseudo-code of Jaccard Index Identification Function
Figure 10. Conceptual Diagram of a Neural Network with N Input Units, four Hidden Units, and two Outputs
Figure A1. Text of the Informed Consent Document Provided to Study Participants

CHAPTER 1
INTRODUCTION

Nearly every student who has received formal training in the information security field has been taught that data has three fundamental qualities that describe its overall security: confidentiality, integrity, and availability. If any of these qualities is degraded, then the data in question cannot be considered secure. Students of information security are also taught to identify vulnerabilities. Generally speaking, vulnerabilities can be grouped into two categories: vulnerabilities that originate outside of an organization and vulnerabilities that originate from within an organization.

Much of the collective effort of information security professionals has been directed towards threats from "outsiders", persons not part of the organization. Recent studies, though, have begun to illustrate the considerable damage that can be carried out by insiders (United States Secret Service: National Threat Assessment Center 2004). These findings illustrate the need for the development of additional defensive technologies to protect against insider threats. Please note that the term "insiders" can be used to denote either persons with authorized access to systems/information or persons with physical access to the premises.

It is believed that the results of the studies presented in this dissertation can be utilized by software engineers to develop new technologies to guard against one particular type of insider attack known as a masquerade attack. It is believed that such a line of products would be of great value to information security professionals, who are currently limited in their ability to detect and prevent insider attacks on confidentiality and integrity.
Masquerade Attacks

Masquerade attacks have been studied in some detail. In general, a masquerade attack is an instance in which the attacker tricks the targeted system into believing that the attacker is someone else. In essence, the attacker is impersonating, or masquerading as, another user with legitimate system access. Masquerade attacks are considered dangerous because, when successful, the attacker assumes the identity of the impersonated user and gains access to all the resources available to that user.

Masquerade attacks can take on many different forms. For example, an attacker who successfully hijacks an authenticated web session is carrying out one form of masquerade attack. In this instance the attacker would appear to the website as the authenticated user. Another form of masquerade attack can occur when an attacker begins using an unattended and unlocked computer workstation. This is the form of masquerade attack investigated by the studies presented in this dissertation.

In the form of masquerade attack considered here, the attacker has physical access to the system and has found it in an unsecured state. In this scenario the system has already authenticated an authorized user and that user has subsequently left the system unattended and unlocked, allowing the attacker to take control of the system without having to identify or authenticate themselves. The system believes the attacker to be the legitimate user, so the attacker is granted access to all of the legitimate user's files, e-mail, database tables, etc. As a real-world scenario in which this type of attack might occur, consider an employee who slips into a co-worker's office right after the co-worker has left for lunch. Even if the targeted employee uses a password on their screensaver, there will still be a window of time in which the system is vulnerable.
As previously mentioned, the overall security of an object is typically measured along three dimensions: confidentiality, integrity, and availability. The type of masquerade attack discussed here can affect both the confidentiality and integrity of an object with little chance of being detected. This is because the behaviors exhibited by an attacker seeking to compromise the confidentiality or integrity of an object would closely resemble the normal behavior exhibited by the legitimate user.

To illustrate, consider a spreadsheet containing revenue figures on a corporate file server. Suppose that the legitimate user is authorized to view and modify the contents of that file. From the system's point of view, a masquerading attacker will exhibit the same behavior as the known user. The attacker may even modify the file (compromising the integrity of the object) without raising any suspicion that an attack is occurring. Because modifying the file is a behavior that the legitimate user regularly engages in, the system will have no way of knowing that the changes were made by an attacker.

To detect this type of attack, the system must have some defensive measure that is capable of knowing the actual identity of the user behind the monitor. Furthermore, the identity must be difficult to forge or otherwise defeat. While solutions have been developed that meet these criteria using hardware devices, software-based solutions are also desirable. This dissertation presents several studies that explore the suitability of user interactions with a graphical user interface as a profile to be used in identification and/or authentication.

Description of the Microsoft Windows Graphical User Interface

Though modern computing systems undoubtedly represent vast engineering and scientific achievement, they are ultimately tools; a means to be used in the completion of some greater task. As a result they require some manner of receiving instruction.
Typically this instruction comes from human operators who issue commands that are then carried out by the computer. In a continuing effort to make the issuance of these commands more efficient, modern computing environments feature intuitive user interfaces that depict data using pictures instead of the plain textual characters used in the past. These graphically driven user interfaces are known as graphical user interfaces, sometimes abbreviated to "GUI".

The computing environment used in the studies presented here is Microsoft Windows XP, frequently referred to as simply "Windows". Windows features a graphical user interface which requires users to manipulate objects known as windows, icons, and menus using a tool referred to as a pointer. Thus, the Windows user interface is a sub-category of GUI known as WIMP ("Windows, Icons, Menus, and Pointers"). The vast majority of people, both computer users and non-users, are familiar with WIMP-based user interfaces. As of December 2007 some estimates place the market share of Windows XP at nearly eighty percent (Market Share 2007).

Several terms are sometimes used in this dissertation without an accompanying definition. These terms are defined in Table 1.

Table 1. Terminology Used in this Dissertation but not Defined Elsewhere

Term | Definition
Control | An object on which a user takes some action (keystroke, mouse click, double-click, etc.). Examples include text-boxes, buttons, menus, etc.
Handle | An ID number used by the operating system (Windows) to keep track of and refer to a specific control. (Microsoft Corporation 2007)
Event | Strictly speaking, an event is a notification issued by the operating system to indicate to a client application that something specific has occurred. For the purposes of this dissertation, event will denote the occurrence of an action performed by the user on some control, such as a left mouse button click on a button.
Process | For the purposes of this dissertation, the term process describes a computer program that is actively executing (running) on the host computer system.
Message | The term message is used to describe a piece of information transmitted from the operating system to a running process, updating the process on the current state of the computing environment.

CHAPTER 2
REVIEW OF RELATED WORK

The ideas presented in this dissertation certainly did not occur in a vacuum. Rather, they are a natural progression of thought built upon previously published research by other authors. The ideas presented here draw from several separate research sources. These include varying types of behavioral biometrics as well as other system-oriented profiling techniques. The one constant in all of the techniques described here is that they attempt to detect attacks (sometimes referred to as intrusions) by searching for abnormalities in monitored behavior.

One of the earliest references to monitoring abnormalities as a means of detecting intrusions was introduced by Denning (Denning 1986). Denning proposed the creation of a network intrusion detection system that monitored the system for abnormal occurrences. The hypothesis espoused by Denning was that abnormal network activity could be strongly indicative of an attack. Many of the modern intrusion/attack detection systems described in this chapter follow a similar architectural style.

Behavioral Biometrics

Biometrics are traits that are generally unique to individual users and that can be used as a means of either identification or authentication. Biometrics themselves can be further divided into behavioral biometrics and physiological biometrics (Coventry 2005) (Newbold 2007). Physiological biometrics describes physical characteristics that are used as biometrics. Examples of physiological biometrics include fingerprints, iris and retinal scans, palm prints, etc.
Behavioral biometrics describes human behaviors that are generally found to be unique to individual persons, but are not necessarily dependent on any physiological traits. Examples of behavioral biometrics include handwriting, voice analysis, command line profiling, and keystroke dynamics (sometimes known as typing dynamics).

It is believed that GUI Usage Analysis is a new form of behavioral biometric. For this reason, only background work concerning behavioral biometrics will be presented in this chapter. Furthermore, only behavioral biometrics that can be analyzed using a computer keyboard and mouse will be discussed. It is believed that while voice analysis, handwriting, and GUI Usage Analysis are all examples of behavioral biometrics, voice analysis and handwriting differ enough from GUI Usage Analysis that including them in this discussion is unnecessary.

User/System Interaction Profiling

Several studies have been published that attempt to identify and verify a user's identity based on the current instructions passed to the system. These attempts are somewhat related to GUI Usage Analysis as both techniques utilize passive profiling techniques based on the user's manipulation of the computer system. Because this relationship is somewhat tangential to GUI Usage Analysis, the techniques presented here are not given as much consideration as other profiling technologies detailed in this chapter.

Attempts have been made at profiling users based on the current state of the system. Researchers believe that users tend to utilize computer systems in similar manners on a day-to-day basis. Because the system is being utilized in a consistent manner from day to day, the underlying components of the system should also be relatively consistent from day to day. Assuming that the same user is operating the computer system in a standard manner, the usage of system resources like processor load and available memory should be relatively constant.
Any deviation from these observed norms could represent some sort of attack, either from an intruder having gained unauthorized access to a user's account, or from the user themselves engaging in suspicious behavior.

Such a study was carried out by Li and Manikopoulos (Li and Manikopoulos 2004). Li and Manikopoulos selected key system events such as processor load, process table status, memory consumption, etc., from eight users and utilized them as either legitimate user events or as attacker events. Thirty-five separate sessions generated by four of the selected users were used as both training and test data. At the same time, sessions from four additional users were used exclusively as test data. The authors fed the experimental data into a support vector machine classifier. The classification engine correctly detected 63% of attacks and had a false positive rate of 3.7%.

Another approach frequently seen in attempting to detect masquerade attacks is commonly referred to as "command line profiling". Command line profiling attempts to detect masquerading attackers based on the commands issued to the system console. The hypothesis behind command line profiling is very similar to the rationale used for the system load profiling experiments previously described: users tend to operate computer systems in a consistent manner. For example, an office assistant would commonly be observed issuing console commands to invoke a word processor or spreadsheet application, but not the system command for modifying the system's password file. Such an occurrence would represent a deviation from the observed normal behavior for that user and would most likely indicate an attack of some sort, either from an external attacker or malicious insider. Multiple studies have been published studying the viability of profiling a user based on the commands entered into a system console (Maxion and Townsend 2002), (Coull, et al. 2003), (Khanna and Liu 2006).
Perhaps the most commonly cited work on the subject of command line profiling was presented by a team led by Matthias Schonlau (Schonlau, et al. 2001). Schonlau and his team collected approximately 15,000 commands for each user. A total of fifty users participated in the study. Schonlau then analyzed the collected data using multiple previously published analytical techniques. He concluded that each of the techniques could detect intrusions reasonably well, though none of them detected a high enough percentage of attacks to justify its use as the sole means of detecting intrusions.

Keystroke Dynamics

In the mid-1980s, researchers began to experiment with the possibility that users typing on a keyboard may exhibit patterns in their typing. This general field of research, commonly referred to as "keystroke dynamics", centered on the notion that users had typing patterns that could be used as the basis for either identification or authentication. Studies frequently focused on the periods of time observed between successive keystrokes. It was generally determined that experienced typists did exhibit patterns in their typing, and that these patterns could be used as a means of authentication. The one caveat was that the user needed to be typing a word or phrase that they frequently typed, such as a password.

One of the initial keystroke dynamics studies was carried out by Umphress and Williams in 1985 (Umphress and Williams 1985). Umphress and Williams performed a study using 17 participants in which each participant completed two typing tests. The first test asked participants to type approximately 1400 characters to serve as the reference profile. The second test asked participants to type approximately 300 characters to use as the test profile. Umphress and Williams analyzed the average latencies occurring between each key press for each user.
After comparing the observed latencies in the reference profile to the latencies observed in the test sample, a confidence score of "low", "medium", or "high" was computed. Test samples assigned a confidence level of "low" were suspected to have originated from a user other than the known reference user. Similarly, test samples assigned a confidence level of "high" were judged to have been generated by the reference user. Umphress and Williams' experimental results generated a false positive rate of 12% and a false negative rate of 6%.

Several years after the research published by Umphress and Williams, the team of Joyce and Gupta implemented a follow-up study (Joyce and Gupta 1990). This study was noteworthy because it marked one of the earliest examples of keystroke dynamics research targeting the use of a PIN or password. Using an experimental sample of 33 participants, Joyce and Gupta asked participants to type their user ID, password/PIN, first name, and last name. By examining the latencies between consecutive characters, Joyce and Gupta were able to achieve a false negative error rate of 1% and a false positive error rate of 7%.

Following the efforts of Joyce and Gupta, the Italian team of Bergadano, Gunetti, and Picardi published a large-scale study designed to verify the utility of keystroke dynamics over dramatically larger samples (Bergadano, Gunetti and Picardi 2002). Bergadano and his team utilized an experimental sample of 154 participants, marking a very large increase over all other prominent keystroke dynamics techniques. The authors of this study were ultimately able to achieve a false positive rate of four percent and a false negative rate of 0.01 percent. To achieve these results they utilized a slightly different analytical technique than previous studies: they considered latencies over three-character sequences rather than the two-character sequences utilized by earlier researchers.
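The latency measurements discussed in this section can be illustrated with a minimal sketch. It is not the procedure used by any of the cited authors; the function names, the mean-absolute-difference comparison, and the timing values are all assumptions chosen for illustration. The sketch computes the mean time taken to type each n-character sequence (n = 2 for consecutive-key latencies, n = 3 for three-character sequences) and then compares a reference profile with a test sample.

```python
from collections import defaultdict
from statistics import mean


def ngraph_latencies(keystrokes, n=2):
    """Map each n-character sequence to the mean time (ms) taken to type it.

    keystrokes: list of (character, press_time_ms) tuples in typing order.
    n=2 measures consecutive-key latencies; n=3 measures three-character spans.
    """
    samples = defaultdict(list)
    for i in range(len(keystrokes) - n + 1):
        window = keystrokes[i:i + n]
        chars = "".join(ch for ch, _ in window)
        # Elapsed time from the first key press to the last key press in the window.
        samples[chars].append(window[-1][1] - window[0][1])
    return {chars: mean(times) for chars, times in samples.items()}


def mean_absolute_difference(reference, test):
    """Average |reference - test| latency over sequences present in both profiles.

    Returns None when the two profiles share no sequences.
    This comparison rule is a hypothetical stand-in, not a published metric.
    """
    shared = reference.keys() & test.keys()
    if not shared:
        return None
    return sum(abs(reference[g] - test[g]) for g in shared) / len(shared)


# Hypothetical timing data for the same four keys typed in two sessions.
reference_typing = [("p", 0), ("a", 110), ("s", 205), ("s", 330)]
test_typing = [("p", 0), ("a", 150), ("s", 260), ("s", 370)]

ref = ngraph_latencies(reference_typing, n=2)
tst = ngraph_latencies(test_typing, n=2)
score = mean_absolute_difference(ref, tst)
```

A larger score indicates a larger departure from the reference profile. How such a score would be mapped to a confidence level or to an accept/reject decision is left open here, since the cited studies each used their own decision rules.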
To date, the published research into keystroke dynamics has been inconclusive (Peacock, Ke and Wilkerson 2005). Failure to use common data collection and analysis techniques between researchers has made it difficult to make definitive judgments regarding the utility of keystroke dynamics as a means of identification or authentication. The transition of modern computing environments to graphical user interfaces may also have impacted the effectiveness of keystroke dynamics (Garg, et al. 2006), either in overall usefulness or through reduced repeatability of keystroke patterns caused by decreased typing.

Despite the apparent conclusion of the academic community regarding the effectiveness of keystroke dynamics, multiple commercial products have been developed utilizing this technique. For example, in 2002 BioPassword Inc. released its first keystroke dynamics based authentication package (Kingsbury 2006). As the name implies, BioPassword Inc.'s products utilize keystroke dynamics to harden passwords. According to BioPassword, a user is required not only to know a password, but also to type it in a manner consistent with an observed reference profile (BioPassword, Inc. 2007). Recently other companies have moved into the marketplace offering competing products (iMagic Software 2007), (ID Control, Inc. 2007), (bioChec 2007). Not surprisingly, precise performance data is not readily available for the commercial solutions.

Other GUI Based Authentication Studies

Though the ideas presented in this dissertation are unique, there have been previously published studies that attempt to accomplish the same goals with varying success (Pusara and Brodley 2004), (Garg, et al. 2006). These papers have attempted to differentiate between users based on how users interact with a graphical user interface. This section will describe the work performed by both sets of authors as well as compare and contrast their methodologies.
Because only two published studies are considered, this section will differ slightly from the other sections in this chapter, taking advantage of the opportunity to demonstrate direct comparisons between the two studies.

The studies presented here have several key similarities. First, and arguably most importantly, both studies attempted to solve the same problem: verifying the identity of the user sitting behind the keyboard. Garg and his team referred to this practice as "masquerade detection" while Pusara and Brodley coined the term "user re-authentication". Garg describes a masquerade attack as being an incident in which the operator of a computer is not the rightful owner of the credentials used to access the system. Pusara and Brodley's chosen term, re-authentication, implies a similar problem set: re-verifying that the operator of the computer is the proper owner of the credentials used to access it.

As an aside, Garg's terminology was adopted to describe the research presented in this dissertation. Garg's terminology was chosen over Pusara and Brodley's for two primary reasons. First, an informal literature review seemed to slightly favor the term "masquerade attack" over "re-authentication". Second, it is believed that the majority of incidents in which GUI Usage Analysis or any similar technique might be applied are best described as attacks.

Both Garg and Pusara and Brodley ultimately focused their research on data gathered from the mouse. Pusara and Brodley appear to have focused on gathering mouse movement data from the beginning, no doubt at least partially inspired by the keystroke dynamics research described earlier in this chapter. Garg, on the other hand, initially captured data from multiple sources including mouse movements as well as keystrokes. As Garg's research progressed he focused exclusively on the movements of the mouse.
Garg did not provide a formal explanation for why the other data that was collected was not factored into his final analysis. It is interesting to note that Pusara and Brodley's work was referenced by Garg, though his findings are not presented as a mere confirmation of Pusara and Brodley's work.

Both sets of authors utilized machine learning techniques to analyze the data that was gathered. Pusara and Brodley utilized a commercial machine learning algorithm known as See5/C5.0 (Rulequest Research 2007). See5 is a form of decision tree learning, a type of machine learning that has been widely studied (Russell and Norvig 2003). Garg selected a different form of machine learning known as Support Vector Machine (SVM) (Vapnik 1995). The specific implementation chosen by Garg is known as SVM-Light, an open source C implementation available free of charge for non-commercial uses (Joachims 2004).

Aside from the difference in machine learning algorithm, the data gathered by both sets of authors was relatively similar. Both examined the distance, angle, and speed observed between consecutive user actions and derived several statistics which were used during the analysis. Both calculated the mean and standard deviation of the change between consecutive events. Pusara and Brodley also calculated the third moment observed between consecutive events. Both also used a sliding window as part of their analysis to limit the amount of data considered by the machine learning algorithms during any single analytical session.

In spite of all of the similarities found between the studies published by Garg and by Pusara and Brodley, key differences can also be observed. The first and most striking difference between the two studies deals with the size of the experimental sample gathered. Pusara and Brodley initially started with a sample of 18 participants, though that number was eventually decreased to 11. Garg, on the other hand, began with an experimental sample of size three.
Garg indicates that no participants were excluded from the analytical data at any time.

The next, and possibly most important, difference between the two studies concerns the activities assigned to the study participants. Garg opted to have the participants utilize their computer systems in an ordinary manner, apparently imposing no limitations on what the participants could and could not do while data was gathered. Pusara and Brodley took a different approach and severely limited the activities of the participants in their study. Pusara and Brodley's participants were instructed to read a series of web pages using the Internet Explorer web browser, and did not engage in any activities other than reading the prescribed web pages.

The results of the final experiment presented by Pusara and Brodley were highly comparable to the results of the final experiment presented by Garg. Pusara and Brodley reported a final false negative error rate of 1.75% while Garg and his team reported a false negative error rate of 3.85%. Pusara and Brodley reported a false positive error rate of 0.43% in their final study. Unfortunately, Garg and his team did not report a false positive error rate in their findings.

Previous GUI Usage Analysis Research

The experiments presented in this dissertation are not the first attempts at using GUI Usage Analysis as a means of authentication. Prior to conducting the experiments presented here, two additional studies were performed using differing data analysis techniques and participants. These studies attempted to determine the utility of using GUI Usage Analysis as a means of authentication for an overall participant pool, as well as the impacts that demographical traits might have on the effectiveness of GUI Usage Analysis.
In (Imsand and Hamilton, GUI Usage Analysis for Masquerade Detection 2007), Imsand and Hamilton present the results of an initial study into the viability of using GUI Usage Analysis as a means of detecting masquerade attacks. The authors of this study utilized an experimental sample of ten subjects, all of whom were undergraduate students majoring in either business or computer science. These ten subjects were asked to complete a specific set of tasks three times with a period of two days between data collection sessions. In all experiments two of the three sessions were used as training data, while the third session was used as a test session.

Imsand and Hamilton utilized a compositional analysis method similar to the techniques described at other points in this dissertation. The authors calculated the number of times each pair of consecutively occurring events was found in the reference sample and compared that number to a similarly calculated total obtained from the test sample. The total number of discrepancies between the reference sample and the test sample was then calculated, indicating the amount of dissimilarity observed between the two samples. The authors exhaustively tested each possible combination of training and test data for each user and reached a tentative conclusion that GUI Usage Analysis could be used as a means of authentication and/or as part of a masquerade attack detection scheme. This tentative conclusion was based on an observed false negative rate of 40%. The authors used an "attack threshold" which indicated the maximum amount of dissimilarity that could be observed between two sets of data without signaling an attack. By tuning the attack threshold for each user, the false positive rate was held at zero percent.

The authors of this study made one additional interesting observation regarding this research.
It was noted that, in comparison to other GUI based profiling techniques, GUI Usage Analysis required much less data to make a classification decision. It was determined that other GUI based profiling schemes would require up to 72% more data in order to operate as designed.

The results in (Imsand and Hamilton, GUI Usage Analysis for Masquerade Detection 2007) showed that some users experienced much higher rates of masquerade attack detection than other study participants. An investigation was conducted in an effort to determine if there was any characteristic that was unique to users who experienced better attack detection. In (Imsand and Hamilton, Impact of Daily Computer Usage on GUI Usage Analysis 2007), Imsand and Hamilton conducted a study that sought to determine if such a characteristic could be found. A very small amount of demographical data was gathered by the authors of (Imsand and Hamilton, GUI Usage Analysis for Masquerade Detection 2007), with none of the solicited characteristics corresponding to a significantly higher attack detection rate. For this reason, the authors of (Imsand and Hamilton, Impact of Daily Computer Usage on GUI Usage Analysis 2007) gathered an entirely new set of participants and created a much larger participant survey. The task list that each participant was asked to complete remained the same, as did the algorithm used to classify test users (i.e. attacker vs. legitimate user).

While the second study mirrored the first study in many ways, there were several key experimental differences. First, the participant pool grew from ten participants to sixteen participants. Second, the period of time between experimental runs was shortened from two days to at least one hour. The other significant difference between these two studies was the pre-experiment briefing given to participants. In the first study participants were not briefed on the overall objective of the research prior to their participation.
Participants of the second study were given a briefing describing the overall goals of the research. This disclosure was done at the encouragement of the Auburn University office of Human Subjects Research.

The overall attack detection rate observed in the second study dropped slightly. The initial study produced an attack detection rate of 60% with no false positives, while the second study found an attack detection rate of 52% with no false positives. The authors provided several suspected reasons for the dip in attack detection rate, though no conclusive determinations were drawn.

As previously mentioned, the goal of the study was to find characteristics that might indicate a user may be better protected by GUI Usage Analysis. The data presented by the authors indicated that users that spend more than six hours per day using a computer enjoyed significantly better protection from GUI Usage Analysis. Other characteristics demonstrated slightly better performance for members of certain groups, though the observed improvements were not nearly as pronounced as in cases in which a user spent more than six hours a day operating a computer. Unfortunately the overall number of participants in the study prevented the authors from drawing definitive conclusions regarding computer usage and the protection offered by GUI Usage Analysis.

Multiple deficiencies found in the first two studies of GUI Usage Analysis were identified and corrected prior to the commencement of work on this dissertation. Most notable of these corrections was the inclusion of many more participants. Both of the previous studies featured, at most, sixteen participants. The formal findings presented in this dissertation are based on studies featuring thirty-one participants. Furthermore, the analytical techniques utilized in prior GUI Usage Analysis studies were unsatisfactory. Neither study featured proven data mining and classification techniques, instead opting for custom developed solutions.
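
For reference, the custom pair-count comparison used in the two earlier studies can be sketched roughly as follows. This is only an illustrative reading of the description given earlier in this section, not the authors' actual code: the function names and event labels are hypothetical, and summing absolute differences in pair counts is an assumed interpretation of "the total number of discrepancies".

```python
from collections import Counter


def pair_counts(events):
    """Count how often each pair of consecutive events occurs in a session."""
    return Counter(zip(events, events[1:]))


def dissimilarity(reference, test):
    """Assumed discrepancy measure: sum of absolute differences in pair counts."""
    ref, tst = pair_counts(reference), pair_counts(test)
    return sum(abs(ref[pair] - tst[pair]) for pair in ref.keys() | tst.keys())


def is_attack(reference, test, threshold):
    """Signal an attack when dissimilarity exceeds the per-user attack threshold."""
    return dissimilarity(reference, test) > threshold


# Hypothetical event labels, for illustration only.
reference = ["click:menu", "click:button", "key:edit", "click:button"]
other = ["key:edit", "key:edit", "click:menu", "key:edit"]

same_user = dissimilarity(reference, reference)  # identical sessions share all pairs
different_user = dissimilarity(reference, other)  # no pairs in common here
```

In this sketch the threshold is a single number chosen per user, mirroring the per-user tuning described above. A hand-built comparison of this kind is the sort of custom solution referred to in the preceding paragraph.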
This deficiency has also been addressed in the final findings of this dissertation.

CHAPTER 3 EXPERIMENTAL SETUP

The study presented here was designed to assess the feasibility of using GUI Usage Analysis as both a means of authentication and identification. To accomplish this, great care was taken in the design of the experiments performed. As already noted, other studies did not present a clear path to be followed when conducting this study. No single recording tool was used, nor was there a common set of tasks to be performed by participants. This was partially due to the varying nature of what previous studies were attempting to prove; clearly investigators assessing the suitability of command line input profiling would gather different data from individuals investigating mouse movements.

Unfortunately there are discrepancies in the literature more fundamental than these differences. Multiple studies observed participants performing the same set of prescribed tasks (Schonlau, et al. 2001), (Pusara and Brodley 2004). Other studies (Garg, et al. 2006) monitored their participants while using a computer as they ordinarily would, completing their everyday tasks. In (Schonlau, et al. 2001), Schonlau argues that no definitive conclusions about identifying characteristics can be drawn unless the only variables in the study are the identifying characteristics themselves. In other words, unless everything else, including the student's task list, is held constant, no definitive conclusions can be drawn. It is difficult to find fault with this analysis and, as a result, the general method proposed by Schonlau was adopted for this study. Users were monitored in the same software environment using the same monitoring suite while performing the same set of tasks.

The remainder of this chapter is organized in the following manner.
Section 3.1 presents a discussion of the custom monitoring software that was used in the collection of data for this study. Section 3.2 discusses the criteria used to determine who was eligible for participation in this study. Section 3.3 provides a brief outline of the environment that participants used when participating in this study. Section 3.4 discusses the task list that participants were asked to complete and provides the rationale for some of the limitations imposed on the participants (such as the frequency of completed runs, amount of time between runs, etc.).

Monitoring Software

One of the most important aspects of assessing the viability of identifying or authenticating users by their GUI interaction patterns is the actual collection of users' interactions. It was determined that the following pieces of information had to be gathered from users actively using a graphical user interface:

- The action the user performed
- The application the user was interacting with
- The specific control (i.e. button, text-box, toolbar, etc.) that the user acted on

It was believed that failure to collect any of these pieces of information would prevent successful classification (identification and/or authentication) from occurring at acceptable levels. To illustrate the rationale behind this belief, suppose that the specific control that a user was interacting with was not collected. It would then be impossible to determine whether a user's left click was performed on the "File" menu, on a button on the toolbar, to relocate the cursor within a document, etc.

It was determined that the utility Spy++, provided by Microsoft Corporation as part of its Visual Studio application, provided the desired functionality. Unfortunately the interface for Spy++ was determined to be too complex for the average user to interact with.

Figure 1. Spy++ User Interface

The difficulties posed by the unwieldy interface of Spy++ could have been overcome.
Unfortunately Spy++ was unsuitable for this study for other reasons as well. There was no way to automate the setup and configuration of Spy++ between experimental runs. Proper configuration of Spy++ required several highly specific interactions with the interface. Concern that human error could lead to the collection of incorrect or incomplete data was high. Furthermore, the application could not be placed in a stealth mode in which it was hidden from users. There was concern that the complex interface presented by the application might serve to distract or confuse the users while they were participating in the experiment.

When the decision was made not to use Spy++, alternatives were sought out. Unfortunately no freely available utilities were found at the time that provided all of the desired data. The decision was made to create a custom utility to use as a data collection instrument in this study.

Research was commenced on how to log keystrokes and mouse-clicks in the Microsoft Windows operating system. This research quickly yielded two key pieces of information. The first was that the Microsoft Windows operating system uses messages sent from the OS to applications to inform them of user actions that should be responded to (Microsoft Corporation 2007). The second key piece of information was that Windows provides a construct that Microsoft terms "hooks" that enable an application to intercept these messages (Microsoft Corporation 2007a). According to Microsoft, a hook is "a point in the system message-handling mechanism where an application can install a subroutine to monitor the message traffic in the system and process certain types of messages before they reach the target window procedure." The Windows XP operating system provides a wide variety of hooks that can be installed depending on which messages an application seeks to capture.
Also, hooks can be configured to capture only the messages for a single application, or for all applications on the system. For this study it was determined that the WH_GETMESSAGE hook was appropriate, as that hook provided the ability to capture all messages sent by the OS to the application (Microsoft Corporation 2007a). Other hooks were considered, such as the WH_KEYBOARD_LL hook as well as the WH_MOUSE_LL hook. Unfortunately experimentation showed that the only hook that provided all of the information needed for full analysis was WH_GETMESSAGE.

The actual mechanics involved in the capture of Windows messages are somewhat complex. The Microsoft Developer Network entry on hooks states:

"The system maintains a separate hook chain for each type of hook. A hook chain is a list of pointers to special, application-defined callback functions called hook procedures. When a message occurs that is associated with a particular type of hook, the system passes the message to each hook procedure referenced in the hook chain, one after the other." (Microsoft Corporation 2007a)

Stated more clearly, a hook allows an application to inject a specified sub-routine into the message processing machinery of each process. Conceptually this can be thought of as simply replacing the message handling sub-routine for each application running on the system.

Figure 2. Message Processing Prior to "Hooking"

Figure 3. Message Processing After "Hooking"

Clearly, capturing the data sent by the operating system to each application did not completely meet the requirements of the data collection phase of this study. The other difficulty that had to be overcome in the development of the monitoring software was a logging procedure to store the data for later retrieval. Ideally all of the collected information would be stored in a local file and then retrieved later. In theory it would have been simple for the "hooked"
sub-routine to simply copy all captured messages to a log file before allowing the target application to process the data. Unfortunately certain characteristics of the hooking process prevented this simple scenario from being executed. It was discovered that the injected sub-routine ran in the context of the actual application that was to receive the message from the operating system. For this reason the sub-routine could not attempt to use a single common file to record the information, due to the fact that it would be impossible to pass an open file pointer into the function prior to hooking (Newcomer 2003). With the simplest design eliminated due to technical problems, other data collection facilities were considered, such as transmitting data to a database stored either locally or on a server located elsewhere on the Internet.

It was decided that the "hooked" sub-routine (the sub-routine that was injected into each process) would pass copies of all intercepted messages back to a server application via UDP datagrams. Since the server application was also running on the local host, it was determined that the likelihood of packet loss was low, making the reliability of TCP not worth the added overhead. This solution was chosen because it removed the need for an external database stored locally or remotely.

In its final form the monitoring software consisted of two components: a server application and a library that was injected into each application running on the system. The final system was initially based on an open source project published on the Internet (Newcomer 2003). That code was then heavily modified to suit the specific requirements of the monitoring software needed for this study. The server application was responsible for invoking the API that caused Windows to inject the library into each executable. It was also responsible for "unhooking" the DLL from each executable.
The final responsibility of the server application was to display an easy to use graphical interface that study participants could readily understand and work with.

Figure 4. Interface of the monitoring software

Up to now the messages have been discussed as if they automatically contained all of the data needed to perform an analysis. Unfortunately this was not the case. The messages transmitted by Windows typically contain the following pieces of information:

- A handle to the "window" the message is addressed to
- The message itself
- A pointer to additional data about the message. The contents of the additional data were dependent on the message itself.

Of the three things that were needed to conduct the analysis, only a single item was provided by Windows in the original intercepted message. Fortunately the Windows XP API contains several functions that could be used to deduce the remaining desired information. The GetClassName function is provided by Windows XP and returns a string containing the name of the class that the specified window is an instance of (Microsoft Corporation 2007).

As outlined earlier, the ideal dataset would have enabled the analysis to compare precisely what each user was clicking or typing on. The information obtained by examining the classes of the controls acted upon does not quite provide that same level of specificity. For instance, two different users could click on two different buttons located on the toolbar of an application. Because the two buttons are instances of the same class, though, the analysis method used in this study had no way to tell that these were not identical actions. Unfortunately a reliable manner of obtaining the precise data that was desired was never found. It is not believed that relying on this slightly less specific data dramatically affected the results of this study.
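
The three items identified as necessary for the analysis (action, application, and control class) can be pictured as one small record per intercepted message, sent to a logger on the local host. The sketch below is illustrative only and is not the monitoring software used in this study: the record layout, the JSON encoding, the function names, and the class name "ToolbarButton" are all assumptions made for the example. It also shows the granularity limitation just described, since two clicks on different buttons of the same class produce identical records.

```python
import json
import socket
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GuiEvent:
    application: str   # name of the program that received the message
    window_class: str  # class name of the control acted on (hypothetical value below)
    message: str       # symbolic name of the Windows message


def encode(event: GuiEvent) -> bytes:
    """Serialize one event as a datagram payload (JSON is an assumed format)."""
    return json.dumps(asdict(event)).encode("utf-8")


def decode(payload: bytes) -> GuiEvent:
    """Rebuild an event from a received datagram payload."""
    return GuiEvent(**json.loads(payload.decode("utf-8")))


def send_over_loopback(event: GuiEvent) -> GuiEvent:
    """Send one event over UDP to a logger socket on the local host.

    Returns the event as reconstructed on the receiving side.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as logger, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        logger.bind(("127.0.0.1", 0))  # let the OS pick a free port
        logger.settimeout(2.0)
        sender.sendto(encode(event), logger.getsockname())
        payload, _ = logger.recvfrom(4096)
    return decode(payload)


# Two clicks on different toolbar buttons of the same (hypothetical) class
# are indistinguishable once only the class name is recorded.
bold_click = GuiEvent("WINWORD.EXE", "ToolbarButton", "WM_LBUTTONDOWN")
italic_click = GuiEvent("WINWORD.EXE", "ToolbarButton", "WM_LBUTTONDOWN")
```

Because both sockets live on the same machine, a connectionless datagram is a natural fit here, which matches the reasoning given for preferring UDP over TCP.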
The other item of information that was not contained in the original message sent by the operating system was the actual application that the message was destined for. Ordinarily, when applications are not attempting to inspect all messages transmitted by the operating system, this piece of information is not important. When an event occurs Windows determines the active window, looks up the message processing chain associated with that window, and transmits the message accordingly. In this study, though, that information was not enough. A method of logging the application that received the action was required. Fortunately, as outlined earlier, the library function that inspected all of the message traffic ran in the context of the application that was to receive the message. Using the GetCommandLine function provided by the Windows XP API, the library function was able to discover the name of the application that it was currently executing in (Microsoft Corporation 2007). This information was transmitted to the server portion of the monitoring software, along with the name of the class of the control to receive the event.

Participant Eligibility

One of the underlying suspicions used in the design and implementation of this study is that users have certain routines or habits that are unique to each person. It was suspected at the outset of this study that, on average, two different people completing the same set of tasks would undoubtedly use a different set of actions to accomplish their goals. For GUI Usage Analysis to be suitable as a means of either identification or authentication, though, each person would have to have a certain degree of consistency to their actions. For this reason it was determined that the first and most important criterion that study participants were required to meet was that they be moderately proficient at the use of a computer.
To illustrate why this limitation was placed, consider a novice user who is completing some task for the first time. When asked to complete a task like "spell check the current document, correcting all errors," the user may be able to "stumble" their way through this task. The interfaces of modern computing systems are designed to be intuitive and the novice user takes advantage of this. The user, reasoning that the spell check system is a "tool" offered by the word processor, searches under the "Tools" drop down menu and successfully locates the spell check system. On the third session the user happens to place their mouse over the spell check shortcut and a tool tip appears indicating that pressing the current button will invoke the spell checker. The novice user, not having any particular manner in which they are accustomed to doing things, might decide to use this execution path to invoke the spell checker because it requires fewer mouse clicks. Because the user is a novice they are continually searching and learning, attempting to complete each task. As they rapidly acquire new skills their behavior will undoubtedly change.

Demographical Make-up of the Participant Population

The participants used in this study were all non-students, meaning that they were all full time employees of some organization. This was considered advantageous in order to ensure a participant population that was not overly skewed towards any particular demographic. The following pieces of information were obtained from the study participants:

- Profession
- Highest degree of academic achievement
- Gender
- Age range
- Self-assessment of computer skills
- Average daily computer usage
- Source of computer knowledge (self-taught or through a book/instruction)

There were a variety of reasons behind the collection of the demographical information from the participants.
Some questions, particularly inquiries regarding gender and age, were made with no prior indication that those pieces of data would yield any interesting findings. They were collected for the sake of completeness in case the initial beliefs concerning the utility of that data were proven to be incorrect.

The remaining pieces of information were quite purposefully collected from the participants. Prior to commencement of the study, it was hypothesized that an individual's profession might greatly influence their suitability for GUI Usage Analysis. For instance, it was hypothesized that a user's skill at operating a computer might have a great impact on their suitability for GUI Usage Analysis. It was believed that more skilled users may know of more efficient, less conspicuous manners of accomplishing tasks. In this way their higher skill set might lead to a more unique behavioral thumbprint.

It was also hypothesized that the source of a user's computer skills might greatly influence their suitability for GUI Usage Analysis. It was believed that users that taught themselves might be more likely to have unique behavioral fingerprints when compared with those who learned from a common book or instruction. The downside to this question is that it may not be possible to fully investigate and answer with the data collected in this study, as the participants that indicated they learned from a book or instruction presumably did not all use the same book/instructor.

Daily computer usage was also considered to be a trait that might impact the performance of the algorithm in this study. It was hypothesized that persons that use a computer for longer periods of the day may have behavioral patterns that are more firmly entrenched than users that spend less time in a day using a computer. Though this initial description sounds similar to the rationale behind the collection of data concerning computer skills, the two pieces of information are different.
Consider a clerk who routinely enters data into Microsoft Word or Excel files. This clerk knows how to do his/her job well and has learned to complete most common tasks in those two applications quite efficiently. This person's skill set is confined, though, to the use of Microsoft Excel and Word, meaning that this person hardly qualifies as an expert computer user.

The final, unaddressed piece of data gathered from the study participants concerned their academic achievement. It was hypothesized that an individual's native intelligence might be a more accurate indicator of success with GUI Usage Analysis than any of the other items previously addressed. It was believed that intelligent users may be more likely to learn methods of accomplishing tasks, to be classified as expert users, and to be heavy computer users. While hardly conclusive, the degree of academic achievement does have a natural link to an individual's native intelligence, explaining why it was chosen as an item to be collected from the participants of this study.

Though 31 participants completed the experiment correctly, only 29 completed the required questionnaire. The information provided by those 29 participants is listed in Table 2. Initial inspection seems to show the participant population as being heavily comprised of individuals in two different classes: IT and Education. It is important to realize, though, that the individuals that listed their field as "Education" are faculty at a particular high school. These individuals were chosen because of their diverse specialties. While it may be factually correct to list their field as "Education", each of them is equally proficient in a secondary skill such as mathematics, literature, physical science, etc.

Table 2. Demographic Survey Results

Participant Number | Current Profession | Highest Degree of Academic Achievement | Gender | Age | Self Assessment of Computer Skills | Avg. Amt. of Daily Comp. Usage (hours) | Source of Comp. Skill
1 | Marketing / PR | Some Undergrad. | F | 40-65 | Average | 8+ | Course / Instruct.
2 | Accounting | Some Grad. | F | 40-65 | High | 8+ | Self Taught
3 | IT | Some Undergrad. | M | 25-39 | Average | 4-6 | Self Taught
4 | IT | Associates Degree | M | 18-24 | High | 6-8 | Self Taught
5 | IT | Some Grad. | M | 25-39 | Above Average | 8+ | Self Taught
6 | Engineering | Graduate Degree | M | 25-39 | High | 6-8 | Self Taught
7 | IT | Some Undergrad. | M | 18-24 | Average | 2-4 | Self Taught
8 | IT | Some Undergrad. | M | 25-39 | High | 8+ | Self Taught
9 | Accounting | Associates Degree | F | 25-39 | High | 6-8 | Course / Instruct.
10 | IT | Associates Degree | M | 40-65 | Average | 4-6 | Self Taught
11 | Engineering | Associates Degree | M | 40-65 | Average | 6-8 | Self Taught
12 | IT | Some Undergrad. | M | 40-65 | Above Average | 2-4 | Self Taught
13 | IT | Some Undergrad. | M | 25-39 | Above Average | 8+ | Self Taught
14 | IT | Associates Degree | M | 18-24 | High | 8+ | Self Taught
15 | Education | Some Grad. | F | 40-65 | Above Average | 2-4 | Self Taught
16 | Education | Graduate Degree | F | 40-65 | Average | 2-4 | Course / Instruct.
17 | Education | Graduate Degree | F | 40-65 | High | 2-4 | Course / Instruct.
18 | Clerical | Some Undergrad. | F | 40-65 | High | 8+ | Self Taught
19 | Education | Graduate Degree | F | 40-65 | High | 6-8 | Self Taught
20 | Education | Graduate Degree | F | 40-65 | Above Average | 4-6 | Self Taught
21 | Education | Graduate Degree | M | 25-39 | Above Average | 2-4 | Self Taught
22 | Education | Undergrad. Degree | F | 40-65 | Above Average | 2-4 | Self Taught
23 | Education | Graduate Degree | F | 25-39 | Above Average | 2-4 | Course / Instruct.
24 | Education | Undergrad. Degree | F | 40-65 | Above Average | 2-4 | Self Taught
25 | Education | Graduate Degree | F | 40-65 | Above Average | 2-4 | Self Taught
26 | IT | Graduate Degree | M | 25-39 | High | 6-8 | Self Taught
27 | Education | Graduate Degree | M | 40-65 | Above Average | 6-8 | Self Taught
28 | Education | Graduate Degree | F | 40-65 | High | 8+ | Self Taught
29 | IT | Graduate Degree | M | 40-65 | High | 8+ | Course / Instruct.

Experimental Environment

The data collected in this study utilized an identical software configuration on all systems. The systems were configured as follows:

- Microsoft Windows XP Professional, Service Pack 2
- Microsoft Office 2003
- Microsoft Internet Explorer 6.0

Users completed the experiment using a standard three button mouse and standard keyboard.

Experimental Task List

Creating the task list was a surprisingly difficult endeavor. The nature of this study placed a premium on tasks that normal users would automatically know how to do. Unfortunately coming up with tasks that the majority of users knew how to do proved difficult, as evidenced by the need to suppress all interactions with Microsoft Excel.

Many computer proficiency books were examined in an attempt to find a suitable task list. Many computer literacy tests, offered both online and in print form, were also examined. It was determined that none of the tests examined met the needs of this study. Frequently the tests were judged to be too complex, containing tasks that the average user would not know how to complete. As a result the task list used in this study was created by the investigators and comprised tasks that, in the estimation of the investigators, most users would automatically know how to complete.

The task list itself consisted of a total of ten steps, with some steps having sub-tasks that had to be completed. The complete task list is shown in Figure 5.

User Identification Research Experiment
Eric Imsand
imsanes@auburn.edu
(901) 338-9323

This experiment that you are being asked to participate in is designed to determine whether the ways in which we use a computer are unique to each person.
If this is the case then it may be possible to identify people by how they work with a computer, not just the username and password that they typed in. In order to determine this a small amount of experimental data must be collected. You will shortly be asked to complete some routine tasks using a computer. While you are completing these tasks a piece of monitoring software will record which buttons you click on, keys you press, etc. At no time will any personal information be collected from you. Participation in this experiment is voluntary. If you have any questions about the scope of this experiment or what kind of data is to be collected, please contact Eric Imsand. NOTE: Please close Microsoft Outlook before starting the experiment. Directions: 1. Click on Start -> Programs -> E Imsand -> Recording Software 36 2. Click the ?Record? button to start recording. (NOTE: You may minimize the recording software window if you like) 3. Create a folder in the ?My Documents? folder. The folder should be named ?CompTest? 4. Download the file ?sample_document? from http://www.auburn.edu/~imsanes/research/. When asked, please choose to save the document to the folder you created in step #3. 5. Open the document you downloaded in the previous step if it is not already opened. a. Change the font for the entire document to Times New Roman with a 12pt. font size. b. Center the title (the first line of text on the first page), and increase its font size to 16 pt., also making it bold and italics. c. Move the last paragraph in the document to be the third paragraph in the document. d. Spell check the entire document, correcting all errors. e. Save the document as "research document" in the folder created in step #3. f. Close Microsoft Word. 6. Download the file ?Data? from http://www.auburn.edu/~imsanes/research/. 7. Open the document you downloaded in the previous step if it is not already opened a. Calculate the average of columns A & B for all rows (i.e. 
calculate the average of A2 & B2, A3 & B3, etc.) and store the value in column C (labeled "Overall average") of that row.
   b. Save the document as "research data" in the folder created in step #3.
   c. Print the document (if your computer is configured to print to more than one system, use the default printer).
   d. Close Microsoft Excel.
8. Perform the following file and directory operations:
   a. Change the name of the file created/saved in #4 from "research document" to "rd#####" where ##### is your actual student ID or your initials.
   b. Move the folder created in step #1 and all of its contents to the Desktop.
9. Using Internet Explorer (not Netscape, Fire Fox, etc.) please navigate to the following websites.
   MSNBC
   CNN
   Google
   Auburn University
   Randolph School
10. Click the "Stop" button on the monitoring software. Close the recording software. If a survey is displayed, please complete it.

Figure 5. Complete Task List provided to Experimental Participants

As previously alluded to, undergraduate and graduate students were specifically avoided in the recruitment of test subjects for this study. The fact that all participants were working professionals provided several advantages. Perhaps the most significant of these advantages was that the participants' employment allowed verification that the chosen participants were capable of completing the task list prior to commencement of the study. Each participant's supervisor was consulted concerning the participant's computer skills and their ability to successfully complete all of the tasks on the list. The supervisors provided verification that all of the invited participants were capable of completing the tasks on the list, with one possible exception. The assignment list, which is fully enumerated in Figure 5, asked users to complete some tasks in Microsoft Excel. Several supervisors indicated that their participants might be incapable of completing the Excel tasks.
The decision was made to ask the participants to complete the tasks in Microsoft Excel and then conduct a post-experiment interview to determine if there were any tasks on the list that they were incapable of completing. Approximately half of the participants indicated that they were not able to complete the Microsoft Excel portion of the study. All of the interactions with Microsoft Excel were removed from the data set prior to analysis.

Administration of the Experiment

Participants in this study were required to complete the task list a total of five times. The selection of five runs was a compromise between the need to obtain enough information to conduct the experiment and the need to avoid placing an unreasonable burden on the participants. Participants were asked to complete the task list five times on a single day with at least half an hour between runs. The selection of one half hour as the interval between runs was another compromise, this time between the participants' need for flexibility and the need for some interval between runs. The interval was added to the experimental procedure in the hope that participants would not remember how they completed the task list. It was hoped that even if the participants could remember what they did, they would not be able to remember precisely how they did it.

There was some consideration given to the amount of time the administration of the experiment should cover. It was initially believed that the experiment would be carried out over a series of days in an effort to minimize the likelihood that a participant might adjust their behavior to better suit the needs of the study. On the other hand, most users do learn new skills and change the manner in which they consciously complete a task. In the end it was decided that the best approximation of what an actual, deployed system might encounter would be to have the participants perform the experiments in a single day.
The overall requirements of the study definitely had an adverse impact on participation. Approximately 55 users originally volunteered to participate in the study. Of these approximately 55 users, only 31 successfully completed the study in the prescribed manner. Of the 31 that successfully completed the tasks as prescribed by the assignment list, only 29 completed the survey on their demographics. The two users who did not complete the survey have been excluded from the parts of the analysis dealing with portions of the subpopulation.

As previously stated, participants in this study completed the task list a total of five times. It was believed that, due to the artificial nature of the testing environment (i.e. taking users out of their ordinary environment), two "test" runs should be administered to users prior to the collection of data actually used in this study. This allowed users to become familiar with the testing environment as well as to refresh their skill set. It was suspected that at least some of the participants in this study would have a working knowledge of how to complete each task on the list but, due to the nature of their daily computer usage, would not immediately recall precisely how to perform each step. Consider a network administrator who spends a great deal of time working with router configuration menus and operating system security policies. This user has used the spell check functionality of a word processor many times in the past but does not need to make use of that particular skill on a daily basis. By allowing two runs before actual data collection, this user is able to refresh his/her memory and recall precisely how they typically invoke the spell check system.

Ignoring the first two sessions that were collected had other advantages as well. Many users expressed concern about their performance, frequently making statements such as, "I'm not sure if I'm doing this the correct way," or "I don't remember exactly how to do this."
To ease these concerns users were informed that their first two runs would not be used in the final analysis. It was believed that telling the participants that their initial two runs would not be analyzed would have little impact on the overall results of the study.

Pre-experimental Participant Briefing

Prior to participation in the experiment, the participants were briefed and consent was obtained. The briefing consisted of both a brief oral presentation, typically lasting less than 2 minutes, as well as information contained in the informed consent form they were required to sign. In accordance with the recommended procedures provided by the Auburn University Office of Human Subjects Research (Auburn University 2007), participants were given a broad overview of the research that was to take place. It is believed that any impact this briefing may have had on the performance of the subjects is minimal. The complete informed consent document that was obtained from all participants is included in Appendix A.

CHAPTER 4
USE OF GUI USAGE ANALYSIS AS AN AUTHENTICATION MECHANISM

After the data was collected from the study participants, a variety of analytical techniques were applied to the data. Multiple techniques were used in the analysis for two reasons: first, to determine the suitability of GUI Usage Analysis as a means of authentication, and second, to determine which analytical method yielded the best results. This chapter discusses the varying analytical techniques and the results produced by each of them.

Authentication is typically considered to be the second phase of an access control system. While the identification phase is responsible for determining the user's supposed identity, authentication is concerned with verifying that claim.
Authentication typically centers on requiring the user to provide proof in one of three manners: the user provides something (s)he has, the user states something that (s)he knows, or the user provides some physical characteristic (i.e. something the user "is") (Pfleeger and Pfleeger 2003) (Apple, Inc. 2007).

Prior to discussing each technique in great detail, some terminology should be clarified. During this discussion, the term "event" is used to describe the entire set of information yielded from a single user action. In other words, an event is considered to be the combination of the actual user action (left mouse click, "A" key depressed, etc.), the Windows class of the object the user was interacting with (button, text-box, etc.), and the executable the user was using at the time. The final piece of information may, at first glance, appear redundant. It is not uncommon for applications developed by differing software vendors to have unique class names for the objects present in their user interfaces. However, since the majority of the applications used by the participants of this study were developed by Microsoft Corporation, it was decided to include the final piece of information in order to avoid any confusion that could occur if any two applications happened to share class names.

As previously detailed, each participant generated three sessions which were used as part of this study. When simulating attacks for this study, one user was selected to be the "known" user. Two of the known user's sessions were fed into the classification engine to serve as training data. The other session was omitted to serve as a possible test session. Attacks were simulated by taking a third session and feeding it into the classification engine as well. The classification engine then made a determination as to whether the three sessions were generated by the same user or by a different user.
The third, unknown session could be either the third session from the "known" user, or it could be a session from a randomly selected user. Unless otherwise noted, all possible combinations of sessions were tested as both training and testing data.

The remainder of this chapter is organized as follows: section 4.1 discusses the overall rationale behind the choices of analytical methods used. Section 4.2 discusses the results of a TF-IDF inspired analysis that was performed with a "window" of size 2. Section 4.3 discusses the results of using Jaccard analysis. Section 4.4 covers the results of analysis performed on varying sub-portions of the participant population.

Rationale behind Analytical Techniques Used in this Study

When initially considering the data produced by the participants in this study, it became clear that there were some unique considerations. First and foremost, there were few, if any, restrictions on the output that could be generated. Generating an experimental alphabet that could represent all events generated during the course of the study would be highly difficult. Consider the process of enumerating all objects within all of the applications the participants could possibly interact with. Next consider all the possible actions that users might perform on the enumerated objects. Clearly this yields a very large number of possible events.

For this reason, it was decided to investigate analytical methods that were originally designed for processing and analyzing human generated text. Consider the similarities between the two sets of data. When analyzing a set of human generated text there are few limitations on the text that will be produced. As previously noted, this is very similar to the set of data generated by the users in this study.

One additional note should be made prior to considering the differing methods that were used in this study.
When evaluating the effectiveness of intrusion detection systems it is customary to measure their performance in terms of false positives and false negatives. "False positives" and "false negatives" are informal terms used to describe type 1 and type 2 errors. A false positive, or type 1 error, is said to have occurred when the null hypothesis is incorrectly rejected. In the case of most of the analyses presented here, the null hypothesis states that a masquerade attack is not occurring. This means that a false positive, or type 1 error, occurs when the classification engine incorrectly determines that an attack is occurring. Conversely, a false negative, or type 2 error, occurs when the classification engine incorrectly determines that an attack is not occurring.

For the sake of consistency many of the analysis techniques presented here are quantified using the familiar terminology of "false positives" and "false negatives". For several of the analyses presented, particularly analyses based on variants of the TF-IDF algorithm, the terms "false positive" and "false negative" are not entirely accurate. Those analyses typically assess the degree of similarity between two samples, labeling samples that are sufficiently different as being "attacks". Thus the TF-IDF inspired analyses are reported as having a false positive rate of 0%. In these instances the threshold was set at a level where false positives could not occur.

Figure 6. Illustration of Attack Threshold

Figure 6 illustrates the rationale used graphically. The red line in the graph depicts the maximum dissimilarity between any two sessions generated by the same user, user 1 in this case. The points in blue illustrate the number of discrepancies observed between any random session generated by user 1 and a session generated by some other user.

Simple Dissimilarity Analysis

As previously described, TF-IDF (term frequency -
inverse document frequency) based analysis seeks to determine the importance of an event to the overall corpus. In the context of this discussion the corpus is the larger session of data that is being evaluated. The TF-IDF weight is calculated for each word in a body of text. The weight is equal to the number of occurrences of that word, divided by the total number of words in a document. For example, suppose that the TF-IDF weight of the word "cat" was desired. It would be calculated in the following manner:

$$w_{\text{cat}} = \frac{n_{\text{cat}}}{m}$$

where $n_{\text{cat}}$ is the number of occurrences of the word "cat" and $m$ is the size of the entire document, in number of words.

It is not completely accurate to term the analyses performed here as being TF-IDF analyses. TF-IDF analysis is typically used when seeking to determine the importance of a single term or event to a larger body of data, not for determining the dissimilarity of two sets of data. The term "TF-IDF inspired" will be used in this discussion to acknowledge the role the algorithm played in inspiring the analytical methods described here.

A total of three different simple dissimilarity analyses were performed in this study, focusing on two different attributes. As previously documented, the simple dissimilarity analyses presented here determine whether or not a masquerade attack is occurring by comparing the number of discrepancies or differences between the known and unknown sessions. If the degree of dissimilarity is greater than some attack threshold a, then an attack is determined to have occurred. The first dissimilarity based analytical method studied used a static attack threshold for all participants. The second dissimilarity based method used a custom attack threshold for each user as well as an event window of size one. The final dissimilarity based analysis method used a variable attack threshold for each user as well as an event window of size 2, meaning that each event and its predecessor were analyzed jointly.
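As a small worked illustration of the weight described above (occurrences of a word divided by the document length m), the following Python sketch computes the weight of "cat" in a made-up sentence. The function name `term_weight` and the sample text are assumptions introduced for illustration only; they do not come from the study.

```python
from collections import Counter

def term_weight(term, words):
    """Occurrences of `term` divided by m, the number of words in the document."""
    m = len(words)
    if m == 0:
        return 0.0  # assumed convention for an empty document
    return Counter(words)[term] / m

# Hypothetical ten-word document in which "cat" appears twice.
document = "the cat sat on the mat while another cat slept".split()
weight = term_weight("cat", document)
```

In the study the same idea is applied to GUI events rather than words, so `words` would be a recorded session and `term` a single event.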
Simple Dissimilarity Analysis with a static Attack Threshold

As a scholarly experiment, GUI Usage Analysis provides the basis for an interesting investigation. For the data obtained to be of any use to the general populace, though, GUI Usage Analysis must eventually be implemented in some sort of software product designed to be used by the general public. The research described in this section outlines one attempt to assess difficulties that might be encountered if GUI Usage Analysis were to be ported to a commercial product.

Ease of deployment is a key consideration for any organization that seeks to utilize behaviorally based systems. A system using GUI Usage Analysis would undoubtedly fall under this category. It is believed that a system using a statically determined attack threshold would be far simpler to deploy than a system that used a variable attack threshold for each user. A system with a static threshold could potentially bypass the training period. Any training period that was required would undoubtedly be simpler when using a static threshold as opposed to a dynamic threshold that adjusted for each user.

For this reason, an analysis was performed using a simple dissimilarity analysis and a static attack threshold. In other words, the same dissimilarity threshold was used for all users when determining whether a session was generated by the known user or by an attacker. If a single threshold can be obtained experimentally, that would represent a finding of great value to any organization that might seek to develop a usable software system incorporating GUI Usage Analysis.

A variety of static thresholds were experimented with. The results of these experiments are shown in Table 3. Please note that the attack detection rate is simply the complement of the false negative rate (100% minus the false negative rate). It is included here for the sake of completeness.

Table 3.
Dissimilarity Analysis with a Static Attack Threshold

Attack Threshold   False Positive Rate   False Negative Rate   Attack Detection Rate
428.5              35.175%               17.24%                82.75%
450                31.94%                20.58%                79.41%
650                24.42%                49.07%                50.92%
700                22.27%                55.6%                 44.39%

The results listed in Table 3 indicate several things. First, a decrease in false positive errors was accompanied by an increase in false negative errors, as expected. Second, the relationship between the two error rates was not linear; an increase of X percent in the false negative error rate did not lead to an equivalent drop in the false positive error rate. A chart plotting the relative rise and fall of each error rate is given in Figure 7.

Figure 7. Relationship between False Positive Error Rates and False Negative Error Rates

Simple Dissimilarity Analysis with a variable Attack Threshold

The analytical method that produced the most promising results was a TF-IDF inspired analysis with a window of size one. As previously described, this indicates that events were considered singly, without any consideration of the event that preceded or followed them. This analysis was conducted in the following steps:

1. Calculate how often each individual event occurs in the two known samples.
2. Average the number of occurrences by dividing by 2.
3. Calculate how often each individual event occurs in the unknown sample.
4. Calculate A, the difference between how often a particular event occurs in the averaged known sample in comparison to the unknown sample.
5. Calculate B, the occurrences of events that were present in the known sample.
6. Calculate C, the occurrences of events that were present in the unknown sample.

The total number of discrepancies between the known sample and unknown sample was calculated as: , where A is considered to be the difference between the number of times a particular event occurred in the known and unknown sample.
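The counting steps above can be sketched in Python as follows. This is a hedged illustration, not the study's implementation: the exact way the quantities A, B, and C were combined is not reproduced here, so the sketch assumes a stand-in score, namely the sum over all events of the absolute difference between the averaged known count and the unknown count. The event tuples, session contents, and function names are hypothetical.

```python
from collections import Counter

def averaged_counts(known_a, known_b):
    """Steps 1-2: count each event in both known sessions and average the counts."""
    ca, cb = Counter(known_a), Counter(known_b)
    return {event: (ca[event] + cb[event]) / 2 for event in set(ca) | set(cb)}

def dissimilarity(known_a, known_b, unknown):
    """Assumed stand-in score: sum of |averaged known count - unknown count|."""
    avg = averaged_counts(known_a, known_b)
    cu = Counter(unknown)  # step 3: counts in the unknown session
    return sum(abs(avg.get(event, 0.0) - cu[event]) for event in set(avg) | set(cu))

# Hypothetical events: (user action, window class, executable).
click = ("left_click", "Button", "winword.exe")
key_a = ("key_A", "Edit", "winword.exe")
menu = ("left_click", "Menu", "iexplore.exe")

known_1 = [click, click, key_a]
known_2 = [click, key_a, key_a]
unknown = [click, menu, menu]

score = dissimilarity(known_1, known_2, unknown)
```

Under this assumed score, a session would be labeled an attack when `score` exceeds the attack threshold chosen for the known user.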
The results achieved using the simple dissimilarity analysis with a window of size 1 were generally very good. Strictly speaking, this analysis produced an attack detection rate of 91.34%. In terms of false positives and false negatives, there was a false positive rate of 0% and a false negative rate of 8.66%. Performance of particular users is listed in Table 4.

Table 4. Attack Detection Rate for TF-IDF Inspired with Window Size 1

ID Number   Attack Detection Rate
1           100.00%
2           100.00%
3           98.58%
4           78.72%
5           100.00%
6           99.29%
7           65.96%
8           100.00%
9           60.28%
10          100.00%
11          73.40%
12          100.00%
13          97.87%
14          100.00%
15          98.58%
16          88.65%
17          91.84%
18          89.01%
19          98.58%
20          77.66%
21          86.88%
22          99.65%
23          100.00%
24          82.62%
25          99.65%
26          71.63%
27          95.74%
28          97.16%
29          98.58%
30          81.21%
31          100.00%

As illustrated in Table 4, there were a total of 31 participants that successfully completed the experiment at least five times. Student's t distribution was used to approximate the overall success rate for the entire population. Using an α of 0.05 with 30 degrees of freedom yields a t value of 2.04. The observed data had a standard deviation of 11.63. This data was used in the following calculations to estimate the interval for the estimated attack detection rate of the entire population, μ:

$$\bar{x} \pm t\,\frac{s}{\sqrt{n}} = 91.34 \pm 2.04 \times \frac{11.63}{\sqrt{31}} \approx 91.34 \pm 4.26$$

which gives approximately 87.08% ≤ μ ≤ 95.60%.

Attempted Simulation of Real-World Performance

Obviously three sessions are not a large enough sample of data to declare the results achieved here as definitive. If the participants submitted to fourth, fifth, and sixth experimental runs, it would not be surprising for a wider degree of dissimilarity to be observed between any two experimental runs. For this reason additional analysis was performed with the attack threshold set at a different level. As already discussed, the attack threshold used previously was the maximum dissimilarity seen between two sessions of the same user.
A subsequent analysis was performed with the threshold being adjusted by one standard deviation. In other words, a standard deviation was added to the attack threshold to make it appear that the user supplied data was less consistent. The decision to adjust by one standard deviation was driven by the belief that three data points represented too small a number to estimate the actual distribution that participants' data might present.

The choice of one standard deviation was admittedly arbitrary. It was felt, though, that the choice of a single standard deviation was as valid as any other amount of adjustment because of the large number of unforeseen variables that are present when considering any human-centric system deployed "in the field". In other words, the author felt that it would not be possible to accurately account for all of the differences that might be encountered when transitioning the system from controlled settings into a real-world deployment. The addition of one standard deviation to the attack threshold simply represents a "good faith" estimation of the difference in performance that might be encountered outside of a laboratory.

As expected, adding one standard deviation to the attack threshold caused the attack detection rate to fall. Whereas the first method resulted in an average attack detection rate of ~91%, the secondary analysis resulted in an average attack detection rate of ~81%. The performance of each individual when adding a standard deviation to the attack threshold is indicated in Table 5.

Table 5.
Attempted Extrapolation of Real-World Attack Detection Rates

ID Number   Attack Detection Rate
1           98.58%
2           100.00%
3           83.69%
4           48.94%
5           99.29%
6           97.16%
7           49.29%
8           100.00%
9           39.36%
10          100.00%
11          49.29%
12          100.00%
13          91.13%
14          100.00%
15          97.16%
16          70.57%
17          66.31%
18          66.31%
19          88.65%
20          60.64%
21          65.60%
22          98.23%
23          100.00%
24          62.06%
25          97.16%
26          60.99%
27          79.79%
28          92.91%
29          89.72%
30          63.12%
31          97.87%

As previously alluded to, while it appears that the attack detection rate is normally distributed across the entire population, it was not at all clear that the dissimilarity between sets generated by the same user was normally distributed. It was felt that three data points were too few to speculate as to the precise nature of the distribution. This was the rationale behind the decision to conduct additional analysis by adding one standard deviation to the attack threshold as opposed to generating a confidence interval based on t values. These calculations were performed, though, and are presented here for the sake of completeness. Using the three known dissimilarity scores, an upper limit for each user's attack threshold was calculated with α = 0.05. The average attack detection rate when analyzing the data in this manner was 83.95%, slightly better than the average detection rate of 81% observed when adjusting each user's attack threshold by one standard deviation.

Likelihood of Individual Users Successfully Perpetrating a Masquerade Attack

After determining the overall likelihood that any given session might represent a masquerade attack, it was decided to investigate the likelihood that each individual user might be able to perpetrate a masquerade attack on a separate user. In other words, it was hypothesized that the majority of false negative errors for any particular user might be generated by a relatively small portion of the population.
It was believed that some users might possibly be able to impersonate another user with high accuracy. Appendix B shows the individual results obtained for each possible combination of known user and attacker.

Simple Dissimilarity Analysis with Variable Attack Threshold and Event Window of Size 2

The high success rate achieved when using a simple dissimilarity analysis and an attack threshold that varied for each user was encouraging. Further analysis was performed in an attempt to better these results. The initial analysis was concerned with the overall composition of two sets of data, with little regard for the order in which the events occurred. It was hypothesized that the results might be improved by attempting to factor chronological ordering into the analysis.

This method can be described as a simple dissimilarity analysis with an event window of size two. The analysis links each event with the event that preceded it in an attempt to cause the analysis to consider the chronological order in which events occurred. This was the only change from the analytical method described in the section titled "Simple Dissimilarity Analysis with a variable Attack Threshold".

To better understand the analytical method used here, suppose that the following events are executed by user 1: A C E B G D. The analytical method used in the earlier section would have simply compared the data provided by user 1 against an unknown sample. The analysis itself simply consisted of comparing how often event A occurs in the datasets provided by the known and unknown users. The method used in this section is slightly different, though. Instead of comparing how often each event occurred in isolation, pairs of events were compared. Instead of focusing on how often event E occurred in each dataset, this method focused on how often events (C, E) occurred in tandem in the data sets provided by both the known and unknown users.
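The pairing of each event with its predecessor can be sketched in Python as follows, using the example sequence A C E B G D. The function name `event_windows` is hypothetical, and the sketch only shows how windows of consecutive events might be formed and counted; it is not the study's code.

```python
from collections import Counter

def event_windows(events, size=2):
    """Return every run of `size` consecutive events as a tuple, in order."""
    return [tuple(events[i:i + size]) for i in range(len(events) - size + 1)]

session = ["A", "C", "E", "B", "G", "D"]
pairs = event_windows(session, size=2)

# Counting the pairs gives the per-window frequencies that would be compared
# between the known and unknown sessions.
pair_counts = Counter(pairs)
```

With `size=1` the same function yields one-element tuples, which corresponds to considering each event in isolation.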
Stated differently, an event window of size two caused the algorithm to compare how often (A, C), (C, E), (E, B), (B, G), and (G, D) occurred in both the known and unknown datasets.

Surprisingly, the results achieved using this method were markedly worse than the results achieved in the "Simple Dissimilarity Analysis with a variable Attack Threshold" experiment. The overall attack detection rate was found to be 50.72%. As always, false positives were prevented from occurring. The false negative error rate, which was simply 1 - {Attack Detection Rate}, was 49.28%. The standard deviation observed in this analysis was also much higher than that observed in the section titled "Simple Dissimilarity Analysis with a variable Attack Threshold" (32.53% for this experiment in comparison to 11.63% for the previous experiment). The results for each participant are provided in Table 6.

Table 6. Dissimilarity Analysis with a Variable Attack Threshold and an Event Window of 2

User ID   Attack Detection Rate
1         100.00%
2         0.61%
3         23.48%
4         17.99%
5         67.68%
6         99.39%
7         42.68%
8         97.26%
9         88.72%
10        99.39%
11        35.37%
12        71.34%
13        93.60%
14        64.94%
15        10.98%
16        14.63%
17        37.80%
18        84.45%
19        42.68%
20        13.11%
21        43.90%
22        21.65%
23        25.00%
24        25.91%
25        45.12%
26        7.32%
27        51.22%
28        61.59%
29        84.15%
30        14.02%
31        86.28%

Simple Dissimilarity Analysis with a static Attack Threshold and Event Window of Size Two

The results achieved in section 4.3.4 were considered to be so discouraging that an analysis using dissimilarity analysis, a static attack threshold, and an event window of two was not conducted. This decision was based on the fact that the results achieved when using a static attack threshold and an event window of size one were worse than the results achieved when a variable attack threshold was used. This result was not surprising at the time.
It was strongly suspected that a similar trend would be observed when analysis was performed using an event window of size two.

Jaccard Co-efficient Analysis

Originally developed by Paul Jaccard in the early 20th century, the Jaccard co-efficient is often used to determine the similarity between two sets of data (Kumar, et al. 2006) (He, Chen-Chuan Chang and Han 2004). The Jaccard Index for two sets of data A and B is computed as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard Index is very similar to the basic dissimilarity computation featured in the previous section. It supplements the simple dissimilarity scale by factoring in the total number of events in the sets. The Jaccard co-efficient was computed from the data collected in this study in the following manner:

1. The number of events that the two data sets had in common was computed and used as $|A \cap B|$. If an event was found to have occurred a differing number of times in the two datasets, the lower number of occurrences was used. If event A occurred 7 times in dataset 1 and 9 times in dataset 2, seven was counted as part of the overall sum used to indicate the intersection of the two datasets.
2. The total number of events was calculated and used as $|A \cup B|$. In the opposite manner of the intersection operator, if an event was found to occur in both datasets, the larger of the two values was used as part of the sum used to represent the union of the two datasets.
3. The intersection computed in step 1 was divided by the union computed in step 2 to yield the Jaccard co-efficient used in this study.

Jaccard Co-efficient Analysis with Universal Attack Threshold

The analysis performed with simple dissimilarity using a static threshold produced less than desirable results. Both the false positive and false negative rates were higher than would be desired in a commercial system.
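The three-step Jaccard computation described earlier in this section, with the lower count used for the intersection and the larger count used for the union, can be sketched in Python with `collections.Counter`. This is a minimal illustration under stated assumptions: the function name, the sample sessions, and the convention of returning 1.0 for two empty sessions are all hypothetical and not taken from the study.

```python
from collections import Counter

def jaccard_coefficient(session_1, session_2):
    """Multiset Jaccard: sum of per-event minimum counts over sum of maximum counts."""
    c1, c2 = Counter(session_1), Counter(session_2)
    intersection = sum((c1 & c2).values())  # Counter '&' keeps the lower count
    union = sum((c1 | c2).values())         # Counter '|' keeps the larger count
    if union == 0:
        return 1.0  # assumed convention: two empty sessions are treated as identical
    return intersection / union

# Event "A" occurs 7 times in dataset 1 and 9 times in dataset 2;
# a hypothetical event "B" occurs 3 times in dataset 1 only.
dataset_1 = ["A"] * 7 + ["B"] * 3
dataset_2 = ["A"] * 9
coefficient = jaccard_coefficient(dataset_1, dataset_2)
```

Here the intersection is 7 and the union is 9 + 3 = 12, so the coefficient is 7/12. Because the result always lies between 0 and 1 regardless of session length, a single threshold can be compared across users.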
Despite the lack of success demonstrated with the static attack threshold when used with simple dissimilarity analysis, a similar experiment was conducted using the Jaccard Index. A suitable attack threshold that could be used for all users would undoubtedly be easier to manage than dynamic thresholds individualized for each user. The mathematical properties of the Jaccard Index increase the likelihood of finding a single attack threshold that might be suitable. By dividing the intersection of two data sets by the union of those same two datasets, the Jaccard Index normalizes its values, making comparisons easier to conduct. For example, a static attack threshold implemented on a simple dissimilarity analysis would probably not be successful. The differing lengths of the datasets produced by different users make a simple count of the discrepancies difficult to use as a threshold. This problem is addressed by the Jaccard Index as previously described.

The value for the static attack threshold was experimentally derived. As expected, there was an inverse relationship between the false positive and false negative error rates. An increase in one error rate coincided with a decrease in the other error rate. The false positive, false negative, and attack detection rates for certain illustrative attack thresholds are listed in Table 7. Please note that the attack detection rate is simply 1 - {False Negative Rate} and is included in the table only to enhance comprehensibility.

Table 7.
Jaccard Index with a Universal Attack Threshold

Attack Threshold   False Positive Rate   False Negative Rate   Attack Detection Rate
0.70               52.69%                0.90%                 99.1%
0.675              45.16%                1.21%                 98.79%
0.65               37.63%                1.92%                 98.08%
0.625              33.33%                3.53%                 96.47%
0.60               26.88%                5.77%                 94.23%
0.575              19.35%                9.05%                 90.95%
0.55               12.90%                13.26%                86.74%
0.525              7.53%                 18.54%                81.46%
0.50               3.23%                 25.53%                74.47%

Unlike the performance of the simple dissimilarity comparison algorithm, the attack detection rate found when using the Jaccard Index was, in the estimation of the researcher, reasonable. Most encouraging was the fact that, unlike the simple dissimilarity analysis, the false negative error rate climbed at a relatively slow rate. Figure 8 provides a graphical representation of the relative rise and fall of the false positive and false negative error rates.

Figure 8. Relationship between False Positive Error Rates and False Negative Error Rates using the Jaccard Index

The overall performance was determined by the researcher to be satisfactory. The suitability of these statistics would ultimately be up to the customer, were this to someday be developed into a commercial product. Assuming that the majority of businesses would prefer the results achieved with an attack threshold of 0.525, the specific performance of each user is provided in Table 8.

Table 8.
Individual Performances with a Universal Attack Threshold using Jaccard Analysis

User ID   False Positive Rate   False Negative Rate   Attack Detection Rate
1         0.00%                 0.00%                 100.00%
2         0.00%                 11.70%                88.30%
3         0.00%                 24.47%                75.53%
4         0.00%                 32.62%                67.38%
5         0.00%                 37.94%                62.06%
6         0.00%                 33.69%                66.31%
7         33.33%                5.32%                 94.68%
8         0.00%                 24.82%                75.18%
9         66.67%                17.38%                82.62%
10        0.00%                 15.96%                84.04%
11        0.00%                 20.21%                79.79%
12        0.00%                 21.28%                78.72%
13        0.00%                 39.36%                60.64%
14        0.00%                 16.67%                83.33%
15        0.00%                 33.69%                66.31%
16        33.33%                0.71%                 99.29%
17        33.33%                0.35%                 99.65%
18        33.33%                3.55%                 96.45%
19        0.00%                 0.71%                 99.29%
20        0.00%                 14.18%                85.82%
21        0.00%                 25.53%                74.47%
22        0.00%                 25.18%                74.82%
23        0.00%                 0.35%                 99.65%
24        0.00%                 29.43%                70.57%
25        0.00%                 30.14%                69.86%
26        33.33%                18.79%                81.21%
27        0.00%                 15.60%                84.40%
28        0.00%                 11.70%                88.30%
29        0.00%                 0.00%                 100.00%
30        0.00%                 28.37%                71.63%
31        0.00%                 35.11%                64.89%

Jaccard Co-efficient Analysis with Individual Attack Thresholds

Just as was the case with the simple dissimilarity analysis performed in the previous section, an analysis was also conducted using attack thresholds customized for each user. The results obtained were slightly better than the results obtained from the simple dissimilarity analysis. Just as with the dissimilarity analysis, the attack threshold was set at a level to prevent false positives from occurring. The overall average attack detection rate was 93.73%. The standard deviation observed when finding masquerade attacks using the Jaccard Co-efficient was also lower than that observed during the comparable dissimilarity analysis: the standard deviation was 9.6 for the Jaccard Co-efficient analysis, compared with 11.6 when using simple dissimilarity analysis.

The attack detection rates for each user are shown in Table 9. The 95% confidence interval obtained from the data analyzed using Jaccard's Index was 90.19% ≤ μ ≤ 97.26%.
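As a check on the arithmetic, the mean and confidence interval reported for the individual-threshold analysis can be reproduced from the per-user detection rates listed in Table 9. The sketch below is illustrative and is not the procedure actually used in the study: it assumes a two-sided Student's t interval on the sample mean with 30 degrees of freedom, the critical value 2.042 is hard-coded from a standard t table, and the function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics

# Per-user attack detection rates (%) as reported in Table 9, users 1..31.
RATES = [
    100.00, 96.81, 95.74, 82.27, 100.00, 100.00, 92.20, 100.00, 69.50, 100.00,
    91.13, 100.00, 98.23, 100.00, 95.74, 61.35, 100.00, 93.26, 99.29, 93.26,
    92.91, 97.87, 100.00, 82.98, 99.65, 76.24, 97.87, 98.23, 100.00, 91.13,
    100.00,
]

# Assumption: two-sided 95% critical value of Student's t with 30 degrees
# of freedom, copied from a standard table rather than computed here.
T_CRIT_95_DF30 = 2.042


def mean_confidence_interval(values, t_crit):
    """Return (mean, low, high) for a t-based interval on the sample mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    half_width = t_crit * std / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


mean, low, high = mean_confidence_interval(RATES, T_CRIT_95_DF30)
print(f"mean = {mean:.2f}%, 95% CI = [{low:.2f}%, {high:.2f}%]")
```

Under these assumptions the computed mean and interval should agree with the reported figures to within rounding; a different choice of interval (for example, a normal approximation) would give a slightly narrower range.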
Table 9. Individual Performance with Customized Attack Thresholds using Jaccard Index

ID Number   Attack Detection Rate
1           100.00%
2           96.81%
3           95.74%
4           82.27%
5           100.00%
6           100.00%
7           92.20%
8           100.00%
9           69.50%
10          100.00%
11          91.13%
12          100.00%
13          98.23%
14          100.00%
15          95.74%
16          61.35%
17          100.00%
18          93.26%
19          99.29%
20          93.26%
21          92.91%
22          97.87%
23          100.00%
24          82.98%
25          99.65%
26          76.24%
27          97.87%
28          98.23%
29          100.00%
30          91.13%
31          100.00%

It is interesting to note that only one user had their attack detection rate drop using Jaccard's Index as a measure of similarity when compared to the results obtained by simply analyzing the number of discrepancies between sets. Several users, though, had a dramatic rise in their attack detection rate. Users 7 (49.29% up to 92.20%), 4 (48.94% up to 82.27%), and 11 (49.29% up to 91.13%) are examples of users that experienced dramatic increases of at least 30 percentage points in their attack detection rate. Only user 16's attack detection rate fell when analyzed using the Jaccard Index. A comparison of the two methods and the performance for each user is shown in Table 10.

Table 10.
Comparison of Dissimilarity and Jaccard Analysis

Overall Average Detection Rate
Simple Dissimilarity Analysis   Jaccard Analysis
91.34%                          93.73%

User ID Number   Simple Dissimilarity Analysis   Jaccard Analysis
1                98.58%                          100.00%
2                100.00%                         96.81%
3                83.69%                          95.74%
4                48.94%                          82.27%
5                99.29%                          100.00%
6                97.16%                          100.00%
7                49.29%                          92.20%
8                100.00%                         100.00%
9                39.36%                          69.50%
10               100.00%                         100.00%
11               49.29%                          91.13%
12               100.00%                         100.00%
13               91.13%                          98.23%
14               100.00%                         100.00%
15               97.16%                          95.74%
16               70.57%                          61.35%
17               66.31%                          100.00%
18               66.31%                          93.26%
19               88.65%                          99.29%
20               60.64%                          93.26%
21               65.60%                          92.91%
22               98.23%                          97.87%
23               100.00%                         100.00%
24               62.06%                          82.98%
25               97.16%                          99.65%
26               60.99%                          76.24%
27               79.79%                          97.87%
28               92.91%                          98.23%
29               89.72%                          100.00%
30               63.12%                          91.13%
31               97.87%                          100.00%

Impact of Varying Demographics on Authentication

As described in Chapter 3, in the section titled "Demographical Makeup of the Participant Population", the participants in this study were surveyed and a variety of demographical characteristics were recorded. This data was collected to determine if any particular portion of the experiment population was better suited for using GUI Usage Analysis as a means of authentication. Because no single sub-population contained enough members to be considered statistically significant, the findings presented here are submitted as indicators, not definitive evidence.

The findings presented here are assessed in terms of overall attack detection rate when using the Jaccard Index to analyze and detect masquerade attacks. The majority of the characteristics collected from the participants centered on the suspicion that computer proficiency would positively impact the attack detection rate.
It was believed that surveying the participants on a variety of characteristics was the most efficient manner in which to determine computer proficiency, short of administering a potentially biased computer proficiency exam to participants.

Unless otherwise noted, the analyses presented here utilize the data from the section titled "Jaccard Co-efficient Analysis with Individual Attack Thresholds". Furthermore, because the number of respondents falling into a particular group was often small, confidence intervals with α = 0.05 have also been included to demonstrate how closely these results might mirror results obtained from the overall population. Please also note that any categories for which only a single participant was a member were excluded from the following analysis.

Impact of Vocation on Authentication

The data was analyzed to determine the degree to which an individual's vocation affects the attack detection rate when using the Jaccard Index and GUI Usage Analysis. It was hypothesized by the author that members of fields that relied on the use of computers might have higher attack detection rates than members of professions that did not. Only vocations that were indicated on more than three occasions were included in the analysis. The differences in attack detection rates between the varying vocations were judged to be too small to be of any significant consequence. The results are shown in Table 11.

Table 11. Impact of Vocation on Attack Detection

Profession    Num of Respondents   Attack Detection   Confidence Interval
IT            7                    93.82%
Education     12                   93.88%
Engineering   3                    97.04%
Technician    3                    97.40%

Impact of Educational Achievement on Attack Detection

The next demographical characteristic examined was the degree to which educational achievement impacted the attack detection rate. The author hypothesized that individuals with greater educational achievement would possess greater proficiency at using a computer.
This belief was based on the anecdotally supported notion that professional people are more likely to require a computer to complete their daily tasks. The results of this analysis are shown in Table 12. The data seems to indicate a slight trend towards higher achievement producing a higher attack detection rate.

Table 12. Impact of Educational Achievement on Attack Detection

Highest Degree Achieved         Num of Responses   Attack Detection   Confidence Interval
Some Undergraduate Coursework   5                  97.87%
Associates Degree               5                  88.58%
Undergraduate Degree            3                  94.56%
Some Graduate Coursework        4                  98.05%
Graduate Degree                 11                 93.46%

Impact of Self-Reported Computer Skill on Attack Detection

As previously stated, the author suspected that the greater a user's proficiency at using a computer, the higher their attack detection rate would be. In an effort to determine the accuracy of this hypothesis, each participant was surveyed and asked to indicate their proficiency at operating a computer. The results are shown in Table 13. The results show a slight trend indicating that users with above average computer skills enjoy higher attack detection rates.

Table 13. Impact of Self-Reported Computer Skill on Attack Detection

Computer Skill   Num of Respondents   Attack Detection   Confidence Interval
High             12                   93.38%
Above Average    11                   96.84%
Average          6                    90.60%

Impact of Daily Computer Usage on Attack Detection

The average amount of time spent each day using a computer was collected from each participant. Subjects were asked to indicate the amount of time in two-hour blocks (0-2, 2-4, etc.). The results of this analysis are provided in Table 14. The results show a slight trend indicating that users that spend more time each day using a computer enjoy a slightly higher attack detection rate.

Table 14.
Impact of Daily Computer Usage on Attack Detection

Daily Use   Num of Respondents   Attack Detection   Confidence Interval
8+ Hours    9                    98.86%
6-8 Hours   7                    88.30%
4-6 Hours   3                    97.40%
2-4 Hours   10                   92.94%

Impact of Computer Skill Source on Attack Detection

The final piece of information that was solicited from each participant was the source of their computing skills. Participants were asked to indicate where they acquired the majority of their computing skills, either through a course or through self instruction. It was hypothesized by the author that users that had taught themselves to use a computer would be better suited for GUI Usage Analysis and consequently have a higher attack detection rate. The results of this analysis are shown in Table 15. The results are consistent with the expectations and indicate a trend towards users that teach themselves having higher attack detection rates. It is interesting to note that a relatively small number of users indicated that they had obtained their computing skills from an organized course or instruction. It is possible that a user's source of computing skill might be a great indicator of a user's suitability for GUI Usage Analysis but offer little practical value because of the relatively small number that obtained their skills from a course.

Table 15. Impact of Computer Skill Source on Attack Detection

Educational Method                Num of Respondents   Attack Detection   Confidence Interval
Self Taught                       23                   95.59%
Through a course or instruction   6                    88.47%

Impact of Age on Attack Detection

The participants' age was another item that was analyzed to determine the degree to which it could be used as a predictor of suitability for GUI Usage Analysis. It was hypothesized by the author that older users might be less comfortable with the use of a computer, possibly impacting the effectiveness of GUI Usage Analysis. The results of this analysis are provided in Table 16. The results of this analysis were quite interesting.
Contrary to expectations, the average attack detection rate increased with the participants' age, though the improvement was rather modest.

Table 16. Impact of Age on Attack Detection

Age     Num of Respondents   Attack Detection   Confidence Interval
18-24   3                    91.48%
25-39   9                    93.06%
40-65   17                   95.13%

In all, it was felt that the demographical findings, while interesting, could only be used to indicate trends. There were simply not enough participants in each category, regardless of the characteristic being examined, to strongly infer how each subpopulation might actually perform in a laboratory environment. While secondary to the investigations presented here, studies into which particular user groups perform better in GUI Usage Analysis studies might be an area of future study.

Summary of Findings

The results obtained here indicate that GUI Usage Analysis can be used as an effective means of authentication to defend against masquerade attacks. The overall attack detection rate was found to be higher in instances in which customized attack thresholds were used for each user. Furthermore, the attack detection rate was found to be higher when analyzed using the Jaccard Index as opposed to other simple dissimilarity analyses. Finally, the participant pool was analyzed to determine if a single demographic provided significant insight to the suitability of GUI Usage Analysis for a particular user. The number of participants in each portion of the population was too small to draw any definitive conclusions. The data indicated that users using a computer eight or more hours a day enjoyed higher attack detection rates.

CHAPTER 5
GUI USAGE ANALYSIS AS AN IDENTIFICATION MECHANISM

The ability to correctly spot an attack is always highly prized by administrators. Traditionally, the second part of coping with an attack is recovery from the attack.
For many system administrators, part of the recovery process is the ability to examine the evidence and attempt to place blame or responsibility for the attack; a virtual "whodunit". In Chapter 4 a study was presented demonstrating the utility of GUI Usage Analysis as a means of authentication. An authentication system enables someone to determine that an attack is occurring. In order to determine who the attacker was, GUI Usage Analysis must be adapted to serve a different purpose: identification.

Authentication systems can traditionally be thought of as producing two possible outputs: either the current user is authenticated, indicating that the proper user is the one operating the computer, or the current user is not authenticated, indicating an attack is occurring. Identification, on the other hand, is a much more difficult function to construct. Whereas authentication functions return one of two values (TRUE, FALSE), identification functions return one of N values, where N is the number of users known to the system.

This chapter will present the results of two different attempts at performing identification using GUI Usage Analysis data. The first attempt uses the Jaccard Index presented in Chapter 4 as part of an identification scheme. The second attempt involves the use of Artificial Neural Networks as part of an identification scheme.

Identification Using Jaccard Index

User Identification

As explained in Chapter 4, the Jaccard Index is a mathematical representation of the similarity between two sets of data. The Jaccard Index is defined as:

$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$

The result of the Jaccard Index can then be used to determine which sets are more similar to each other, as was done in Chapter 4. In Chapter 4 the Jaccard Index was used to authenticate a simulated unknown user using a computer terminal. The result of that operation was binary: the user's claimed identity was either confirmed (the user was authenticated) or the user's claimed identity was rejected.
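The binary use of the Jaccard Index described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: each session is assumed to be reduced to a set of distinct (action, control, executable) events, a similarity score below the attack threshold is assumed to signal an attack, and the function names and sample events are hypothetical.

```python
def jaccard_index(a, b):
    """Jaccard Index of two collections of events: |A n B| / |A u B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sessions
    return len(set_a & set_b) / len(union)


def authenticate(reference_events, session_events, attack_threshold):
    """Return True when the session is at least as similar as the threshold."""
    return jaccard_index(reference_events, session_events) >= attack_threshold


# Hypothetical events, each an (action, control, executable) triple.
reference = {
    ("left_click", "button", "winword.exe"),
    ("left_click", "scroll_bar", "iexplore.exe"),
    ("double_click", "text_box", "winword.exe"),
}
session = {
    ("left_click", "button", "winword.exe"),
    ("left_click", "scroll_bar", "iexplore.exe"),
    ("right_click", "text_box", "excel.exe"),
}

score = jaccard_index(reference, session)  # 2 shared events / 4 distinct events
accepted = authenticate(reference, session, attack_threshold=0.525)
print(score, accepted)
```

Because the score is a ratio rather than a raw count, sessions of very different lengths produce values on the same 0 to 1 scale, which is what makes a single threshold plausible at all.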
As previously described, identification is a function that can produce up to N outputs, where N represents the number of users known to the system. As the first step of using GUI Usage Analysis as a means of identification, the Jaccard Index authentication procedure introduced in Chapter 4 was modified. As previously illustrated, when two sessions are compared to each other using the Jaccard Index, a similarity score is generated. The results of the authentication study presented in Chapter 4 indicate that, generally speaking, sessions generated by the same user have higher Jaccard Index values than sessions generated by differing users. Using this rationale, a study was performed to determine if the Jaccard Index could be implemented as part of an identification scheme.

To test whether or not the Jaccard Index could be used to identify users from their GUI interaction data, the following algorithm was used:

    For each user A
        Max_Jaccard_Index = 0.0
        Training = Select_training_sessions(A, 2)
        For each user B
            If A = B
                Test = Select_test_session(A, Training)
            Else
                Test = Select_test_session(B)
            End If
            Current_Jaccard_Index = Jaccard(Training, Test)
            If Current_Jaccard_Index > Max_Jaccard_Index
                Max_Jaccard_Index = Current_Jaccard_Index
                Suspected_Identity = B
            End If
        End For
        Print("Suspected Identity: ", Suspected_Identity)
    End For

Figure 9. Pseudo-code of Jaccard Index Identification Function

The algorithm selects a known user A and examines each session not present in the training data. The algorithm returns the user B that generated the maximum Jaccard Index when analyzed with the training data extracted from A. That user is assumed to be the same person as the one that provided the training data. In this instance, the Jaccard Index is used to identify the donor of an unknown sample. It should be noted that every possible combination of known and unknown user was experimented with. Furthermore, all possible combinations of training and test data were also exhaustively tested.
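The procedure in Figure 9 can be sketched in Python as follows. This is an illustrative reading of the pseudo-code, not the code used in the study: it assumes each session is a set of events, that the two training sessions are merged by set union, and that one test session is drawn per user. The names `identify` and `run_trial`, and the toy session data, are hypothetical.

```python
def jaccard_index(a, b):
    """Jaccard Index of two event sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(training_events, candidates):
    """Return (user, score) for the candidate session most similar to the profile.

    candidates maps a user id to the event set of one test session.
    """
    best_user, best_score = None, 0.0
    for user, test_events in candidates.items():
        score = jaccard_index(training_events, test_events)
        if score > best_score:
            best_user, best_score = user, score
    return best_user, best_score


def run_trial(sessions, known_user, training_idx, test_idx=0):
    """One trial of Figure 9: profile known_user, then compare one session per user."""
    own = sessions[known_user]
    training = set().union(*(own[i] for i in training_idx))
    # The known user's test session must not be one of the training sessions.
    held_out = next(i for i in range(len(own)) if i not in training_idx)
    candidates = {
        user: (user_sessions[held_out] if user == known_user else user_sessions[test_idx])
        for user, user_sessions in sessions.items()
    }
    return identify(training, candidates)


# Toy data: three hypothetical users with three sessions each.
SESSIONS = {
    "alice": [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"}],
    "bob": [{"x", "y", "a"}, {"x", "y", "z"}, {"x", "y"}],
    "carol": [{"m", "n", "b"}, {"m", "n"}, {"m", "n", "o"}],
}

for user in SESSIONS:
    print(user, "->", run_trial(SESSIONS, user, training_idx=(0, 1)))
```

An exhaustive experiment of the kind described would wrap `run_trial` in loops over every choice of training pair and test session; that outer loop is omitted here to keep the sketch short.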
This identification experiment was performed on the same thirty-one individuals that participated in the authentication study presented in Chapter 4. Overall, the Jaccard based identification algorithm successfully identified the unknown user in 77.1% of trials. The raw data resulting from this study is presented in Appendix C. The performance for each individual user (i.e. successful identification rate) is shown in Table 17.

Table 17. Successful Identification Rate by User

User Number   Identification Success Rate (%)
1             100
2             100
3             100
4             33.33
5             100
6             100
7             33.33
8             100
9             33.33
10            100
11            33.33
12            66.66
13            100
14            100
15            100
16            66.66
17            100
18            66.66
19            100
20            66.66
21            100
22            100
23            100
24            33.33
25            66.66
26            0
27            66.66
28            100
29            66.66
30            33.33
31            100

The standard deviation of the individual identification performance is quite large, at 30.05. This is due to the fact that only three test sessions were possible from the data being used. Therefore, a single failed identification would reduce the success rate from 100% to 66.66%. For this reason, it may not be reasonable to say on an individual by individual basis that one user definitively performed better, or that their behavioral patterns more uniquely identify them in comparison to fellow participants. However, when examined as a whole, this study featured 93 different trials, with successful identification occurring in 77.1% of the cases. It is reasonable to conclude that this number is probably highly comparable to the overall identification success rate that would be experienced in the general population.

Demographic Analysis

Just as was the case when GUI Usage Analysis was analyzed as a means of authentication, an analysis was conducted to determine if any single user trait predisposed users to enjoying more success with GUI Usage Analysis. Also, it must again be stated that the findings here are simply indications of possible trends.
The number of participants in each group is too small to draw any definitive conclusions regarding each trait's impact on the use of GUI Usage Analysis as a means of identification.

The findings presented here are assessed in terms of overall successful identification rate when using the Jaccard Index to analyze and detect masquerade attacks. The majority of the characteristics collected from the participants centered on the suspicion that computer proficiency would positively impact the identification rate. It was believed that surveying the participants on a variety of characteristics was the most efficient manner in which to determine computer proficiency, short of administering a potentially biased computer proficiency exam to participants.

The results presented here are based on the number of times members of a particular group were successfully identified based on their GUI usage patterns. Confidence intervals with α = 0.05 are provided in an effort to indicate what the identification rate for the entire sub-population might be. Finally, please note that any category which had one or fewer respondents was not included in these analyses.

Almost universally, the standard deviation amongst respondents of a particular sub-group was simply too large to make any reliable inferences regarding the likelihood of using GUI Usage Analysis as a reliable means of identification within some sub-group of the population. The resulting analyses do show trends in the data, but these must also be taken with a proverbial "grain of salt".

Impact of Vocation on Identification

The data was analyzed to determine the degree to which an individual's vocation affects the identification rate when using the Jaccard Index and GUI Usage Analysis. The author hypothesized that members of fields that relied on the use of computers might have higher identification rates than members of professions that did not.
Only vocations that were indicated on more than three occasions were included in the analysis. In total, the difference in identification rate between members of the varying professions was judged by the author to be relatively small, particularly given the small number of participants that listed their vocation as "Engineering" or "Technician". The results are shown in Table 18.

Table 18. Impact of Vocation on Identification Rate

Profession    Num of Respondents   Identification Rate   Confidence Interval
IT            7                    71.43%
Education     12                   83.33%
Engineering   3                    66.66%
Technician    3                    77.78%

Impact of Educational Achievement on Identification Rate

The next demographical characteristic examined was the degree to which educational achievement impacted the identification rate. The author hypothesized that individuals with greater educational achievement would possess greater proficiency at using a computer. This belief was based on the anecdotally supported notion that professional people are more likely to require a computer to complete their daily tasks. The results of this analysis are shown in Table 19. The data seems to indicate a slight trend towards higher achievement producing a higher identification rate. In particular, the largest gap seems to be between individuals with at least some training at a traditional four-year institution and those without it.

Table 19. Impact of Educational Achievement on Identification Rate

Highest Degree Achieved         Num of Responses   Identification Rate   Confidence Interval
Some Undergraduate Coursework   5                  80.00%
Associates Degree               5                  60.00%
Undergraduate Degree            3                  77.78%
Some Graduate Coursework        4                  91.67%
Graduate Degree                 11                 78.79%

Impact of Self-Reported Computer Skill on Identification Rate

As previously stated, it was suspected that the greater a user's proficiency at using a computer, the higher their rate of successful identification would be.
In an effort to determine the accuracy of this hypothesis, each participant was surveyed and asked to indicate their proficiency at operating a computer. The results are shown in Table 20. Unlike the results observed during the authentication study, self-reported computer skill seems to have had a slightly larger impact on the identification rate of individuals.

Table 20. Impact of Self-Reported Computer Skill on Identification Rate

Computer Skill   Num of Respondents   Identification Rate   Confidence Interval
High             12                   75.00%
Above Average    11                   81.82%
Average          6                    72.22%

Impact of Daily Computer Usage on Identification Rate

The average amount of time spent each day using a computer was collected from each participant. Subjects were asked to indicate the amount of time in two-hour blocks (0-2, 2-4, etc.). This data was then analyzed to determine if there might be any evidence linking daily computer usage to successful identification rates. The results of this analysis are provided in Table 21. Unlike the results observed during the authentication study, the data gathered here indicates that identification success seems to be dependent on either high daily usage or low daily usage. Users that utilized computer systems for intermediate amounts of time on a daily basis appear to enjoy reduced identification rates, particularly when accounting for the fact that only three individuals indicated between four and six hours of use daily (and thus might represent an aberration). It is possible that the amount of daily computer usage corresponds to overall levels of actual computer proficiency (in contrast with the self-reported proficiency described in the previous section). In this instance it may be the case that being either an expert user or a novice user results in distinctive patterns, generated by either copious computer knowledge or a dearth of computer knowledge.

Table 21.
Impact of Daily Computer Usage on Identification Rates

Daily Use   Num of Respondents   Identification Rate   Confidence Interval
8+ Hours    9                    92.59%
6-8 Hours   7                    52.38%
4-6 Hours   3                    88.89%
2-4 Hours   10                   76.66%

Impact of Computer Skill Source on Identification Rates

Information was solicited from each participant regarding the source of their computing skills. Participants were asked to indicate where they acquired the majority of their computing skills, either through a course or through self instruction. The author hypothesized that users that had taught themselves to use a computer would be better suited for GUI Usage Analysis and consequently have a higher identification rate. The results of this analysis are shown in Table 22. Unlike the authentication study, the results do not show much of a trend between the source of computer skill and identification rates, though the relatively small number of respondents indicating they obtained their skills through a course or instruction makes it impossible to draw definitive conclusions.

Table 22. Impact of Computer Skill Source on Identification Rates

Educational Method                Num of Respondents   Identification Rate   Confidence Interval
Self Taught                       23                   76.81%
Through a course or instruction   6                    73.33%

Impact of Age on Identification Rates

The participants' age was another item that was analyzed to determine the degree to which it could be used as a predictor of suitability for GUI Usage Analysis. The author hypothesized that older users might be less comfortable with the use of a computer, possibly impacting the effectiveness of GUI Usage Analysis. The results of this analysis are provided in Table 23. The results of this analysis were quite interesting. Contrary to expectations, the identification rate increased with the participants' age, though the improvement was rather modest. It is suspected that a larger population would yield results indicating that a user's age offers no indication of his/her suitability for GUI Usage Analysis.
Table 23. Impact of Age on Identification Rates

Age     Num of Respondents   Identification Rate   Confidence Interval
18-24   3                    55.55%
25-39   9                    81.48%
40-65   17                   78.43%

Analysis Using Artificial Neural Networks

In an effort to further confirm the findings of the Jaccard Index experiment, another identification experiment was performed. This experiment made use of a form of machine learning known as artificial neural networks, commonly referred to as "neural networks". Neural networks were selected for experimentation because of their history as a proven machine learning technique capable of analyzing many different problems in a wide variety of problem domains (Mathworks Corporation 2007). A diagram showing a conceptual neural network may be found in Figure 10.

Figure 10. Conceptual Diagram of a Neural Network with N Input Units, Four Hidden Units, and Two Outputs

Several challenges were encountered in the adaptation of neural networks to the data collected in this study. For example, neural networks are typically designed to accept multiple numerical values in parallel as inputs. This posed an implementation challenge in this instance, since the data gathered in this study was more analogous to a corpus of text. When neural networks are used to analyze text documents, the input is frequently characterized by word frequency (i.e. how often each word occurs in the document). Often the number of distinct words in a document is a relatively small number that the neural network can cope with. Applying this model to the data gathered in this study, each event (action, control, executable) becomes analogous to a word in a text document. Unfortunately, this precise implementation method was not practical for use in this study. In this study, tens of thousands of distinct events were observed, translating to tens of thousands of "words" and resulting in the neural network needing to accommodate the same number of parallel inputs.
This is generally considered to be too many inputs for a single neural network to process effectively, leading to a condition known as overfitting (Russell and Norvig 2003). Overfitting occurs in instances in which the number of inputs to a neural network greatly outpaces the amount of training data provided to the network. A neural network that has been overfitted has been trained not only to recognize the important characteristics in the data, but also ends up "memorizing" the unimportant data (noise).

To illustrate the problem of overfitting, consider the following simple example. Suppose a neural network has been designed to identify graduate students in Computer Science. The network has been configured to accept inputs concerning the following observable qualities of a person:

    Height
    Weight
    Age
    Gender
    Ethnicity
    Hair color
    Clothing style
    Build (skinny, stocky, etc.)

The neural network is then presented with three examples of Computer Science graduate students to be used as training data. Following the training period the network is presented with its first test case. The network then proceeds to classify the first test case incorrectly, labeling an English major as a CS graduate student. Inspection of the data reveals the problem: the test case, like all of the training data, was male. In this overly simple case, overfitting caused the network to incorrectly believe that being male was the most important factor in identifying Computer Science graduate students.

The workaround crafted for this problem still relied on a count of the number of times some object was involved in an event. Instead of considering an event as an atomic unit, though, events were broken up into their smaller components. This yielded a neural network structure that was given the following inputs:

    # of times a user action (left click, double click, "A" press, etc.) was observed
    # of times a particular type of control (scroll_bar, text_box, etc.)
was observed
    # of times a particular process (Winword.exe, Iexplore.exe, etc.) was observed

The number of distinct inputs being passed to the neural network was still quite large (over one thousand) but the network seemed capable of coping with this volume of input sources.

The neural network implementation used in this study was the Artificial Neural Network Toolbox (ANN) distributed as part of the Matlab numerical analysis suite (Mathworks Corporation 2007). The Neural Network Toolbox was relatively robust and easy to use. Furthermore, it is believed that the use of pre-written, commercially available analysis packages prevented coding errors that may have occurred in the implementation of a relatively complex algorithm such as back-propagation neural networks. Other than specifying the number of hidden processing layers contained in the neural network, the recommended default settings provided by the Matlab developers were used in this study.

The neural network that was developed contained one hidden processing layer of 45 neurons. The number of neurons used in the final experiment was derived through experimentation. Using more than 45 neurons resulted in slightly enhanced accuracy at the cost of much longer processing times. The use of fewer than 45 neurons resulted in significantly degraded accuracy with only modest processing time savings. The neural network featured N outputs, with one output corresponding to each participant in the sample.

The network was constructed by randomly selecting two data sessions from each user for use as training data. The remaining data session was set aside to be used as test data. The neural network was trained for five hundred epochs. This number was arbitrarily selected, though this did not appear to matter. Training of the network was aborted by the Matlab package prior to reaching the training limit when the performance of the network hit a proverbial ceiling (i.e.
the network was trained to its fullest possible extent).

The performance of the neural network was found to be inferior to the performance of the Jaccard Index based algorithm. Using the neural network, only 12 of 31 participants were correctly identified, yielding a successful identification rate of 38.7%. This compares quite poorly with the successful identification rate of the Jaccard Index based identification algorithm described earlier in this chapter. It is believed that overfitting was the primary cause of the poor performance, resulting from the large disparity between the number of training cases (62) and the number of inputs (3185).

One of the interesting aspects of the design of neural networks is that all output units produce output on all input cases. This means, in this instance, that all N output units yield a number between 0 and 1 after each test. This number corresponds to the similarity that the network perceived between the test sample and the reference sample provided for that individual. On a practical level this structure allows an investigator to easily determine which user the network incorrectly identified in instances in which its analysis produced an incorrect result. Furthermore, this structure makes it possible to determine how many more "guesses" would have been required before correctly identifying the right user. The results of this type of analysis are presented in Table 24. The column labeled "User Rank" indicates the number of guesses or attempts that the network required in order to make the correct identification. Stated differently, a user rank of k means that k - 1 users were incorrectly identified by the network prior to arriving at the correct result. On average, the neural network did not correctly determine the identity of the unknown user until the eighth attempt.
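The "User Rank" measure can be computed directly from the vector of output activations. The sketch below is an assumption-laden illustration rather than the Matlab procedure used in the study: the function name `user_rank` is hypothetical, the sample activations are invented, and ties are resolved in favor of the true user. The list `TABLE_24_RANKS` simply transcribes the ranks reported in Table 24 so that the summary figures quoted above can be checked.

```python
import statistics


def user_rank(outputs, true_index):
    """1-based position of the true user when outputs are sorted high to low.

    outputs holds one activation per known user (higher means a stronger match).
    Ties are counted in the true user's favor, which is an assumption made here.
    """
    true_score = outputs[true_index]
    return 1 + sum(1 for score in outputs if score > true_score)


# Invented activations for a four-user network.
outputs = [0.10, 0.80, 0.35, 0.60]
print(user_rank(outputs, true_index=1))  # strongest output
print(user_rank(outputs, true_index=2))  # two outputs are stronger

# Ranks for users 1..31 as reported in Table 24.
TABLE_24_RANKS = [
    3, 4, 1, 30, 16, 1, 8, 12, 21, 1, 21, 6, 7, 1, 1, 4,
    1, 9, 1, 11, 3, 1, 1, 11, 1, 15, 11, 11, 1, 1, 24,
]
first_guess_hits = TABLE_24_RANKS.count(1)
mean_rank = statistics.mean(TABLE_24_RANKS)
print(first_guess_hits, round(mean_rank, 2))
```

A rank of 1 corresponds to a correct first guess, so the number of rank-1 entries is the number of successful identifications, and the mean rank is the average number of guesses needed.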
Because the neural network fared poorly in identifying the unknown user, demographic data was not analyzed in conjunction with neural network performance. It was determined that such an analysis would not yield noteworthy findings.

Table 24. Average User Rank in Identification with Neural Network

User ID   User Rank
1         3
2         4
3         1
4         30
5         16
6         1
7         8
8         12
9         21
10        1
11        21
12        6
13        7
14        1
15        1
16        4
17        1
18        9
19        1
20        11
21        3
22        1
23        1
24        11
25        1
26        15
27        11
28        11
29        1
30        1
31        24

Summary of Findings

This chapter has presented the results of a study of using GUI Usage Analysis data as part of an identification scheme. The results indicate that Jaccard Index based identification algorithms perform with reasonable accuracy, correctly identifying the unknown user in 77% of cases. This level of accuracy, while certainly insufficient for legal uses, arguably offers enough accuracy so that its findings can be used in conjunction with other evidence in an effort to identify an attacker. The artificial neural network, though, was not able to correctly interpret the information often enough, and as a result, performed much worse than its Jaccard based counterpart. The results of this analysis indicate that the Jaccard method was much better suited to handle the type and quantity of data gathered in this study.

CHAPTER 6 COMPARISON TO OTHER PUBLISHED TECHNIQUES

As discussed in Chapter 2, many studies have been conducted seeking to study the use of a variety of user traits as possible means of either identification or authentication. Unfortunately, many of the traits studied in earlier studies are better suited for command-line driven interfaces, not modern graphical user interfaces. For example, the effectiveness of keystroke dynamics as a means of authentication or identification is inherently linked to the amount of time a user spends typing.
Efforts to profile users based on their interactions with modern computing environments first centered around the users' mouse movements. Most likely inspired by the previously published studies focusing on users' keyboard patterns (i.e. keystroke dynamics), these studies sampled several traits exhibited by users while using the mouse. Two separate studies have been published analyzing the utility of using mouse movements as a form of authentication (Garg, et al. 2006), (Pusara and Brodley 2004). Both of these studies will be examined here, with the published results being compared to the results of the authentication experiments presented in this dissertation. Unfortunately, a direct comparison to either study is not possible since the type and quantity of data gathered for the experiments presented here are incompatible with the data analysis techniques used in either of the other studies.

Comparison to Pusara & Brodley's Mouse Movement Profiling

In (Pusara and Brodley 2004), Pusara and Brodley present a technique for conducting "re-authentication" by studying a user's mouse movements. Their technique performed authentication by comparing several traits extracted from an unknown user's mouse movements to a reference profile associated with the legitimate user. Pusara and Brodley's work showed promise, though the application of their technique was ultimately limited to users who made heavy use of the mouse.

Pusara and Brodley organized an experimental sample of 18 undergraduate college students. They employed a monitoring package that recorded the following information during the user's session:

- Mouse location (sampled every 100 ms)
- Number of left clicks
- Number of right clicks
- Number of double-clicks
- Number of "non-client area" clicks
- Time of each event (mouse movement, mouse clicks, etc.)

Pusara and Brodley's participants were observed while reading the same set of webpages. As a result, the participants interacted exclusively with the Internet Explorer web browser.
The aforementioned traits were analyzed and several features were extracted. The overall number of each of the traits was calculated. Furthermore, the mean, standard deviation, and third moment were calculated for the distance, angle, and speed observed between events. Finally, the mean, standard deviation, and third moment were all calculated between cursor locations (as sampled every 100 ms by the monitoring software).

Pusara and Brodley's participants were asked to submit to data collection one time. The participants were monitored until 10,000 observations were recorded by the monitoring software. This single set of data was then divided up into multiple subsets which were used as either training data or test data. The data gathered was fed into the commercial decision tree classification algorithm known as See5 (Rulequest Research n.d.). The resulting decision tree was used to perform an authentication study over the entire population. The results of Pusara and Brodley's initial experiment are presented in Table 25, along with the results of the Jaccard analysis presented in Chapter 4.

Table 25. Comparison of GUI Usage Analysis and Initial Mouse Movement Analysis

Study                                          Sample Size   False Positive Rate   False Negative Rate
GUI Usage Analysis                             31            0%                    6.27%
Pusara and Brodley's Mouse Movement Analysis   18            27.5%                 3.06%

Following this initial analysis, Pusara and Brodley reached the conclusion that many of the false positives observed were due to a subset of their experimental population. This subset was found to generate fewer mouse events than other members of the experimental population. For this reason, this subset of 7 users was excluded from further analysis by Pusara and Brodley.

Following this initial experiment, Pusara and Brodley began experimenting with several attributes of the decision tree algorithm in an effort to tune the algorithm to the performance of each individual user.
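As an illustration of the summary features described earlier in this section, the following sketch derives the distance, angle, and speed between successive mouse events and then computes the mean, standard deviation, and third moment of each. It is not Pusara and Brodley's implementation: the event format, the function names, the use of the population standard deviation, the reading of "third moment" as the third central moment, and the definition of angle as the direction of travel are all assumptions made for illustration.

```python
import math


def summarize(values):
    """Return (mean, population standard deviation, third central moment)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return mean, math.sqrt(variance), third


def movement_features(events):
    """events: time-ordered list of (t_seconds, x, y) tuples (assumed format)."""
    distances, angles, speeds = [], [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(events, events[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip events with no elapsed time to avoid dividing by zero
        d = math.hypot(x1 - x0, y1 - y0)
        distances.append(d)
        angles.append(math.atan2(y1 - y0, x1 - x0))  # direction of travel, radians
        speeds.append(d / dt)
    return {
        "distance": summarize(distances),
        "angle": summarize(angles),
        "speed": summarize(speeds),
    }


# Hypothetical events: one 5-pixel move, then no movement for one second.
features = movement_features([(0, 0, 0), (1, 3, 4), (2, 3, 4)])
```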
Pusara and Brodley customized both the alarm threshold for each user as well as the window size. The alarm threshold represented the number of anomalous events that the system could observe before concluding that an attack was under way. The window size represented the period, or number of consecutive events, that were analyzed on an experimental run. The aforementioned exclusion of certain participants combined with the tuning of algorithm attributes resulted in improved performance. These improved numbers are provided in Table 26.

Table 26. Comparison of GUI Usage Analysis and Revised Mouse Movement Analysis

Study                                                                   Sample Size   False Positive Rate   False Negative Rate
GUI Usage Analysis                                                      31            0%                    6.27%
Pusara and Brodley's Mouse Movement Analysis (Revised Decision Tree)    11            0.43%                 1.75%

Several of the attributes generated by the monitoring software used by Pusara and Brodley could not be reconstructed using the data gathered in this study. This was primarily due to the different objectives of the two studies: analyzing mouse movements (Pusara and Brodley) versus analysis of overall interaction with the user interface using both keyboard and mouse (GUI Usage Analysis). More specifically, the data used in the experiments presented in this dissertation was collected with the intention of analyzing GUI Usage patterns, not reconstructing previously published studies.

As an example of data that was neither collected nor derivable, consider the "non-client area events" utilized by Pusara and Brodley. Pusara and Brodley calculated the number of mouse events that occurred in the menu bar portion of Internet Explorer, events that they termed "non-client area events". This data was not gathered by the monitoring software used to collect the data presented in this dissertation, simply because it was not believed to be necessary to study the effectiveness of GUI Usage Analysis. Furthermore, the "non-client area events"
could not be derived from the data that was collected because the cursor coordinates associated with each observed event were absolute, not relative to the current window. To accurately recreate the type of data used by Pusara and Brodley, the monitoring software would have needed to record the screen location of each window. This data was not recorded.

Other difficulties also existed, most notably the amount of data gathered from each user. Pusara and Brodley's users only used Internet Explorer. The participants in the GUI Usage Analysis study also used Internet Explorer, though only for a portion of the experiment.

The results produced by each method are relatively comparable, though any comparisons between the results presented in this dissertation and Pusara and Brodley's results are admittedly weak. Pusara and Brodley's approach detected 6.9% more attacks but also registered 0.43% more false positives. Of more importance than the simple performance statistics, though, is the fact that Pusara and Brodley's method was not reported to be effective for the entire user population. Finally, as previously stated, the relatively small sample size used by Pusara and Brodley makes it difficult to draw any definitive conclusions.

Comparison to Garg's Mouse Movement Profiling

In (Garg, et al. 2006) Garg presents the results of a study that was very similar to the study presented by Pusara and Brodley. Garg also sought to use GUI interactions as a means of performing user authentication. Much of the analysis performed by Garg is very similar to the Pusara and Brodley analysis, with some differences. One difference that was interesting to note between Garg's work and the work of Pusara and Brodley was that Garg's monitoring software initially captured keystroke data. The keystroke data that was captured did not appear to be used in the final analysis in any way, nor was an explanation provided for why it was not included.
Still, this is interesting as it seems to indicate that Garg initially considered keyboard input when organizing his study.

Garg and his team captured the following traits from their sample users:

- Mouse clicks (left click and right click)
- Distance between events
- Mouse speed
- The angles between succeeding events

In addition to those raw events, Garg also calculated the mean and standard deviation for all of the aforementioned raw features. It is interesting to note that Garg and Pusara and Brodley eventually used almost identical feature sets when conducting their analysis (Pusara and Brodley's paper was referenced by Garg). Finally, whereas Pusara and Brodley used the commercially distributed See5 decision tree classifier, Garg chose support vector machines (SVM) to provide machine learning functionality. The results reported by Garg are presented in Table 27. The results of the GUI Usage Analysis authentication study are also presented in Table 27 for the sake of comparison.

Table 27. Comparison of GUI Usage Analysis and Garg's Mouse Movement Analysis

Study                                    Sample Size   False Positive Rate   False Negative Rate
GUI Usage Analysis                       31            0%                    6.27%
Garg et al.'s Mouse Movement Analysis    3             Not Reported          3.85%

One of the most striking characteristics of Garg's research is the lack of a reported false positive rate. The reader is left to draw their own conclusions regarding this omission. The other striking quality of Garg's research is the sample size of three participants. Such a small sample size makes it extremely difficult to draw any definitive conclusions from the results. It is suspected that, if a larger sample were obtained, results matching Pusara and Brodley's might ultimately be obtained by Garg. This belief is based on the striking similarity of the two feature sets that the studies employed. It is believed that the difference between the two machine learning techniques employed (SVM vs.
decision tree learning) would ultimately have resulted in only small discrepancies in false positive and false negative rates.

When comparing Garg's research to the results of the GUI Usage Analysis study presented here, the numbers are relatively similar, with Garg's technique boasting a slightly higher attack detection rate and his false positive rate remaining a mystery. Unfortunately, as was the case with the study by Pusara and Brodley, the comparisons between Garg's results and the results yielded by GUI Usage Analysis are relatively weak. There are simply too many environmental variables at play to draw definitive conclusions between the two studies.

Ultimately, Garg's research was not reproduced here because it was deemed to be scientifically invalid. It is included in this discussion primarily because it represents one of only two known papers attempting to authenticate users based on how they interact with a modern GUI driven computer system. Careful inspection of Garg's study yields several problems. First among these are the tasks that Garg allowed his participants to perform. Unlike both Pusara and Brodley's study as well as the GUI Usage Analysis studies presented in this dissertation, Garg did not require his participants to complete a uniform task list. Failure to take this step resulted in an experiment with multiple variables: different users completing different tasks. It is impossible to conclusively determine whether the support vector machine in Garg's study was able to differentiate between users based on their interaction patterns or the tasks they performed.

To further illustrate this problem, suppose User A is a student working to complete their final thesis prior to graduation, while User B is a marketing professional. User A will conceivably spend much of their energies preparing their manuscript in a word processor, while User B may spend a great deal of their time performing market research on the Internet.
It is entirely possible, if not likely, that User A may make much heavier use of the keyboard, not out of a behavioral trait, but simply because of the work they happened to be tasked with at that given moment in time.

Summary of Comparative Findings

As previously stated, it is ultimately believed that the performance of Garg's technique might prove to be very comparable to the results achieved by Pusara and Brodley. It is also believed that Garg's method might ultimately encounter the same difficulties that Pusara and Brodley encountered: the discovery that mouse movement analysis is only effective for a portion of the overall computer-using population. In comparison, the results of the GUI Usage Analysis study did not indicate a need to exclude any user from the study. Certainly the technique may be more effective for some users, but the findings presented here indicate that all users would enjoy some protection from GUI Usage Analysis as a means of authentication. It is worth noting, however, that users who are suited for Pusara and Brodley's technique may enjoy slightly higher attack detection rates.

At the same time, it should be noted once again that Pusara and Brodley's experiment focused only on the interactions of users operating a web browser. While speculative, it is not unreasonable to think that the performance of Pusara and Brodley's system might degrade if their techniques were applied to all activities performed on a computer, as was the case in the GUI Usage Analysis studies.

The findings of the studies presented in this dissertation offer one additional finding that might prove to be important in future research. The experiment conducted in this dissertation gathered samples from the participants at different chronological times. Pusara and Brodley's study, by comparison, gathered all of its data from the participants during a single session.
The research presented in this dissertation demonstrates that user behavior is consistent over a longer period of time, and between activity sessions. A side by side comparison of all three studies is presented in Table 28.

Table 28. Comparison of all Presented Techniques

Study                                                                   Sample Size   False Positive Rate   False Negative Rate
GUI Usage Analysis                                                      31            0%                    8.66%
Pusara and Brodley's Mouse Movement Analysis (Revised Decision Tree)    11            0.43%                 1.75%
Garg et al.'s Mouse Movement Analysis                                   3             Not Reported          3.85%

CHAPTER 7 POTENTIAL VULNERABILITIES OF GUI USAGE ANALYSIS

All common access control schemes have practical weaknesses that make the systems they protect vulnerable to attack. For example, traditional password based authentication schemes have been shown to be vulnerable in a variety of ways:

- The password may be forgotten
- The user may write the password down, making it vulnerable to theft
- The password may be stored in cleartext, making it vulnerable to hackers
- The password might be compromised through dictionary and brute force attacks

New access control methods also possess vulnerabilities. Biometric systems may experience high false positive rates, denying access to legitimate users. An attacker that can exploit this trait can use it as a form of a denial of service attack. Some biometric systems are also susceptible to forged credentials; consider fingerprint readers, some of which have extensively documented weaknesses to fake fingerprints.

Given these facts, it would be naïve to assume that GUI Usage Analysis is not similarly vulnerable to some attack. This chapter presents discussion designed to illuminate some manners in which GUI Usage Analysis might be vulnerable. A great deal of this discussion is hypothetical in nature. Demonstration of any of the potential vulnerabilities outlined here is outside the scope of this document.
GUI Usage Analysis, as presented here, is envisioned as being used in defensive manners, protecting critical systems from masquerade attacks. If this vision is correct, it is reasonable to conclude that any attempt to defeat GUI Usage Analysis would most likely occur in cases in which a single person was attempting to access a computer system they were not intended or authorized to access. Because of the relatively specific nature of the defense offered by GUI Usage Analysis, the potential attacks are also fairly specific. It is believed that any attempt to subvert GUI Usage Analysis would have to take the form of one user that attempts to behave like another. In other words, an attacker would almost certainly have to attempt to impersonate the victim in order to defeat GUI Usage Analysis.

The studies presented in this dissertation have considered two different defensive uses of GUI Usage Analysis: as a means of authentication and as a means of identification. The remainder of this chapter is divided accordingly, with section 7.1 focusing on authentication vulnerabilities and section 7.2 focusing on possible identification vulnerabilities. It should be noted that the vulnerabilities described here may not represent the complete set of vulnerabilities found in any actual system implementing GUI Usage Analysis.

Furthermore, it is also worth noting that the single greatest vulnerability faced by GUI Usage Analysis would almost certainly be a system that fails to perform accurately. Just as is the case with other intrusion detection systems, a high number of false positives will ultimately result in the system either being disabled or desensitized, most likely causing an increase in an attacker's success rate.

Vulnerabilities of GUI Usage Analysis as an Authentication Scheme

Chapter 4 presented the results of a study in which GUI Usage Analysis was analyzed to determine its utility as an authentication scheme.
The results of this study indicated that GUI Usage Analysis could be used to accurately authenticate a user of a system. Because authentication is a function that produces only two possible outputs (legitimate user vs. potential attacker), the methods by which GUI Usage Analysis can be attacked are similarly constrained. The only objective for an attacker would be to convince the system that they were the legitimate user.

It is believed that an attacker could convince the system that they were the legitimate user in one of two manners. First, the attacker might observe the targeted user for some period of time, making observations about the victim's mannerisms and methods of interaction. The data gathered by the studies presented here do not offer any indication of what the likelihood of success for such attempts might be.

The other alternative is that the attacker can rely on luck and hope that their own interaction patterns closely resemble the interaction patterns of the victim. If the attacker's interaction patterns truly resembled the patterns of the victim, it is possible that the system might mistake the attacker for the victim. The data gathered by the studies presented here indicate that the odds of this happening are low, with the likelihood of this form of attack succeeding being 5.7%. It is worth noting, though, that it is conceivable that real-world systems may possess lower accuracy, thus bettering the attacker's chances.

Vulnerabilities of GUI Usage Analysis as an Identification Scheme

Any organization that attempted to base a defense system around GUI Usage Analysis would most likely attempt to implement this scheme not only as an authentication system, but also as a system capable of making identifications. Being able to prevent masquerade attacks and other unauthorized computer use is one thing; being able to identify the attacker is another, more tempting possibility.
Unfortunately, implementing GUI Usage Analysis as an identification scheme offers additional vulnerabilities not present when considering this technique as an authentication system. As previously stated, authentication functions produce only two possible outputs: the user is either authenticated or (s)he is rejected and their access denied. Identification, on the other hand, is a function that can produce N outputs, where N represents the number of users known to the system. This key difference might potentially allow a defensive technology to be used offensively by an attacker.

First, an identification system can be attacked by impersonation in the same manner that an authentication system can. In other words, an attacker can still attempt to make the system believe that they are the legitimate user. However, unlike authentication systems, identification systems are vulnerable to other forms of attack. In this example of a hypothetical attack, if the attacker does not successfully impersonate the legitimate user the system will conceivably take action to protect itself. It stands to reason that if an identification system has determined that the current user is an attacker/impostor, the system must also have made some determination about the real identity of the attacker (otherwise the system would be performing simple authentication). This is where another way to defeat the system can be introduced: misidentification. If the system does not correctly identify the impostor, then the attacker has successfully implicated someone else in their crime. It stands to reason that an attacker might even use this form of attack as a way to frame a rival in the hopes that they (the rival) might be punished.

CHAPTER 8 SUMMARY OF FINDINGS AND IMMINENT FUTURE RESEARCH

This dissertation has presented findings from two different studies as well as provided a comparison to published performance statistics for other GUI based behavioral profiling models.
A review of these findings is presented here as well as research that will be performed in the imminent future. Chapter 9 presents long term research objectives and plans based on the findings made here.

Utility of GUI Usage Analysis as an Authentication Scheme

GUI Usage Analysis was examined for its utility as a means of authenticating users and preventing masquerade attacks. It was determined that GUI Usage Analysis was an effective authentication scheme in laboratory settings. Sessions from a known user were compared to a session from an unknown user and the similarity was calculated. The similarity was calculated using two different algorithms: TF-IDF ("term frequency-inverse document frequency") and Jaccard Similarity. Two different methods of determining an attack were experimented with: a static threshold and a variable threshold. In experiments using a static threshold, a single value was used for all users to determine whether an attack had occurred. In experiments using a variable threshold, the threshold used for labeling a session as being an attack was customized for each user.

Ultimately the Jaccard coefficient was found to be the most effective at correctly identifying attacks. Not surprisingly, using a variable attack threshold also provided the greatest accuracy. The combination of using the Jaccard coefficient with a variable attack threshold resulted in a false positive rate of 0% and a false negative rate of 8.66%.

Finally, the performance of each participant was analyzed in relation to their responses to survey data. This analysis was performed in an effort to see if there were any behavioral traits that could serve as a predictor of attack detection rate. While the number of participants in each statistical group was too small to make any definitive conclusions, the data gathered here indicates that users who work with a computer system eight or more hours a day enjoy a higher attack detection rate.
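The threshold comparison summarized in this section can be sketched as follows. The sketch assumes that each session has already been reduced to a set of observed GUI events; the event labels, the threshold values, and the function names are hypothetical and are not drawn from the study's data or software. The Jaccard coefficient itself is the standard ratio of intersection size to union size.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (an assumption of this sketch).
    return len(a & b) / len(union) if union else 1.0


def is_attack(reference, session, threshold):
    """Label the session an attack when its similarity falls below threshold."""
    return jaccard(reference, session) < threshold


# Hypothetical thresholds: one shared value versus one value per user.
STATIC_THRESHOLD = 0.5
USER_THRESHOLDS = {"user_a": 0.6, "user_b": 0.4}

# Hypothetical event sets for a known user's reference and an unknown session.
reference = {"open_menu", "click_toolbar", "alt_tab", "drag_window"}
session = {"open_menu", "alt_tab", "drag_window", "right_click", "minimize"}

similarity = jaccard(reference, session)  # 3 shared events out of 6 distinct
static_flag = is_attack(reference, session, STATIC_THRESHOLD)
variable_flag = is_attack(reference, session, USER_THRESHOLDS["user_a"])
```

In this toy case the same session passes under the shared threshold but is flagged under the stricter per-user threshold, which is the kind of difference a variable threshold is meant to capture.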
Utility of GUI Usage Analysis as an Identification Scheme

Following the study investigating the use of GUI Usage Analysis as an authentication scheme, a second study was performed. This study sought to determine the suitability of GUI Usage Analysis as a means of identification. The results of the study indicate that, in laboratory settings, GUI Usage Analysis can be used as a means of identification with reasonable confidence.

The use of GUI Usage Analysis as an identification scheme was investigated using two different approaches. The initial approach used the Jaccard coefficient as introduced in Chapter 4. Each unknown session was compared to the reference sample for each user in an effort to determine which user had provided the unknown sample. The reference sample that maximized the Jaccard coefficient was considered to be the supposed donor of the unknown sample. This method resulted in correct identification in 77.1% of all trials.

Following the use of the Jaccard coefficient as a means of identification, a traditional machine learning algorithm was used. Artificial Neural Networks (ANNs) have been used in many different problem domains to learn and predict a wide variety of functions. An artificial neural network was constructed using the Matlab Artificial Neural Network toolbox. The network was trained using two sessions from each user. The remaining sessions from each user were then passed to the network. The network correctly identified the donor of the unknown sample in 39% of trials. It is believed that the poor performance of the neural network in comparison to the experiment using the Jaccard coefficient was most likely due to overfitting of the neural network.

Comparison between GUI Usage Analysis and Other Published Techniques for Authentication

Prior to conducting the studies presented in this dissertation, two studies were published demonstrating the effectiveness of using mouse movements as a means of authentication.
The results of the Jaccard coefficient analysis with a variable attack threshold were compared to the results published in both studies. Though a definitive comparison of attack detection methods was not possible, some hypotheses were formed based on the results of those comparisons.

The first paper, written by Pusara and Brodley, had 18 participants completing a single session of viewing webpages. The data gathered from each user was divided into sub-sessions which were then fed into a decision tree learning algorithm. Pusara and Brodley initially excluded seven of their participants from their final results because these users did not make heavy use of the mouse. The results obtained from the remaining eleven participants were relatively comparable to the results obtained using GUI Usage Analysis. Pusara and Brodley's technique resulted in a false positive error rate of 0.43% and a false negative error rate of 1.75%. Contrast these results with the results obtained from the GUI Usage Analysis method, which resulted in a false positive error rate of 0% and a false negative error rate of 8.66%. While Pusara and Brodley's stated performance would generally be regarded as superior to the performance of GUI Usage Analysis, it is worth pointing out that Pusara and Brodley's method was not found to be usable on approximately 40% of the participant pool.

Garg et al. published a second study that was very similar to the study published by Pusara and Brodley. Garg and his team extracted a virtually identical feature set to the one extracted by Pusara and Brodley, with Garg opting for support vector machine (SVM) analysis instead of the decision tree algorithm chosen by Pusara and Brodley. It was demonstrated that the scientific validity of Garg's study was somewhat suspect, making any comparison somewhat ill-advised.
Imminent Research Objectives

Following completion of the studies presented here, subsequent studies are planned that will seek to refine the experimental processes developed for this dissertation. New participants will be recruited and will be asked to supply more sessions than the participants in this study were asked to supply. Participants will also have data collected over multiple days, as opposed to the single day collection period used in the studies presented here. It is very possible that the participants may require compensation in some form to ensure greater cooperation than what was achieved in these studies. As previously described, the participants in these studies received no compensation. It is believed that the lack of compensation harmed retention efforts, leading to many participants failing to complete the study satisfactorily.

The ultimate goal of the additional data is to analyze several variables that were not considered in these studies. For example, it should be determined how consistent a user's behavior is over a longer period of time (days instead of hours). Also, additional data is required in order to accurately recreate the other published studies in the area of GUI based authentication. Finally, a second set of users will serve to validate the findings presented here.

CHAPTER 9 FUTURE APPLICATIONS OF THIS RESEARCH

New technologies and discoveries are of little value if they cannot be applied to real-world problems and needs in a meaningful way. Though the original researcher may find the data and results to be of some value by themselves, if the findings cannot impact the general populace in some positive way it can be argued that the research has ultimately been fruitless. This chapter will outline some ways in which GUI Usage Analysis could possibly be applied in order to produce technologies that will serve some purpose.
Information Assurance Applications

When GUI Usage Analysis was initially conceived, it was thought of as a potential defensive technology. GUI Usage Analysis was originally targeted as a means of helping to secure computer systems against one of the most difficult types of attack, the previously described masquerade attack on unattended, unlocked workstations. The research presented in this dissertation shows that masquerade attacks may be detected using GUI Usage Analysis in a controlled environment. This research indicates that GUI Usage Analysis can be used as both a means of identification and authentication with reasonable rates of success. As previously stated, though, these findings are of little value to the general population without a vision of how to apply these new discoveries.

Masquerade Attack Research

One of the unfortunate truths regarding masquerade detection research is that, to date, no comprehensive study has been performed to determine exactly how many masquerade attacks typically occur. Stated differently, information security researchers have no empirical data that can be used to indicate exactly how much of a threat masquerade attacks actually pose to the typical organization. The ability to generate some amount of empirical evidence to fill this knowledge vacuum is certainly one valuable application of this research.

Conducting a study that attempts to quantify the number of masquerade attacks that may occur presents several difficulties that must be addressed. For example, soliciting participation from differing users could prove to be difficult. As previously mentioned, the participation attrition rate encountered in this study was approximately fifty percent. Any study seeking to gather the type of comprehensive data described here would most likely need to run for a longer period of time using more participants than the study presented in this dissertation.
It is likely that, without mandatory participation, finding willing participants could pose a difficulty. The other obstacle to be avoided in conducting this sort of future research regards the amount of data gathered. The log files generated by each participant were, on average, approximately 2-3 megabytes. This amount of data poses logistical problems. Storing the log files on each system could begin to burden local systems. Unfortunately transmitting event data over a local network also creates difficulties. Not only would transmitting a user?s actions over the network cause a sharp increase in the amount of 114 data a network would have to cope with, but it would also be vulnerable to eavesdroppers on most networks. Furthermore, the current technique used to defeat eavesdropping attacks over shared network mediums, encryption, frequently results in even higher computational and network resource requirements. Masquerade Detection Research When considering future applications of this research, most people immediately consider the ability to detect masquerade attacks, often in the form of some type of intrusion detection system offering real-time alerting. Unfortunately, many difficult hurdles must be overcome prior to the creation of a real-time masquerade detection system. The first and possibly most difficult obstacle to be conquered regards online processing. As previously described, the data processing performed in the study presented in this dissertation involved offline data mining techniques. Achieving the same accuracy in a real-time system could prove to be difficult. Furthermore, the issue of where the data processing might occur is also important to consider. Processing the data in an effort to detect attacks locally on each system could impose enough overhead to affect the user?s experience. Processing all data remotely on a separate system could lead to scalability problems as the number of systems being monitors grows. 
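
To give a rough sense of scale for the storage and scalability concerns above, the following minimal sketch multiplies the 2-3 megabyte per-participant log size reported in this study by a number of monitored workstations. The function name, its parameters, and the idea of treating the per-participant figure as a per-workstation volume for one collection period are assumptions made purely for illustration; they are not measurements from this study.

```python
def aggregate_log_volume_mb(num_workstations, low_mb=2.0, high_mb=3.0):
    """Return a (low, high) estimate, in megabytes, of the log data that a
    central collector would receive for one collection period.

    Assumption (hypothetical): every monitored workstation produces about
    as much log data as one participant in this study, i.e. 2-3 MB.
    """
    if num_workstations < 0:
        raise ValueError("num_workstations must be non-negative")
    return (num_workstations * low_mb, num_workstations * high_mb)


if __name__ == "__main__":
    # Hypothetical deployment sizes, chosen only to show how volume scales.
    for n in (31, 500, 10_000):
        low, high = aggregate_log_volume_mb(n)
        print(f"{n} workstations: {low:.0f}-{high:.0f} MB per collection period")
```

The estimate grows linearly with the number of workstations and ignores compression, encryption overhead, and longer collection periods, all of which would change the totals in a real deployment.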
The usability of any real-time monitoring system also becomes critical. Most anomaly-based intrusion detection systems suffer from a measurable false positive rate (recall that a false positive occurs when the system incorrectly believes an attack is occurring). It is reasonable to assume that any deployed masquerade detection system would also suffer some number of false positives. How the system responds to those false positives will be critical. Several possibilities exist, offering varying levels of inconvenience. For example, in the event of a suspected attack, the system could require intervention from a system administrator. System administrators could end up devoting a great deal of time to tracking down and confirming or rejecting suspected attacks, particularly given the steps for investigating an attack. Another possible intervention would be to have the system lock the workstation that is supposedly under attack. This approach will most likely thwart a real attack, provided that the attacker does not have the user's password. The downside to this approach is that the burden of false positives is placed on the end user, possibly leading to complaints about the software. Of the two approaches to responding to false positives, it is believed that the second approach, locking the workstation, makes more sense. The burden experienced by an end user who is required to unlock their workstation is believed to be lower than the burden placed on system administrators tasked with responding individually to every false positive.

Another usability issue concerns user training. While the data presented here indicates that users develop habits in the way they accomplish tasks, it also stands to reason that users may adjust their behavior over time. Users will conceivably learn new, more efficient techniques for accomplishing tasks. Users may also simply change their behavior over time. Furthermore, a user's physical condition may change as the result of injury or illness. The system should have some means of dealing with these eventualities.

Yet another issue faced by a real-time detection system can be described as session marking. Consider the most illustrative example of a masquerade attack presented earlier in this dissertation: the unattended, unlocked workstation. In this case, the proper user has, at some point in the past, gotten up and left their workspace. Some time later, the attacker begins using the system. The two sessions (the one generated by the authorized user and the one generated by the attacker) can be delimited by the gap in time during which the system received no user input. The question is how much inactivity the system should interpret as the end of a session. This value will have to be determined by any group constructing an intrusion detection system using GUI Usage Analysis.

Human-Computer Interaction Applications

Much of the discussion in this dissertation has focused on the use of GUI Usage Analysis in a manner designed to enhance information security. There are, though, other disciplines that could potentially benefit from a successfully implemented GUI Usage Analysis product. One of these disciplines is the field of Human-Computer Interaction. More specifically, usability specialists might find great utility in applying GUI Usage Analysis.

To illustrate, consider two computer users who share the same computer. User A is a 16-year-old male who has used computers his entire life. User B is a 48-year-old female who recently began using computers as a new requirement for her job. User A is quite comfortable manipulating graphical user interfaces and finds the normal tasks required to operate a GUI system to be second nature. User B is intimidated by the keyboard, mouse, and the computer in general. User B lives in fear of "clicking the wrong thing."
Given their different backgrounds, it comes as no surprise that User A and User B process information differently. When given a screen to interact with, User A's mind immediately seizes upon the contextual clues placed by the interface designers. Because his mind is accustomed to processing these contextual clues, he is able to process the information displayed on the screen quite quickly. In short, he knows exactly where to look in order to find the information that he is after.

User B, though, is accustomed to acquiring information from traditional printed materials. Her mind has not been trained to pick up on the contextual clues placed by the interface designers. As a result, she is not able to immediately ignore portions of the screen that do not contain the information she is after. Instead, she reads the entire screen like a book, progressing from left to right, starting at the top row and working downward.

Now, it is quite clear that while User A will accomplish the tasks much more quickly than User B, both users will eventually process the contents of the computer screen and find the information they were looking for. However, because User B has taken so much longer to complete the task, she finds the whole process of using a computer to be arduous and difficult. Her desire to use a computer in the future is greatly reduced. This represents the start of a self-perpetuating cycle in which User B does not use the computer because she is not good at it. Because she never uses it, she never improves.

It is conceivable that GUI Usage Analysis could be applied in this situation to alter how information is displayed to these two users. The system could custom tailor the interface for each user. The interface for User A could remain unaltered, displaying information in the traditional manner to which he is accustomed. The screen could potentially be rearranged for User B, though, and structured to look more like traditional printed material.
Critical information could be repositioned to the top left portion of the screen, where User B would be more likely to encounter it quickly.

There are other manners in which GUI Usage Analysis could potentially be used to enhance usability for users. Many users might enjoy an interface that automatically adjusts the display to their preferences after a few seconds of use. Users with physical handicaps might particularly benefit from this feature. In short, any computer system that is used by multiple people might potentially enjoy enhanced usability by utilizing GUI Usage Analysis as a means of tailoring the user interface "on the fly".
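
As a brief illustration of the session-marking question raised in the Masquerade Detection Research discussion of Chapter 9, the following minimal sketch splits a stream of input-event timestamps into sessions wherever the gap between consecutive events exceeds an inactivity threshold. The function name, the use of timestamps in seconds, and the 300-second threshold in the example are all hypothetical; as noted in that discussion, the appropriate threshold would have to be determined by whoever builds such a system.

```python
def split_sessions(timestamps, idle_threshold):
    """Group event timestamps (in seconds) into sessions.

    A new session starts whenever the gap since the previous event is
    strictly greater than idle_threshold. Both the name and the units
    are illustrative assumptions, not part of the study's software.
    """
    sessions = []
    current = []
    for t in sorted(timestamps):
        # Close the current session if the user was idle for too long.
        if current and t - current[-1] > idle_threshold:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Hypothetical event times: activity, a long gap, then more activity.
    events = [0, 5, 10, 400, 405]
    for number, session in enumerate(split_sessions(events, idle_threshold=300), 1):
        print(f"session {number}: {session}")
```

A smaller threshold splits a single user's work into many short sessions, while a larger one risks merging the authorized user's session with an attacker's, which is the trade-off the session-marking discussion describes.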
APPENDIX A
INFORMED CONSENT LETTER

Figure A1 contains the contents of the informed consent document distributed to all participants prior to participation in this study.

INFORMED CONSENT
for a Research Study Entitled
Employing WIMP Usage Patterns for Masquerade Detection

You are invited to participate in a research study investigating new ways to identify individual computer users. This study is being conducted by Eric S. Imsand, a student in the Department of Computer Science and Software Engineering, under the supervision of Dr. John A. Hamilton, Jr., Associate Professor in the Department of Computer Science and Software Engineering. We hope to learn if the manner in which people use a computer (the icons they click on, keyboard shortcuts they use, etc.) can be used as a means of identification. You were selected as a possible participant because you are an individual over 19 years of age and have been identified as a person who is at least mildly proficient at the normal operation of a computer running the Windows XP operating system.

If you decide to participate, we will first install a piece of software on your personal computer that is designed to record which icons, menus you click on, any keyboard shortcuts you may use, and so on. After the monitoring software has been installed you will be asked to complete a series of tasks once a day for five consecutive days while the monitoring software is running. Finally you will be asked to allow the monitoring software to record your actions over a full two day period of time.

If you decide to participate in this study, there is a chance that monitoring software could record data that you might otherwise prefer not be logged. This risk is greatest when the software is recording all actions performed by you while using your computer in a typical manner. The result of this risk is that your privacy could be breached if the experimental data were stolen. To minimize these risks several precautions have been taken. First, the information recorded by the monitoring software is encoded so that it can only be interpreted using a custom piece of computer software. Secondly, the information will be stored on your computer in a secure manner so that anyone seeking to access it would have to provide your username and password (if applicable). Third, you may always temporarily pause the recording software (i.e. pause recording) while you are doing work that you prefer not be recorded. Finally, if you should decide to cease participation in this study, the monitoring software can be permanently removed from your computer at any time by using the "Add/Remove Programs" menu in the Windows control panel. Technical assistance will be provided if you cannot remove the software yourself. Please note that all of the data that is collected will be destroyed no later than 6 months following the conclusion of our study. Files stored electronically will be erased and any physical media will be destroyed.

If our research is successful, we hope to be able to apply these findings to the creation of a new breed of computer security software. This new generation of software will be able to tell whether the person using your computer is actually you. It would provide an extra layer of defense if your personal computer's password were ever stolen, or if someone were to begin using your computer while you were away from your desk. We cannot promise you that you will receive any or all of the benefits described.

Any information obtained in connection with this study and that can be identified with you will remain confidential. Only the authors of this study will handle the data gathered and all data will be stored on a non-networked secured computer workstation. Information collected through your participation may be used in the completion of educational requirements (i.e. doctoral dissertations), published in a professional journal, and/or presented at a professional meeting, etc. If so, none of your identifiable information will be included. Any personally identifiable data gathered during the course of this study will be destroyed no later than 12/31/2007.

Please know that you may withdraw from participation at any time, without penalty, and that you may withdraw any data which has been collected about yourself, as long as that data is identifiable. Your decision whether or not to participate will not jeopardize your future relations with Auburn University or the Department of Computer Science and Software Engineering.

If you have any questions we invite you to ask them now. If you have questions later, please contact Eric Imsand (Phone: 901-338-9323, e-mail: imsanes@auburn.edu) or Dr. Hamilton (Phone: 334-844-6360, e-mail: hamilton@eng.auburn.edu) and they will be happy to answer them. You will be provided a copy of this form to keep.

For more information regarding your rights as a research participant you may contact the Auburn University Office of Human Subjects Research or the Institutional Review Board by phone (334)-844-5966 or e-mail at hsubjec@auburn.edu or IRBChair@auburn.edu.

HAVING READ THE INFORMATION PROVIDED, YOU MUST DECIDE WHETHER OR NOT YOU WISH TO PARTICIPATE IN THIS RESEARCH STUDY. YOUR SIGNATURE INDICATES YOUR WILLINGNESS TO PARTICIPATE.

___________________________________    ________________________________
Participant's signature        Date    Investigator obtaining consent  Date

___________________________________    ________________________________
Print Name                             Print Name

___________________________________    ________________________________
Parent's or Guardian Signature Date    Co-investigator's signature     Date
(if appropriate)                       (if appropriate)

___________________________________    ________________________________

Figure A1.
Text of the Informed Consent Document Provided to Study Participants

APPENDIX B
SIMULATED ATTACK SUCCESS RATE FOR INDIVIDUAL USERS

Table A1 contains the results of simulated attacks for all possible combinations of known users and attacker. A dash marks the cells in which the attacker and the known user are the same participant.

Table A1. Simulated Attack Rate for Known Users and Attackers

                User ID
Attacker ID     1     2     3     4     5     6     7     8     9    10
          1     -    0%    0%    0%    0%    0%    0%    0%    0%    0%
          2    0%     -    0%    0%    0%    0%    0%    0%    0%    0%
          3    0%    0%     -   11%    0%    0%   11%    0%   56%    0%
          4    0%    0%    0%     -    0%    0%   33%    0%   78%    0%
          5    0%    0%    0%   89%     -    0%   11%    0%   56%    0%
          6    0%    0%    0%   56%    0%     -    0%    0%   22%    0%
          7    0%    0%    0%    0%    0%    0%     -    0%   44%    0%
          8    0%    0%    0%   33%    0%    0%    0%     -   22%    0%
          9    0%    0%    0%    0%    0%    0%   44%    0%     -    0%
         10    0%    0%    0%    0%    0%    0%    0%    0%    0%     -
         11    0%    0%    0%   67%    0%    0%   56%    0%   67%    0%
         12    0%    0%    0%    0%    0%    0%    0%    0%   56%    0%
         13    0%    0%    0%   22%    0%    0%    0%    0%   67%    0%
         14    0%    0%    0%   56%    0%    0%    0%    0%   11%    0%
         15    0%    0%   33%   11%    0%    0%    0%    0%   33%    0%
         16    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         17    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         18    0%    0%    0%    0%    0%    0%    0%    0%   11%    0%
         19    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         20    0%    0%    0%   11%    0%    0%    0%    0%   11%    0%
         21    0%    0%    0%   11%    0%    0%    0%    0%   11%    0%
         22    0%    0%    0%    0%    0%    0%    0%    0%   22%    0%
         23    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         24    0%    0%    0%    0%    0%    0%    0%    0%   78%    0%
         25    0%    0%    0%   44%    0%    0%   11%    0%   44%    0%
         26    0%    0%    0%    0%    0%    0%    0%    0%   22%    0%
         27    0%    0%    0%    0%    0%    0%   11%    0%   44%    0%
         28    0%    0%    0%   67%    0%    0%   33%    0%   78%    0%
         29    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         30    0%    0%    0%   11%    0%    0%    0%    0%   22%    0%
         31    0%    0%    0%   67%    0%    0%   33%    0%  100%    0%

Table A1 - Continued

                User ID
Attacker ID    11    12    13    14    15    16    17    18    19    20
          1    0%    0%    0%    0%    0%   67%    0%    0%    0%    0%
          2    0%    0%    0%    0%    0%   44%    0%    0%    0%    0%
          3   11%    0%    0%    0%    0%   33%    0%    0%    0%    0%
          4   67%    0%    0%    0%    0%   11%    0%   44%    0%    0%
          5   44%    0%    0%    0%    0%   56%    0%   11%    0%   11%
          6   11%    0%    0%    0%    0%   89%    0%    0%    0%   67%
          7    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
          8    0%    0%    0%    0%    0%  100%    0%    0%    0%   22%
          9    0%    0%   11%    0%    0%   33%    0%    0%    0%   11%
         10    0%    0%    0%    0%    0%  100%    0%    0%    0%    0%
         11     -    0%    0%    0%    0%    0%    0%   11%    0%    0%
         12    0%     -    0%    0%    0%   67%    0%    0%    0%    0%
         13    0%    0%     -    0%    0%  100%    0%    0%    0%    0%
         14    0%    0%    0%     -    0%    0%    0%   22%    0%    0%
         15    0%    0%   11%    0%     -   78%    0%    0%    0%    0%
         16    0%    0%    0%    0%    0%     -    0%    0%    0%    0%
         17    0%    0%    0%    0%    0%    0%     -    0%    0%    0%
         18    0%    0%    0%    0%    0%   11%    0%     -    0%    0%
         19    0%    0%    0%    0%    0%    0%    0%    0%     -    0%
         20    0%    0%    0%    0%    0%   33%    0%    0%    0%     -
         21    0%    0%    0%    0%    0%    0%    0%   22%    0%   11%
         22    0%    0%    0%    0%    0%  100%    0%    0%    0%   11%
         23    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         24    0%    0%    0%    0%    0%   44%    0%    0%    0%   11%
         25   22%    0%    0%    0%    0%   67%    0%    0%    0%   44%
         26    0%    0%    0%    0%    0%    0%    0%   33%    0%    0%
         27   22%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         28   44%    0%    0%    0%    0%    0%    0%   67%    0%    0%
         29    0%    0%    0%    0%    0%   56%    0%    0%    0%    0%
         30    0%    0%    0%    0%   33%   89%    0%    0%    0%    0%
         31   56%    0%   33%    0%    0%   33%    0%    0%    0%   22%

Table A1 - Continued

                User ID
Attacker ID    21    22    23    24    25    26    27    28    29    30    31
          1    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
          2    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
          3    0%    0%    0%   56%    0%   22%    0%    0%    0%    0%    0%
          4   22%    0%    0%    0%    0%   56%    0%   44%    0%   22%    0%
          5   44%    0%    0%   22%   11%   67%    0%    0%    0%   22%    0%
          6    0%    0%    0%    0%    0%   67%    0%    0%    0%   67%    0%
          7    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
          8    0%    0%    0%    0%    0%   56%    0%    0%    0%   22%    0%
          9    0%    0%    0%    0%    0%   33%    0%    0%    0%   11%    0%
         10    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         11    0%    0%    0%    0%    0%   11%    0%   11%    0%    0%    0%
         12    0%    0%    0%   67%    0%   44%    0%    0%    0%    0%    0%
         13    0%    0%    0%   78%    0%   56%    0%    0%    0%   22%    0%
         14    0%    0%    0%    0%    0%   33%    0%    0%    0%    0%    0%
         15   44%    0%    0%   67%    0%   44%    0%    0%    0%   33%    0%
         16    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         17    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         18    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         19    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%    0%
         20    0%    0%    0%   11%    0%   33%    0%    0%    0%    0%    0%
         21     -    0%    0%   22%    0%    0%   11%    0%    0%    0%    0%
         22    0%     -    0%   22%    0%    0%    0%    0%    0%   11%    0%
         23    0%    0%     -    0%    0%    0%    0%    0%    0%    0%    0%
         24   11%    0%    0%     -    0%   56%   56%    0%    0%    0%    0%
         25   11%    0%    0%    0%     -   33%    0%    0%    0%   56%    0%
         26    0%    0%    0%   33%    0%     -    0%    0%    0%    0%    0%
         27   11%    0%    0%   67%    0%   11%     -    0%    0%    0%    0%
         28    0%    0%    0%    0%    0%   22%    0%     -    0%    0%    0%
         29    0%    0%    0%    0%    0%    0%    0%    0%     -    0%    0%
         30    0%    0%    0%    0%    0%   11%    0%    0%    0%     -    0%
         31   56%    0%    0%   67%    0%   56%    0%    0%    0%   11%     -

APPENDIX C
INDIVIDUAL PERFORMANCE ON IDENTIFICATION EXPERIMENTS

The following represent the raw data obtained from the identification experiment based on the Jaccard Index, as described in Chapter 5. The first column represents the actual donor of the test data sample. The second column represents the user the identification algorithm suspects provided the unknown sample, based on analysis of the known training sets.

Table A2. Results of User Identification Experiment Showing Which Users Were Mistakenly Identified

ID Number of Actual User    ID Number of Suspected User
                       1                              1
                       1                              1
                       1                              1
                       2                              2
                       2                              2
                       2                              2
                       3                              3
                       3                              3
                       3                              3
                       4                             28
                       4                              5
                       4                              4
                       5                              5
                       5                              5
                       5                              5
                       6                              6
                       6                              6
                       6                              6
                       7                              7
                       7                             11
                       7                              9
                       8                              8
                       8                              8
                       8                              8
                       9                              9
                       9                              7
                       9                             13
                      10                             10
                      10                             10
                      10                             10
                      11                             28
                      11                             11
                      11                              9
                      12                             24
                      12                             12
                      12                             12
                      13                             13
                      13                             13
                      13                             13
                      14                             14
                      14                             14
                      14                             14
                      15                             15
                      15                             15
                      15                             15
                      16                             16
                      16                             16
                      16                             10
                      17                             17
                      17                             17
                      17                             17
                      18                             18
                      18                             28
                      18                             18
                      19                             19
                      19                             19
                      19                             19
                      20                             25
                      20                             20
                      20                             20
                      21                             21
                      21                             21
                      21                             21
                      22                             22
                      22                             22
                      22                             22
                      23                             23
                      23                             23
                      23                             23
                      24                              9
                      24                             27
                      24                             24
                      25                              8
                      25                             25
                      25                             25
                      26                             14
                      26                             12
                      26                             24
                      27                             27
                      27                             27
                      27                             24
                      28                             28
                      28                             28
                      28                             28
                      29                              8
                      29                             29
                      29                             29
                      30                             30
                      30                             30
                      30                             30
                      31                             31
                      31                             31
                      31                             31
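
For readers unfamiliar with the measure behind Table A2, the following minimal sketch shows how a Jaccard Index could be used to pick a suspected user for an unknown sample. It is only an illustration under stated assumptions: each sample is reduced to a set of event labels, each known user has one set built from training data, and the suspected user is the one with the highest index. The function names and the example labels are hypothetical, and the procedure actually used in the experiment is the one described in Chapter 5.

```python
def jaccard_index(a, b):
    """Jaccard Index of two collections: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def identify(sample, profiles):
    """Return the ID of the known user whose profile is most similar to
    the unknown sample. Ties go to the lowest user ID (an arbitrary,
    illustrative choice)."""
    return max(sorted(profiles), key=lambda uid: jaccard_index(sample, profiles[uid]))


if __name__ == "__main__":
    # Hypothetical profiles; the labels do not come from the study's logs.
    profiles = {
        1: {"menu:File", "shortcut:Ctrl+S", "icon:Save"},
        2: {"menu:Edit", "shortcut:Ctrl+C", "shortcut:Ctrl+V"},
    }
    unknown = {"shortcut:Ctrl+S", "icon:Save", "menu:Edit"}
    print("suspected user:", identify(unknown, profiles))
```

Each row of Table A2 corresponds to one such decision: the first column is the participant who actually supplied the sample and the second is the ID the identification step returned.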