Classifying Speakers Using Voice Biometrics In a Multimodal World

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Kenneth Arthur Rouse

Certificate of Approval:

Juan E. Gilbert, Chair, Professor, Computer Science and Software Engineering
Cheryl D. Seals, Associate Professor, Computer Science and Software Engineering
Richard Chapman, Associate Professor, Computer Science and Software Engineering
George T. Flowers, Dean, Graduate School

Classifying Speakers Using Voice Biometrics In a Multimodal World

Kenneth Arthur Rouse

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 10, 2009

Classifying Speakers Using Voice Biometrics In a Multimodal World

Kenneth Arthur Rouse

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author

Date of Graduation

Dissertation Abstract

Classifying Speakers Using Voice Biometrics In a Multimodal World

Kenneth Arthur Rouse

Doctor of Philosophy, August 10, 2009
(M.S., Tarleton State University, 1998)
(B.S., South Dakota State University, 1982)
(A.A., Christ For The Nations, 1985)

159 Typed Pages

Directed by Juan Gilbert

This dissertation describes a research study conducted to determine whether a classification for a person is obtainable by using the person's voice. The intent of this work was to investigate a collection of voice samples for trends that could potentially lead to parameters to be used in the classification of an individual. No specific classification area, such as gender or ethnicity, was sought; it was preferred to allow the results to dictate the characteristics that point to a particular classification group. In the data collection stage, each participant was given the same task, and analysis was then done on the voice sample given. Analysis was conducted in phases, with the first phase focusing on the time domain, which resulted in parameters approximating speed of speech and the amount of pause in the sample. Next, the frequency domain was investigated, focusing on the complexity of speech and voice tone attributes. The inquiries into this domain concluded with the peaks in the frequency of the voice being tracked by frequency threads and represented numerically by a third-order polynomial. It is the coefficients of this polynomial that give a representation of an individual's voice, making it possible to classify the speaker into a particular group. To verify this, the coefficients from these polynomials were used with a clustering application to validate the hypotheses of this study, substantiating an objective to provide empirical user data to contribute to the design of future phone system communications.

Acknowledgments

It is with great pleasure that I give thanks to the many who have made this dissertation possible. First and foremost I want to thank my Lord and Savior Jesus Christ for giving me the grace and wisdom to accomplish this major goal in my life. For without Him I can do nothing, but with Him I can do all things. Second only to my Lord, I want to thank my wife Darlene, as it has been 9 long years on this journey to completion.
I want to thank her for all the times that she had to cover the home front when I was studying for classes, working on projects, studying for the qualifying exams (TWICE), and all the many hours of working on this project. It really was a "we" effort! I want to also thank my children: Heather, especially, for all the LaTeX tables that she made, the data that she entered, and all the proofreading that she has done. You were a great research assistant! To my boys, Jhett and Jonathan, who did a huge amount of work at home when I was at the office. This could not have happened without them helping to cover the home front with Darlene. To my son Jimmy Jimmy Jimmy, thank you for all those words of encouragement and for believing that the day would come that it would be done. To my good friend and brother in the Lord, Dan, thank you for all the brainstorming sessions and for keeping me going in the right direction. A special thank you to all of my family and friends that have been praying for me and my family during this time.

Next, I want to thank my dissertation advisor, Dr. Juan E. Gilbert, for his support and encouragement, for helping to define the scope of the project, and for assisting in the many publication opportunities outside this project. Auburn's loss is truly Clemson's gain! I also wish to thank the rest of my committee, Dr. Cheryl Seals, Dr. Richard Chapman and Dr. Susana Morris, for working with me on such a tight schedule. My thanks to my fellow HCCL lab members Dr. Shaun Gittens, Yolanda McMillan, Dr. Idongesit MkPong-Ruffin, Wanda Eugene and Vincent Cross for their help in reviewing the proposal for this work. For the task of painstakingly reading through my entire dissertation and making some very valuable suggestions, I want to especially thank Yolanda McMillan, Philicity Williams, Dr. Win Britt, and Ciao Soares. An extra thank you to Win for all the help and direction he gave in keeping me on task and making my life more manageable by suggesting I use LaTeX and Google Code. Thank you for answering all my questions. Also thank you to all my HCCL friends that have been an encouragement through these many years. Finally, I want to thank all that participated and gave of their time and voice to give me the many samples that made this project possible. With the last thank you to Judy Rodman, who gave her professional advice to a stranger that emailed her a question just out of the blue. PRAISE TO JESUS... THIS DISSERTATION WORK IS DONE!

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: the document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty.

Table of Contents

List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Problem Description
  1.3 Background
  1.4 Organization
2 Literature Review
  2.1 Biometrics
  2.2 Voice biometrics
    2.2.1 Speaker verification
    2.2.2 Speaker Identification
    2.2.3 Speaker Classification
  2.3 Voice Sampling Methods
    2.3.1 Text Dependent
    2.3.2 Text Independent
  2.4 Voice Pitch
  2.5 Discrete and Fast Fourier Transform
  2.6 Window Functions and Spectral Leakage
  2.7 MATLAB
3 Research Plan
  3.1 Subjects
  3.2 Equipment and Material Used
  3.3 Software Used
  3.4 Data Collection Methods
  3.5 Experimental Overview
4 Time Domain Experimentation and Results
  4.1 Experimental Design
    4.1.1 Experiment Goals
    4.1.2 Procedure
  4.2 Results
  4.3 Conclusion
5 Frequency Domain Experimentation and Results: Initial Phase
  5.1 Experimental Design
    5.1.1 Experiment Goals
    5.1.2 Procedure 1 (Converting Data)
    5.1.3 Procedure 2 (Locate Primary Peaks)
    5.1.4 Procedure 3 (Calculate Averages)
  5.2 Results
6 Frequency Domain Experimentation and Results: Graphical Phase
  6.1 Experimental Design
    6.1.1 Experiment Goals
    6.1.2 Procedure
  6.2 Results
7 Frequency Domain Results: Final Phase
  7.1 Experimental Design
    7.1.1 Experiment Goals
    7.1.2 Procedure
  7.2 Results
  7.3 Conclusion
8 Findings and future work
  8.1 Contributions
  8.2 Future Work
9 Scholarly Contributions
Bibliography
Appendix
  A Breakdown of demographics
  B Screen Shots of HTML Pages Used For Data Collection

List of Figures

2.1 Example of Voice Sample
2.2 Upper graph is a standard periodic signal, whereas the lower graph is not periodic and has discontinuity
4.1 Original voice sample opened in digital audio editor, Audacity
4.2 Example graph of voice sample in the time domain
4.3 Voice sample showing where the calculated pause of the sample is located
4.4 A sample with speaking time of approximately 7 seconds and pause time of 0.44 seconds
4.5 A sample with speaking time of approximately 7 seconds and pause time of 1.78 seconds
4.6 U.S. Census Regions
5.1 Graphs of cropped voice sample saying full message
5.2 Full frequency graph showing the boundaries for the area that will give the most information for a voice sample
5.3 Selected frequency sample (250-1250 Hz) graph of the bounded area in the graph above
5.4 Graph showing a view of peak locations of a full frequency sample
5.5 Graphs showing different views of peak locations of a sample within the frequency boundaries (250-1250 Hz)
5.6 Graphs showing one sample that has a positive slope average and one with a negative slope average
5.7 Graph showing the ranges for the positive and negative average slope of lines between peaks
5.8 Graph showing the ranges for the average distance between the primary peaks
6.1 Comparison of participant sample split in two halves
6.2 Comparison of participant saying the word "George" split in two halves
6.3 Graphs showing the two halves of the word "Nine"
7.1 Graphs showing the two halves of the word "nine" and the consistent progression of the two samples
7.2 Multiple graphs showing the change of the frequency and amplitude for the word "nine" spoken by a single participant
7.3 View of peak location of file 1 from the breakdown of the file where the participant said the word "nine"
7.4 View of peak location of file 2 from the breakdown of the file where the participant said the word "nine"
7.5 View of peak location of file 3 from the breakdown of the file where the participant said the word "nine"
7.6 View of peak location of file 4 from the breakdown of the file where the participant said the word "nine"
7.7 Graph of numerical data indicating the FLT stored in an Excel spreadsheet
7.8 Graph 1 is of the frequency values of the first thread of a test sample and graph 2 shows the polynomial that fits that step graph
B.1 This is the information page from the data gathering website
B.2 This is the Demographic Survey page from the data gathering website
B.3 This is the Phone Instruction Page from the data gathering website
B.4 User ID and PIN given on the Phone Instruction Page from the data gathering website
B.5 Four (4) digit number given on the Phone Instruction Page from the data gathering website
B.6 The message that all participants will leave, located on the Phone Instruction Page from the data gathering website

List of Tables

2.1 Formula Description Of Symbols
2.2 Windowing Functions
2.3 Description of some MATLAB commands used
4.1 Comparison of total time to say the message (cropped sample) and the amount of pause in the sample, with the percentage of pause in the sample, as it pertains to gender
4.2 Regions in the United States and states represented from these regions
4.3 Regions in the United States and states represented from these regions
5.1 The average slope between peaks with the focal point on gender
5.2 The average slope between peaks with the focal point on ethnicity
7.1 Data on peak location for the first 40 smaller files that were created from the full sample of a person saying the word "nine"; it numerically represents the shifting of the peaks as well as the appearance and disappearance of minor peaks
7.2 The results from the analysis tool showing the percentages as they pertain to male and female in each coefficient score group
7.3 Clustering results from Applications Quest(TM) reset to look for samples that are alike rather than different

Chapter 1
Introduction

1.1 Motivation

Professionals in the field of Human Computer Interaction (HCI) are continuously searching for ways to improve the communication between humans and computers, especially when using voice interfaces [45]. HCI research has become paramount to the development of computer applications that require user authentication, such as e-commerce [1]. Biometrics is one area in which HCI research is being conducted to examine the potential of strengthening the authentication and security processes. Biometrics is the use of an automated method to recognize an individual based on physiological characteristics, behavioral characteristics, or a combination of the two [17]. While there are many sub-areas that pertain to biometrics, some of the more recognized areas are: voice, iris, fingerprints, hand, face, retina, signature, keystroke, and gait. Biometrics is founded on the idea that any or all of the aforementioned physical and/or behavioral aspects are unique to a person and can be used to identify that person [27]. The focus of this research is the sub-area of voice biometrics, its ideals and its characteristics. Specifically, this research is involved in using voice biometrics to classify an individual and thus make the communication process between them and a computer application more successful. This avenue of thought came about while compiling the post-survey responses from a usability study conducted for an electronic voting system, Prime III [48].
One of the most frequent comments made was about the voice used by the system to communicate with the user. Some said it talked too fast, some said it talked too slow, and some said they were not able to understand it. The topic of this dissertation came about after reading these comments and thinking of probable solutions to improve this aspect of communication between an individual and a machine.

1.2 Problem Description

The use of voice biometrics in speech technology has evolved greatly over the last decade, resulting in many commercial applications. In addition to this evolution, the field of HCI has taken on an important role in the development of applications by making awareness of the way humans and machines interact a part of the development process [16, 45]. With the advancement of speech technology in phone applications, more consideration must also be given to making the communication between the diverse group of users and the voice applications more compatible [11]. This is not a trivial task that can be fixed solely using technology. It will also require the involvement of social science to help better address the issues that arise. How a system responds to a user can have an immense effect on the interaction between the system and the user [45]. Currently, speech applications do not give any consideration to potential characteristics of the user that could be used to help the communication between the person and the machine. In the past, most research concerning voice biometrics has been conducted in the areas of speaker identification or speaker verification, and very little has been done in the area of speaker classification. More focus is now being given to speaker classification because speech interfaces are becoming widely implemented in today's phone and web applications [45]. Given the increased interest and the perceived usefulness of voice classification in today's applications, the hypotheses of this research arose. By analyzing the human voice, one can conclude the following:

H1) The pitch range of the human voice can be used to create a tone classification set, such as low, medium, and high tones.

H2) The human tone classification can be refined into human classifications that pertain to gender, ethnicity, and the geographical area that most affected the speaker's accent.

1.3 Background

Voice biometrics is a method of biometric authentication that uses voice recognition techniques based on characteristics of the human voice. According to Dr. Judith Markowitz, voice biometrics can be broken up into speaker verification, identification, and classification [32]. Research in the area of voice biometrics has mainly been associated with two areas: 1) speaker verification and 2) speaker identification. While equally important, a third area of voice biometrics, referred to as speaker classification, is not as widely focused on as the other two [34]. Speaker verification requires a user to create a voice template that can be stored in a database associated with that specific user. When the user submits a voice sample for verification to the application and declares to be a certain individual, the system performs a one-to-one comparison with the voice template that is stored in the database for that particular individual. Next, a calculation is made to see how close the two templates are, and a confidence value is generated. If this confidence value is above a given threshold, then the application will verify the authenticity of the speaker [26, 32, 30].
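To make the one-to-one decision concrete, the following is a minimal MATLAB sketch of the comparison just described. It is illustrative only, not the method of any cited system: the feature vectors, the similarity score, and the threshold value are all hypothetical.

    % Hypothetical enrollment template and new-sample features (not real data).
    storedTemplate = [120.3 0.42 0.07];   % feature vector saved at enrollment
    sampleFeatures = [118.9 0.45 0.06];   % features extracted from the new sample
    threshold      = 0.40;                % set by the security sensitivity required

    % A simple similarity score in (0, 1]; deployed systems use far richer
    % statistical models to compute the confidence value.
    confidence = 1 / (1 + norm(storedTemplate - sampleFeatures));
    if confidence >= threshold
        disp('Claim accepted: speaker verified.');
    else
        disp('Claim rejected.');
    end

A real deployment would calibrate the threshold against the false-accept and false-reject behavior discussed later in Section 2.3.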
Speaker identification is a similar process to speaker verification in that it also collects a voice sample. However, the application performs a one-to-many matching process against an already existing database that holds voice templates of known individuals. The matching process consists of the application comparing the voice sample that is given by the user with each voice template that is in the database. Consequently, this time it is searching for the closest match, and then it determines whether the calculated confidence value falls within a given threshold. In cases like these, the user's identity is determined by the search and match. In speaker verification, by contrast, the user identifies him/herself and the system verifies the claim to be that individual using the voice template associated with that user. Speaker classification is the area of voice biometrics which is used to determine a specific group that the user may or may not belong to. It does not require a preset database as the previous two types of voice biometrics do, because it is looking neither to verify nor to identify a certain individual. Speaker classification is the type of voice biometrics that this research is investigating. This research proposes that an algorithm can be developed that will determine a value for the user which can be used to classify the user into a specific group (or groups). This classifier will be created using the individual's speaking range as it pertains to pitch and the speed of speech of the individual. This algorithm will be discussed further in chapter three.

1.4 Organization

In the chapters that follow, a research agenda will be examined. Chapter 2 gives a literature review that discusses the areas of biometrics, the three types of voice biometrics, and some mathematical concepts that pertain to representing voices graphically and digitally. The specific mathematical areas covered in the literature review are the Fast Fourier Transform and the windowing concepts that will be used to represent the pitch of a person. An overview of the application MATLAB by MathWorks will be given, as this application and its extensive signal processing libraries will be used in this research. The subject area of voice pitch will also be discussed, as well as what determines the pitch of a person's voice. A detailed plan of research will be outlined in Chapter 3. Chapters 4-7 will present the procedure by which the data was analyzed and preliminary results. Chapter 8 concludes with the findings, significant contributions to the field of voice-enabled technology, and future work. Finally, Chapter 9 will list preliminary work and publications.

Chapter 2
Literature Review

This chapter describes some of the work done in the field of biometrics and the mathematics involved in this technology. It focuses on biometrics, voice biometric systems, voice pitch, the MATLAB programming language and libraries, Discrete and Fast Fourier Transforms, and windowing functions used with data obtained from Fast Fourier Transforms.

2.1 Biometrics

The word "biometric" can be broken up into two words: bio, meaning "life", and metric, meaning "measurement" [39]. A very basic definition would be "life measurement", which needs to be expanded to give clarity for today's uses.
The term biometrics has now become a present-day word that many have used or at least heard of, but do not fully comprehend the meaning of. The definition of biometrics can vary depending on the specific context in which it is being used. The following is a definition of biometrics as it relates to this research: "Biometrics is the automated use of physiological or behavioral characteristics to determine or verify identity" [44]. As previously mentioned, the area of biometrics can be broken into several sub-areas. A broader list of sub-areas is: voice, iris, fingerprints, hand, face, retina, DNA, signature, computer keystroke, gait, odor, earlobes, sweat pores, lips, etc. This research will not discuss whether each of these sub-areas is unique in itself, but will operate under the assumption that there are attributes that are unique to each person [44]. An additional underlying assumption of this research is that the voice is a unique trait of an individual and thus can be used in identifying the person. Therefore, this research will focus on the sub-area of biometrics referred to as voice biometrics. To understand the topic of biometrics, it is beneficial to know that biometrics has been around for hundreds of years. The use of biometrics can be seen throughout history in many different forms. The explorer Joao de Barros found that, in 14th-century China, merchants recorded the palm prints and footprints of children on paper with ink for identification purposes [6]. In Eastern Asia, potters placed their fingerprints on objects for identification of the maker [66]. The use of fingerprints for identification has continued to this day and is extensively used by law enforcement to identify criminals. Alphonse Bertillon, an anthropologist who lived in Paris in the late 1890s, is credited, through his efforts to make the identification of criminals easier, with bringing biometrics to the point where it was considered an actual field of study [6]. This brief look at some of the historical usage of biometrics shows that biometrics has evolved over the years and now, with advancing technology, will be evolving even further [44]. Voice biometrics is one such growing area of biometrics and will be discussed in Section 2.2.

2.2 Voice biometrics

Voice biometrics has mainly been associated with two areas: speaker verification and speaker identification. The third area of voice biometrics, speaker classification, has not received as much attention as the other two [8, 34]. The first two areas, speaker verification and speaker identification, may at first appear decidedly similar, but each has a distinct purpose [30]. The third area, speaker classification, is considerably different from the first two. The next few sections discuss the similarities and the differences in these three areas of voice biometrics to distinguish speaker classification from speaker verification and speaker identification.

2.2.1 Speaker verification

Speaker verification (SV) aims to validate a person's identity, much like having your driver's license picture checked at the airport when you check in for your flight. SV is becoming widely visible in today's economy due to the added security issues faced by industry [32]. SV is used to verify that the person speaking is truly the person they are claiming to be [61]. SV is a one-to-one comparison and, in general, there are five steps to SV [8].
First is the enrollment of the user to generate a voice template that will be stored in a database. In a second step, the user speaks for a set amount of time so that a voice sample can be obtained for enrollment or for the verification process. This collection phase can be either text dependent or text independent, which will be explained later in Section 2.3. Once the voice sample is obtained, the third step of extracting certain features from this voice sample takes place and a template is made. This template will either be used immediately for comparison or will be stored in a database to be used later. The fourth step is a pattern matching phase, where a confidence value is calculated. This value is used in the fifth step to accept or reject the claims of the authenticity of a person [8, 36]. To make this decision, a confidence threshold is set according to the sensitivity needed for the system, which is based on the security required [36]. It is common to use this process in conjunction with some form of ID number or password known by the user. Upon entering their unique ID in the system, the user is required to speak so that the application can verify they are whom they claim to be.

2.2.2 Speaker Identification

Unlike speaker verification, where a voice sample is compared to a single stored template in the database, speaker identification (SI) uses an individual's voice to identify them [32]. SI is a one-to-many process, where a sample is obtained from the user and then a comparison is made with all the voice templates in the database to determine whether a match can be found [27, 36, 32, 30]. The aforementioned general steps for SV also pertain to SI, with a few differences. In the decision making step, SI does not give a decision to accept or reject as in an SV system. The SI decision process determines whether there is a stored template that matches the collected voice sample within a certain predetermined threshold [8]. As stated earlier, SI is a one-to-many matching process, making this process a much more difficult task. The application will need to go through the pattern matching step for each voice template that is stored in the database. For each voice template, it must be determined whether the confidence value for that particular item is within the preset threshold. Finally, an output is given as it pertains to the identity of the person speaking. The output of this process can be [36]:

- No matches: no stored template had a confidence value above the given threshold.
- A single match: one template was the only one to have a confidence value above the threshold for the given sample.
- Multiple matches: since all stored templates are checked, more than one can be a close match where all are within the given threshold; it is not uncommon to get more than one close match.

2.2.3 Speaker Classification

Speaker classification (SC) is different from speaker verification and speaker identification in that it is used to determine whether a speaker can be associated with a particular characteristic group, rather than with a particular individual [35]. SC is done by extracting information from a voice sample that is obtained from a given speaker. It is an idea of this research and others that different characteristics can be determined about the speaker once their voice sample is processed. Characteristics such as gender, age, emotion (i.e., fear or anger) and ethnic origin are a few of the characteristics that may be determined [30, 34, 45].
SC has been around for decades, but attention has mainly been on the other two areas; SC, for the most part, has only been considered for how it can help with developing those two areas. In the development of either SV or SI systems, there is usually a form of classification performed. To facilitate this, Gaussian Mixture Models (GMMs) have been and are still being heavily used for the speaker classification aspect of these systems. However, other models and methods are now being researched, such as the Hidden Markov Model (HMM), Support Vector Machines (SVMs), and the use of voice pitch [41, 68, 51]. Using the pitch of a human voice as a speaker classifier has not yet been researched to a great extent and thus is of interest to this research. Speaker classification research is now being conducted to help in many areas. One of these is the monitoring of phone conversations [42]. Dr. Judith Markowitz gives an example of this in the book Speaker Classification I: the Loquendo Voice Investigation System. This system is used to monitor cell phone calls, where speakers of special interest are determined and this information is then passed on to law-enforcement or intelligence agency clients [34]. The following sections discuss text dependent and text independent methods. An understanding of the difference between these two is needed to add clarity to voice sample collection from the user of a speaker recognition system, no matter whether it is for verification, identification or classification.

2.3 Voice Sampling Methods

In the previous sections, the terms text dependent and text independent were mentioned as they relate to the collection step; they will now be explained. Along with a basic explanation, some of the advantages and disadvantages of each will also be discussed. This overview is important to this research as it dictates which method is most appropriate. This is an important distinction that needs to be determined prior to any study that will be done.

2.3.1 Text Dependent

A text dependent system is one that is trained by the user speaking a predetermined phrase or word that has already been established in the system. The phrase that is selected can be determined by the administrator of the application or by the user, and usually is something that the user will be able to remember easily, along with something that will give a broad phonic range. This is an advantage of a text dependent system: because the user speaks a set phrase, the voice sample obtained will be a better representation of the person's voice [55, 8]. It is also customary with a text dependent approach for the same phrase to be used to establish a voice template and for the user to repeat it when using the application [17]. This improves the matching step of the application by reducing the chances of having a false reject or false acceptance of a given user [8]. An additional advantage of the text dependent method is that the user selects the word or phrase to be used, which is unique to them [36]. The disadvantages of the text dependent method are mostly connected with the area of security. Because the user will use a predetermined word or phrase, there is always the possibility that the word or phrase has been compromised (i.e., obtained by another individual). When the word or phrase is known by another individual, a voice sample can be manufactured to circumvent the system, or an actual recording can be obtained of the user saying the phrase and used to circumvent the system.
Both scenarios are possible because of the dependency on a set word or phrase. Along with the risk of someone obtaining the word or phrase, there is the problem that the user of the system must remember the word or phrase that was used to set up their voice sample. When the user does not use the correct word or phrase and the system attempts to match the voice template of the user, a false reject is given. This puts a burden on the user to always remember the exact word or phrase [67]. The text independent method can be the solution to this problem and will be discussed next.

2.3.2 Text Independent

A text independent system, in contrast to a text dependent system, obtains the voice template from the user reading or speaking anything they prefer. This method allows the user to speak freely and does not tie the user to a predetermined phrase. In the past, this type of system has been used primarily when the person whose voice sample is being obtained is not fully cooperative or is not willing to participate in the process at all. Speaker recognition technology trends for text independent applications are advancing; for example, being able to identify a person without them having to speak any predetermined word or phrase [64]. This is one of the main advantages of this method: the user does not have to remember a phrase. Another advantage is that speaker verification can be utilized in a manner that runs in the background of the application. For example, when a user calls a bank to transfer money, the application can verify the speaker simultaneously as the user makes their request [36]. Unlike the text dependent system, which requires the same word or phrase to be spoken for the collection step, the user of a text independent system can record a voice sample for a template with one phrase and use another phrase when using the system. This freedom can also lead to an immense disadvantage with this type of system. With the user freely speaking, it is not guaranteed that the user will speak or read something that spans a broad enough range of their voice to give the speech features that identify them. Due to this fact, in some cases a longer sample may be necessary, which may be difficult to obtain from a user that is not fully cooperative [26].

2.4 Voice Pitch

The sound of a human voice is comprised of several components. One of these is the pitch of the voice. In his book, Dr. H. Newell Martin describes the process that a body goes through in order to produce the pitch of a person's voice. He describes the larynx as the primary body part that determines the sound of a person's voice. The larynx holds the vocal cords, and it is the vibration of these cords that produces the pitch. He further states that it is the size or the length of these vocal cords that gives a certain pitch to the voice. The longer they are, the lower the person's voice. Consequently, the shorter the vocal cords are, the higher the pitch of a person's voice [37]. This can be substantiated by listening to the voice of a woman or child in comparison to the voice of an adult male. The woman or child speaks in a higher pitch due to the fact that they are usually smaller in stature [18]. Another fact Dr. Martin concludes is that vocal cords of a certain length will always give a set range to the voice. The range is dependent on a set of muscles in the larynx which determine the tension of the vocal cords. This leads to the fact that a person can only speak as high or as low as their vocal cords permit. This description of how pitch is formed is still used today. The pitch range of a man's voice has been determined to be approximately 80-200 Hz, with an average pitch of 120 Hz, whereas the pitch range for a woman's voice is 150-350 Hz, with an average pitch of 225 Hz [58]. Upon analyzing this data, one can see that these ranges overlap: a particular man's pitch can be closer to the female average than to the male average, and vice versa. One explanation is the individual's age, which affects vocal cord length, in that a young boy does not have vocal cords the length of a man's [62]. A person's pitch range is generally determined by the length of their vocal cords. The average length of a male's vocal cords is about 18 mm and that of a female's is about 16 mm [50]. Along with the length, the thickness of the vocal cords also determines the pitch of one's voice [37]. Based on the research and literature, it is quite clear that the physical attributes of a person's vocal cords give him/her a set pitch range that cannot be easily altered. Exceptions occur when surgery, accident, sickness, smoking or extreme training (as with an opera singer) has altered the makeup of the vocal cords [37]. All but the last are most likely permanent changes, and each still gives a person a set pitch range.
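Hypothesis H1 can be pictured with a small MATLAB sketch that maps an estimated average pitch to a coarse tone class. The band edges here are illustrative only, loosely motivated by the typical ranges just cited; they are not the classification produced by this study, which lets the data dictate the groups.

    % Hypothetical average pitch for one speaker, in Hz.
    avgPitch = 172;

    % Illustrative tone bands; real boundaries would come from the data.
    if avgPitch < 140
        toneClass = 'low';
    elseif avgPitch < 220
        toneClass = 'medium';
    else
        toneClass = 'high';
    end
    fprintf('Average pitch %d Hz -> %s tone class\n', avgPitch, toneClass);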
2.5 Discrete and Fast Fourier Transform

The Discrete Fourier Transform (DFT) is the discrete counterpart of the Fourier Transform (FT). In general, the FT takes a function and converts it into another function that may be more useful. The FT processes a continuous-time signal using calculus, making it highly complex. Added to this is the fact that in signal processing the data is processed only in samples, which will not be continuous. Therefore, it can be said that a DFT is used to compute a discrete-frequency spectrum from a discrete-time signal of finite length [56]. This research will be using signal processing to analyze voice samples. Most voice samples are in the time domain, and the DFT will transform them from the time domain to the frequency domain [28]. Considering that the data of a voice sample is both discrete and finite, it is not difficult to see how this approach to analyzing the sample can give some very useful results, as shown in Figure 2.1, where sub-figure (a) represents a voice in the time domain and sub-figure (b) illustrates a sample of the voice in the frequency domain using the DFT. The DFT of a signal x may be defined by the following formula; Table 2.1 contains a description of the symbols used in the formula.

$$X(\omega_k) = \sum_{n=0}^{N-1} x(t_n)\, e^{-j\omega_k t_n}, \qquad k = 0, 1, 2, \ldots, N-1$$

Calculating the DFT can be computationally expensive, even when using a computer. This calls for a faster algorithm, and many have been developed that address this need. The most widely used algorithm was developed by James W. Cooley and John W. Tukey in 1965 [14] and is known as the Fast Fourier Transform (FFT) [56]. As the name implies, it is a faster version of the DFT and is widely used today in computer applications [15]. The main advantage of the FFT is that it reduces the computational complexity for N points from $N^2$ to $N \log_2 N$ [14, 65]. To illustrate the difference, consider N = 256 points for a given voice sample. With a DFT, in the worst case, 65,536 computations are required to make the transform; with an FFT, only 2,048 computations are needed. In this example, the DFT therefore requires 32 times as many computations as the FFT. Given that a voice sample can have hundreds of thousands of data points, it is clear that the FFT is the best option.
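As a sanity check on the definition above, here is a minimal MATLAB sketch (illustrative only) that evaluates the DFT sum directly on a tiny test signal and confirms that it matches MATLAB's built-in fft, which computes the same spectrum in N log2 N rather than N^2 operations.

    % Tiny test signal: two cycles of a sine sampled at N points.
    N = 8; n = (0:N-1)';
    x = sin(2*pi*2*n/N);

    % Direct evaluation of the DFT sum X(w_k) = sum x(t_n) e^{-j w_k t_n},
    % where w_k * t_n = 2*pi*k*n/N for sample index n and bin k.
    Xdirect = zeros(N, 1);
    for k = 0:N-1
        Xdirect(k+1) = sum(x .* exp(-1j * 2*pi * k * n / N));
    end

    % The FFT gives the same result to numerical precision.
    maxDiff = max(abs(Xdirect - fft(x)));
    fprintf('Max difference between direct DFT and fft: %g\n', maxDiff);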
It is common that, when using the Fast Fourier Transform, a windowing function is also used; windowing is explained next.

Table 2.1: Formula Description Of Symbols

$\sum_{n=0}^{N-1} f(n)$: $f(0) + f(1) + \cdots + f(N-1)$
$x(t_n)$: input signal amplitude (real or complex) at time $t_n$ (sec)
$t_n = nT$: $n$th sample instant (sec), $n$ an integer $\ge 0$
$T$: sampling interval (sec), also called the sampling period
$X(\omega_k)$: spectrum of $x$ (complex valued), at frequency $\omega_k$
$\omega_k = k\Omega$: $k$th frequency sample (radians per second)
$\Omega = 2\pi/(NT)$: radian-frequency sample interval (rad/sec)
$f_s = 1/T$: sampling rate (samples/sec, or Hertz (Hz))
$N$: number of time samples = number of frequency samples (an integer)

Figure 2.1: Example of Voice Sample. (a) Approximately 4 seconds of a voice sample. (b) After using the DFT to transform from the time domain to the frequency domain.

2.6 Window Functions and Spectral Leakage

A windowing function acts as a filter to smooth out the sinusoid (sine curve) that represents the voice sample taken [23]. Since the voice sample is finite, it is most likely that the sinusoid representation will be a truncated waveform [46]. In performing spectrum analysis using an FFT, a condition identified as spectral leakage can occur. The FFT represents the sample as one whole period of a periodic waveform; when a finite sample has been obtained, there is no assurance that one full period of the waveform has been captured, which makes discontinuity possible. An example of discontinuity, and the spectral leakage connected with it, can be seen in Figure 2.2.

Figure 2.2: Upper graph is a standard periodic signal, whereas the lower graph is not periodic and has discontinuity.

One solution for avoiding discontinuity in the sample waveform is to apply a windowing function that will minimize the discontinuity of the created periodic waveform. The windowing function is a weighting function applied to the data of the waveform to smooth out the connections at the end points, minimizing the discontinuity at these end points. This is done by using as many orders of derivatives as possible of the weighted data at the end points, which lessens the effect of spectral leakage on the waveform [25]. There are many windowing functions that can be used. Some of the most common window functions are the rectangular, triangular, Hanning, Kaiser, and Hamming [22]. The choice of a windowing function can be determined by resolving the tradeoff involved in separating comparable-strength signals with similar frequencies [23]. For this research, the Hanning window function shown below was used, as it is computationally less expensive compared with the other functions shown in Table 2.2:

$$w(n) = 0.5\left(1 - \cos\frac{2\pi n}{N-1}\right)$$

The programming language used in this research is MATLAB¹. One of the reasons for choosing this language is that it has very efficient algorithms for windowing functions as well as for the FFT function, based on the one developed by Cooley and Tukey [38, 59].

Table 2.2: Windowing Functions

Triangular window: $w(n) = \frac{2}{N}\left(\frac{N}{2} - \left|n - \frac{N-1}{2}\right|\right)$ (non-zero valued end-points)
Bartlett window: $w(n) = \frac{2}{N-1}\left(\frac{N-1}{2} - \left|n - \frac{N-1}{2}\right|\right)$ (zero valued end-points)
Bartlett-Hann window: $w(n) = a_0 - a_1\left|\frac{n}{N-1} - \frac{1}{2}\right| - a_2\cos\frac{2\pi n}{N-1}$, with $a_0 = 0.62$, $a_1 = 0.48$, $a_2 = 0.38$ (zero valued end-points)
Blackman window: $w(n) = a_0 - a_1\cos\frac{2\pi n}{N-1} + a_2\cos\frac{4\pi n}{N-1}$, with $a_0 = 0.42$, $a_1 = 0.5$, $a_2 = 0.08$ (zero valued end-points)

¹ MATLAB is a registered trademark of The MathWorks, Inc.
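To tie Sections 2.5 and 2.6 together, the following is a minimal MATLAB sketch, with a hypothetical file name, of the transform used in this research: read a voice sample, apply the Hanning window from the formula above to reduce spectral leakage, and use the FFT to move from the time domain to the frequency domain.

    % Read a voice sample (hypothetical file name); x is the signal, fs the rate.
    [x, fs] = wavread('participant1234.wav');
    N = length(x);

    % Hanning window computed from the formula above, applied point-wise.
    n = (0:N-1)';
    w = 0.5 * (1 - cos(2*pi*n/(N-1)));
    X = fft(x .* w);

    % Magnitude spectrum up to the Nyquist frequency fs/2.
    half = floor(N/2);
    f = (0:half-1)' * fs / N;            % frequency axis in Hz
    plot(f, abs(X(1:half)));
    xlabel('Frequency (Hz)'); ylabel('Magnitude');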
2.7 MATLAB

According to the developers of MATLAB, "MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation", making it a very powerful tool [59]. It is this dual aspect of MATLAB (programming language and development environment) that makes it a good choice for this research. Another advantage of using MATLAB is that it was originally developed with signal processing in mind [59]. The name MATLAB stands for MATrix LABoratory, as the MATLAB language stores all data values in matrix form [19, 49]. This means that whether the program stores one value or one thousand values, each value goes into a cell of a matrix. Another feature of this language is that variables do not have to be declared ahead of time, nor their data types specified. Also, one does not have to allocate memory, as MATLAB has built-in dynamic memory allocation [59]. For this research, a major benefit of this storage method is that, in processing a voice sample, the size of the sample will not be known ahead of time. In MATLAB, the size of the matrix provides a representation of the number of elements, as each cell in the matrix will have some value in it. The large built-in function library (over 8,000 commands) in MATLAB provides three commands, "size", "length", and "numel" (see Table 2.3), which can be used to easily determine the number of elements in a given matrix that may have been brought in from an outside data file [59].

Efficient access to data from an outside source is another strong point of this programming language. There are multiple commands available to access data from different file types, databases, or even other applications that are written in another language such as C or C++ [59]. Some of the commands that were beneficial to this research are "wavread", "xlsread", "textread" and "find"; see Table 2.3 for a description. These commands were used to read in demographic information pertinent to the voice samples that was stored in other file types or a database. Likewise, writing data to an output file was made more efficient with commands such as "fprintf", when a text file was required, or "xlswrite", if data was better suited to a spreadsheet. Having the tools to collect, process and store data efficiently was a necessity for this research. Also, being able to visualize the data was especially important when working with the voice samples. The graphics ability of MATLAB is another area that made this language a good choice for this research. Within MATLAB, there are functions that will plot 2D and 3D graphs to give a clear visual of the data that this study worked with. Along with these plotting functions are many labeling and formatting functions that added to the output of this research. All these built-in functions in MATLAB enabled the analysis process to be more dynamic, making it a powerful development tool for this research [59].

Table 2.3: Description of some MATLAB commands used

find: searches a given matrix for a specified condition and returns a matrix with the index values of the found values
numel: returns the total number of elements in a given matrix
length: returns the larger value between the number of rows and the number of columns for a given matrix
size: returns the number of rows and the number of columns for a given matrix
wavread: reads Microsoft sound files
xlsread: reads Microsoft Excel spreadsheet files
xlswrite: writes data to a Microsoft Excel spreadsheet file
fprintf: writes data to the command window or to a text file
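The sketch below, with hypothetical file names and an illustrative amplitude threshold, shows the data access commands from Table 2.3 working together: read a WAV sample and a demographic spreadsheet, count elements, locate values with find, and write results back out.

    % Read a voice sample and a demographic spreadsheet (hypothetical names).
    [x, fs] = wavread('participant1234.wav');
    nSamples = numel(x);                        % total number of data points
    [nums, txt] = xlsread('demographics.xls');  % numeric and text columns

    % find returns the indices of all values meeting a condition.
    loud = find(abs(x) > 0.5);
    fprintf('%d samples total, %d above the amplitude threshold\n', ...
            nSamples, numel(loud));

    % Store calculated results in a spreadsheet for later sorting.
    xlswrite('results.xls', [nSamples, numel(loud)]);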
Chapter 3
Research Plan

3.1 Subjects

A goal of this research project was to acquire approximately 100 to 200 subjects to participate in this research, in order to achieve a diverse population set. This goal was possible to reach due to the large population of undergraduate and graduate students at this institution. Collecting all samples from this population was not sufficient, however, because that sample population was not very diverse in age or in the area of the United States that most affected the way participants talk. It was found that, of the participants that came from the local area and institution, 89% were between the ages of 19 and 21 and were from either Alabama or Georgia. To remove this limitation and gain a more varied group of participants, a request was sent out to friends and family from different communities around the United States. As will be discussed in Section 3.4, the "snowball" method was used to gain as many participants as possible without having to have direct contact with each potential participant. This method worked well, in that over 170 voice samples were collected, along with the demographic information associated with each sample. With the request for involvement going out to presumably any part of the United States or the world, participants from diverse groups were collected. All voice samples and data were collected from a remote location of the participant's choosing (i.e., wherever they had access to the Internet and a telephone). There were 10 participants that did not complete the study because they did not leave a voice sample as required. Since all data collection was obtained in a private manner, it is not known whether these participants chose not to finish the study or did not understand the directions given. Participants had the option to contact the study organizers via phone or email if they had any questions or encountered any difficulties with the data collection process. Yet only two such contacts were received from participants: one indicating that the web pages for the study did not load, which was due to the server being offline after an unrelated application had caused the server to crash; the other describing an error message that was given when the participant attempted to navigate from the demographic page to the phone instruction page. In this case, the error message was due to an inadvertent double entry in the table that stored the IDs given to the users; this was promptly rectified. Excluding these two cases, data collection transpired efficiently for all participants. The demographic data gathered from participants that did not leave a voice sample was deleted from the database, since without a voice sample the demographic data was not usable.

3.2 Equipment and Material Used

There was very little equipment required to participate in the research study.
A participant accessed the application via an Internet connection and a telephone. The participants needed no specific computer knowledge or experience to participate in this research. They did, however, need basic knowledge of the Internet to navigate to the initial web page of the study and to answer the demographic questions. All individuals that were contacted about participating in the research were given the option of either coming to the Shelby Center for Engineering Technology at Auburn University to complete the data entry task or completing the study at a remote location of their preference. Individuals that participated in the study remotely were required to obtain access to the aforementioned equipment, as no equipment was provided for any remote involvement in the study. The web pages were designed such that they loaded on most universally used web browsers. The content of the pages was kept to a minimum to allow a participant with a dial-up connection to still participate with the least amount of waiting for the web pages to load. Had an individual chosen to come to the Shelby Center for Engineering Technology, all equipment would have been provided for their use.

3.3 Software Used

The algorithm development process utilized several different technologies. The primary development environment was the MATLAB programming environment from The MathWorks [60]. MATLAB has literally hundreds of built-in functions that vary from basic functions to specialty functions that are grouped together into what The MathWorks calls toolboxes. For this research, several of these functions were utilized in the data processing and analyzing phases. The database used was MySQL by Sun Microsystems [43], server version 5.0.51a (SUSE MySQL RPM), and the operating system for this server was Linux. The basic web pages created were written with HTML and JavaScript. Using JavaScript guaranteed that all fields on the demographic page were filled in, as the page did not allow advancement to the next webpage until all fields had been completed. Pages that needed to connect to the database were written using PHP along with MySQL commands. For the phone application programs, the VoiceXML programming language was used, along with PHP and MySQL for situations that needed database access. Clustering analysis was executed by the software Applications Quest(TM) [20], with the output being copied to a Microsoft Excel spreadsheet. Microsoft Excel was also used for some of the storage of calculated results, along with preliminary sorting of data for examination.

3.4 Data Collection Methods

To conduct this study, voice samples were needed along with the demographic information for each participant. It was determined that, to add diversity to the population set, participants needed to be from locations around the United States. Additionally, it was preferred that dissimilar ethnic groups be enlisted and that the percentages of male and female participants be balanced. The method chosen for accumulating data was the "snowball" data collection method [57]. This method was conducted in a manner where requests were sent to acquaintances and, once they participated, they then solicited their friends and family to participate. Originally, twenty-five requests were sent out and,
During this process each participant was asked to respond to the following demographic request/questions: Please select your Gender Please enter your Age Please select the following that best describes your ethnicity Please select what your primary language is Please select the country you consider your primary nationality Please select the country for your parents primary nationality Please select the state of the United States that you would say has a ected the accent of your voice the most Please select the one that best represents the highest level of education completed Please select the one that best describes the area that you live in Please select the one that best describes how you feel today Have you had a physical injury or a disease that would a ect your voice? Would you consider yourself to have a speech impediment? Please select the category for your height Upon completion of the data collection it was found that approximately 55% of the participants were female and 45% male. Nine ethnic groups were represented with Caucasian being the largest group at 67%. English made up the largest representative language at approximately 96%, but ve other languages were also declared. Ten countries were given as primary nationality with United States being the highest percentage at 92%. In regard to education the top three categories were as follows: Bachelors Degree at 32%, Masters 21 Degree at 26%, and some college at 21%. Full details of the breakdown of the collected demographics can be viewed in Appendix A. The collection of all demographic information was completed via a website that is hosted on a server under the supervision of Dr. Juan E. Gilbert. The server is located in a locked room that is only accessible by authorized personnel so that all data collected is secure. The o cial URL address for this site was \http://www.voicestudy.com/". A screen shot of all pages can be seen in Appendix B. The rst page of this site gave the person an opportunity to view the information letter about this study (which can be viewed by looking at Figure B.1) and to either agree to continue or not. If the participant chose to continue they were then taken to the demographic page where they responded to thirteen requests for the above information (see Figure B.2). No data was collected to identify any participant. The participant was not able to navigate from this page until they responded to all thirteen requests. Once they completed this and clicked to continue, their information was stored in the database, which was located on the same server previously mentioned, by using a PHP program to interface with the database. The next page that the participant saw was an instruction page that informed them on how to complete the calling procedure for the phone application, that was used to collect a voice sample from them (see Figure B.3). The phone application was accessed using a free developer service under the umbrella of Nuance Communications, Inc by the name of \NUANCE caf e" formally known as \BeVocal caf e" [47]. Since this development platform was free for the participant they had to call a toll free number (1-877-338-6225) and they were prompted to enter a user ID and PIN number. The user ID was 8446348 and the PIN was 1234; both of these were provided to the participant on the phone instruction page. An example can be seen in Figure B.4. Once log-in was accomplished the user continued directly to the phone application which proceeded in the following manner. 1. A welcome message was played. 
2. The application then requested the participant to enter the four (4) digit number given to them on the phone instruction web page; see Figure B.5.

3. The application then verified that a valid number was entered by querying the database and making sure that number was a primary key for a row in the database.

4. Upon validation of the ID, the application gave instructions to the participant on what would take place next. After that they heard a phone ringing and a message played as if they had reached a friend's voice mail.

5. When prompted, the participant would then leave the exact message given to them on the instruction page; to see the message, view Figure B.6.

6. Next they had an opportunity to hear the message they recorded and either accept it or try again.

7. Once they accepted their message they were thanked for their participation, and after that the application disconnected.

Nuance Café saves all voice recordings as WAV files, which are Microsoft's audio file type. Nuance's default file type (audio/wav: WAV (RIFF header) 8 kHz 8-bit mono mu-law [PCM] single channel) worked well and was in a form from which MATLAB can open and extract the data directly. A file name was created by concatenating the word "participant", the four (4) digit number that was given to the participant, and the file extension WAV. This filename was also stored in the database under a field named "fileName". The actual sample file was stored on the secure server where all files associated with this study are stored. A specific folder was set up for these WAV files, which helped to keep them separate from the program files.

3.5 Experimental Overview

This section gives an overview of the approach that was used to validate the hypotheses presented in Section 1.3. The main objective of this research was to develop an algorithm that analyzes a voice sample from an individual and obtains numeric data that represents that person's speech. The sample was analyzed in both the time and frequency domains. Then an evaluation was made using Applications Quest™ (AQ) [2, 20] to determine the clusters that were formed using the numeric data. SQL queries were made of a database that had been created to store the demographic and result data. In addition, result data was written to Microsoft Excel spreadsheets for sorting and examination. To utilize these applications, the following were needed to gather and analyze the data from the participants:

A uniform method for the collection of the demographic information and voice samples from individuals.
A database containing all demographic information and calculated values.
An algorithm that calculates data from the sample as it pertains to time.
An algorithm that uses an FFT and a windowing function to convert a voice sample from the time domain to the frequency domain.
An algorithm that calculates different parameter values to be used to observe clusters that may occur.

With guidance from these principles, the architecture for the proposed voice system consisted of three phases: data collection, voice sample processing, and database setup. In the data collection phase, the user interacted with a web interface that collected demographic information. That information was used to determine classification groups which might be formed after the voice sample was run through the voice processing algorithms. To prevent bias and to protect anonymity, an arbitrary number was randomly assigned to each submission.
Upon completion of the demographic survey, a voice sample was collected via a phone application where the user called in and left a voice sample to be analyzed. The voice sample was saved as a WAV file with the given identification number as part of the file name. All the participants' data was stored in a table of the database, which corresponded to an Excel spreadsheet that held a copy of this data. This made it more efficient to upload the records into the clustering algorithm for modeling of the results.

Chapter 4
Time Domain Experimentation and Results

This chapter details the experimentation that was conducted on the voice samples before they were converted to the frequency domain. This was the first of four experimental phases for this research, intended to investigate parameters to be utilized in the classification of an individual. It is not uncommon to hear individuals talking at various rates and/or having differing amounts of pause between their words. Given that the voice samples were in the time domain, a numerical value was calculated for these two occurrences.

4.1 Experimental Design

4.1.1 Experiment Goals

The goals of this experimental phase were the following:

Create an algorithm to eliminate beginning and ending white noise from the sample.
Calculate the length of the sample in seconds.
Create an algorithm to determine where pause areas are in the sample.
Calculate the total amount of pause in seconds of a sample.

4.1.2 Procedure

The original sample received from each participant was stored at the time of their participation in a WAV file and saved on the same server where the voice application was hosted. All samples were made at a sample rate of 8000 Hz, and each participant said exactly the same thing: "George, I want you to help me fix my tire. Call me at 924-2949.".

The free digital audio editor, Audacity [3], was used initially to view a graph of the voice samples; see Figure 4.1.

Figure 4.1: Original voice sample opened in digital audio editor, Audacity

Audacity gave easy access for playing any part of the sample and also a quick view of elapsed time. Because of the process by which the BeVocal Café [47] application records the participant's response, each file has a leading and ending segment that is either silence or nominal white noise; see Figure 4.2a. This proved to be beneficial when developing the algorithm for time domain analysis. A maximum and minimum value for white noise was calculated by using these two sections. With the ability to set these boundaries specific to each sample, an algorithm was developed to crop the beginning and ending noise from each sample; see Figure 4.2b.

Figure 4.2: Example Graph of Voice Sample In Time Domain
(a) Original voice sample in the time domain
(b) Cropped voice sample in the time domain

When the data from a WAV file is read into MATLAB, it is put into a vector, which makes it very efficient to obtain the starting and ending points of the voice sample. Starting at the beginning of the vector, the index number is recorded for the first value that goes above or below the threshold that has been calculated from the white noise. Next, starting with the last cell in the vector, the algorithm decreases the index value by one until a value that goes above or below the threshold is found, and that index number is recorded. Taking the first index found and the last index found, a cropped sample can be obtained by using the "wavwrite" function, which takes the vector values and creates a new WAV file.
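As a rough illustration, the following MATLAB sketch combines this cropping step with the pause measure described next. The 400-sample noise-region length, the file name, and all variable names are illustrative assumptions rather than values taken from the study.

% Minimal sketch, assuming the leading 400 samples are white noise.
[sample, fs] = wavread('participant1234.wav');       % fs is 8000 Hz for these files
noiseMax = max(sample(1:400));                       % upper white-noise threshold
noiseMin = min(sample(1:400));                       % lower white-noise threshold
voiced = find(sample > noiseMax | sample < noiseMin);
cropped = sample(voiced(1):voiced(end));             % drop leading/ending noise
wavwrite(cropped, fs, 'participant1234_cropped.wav');
talkSeconds  = numel(cropped) / fs;                  % total message time
pauseSamples = sum(cropped <= noiseMax & cropped >= noiseMin);
pauseSeconds = pauseSamples / fs;                    % total pause time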
This new WAV file is the sample left by the participant without the leading and ending white noise. Once the sample had been cropped, the first parameter, total message time, was calculated. The pause algorithm also uses the thresholds that were calculated during the cropping process, applying them in an alternative way. The algorithm starts at the beginning of the vector and searches for the first value that falls within the given thresholds. The index for this value is then recorded in another vector, and the program begins looking for the next value that is above the threshold. This process continues until it has worked its way through the entire vector. The total number of data values found is divided by 8000 (the number of samples per second), giving the total time of pause, or no talk, in the sample; see Figure 4.3.

Figure 4.3: Voice sample showing where the calculated pause of the sample is located.

The decision to consider this calculation came about when two cropped samples were observed that had precisely the same talk time. However, when the files were viewed in Audacity, it was revealed that one file had considerably more pause space than the other. This can be attributed to the fact that some people may talk at the same speed, with one always making sound (i.e., saying something like "uhuhuh" between words) and the other not making any sound but still having the same amount of time between words; see the graphs in Figures 4.4 and 4.5. All cropped voice samples were run through an algorithm that calculated the three time values: total elapsed time of the original voice sample (no cropping), total elapsed time of the cropped voice sample, and total elapsed time pertaining to pause in the cropped sample. After these calculations were made, the values were written to a text file in the form of MySQL update statements so they could be added to the database. In addition, the values were also stored in an Excel spreadsheet that contained all demographic data, along with all calculations that were made for each voice sample. This file was then used to load all pertinent data into Applications Quest™ for clustering evaluation.

Figure 4.4: A sample with speaking time of approximately 7 seconds and pause time of 0.44 seconds

Figure 4.5: A sample with speaking time of approximately 7 seconds and pause time of 1.78 seconds

4.2 Results

The initial analysis was conducted to see if the time information made any classification as a standalone parameter. The results of this preliminary analysis proved to be very informative when analyzed. The data was sorted, using Excel, by pause time and total talk time of the cropped files, and the averages for male and female were calculated; see Table 4.1. It was observed that the average time to say the phrase was the same for both male and female.

Table 4.1: Comparison of total time to say the message (cropped sample) and the amount of pause in the sample, with the percentage of pause in the sample as it pertains to Gender

Gender   Number of Samples   Avg Talking Time   Avg Amount Pause   Average Percent Pause
Female   93                  6.28               1.64               25.7%
Male     65                  6.28               1.81               28.4%

Comparing the pause times seen in Table 4.1 shows that males do have a greater percentage of pause in their speech than females. In addition to gender, the area the person was from was also examined. To accomplish this, the states were separated into regions according to the U.S. census [63]: West, Midwest, Northeast, and South; see the map in Figure 4.6.
Figure 4.6: U.S. Census Regions

The sample set of participants contained individuals from all the regions, with the largest group from the South. Table 4.2 contains a complete list of the states that the participants were from. Table 4.3 shows the same result fields with the focus on the regions. The results are noteworthy in that there is a difference in the total talk time as well as the pause time.

Table 4.2: Regions in the United States and states represented from these regions

West         Midwest        Northeast           South
California   Illinois       Connecticut         Alabama
Colorado     Indiana        Dist. of Columbia   Florida
Idaho        Iowa           Maryland            Georgia
Montana      Michigan       Massachusetts       Kentucky
Oregon       Minnesota      New York            Louisiana
Washington   Missouri       Pennsylvania        Mississippi
             Nebraska                           North Carolina
             Ohio                               Oklahoma
             South Dakota                       South Carolina
             Wisconsin                          Tennessee
                                                Texas
                                                Virginia

Table 4.3: Comparison of average talking time and pause by U.S. census region

Region      Number of Samples   Avg Talking Time   Avg Amount Pause   Average Percent Pause
West        13                  6.48               1.99               30.7%
Midwest     30                  6.33               1.72               27.2%
Northeast   7                   6.59               1.64               24.9%
South       103                 6.22               1.68               27.0%

As with gender, analysis of the time data in general showed some interesting results, such as which region had the larger average for total talking time or which region had the highest amount of pause. Still, there was not enough of a difference to make any definite classifications at this time. The data was entered into Applications Quest™ and 6 clusters were made. The overall Difference Index (DI) was 29.34%; this value states, as a whole, how similar or dissimilar the samples are, so a lower DI value indicates greater similarity among the samples. The recommended DI value for this inquiry was 24.47%, giving a target value for the clusters. All the clusters' DI values were below this mark, giving validity to the results in that the members of the clusters were close in characteristics. For analysis, gender was the only attribute that was closely distributed within a reasonable ratio, so the focus was put on this attribute when the clusters were evaluated. Clusters 0 and 2 had only women participants, cluster 3 had all women except one male, and clusters 1, 4, and 5 had only men. The following results were observed and compared to the total averages shown in Table 4.1.

Cluster 0 had 15 females, all from small towns, with a DI of 14.79%, which indicates very little difference between the participants. The talk time was 5% above the total average and the pause time was 9.75% above the total average, for all females, indicating that this group talks more slowly than the average female in the study.

Cluster 2 had 65 females, mainly from the suburb/urban area, with a DI of 20.81%, which indicates a small difference between the participants. For this group the talk time was 3.6% under the average and the pause time was 7.2% under the average, for all females, indicating that this group talks faster than the average female in the study.

Cluster 3 had 13 females, all from the suburb area, with a DI of 20.42%, which indicates a small difference between the participants. For this group the talk time was 13.2% above the average and the pause time was 32.9% under the average, for all females, indicating that this group talks more slowly but with considerably less pause than the average female in the study.

Cluster 1 had 26 males, with no dominant area, with a DI of 23.25%, which indicates a nominal difference between the participants.
For this group the talk time was at the average and the pause time was at the average, for all males, indicating that this group is a good representation of the average male in the study.

Cluster 4 had 16 males, all but 1 from a suburban area, with a DI of 17.14%, which indicates a small difference between the participants. For this group the talk time was 7% under the average and the pause time was at the average, for all males, indicating that this group talks faster than the average male in the study.

Cluster 5 had 23 males, all from a small town, with a DI of 16.04%, which indicates a small difference between the participants. For this group the talk time was at the average, but the pause time was 8.3% below the average, for all males, indicating that this group talks at the average speed but with less pause compared to the average male in the study.

4.3 Conclusion

This phase of the study yielded good results in that it indicated that the amount of time it takes an individual to speak a phrase can possibly give an indication of the area they live in, and possibly the state where they have lived the most. The results did seem to give a clear separation between male and female. When gender is added to the area in which they live, this may give characterizing factors for the individual. The findings from the clustering were interesting and worth noting, but as this was not the primary area of investigation, all data was recorded and saved to be used in future work.

Chapter 5
Frequency Domain Experimentation and Results: Initial Phase

Even though humans do consider the speed and pause of another's voice, it is the frequency domain that can give the greatest amount of data for analysis. This chapter details the experimentation that was conducted on the voice samples after they were converted to the frequency domain. This was the second of four experimental phases for this research, intended to investigate parameters to classify an individual. Once the voice sample had been converted from the time domain to the frequency domain, analysis was done to find results to support the hypotheses of this research.

5.1 Experimental Design

5.1.1 Experiment Goals

The goals of this experimental phase were the following:

Create an algorithm to convert the sample from the time domain to the frequency domain.
Determine all peaks for the frequency sample between the boundaries 250 - 1250 Hz.
Determine the most prominent peaks of the sample.
Calculate and average the slope between the prominent peaks.
Calculate and average the distance between the prominent peaks.
Determine the maximum and minimum frequency values for the prominent peaks.
Determine the total distance between the first and last prominent peak.
Determine the total number of prominent peaks.

5.1.2 Procedure 1 (Converting Data)

Originally the voice sample was saved as it pertains to the time domain; for any analysis of the frequency content, the signal must be converted from the time domain to the frequency domain. As mentioned in Section 2.5, the Fast Fourier Transform (FFT) is the most common method used to accomplish the change from one domain to the other. The MATLAB programming environment has a very efficient FFT function, "fft". This function receives the time data and processes it into frequency data. Since it is not guaranteed that the data sample begins at the start of a cycle, spectral leakage can take place (as explained in Section 2.6), and a windowing function must be applied before the data is sent to the "fft" function.
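As a rough illustration of this window-then-transform step, the following MATLAB sketch applies the window defined in the next paragraph to a cropped sample vector x and keeps only the 250 - 1250 Hz band analyzed in this study; the variable names are illustrative assumptions.

% Minimal sketch, assuming x holds one cropped voice sample at 8000 Hz.
N = numel(x);
w = 0.5 * (1 - cos(2*pi*(0:N-1)'/(N-1)));   % Hanning window (defined below)
X = fft(x .* w);                            % window first, then transform
mag = abs(X(1:floor(N/2)));                 % one-sided magnitude spectrum
f = (0:floor(N/2)-1)' * 8000 / N;           % frequency axis in Hz
band = mag(f >= 250 & f <= 1250);           % analysis band used in this research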
The Hanning window function, shown below, was used because it is a straightforward function and is simple enough that it does not add computational complexity to the algorithm:

w(n) = 0.5 (1 - cos(2πn / (N - 1)))

Once the windowing function was applied, the data values were sent to the "fft" function. Each voice sample in this study ranged from 4.5 seconds to 8 seconds of speech, which, when read into MATLAB using the "wavread" function, amounted to tens of thousands of "time" data points. After sending this time data to the "fft" function, 2048 data values were returned as the representation of the frequency content of that sample. The graphs in Figure 5.1 show the difference in data representation between the two domains.

Figure 5.1: Graphs of Cropped Voice Sample Saying Full Message
(a) Sample in the TIME domain
(b) Sample in the FREQUENCY domain (250 - 1250 Hz)

The frequency analysis was executed only on data from 250 to 1250 Hz, as this is the range that holds the most information about the way a person speaks, according to an expert in the signal processing industry, Dan Ginzel, owner and lead developer of signal/voice applications for Coach Comm [21].

5.1.3 Procedure 2 (Locate Primary Peaks)

The next task was to take the frequency data from the full sample and crop it to the set boundaries (250 - 1250 Hz) to get a visual representation of a person's voice sample. Figure 5.2 shows the full view of the frequency graph, whereas Figure 5.3 shows the sample after the boundaries were set. To begin with, both the peaks and the valleys were considered, but after closer analysis the peak information was determined to be adequate. Initially the algorithm found all the peaks for the entire frequency graph. As illustrated in Figure 5.4, the large number of peaks made it hard to get a clear view of the peaks in relationship to the graph. The graph was then modified to be within the boundaries (250 - 1250 Hz), which made it much easier to see where the peaks were located; see Figure 5.5 for an example. At this time the complete message was used, and with the sample limited to the boundaries stated, it was clear that analysis could continue forward concerning the tone of the sample.

Figure 5.2: Full frequency graph showing the boundaries for the area that will give the most information for a voice sample.

Figure 5.3: Selected frequency sample (250 - 1250 Hz) graph of the bounded area in the graph above.

Figure 5.4: Graph showing a view of peak locations of a full frequency sample

Figure 5.5: Graphs showing different views of peak locations of a sample within the frequency boundaries (250 - 1250 Hz)
(a) Shortened Frequency Sample Showing All Peaks
(b) Shortened Frequency Sample Showing Primary Peaks

5.1.4 Procedure 3 (Calculate Averages)

The initial thought behind calculating the goal values listed was that seeing this data on the dominant peaks would illuminate information about the tone of the sample. If the peaks were more spread out, it indicates a more consistent tone for the sample. If the slope average was positive, then going from left to right the peaks progress up in height; see Figure 5.6a. Likewise, if the slope average was negative, going from left to right the peaks diminish in height; see Figure 5.6b. Each of these gives the individual a totally different sounding voice, where one is lower sounding (decreasing peaks) and the other is higher sounding (increasing peaks) as it pertains to pitch.
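As a rough illustration of the averages used in this procedure, the following MATLAB sketch assumes the prominent peaks' frequencies and magnitudes have already been extracted into vectors pkFreq and pkMag (illustrative names, ordered by frequency).

% Minimal sketch: average slope and spacing between prominent peaks.
slopes = diff(pkMag) ./ diff(pkFreq);   % slope of the line between adjacent peaks
gaps = diff(pkFreq);                    % distance between adjacent peaks, in Hz
avgSlope = mean(slopes);                % positive: peaks rise left to right
avgDist = mean(gaps);                   % larger: peaks more spread out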
Another fact that can be ascertained from the slope average is an idea of the closeness of the peaks. The closer the peaks are to each other, the more the slope average advances towards positive or negative infinity; whereas, if the slope approaches 0, the peaks are farther away from each other.

Figure 5.6: Graphs showing one sample that has a positive slope average and one with a negative slope average
(a) Peak heights increasing
(b) Peak heights decreasing

The last three goals were accomplished, but when analyzed they did not offer any revealing information towards one classification or another. The data was stored for possible further analysis of other variables at a later date. The data tables (30 pages) containing all the averages and numbers of peaks for the aforementioned goals can be found in the Appendix.

5.2 Results

The average slope and average distance between the peaks showed the most promise for determining a classification for a person. It was the results obtained for these two averages that this phase focused on. The results for the average slope were considered first. For visualization, the slope values were put into tables showing the ranges for the positive average slope values and the ranges for the negative average slope values according to the demographic data. In the first table gender was considered, and it revealed that there was no real difference between male and female when it came to the negative slope ranges; when graphed, the two lines were the same. However, looking at the table for the positive slope ranges, the data indicated a very noticeable difference between the two ranges; see Table 5.1.

Table 5.1: The average slope between peaks with the focal point on Gender

Gender   Negative Boundaries        Positive Boundaries
Female   -0.027925 to -0.000141     0.000187 to 0.066688
Male     -0.026729 to -0.000343     0.000795 to 0.013462

Table 5.2: The average slope between peaks with the focal point on Ethnicity

Ethnicity          Negative Boundaries        Positive Boundaries
African American   -0.017488 to -0.000981     0.002743 to 0.012799
White              -0.027925 to -0.000141     0.000187 to 0.066688

Table 5.2 shows the results for the ranges as they pertain to ethnicity, where the two most prominent groups are "White" participants and "African American" participants. In comparing the data in both tables, it was interesting to note that the positive slope range for females was indistinguishable from that of white participants. Therefore a graph was constructed, see Figure 5.7, with data for female, male, white, and African American participants. In viewing this graph it is apparent that at a distinct positive slope value greater than 0.013462 there is a very high probability that the participant is female, white, or both.

Figure 5.7: Graph showing the ranges for the positive and negative average slope of lines between peaks

During this preliminary analysis, the data was also considered according to the distance between the peaks. Even though the average slope gave equivalent information, closer analysis was warranted, given that the actual distances between peaks aid in the analysis of the tone of a participant's voice. Higher averages indicated a greater distance between peaks, whereas a smaller average indicates that the peaks were not separated very much. This was investigated for the prospect that it might better indicate a characteristic about the participant than the slope average.
For this observation, four groups were considered (male, female, White, African American), as some natural breaks were observed when these ranges were graphed. For this data there were two natural separations: one at the average distance value of 301.8281 and one at the value 399.7623. When the graph is viewed, it can be inferred that the probability is high that a person with a value above 301.8281 is either a white female or a white male. This is due to the fact that all African American averages were below this value. When the value gets over 399.7623, the probability of being a female drops out, and the probability that the person is a white male is prominent; see Figure 5.8.

Figure 5.8: Graph showing the ranges for the average distance between the primary peaks

Along with storing the calculations in the database, a tab-delimited text file was created that held these calculations and the associated demographic information. This file was then uploaded to Applications Quest™ (AQ) to find clusters in the data [2, 20]. Clustering was done as it pertains to gender, ethnicity, and slope average, and when the clusters were investigated they verified the previous tables. However, it was revealed that the maximum value was more of an outlier than a representation of the group as a whole. It was the distance average that gave the most validation, in that the 11 individuals that had an average above 301.8281 were indeed white females, and even more so were members of the same cluster.

Chapter 6
Frequency Domain Experimentation and Results: Graphical Phase

All analysis up to this point was very promising, but did not give a clear separation in any of the demographic areas. At this time a program was written that allowed the viewing of all the graphs of the voice samples as they related to the frequency domain, to obtain direction for the next phase in the analysis process. A different digital audio editor, Cool Edit Pro (now Adobe Audition) [12, 13], was used to visualize the frequency graphs. When a sample is graphed using Cool Edit, the graph changes as the application progresses through the sample. It was then observed that the prominent peaks changed location depending on the section of the sample the application was analyzing. By using the loop function, this progression could be viewed over and over again. The Cool Edit application showed that it was feasible to determine inflection from the movement of the peaks displayed on the screen. From the examination of the graphs and the visuals that Cool Edit presented, it was observed that the first half and the second half of a sample were different.

6.1 Experimental Design

It was decided to split each sample into two parts and do some analysis on both halves to determine whether they were similar or dissimilar enough to give some indication of a certain demographic characteristic. The expectation for this analysis was that a parameter connected to voice inflection would be obtained.

6.1.1 Experiment Goals

The goals of this experimental phase were the following:

Separate the entire sample into two halves.
Isolate a single word.
Separate the word into two halves.
Get a graphical representation for visual analysis.

6.1.2 Procedure

Once this decision was made, it was straightforward to implement by using the previously mentioned ability of the MATLAB application to store all data in arrays. With all the data points stored as single elements in an array, one need only use the command "numel" (number of elements) and then split the array into two separate arrays.
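A minimal MATLAB sketch of this split, assuming x holds one sample's data points; the variable names are illustrative.

% Split one sample vector into two halves using numel.
n = numel(x);                   % total number of data points
half1 = x(1:floor(n/2));        % first half of the sample
half2 = x(floor(n/2)+1:n);      % second half of the sample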
Five participant samples were selected for testing to determine whether the smaller samples could be processed using the algorithms that were already written or whether modifications were needed. At first it appeared to work as well as using the full sample, therefore all samples were processed. As before, the results were written to an Excel file, and upon observation not all participant half-files had been processed correctly. It was found that by splitting the sample into two parts, there was not sufficient data when a certain calculation was done. A set parameter of 2048 needed to be changed to 1024 for the following calculation to work properly. Following this correction, the data from the two separate halves were graphed and observations were made to see what useful information was obtained. The graphs of the two halves were plotted in the same window, and each sample was viewed using a simple MATLAB script that allowed straightforward progression through the graphs. To view an instance of this graphical comparison of the two halves of a sample, see Figure 6.1. Though some graphs did illustrate that a useful difference between the two halves was observable, less than 20% of the samples displayed this characteristic. It became clear that using the full sample was going to furnish too much information to obtain a consistent and realistic numeric representation of the voice. The next logical step was to separate a single word from the sample. Because of the work completed earlier, where the pause in the participant sample was determined along with the cropping of the white noise from the beginning and end of the sample, it was possible to isolate words in the sample. The initial preference was to get the first word in the phrase spoken, that being the word "George". Once this word was isolated, analysis was done with the word separated into two halves. As with the samples that contained the entire phrase, there were several that showed some good interpretation of the voice, but the results were not robust throughout the entire sample set. After some consultation with Dan Ginzel [21], two explanations for this outcome were considered. The first rationale was that in saying the word "George", it being the initial word in the phrase, the person may take a deep breath before speaking. Some of this white noise may not be eliminated during the cropping process, having an influence on the first half of the sample. The second possible explanation for the result of the analysis is that some words have what is commonly called "attack" or "variable stress". Attack is the unambiguous beginning of speaking a word [10], and variable stress is the speaking of a syllable in a word louder and longer [54]. Just as taking that deep breath can create white noise, these two speech behaviors have the potential to add noise to a word. The situation that arises with these two speech areas is that not everyone may have these mannerisms, which proved to adversely affect the analysis between the two halves of the given sample as it relates to the general population. Displayed in Figure 6.2 is an example of the effect of variable stress. The graph of the first half starts out with an elevated value and then declines continually from there, whereas the graph of the second half shows a more oscillating sound, and the peaks of the two, when compared, do not give a usable pattern. Given the aforementioned issues, a close assessment of each word in the spoken phrase was made.
It was determined that the word "nine" was the best choice, as it did not appear to have the possible pitfalls that the word "George" had, and this word was used three separate times. The word nine is found in the following locations in the phrase: seventh from the last word (the start of saying the telephone number), third from the last word, and the last word in the spoken response. The last word was not used, as it can have similar issues of acquiring white noise. The second instance of the word was the most logical choice, as it was spoken in the flow of speaking other numbers. The location of this word did present more of a challenge to separate from the phrase for some samples, as some participants did not have a clear pause in their speech. With the focus of this research not being to create an application to retrieve words from a spoken phrase, the second instance of the word nine was manually extracted using the audio application Audacity mentioned previously. Collecting the sample in this manner gave assurance that the new samples were an accurate sample of the participant saying the word "nine". Audacity gives the user the ability to see a visual of the WAV file as well as to listen to the section that was selected. All selected instances of the file were listened to and then saved as separate WAV files for analysis. These new samples were then read into MATLAB and separated into halves like the previous samples. Each half was then graphed to compare the new results with those acquired using the word "George". It can be seen that a more usable set of data comes from these samples, see Figure 6.3, in that there is a higher amount of consistency in the samples. The peaks have a more uniform appearance between them, and the amplitude is as one expects; that is, the first part of the word is spoken with more volume than the second half, though not by a recognizable amount when listened to.

Figure 6.1: Comparison of participant sample split in two halves

Figure 6.2: Comparison of participant saying the word "George" split in two halves

Figure 6.3: Graphs showing the two halves of the word "Nine"

6.2 Results

For this phase all goals were accomplished and extended to involve more than one word for analysis. The results from this phase gave clarity and direction, in that splitting the entire phrase showed that there were too many frequency changes for good analysis. This led to choosing a single word, which was the word "George". As a result of graphically analyzing the two halves of this word, the issue of variable stress became evident. Another word that was not affected by variable stress, i.e., the word nine, was then chosen. The graphs for this word showed that nine did give good patterns to analyze. This made the prominent result of this phase the finding and use of a word that did not have variable stress. This started the research into the final phase of experimentation.

Chapter 7
Frequency Domain Results: Final Phase

The evolution of this research has been very intriguing as it relates to the qualifying of the two hypotheses being pursued. At the completion of analyzing the samples of the participants saying the word "nine", it is the opinion of this research that some important discoveries were made. One of the most successful is the uniformity that was ascertained by comparing the first and second halves of this word.
Graphically it was illustrated that the peak patterns were relatively consistent in their progression; see Figure 7.1 for an example. One can see that even though the amplitude values differ, the pattern that the peaks make is predominantly analogous.

Figure 7.1: Graphs showing the two halves of the word "nine" and the consistent progression of the two samples

Results such as this were the stimulus for the final phase of experimentation. The final phase commenced by asking the question, "If dividing the word into two equal components displayed a pattern, what would splitting it into multiple samples reveal?".

7.1 Experimental Design

7.1.1 Experiment Goals

The goals of this experimental phase were the following:

Create an algorithm that will divide the sample of the word nine into multiple files.
Determine the most prominent peaks for all sub-samples.
Store all peak locations for prominent peaks for all sub-samples.
Calculate the number of peaks for each sub-sample.
Complete graphical analysis of peak location.
Determine a mathematical representation for peak activity.

7.1.2 Procedure

At this time a program was written that separates the data values that had been read into MATLAB into multiple WAV files, each holding 800 bytes of the original file. A previous observation was recalled from using the audio application Cool Edit Pro [13]. One of the functions of Cool Edit is that it can give analysis in a visual form by illustrating the sample graphically, where the image changes as the application plays the audio sample. Cool Edit also has a function to play a continuous loop of the audio sample, which is depicted on the analysis screen as an animated graph. What this confirmed is that as the sample is played, the peaks change position slightly, but because of the overlapping of data there is not a major divergence in the locations of the peaks. The challenge was to represent with tangible, analyzable results what was obvious visually. Therefore the final phase consisted of taking the sample of the word "nine" and taking 800 bytes of data in small increments. Using MATLAB, all data was stored in an array, with a simple script program that looped through the array. At the start of each iteration of this loop, the program advanced by 26 bytes and created another 800-byte file. In Figure 7.2, graphs of the first eleven files are illustrated, showing the slight change in position mentioned before. It is this shifting that this research proposes will give a clear picture of the fluctuation of a person's voice numerically, and will thus give parameters that will facilitate that person being classified.

Figure 7.2: Multiple graphs showing the change of the frequency and amplitude for the word "nine" spoken by a single participant.

In viewing Figure 7.2, the main peaks change position as it pertains to frequency, i.e., the location of the second peak changed. The term "location" will, for this section, stand for the relationship of the frequency value and the peak order number for a given sample. Location gives insight into what is transpiring with the number of peaks in the files. If it is determined that the peak count increased or decreased, the location value will indicate where a new peak was formed or where a previous peak no longer occurs. The added detail this gives is much more informative than when only the number of peaks was known, at the initial phase of this analysis.
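A rough MATLAB sketch of the segmentation step described in this procedure, assuming x holds the word sample (one byte per data point for these 8-bit mono files); the file-name pattern is an illustrative assumption.

% Minimal sketch: 800-byte pieces taken every 26 bytes across the sample.
frameLen = 800;  hop = 26;
starts = 1:hop:(numel(x) - frameLen + 1);   % roughly 150 - 200 frames per word
for k = 1:numel(starts)
    frame = x(starts(k):starts(k) + frameLen - 1);
    wavwrite(frame, 8000, sprintf('nine_part_%03d.wav', k));   % one file per frame
end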
To visualize this data, the first four files were graphed and viewed, upon which it became clear what had transpired. When the number of peaks increased, a minor peak had emerged; likewise, if the peak count went down, then a minor peak had been eliminated. The two graphs in Figures 7.3 and 7.4 illustrate this by going from one cross-section to the next, where a new minor peak is formed and at the same time a previous minor peak is eliminated.

Figure 7.3: View of peak locations of file 1 from the breakdown of the file where the participant said the word "nine".

Figure 7.4: View of peak locations of file 2 from the breakdown of the file where the participant said the word "nine".

If only the peak count were considered, it would indicate that no change had taken place, when in reality two events had occurred. The next two graphs, Figures 7.5 and 7.6, show the event of going between two different cross-sections where no minor peaks were formed or eliminated, retaining the same peak count, and the graphs are very similar.

Figure 7.5: View of peak locations of file 3 from the breakdown of the file where the participant said the word "nine".

Figure 7.6: View of peak locations of file 4 from the breakdown of the file where the participant said the word "nine".

As with the first set, this indicates that just counting the number of peaks is not sufficient: the count is the same, but the locations of the second and seventh peaks are different. In contrast, the second set of graphs shows that the peak count can remain the same, as well as the location of the peaks; the only deviation between the two is that the amplitude is slightly higher in one than the other. It was the assessment of the changing of the locations of the first peak, second peak, third peak, and so forth, along with the need to have a numeric representation, that inspired the final area of exploration for this research. To make the comparison of the numeric data as it relates to the graphs more straightforward, all peak data was stored in an Excel spreadsheet for evaluation. Table 7.1 is an example of what this data looks like in the spreadsheet. Looking at this table, it can be seen numerically when the location of the first peak, second peak, and so on either remains at the previous location or changes location due to a minor peak being found or eliminated. Tracking this activity was vitally important to the completion of this analysis. This is best conveyed by following numerically in the table the previous example the graphs displayed. To accomplish this, two events needed to be monitored: at what location peaks materialized or dematerialized, and the location of each peak as it pertains to all the files. Starting with the location of the first peak in file 1, it is located at 312.5 and remains at this value until file 25, where the first peak is found at 291.7. Again this location is continuous until file 38, when it shifts back to 312.5. The shifting from one location to another gave an observable pattern that was of great interest. This tracking of the peaks within the spreadsheet also gave valuable information as to when the materialization and disappearance of minor peaks transpired. Looking at files 1 and 2 (rows 2 and 3) in the spreadsheet, it reveals the same peak activity as the graph in Figure 7.4 shows. The graph shows peak 1 in the same location for both files, and the spreadsheet reveals the same event. It can be seen in the spreadsheet that under the "P2" column there is a location of 479.2 for file 1 and 437.5 for file 2, indicating that some change has taken place.
For file 2, "P3" is now at location 479.2, clearly indicating that another peak materialized that was not in file 1. Likewise, "P7" in file 1 is at position 1083.3, although in file 2 "P7" is at 1041.7, which is where "P6" is located for file 1. This indicates that a minor peak that was previously in file 1 is not in file 2. Now that these events were tracked numerically, instead of only being observed on a graph, automation of this process began to materialize. One last visual observation was made: when all the peak locations for all files were stored in the spreadsheet, a visual of the data as it pertained to the number of peaks increasing and decreasing was noticed. By using the "zoom" feature in Excel, a pattern can be seen as it concerns the number of peaks, giving a very unique blend of the numeric and visual environments.

Table 7.1: This table shows the data on peak location for the first 40 smaller files that were created from the full sample of a person saying the word "nine". It numerically represents the shifting of the peaks as well as the appearance and disappearance of minor peaks.

File  P1     P2     P3     P4     P5     P6      P7      P8      P9      P10
1     312.5  479.2  645.8  791.7  916.7  1041.7  1083.3  1208.3
2     312.5  437.5  479.2  645.8  791.7  916.7   1041.7  1208.3
3     312.5  479.2  645.8  791.7  916.7  1062.5  1208.3
4     312.5  479.2  625.0  791.7  916.7  1062.5  1208.3
5     312.5  479.2  625.0  791.7  916.7  1062.5  1208.3
6     312.5  479.2  625.0  791.7  916.7  1000.0  1062.5  1208.3
7     312.5  416.7  479.2  625.0  791.7  916.7   1000.0  1062.5  1229.2
8     312.5  437.5  479.2  625.0  770.8  916.7   1000.0  1062.5  1229.2
9     312.5  437.5  479.2  625.0  770.8  916.7   1062.5  1229.2
10    312.5  479.2  625.0  770.8  916.7  1062.5  1166.7  1229.2
11    312.5  458.3  625.0  770.8  916.7  1062.5  1166.7  1229.2
12    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1166.7  1229.2
13    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1166.7  1229.2
14    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1229.2
15    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1229.2
16    312.5  458.3  604.2  708.3  770.8  916.7   1062.5  1229.2
17    312.5  458.3  604.2  770.8  916.7  1062.5  1145.8  1229.2
18    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1145.8  1229.2
19    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1145.8  1208.3
20    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1208.3
21    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1208.3
22    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1208.3
23    312.5  458.3  604.2  750.0  916.7  1062.5  1208.3
24    312.5  458.3  604.2  750.0  916.7  1062.5  1208.3
25    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
26    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
27    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
28    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
29    291.7  437.5  604.2  750.0  916.7  1062.5  1208.3
30    291.7  437.5  604.2  750.0  833.3  916.7   1062.5  1208.3
31    291.7  375.0  437.5  604.2  750.0  833.3   916.7   1062.5  1208.3
32    291.7  375.0  458.3  604.2  750.0  833.3   916.7   1062.5  1208.3
33    291.7  458.3  604.2  750.0  833.3  916.7   1062.5  1208.3
34    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
35    291.7  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
36    291.7  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
37    291.7  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
38    312.5  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
39    312.5  458.3  604.2  770.8  916.7  1000.0  1062.5  1229.2
40    312.5  458.3  541.7  604.2  770.8  916.7   1000.0  1062.5  1229.2

Examination of this new data led to the consideration that, for a given participant's multiple samples of the word "nine", there was a pattern pertaining to the frequency values. It became apparent that this pattern of location values can be tracked as a thread, a notion universal to geometric analysis.
For this research, the tracking of the location values as they relate to the peaks of a sample will be called a "Frequency Location Thread" (FLT). It is this research's certainty that numerically tracking the FLT will provide a pattern for each participant's voice. Figure 7.7 shows the pattern that is formed when the location of the first peak, the location of the last peak, and the average location of all peaks are represented for all files.

Figure 7.7: Graph of numerical data indicating the FLT stored in an Excel spreadsheet

At this point in the experimentation process, the following steps had been accomplished for each of the samples:

The word "nine" was isolated from each full sample and stored in a new WAV file.
The new file was then split into multiple files of 800 bytes each (150 - 200 files).
The peak locations for each 800-byte file were established.
All locations for all files were stored in a separate text file.
The threads for the respective files were established.

Upon completion of these steps, graphical analysis of the threads was initiated. This analysis consisted of selecting one of the samples as a test case and graphically viewing each thread to determine a pattern that could be used to represent the participant's voice. As described earlier, the formation of the threads is directly connected to where peaks materialize or dematerialize as it pertains to previous peak locations. Given the structure of the peak data, a thread can have one peak location or any number up to the number of files associated with that sample. Continuing the same thought process started by looking at the location values for the individual peaks, it was considered that similar steps occurred for this process too. For an illustration of this, a graph was created in MATLAB using the "plot" function, where the location values for a particular thread were stored as the "y" values and the "x" values corresponded to the number of location values, i.e., 1 to (the number of values in that thread). The first thread's graph for a test participant had 186 values in it, and when graphed gave a stair-step graphic; see Figure 7.8. Given the pattern of this graphic, it was apparent that this type of graph can be represented by a polynomial that is a mathematical representation of the thread data. This polynomial was found by using the MATLAB function "polyfit", which yields the coefficients for a polynomial of a given order. After some experimentation to find the order for the polynomial, it was determined that for this research the third and fourth order polynomials would be calculated. An example of a fourth order polynomial calculated by MATLAB using the test case is displayed below:

0.0000003894X^4 - 0.0001753X^3 + 0.029085X^2 - 1.2867X + 320.46

The highest-order coefficient (0.0000003894) is representative of the complexity of the sample, whereas the last value (320.46) is representative of the frequency. With this polynomial, each participant's voice can be modeled as it pertains to the frequency domain. After further evaluation, the third order polynomial was calculated for each sample, as creating a fourth order polynomial did not add much to the model but did add to the calculation process.

Figure 7.8: Graph 1 is of the frequency values of the first thread of a test sample and graph 2 shows the polynomial that fits that step graph

Two tools were used to verify that these polynomial models did give viable information about the characteristics of a person and thus allow them to be categorized.
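Before turning to those tools, the fitting step just described can be sketched in MATLAB as follows; the vector name threadLocations is an illustrative assumption for one thread's frequency values.

% Minimal sketch: fit a third-order polynomial to one thread's step graph.
y = threadLocations(:);                  % one thread's frequency locations
xidx = (1:numel(y))';                    % index 1..(number of values in the thread)
c = polyfit(xidx, y, 3);                 % coefficients, highest order first
plot(xidx, y, xidx, polyval(c, xidx));   % overlay thread and fitted polynomial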
The first was an analysis tool created by Dan Ginzel, an independent software developer. A file is uploaded to this tool, in which the leading coefficient (in the example, 0.0000003894) is multiplied by an integer (20 for this example), giving the result (0.000007788). This value is rounded to the nearest integer towards negative infinity (0.000007788 gives the integer 0), and this value is stored. This integer value (the coefficient score) was then used to determine groups as they pertained to the total population. This process is done for the ten longest threads of a sample. These threads were selected because they provide a good representation of the voice; any threads beyond these contained too few values to give good-quality information. The advantage of this tool is that it gives a percentage breakdown of the analyzed data as it pertains to the category group (i.e., gender), the individual group (i.e., male), as well as the entire sample group (all participants). This gives an insight into how the coefficients represent the groups mentioned. As an example, a file was uploaded and the parameters were selected from a table that was dynamically created. For this experiment the ten longest threads were selected and treated individually. From the table created, the coefficient score was selected along with gender, and the results can be seen in Table 7.2. In viewing the results it is clear that there are two groups that stand out. One group (0 coefficient score) had a strong showing of females, with 68.28% of all females in this group. Another group (-1 coefficient score) had a strong showing of males. The information that can come from this is twofold: first, it gives a breakdown of where the strongest percentages of the category types are, i.e., male or female; and second, it gives a method for evaluating the clusters that will be given by the second tool.

Table 7.2: The results from the analysis tool showing the percentages as they pertain to male and female in each coefficient score group

Total in         Coefficient   Gender   Percentage of    Percentage of      Percentage of
Category Group   Score                  Category Group   Individual Group   Total Group
1                -6            Male     100.00%          0.15%              0.06%
1                -4            Male     100.00%          0.15%              0.06%
2                -3            Female   100.00%          0.22%              0.13%
17               -2            Female   81.00%           1.83%              1.08%
4                -2            Male     19.00%           0.62%              0.25%
265              -1            Female   45.70%           28.49%             16.77%
315              -1            Male     54.30%           48.46%             19.94%
635              0             Female   65.90%           68.28%             40.19%
328              0             Male     34.10%           50.46%             20.76%
9                1             Female   100.00%          0.97%              0.57%
1                2             Male     100.00%          0.15%              0.06%
2                3             Female   100.00%          0.22%              0.13%

The second tool used, and the one that gave the most significant results as it relates to validating the second hypothesis (that a person can be categorized by their voice), is the Applications Quest™ software developed by Dr. Juan E. Gilbert. This software is a clustering application that takes a tab-delimited file with demographic data along with analysis calculations, forming clusters using this data to determine which participants are most alike. After initial experimentation with the settings that a user enters, i.e., the number of clusters preferred or the attributes to be used, it was decided that six (6) clusters gave excellent results for this research. The coefficient data mentioned earlier was part of the data uploaded to Applications Quest™, with some changes. The average of each set of coefficients over the ten selected threads was calculated. This updated data was then combined with the demographic data for each sample and stored in a tab-delimited file.
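A rough MATLAB sketch of these two preparation steps, assuming coeffs is a matrix holding one row of polynomial coefficients per thread (highest order first) for a participant's ten longest threads; the names and layout are illustrative assumptions.

% Coefficient score: leading coefficient scaled by 20, rounded toward -infinity.
scores = floor(coeffs(:,1) * 20);   % one integer score per thread
% Per-participant representation: average each coefficient over the ten threads.
avgCoeffs = mean(coeffs, 1);        % averages uploaded with the demographic data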
The following are the attributes that were uploaded into Applications Quest™: ID, Gender, Ethnicity, State that has affected the voice most, Education, Area they live in, Height, and the three coefficients for the calculated polynomial. Once the file has been uploaded, the next step is to select the attributes that the application will use for clustering. The following shows the results where the attributes gender, ethnicity, and the three coefficient values were used for the final analysis. The AverageDifference, as talked about in (PUT Section ref), is an indication of how different the samples are from each other. Cluster 0's members are more different than cluster 4's members, which can be seen in Table 7.3, which shows six males and one female in Cluster 0 but all males in cluster 4. It needs to be stated that the primary use of this software is to form groups that are diverse; however, the developer was able to set the program to cluster the samples that are most alike. This is represented by the difference index, which is the average difference between members of a cluster; so the lower the difference index, the better the cluster representation is [20]. The difference index for the complete sample set was 28.60% (standard deviation 16). It should be noted that all of the clusters' difference indexes are under this value, which was an anticipated outcome and shows the process is valid. The clusters' information can be seen in Table 7.3.

Table 7.3: Clustering results from Applications Quest™, reset to look for samples that are alike rather than different.

Cluster 0   Ethnicity: White (4), African American (2), Native American (1)
            Gender: Male (6), Female (1)
Cluster 1   Ethnicity: White (34)
            Gender: Male (17), Female (17)
Cluster 2   Ethnicity: White (63)
            Gender: Female (63)
Cluster 3   Ethnicity: White (12), Asian (9), African American (8)
            Gender: Male (29)
Cluster 4   Ethnicity: White (15), Asian (1)
            Gender: Male (16)
Cluster 5   Ethnicity: Native American (9)
            Gender: Female (9)

Cluster 0, AverageDifference = 25.21%
Cluster 1, AverageDifference = 20.87%
Cluster 2, AverageDifference = 9.30%
Cluster 3, AverageDifference = 23.84%
Cluster 4, AverageDifference = 8.76%
Cluster 5, AverageDifference = 13.22%

One can observe that the clusters with the higher difference indexes (0, 1, 3) are not as uniform as the lower difference index clusters (2, 4, 5), which are very distinct. These distinct clustering results, and others like them, give validity to the approach of establishing threads and calculating the polynomial coefficients to represent the thread pattern for a given voice sample.

7.2 Results

A major result from this section was the thread mapping of the frequency peaks. Being able to distinguish when a new peak formed or an old peak no longer appeared was very important for tracking the threads as they were created. From these findings evolved the idea of representing these threads graphically, yet through a venue that is purely numerical. This resulted in the calculation of polynomials to represent the threads. Taking the 10 most prominent threads and averaging the coefficients then gave a general representation of the voice and allowed for clustering. The clustering validated that the polynomials did represent the voice and that, given the coefficient values for an individual, they can be put into a certain group, i.e., gender.

7.3 Conclusion

It was in this final phase of experimentation that the strongest results occurred.
First, the splitting of a single word into multiple 800-byte parts was paramount to getting a numeric representation of the voice. From the splitting of the word, to the thread representation, to the creation of polynomials corresponding to the voice, all gave validation to the hypotheses set. Upon completion of this phase, the results from the clustering application showed that hypothesis 2, "The human tone classification can be refined into human classifications that can pertain to gender, ethnicity and geographical area where their accent was most affected.", can be accomplished by modeling a person's voice as a polynomial.

Chapter 8
Findings and Future Work

The goal of this research was to confirm the following two hypotheses as they relate to speaker classification.

H1) The pitch range of the human voice could be used to create a tone classification set, such as low, medium, and high tones.

H2) The human tone classification could be refined into human classifications that could pertain to gender, ethnicity and the area where their accent was most affected.

The literature review proved to be the first obstacle, as there was very little published on the subject matter of speaker classification. The two areas of speaker verification and speaker identification had dominated most efforts in research of this kind. When literature was obtained, it was either of a theoretical nature or did not divulge the inner workings of the study attempted. Therefore the primary motivation was the thought that if humans can listen to someone speak and be able to tell certain characteristics about them, i.e., that they are male or female, it stood to reason that in some way this could be mathematically computer-generated. Given that machines most likely would do this to a lesser degree, the benefits are still numerous [40]. The results in chapters 4 - 7 document the exact progression and calculations this research undertook to obtain a mathematical representation of what a human does naturally. The following summarizes the validation of the aforementioned hypotheses.

The first hypothesis was quickly validated when the voice sample was converted from the time domain to the frequency domain. Section 5.1.4 showed that when the frequency data was bounded (250 - 1250 Hz), it could be determined where in that sample the frequency was the strongest. By using the average slope between the prominent peaks of the sample, it could be confirmed whether the frequency was stronger at the beginning (negative slope), in the middle (slope approaching 0), or at the end (positive slope) of the selected frequency range; review the graphs in Figure 5.6. This value clearly indicated that the tone for the sample could be categorized as high, medium, or low, thus validating the first hypothesis. With the first hypothesis substantiated, the research progressed to the validation of the second hypothesis. Refining the development associated with hypothesis one as it pertains to frequency was not an inconsequential task. With no previous work to act as a guide, experimentation was done in phases. Each phase added to the validation process; however, it was the final phase (Chapter 7) that gave the key to classifying a person. Through a series of experiments, the frequency of an individual was represented by a polynomial of the third order; refer to Figure 7.8. This polynomial was created by first establishing a thread that tracked the prominent peaks of a frequency sample.
Chapter 8

Findings and Future Work

The goal of this research was to confirm the following two hypotheses as they relate to speaker classification.

H1) The pitch range of the human voice could be used to create a tone classification set, such as low, medium, and high tones.

H2) The human tone classification could be refined into human classifications that could pertain to gender, ethnicity, and the area where the speaker's accent was most affected.

The literature review proved to be the first obstacle, as there was very little published on the subject of speaker classification. The two areas of speaker verification and speaker identification had dominated most research efforts of this kind. When literature was obtained, it was either theoretical in nature or did not divulge the inner workings of the study attempted. Therefore, the primary motivation was the thought that if humans can listen to someone speak and tell certain characteristics about them, i.e., that they are male or female, it stood to reason that in some way this could be mathematically computer-generated. Given that machines would most likely do this to a lesser degree, the benefits are still numerous [40]. The results from Chapters 4-7 document the exact progression and calculations this research has undertaken to obtain a mathematical representation of what a human does naturally. The following summarizes the validation of the aforementioned hypotheses.

The first hypothesis was quickly validated when the voice sample was converted from the time domain to the frequency domain. Section 5.1.4 showed that when the frequency data was bounded (250-1250 Hz), it could be determined where in that sample the frequency was the strongest. By using the average slope between the prominent peaks of the sample, it could be confirmed whether the frequency was stronger at the beginning (negative slope), in the middle (slope approaching 0), or at the end (positive slope) of the selected frequency range (review the graphs in Figure 5.6). This value clearly indicated that the tone for the sample could be categorized as high, medium, or low, thus validating the first hypothesis.
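A minimal sketch of this bounded-frequency test is given below. It assumes one part of the sample in a vector seg, a telephone-grade sampling rate fs of 8000 Hz, the Signal Processing Toolbox function findpeaks, and illustrative thresholds for "slope approaching 0"; mapping a negative slope (energy near 250 Hz) to a low tone is likewise this sketch's assumption, not a restatement of the study's code.

    % Minimal sketch of the hypothesis-1 test on one part of the sample.
    % "seg" and fs = 8000 Hz are assumptions made for this example.
    seg = seg(:).';                           % force a row vector
    fs  = 8000;
    X   = abs(fft(seg));                      % magnitude spectrum
    f   = (0:numel(seg)-1) * fs / numel(seg); % frequency of each FFT bin
    band = f >= 250 & f <= 1250;              % bound the data to 250-1250 Hz
    fb  = f(band);
    Xb  = X(band);
    [pks, locs] = findpeaks(Xb);              % prominent peaks in the band
    avgSlope = mean(diff(pks) ./ diff(fb(locs)));  % average slope between peaks
    if avgSlope < -0.01                       % thresholds assumed for illustration
        tone = 'low';     % strongest near the start of the band (negative slope)
    elseif avgSlope > 0.01
        tone = 'high';    % strongest near the end of the band (positive slope)
    else
        tone = 'medium';  % strongest in the middle (slope approaching zero)
    end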
With the first hypothesis substantiated, the research progressed to the validation of the second hypothesis.

Refining the development associated with hypothesis one, as it pertains to frequency, was no inconsequential task. With no previous work to act as a guide, experimentation was done in phases. Each phase added to the validation process; however, it was the final phase (Chapter 7) that gave the key to classifying a person. Through a series of experiments, the frequency of an individual was represented by a polynomial of the third order (refer to Figure 7.8). This polynomial was created by first establishing a thread that tracked the prominent peaks of a frequency sample. The top ten threads were then selected, and a set of polynomial coefficients was calculated for each thread. These ten sets of coefficient values were then averaged, and the polynomial that was formed was used as the representation of a person's voice. Confirmation of this was ascertained by taking these values, along with the demographic information for the participants, and uploading them to Applications Quest™, a clustering application. The results obtained gave a clear indication that the polynomial coefficients gave an appropriate representation of a person, such that individuals could be put into cluster groups that would indicate gender and ethnicity. With this conclusion, hypothesis two was validated, in that it was shown that it is possible to refine the analysis of the voice to give a predilection towards a classification of an individual.

8.1 Contributions

The use of biometrics, and of voice biometrics in particular, is increasing every day [36]. It is the goal of this research to provide to the area of voice biometrics validation that an application can take a voice sample and glean from it information that can be used to enhance the interaction between humans and machines. This could be done by finding characteristics of a person that can be used to classify that person, so that more information is available and the application can better serve the user and the community. This research will not only aid current applications, but could also be expanded into determining other attributes of an individual that will be beneficial to the continuing research of voice applications as they pertain to HCI [40].

8.2 Future Work

There is a great deal of future work planned for this research. The following is a list of planned work.

- Target data collection such that a more evenly distributed group is available as it pertains to the target attributes. One idea to accomplish this would be to set up in certain areas where a particular participant group can be found, e.g., collecting samples from a senior group at a monthly meeting.
- Utilize the other parameters established in the time domain (e.g., amount of pause) in other voice applications.
- Conduct the study in a controlled environment where all participants use the same phone and background noise is controlled.
- Incorporate speech recognition to listen for particular words that may be used by the participant, e.g., "y'all".
- Collect numerous samples from the same participant when they are healthy, sick, or have throat problems.
- Create an application that is fully automated for the processing of the voice samples.
- Investigate the use of classification as it pertains to security.

Chapter 9

Scholarly Contributions

Gilbert, J.E., Cross, E.V., McMillian, Y., Rouse, K., Mkpong-Ruffin, I., Gupta, P., & Williams, P. (2007). A Usable Security Approach to Electronic Voting. IEEE Computer.

Gilbert, J.E., McMillian, Y., Cross, E.V., Rouse, K., Williams, P., Gupta, P., Rogers, G., McClendon, J., Mkpong-Ruffin, I., & Nobles, K. (2007). Multimodal E-Voting with Older Citizens. International Journal of Human-Computer Studies.

Williams, A., Rouse, K., Seals, C.D., & Gilbert, J.E. (2007). Enhancing Reading Literacy in Elementary Children using Programming for Scientific Simulations. International Journal on E-Learning.

Cross, E.V., Rogers, G., McClendon, J., Mitchell, W., Rouse, K., Gupta, P., Williams, P., Mkpong-Ruffin, I., McMillian, Y., Neely, E., Lane, J., Blunt, H., & Gilbert, J.E. (2007). Prime III: One Machine, One Vote for Everyone. VoComp 2007, Portland, OR, July 16, 2007.

Williams, A., Seals, C., Rouse, K., & Gilbert, J. (2006). Visual Programming with Squeak SimBuilder: Techniques for E-Learning in the Creation of Science Frameworks. In Proceedings of E-Learn 2006 World Conference on E-Learning in Corporate, Government, Healthcare, & Higher Education, CD-ROM.

Bibliography

[1] ACM SIGGRAPH. (1999). Human-Centered Computing, Online Communities and Virtual Environments. Special report, Vol. 33, No. 3. Chateau de Bonas, France: ACM SIGGRAPH.
[2] Applications Quest. (2009). Retrieved March 2009, from Applications Quest, LLC: http://www.applicationsquest.org/
[3] Audacity Home. (2008). Retrieved June 2008, from Audacity: Free Audio Editor and Recorder: http://audacity.sourceforge.net/
[4] Bhattacharyya, S., & Srikanthan, T. (2004). Synthesis Journal. Retrieved November 2006, from Information Technology Standards Committee: http://www.itsc.org.sg/synthesis/2004/2 Voice.pdf
[5] Biometrics 101: Info Biometrics Technology products. (2005). Retrieved November 2006, from Biometrics 101: http://www.biometrics-101.com
[6] Biometrics History. (2002). Retrieved 2007, from National Center for State Courts: http://ctl.ncsc.dni.us/biomet%20web/BMHistory.html
[7] Biometrics Home Page. (2002). Retrieved November 2006, from National Center for State Courts: http://ctl.ncsc.dni.us/biomet%20web/BMIndex.html
[8] Campbell, J. (1997). Speaker Recognition: A Tutorial. Proceedings of the IEEE, 85(9), 1437-1462.
[9] Childers, D. (2000). Speech Processing. New York: John Wiley & Sons.
[10] Cole, R., & Schwartz, E. (2008). Virginia Tech Multimedia Music Dictionary. Retrieved May 2009, from http://www.music.vt.edu/musicdictionary/
[11] Cohen, P. R., & Oviatt, S. L. (1995). The Role of Voice Input for Human-Machine Communication. Proceedings of the National Academy of Sciences of the United States of America, 92, 9921-9927.
[12] Adobe Audition 2.0. (2009). Retrieved May 2009, from Cool Edit is now Adobe Audition: http://www.adobe.com/special/products/audition/syntrillium.html
[13] OldVersion.com. (2009). Retrieved May 2009, from Cool Edit Pro - Download at OldVersion.com: http://www.oldversion.com/Cool-Edit-Pro.html
[14] Cooley, J. W., & Tukey, J. W. (1965). An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation, 19, 297-301.
[15] Duhamel, P., & Vetterli, M. (1990). Fast Fourier Transforms: A Tutorial Review and a State of the Art. Signal Processing, 19(4), 259-299.
[16] Dunlap, D. (2005). Automated Identification and Data Capture Biometrics Web Site. Retrieved November 2006, from Western Carolina University: http://et.wcu.edu/aidc/BioWebPages/Biometrics Voice.html
[17] findBIOMETRICS. (2006). Retrieved November 2006, from findBIOMETRICS: http://www.findbiometrics.com/Pages/guide1.html
[18] Fry, D. B. (1979). The Physics of Speech. Cambridge: Cambridge University Press.
[19] Gilat, A. (2008). MATLAB: An Introduction with Applications. Hoboken, NJ: John Wiley & Sons, Inc.
[20] Gilbert, J.E. (2006). Applications Quest: Computing Diversity. Communications of the ACM, 49(3), 99-104.
[21] Ginzel, Dan. [Coach Comm.] Personal interviews: 08 August 2008; 12 December 2008; 23 January 2009; 03 March 2009.
[22] Graham, J. (2006). Windowing and the DFT. Retrieved June 2009, from the University of California, Berkeley, web page of Dr. James R. Graham: http://astro.berkeley.edu/~jrg/ngst/fft/window.html
[23] Gürel, L. (2000).
Signal-Processing Techniques to Reduce the Sinusoidal Steady-State Error in the FDTD Method. IEEE Transactions on Antennas and Propagation, 585-593.
[24] Hanselman, D., & Littlefield, B. (2005). Mastering MATLAB 7. Upper Saddle River, NJ: Pearson Education, Inc.
[25] Harris, F. J. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE, 66(1), 51-83.
[26] Hollien, H. (2002). Forensic Voice Identification. San Diego: Academic Press.
[27] Independent Biometrics Expertise. (2007). Retrieved April 2007, from International Biometric Group: http://www.biometricgroup.com/reports/public/basic reports.html
[28] James, R. C. (1992). Mathematics Dictionary (Fifth ed.). New York: Van Nostrand Reinhold.
[29] Jastrow, D. (2007, June 1). The New Fingerprint? Retrieved August 2007, from Speech Technology: http://www.speechtechmag.com/Articles/Editorial/Cover-Story/Voice-The-New-Fingerprint-36320.aspx
[30] Klevans, R. (1997). Voice Recognition. New York: Artech House.
[31] Markowitz, J. (2007, June 1). Classifying Classifications. Retrieved July 2007, from Speech Technology Magazine: http://www.speechtechmag.com/Articles/Column~Forward-Thinking~Classifying-Classifications-36313.aspx
[32] Markowitz, J. (2006). J. Markowitz Consultants. Retrieved April 2007, from J. Markowitz Consultants, The Human Side of Computing: http://www.jmarkowitz.com/information.html
[33] Markowitz, J. (2007, June 1). SpeechTechMag.com: Classifying Classifications. Retrieved July 2007, from Speech Technology Magazine: http://www.speechtechmag.com/Articles/Column~Forward-Thinking~Classifying-Classifications-36313.aspx
[34] Markowitz, J. (2007). The Many Roles of Speaker Classification in Speaker Verification and Identification. In C. Müller (Ed.), Speaker Classification I: Fundamentals, Features, and Methods (Lecture Notes in Computer Science) (pp. 218-225). Berlin/Heidelberg: Springer.
[35] Markowitz, J. (2003, November 25). Voice Biometrics - Are You Who You Say You Are? Retrieved November 2007, from Speech Technology: http://www.speechtechmag.com/Articles/Editorial~Feature~Voice-Biometrics|Are-You-Who-You-Say-You-Are-29621.aspx
[36] Markowitz, J. (2000). Voice Biometrics. Communications of the ACM, 43(9), 66-73.
[37] Martin, H. (1881). The Human Body. New York: Henry Holt and Company.
[38] MATLAB Function Reference: fft. (1984-2007). Retrieved 2007, from The MathWorks: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/fft.html
[39] Merriam-Webster Inc. (2007). Dictionary. Retrieved 2007, from Merriam-Webster's Online Dictionary: http://www.merriam-webster.com/
[40] Metze, F., Englert, R., Bub, U., Burkhardt, F., & Stegmann, J. (2008). Getting Closer: Tailored Human-Computer Speech Dialog. Universal Access in the Information Society, 8(2), 97-108.
[41] Moreno, P., & Ho, P. (2004). SVM Kernel Adaptation in Speaker Classification and Verification. INTERSPEECH 2004 - ICSLP (pp. 1413-1416). Jeju Island, Korea.
[42] Müller, C. (2007). Speaker Classification I. Berlin/Heidelberg: Springer.
[43] MySQL Enterprise. (2008). The World's Most Popular Open Source Database. Retrieved 2008, from MySQL: http://www.mysql.com/
[44] Nanavati, S., Thieme, M., & Nanavati, R. (2002). Biometrics: Identity Verification in a Networked World. New York: John Wiley & Sons, Inc.
[45] Nass, C., & Brave, S. (2005). Wired for Speech. Cambridge, MA: The MIT Press.
[46] National Instruments. (2007). Smoothing Windows for Spectral Leakage. Retrieved 2007, from National Instruments Developer Zone: http://zone.ni.com/devzone/cda/tut/p/id/4110
[47] Nuance Cafe. (1999-2007). Retrieved 2008, from Nuance Cafe: Supercharge Your Phone!: http://cafe.bevocal.com/index.html
[48] Gilbert, J.E., McMillian, Y., Rouse, K., Williams, P., Rogers, G., McClendon, J., Mitchell, W., Gupta, P., Mkpong-Ruffin, I., & Cross, E.V. (2009). Universal Access in e-Voting for the Blind. Universal Access in the Information Society Journal.
[49] Palm III, W. J. (2005). A Concise Introduction to MATLAB. New York: McGraw-Hill Higher Education.
[50] Pediatric Otolaryngology. (2000). Vocal Cord Paralysis. Retrieved November 2007, from Pediatric Otolaryngology: http://www.pediatric-ent.com/learning/problems/vocalcord.htm
[51] Roberts, W., & Sabrin, H. (2005). Speaker Classification Using Composite Hypothesis Testing and List Decoding. IEEE Transactions on Speech and Audio Processing, 211-219.
[52] Rodman, J. (2007). Judy Rodman - vocal coach, producer, songwriter, recording artist, entertainer, actor, Nashville, Tennessee. http://judyrodman.com
[53] Rodman, Judy. Personal interview. 25 June 2008.
[54] Seattle Learning Academy. American English Pronunciation. Retrieved May 2009, from http://www.pronuncian.com/stress.aspx
[55] Sharma, M., & Mammone, R. (1996, May 7-10). Subword-Based Text-Dependent Speaker Verification System with User-Selectable Passwords. Retrieved 2006, from IEEE Xplore: http://ieeexplore.ieee.org/iel3/3856/11264/00540298.pdf
[56] Smith, J. O. (2007). Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications (2nd ed.). http://books.w3k.org/: W3K Publishing.
[57] Snowball Sampling. (2007). Retrieved November 2008, from Department of Sustainability and Environment: http://www.dse.vic.gov.au/dse/wcmn203.nsf/linkview/d340630944bb2d51ca25708900062e9838c091705ea81a2fca257091000f8579
[58] Speech Analysis Tutorial. (1995). Retrieved November 2007, from Lund University: http://www.ling.lu.se/research/speechtutorial/tutorial.html
[59] The MathWorks. (2004, May). MATLAB 7. Retrieved November 2007, from https://tagteamdbserver.mathworks.com/ttserverroot/Download/18842 ML 91199v00.pdf
[60] The MathWorks. (2004, May). The MathWorks - MATLAB and Simulink for Technical Computing. Retrieved November 2007, from http://www.mathworks.com/
[61] Thomson Gale. (2005). Biometrics - How Biometrics Systems Work. Retrieved April 2007, from http://www.referencesforbusiness.com
[62] Traunmüller, H., & Eriksson, A. (1994). The Frequency Range of the Voice Fundamental in the Speech of Male and Female Adults. Retrieved November 2007, from Stockholm University: http://www.ling.su.se/staff/hartmut/f0 m&f.pdf
[63] Census Bureau Home Page. U.S. Census Bureau. Retrieved November 27, 2007, from http://www.census.gov/geo/www/us regdiv.pdf
[64] Volner, R., & Bore, P. (2001, March 2). A Human Classification System for Biometric Parameters. Retrieved October 27, 2007, from http://internet.ktu.lt/lt/mokslas/zurnalai/elektr/z62/volner.pdf
[65] Weisstein, E. W. (1999). Fast Fourier Transform. Retrieved 2007, from MathWorld - A Wolfram Web Resource: http://mathworld.wolfram.com/FastFourierTransform.html
[66] Woodward Jr., J., Orlans, N., & Higgins, P. (2003).
Identity Assurance in the Information Age: BIOMETRICS. Berkeley: McGraw-Hill/Osborne.
[67] Yudkowsky, M. (2002, November 1). Dr. Dobb's | Voice Biometrics Application Security | November 1, 2002. Retrieved October 27, 2007, from Dr. Dobb's Portal, The World of Software Development: http://www.ddj.com/security/184405193
[68] Yun, Y. W. (2003). The '123' of Biometric Technology. Retrieved 2007, from www.cp.su.ac.th/~rawitat/teaching/forensicit06/coursefiles/files/biometric.pdf

Appendix A

Breakdown of Demographics

Number of Peaks, First Half (each row is a tally over peak counts 4-13)

Gender:
Female: 2 14 22 20 15 8 8 1
Male: 1 2 16 13 6 11 6 6

Ethnicity:
African Am: 2 7 1 6 5 4 1 1
Asian: 1 4
Hispanic: 1 1 1 2 1
White: 2 10 14 26 20 8 14 6 4

State:
AL: 2 5 8 17 9 6 6 2 2
CA: 1 3 2 1 1
CO: 1
DC: 1 1
FL: 1 1 1 1 3 1
GA: 2 1 1 2 1 1
IA: 1 3 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 2
MI: 1 1 1 1
MN: 2 2 1 3
MO: 1 1 1
MS: 1
NC: 2 1 1
NE: 1
NY: 1
OH: 1 2 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 3 3 1 1
VA: 2 1 2
WA: 1 1

Education:
Bachelor: 7 7 9 10 7 8 1 1
Grammar: 1 1
High School: 2 3 4
Master: 2 9 11 4 4 6 3 1
MD: 1 1
PHD: 2 1 3 3 2 1 2 2
Some college: 2 3 5 9 6 3 1 2
Vocational: 1

Area:
Rural: 1 2 2 2 2
Small Town: 1 4 11 13 9 5 3 4 4
Suburb: 1 9 7 15 11 6 10 2 2
Urban: 1 4 6 6 3 4 1

Height:
4 to 5: 1 2 1 1
5.1 to 5.3: 2 6 5 4 1 2
5.4 to 5.6: 2 7 8 4 11 5 4 2 1
5.7 to 5.9: 6 6 14 9 3 1 1
5.10 to 6.0: 2 5 3 1 7 4 3
6.1 to 6.3: 5 1 3 3 1 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Number of Points for the Second Half (each row is a tally over point counts 4-13)

Gender:
Female: 3 4 11 19 19 13 15 2 5 1
Male: 3 6 11 15 20 5 2

Ethnicity:
African Am: 1 1 6 3 7 5 2 2
Asian: 2 3
Hispanic: 1 1 3 1
White: 3 2 11 16 23 20 21 4 4 1

State:
AL: 3 1 5 12 13 9 10 3 11
CA: 1 3 3 1
CO: 1
DC: 1 1
FL: 2 3 2 1
GA: 1 2 2 1 1 1
IA: 1 2 3 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 3 1
MN: 2 2 3 1
MO: 1 2
MS: 1
NC: 7 9 1
NE: 1
NY: 1
OH: 2 2
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 1 3 2 1 1
VA: 1 1 2 1
WA: 1 1

Education:
Bachelor: 1 1 4 8 8 8 13 3 4
Grammar: 1 1
High School: 1 1 5 1 1
Master: 1 1 2 12 12 7 8 1 1 1
MD: 1 1
PHD: 2 1 3 5 3 2
Some college: 1 2 4 3 7 5 6 1 2
Vocational: 1

Area:
Rural: 3 3 3
Small Town: 1 2 6 11 10 10 6 4 3 1
Suburb: 1 2 5 6 14 13 17 2 1 1
Urban: 1 2 5 3 4 7 1 1 1

Height:
4 to 5: 1 1 1 1 1
5.1 to 5.3: 1 10 2 2 2 1 2
5.4 to 5.6: 2 2 6 6 8 5 12 1 2
5.7 to 5.9: 2 4 7 11 10 3 2 1
5.10 to 6.0: 1 2 1 3 5 9 3 1
6.1 to 6.3: 1 5 3 4 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Slope Distance Line

Gender:
Female: -0.99365 to -0.00651
Female: 0.00085 to 0.61721
Male: -0.49033 to -0.00286
Male: 0.00308 to 0.66138

Ethnicity:
African Am (neg): -0.59546 to -0.00651
African Am (pos): 0.00308 to 0.66138
Asian (neg): -0.26139 to -0.15716
Asian (pos): 0.0885 to 0.19448
Hispanic (neg): -0.20888 to -0.03503
Hispanic (pos): 0.00433 to 0.13985
White (neg): -0.99365 to -0.00286
White (pos): 0.00085 to 0.61721
State:
AL: -0.74824 to -0.00286
AL: 0.00085 to 0.42001
CA: -0.40647 to -0.0089
CA: 0.05684
CO: 0.16646
DC: -0.23638 and 0.38993
FL: -0.49033 to -0.00651
FL: 0.10738 to 0.66138
GA: -0.09982 to -0.02608
GA: 0.03778 to 0.58616
IA: -0.29616 to -0.03173
IA: 0.00935 to 0.61721
ID: -0.46797
IL: -0.40137 and 0.56757
KY: -0.12478 to -0.0204
LA: -0.28038 and 0.01232
MA: 0.04451
MD: -0.19547 and 0.00411
MI: -0.19405 to -0.04471
MI: 0.133425
MN: -0.41642 to -0.04392
MN: 0.0197 to 0.25693
MO: -0.05815
MO: 0.10037 to 0.26848
MS: -0.16467
NC: -0.59546 to -0.10757
NC: 0.13985 to 0.16893
NE: 0.12225
NY: -0.08948
OH: -0.19234 to -0.03795
OH: 0.00308
OR: -0.20888
PA: -0.79311
SC: 0.06591
TN: -0.09841 to -0.04037
TX: -0.99365 to -0.0147
TX: 0.00798 to 0.31808
VA: -0.11912 to -0.01772
VA: 0.01027 to 0.19143
WA: -0.32368 and 0.01558

Education:
Bachelor: -0.89478 to -0.00366
Bachelor: 0.00433 to 0.66138
Grammar: 0.00935 to 0.02014
High School: -0.22005 to -0.00651
High School: 0.00085 to 0.32173
Master: -0.79311 to -0.0094
Master: 0.00411 to 0.58616
MD: -0.40137 and 0.56757
PHD: -0.49033 to -0.0204
PHD: 0.00308 to 0.40287
Some college: -0.99365 to -0.00286
Some college: 0.01027 to 0.25693
Vocational: 0.06534

Area:
Rural: -0.40137 to -0.00286
Rural: 0.01667 to 0.32173
Small Town: -0.74824 to -0.00366
Small Town: 0.00308 to 0.66138
Suburb: -0.99365 to -0.00663
Suburb: 0.00411 to 0.38993
Urban: -0.46797 to -0.01772
Urban: 0.00085 to 0.61721

Height:
4.0 to 5.0: -0.99365 to -0.02578
4.0 to 5.0: 0.04451
5.1 to 5.3: -0.59546 to -0.04471
5.1 to 5.3: 0.00433 to 0.56757
5.4 to 5.6: -0.89478 to -0.00663
5.4 to 5.6: 0.00085 to 0.66138
5.7 to 5.9: -0.79311 to -0.00651
5.7 to 5.9: 0.00308 to 0.61721
5.10 to 6.0: -0.44787 to -0.00286
5.10 to 6.0: 0.00935 to 0.31808
6.1 to 6.3: -0.46797 to -0.04863
6.1 to 6.3: 0.01558 to 0.26175
6.4 to 6.6: -0.00366
6.7 to 6.9: -0.2121
6.10 to 7.0: -0.06889

Second Half Peak (each row is a tally over peak positions 1-13)

Gender:
Female: 13 20 13 16 14 7 3 6 3 1 1
Male: 3 12 11 4 7 9 5 7 3

Ethnicity:
African Am: 3 3 6 6 1 1 6 1
Asian: 1 1 1 2
Hispanic: 2 1 1 1 1
White: 11 24 14 13 10 15 6 7 2 1 1

State:
AL: 4 13 7 13 5 6 2 5 2
CA: 1 3 3 1
CO: 1
DC: 1 1
FL: 2 1 2 1 1 1
GA: 2 1 2 1 1 1
IA: 2 2 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 1 2 2 1 1 1
MO: 1 1 1
MS: 1
NC: 1 2 1
NE: 1
NY: 1
OH: 1 1 1 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 1 2 1 1 1 1 1
VA: 2 1 1 1
WA: 1 1

Education:
Bachelor: 1 12 7 8 6 5 2 4 4 1
Grammar: 1 1
High School: 1 2 5 1
Master: 8 11 3 5 1 4 2 5 1
MD: 1 1
PHD: 1 3 4 3 2 1 2
Some college: 5 3 5 5 3 5 3 1 1
Vocational: 1

Area:
Rural: 1 3 2 1 1 1
Small Town: 5 12 13 7 4 7 3 5 2
Suburb: 5 16 9 8 8 6 1 5 4 1
Urban: 6 3 3 3 2 2 3 2 1

Height:
4.0 to 5.0: 1 1 1 1 1
5.1 to 5.3: 4 4 6 2 2 1 1
5.4 to 5.6: 13 9 8 7 2 4 4 1
5.7 to 5.9: 4 11 6 4 4 5 3 2 1
5.10 to 6.0: 4 5 2 2 5 2 2 3
6.1 to 6.3: 2 4 3 1 1 3
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

First Half Peak (each row is a tally over peak positions 1-11)

Gender:
Female: 14 16 17 12 10 5 9 2 3 2
Male: 4 10 9 5 7 8 5 6 3 1 3

Ethnicity:
African Am: 3 2 4 4 5 4 2 3
Asian: 1 2 1 1
Hispanic: 1 1 1 1 1 1
White: 11 20 18 12 9 11 9 5 3 3 3

State:
AL: 6 9 7 10 7 6 4 3 2 1 2
CA: 3 4 1
CO: 1
DC: 1 1
FL: 1 2 1 1 2 1
GA: 3 1 1 2 1
IA: 1 1 1 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 2 1
MN: 1 3 2 2
MO: 1 1 1
MS: 1
NC: 2 1 1
NE: 1
NY: 1
OH: 1 1 1 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 2 4 1 1
VA: 2 1 1 1
WA: 1 1
Education:
Bachelor: 4 8 9 5 8 5 5 2 2 2
Grammar: 1 1
High School: 1 3 2 3
Master: 10 5 4 4 4 4 4 1 2 1 1
MD: 1 1
PHD: 3 5 3 2 1 1 1
Some college: 2 7 6 7 2 5 2 1 1
Vocational: 1

Area:
Rural: 1 1 3 1 2 1
Small Town: 7 9 12 3 5 5 4 4 3 2
Suburb: 5 13 11 9 8 5 5 2 2 2 1
Urban: 5 6 2 2 3 1 5 1 1 1

Height:
4.0 to 5.0: 1 2 1 1
5.1 to 5.3: 2 2 3 3 4 3 2 1
5.4 to 5.6: 9 6 12 6 3 6 2
5.7 to 5.9: 3 12 5 5 3 5 4 1 2
5.10 to 6.0: 1 2 4 2 3 4 1 4 1 1 2
6.1 to 6.3: 3 3 2 1 1 2 1 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Shortest Distance Between Two Peaks

Gender:
Female: 20.83334 to 29.36933
Male: 20.83342 to 24.97766

Ethnicity:
African Am: 20.83343 to 24.97766
Asian: 20.91476 to 21.5333
Hispanic: 20.83353 to 21.28297
White: 20.83334 to 29.36933

State:
AL: 20.83334 to 26.0197
CA: 20.83416 to 22.48856
CO: number too high
DC: 21.33177 to 22.36116
FL: 20.83479 to 24.97766
GA: 20.84042 to 24.14853
IA: 20.83424 to 24.48203
ID: 23.00167
IL: 22.44879 to 23.95501
KY: 20.83767 to 20.99489
LA: 20.83491 to 21.6367
MA: 20.85396
MD: 20.83351 to 21.22761
MI: 20.85415 to 21.22195
MN: 20.83737 to 22.56749
MO: 20.86852 to 21.5711
MS: 21.1139
NC: 20.95352 to 24.24713
NE: 20.98844
NY: 27.95571
OH: 20.83343 to 21.21518
OR: 21.28297
PA: 26.59026
SC: 20.87853
TN: 20.8503 to 20.93396
TX: 20.834 to 29.36933
VA: 20.83443 to 21.21162
WA: 20.83586 to 21.8975

Education:
Bachelor: 20.83347 to 27.95571
Grammar: 20.83424 to 20.83756
High School: 20.83334 to 21.885
Master: 20.83351 to 26.59026
MD: 22.44879 to 23.95501
PHD: 20.83343 to 23.20301
Some college: 20.83342 to 29.36933
Vocational: 20.87776

Area:
Rural: 20.83342 to 22.44879
Small Town: 20.83343 to 26.0197
Suburb: 20.83351 to 29.36933
Urban: 20.83334 to 24.48203

Height:
4.0 to 5.0: 20.84026 to 29.36933
5.1 to 5.3: 20.83353 to 24.24713
5.4 to 5.6: 20.83334 to 27.95571
5.7 to 5.9: 20.83343 to 26.59026
5.10 to 6.0: 20.83342 to 22.82738
6.1 to 6.3: 20.83586 to 23.00167
6.4 to 6.6: 20.83347
6.7 to 6.9: 21.2968
6.10 to 7.0: 20.8827

Another Second Half Peak (each row is a tally over peak positions 1-11)

Gender:
Female: 11 14 11 14 14 6 8 4 3 1 2
Male: 10 7 11 3 6 5 10 7 1 1

Ethnicity:
African Am: 3 5 3 2 3 5 2 2 1
Asian: 1 3 1
Hispanic: 1 1 2 1 1
White: 17 18 13 10 15 8 10 9 1 1

State:
AL: 5 7 10 9 10 3 5 3 2 2
CA: 1 2 2 1 2
CO: 1
DC: 1 1
FL: 1 1 1 1 1 2 1
GA: 2 1 2 1 2
IA: 1 2 1 1 2
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 2 2 3 1
MO: 1 1 1
MS: 1
NC: 1 1 1 1
NE: 1
NY: 1
OH: 1 1 2
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 2 1 1 1 2 1 1
VA: 3 1 1
WA: 1 1
Education:
Bachelor: 8 8 8 7 9 4 1 1 2 1 1
Grammar: 1 1
High School: 2 1 4 1 1
Master: 6 3 5 6 2 4 9 3 1 1
MD: 1 1
PHD: 3 2 1 1 2 4 3
Some college: 2 6 4 4 5 1 3 2 1 1
Vocational: 1

Area:
Rural: 2 3 1 1 1 1
Small Town: 8 5 8 5 7 4 8 7 2 1 1
Suburb: 18 11 9 10 9 6 5 2 2 1
Urban: 3 2 4 3 2 1 5 3 1

Height:
4.0 to 5.0: 1 1 1 1 1
5.1 to 5.3: 2 3 2 6 2 3 1 1
5.4 to 5.6: 7 4 8 8 7 1 4 1 2 1 1
5.7 to 5.9: 5 9 7 4 3 3 6 3
5.10 to 6.0: 4 3 2 1 6 1 5 2 1
6.1 to 6.3: 2 1 3 1 1 2 4
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Another First Half Peak (each row is a tally over peak positions 1-11)

Gender:
Female: 10 14 16 13 10 9 7 6 3 2
Male: 12 7 7 2 6 4 9 7 2 3

Ethnicity:
African Am: 3 3 4 2 4 4 3 3 1
Asian: 1 1 2 1
Hispanic: 1 1 1 1 1 1
White: 17 16 18 8 13 8 9 9 1 1 3

State:
AL: 6 5 12 5 8 4 8 4 2 2
CA: 3 2 1 1 1
CO: 1
DC: 1 1
FL: 2 2 1 1 1 1
GA: 3 1 2 1 1
IA: 1 2 1 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 2 3 1 1 1
MO: 1 2
MS: 1
NC: 1 1 1 1
NE: 1
NY: 1
OH: 1 1 1 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 2 1 2 3 1
VA: 4 1
WA: 1 1

Education:
Bachelor: 8 9 10 6 8 3 1 2 3
Grammar: 1 1
High School: 1 3 3 1 1
Master: 7 1 5 4 2 6 5 6 2 2
MD: 1 1
PHD: 3 1 2 1 2 4 1 1 1
Some college: 2 7 3 4 4 2 5 2 1
Vocational: 1

Area:
Rural: 2 2 3 1 1
Small Town: 7 7 9 2 7 4 6 4 4 1 2
Suburb: 10 10 8 10 4 7 6 5 1 1
Urban: 3 2 3 3 5 1 4 3 1

Height:
4.0 to 5.0: 1 1 1 1 1
5.1 to 5.3: 1 3 6 2 3 1 3 1
5.4 to 5.6: 8 4 8 6 4 5 2 3 2 2
5.7 to 5.9: 4 10 6 6 2 2 5 3 2
5.10 to 6.0: 6 1 1 6 3 3 3 2
6.1 to 6.3: 2 2 2 1 1 1 3 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Second Half Number of Direction Changes (each row is a tally over 0-9 changes)

Gender:
Female: 2 4 12 15 20 18 6 8 2 3
Male: 4 6 16 16 11 13 8 2 1

Ethnicity:
African Am: 5 6 6 4 4 1 1
Asian: 1 1 1 1 1
Hispanic: 2 1 2 1
White: 1 7 11 20 24 24 8 6 1 2

State:
AL: 1 5 9 12 9 11 4 2 1 3
CA: 4 1 2 1
CO: 1
DC: 1 1
FL: 1 1 2 1 2 1
GA: 1 2 2 1 1 1
IA: 2 2 1 2
ID: 1
IL: 1 1
KY: 1 1
LA: 2
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 2 2 2 1 1
MO: 1 2
MS: 1
NC: 1 3
NE: 1
NY: 1
OH: 1 1 2
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 1 3 1 2 1
VA: 1 4
WA: 1 1

Education:
Bachelor: 4 3 12 9 8 5 5 2 2
Grammar: 1 1
High School: 1 2 2 3 1
Master: 2 10 7 7 6 4 2 2
MD: 1 1
PHD: 1 2 4 2 4 3
Some college: 2 1 1 5 10 9 2 1
Vocational: 1

Area:
Rural: 1 3 2 3
Small Town: 1 3 9 13 11 9 6 1 1
Suburb: 1 2 7 12 10 14 7 7 2 1
Urban: 2 2 3 8 5 1 2 2

Height:
4.0 to 5.0: 1 1 2 1
5.1 to 5.3: 2 4 7 3 2 2
5.4 to 5.6: 1 2 7 10 6 8 6 3 1
5.7 to 5.9: 1 3 4 8 7 10 3 3 1
5.10 to 6.0: 1 3 5 6 6 2 1
6.1 to 6.3: 2 2 3 3 1 1 1 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

First Half Number of Direction Changes (each row is a tally over 1-9 changes)

Gender:
Female: 2 12 19 17 23 7 8 1 1
Male: 2 10 5 12 16 12 4

Ethnicity:
African Am: 7 2 7 8 3 1
Asian: 2 1 1 1
Hispanic: 1 1 1 1 2
White: 3 11 18 21 28 11 10 1 1

State:
AL: 3 11 7 13 11 8 4
CA: 2 2 1 1 1 1
CO: 1
DC: 1 1
FL: 1 1 1 1 2 1 1
GA: 1 2 2 2 1
IA: 2 3 2
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 2
MN: 2 3 2 1
MO: 1 1 1
MS: 1
NC: 1 1 2
NE: 1
NY: 1
OH: 1 2 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 2 1 2 2 1
VA: 1 1 2 1
WA: 1 1
Education:
Bachelor: 1 4 12 9 11 7 6 1
Grammar: 1 1
High School: 1 1 1 1 3 2
Master: 8 4 10 10 5 3
MD: 1 1
PHD: 1 3 2 5 3 2
Some college: 1 4 6 3 10 3 3 1
Vocational: 1

Area:
Rural: 1 2 2 2 2
Small Town: 2 8 10 10 9 7 6 1
Suburb: 2 8 9 13 19 5 5 1
Urban: 3 3 3 9 5 1

Height:
4.0 to 5.0: 1 1 2 1
5.1 to 5.3: 3 3 6 4 2 1 1
5.4 to 5.6: 2 5 7 6 9 4 6
5.7 to 5.9: 1 6 9 8 10 5 1
5.10 to 6.0: 1 4 3 3 6 4 4
6.1 to 6.3: 2 6 4 2
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Total Second X Distance

Gender:
Female: 625 to 979.16667
Male: 687.5 to 979.16667

Ethnicity:
African Am: 645.83333 to 958.33333
Asian: 875 to 937.5
Hispanic: 875 to 937.5
White: 625 to 979.16667

State:
AL: 625 to 979.16667
CA: 729.16667 to 937.5
CO: 812.5
DC: 645.83333 to 937.5
FL: 791.66667 to 937.5
GA: 687.5 to 895.83333
IA: 729.16667 to 895.83333
ID: 958.33333
IL: 708.33333 to 916.66667
KY: 812.5 to 958.33333
LA: 833.33333 to 916.66667
MA: 833.33333
MD: 708.33333 to 770.83333
MI: 895.83333 to 937.5
MN: 770.83333 to 958.33333
MO: 812.5 to 937.5
MS: 750
NC: 812.5 to 895.83333
NE: 833.33333
NY: 895.83333
OH: 854.16667 to 895.83333
OR: too low
PA: 812.5
SC: 812.5
TN: 729.16667 to 854.16667
TX: 708.33333 to 958.33333
VA: 833.33333 to 979.16667
WA: 770.83333 to 916.66667

Education:
Bachelor: 625 to 979.16667
Grammar: 875 to 895.83333
High School: 625 to 937.5
Master: 708.33333 to 979.16667
MD: 708.33333 to 916.66667
PHD: 791.66667 to 937.5
Some college: 645.83333 to 979.16667
Vocational: 895.83333

Area:
Rural: 687.5 to 937.5
Small Town: 625 to 937.5
Suburb: 645.83333 to 979.16667
Urban: 625 to 979.16667

Height:
4.0 to 5.0: 812.5 to 937.5
5.1 to 5.3: 687.5 to 958.33333
5.4 to 5.6: 625 to 979.16667
5.7 to 5.9: 708.33333 to 979.16667
5.10 to 6.0: 625 to 937.5
6.1 to 6.3: 750 to 979.16667
6.4 to 6.6: 854.16667
6.7 to 6.9: 875
6.10 to 7.0: 916.66667

Average Second Slope

Gender:
Female: -0.09203 to -0.00038
Female: 0.00075 to 0.06744
Male: -0.04657 to -0.00139
Male: 0.00142 to 0.02074

Ethnicity:
African Am: -0.04718 to -0.00252
African Am: 0.00142 to 0.01328
Asian: -0.03222 to -0.01302
Hispanic: -0.03626 to -0.01104
Hispanic: 0.00496
White: -0.09203 to -0.00038
White: 0.00075 to 0.06744

State:
AL: -0.0599 to -0.00247
AL: 0.00077 to 0.03527
CA: -0.03874 to -0.02068
CA: 0.00637 to 0.03682
CO: -0.00361
DC: -0.04232 to -0.02347
FL: -0.04211 to -0.00038
FL: 0.00933
GA: -0.03888 to -0.00724
GA: 0.00222 to 0.02074
IA: -0.03711 to -0.01016
IA: 0.02933 to 0.06744
ID: -0.01791
IL: -0.0273 to -0.00265
KY: -0.03072 and 0.01225
LA: -0.02314 to -0.00319
MA: -0.01701
MD: -0.02163 to -0.00692
MI: -0.03305 to -0.00253
MN: -0.04381 to -0.01107
MN: 0.00238
MO: -0.04718
MO: 0.00075 to 0.03882
MS: -0.01732
NC: -0.02475 to -0.00139
NC: 0.00496
NE: -0.01892
NY: -0.0396
OH: -0.03398 to -0.0037
OR: -0.01355
PA: -0.04362
SC: 0.01177
TN: -0.01908 to -0.00572
TX: -0.0845 to -0.01101
TX: 0.01218 to 0.01369
VA: -0.09203 to -0.01039
VA: 0.00855
WA: -0.02758 to -0.01922
Education:
Bachelor: -0.09203 to -0.00263
Bachelor: 0.00075 to 0.03682
Grammar: -0.01016 and 0.06744
High School: -0.03305 to -0.00572
High School: 0.00933 to 0.02933
Master: -0.04718 to -0.00038
Master: 0.00295 to 0.03882
MD: -0.0273 to -0.00265
PHD: -0.04163 to -0.0037
Some college: -0.0845 to -0.00139
Some college: 0.00077 to 0.01218
Vocational: -0.02185

Area:
Rural: -0.04657 to -0.00265
Rural: 0.00077
Small Town: -0.09203 to -0.00038
Small Town: 0.00142 to 0.03682
Suburb: -0.0845 to -0.00139
Suburb: 0.00075 to 0.06744
Urban: -0.04718 to -0.00247
Urban: 0.00738 to 0.02933

Height:
4.0 to 5.0: -0.0845 to -0.01101
4.0 to 5.0: 0.01113
5.1 to 5.3: -0.04649 to -0.0149
5.1 to 5.3: 0.00077 to 0.03682
5.4 to 5.6: -0.09203 to -0.00038
5.4 to 5.6: 0.00238 to 0.03513
5.7 to 5.9: -0.0599 to -0.00139
5.7 to 5.9: 0.00222 to 0.06744
5.10 to 6.0: -0.03745 to -0.00253
5.10 to 6.0: 0.00075 to 0.03527
6.1 to 6.3: -0.03548 to -0.00265
6.4 to 6.6: -0.00263
6.7 to 6.9: -0.01488
6.10 to 7.0: -0.00816

Average Second Y Distance

Gender:
Female: 0.76482 to 20.11604
Male: 1.06771 to 9.62666

Ethnicity:
African Am: 1.53423 to 9.16597
Asian: 1.81319 to 8.46621
Hispanic: 1.88871 to 6.26574
White: 0.76482 to 20.11604

State:
AL: 0.76482 to 18.98124
CA: 2.19067 to 10.75513
CO: 4.46694
DC: 1.70472 to 6.59579
FL: 3.20572 to 9.62666
GA: 3.85139 to 7.79519
IA: 1.06771 to 7.37868
ID: 2.83795
IL: 3.09743 to 6.44515
KY: 4.94241 to 7.51344
LA: 1.68953 to 2.76435
MA: 3.27227
MD: 5.3831 to 9.16597
MI: 2.11142 to 6.26574
MN: 1.76756 to 7.2374
MO: 5.13771 to 8.50846
MS: 6.75352
NC: 5.08627 to 8.36667
NE: 2.02068
NY: 9.16335
OH: 1.53423 to 5.33724
OR: 1.88871
PA: 6.88837
SC: 7.37903
TN: 3.22839
TX: 2.63324 to 11.10742
VA: 1.76285 to 5.2077
WA: 3.10199 to 6.96012

Education:
Bachelor: 1.68953 to 18.98124
Grammar: 1.06771
High School: 3.22839 to 11.40787
Master: 0.76482 to 15.10384
MD: 3.09743 to 6.44515
PHD: 1.53423 to 5.33724
Some college: 1.30732 to 16.67643
Vocational: 3.93314

Area:
Rural: 2.48434 to 12.7076
Small Town: 1.09057 to 17.44381
Suburb: 1.30732 to 18.98124
Urban: 0.76482 to 11.10742

Height:
4.0 to 5.0: 3.27227 to 5.97455
5.1 to 5.3: 2.48434 to 16.67643
5.4 to 5.6: 1.30732 to 16.62409
5.7 to 5.9: 1.53423 to 20.11604
5.10 to 6.0: 1.06771 to 8.46621
6.1 to 6.3: 0.76482 to 7.2374
6.4 to 6.6: 5.65771
6.7 to 6.9: 8.42994
6.10 to 7.0: 2.82452

Total Second Y Distance

Gender:
Female: 9.22929 to 125.33082
Male: 14.6623 to 102.74155

Ethnicity:
African Am: 31.21441 to 73.10173
Asian: 21.76651 to 26.58681
Hispanic: 36.28649 to 39.22791
White: 9.22929 to 84.39188

State:
AL: 15.0501 to 83.69464
CA: 21.27234 to 36.28649
CO: 21.51626
DC: 40.44849
FL: 64.77386 to 73.10173
GA: 58.55602 to 63.5027
IA: 32.68834 to 38.24517
ID: 21.63775
IL: 9.22929
KY: 27.36293
LA: 20.66605
MA: 74.36435
MD: 31.21441 to 31.91552
MN: 58.65523 to 61.37713
MO: 75.72876 to 81.3068
MS: 25.6336
NE: 14.6623
NY: 14.52795
OH: 41.07338 to 42.37182
OR: 58.74272
PA: 58.8329
TN: 35.76295
TX: 40.51476 to 43.82436
WA: 32.10158
Education:
Bachelor: 20.66605 to 83.69464
Grammar: 63.10697 and 125.33082
High School: 35.76295 to 38.24517
Master: 15.0501 to 82.374
MD: 9.22929 and 42.43057
PHD: 32.68834 to 48.42309
Some college: 20.67707 to 74.3866
Vocational: 161.03839

Area:
Rural: 22.50552 to 24.90808
Small Town: 18.81512 to 62.91221
Suburb: 14.52795 to 84.39188
Urban: 9.22929 to 58.65074

Height:
4.0 to 5.0: 72.65738 to 74.36435
5.1 to 5.3: 9.22929 to 63.5027
5.4 to 5.6: 20.66605 to 84.30005
5.7 to 5.9: 20.67707 to 75.72876
5.10 to 6.0: 24.90808 to 59.23719
6.1 to 6.3: 21.63775 to 32.68834
6.4 to 6.6: 59.91936
6.7 to 6.9: 140.9927
6.10 to 7.0: 27.18616

Average Second X Distance

Gender:
Female: 78.18906 to 182.41839
Male: 76.49549 to 142.53808

Ethnicity:
African Am: 76.49549 to 142.83182
Asian: 97.96869 to 102.7372
Hispanic: 101.67403 to 104.70597
White: 84.47018 to 180.62882

State:
AL: 83.52587 to 180.62882
CA: 93.86788 to 97.54918
CO: 101.84681
DC: 92.63231 and 117.7726
FL: 101.67403 to 135.86438
GA: 98.65994 to 102.18956
IA: 104.38275 to 112.15444
ID: 119.96576
IL: 131.11741 and 142.0346
KY: 87.8971 and 162.67324
LA: 92.7988 and 131.01533
MA: 166.78481
MD: 142.83182 and 192.91273
MI: 101.20673 to 105.40881
MN: 99.06263 to 100.07749
MO: 95.36461 and 104.98533
MS: 108.14805 and 137.36484
NC: 112.75407 to 113.76038
NE: 104.2441
NY: 129.1797
OH: 95.10007 to 97.40779
OR: 108.4114
PA: 116.5901
SC: 136.18319
TN: 122.46823 and 170.94108
TX: 87.69269 to 91.1281
VA: 122.15614 to 123.55207
WA: 101.9961 and 128.97889

Education:
Bachelor: 83.52587 to 179.62909
Grammar: 99.57493 and 112.15444
High School: 104.38275 to 125.98373
Master: 94.0926 to 156.35884
MD: 131.11741 and 142.0346
PHD: 87.69269 to 134.07075
Some college: 93.06539 to 128.6602
Vocational: 99.70921

Area:
Rural: 99.68339 to 132.10784
Small Town: 76.49549 to 171.02154
Suburb: 85.73891 to 142.83182
Urban: 89.31884 to 123.55207

Height:
4.0 to 5.0: 106.30586 to 108.39525
5.1 to 5.3: 91.1281 to 143.74624
5.4 to 5.6: 84.47018 to 142.83182
5.7 to 5.9: 87.88233 to 137.36484
5.10 to 6.0: 93.06539 to 125.15975
6.1 to 6.3: 100.07749 to 131.11741
6.4 to 6.6: 107.47803
6.7 to 6.9: 97.96869
6.10 to 7.0: 102.01642

Average First Slope

Gender:
Female: -0.06508 to -0.00009
Female: 0.00085 to 0.06542
Male: -0.03581 to -0.00002
Male: 0.00069 to 0.03931

Ethnicity:
African Am: -0.03077 to -0.00002
African Am: 0.00172 to 0.02396
Asian: -0.02027 to -0.00341
Asian: 0.02443
Hispanic: -0.01544 to -0.0013
Hispanic: 0.00456 to 0.02385
White: -0.06508 to -0.00009
White: 0.00069 to 0.06542

State:
AL: -0.06508 to -0.00064
AL: 0.00069 to 0.05792
CA: -0.03278 to -0.00017
CA: 0.01528 to 0.03931
CO: -0.0201
DC: -0.01278 to -0.00919
FL: -0.01494 to -0.00165
FL: 0.00098 to 0.00172
GA: -0.02813 to -0.00821
GA: 0.00271
IA: -0.02583 to -0.00276
IA: 0.00086 to 0.03344
ID: -0.00708
IL: -0.01453 to -0.0063
KY: -0.01113 to -0.00061
LA: -0.01003 to -0.00009
MA: 0.0454
MD: 0.00816 to 0.01769
MI: -0.00002
MI: 0.01623 to 0.03491
MN: -0.01046 to -0.00175
MN: 0.0096 to 0.01316
MO: 0.00184 to 0.01882
MS: 0.00305
NC: -0.03077 to -0.00488
NC: 0.00456 to 0.00705
NE: 0.02295
NY: -0.03553
OH: -0.02016 to -0.00147
OH: 0.0067
OR: -0.0013
PA: -0.01717
SC: -0.01212
TN: 0.00085 to 0.01455
TX: -0.05697 to -0.00385
TX: 0.01909 to 0.02585
VA: -0.03581 to -0.00131
VA: 0.02055
WA: -0.02093 and 0.06542
Education:
Bachelor: -0.05697 to -0.00009
Bachelor: 0.00069 to 0.06542
Grammar: -0.0243 to -0.00525
High School: -0.02119 to -0.00276
High School: 0.00085 to 0.03491
Master: -0.02417 to -0.00064
Master: 0.00184 to 0.05792
MD: -0.01453 to -0.0063
PHD: -0.03581 to -0.00147
PHD: 0.00098 to 0.02443
Some college: -0.06508 to -0.00002
Some college: 0.0027 to 0.05175
Vocational: -0.02403

Area:
Rural: -0.03097 to -0.00341
Rural: 0.01476 to 0.03491
Small Town: -0.06508 to -0.00002
Small Town: 0.00085 to 0.06542
Suburb: -0.05697 to -0.00017
Suburb: 0.00069 to 0.04558
Urban: -0.02583 to -0.00147
Urban: 0.00086 to 0.03344

Height:
4.0 to 5.0: -0.02583
4.0 to 5.0: 0.02501 to 0.0454
5.1 to 5.3: -0.03278 to -0.00009
5.1 to 5.3: 0.00271 to 0.06542
5.4 to 5.6: -0.06508 to -0.00017
5.4 to 5.6: 0.00085 to 0.02385
5.7 to 5.9: -0.05697 to -0.00276
5.7 to 5.9: 0.00086 to 0.05792
5.10 to 6.0: -0.03581 to -0.00002
5.10 to 6.0: 0.00098 to 0.0194
6.1 to 6.3: -0.02093 to -0.00536
6.1 to 6.3: 0.00069 to 0.01094
6.4 to 6.6: -0.00443
6.7 to 6.9: -0.01882
6.10 to 7.0: -0.00476

Average Slope Between Points

Gender:
Female: -0.027925 to -0.000141
Female: 0.000187 to 0.066688
Male: -0.026729 to -0.000343
Male: 0.000795 to 0.013462

Ethnicity:
African Am: -0.017488 to -0.000981
African Am: 0.002743 to 0.012799
Asian: -0.017617 to -0.002208
Hispanic: -0.020726 to -0.002527
Hispanic: 0.008018
White: -0.027925 to -0.000141
White: 0.000187 to 0.066688

State:
AL: -0.026729 to -0.000141
AL: 0.000215 to 0.008716
CA: -0.020726 to -0.006276
CA: 0.000187 to 0.013462
CO: -0.013631
DC: -0.012747 to -0.012487
FL: -0.01753 to -0.002527
GA: -0.027925 to -0.003762
GA: 0.007189
IA: -0.021166 to -0.003804
ID: -0.007681
IL: -0.015346 and 0.002814
KY: -0.015761 to -0.015108
LA: -0.005995 to -0.003574
MA: -0.010597
MD: -0.009174 and 0.012799
MI: -0.022877 to -0.003417
MI: 0.008018
MN: -0.019799 to -0.001587
MO: -0.005814 to -0.001373
MS: -0.010884
NC: -0.009676 to -0.005234
NC: 0.002364
NE: 0.000795
NY: -0.00867
OH: -0.012849 to -0.002632
OR: -0.013285
PA: -0.0158
SC: -0.000981
TN: -0.002764 and 0.003036
TX: -0.018391 to -0.000344
TX: 0.001582 to 0.002319
VA: -0.010248 to -0.00246
VA: 0.066688
WA: -0.008811 and 0.002016
Education:
Bachelor: -0.027925 to -0.000141
Bachelor: 0.002319 to 0.066688
Grammar: -0.010519 to -0.003804
High School: -0.022877 to -0.006276
High School: 0.003036
Master: -0.026246 to -0.001089
Master: 0.000215 to 0.012799
MD: -0.015346 and 0.002814
PHD: -0.018391 to -0.000343
PHD: 0.000795
Some college: -0.02123 to -0.00184
Some college: 0.000187 to 0.002513
Vocational: 0.002743

Area:
Rural: -0.022877 to -0.005032
Rural: 0.002814
Small Town: -0.026246 to -0.000343
Small Town: 0.000795 to 0.066688
Suburb: -0.027925 to -0.000141
Suburb: 0.000187 to 0.013462
Urban: -0.021166 to -0.001587
Urban: 0.000215 to 0.002319

Height:
4.0 to 5.0: -0.020907 to -0.010597
4.0 to 5.0: 0.001582
5.1 to 5.3: -0.027925 to -0.000981
5.1 to 5.3: 0.002319
5.4 to 5.6: -0.026729 to -0.000344
5.4 to 5.6: 0.000215 to 0.066688
5.7 to 5.9: -0.025951 to -0.000343
5.7 to 5.9: 0.000187 to 0.002513
5.10 to 6.0: -0.026246 to -0.000141
5.10 to 6.0: 0.002743 to 0.008716
6.1 to 6.3: -0.019756 to -0.004101
6.1 to 6.3: 0.002016 to 0.002814
6.4 to 6.6: -0.002208
6.7 to 6.9: -0.00753
6.10 to 7.0: -0.012586

Average Difference Between Points

Gender:
Female: 127.2556 to 399.7623
Male: 133.392 to 482.5931

Ethnicity:
African Am: 152.3675 to 301.8281
Asian: 183.6524 to 371.154
Hispanic: 159.1971 to 265.7289
White: 133.392 to 482.5931

State:
AL: 133.392 to 482.5931
CA: 181.1088 to 327.0401
CO: 201.214
DC: 199.3032 to 277.4044
FL: 152.3675 to 259.496
GA: 127.2556 to 326.1891
IA: 168.036 to 388.1414
ID: 198.5997
IL: 195.3549 to 265.8221
KY: 205.1516 to 278.3755
LA: 207.8237 to 224.2535
MA: 186.3697
MD: 243.7944 to 256.8543
MI: 159.1971 to 254.0115
MN: 173.2406 to 278.3502
MO: 225.3363 to 356.8013
MS: 156.3135
NC: 142.1766 to 237.0423
NE: 232.1382
NY: 286.2156
OH: 170.7894 to 246.116
OR: 256.2844
PA: 227.4499
SC: 159.1133
TN: 250.8339 to 399.7623
TX: 159.6963 to 265.6611
VA: 161.8613 to 316.4243
WA: 219.7375 to 248.5473

Education:
Bachelor: 159.1133 to 371.154
Grammar: 168.036 to 242.9888
High School: 193.7717 to 388.1414
Master: 133.392 to 482.5931
MD: 195.3549 to 265.8221
PHD: 159.6963 to 306.7029
Some college: 127.2556 to 349.0001
Vocational: 247.1431

Area:
Rural: 171.9593 to 349.0001
Small Town: 127.2556 to 482.5931
Suburb: 133.392 to 356.8013
Urban: 168.036 to 399.7623

Height:
4 to 5: 186.3697 to 308.6529
5.1 to 5.3: 159.1133 to 346.3599
5.4 to 5.6: 159.1971 to 399.7623
5.7 to 5.9: 127.2556 to 388.1414
5.10 to 6.0: 145.1255 to 482.5931
6.1 to 6.3: 156.3135 to 367.2222
6.4 to 6.6: 221.1146
6.7 to 6.9: 184.2821
6.10 to 7.0: 133.392

Difference For Y

Gender:
Female: 3.1695 to 24.202
Male: 4.6963 to 25.4026

Ethnicity:
African Am: 3.1695 to 24.5502
Asian: 9.9157 to 18.0594
Hispanic: 4.3011 to 17.2095
White: 3.5783 to 25.4026

State:
AL: 3.1695 to 25.4026
CA: 9.8764 to 21.414
CO: 12.6157
DC: 12.6416 to 16.149
FL: 9.217 to 21.5992
GA: 5.8614 to 16.7694
IA: 8.2849 to 24.1218
ID: 14.29
IL: 12.6083 to 21.7575
KY: 16.3881 to 18.3383
LA: 7.6347 to 7.8482
MA: 15.7179
MD: 16.7983
MI: 4.3011 to 23.099
MN: 3.8212 to 19.9132
MO: 9.1064 to 11.3593
MS: 16.5769
NC: 6.1231 to 17.9883
NE: 19.4994
NY: 19.5033
OH: 6.7098 to 24.5502
OR: 17.2095
PA: 15.2904
SC: 16.2137
TN: 8.195 to 11.8413
TX: 6.0749 to 21.0472
VA: 6.5411 to 18.2006
WA: 7.2793 to 15.9411
Education:
Bachelor: 3.8212 to 24.202
Grammar: 12.1297 to 13.8139
High School: 8.7087 to 24.1218
Master: 3.1695 to 25.4026
MD: 12.6083 to 21.7575
PHD: 6.7098 to 24.5502
Some college: 3.5783 to 21.414
Vocational: 15.2961

Area:
Rural: 4.6963 to 23.099
Small Town: 3.5783 to 25.4026
Suburb: 3.1695 to 21.414
Urban: 3.8212 to 24.1218

Height:
4 to 5: 12.0287 to 17.1356
5.1 to 5.3: 3.1695 to 23.099
5.4 to 5.6: 3.8212 to 24.202
5.7 to 5.9: 4.6963 to 24.5502
5.10 to 6.0: 6.7098 to 25.4026
6.1 to 6.3: 9.4881 to 21.7575
6.4 to 6.6: 10.5669
6.7 to 6.9: 13.9175
6.10 to 7.0: 14.0919

Difference For X

Gender:
Female: 429.6875 to 1371.0938
Male: 542.9688 to 1328.125

Ethnicity:
African Am: 710.9375 to 1371.0938
Asian: 757.8125 to 1285.1563
Hispanic: 542.9688 to 1281.25
White: 429.6875 to 1328.125

State:
AL: 621.0938 to 1285.1563
CA: 542.9688 to 1230.4688
CO: 1207.0313
DC: 996.0938 to 1109.375
FL: 968.75 to 1296.875
GA: 429.6875 to 1371.0938
IA: 671.875 to 1226.5625
ID: 1191.4063
IL: 976.5625 to 1328.125
KY: 820.3125 to 1113.2813
LA: 1039.0625 to 1121.0938
MA: 1117.1875
MD: 1218.75 to 1281.25
MI: 636.7188 to 1125
MN: 832.0313 to 1148.4375
MO: 675.7813 to 1164.0625
MS: 1093.75
NC: 648.4375 to 1277.3438
NE: 1160.1563
NY: 1144.5313
OH: 984.375 to 1195.3125
OR: 1281.25
PA: 1136.7188
SC: 1113.2813
TN: 1199.2188 to 1253.9063
TX: 531.25 to 1195.3125
VA: 894.5313 to 1132.8125
WA: 878.9063 to 1242.1875

Education:
Bachelor: 429.6875 to 1296.875
Grammar: 671.875 to 1214.8438
High School: 828.125 to 1253.9063
Master: 675.7813 to 1371.0938
MD: 976.5625 to 1328.125
PHD: 757.8125 to 1285.1563
Some college: 621.0938 to 1277.3438
Vocational: 988.2813

Area:
Rural: 621.0938 to 1328.125
Small Town: 648.4375 to 1296.875
Suburb: 429.6875 to 1371.0938
Urban: 542.9688 to 1199.2188

Height:
4 to 5: 925.7813 to 1195.3125
5.1 to 5.3: 429.6875 to 1238.2813
5.4 to 5.6: 542.9688 to 1371.0938
5.7 to 5.9: 531.25 to 1285.1563
5.10 to 6.0: 671.875 to 1250
6.1 to 6.3: 722.6563 to 1328.125
6.4 to 6.6: 1105.4688
6.7 to 6.9: 1105.4688
6.10 to 7.0: 933.5938

Appendix B

Screen Shots of HTML Pages Used for Data Collection

Figure B.1: The information page from the data-gathering website.
Figure B.2: The Demographic Survey page from the data-gathering website.
Figure B.3: The Phone Instruction page from the data-gathering website.
Figure B.4: The User ID and PIN given on the Phone Instruction page from the data-gathering website.
Figure B.5: The four (4) digit number given on the Phone Instruction page from the data-gathering website.
Figure B.6: The message that all participants will leave, located on the Phone Instruction page from the data-gathering website.