Classifying Speakers Using Voice Biometrics In a Multimodal World

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Kenneth Arthur Rouse

Certificate of Approval:

Juan E. Gilbert, Chair, Professor, Computer Science and Software Engineering
Cheryl D. Seals, Associate Professor, Computer Science and Software Engineering
Richard Chapman, Associate Professor, Computer Science and Software Engineering
George T. Flowers, Dean, Graduate School

Classifying Speakers Using Voice Biometrics In a Multimodal World

Kenneth Arthur Rouse

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 10, 2009

Classifying Speakers Using Voice Biometrics In a Multimodal World

Kenneth Arthur Rouse

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author

Date of Graduation

Dissertation Abstract

Classifying Speakers Using Voice Biometrics In a Multimodal World

Kenneth Arthur Rouse

Doctor of Philosophy, August 10, 2009
(M.S., Tarleton State University, 1998)
(B.S., South Dakota State University, 1982)
(A.A., Christ For The Nations, 1985)

159 Typed Pages

Directed by Juan Gilbert

This dissertation describes a research study conducted to determine whether a classification for a person is obtainable by using the person's voice. The intent of this work was to investigate a collection of voice samples for trends that could potentially lead to parameters to be used in the classification of an individual. No specific classification area, such as gender or ethnicity, was sought; it was preferred to allow the results to dictate the characteristics that point to a particular classification group. In the data collection stage, each participant was given the same task, and analysis was then done on the voice sample given. Analysis was conducted in phases, with the first phase focusing on the time domain, which resulted in parameters approximating speed of speech and the amount of pause in the sample. Next, the frequency domain was investigated, focusing on the complexity of speech and voice tone attributes. The inquiries into this domain concluded with the peaks in the frequency of the voice being tracked by frequency threads and represented numerically by a third-order polynomial. It is the coefficients of this polynomial that give a representation of an individual's voice, making it possible to classify the speaker into a particular group. To verify this, the coefficients from these polynomials were used with a clustering application to validate the hypotheses of this study, substantiating an objective to provide empirical user data to contribute to the design of future phone system communications.

Acknowledgments

It is with great pleasure that I give thanks to the many who have made this dissertation possible. First and foremost I want to thank my Lord and Savior Jesus Christ for giving me the grace and wisdom to accomplish this major goal in my life. For without Him I can do nothing, but with Him I can do all things. Second only to my Lord, I want to thank my wife Darlene, as it has been 9 long years on this journey to completion.
I want to thank her for all the times that she had to cover the home front when I was studying for classes, working on projects, studying for the qualifying exams (TWICE), and all the many hours of working on this project. It really was a "we" effort! I want to also thank my children: Heather, especially, for all the LaTeX tables that she made, the data that she entered, and all the proofreading that she has done. You were a great research assistant! To my boys, Jhett and Jonathan, who did a huge amount of work at home when I was at the office. This could not have happened without them helping to cover the home front with Darlene. To my son Jimmy Jimmy Jimmy, thank you for all those words of encouragement and for believing that the day would come that it would be done. To my good friend and brother in the Lord, Dan, thank you for all the brainstorming sessions and for keeping me going in the right direction. A special thank you to all of my family and friends that have been praying for me and my family during this time.

Next, I want to thank my dissertation advisor, Dr. Juan E. Gilbert, for his support and encouragement, for helping to define the scope of the project, and for assisting in the many publication opportunities outside this project. Auburn's loss is truly Clemson's gain! I also wish to thank the rest of my committee, Dr. Cheryl Seals, Dr. Richard Chapman and Dr. Susana Morris, for working with me on such a tight schedule. My thanks to my fellow HCCL lab members Dr. Shaun Gittens, Yolanda McMillan, Dr. Idongesit MkPong-Ruffin, Wanda Eugene and Vincent Cross for their help in reviewing the proposal for this work. For the task of painstakingly reading through my entire dissertation and making some very valuable suggestions, I want to especially thank Yolanda McMillan, Philicity Williams, Dr. Win Britt, and Ciao Soares. An extra thank you to Win for all the help and direction he gave in keeping me on task and making my life more manageable by suggesting I use LaTeX and Google Code. Thank you for answering all my questions. Also thank you to all my HCCL friends that have been an encouragement through these many years. Finally, I want to thank all that participated and gave of their time and voice to give me the many samples that made this project possible. With the last thank you to Judy Rodman, who gave her professional advice to a stranger that emailed her a question just out of the blue. PRAISE TO JESUS... THIS DISSERTATION WORK IS DONE!

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: the document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty.

Table of Contents

List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Problem Description
  1.3 Background
  1.4 Organization
2 Literature Review
  2.1 Biometrics
  2.2 Voice biometrics
    2.2.1 Speaker verification
    2.2.2 Speaker Identification
    2.2.3 Speaker Classification
  2.3 Voice Sampling Methods
    2.3.1 Text Dependent
    2.3.2 Text Independent
  2.4 Voice Pitch
  2.5 Discrete and Fast Fourier Transform
  2.6 Window Functions and Spectral Leakage
  2.7 MATLAB
3 Research Plan
  3.1 Subjects
  3.2 Equipment and Material Used
  3.3 Software Used
  3.4 Data Collection Methods
  3.5 Experimental Overview
4 Time Domain Experimentation and Results
  4.1 Experimental Design
    4.1.1 Experiment Goals
    4.1.2 Procedure
  4.2 Results
  4.3 Conclusion
5 Frequency Domain Experimentation and Results: Initial Phase
  5.1 Experimental Design
    5.1.1 Experiment Goals
    5.1.2 Procedure 1 (Converting Data)
    5.1.3 Procedure 2 (Locate Primary Peaks)
    5.1.4 Procedure 3 (Calculate Averages)
  5.2 Results
6 Frequency Domain Experimentation and Results: Graphical Phase
  6.1 Experimental Design
    6.1.1 Experiment Goals
    6.1.2 Procedure
  6.2 Results
7 Frequency Domain Results: Final Phase
  7.1 Experimental Design
    7.1.1 Experiment Goals
    7.1.2 Procedure
  7.2 Results
  7.3 Conclusion
8 Findings and future work
  8.1 Contributions
  8.2 Future Work
9 Scholarly Contributions
Bibliography
Appendix
  A Breakdown of demographics
  B Screen Shots of HTML Pages Used For Data Collection

List of Figures

2.1 Example of Voice Sample
2.2 Upper graph is a standard periodic signal, whereas the lower graph is not periodic and has discontinuity
4.1 Original voice sample opened in digital audio editor, Audacity
4.2 Example graph of voice sample in the time domain
4.3 Voice sample showing where the calculated pause of the sample is located
4.4 A sample with speaking time of approximately 7 seconds and pause time of 0.44 seconds
4.5 A sample with speaking time of approximately 7 seconds and pause time of 1.78 seconds
4.6 U.S. Census Regions
5.1 Graphs of cropped voice sample saying full message
5.2 Full frequency graph showing the boundaries for the area that will give the most information for a voice sample
5.3 Selected frequency sample (250-1250 Hz) graph of the bounded area in the graph above
5.4 Graph showing a view of peak locations of a full frequency sample
5.5 Graphs showing different views of peak locations of a sample within the frequency boundaries (250-1250 Hz)
5.6 Graphs showing one sample that has a positive slope average and one with a negative slope average
5.7 Graph showing the ranges for the positive and negative average slope of lines between peaks
5.8 Graph showing the ranges for the average distance between the primary peaks
6.1 Comparison of participant sample split in two halves
6.2 Comparison of participant saying the word "George" split in two halves
6.3 Graphs showing the two halves of the word "Nine"
7.1 Graphs showing the two halves of the word "nine" and the consistent progression of the two samples
7.2 Multiple graphs showing the change of the frequency and amplitude for the word "nine" spoken by a single participant
7.3 View of peak location of file 1 from the breakdown of the file where the participant said the word "nine"
7.4 View of peak location of file 2 from the breakdown of the file where the participant said the word "nine"
7.5 View of peak location of file 3 from the breakdown of the file where the participant said the word "nine"
7.6 View of peak location of file 4 from the breakdown of the file where the participant said the word "nine"
7.7 Graph of numerical data indicating the FLT stored in an Excel spreadsheet
7.8 Graph 1 is of the frequency values of the first thread of a test sample and graph 2 shows the polynomial that fits that step graph
B.1 This is the information page from the data gathering website
B.2 This is the Demographic Survey page from the data gathering website
B.3 This is the Phone Instruction Page from the data gathering website
B.4 User ID and PIN given on the Phone Instruction Page from the data gathering website
B.5 Four (4) digit number given on the Phone Instruction Page from the data gathering website
B.6 The message that all participants will leave, located on the Phone Instruction Page from the data gathering website

List of Tables

2.1 Formula Description Of Symbols
2.2 Windowing Functions
2.3 Description of some MATLAB commands used
4.1 Comparison of total time to say the message (cropped sample) and the amount of pause in the sample, with the percentage of pause in the sample, as it pertains to gender
4.2 Regions in the United States and states represented from these regions
4.3 Regions in the United States and states represented from these regions
5.1 The average slope between peaks with the focal point on gender
5.2 The average slope between peaks with the focal point on ethnicity
7.1 Data on peak location for the first 40 smaller files that were created from the full sample of a person saying the word "nine"; it numerically represents the shifting of the peaks as well as the appearance and disappearance of minor peaks
7.2 The results from the analysis tool showing the percentages as they pertain to male and female in each coefficient score group
7.3 Clustering results from Applications Quest(TM) reset to look for samples that are alike rather than different

Chapter 1
Introduction

1.1 Motivation

Professionals in the field of Human Computer Interaction (HCI) are continuously searching for ways to improve the communication between humans and computers, especially when using voice interfaces [45]. HCI research has become paramount to the development of computer applications that require user authentication, such as e-commerce [1]. Biometrics is one area in which HCI research is being conducted to examine the potential of strengthening the authentication and security processes. Biometrics is the use of an automated method to recognize an individual based on physiological characteristics, behavioral characteristics, or a combination of the two [17]. While there are many sub-areas that pertain to biometrics, some of the more recognized areas are: voice, iris, fingerprints, hand, face, retina, signature, keystroke, and gait. Biometrics is founded on the idea that any or all of the aforementioned physical and/or behavioral aspects are unique to a person and can be used to identify that person [27]. The focus of this research is the sub-area of voice biometrics, its ideals and its characteristics. Specifically, this research is involved in using voice biometrics to classify an individual and thus make the communication process between them and a computer application more successful. This avenue of thought came about while compiling the post-survey responses from a usability study conducted for an electronic voting system, Prime III [48].
One of the most frequent comments made was about the voice used by the system to communicate with the user. Some said it talked too fast, some said it talked too slow, and some said they were not able to understand it. The topic of this dissertation came about after reading these comments and thinking of probable solutions to improve this aspect of communication between an individual and a machine.

1.2 Problem Description

The use of voice biometrics in speech technology has evolved greatly over the last decade, resulting in many commercial applications. In addition to this evolution, the field of HCI has taken on an important role in the development of applications by making awareness of the way humans and machines interact a part of the development process [16, 45]. With the advancement of speech technology in phone applications, more consideration must also be given to making the communication between the diverse group of users and the voice applications more compatible [11]. This is not a trivial task that can be fixed solely using technology. It will also require the involvement of social science to help better address the issues that arise. How a system responds to a user can have an immense effect on the interaction between the system and the user [45]. Currently, speech applications do not give any consideration to potential characteristics of the user that could be used to help the communication between the person and the machine. In the past, most research concerning voice biometrics has been conducted in the areas of speaker identification or speaker verification, and very little has been done in the area of speaker classification. More focus is now being given to speaker classification because speech interfaces are becoming widely implemented in today's phone and web applications [45]. Given the increased interest and the perceived usefulness of voice classification in today's applications, the hypotheses of this research arose. By analyzing the human voice, one can conclude the following:

H1) The pitch range of the human voice can be used to create a tone classification set, such as low, medium, and high tones.

H2) The human tone classification can be refined into human classifications that pertain to gender, ethnicity, and the geographical area that most affected the speaker's accent.

1.3 Background

Voice biometrics is a method of biometric authentication that uses voice recognition techniques based on characteristics of the human voice. According to Dr. Judith Markowitz, voice biometrics can be broken up into speaker verification, identification, and classification [32]. Research in the area of voice biometrics has mainly been associated with two areas: 1) speaker verification and 2) speaker identification. While equally important, a third area of voice biometrics, referred to as speaker classification, is not as widely focused on as the other two [34]. Speaker verification requires a user to create a voice template that can be stored in a database associated with that specific user. When the user submits a voice sample for verification to the application and declares to be a certain individual, the system performs a one-to-one comparison with the voice template that is stored in the database for that particular individual. Next, a calculation is made to see how close the two templates are, and a confidence value is generated. If this confidence value is above a given threshold, then the application will verify the authenticity of the speaker [26, 32, 30].
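To make the one-to-one decision concrete, the following is a minimal MATLAB sketch of the comparison just described. It is illustrative only, not the method of any cited system: the feature vectors, the similarity score, and the threshold value are all hypothetical.

    % Hypothetical enrollment template and new-sample features (not real data).
    storedTemplate = [120.3 0.42 0.07];   % feature vector saved at enrollment
    sampleFeatures = [118.9 0.45 0.06];   % features extracted from the new sample
    threshold      = 0.40;                % set by the security sensitivity required

    % A simple similarity score in (0, 1]; deployed systems use far richer
    % statistical models to compute the confidence value.
    confidence = 1 / (1 + norm(storedTemplate - sampleFeatures));
    if confidence >= threshold
        disp('Claim accepted: speaker verified.');
    else
        disp('Claim rejected.');
    end

A real deployment would calibrate the threshold against the false-accept and false-reject behavior discussed later in Section 2.3.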
Speaker identification is a similar process to speaker verification in that it also collects a voice sample. However, the application performs a one-to-many matching process against an already existing database that holds voice templates of known individuals. The matching process consists of the application comparing the voice sample that is given by the user with each voice template that is in the database. Consequently, this time it is searching for the closest match, and then it determines whether the calculated confidence value falls within a given threshold. In cases like these, the user's identity is determined by the search and match. In speaker verification, by contrast, the user identifies him/herself and the system verifies the claim to be that individual using the voice template associated with that user. Speaker classification is the area of voice biometrics which is used to determine a specific group that the user may or may not belong to. It does not require a preset database as the previous two types of voice biometrics do, because it is looking neither to verify nor to identify a certain individual. Speaker classification is the type of voice biometrics that this research is investigating. This research proposes that an algorithm can be developed that will determine a value for the user which can be used to classify the user into a specific group (or groups). This classifier will be created using the individual's speaking range as it pertains to pitch and the speed of speech of the individual. This algorithm will be discussed further in chapter three.

1.4 Organization

In the chapters that follow, a research agenda will be examined. Chapter 2 gives a literature review that discusses the areas of biometrics, the three types of voice biometrics, and some mathematical concepts that pertain to representing voices graphically and digitally. The specific mathematical areas covered in the literature review are the Fast Fourier Transform and the windowing concepts that will be used to represent the pitch of a person. An overview of the application MATLAB by MathWorks will be given, as this application and its extensive signal processing libraries will be used in this research. The subject area of voice pitch will also be discussed, as well as what determines the pitch of a person's voice. A detailed plan of research will be outlined in Chapter 3. Chapters 4-7 will present the procedure by which the data was analyzed and preliminary results. Chapter 8 concludes with the findings, significant contributions to the field of voice-enabled technology, and future work. Finally, Chapter 9 will list preliminary work and publications.

Chapter 2
Literature Review

This chapter describes some of the work done in the field of biometrics and the mathematics involved in this technology. It focuses on biometrics, voice biometric systems, voice pitch, the MATLAB programming language and libraries, Discrete and Fast Fourier Transforms, and windowing functions used with data obtained from Fast Fourier Transforms.

2.1 Biometrics

The word "biometric" can be broken up into two words: bio, meaning "life", and metric, meaning "measurement" [39]. A very basic definition would be "life measurement", which needs to be expanded to give clarity for today's uses.
The term biometrics has now become a present-day word that many have used or at least heard of, but do not fully comprehend the meaning of. The definition of biometrics can vary depending on the specific context in which it is being used. The following is a definition of biometrics as it relates to this research: "Biometrics is the automated use of physiological or behavioral characteristics to determine or verify identity" [44]. As previously mentioned, the area of biometrics can be broken into several sub-areas. A broader list of sub-areas is: voice, iris, fingerprints, hand, face, retina, DNA, signature, computer keystroke, gait, odor, earlobes, sweat pores, lips, etc. This research will not discuss whether each of these sub-areas is unique in itself, but will operate under the assumption that there are attributes that are unique to each person [44]. An additional underlying assumption of this research is that the voice is a unique trait of an individual and thus can be used in identifying the person. Therefore, this research will focus on the sub-area of biometrics referred to as voice biometrics. To understand the topic of biometrics, it is beneficial to know that biometrics has been around for hundreds of years. The use of biometrics can be seen throughout history in many different forms. The explorer Joao de Barros found that, in 14th-century China, merchants recorded the palm prints and footprints of children on paper with ink for identification purposes [6]. In Eastern Asia, potters placed their fingerprints on objects for identification of the maker [66]. The use of fingerprints for identification has continued to this day and is extensively used by law enforcement to identify criminals. Alphonse Bertillon, an anthropologist who lived in Paris in the late 1890s, is credited, through his efforts to make the identification of criminals easier, with bringing biometrics to the point where it was considered an actual field of study [6]. This brief look at some of the historical usage of biometrics shows that biometrics has evolved over the years and now, with advancing technology, will be evolving even further [44]. Voice biometrics is one such growing area of biometrics and will be discussed in Section 2.2.

2.2 Voice biometrics

Voice biometrics has mainly been associated with two areas: speaker verification and speaker identification. The third area of voice biometrics, speaker classification, has not received as much attention as the other two [8, 34]. The first two areas, speaker verification and speaker identification, may at first appear decidedly similar, but each has a distinct purpose [30]. The third area, speaker classification, is considerably different from the first two. The next few sections discuss the similarities and the differences in these three areas of voice biometrics to distinguish speaker classification from speaker verification and speaker identification.

2.2.1 Speaker verification

Speaker verification (SV) aims to validate a person's identity, much like having your driver's license picture checked at the airport when you check in for your flight. SV is becoming widely visible in today's economy due to the added security issues faced by industry [32]. SV is used to verify that the person speaking is truly the person they are claiming to be [61]. SV is a one-to-one comparison and, in general, there are five steps to SV [8].
First is the enrollment of the user to generate a voice template that will be stored in a database. In a second step, the user speaks for a set amount of time so that a voice sample can be obtained for enrollment or for the verification process. This collection phase can be either text dependent or text independent, which will be explained later in Section 2.3. Once the voice sample is obtained, the third step of extracting certain features from this voice sample takes place and a template is made. This template will either be used immediately for comparison or will be stored in a database to be used later. The fourth step is a pattern matching phase, where a confidence value is calculated. This value is used in the fifth step to accept or reject the claims of the authenticity of a person [8, 36]. To make this decision, a confidence threshold is set according to the sensitivity needed for the system, which is based on the security required [36]. It is common to use this process in conjunction with some form of ID number or password known by the user. Upon entering their unique ID in the system, the user is required to speak so that the application can verify they are whom they claim to be.

2.2.2 Speaker Identification

Unlike speaker verification, where a voice sample is compared to a single stored template in the database, speaker identification (SI) uses an individual's voice to identify them [32]. SI is a one-to-many process, where a sample is obtained from the user and then a comparison is made with all the voice templates in the database to determine whether a match can be found [27, 36, 32, 30]. The aforementioned general steps for SV also pertain to SI, with a few differences. In the decision making step, SI does not give a decision to accept or reject as in an SV system. The SI decision process determines whether there is a stored template that matches the collected voice sample within a certain predetermined threshold [8]. As stated earlier, SI is a one-to-many matching process, making this process a much more difficult task. The application will need to go through the pattern matching step for each voice template that is stored in the database. For each voice template, it must be determined whether the confidence value for that particular item is within the preset threshold. Finally, an output is given as it pertains to the identity of the person speaking. The output of this process can be [36]:

- No matches: no stored template had a confidence value above the given threshold.
- A single match: one template was the only one to have a confidence value above the threshold for the given sample.
- Multiple matches: since all stored templates are checked, more than one can be a close match where all are within the given threshold; it is not uncommon to get more than one close match.

2.2.3 Speaker Classification

Speaker classification (SC) is different from speaker verification and speaker identification in that it is used to determine whether a speaker can be associated with a particular characteristic group, rather than with a particular individual [35]. SC is done by extracting information from a voice sample that is obtained from a given speaker. It is an idea of this research and others that different characteristics can be determined about the speaker once their voice sample is processed. Characteristics such as gender, age, emotion (i.e., fear or anger) and ethnic origin are a few of the characteristics that may be determined [30, 34, 45].
SC has been around for decades, but attention has mainly been on the other two areas; SC, for the most part, has only been considered for how it can help with developing those two areas. In the development of either SV or SI systems, there is usually a form of classification performed. To facilitate this, Gaussian Mixture Models (GMMs) have been and are still being heavily used for the speaker classification aspect of these systems. However, other models and methods are now being researched, such as the Hidden Markov Model (HMM), Support Vector Machines (SVMs), and the use of voice pitch [41, 68, 51]. Using the pitch of a human voice as a speaker classifier has not yet been researched to a great extent and thus is of interest to this research. Speaker classification research is now being conducted to help in many areas. One of these is the monitoring of phone conversations [42]. Dr. Judith Markowitz gives an example of this in the book Speaker Classification I: the Loquendo Voice Investigation System. This system is used to monitor cell phone calls, where speakers of special interest are determined and this information is then passed on to law-enforcement or intelligence agency clients [34]. The following sections discuss text dependent and text independent methods. An understanding of the difference between these two is needed to add clarity to voice sample collection from the user of a speaker recognition system, no matter whether it is for verification, identification or classification.

2.3 Voice Sampling Methods

In the previous sections, the terms text dependent and text independent were mentioned as they relate to the collection step; they will now be explained. Along with a basic explanation, some of the advantages and disadvantages of each will also be discussed. This overview is important to this research as it dictates which method is most appropriate. This is an important distinction that needs to be determined prior to any study that will be done.

2.3.1 Text Dependent

A text dependent system is one that is trained by the user speaking a predetermined phrase or word that has already been established in the system. The phrase that is selected can be determined by the administrator of the application or by the user, and usually is something that the user will be able to remember easily, along with something that will give a broad phonic range. This is an advantage of a text dependent system: because the user speaks a set phrase, the voice sample obtained will be a better representation of the person's voice [55, 8]. It is also customary with a text dependent approach for the same phrase to be used to establish a voice template and for the user to repeat it when using the application [17]. This improves the matching step of the application by reducing the chances of having a false reject or false acceptance of a given user [8]. An additional advantage of the text dependent method is that the user selects the word or phrase to be used, which is unique to them [36]. The disadvantages of the text dependent method are mostly connected with the area of security. Because the user will use a predetermined word or phrase, there is always the possibility that the word or phrase has been compromised (i.e., obtained by another individual). When the word or phrase is known by another individual, a voice sample can be manufactured to circumvent the system, or an actual recording can be obtained of the user saying the phrase and used to circumvent the system.
Both scenarios are possible because of the dependency on a set word or phrase. Along with the risk of someone obtaining the word or phrase, there is the problem that the user of the system must remember the word or phrase that was used to set up their voice sample. When the user does not use the correct word or phrase and the system attempts to match the voice template of the user, a false reject is given. This puts a burden on the user to always remember the exact word or phrase [67]. The text independent method can be the solution to this problem and will be discussed next.

2.3.2 Text Independent

A text independent system, in contrast to a text dependent system, obtains the voice template from the user reading or speaking anything they prefer. This method allows the user to speak freely and does not tie the user to a predetermined phrase. In the past, this type of system has been used primarily when the person whose voice sample is being obtained is not fully cooperative or is not willing to participate in the process at all. Speaker recognition technology trends for text independent applications are advancing; for example, being able to identify a person without them having to speak any predetermined word or phrase [64]. This is one of the main advantages of this method: the user does not have to remember a phrase. Another advantage is that speaker verification can be utilized in a manner that runs in the background of the application. For example, when a user calls a bank to transfer money, the application can verify the speaker simultaneously as the user makes their request [36]. Unlike the text dependent system, which requires the same word or phrase to be spoken for the collection step, the user of a text independent system can record a voice sample for a template with one phrase and use another phrase when using the system. This freedom can also lead to an immense disadvantage with this type of system. With the user freely speaking, it is not guaranteed that the user will speak or read something that spans a broad enough range of their voice to give the speech features that identify them. Due to this fact, in some cases a longer sample may be necessary, which may be difficult to obtain from a user that is not fully cooperative [26].

2.4 Voice Pitch

The sound of a human voice is comprised of several components. One of these is the pitch of the voice. In his book, Dr. H. Newell Martin describes the process that a body goes through in order to produce the pitch of a person's voice. He describes the larynx as the primary body part that determines the sound of a person's voice. The larynx holds the vocal cords, and it is the vibration of these cords that produces the pitch. He further states that it is the size or the length of these vocal cords that gives a certain pitch to the voice. The longer they are, the lower the person's voice. Consequently, the shorter the vocal cords are, the higher the pitch of a person's voice [37]. This can be substantiated by listening to the voice of a woman or child in comparison to the voice of an adult male. The woman or child speaks in a higher pitch due to the fact that they are usually smaller in stature [18]. Another fact Dr. Martin concludes is that vocal cords of a certain length will always give a set range to the voice. The range is dependent on a set of muscles in the larynx which determine the tension of the vocal cords. This leads to the fact that a person can only speak as high or as low as their vocal cords permit. This description of how pitch is formed is still used today. The pitch range of a man's voice has been determined to be approximately 80-200 Hz, with an average pitch of 120 Hz, whereas the pitch range for a woman's voice is 150-350 Hz, with an average pitch of 225 Hz [58]. Upon analyzing this data, one can see that these ranges overlap: a particular man's pitch can be closer to the female average than to the male average, and vice versa. One explanation is the individual's age, which affects vocal cord length, in that a young boy does not have vocal cords the length of a man's [62]. A person's pitch range is generally determined by the length of their vocal cords. The average length of a male's vocal cords is about 18 mm and that of a female's is about 16 mm [50]. Along with the length, the thickness of the vocal cords also determines the pitch of one's voice [37]. Based on the research and literature, it is quite clear that the physical attributes of a person's vocal cords give him/her a set pitch range that cannot be easily altered. Exceptions occur when surgery, accident, sickness, smoking or extreme training (as with an opera singer) has altered the makeup of the vocal cords [37]. All but the last are most likely permanent changes, and each still gives a person a set pitch range.
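Hypothesis H1 can be pictured with a small MATLAB sketch that maps an estimated average pitch to a coarse tone class. The band edges here are illustrative only, loosely motivated by the typical ranges just cited; they are not the classification produced by this study, which lets the data dictate the groups.

    % Hypothetical average pitch for one speaker, in Hz.
    avgPitch = 172;

    % Illustrative tone bands; real boundaries would come from the data.
    if avgPitch < 140
        toneClass = 'low';
    elseif avgPitch < 220
        toneClass = 'medium';
    else
        toneClass = 'high';
    end
    fprintf('Average pitch %d Hz -> %s tone class\n', avgPitch, toneClass);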
2.5 Discrete and Fast Fourier Transform

The Discrete Fourier Transform (DFT) is the discrete counterpart of the Fourier Transform (FT). In general, the FT takes a function and converts it into another function that may be more useful. The FT processes a continuous-time signal using calculus, making it highly complex. Added to this is the fact that in signal processing the data is processed only in samples, which will not be continuous. Therefore, it can be said that a DFT is used to compute a discrete-frequency spectrum from a discrete-time signal of finite length [56]. This research will be using signal processing to analyze voice samples. Most voice samples are in the time domain, and the DFT will transform them from the time domain to the frequency domain [28]. Considering that the data of a voice sample is both discrete and finite, it is not difficult to see how this approach to analyzing the sample can give some very useful results, as shown in Figure 2.1, where sub-figure (a) represents a voice in the time domain and sub-figure (b) illustrates a sample of the voice in the frequency domain using the DFT. The DFT of a signal x may be defined by the following formula; Table 2.1 contains a description of the symbols used in the formula.

$$X(\omega_k) = \sum_{n=0}^{N-1} x(t_n)\, e^{-j\omega_k t_n}, \qquad k = 0, 1, 2, \ldots, N-1$$

Calculating the DFT can be computationally expensive, even when using a computer. This calls for a faster algorithm, and many have been developed that address this need. The most widely used algorithm was developed by James W. Cooley and John W. Tukey in 1965 [14] and is known as the Fast Fourier Transform (FFT) [56]. As the name implies, it is a faster version of the DFT and is widely used today in computer applications [15]. The main advantage of the FFT is that it reduces the computational complexity for N points from $N^2$ to $N \log_2 N$ [14, 65]. To illustrate the difference, consider N = 256 points for a given voice sample. With a DFT, in the worst case, 65,536 computations are required to make the transform; with an FFT, only 2,048 computations are needed. In this example, the DFT therefore requires 32 times as many computations as the FFT. Given that a voice sample can have hundreds of thousands of data points, it is clear that the FFT is the best option.
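As a sanity check on the definition above, here is a minimal MATLAB sketch (illustrative only) that evaluates the DFT sum directly on a tiny test signal and confirms that it matches MATLAB's built-in fft, which computes the same spectrum in N log2 N rather than N^2 operations.

    % Tiny test signal: two cycles of a sine sampled at N points.
    N = 8; n = (0:N-1)';
    x = sin(2*pi*2*n/N);

    % Direct evaluation of the DFT sum X(w_k) = sum x(t_n) e^{-j w_k t_n},
    % where w_k * t_n = 2*pi*k*n/N for sample index n and bin k.
    Xdirect = zeros(N, 1);
    for k = 0:N-1
        Xdirect(k+1) = sum(x .* exp(-1j * 2*pi * k * n / N));
    end

    % The FFT gives the same result to numerical precision.
    maxDiff = max(abs(Xdirect - fft(x)));
    fprintf('Max difference between direct DFT and fft: %g\n', maxDiff);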
It is common that, when using the Fast Fourier Transform, a windowing function is also used; windowing is explained next.

Table 2.1: Formula Description Of Symbols

$\sum_{n=0}^{N-1} f(n)$: $f(0) + f(1) + \cdots + f(N-1)$
$x(t_n)$: input signal amplitude (real or complex) at time $t_n$ (sec)
$t_n = nT$: $n$th sample instant (sec), $n$ an integer $\ge 0$
$T$: sampling interval (sec), also called the sampling period
$X(\omega_k)$: spectrum of $x$ (complex valued), at frequency $\omega_k$
$\omega_k = k\Omega$: $k$th frequency sample (radians per second)
$\Omega = 2\pi/(NT)$: radian-frequency sample interval (rad/sec)
$f_s = 1/T$: sampling rate (samples/sec, or Hertz (Hz))
$N$: number of time samples = number of frequency samples (an integer)

Figure 2.1: Example of Voice Sample. (a) Approximately 4 seconds of a voice sample. (b) After using the DFT to transform from the time domain to the frequency domain.

2.6 Window Functions and Spectral Leakage

A windowing function acts as a filter to smooth out the sinusoid (sine curve) that represents the voice sample taken [23]. Since the voice sample is finite, it is most likely that the sinusoid representation will be a truncated waveform [46]. In performing spectrum analysis using an FFT, a condition identified as spectral leakage can occur. The FFT represents the sample as one whole period of a periodic waveform; when a finite sample has been obtained, there is no assurance that one full period of the waveform has been captured, which makes discontinuity possible. An example of discontinuity, and the spectral leakage connected with it, can be seen in Figure 2.2.

Figure 2.2: Upper graph is a standard periodic signal, whereas the lower graph is not periodic and has discontinuity.

One solution for avoiding discontinuity in the sample waveform is to apply a windowing function that will minimize the discontinuity of the created periodic waveform. The windowing function is a weighting function applied to the data of the waveform to smooth out the connections at the end points, minimizing the discontinuity at these end points. This is done by using as many orders of derivatives as possible of the weighted data at the end points, which lessens the effect of spectral leakage on the waveform [25]. There are many windowing functions that can be used. Some of the most common window functions are the rectangular, triangular, Hanning, Kaiser, and Hamming [22]. The choice of a windowing function can be determined by resolving the tradeoff involved in separating comparable-strength signals with similar frequencies [23]. For this research, the Hanning window function shown below was used, as it is computationally less expensive compared with the other functions shown in Table 2.2:

$$w(n) = 0.5\left(1 - \cos\frac{2\pi n}{N-1}\right)$$

The programming language used in this research is MATLAB¹. One of the reasons for choosing this language is that it has very efficient algorithms for windowing functions as well as for the FFT function, based on the one developed by Cooley and Tukey [38, 59].

Table 2.2: Windowing Functions

Triangular window: $w(n) = \frac{2}{N}\left(\frac{N}{2} - \left|n - \frac{N-1}{2}\right|\right)$ (non-zero valued end-points)
Bartlett window: $w(n) = \frac{2}{N-1}\left(\frac{N-1}{2} - \left|n - \frac{N-1}{2}\right|\right)$ (zero valued end-points)
Bartlett-Hann window: $w(n) = a_0 - a_1\left|\frac{n}{N-1} - \frac{1}{2}\right| - a_2\cos\frac{2\pi n}{N-1}$, with $a_0 = 0.62$, $a_1 = 0.48$, $a_2 = 0.38$ (zero valued end-points)
Blackman window: $w(n) = a_0 - a_1\cos\frac{2\pi n}{N-1} + a_2\cos\frac{4\pi n}{N-1}$, with $a_0 = 0.42$, $a_1 = 0.5$, $a_2 = 0.08$ (zero valued end-points)

¹ MATLAB is a registered trademark of The MathWorks, Inc.
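To tie Sections 2.5 and 2.6 together, the following is a minimal MATLAB sketch, with a hypothetical file name, of the transform used in this research: read a voice sample, apply the Hanning window from the formula above to reduce spectral leakage, and use the FFT to move from the time domain to the frequency domain.

    % Read a voice sample (hypothetical file name); x is the signal, fs the rate.
    [x, fs] = wavread('participant1234.wav');
    N = length(x);

    % Hanning window computed from the formula above, applied point-wise.
    n = (0:N-1)';
    w = 0.5 * (1 - cos(2*pi*n/(N-1)));
    X = fft(x .* w);

    % Magnitude spectrum up to the Nyquist frequency fs/2.
    half = floor(N/2);
    f = (0:half-1)' * fs / N;            % frequency axis in Hz
    plot(f, abs(X(1:half)));
    xlabel('Frequency (Hz)'); ylabel('Magnitude');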
2.7 MATLAB

According to the developers of MATLAB, "MATLAB is a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation", making it a very powerful tool [59]. It is this dual aspect of MATLAB (programming language and development environment) that makes it a good choice for this research. Another advantage of using MATLAB is that it was originally developed with signal processing in mind [59]. The name MATLAB stands for MATrix LABoratory, as the MATLAB language stores all data values in matrix form [19, 49]. This means that whether the program stores one value or one thousand values, each value goes into a cell of a matrix. Another feature of this language is that variables do not have to be declared ahead of time, nor their data types specified. Also, one does not have to allocate memory, as MATLAB has built-in dynamic memory allocation [59]. For this research, a major benefit of this storage method is that, in processing a voice sample, the size of the sample will not be known ahead of time. In MATLAB, the size of the matrix provides a representation of the number of elements, as each cell in the matrix will have some value in it. The large built-in function library (over 8,000 commands) in MATLAB provides three commands, "size", "length", and "numel" (see Table 2.3), which can be used to easily determine the number of elements in a given matrix that may have been brought in from an outside data file [59].

Efficient access to data from an outside source is another strong point of this programming language. There are multiple commands available to access data from different file types, databases, or even other applications that are written in another language such as C or C++ [59]. Some of the commands that were beneficial to this research are "wavread", "xlsread", "textread" and "find"; see Table 2.3 for a description. These commands were used to read in demographic information pertinent to the voice samples that was stored in other file types or a database. Likewise, writing data to an output file was made more efficient with commands such as "fprintf", when a text file was required, or "xlswrite", if data was better suited to a spreadsheet. Having the tools to collect, process and store data efficiently was a necessity for this research. Also, being able to visualize the data was especially important when working with the voice samples. The graphics ability of MATLAB is another area that made this language a good choice for this research. Within MATLAB, there are functions that will plot 2D and 3D graphs to give a clear visual of the data that this study worked with. Along with these plotting functions are many labeling and formatting functions that added to the output of this research. All these built-in functions in MATLAB enabled the analysis process to be more dynamic, making it a powerful development tool for this research [59].

Table 2.3: Description of some MATLAB commands used

find: searches a given matrix for a specified condition and returns a matrix with the index values of the found values
numel: returns the total number of elements in a given matrix
length: returns the larger value between the number of rows and the number of columns for a given matrix
size: returns the number of rows and the number of columns for a given matrix
wavread: reads Microsoft sound files
xlsread: reads Microsoft Excel spreadsheet files
xlswrite: writes data to a Microsoft Excel spreadsheet file
fprintf: writes data to the command window or to a text file
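The sketch below, with hypothetical file names and an illustrative amplitude threshold, shows the data access commands from Table 2.3 working together: read a WAV sample and a demographic spreadsheet, count elements, locate values with find, and write results back out.

    % Read a voice sample and a demographic spreadsheet (hypothetical names).
    [x, fs] = wavread('participant1234.wav');
    nSamples = numel(x);                        % total number of data points
    [nums, txt] = xlsread('demographics.xls');  % numeric and text columns

    % find returns the indices of all values meeting a condition.
    loud = find(abs(x) > 0.5);
    fprintf('%d samples total, %d above the amplitude threshold\n', ...
            nSamples, numel(loud));

    % Store calculated results in a spreadsheet for later sorting.
    xlswrite('results.xls', [nSamples, numel(loud)]);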
Chapter 3
Research Plan

3.1 Subjects

A goal of this research project was to acquire approximately 100 to 200 subjects to participate in this research, in order to achieve a diverse population set. This goal was possible to reach due to the large population of undergraduate and graduate students at this institution. Collecting all samples from this population was not sufficient, however, because that sample population was not very diverse in age or in the area of the United States that most affected the way participants talk. It was found that, of the participants that came from the local area and institution, 89% were between the ages of 19 and 21 and were from either Alabama or Georgia. To remove this limitation and gain a more varied group of participants, a request was sent out to friends and family from different communities around the United States. As will be discussed in Section 3.4, the "snowball" method was used to gain as many participants as possible without having to have direct contact with each potential participant. This method worked well, in that over 170 voice samples were collected, along with the demographic information associated with each sample. With the request for involvement going out to presumably any part of the United States or the world, participants from diverse groups were collected. All voice samples and data were collected from a remote location of the participant's choosing (i.e., wherever they had access to the Internet and a telephone). There were 10 participants that did not complete the study because they did not leave a voice sample as required. Since all data collection was obtained in a private manner, it is not known whether these participants chose not to finish the study or did not understand the directions given. Participants had the option to contact the study organizers via phone or email if they had any questions or encountered any difficulties with the data collection process. Yet only two such contacts were received from participants: one indicating that the web pages for the study did not load, which was due to the server being offline after an unrelated application had caused the server to crash; the other describing an error message that was given when the participant attempted to navigate from the demographic page to the phone instruction page. In this case, the error message was due to an inadvertent double entry in the table that stored the IDs given to the users; this was promptly rectified. Excluding these two cases, data collection transpired efficiently for all participants. The demographic data gathered from participants that did not leave a voice sample was deleted from the database, since without a voice sample the demographic data was not usable.

3.2 Equipment and Material Used

There was very little equipment required to participate in the research study.
A participant accessed the application via an Internet connection and a telephone. The participants needed no specific computer knowledge or experience to participate in this research. They did, however, need basic knowledge of the Internet to navigate to the initial web page of the study and to answer the demographic questions. All individuals that were contacted about participating in the research were given the option of either coming to the Shelby Center for Engineering Technology at Auburn University to complete the data entry task or completing the study at a remote location of their preference. Individuals that participated in the study remotely were required to obtain access to the aforementioned equipment, as no equipment was provided for any remote involvement in the study. The web pages were designed such that they loaded on most universally used web browsers. The content of the pages was kept to a minimum to allow a participant with a dial-up connection to still participate with the least amount of waiting for the web pages to load. Had an individual chosen to come to the Shelby Center for Engineering Technology, all equipment would have been provided for their use.

3.3 Software Used

The algorithm development process utilized several different technologies. The primary development environment was the MATLAB programming environment from The MathWorks [60]. MATLAB has literally hundreds of built-in functions that vary from basic functions to specialty functions that are grouped together into what The MathWorks calls toolboxes. For this research, several of these functions were utilized in the data processing and analyzing phases. The database used was MySQL by Sun Microsystems [43], server version 5.0.51a (SUSE MySQL RPM), and the operating system for this server was Linux. The basic web pages created were written with HTML and JavaScript. Using JavaScript guaranteed that all fields on the demographic page were filled in, as the page did not allow advancement to the next webpage until all fields had been completed. Pages that needed to connect to the database were written using PHP along with MySQL commands. For the phone application programs, the VoiceXML programming language was used, along with PHP and MySQL for situations that needed database access. Clustering analysis was executed by the software Applications Quest(TM) [20], with the output being copied to a Microsoft Excel spreadsheet. Microsoft Excel was also used for some of the storage of calculated results, along with preliminary sorting of data for examination.

3.4 Data Collection Methods

To conduct this study, voice samples were needed along with the demographic information for each participant. It was determined that, to add diversity to the population set, participants needed to be from locations around the United States. Additionally, it was preferred that dissimilar ethnic groups be enlisted and that the percentages of male and female participants be balanced. The method chosen for accumulating data was the "snowball" data collection method [57]. This method was conducted in a manner where requests were sent to acquaintances and, once they participated, they then solicited their friends and family to participate. Originally, twenty-five requests were sent out and,
During this process each participant was asked to respond to the following demographic request/questions: Please select your Gender Please enter your Age Please select the following that best describes your ethnicity Please select what your primary language is Please select the country you consider your primary nationality Please select the country for your parents primary nationality Please select the state of the United States that you would say has a ected the accent of your voice the most Please select the one that best represents the highest level of education completed Please select the one that best describes the area that you live in Please select the one that best describes how you feel today Have you had a physical injury or a disease that would a ect your voice? Would you consider yourself to have a speech impediment? Please select the category for your height Upon completion of the data collection it was found that approximately 55% of the participants were female and 45% male. Nine ethnic groups were represented with Caucasian being the largest group at 67%. English made up the largest representative language at approximately 96%, but ve other languages were also declared. Ten countries were given as primary nationality with United States being the highest percentage at 92%. In regard to education the top three categories were as follows: Bachelors Degree at 32%, Masters 21 Degree at 26%, and some college at 21%. Full details of the breakdown of the collected demographics can be viewed in Appendix A. The collection of all demographic information was completed via a website that is hosted on a server under the supervision of Dr. Juan E. Gilbert. The server is located in a locked room that is only accessible by authorized personnel so that all data collected is secure. The o cial URL address for this site was \http://www.voicestudy.com/". A screen shot of all pages can be seen in Appendix B. The rst page of this site gave the person an opportunity to view the information letter about this study (which can be viewed by looking at Figure B.1) and to either agree to continue or not. If the participant chose to continue they were then taken to the demographic page where they responded to thirteen requests for the above information (see Figure B.2). No data was collected to identify any participant. The participant was not able to navigate from this page until they responded to all thirteen requests. Once they completed this and clicked to continue, their information was stored in the database, which was located on the same server previously mentioned, by using a PHP program to interface with the database. The next page that the participant saw was an instruction page that informed them on how to complete the calling procedure for the phone application, that was used to collect a voice sample from them (see Figure B.3). The phone application was accessed using a free developer service under the umbrella of Nuance Communications, Inc by the name of \NUANCE caf e" formally known as \BeVocal caf e" [47]. Since this development platform was free for the participant they had to call a toll free number (1-877-338-6225) and they were prompted to enter a user ID and PIN number. The user ID was 8446348 and the PIN was 1234; both of these were provided to the participant on the phone instruction page. An example can be seen in Figure B.4. Once log-in was accomplished the user continued directly to the phone application which proceeded in the following manner. 1. A welcome message was played. 
2. The application then requested the participant to enter the four (4) digit number given to them on the phone instruction web page; see Figure B.5.

3. The application then verified that a valid number was entered by querying the database and making sure that number was a primary key for a row in the database.

4. Upon validation of the ID, the application gave instructions to the participant on what would take place next. After that they heard a phone ringing and a message played as if they had reached a friend's voice mail.

5. When prompted, the participant would then leave the exact message given to them on the instruction page; to see the message, view Figure B.6.

6. Next they had an opportunity to hear the message they recorded and either accept it or try again.

7. Once they accepted their message they were thanked for their participation, and after that the application disconnected.

Nuance Café saves all voice recordings as WAV files, which are Microsoft's audio file type. Nuance's default file type (audio/wav: WAV (RIFF header) 8 kHz 8-bit mono mu-law [PCM] single channel) worked well and was in a form from which MATLAB can open and extract the data directly. A file name was created by concatenating the word "participant", the four (4) digit number that was given to the participant, and the file extension WAV. This filename was also stored in the database under a field named "fileName". The actual sample file was stored on the secure server where all files associated with this study are stored. A specific folder was set up for these WAV files, which helped to keep them separate from the program files.

3.5 Experimental Overview

This section gives an overview of the approach that was used to validate the hypotheses presented in Section 1.3. The main objective of this research was to develop an algorithm that analyzes a voice sample from an individual and obtains numeric data that represents that person's speech. The sample was analyzed in both the time and frequency domains. Then an evaluation was made using Applications Quest™ (AQ) [2, 20] to determine the clusters that were formed using the numeric data. SQL queries were made of a database that had been created to store the demographic and result data. In addition, result data was written to Microsoft Excel spreadsheets for sorting and examination. To utilize these applications, the following were needed to gather and analyze the data from the participants:

A uniform method for the collection of the demographic information and voice samples from individuals.
A database containing all demographic information and calculated values.
An algorithm that calculates data from the sample as it pertains to time.
An algorithm that uses an FFT and a windowing function to convert a voice sample from the time domain to the frequency domain.
An algorithm that calculates different parameter values to be used to observe clusters that may occur.

With guidance from these principles, the architecture for the proposed voice system consisted of three phases: data collection, voice sample processing, and database setup. In the data collection phase, the user interacted with a web interface that collected demographic information. That information was used to determine classification groups which might be formed after the voice sample was run through the voice processing algorithms. To prevent bias and to protect anonymity, an arbitrary number was randomly assigned to each submission.
Upon completion of the demographic survey, a voice sample was collected via a phone application where the user called in and left a voice sample to be analyzed. The voice sample was saved as a WAV file with the given identification number as part of the file name. All the participants' data was stored in a table of the database, which corresponded to an Excel spreadsheet that held a copy of this data. This made it more efficient to upload the records into the clustering algorithm for modeling of the results.

Chapter 4
Time Domain Experimentation and Results

This chapter details the experimentation that was conducted on the voice samples before they were converted to the frequency domain. This was the first of four experimental phases for this research, intended to investigate parameters to be utilized in the classification of an individual. It is not uncommon to hear individuals talking at various rates and/or having differing amounts of pause between their words. Given that the voice samples were in the time domain, a numerical value was calculated for these two occurrences.

4.1 Experimental Design

4.1.1 Experiment Goals

The goals of this experimental phase were the following:

Create an algorithm to eliminate beginning and ending white noise from the sample.
Calculate the length of the sample in seconds.
Create an algorithm to determine where pause areas are in the sample.
Calculate the total amount of pause in seconds of a sample.

4.1.2 Procedure

The original sample received from each participant was stored at the time of their participation in a WAV file and saved on the same server where the voice application was hosted. All samples were made at a sample rate of 8000 Hz, and each participant said exactly the same thing: "George, I want you to help me fix my tire. Call me at 924-2949.".

The free digital audio editor, Audacity [3], was used initially to view a graph of the voice samples; see Figure 4.1.

Figure 4.1: Original voice sample opened in digital audio editor, Audacity

Audacity gave easy access for playing any part of the sample and also a quick view of elapsed time. Because of the process by which the BeVocal Café [47] application records the participant's response, each file has a leading and ending segment that is either silence or nominal white noise; see Figure 4.2a. This proved to be beneficial when developing the algorithm for time domain analysis. A maximum and minimum value for white noise was calculated by using these two sections. With the ability to set these boundaries specific to each sample, an algorithm was developed to crop the beginning and ending noise from each sample; see Figure 4.2b.

Figure 4.2: Example Graph of Voice Sample In Time Domain
(a) Original voice sample in the time domain
(b) Cropped voice sample in the time domain

When the data from a WAV file is read into MATLAB, it is put into a vector, which makes it very efficient to obtain the starting and ending points of the voice sample. Starting at the beginning of the vector, the index number is recorded for the first value that goes above or below the threshold that has been calculated from the white noise. Next, starting with the last cell in the vector, the algorithm decreases the index value by one until a value that goes above or below the threshold is found, and that index number is recorded. Taking the first index found and the last index found, a cropped sample can be obtained by using the "wavwrite" function, which takes the vector values and creates a new WAV file.
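As a rough illustration, the following MATLAB sketch combines this cropping step with the pause measure described next. The 400-sample noise-region length, the file name, and all variable names are illustrative assumptions rather than values taken from the study.

% Minimal sketch, assuming the leading 400 samples are white noise.
[sample, fs] = wavread('participant1234.wav');       % fs is 8000 Hz for these files
noiseMax = max(sample(1:400));                       % upper white-noise threshold
noiseMin = min(sample(1:400));                       % lower white-noise threshold
voiced = find(sample > noiseMax | sample < noiseMin);
cropped = sample(voiced(1):voiced(end));             % drop leading/ending noise
wavwrite(cropped, fs, 'participant1234_cropped.wav');
talkSeconds  = numel(cropped) / fs;                  % total message time
pauseSamples = sum(cropped <= noiseMax & cropped >= noiseMin);
pauseSeconds = pauseSamples / fs;                    % total pause time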
This new WAV file is the sample left by the participant without the leading and ending white noise. Once the sample had been cropped, the first parameter, total message time, was calculated. The pause algorithm also uses the thresholds that were calculated during the cropping process, applying them in an alternative way. The algorithm starts at the beginning of the vector and searches for the first value that falls within the given thresholds. The index for this value is then recorded in another vector, and the program begins looking for the next value that is above the threshold. This process continues until it has worked its way through the entire vector. The total number of data values found is divided by 8000 (the number of samples per second), giving the total time of pause, or no talk, in the sample; see Figure 4.3.

Figure 4.3: Voice sample showing where the calculated pause of the sample is located.

The decision to consider this calculation came about when two cropped samples were observed that had precisely the same talk time. However, when the files were viewed in Audacity, it was revealed that one file had considerably more pause space than the other. This can be attributed to the fact that some people may talk at the same speed, with one always making sound (i.e., saying something like "uhuhuh" between words) and the other not making any sound but still having the same amount of time between words; see the graphs in Figures 4.4 and 4.5. All cropped voice samples were run through an algorithm that calculated the three time values: total elapsed time of the original voice sample (no cropping), total elapsed time of the cropped voice sample, and total elapsed time pertaining to pause in the cropped sample. After these calculations were made, the values were written to a text file in the form of MySQL update statements so they could be added to the database. In addition, the values were also stored in an Excel spreadsheet that contained all demographic data, along with all calculations that were made for each voice sample. This file was then used to load all pertinent data into Applications Quest™ for clustering evaluation.

Figure 4.4: A sample with speaking time of approximately 7 seconds and pause time of 0.44 seconds

Figure 4.5: A sample with speaking time of approximately 7 seconds and pause time of 1.78 seconds

4.2 Results

The initial analysis was conducted to see if the time information made any classification as a standalone parameter. The results of this preliminary analysis proved to be very informative when analyzed. The data was sorted, using Excel, by pause time and total talk time of the cropped files, and the averages for male and female were calculated; see Table 4.1. It was observed that the average time to say the phrase was the same for both male and female.

Table 4.1: Comparison of total time to say the message (cropped sample) and the amount of pause in the sample, with the percentage of pause in the sample as it pertains to Gender

Gender   Number of Samples   Avg Talking Time   Avg Amount Pause   Average Percent Pause
Female   93                  6.28               1.64               25.7%
Male     65                  6.28               1.81               28.4%

Comparing the pause times seen in Table 4.1 shows that males do have a greater percentage of pause in their speech than females. In addition to gender, the area the person was from was also examined. To accomplish this, the states were separated into regions according to the U.S. census [63]: West, Midwest, Northeast, and South; see the map in Figure 4.6.
Figure 4.6: U.S. Census Regions

The sample set of participants contained individuals from all the regions, with the largest group from the South. Table 4.2 contains a complete list of the states that the participants were from. Table 4.3 shows the same result fields with the focus on the regions. The results are noteworthy in that there is a difference in the total talk time as well as the pause time.

Table 4.2: Regions in the United States and states represented from these regions

West         Midwest        Northeast           South
California   Illinois       Connecticut         Alabama
Colorado     Indiana        Dist. of Columbia   Florida
Idaho        Iowa           Maryland            Georgia
Montana      Michigan       Massachusetts       Kentucky
Oregon       Minnesota      New York            Louisiana
Washington   Missouri       Pennsylvania        Mississippi
             Nebraska                           North Carolina
             Ohio                               Oklahoma
             South Dakota                       South Carolina
             Wisconsin                          Tennessee
                                                Texas
                                                Virginia

Table 4.3: Comparison of average talking time and pause by U.S. census region

Region      Number of Samples   Avg Talking Time   Avg Amount Pause   Average Percent Pause
West        13                  6.48               1.99               30.7%
Midwest     30                  6.33               1.72               27.2%
Northeast   7                   6.59               1.64               24.9%
South       103                 6.22               1.68               27.0%

As with gender, analysis of the time data in general showed some interesting results, such as which region had the larger average for total talking time or which region had the highest amount of pause. Still, there was not enough of a difference to make any definite classifications at this time. The data was entered into Applications Quest™ and 6 clusters were made. The overall Difference Index (DI) was 29.34%; this value states, as a whole, how similar or dissimilar the samples are, so a lower DI value indicates greater similarity among the samples. The recommended DI value for this inquiry was 24.47%, giving a target value for the clusters. All the clusters' DI values were below this mark, giving validity to the results in that the members of the clusters were close in characteristics. For analysis, gender was the only attribute that was closely distributed within a reasonable ratio, so the focus was put on this attribute when the clusters were evaluated. Clusters 0 and 2 had only women participants, cluster 3 had all women except one male, and clusters 1, 4, and 5 had only men. The following results were observed and compared to the total averages shown in Table 4.1.

Cluster 0 had 15 females, all from small towns, with a DI of 14.79%, which indicates very little difference between the participants. The talk time was 5% above the total average and the pause time was 9.75% above the total average, for all females, indicating that this group talks more slowly than the average female in the study.

Cluster 2 had 65 females, mainly from the suburb/urban area, with a DI of 20.81%, which indicates a small difference between the participants. For this group the talk time was 3.6% under the average and the pause time was 7.2% under the average, for all females, indicating that this group talks faster than the average female in the study.

Cluster 3 had 13 females, all from the suburb area, with a DI of 20.42%, which indicates a small difference between the participants. For this group the talk time was 13.2% above the average and the pause time was 32.9% under the average, for all females, indicating that this group talks more slowly but with considerably less pause than the average female in the study.

Cluster 1 had 26 males, with no dominant area, with a DI of 23.25%, which indicates a nominal difference between the participants.
For this group the talk time was at the average and the pause time was at the average, for all males, indicating that this group is a good representation of the average male in the study.

Cluster 4 had 16 males, all but 1 from a suburban area, with a DI of 17.14%, which indicates a small difference between the participants. For this group the talk time was 7% under the average and the pause time was at the average, for all males, indicating that this group talks faster than the average male in the study.

Cluster 5 had 23 males, all from a small town, with a DI of 16.04%, which indicates a small difference between the participants. For this group the talk time was at the average, but the pause time was 8.3% below the average, for all males, indicating that this group talks at the average speed but with less pause compared to the average male in the study.

4.3 Conclusion

This phase of the study yielded good results in that it indicated that the amount of time it takes an individual to speak a phrase can possibly give an indication of the area they live in, and possibly the state where they have lived the most. The results did seem to give a clear separation between male and female. When gender is added to the area in which they live, this may give characterizing factors for the individual. The findings from the clustering were interesting and worth noting, but as this was not the primary area of investigation, all data was recorded and saved to be used in future work.

Chapter 5
Frequency Domain Experimentation and Results: Initial Phase

Even though humans do consider the speed and pause of another's voice, it is the frequency domain that can give the greatest amount of data for analysis. This chapter details the experimentation that was conducted on the voice samples after they were converted to the frequency domain. This was the second of four experimental phases for this research, intended to investigate parameters to classify an individual. Once the voice sample had been converted from the time domain to the frequency domain, analysis was done to find results to support the hypotheses of this research.

5.1 Experimental Design

5.1.1 Experiment Goals

The goals of this experimental phase were the following:

Create an algorithm to convert the sample from the time domain to the frequency domain.
Determine all peaks for the frequency sample between the boundaries 250 - 1250 Hz.
Determine the most prominent peaks of the sample.
Calculate and average the slope between the prominent peaks.
Calculate and average the distance between the prominent peaks.
Determine the maximum and minimum frequency values for the prominent peaks.
Determine the total distance between the first and last prominent peak.
Determine the total number of prominent peaks.

5.1.2 Procedure 1 (Converting Data)

Originally the voice sample was saved as it pertains to the time domain; for any analysis of the frequency content, the signal must be converted from the time domain to the frequency domain. As mentioned in Section 2.5, the Fast Fourier Transform (FFT) is the most common method used to accomplish the change from one domain to the other. The MATLAB programming environment has a very efficient FFT function, "fft". This function receives the time data and processes it into frequency data. Since it is not guaranteed that the data sample begins at the start of a cycle, spectral leakage can take place (as explained in Section 2.6), and a windowing function must be applied before the data is sent to the "fft" function.
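As a rough illustration of this window-then-transform step, the following MATLAB sketch applies the window defined in the next paragraph to a cropped sample vector x and keeps only the 250 - 1250 Hz band analyzed in this study; the variable names are illustrative assumptions.

% Minimal sketch, assuming x holds one cropped voice sample at 8000 Hz.
N = numel(x);
w = 0.5 * (1 - cos(2*pi*(0:N-1)'/(N-1)));   % Hanning window (defined below)
X = fft(x .* w);                            % window first, then transform
mag = abs(X(1:floor(N/2)));                 % one-sided magnitude spectrum
f = (0:floor(N/2)-1)' * 8000 / N;           % frequency axis in Hz
band = mag(f >= 250 & f <= 1250);           % analysis band used in this research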
The Hanning window function, shown below, was used because it is a straightforward function and is simple enough that it does not add computational complexity to the algorithm:

w(n) = 0.5 (1 - cos(2πn / (N - 1)))

Once the windowing function was applied, the data values were sent to the "fft" function. Each voice sample in this study ranged from 4.5 seconds to 8 seconds of speech, which, when read into MATLAB using the "wavread" function, amounted to tens of thousands of "time" data points. After sending this time data to the "fft" function, 2048 data values were returned as the representation of the frequency content of that sample. The graphs in Figure 5.1 show the difference in data representation between the two domains.

Figure 5.1: Graphs of Cropped Voice Sample Saying Full Message
(a) Sample in the TIME domain
(b) Sample in the FREQUENCY domain (250 - 1250 Hz)

The frequency analysis was executed only on data from 250 to 1250 Hz, as this is the range that holds the most information about the way a person speaks, according to an expert in the signal processing industry, Dan Ginzel, owner and lead developer of signal/voice applications for Coach Comm [21].

5.1.3 Procedure 2 (Locate Primary Peaks)

The next task was to take the frequency data from the full sample and crop it to the set boundaries (250 - 1250 Hz) to get a visual representation of a person's voice sample. Figure 5.2 shows the full view of the frequency graph, whereas Figure 5.3 shows the sample after the boundaries were set. To begin with, both the peaks and the valleys were considered, but after closer analysis the peak information was determined to be adequate. Initially the algorithm found all the peaks for the entire frequency graph. As illustrated in Figure 5.4, the large number of peaks made it hard to get a clear view of the peaks in relationship to the graph. The graph was then modified to be within the boundaries (250 - 1250 Hz), which made it much easier to see where the peaks were located; see Figure 5.5 for an example. At this time the complete message was used, and with the sample limited to the boundaries stated, it was clear that analysis could continue forward concerning the tone of the sample.

Figure 5.2: Full frequency graph showing the boundaries for the area that will give the most information for a voice sample.

Figure 5.3: Selected frequency sample (250 - 1250 Hz) graph of the bounded area in the graph above.

Figure 5.4: Graph showing a view of peak locations of a full frequency sample

Figure 5.5: Graphs showing different views of peak locations of a sample within the frequency boundaries (250 - 1250 Hz)
(a) Shortened Frequency Sample Showing All Peaks
(b) Shortened Frequency Sample Showing Primary Peaks

5.1.4 Procedure 3 (Calculate Averages)

The initial thought behind calculating the goal values listed was that seeing this data on the dominant peaks would illuminate information about the tone of the sample. If the peaks were more spread out, it indicates a more consistent tone for the sample. If the slope average was positive, then going from left to right the peaks progress up in height; see Figure 5.6a. Likewise, if the slope average was negative, going from left to right the peaks diminish in height; see Figure 5.6b. Each of these gives the individual a totally different sounding voice, where one is lower sounding (decreasing peaks) and the other is higher sounding (increasing peaks) as it pertains to pitch.
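As a rough illustration of the averages used in this procedure, the following MATLAB sketch assumes the prominent peaks' frequencies and magnitudes have already been extracted into vectors pkFreq and pkMag (illustrative names, ordered by frequency).

% Minimal sketch: average slope and spacing between prominent peaks.
slopes = diff(pkMag) ./ diff(pkFreq);   % slope of the line between adjacent peaks
gaps = diff(pkFreq);                    % distance between adjacent peaks, in Hz
avgSlope = mean(slopes);                % positive: peaks rise left to right
avgDist = mean(gaps);                   % larger: peaks more spread out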
Another fact that can be ascertained from the slope average is an idea of the closeness of the peaks. The closer the peaks are to each other, the more the slope average advances towards positive or negative infinity; whereas, if the slope approaches 0, the peaks are farther away from each other.

Figure 5.6: Graphs showing one sample that has a positive slope average and one with a negative slope average
(a) Peak heights increasing
(b) Peak heights decreasing

The last three goals were accomplished, but when analyzed they did not offer any revealing information towards one classification or another. The data was stored for possible further analysis of other variables at a later date. The data tables (30 pages) containing all the averages and numbers of peaks for the aforementioned goals can be found in the Appendix.

5.2 Results

The average slope and average distance between the peaks showed the most promise for determining a classification for a person. It was the results obtained for these two averages that this phase focused on. The results for the average slope were considered first. For visualization, the slope values were put into tables showing the ranges for the positive average slope values and the ranges for the negative average slope values according to the demographic data. In the first table gender was considered, and it revealed that there was no real difference between male and female when it came to the negative slope ranges; when graphed, the two lines were the same. However, looking at the table for the positive slope ranges, the data indicated a very noticeable difference between the two ranges; see Table 5.1.

Table 5.1: The average slope between peaks with the focal point on Gender

Gender   Negative Boundaries        Positive Boundaries
Female   -0.027925 to -0.000141     0.000187 to 0.066688
Male     -0.026729 to -0.000343     0.000795 to 0.013462

Table 5.2: The average slope between peaks with the focal point on Ethnicity

Ethnicity          Negative Boundaries        Positive Boundaries
African American   -0.017488 to -0.000981     0.002743 to 0.012799
White              -0.027925 to -0.000141     0.000187 to 0.066688

Table 5.2 shows the results for the ranges as they pertain to ethnicity, where the two most prominent groups are "White" participants and "African American" participants. In comparing the data in both tables, it was interesting to note that the positive slope range for females was indistinguishable from that of white participants. Therefore a graph was constructed, see Figure 5.7, with data for female, male, white, and African American participants. In viewing this graph it is apparent that at a distinct positive slope value greater than 0.013462 there is a very high probability that the participant is female, white, or both.

Figure 5.7: Graph showing the ranges for the positive and negative average slope of lines between peaks

During this preliminary analysis, the data was also considered according to the distance between the peaks. Even though the average slope gave equivalent information, closer analysis was warranted, given that the actual distances between peaks aid in the analysis of the tone of a participant's voice. Higher averages indicated a greater distance between peaks, whereas a smaller average indicates that the peaks were not separated very much. This was investigated for the prospect that it might better indicate a characteristic about the participant than the slope average.
For this observation, four groups were considered (male, female, White, African American), as some natural breaks were observed when these ranges were graphed. For this data there were two natural separations: one at the average distance value of 301.8281 and one at the value 399.7623. When the graph is viewed, it can be inferred that the probability is high that a person with a value above 301.8281 is either a white female or a white male. This is due to the fact that all African American averages were below this value. When the value gets over 399.7623, the probability of being a female drops out, and the probability that the person is a white male is prominent; see Figure 5.8.

Figure 5.8: Graph showing the ranges for the average distance between the primary peaks

Along with storing the calculations in the database, a tab-delimited text file was created that held these calculations and the associated demographic information. This file was then uploaded to Applications Quest™ (AQ) to find clusters in the data [2, 20]. Clustering was done as it pertains to gender, ethnicity, and slope average, and when the clusters were investigated they verified the previous tables. However, it was revealed that the maximum value was more of an outlier than a representation of the group as a whole. It was the distance average that gave the most validation, in that the 11 individuals that had an average above 301.8281 were indeed white females, and even more so were members of the same cluster.

Chapter 6
Frequency Domain Experimentation and Results: Graphical Phase

All analysis up to this point was very promising, but did not give a clear separation in any of the demographic areas. At this time a program was written that allowed the viewing of all the graphs of the voice samples as they related to the frequency domain, to obtain direction for the next phase in the analysis process. A different digital audio editor, Cool Edit Pro (now Adobe Audition) [12, 13], was used to visualize the frequency graphs. When a sample is graphed using Cool Edit, the graph changes as the application progresses through the sample. It was then observed that the prominent peaks changed location depending on the section of the sample the application was analyzing. By using the loop function, this progression could be viewed over and over again. The Cool Edit application showed that it was feasible to determine inflection from the movement of the peaks displayed on the screen. From the examination of the graphs and the visuals that Cool Edit presented, it was observed that the first half and the second half of a sample were different.

6.1 Experimental Design

It was decided to split each sample into two parts and do some analysis on both halves to determine whether they were similar or dissimilar enough to give some indication of a certain demographic characteristic. The expectation for this analysis was that a parameter connected to voice inflection would be obtained.

6.1.1 Experiment Goals

The goals of this experimental phase were the following:

Separate the entire sample into two halves.
Isolate a single word.
Separate the word into two halves.
Get a graphical representation for visual analysis.

6.1.2 Procedure

Once this decision was made, it was straightforward to implement by using the previously mentioned ability of the MATLAB application to store all data in arrays. With all the data points stored as single elements in an array, one need only use the command "numel" (number of elements) and then split the array into two separate arrays.
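A minimal MATLAB sketch of this split, assuming x holds one sample's data points; the variable names are illustrative.

% Split one sample vector into two halves using numel.
n = numel(x);                   % total number of data points
half1 = x(1:floor(n/2));        % first half of the sample
half2 = x(floor(n/2)+1:n);      % second half of the sample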
Five participant samples were selected for testing to determine whether the smaller samples could be processed using the algorithms that were already written or whether modifications were needed. At first it appeared to work as well as using the full sample, therefore all samples were processed. As before, the results were written to an Excel file, and upon observation not all participant half-files had been processed correctly. It was found that by splitting the sample into two parts, there was not sufficient data when a certain calculation was done. A set parameter of 2048 needed to be changed to 1024 for the following calculation to work properly. Following this correction, the data from the two separate halves were graphed and observations were made to see what useful information was obtained. The graphs of the two halves were plotted in the same window, and each sample was viewed using a simple MATLAB script that allowed straightforward progression through the graphs. To view an instance of this graphical comparison of the two halves of a sample, see Figure 6.1. Though some graphs did illustrate that a useful difference between the two halves was observable, less than 20% of the samples displayed this characteristic. It became clear that using the full sample was going to furnish too much information to obtain a consistent and realistic numeric representation of the voice. The next logical step was to separate a single word from the sample. Because of the work completed earlier, where the pause in the participant sample was determined along with the cropping of the white noise from the beginning and end of the sample, it was possible to isolate words in the sample. The initial preference was to get the first word in the phrase spoken, that being the word "George". Once this word was isolated, analysis was done with the word separated into two halves. As with the samples that contained the entire phrase, there were several that showed some good interpretation of the voice, but the results were not robust throughout the entire sample set. After some consultation with Dan Ginzel [21], two explanations for this outcome were considered. The first rationale was that in saying the word "George", it being the initial word in the phrase, the person may take a deep breath before speaking. Some of this white noise may not be eliminated during the cropping process, having an influence on the first half of the sample. The second possible explanation for the result of the analysis is that some words have what is commonly called "attack" or "variable stress". Attack is the unambiguous beginning of speaking a word [10], and variable stress is the speaking of a syllable in a word louder and longer [54]. Just as taking that deep breath can create white noise, these two speech behaviors have the potential to add noise to a word. The situation that arises with these two speech areas is that not everyone may have these mannerisms, which proved to adversely affect the analysis between the two halves of the given sample as it relates to the general population. Displayed in Figure 6.2 is an example of the effect of variable stress. The graph of the first half starts out with an elevated value and then declines continually from there, whereas the graph of the second half shows a more oscillating sound, and the peaks of the two, when compared, do not give a usable pattern. Given the aforementioned issues, a close assessment of each word in the spoken phrase was made.
It was determined that the word "nine" was the best choice, as it did not appear to have the possible pitfalls that the word "George" had, and this word was used three separate times. The word nine is found in the following locations in the phrase: seventh from the last word (the start of saying the telephone number), third from the last word, and the last word in the spoken response. The last word was not used, as it can have similar issues of acquiring white noise. The second instance of the word was the most logical choice, as it was spoken in the flow of speaking other numbers. The location of this word did present more of a challenge to separate from the phrase for some samples, as some participants did not have a clear pause in their speech. With the focus of this research not being to create an application to retrieve words from a spoken phrase, the second instance of the word nine was manually extracted using the audio application Audacity mentioned previously. Collecting the sample in this manner gave assurance that the new samples were an accurate sample of the participant saying the word "nine". Audacity gives the user the ability to see a visual of the WAV file as well as to listen to the section that was selected. All selected instances of the file were listened to and then saved as separate WAV files for analysis. These new samples were then read into MATLAB and separated into halves like the previous samples. Each half was then graphed to compare the new results with those acquired using the word "George". It can be seen that a more usable set of data comes from these samples, see Figure 6.3, in that there is a higher amount of consistency in the samples. The peaks have a more uniform appearance between them, and the amplitude is as one expects; that is, the first part of the word is spoken with more volume than the second half, though not by a recognizable amount when listened to.

Figure 6.1: Comparison of participant sample split in two halves

Figure 6.2: Comparison of participant saying the word "George" split in two halves

Figure 6.3: Graphs showing the two halves of the word "Nine"

6.2 Results

For this phase all goals were accomplished and extended to involve more than one word for analysis. The results from this phase gave clarity and direction, in that splitting the entire phrase showed that there were too many frequency changes for good analysis. This led to choosing a single word, which was the word "George". As a result of graphically analyzing the two halves of this word, the issue of variable stress became evident. Another word that was not affected by variable stress, i.e., the word nine, was then chosen. The graphs for this word showed that nine did give good patterns to analyze. This made the prominent result of this phase the finding and use of a word that did not have variable stress. This started the research into the final phase of experimentation.

Chapter 7
Frequency Domain Results: Final Phase

The evolution of this research has been very intriguing as it relates to the qualifying of the two hypotheses being pursued. At the completion of analyzing the samples of the participants saying the word "nine", it is the opinion of this research that some important discoveries were made. One of the most successful is the uniformity that was ascertained by comparing the first and second halves of this word.
Graphically it was illustrated that the peak patterns were relatively consistent in their progression; see Figure 7.1 for an example. One can see that even though the amplitude values differ, the pattern that the peaks make is predominantly analogous.

Figure 7.1: Graphs showing the two halves of the word "nine" and the consistent progression of the two samples

Results such as this were the stimulus for the final phase of experimentation. The final phase commenced by asking the question, "If dividing the word into two equal components displayed a pattern, what would splitting it into multiple samples reveal?".

7.1 Experimental Design

7.1.1 Experiment Goals

The goals of this experimental phase were the following:

Create an algorithm that will divide the sample of the word nine into multiple files.
Determine the most prominent peaks for all sub-samples.
Store all peak locations for prominent peaks for all sub-samples.
Calculate the number of peaks for each sub-sample.
Complete graphical analysis of peak location.
Determine a mathematical representation for peak activity.

7.1.2 Procedure

At this time a program was written that separates the data values that had been read into MATLAB into multiple WAV files, each holding 800 bytes of the original file. A previous observation was recalled from using the audio application Cool Edit Pro [13]. One of the functions of Cool Edit is that it can give analysis in a visual form by illustrating the sample graphically, where the image changes as the application plays the audio sample. Cool Edit also has a function to play a continuous loop of the audio sample, which is depicted on the analysis screen as an animated graph. What this confirmed is that as the sample is played, the peaks change position slightly, but because of the overlapping of data there is not a major divergence in the locations of the peaks. The challenge was to represent with tangible, analyzable results what was obvious visually. Therefore the final phase consisted of taking the sample of the word "nine" and taking 800 bytes of data in small increments. Using MATLAB, all data was stored in an array, with a simple script program that looped through the array. At the start of each iteration of this loop, the program advanced by 26 bytes and created another 800-byte file. In Figure 7.2, graphs of the first eleven files are illustrated, showing the slight change in position mentioned before. It is this shifting that this research proposes will give a clear picture of the fluctuation of a person's voice numerically, and will thus give parameters that will facilitate that person being classified.

Figure 7.2: Multiple graphs showing the change of the frequency and amplitude for the word "nine" spoken by a single participant.

In viewing Figure 7.2, the main peaks change position as it pertains to frequency, i.e., the location of the second peak changed. The term "location" will, for this section, stand for the relationship of the frequency value and the peak order number for a given sample. Location gives insight into what is transpiring with the number of peaks in the files. If it is determined that the peak count increased or decreased, the location value will indicate where a new peak was formed or where a previous peak no longer occurs. The added detail this gives is much more informative than when only the number of peaks was known, at the initial phase of this analysis.
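A rough MATLAB sketch of the segmentation step described in this procedure, assuming x holds the word sample (one byte per data point for these 8-bit mono files); the file-name pattern is an illustrative assumption.

% Minimal sketch: 800-byte pieces taken every 26 bytes across the sample.
frameLen = 800;  hop = 26;
starts = 1:hop:(numel(x) - frameLen + 1);   % roughly 150 - 200 frames per word
for k = 1:numel(starts)
    frame = x(starts(k):starts(k) + frameLen - 1);
    wavwrite(frame, 8000, sprintf('nine_part_%03d.wav', k));   % one file per frame
end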
To visualize this data, the first four files were graphed and viewed, upon which it became clear what had transpired. When the number of peaks increased, a minor peak had emerged; likewise, if the peak count went down, then a minor peak had been eliminated. The two graphs in Figures 7.3 and 7.4 illustrate this by going from one cross-section to the next, where a new minor peak is formed and at the same time a previous minor peak is eliminated.

Figure 7.3: View of peak locations of file 1 from the breakdown of the file where the participant said the word "nine".

Figure 7.4: View of peak locations of file 2 from the breakdown of the file where the participant said the word "nine".

If only the peak count were considered, it would indicate that no change had taken place, when in reality two events had occurred. The next two graphs, Figures 7.5 and 7.6, show the event of going between two different cross-sections where no minor peaks were formed or eliminated, retaining the same peak count, and the graphs are very similar.

Figure 7.5: View of peak locations of file 3 from the breakdown of the file where the participant said the word "nine".

Figure 7.6: View of peak locations of file 4 from the breakdown of the file where the participant said the word "nine".

As with the first set, this indicates that just counting the number of peaks is not sufficient: the count is the same, but the locations of the second and seventh peaks are different. In contrast, the second set of graphs shows that the peak count can remain the same, as well as the location of the peaks; the only deviation between the two is that the amplitude is slightly higher in one than the other. It was the assessment of the changing of the locations of the first peak, second peak, third peak, and so forth, along with the need to have a numeric representation, that inspired the final area of exploration for this research. To make the comparison of the numeric data as it relates to the graphs more straightforward, all peak data was stored in an Excel spreadsheet for evaluation. Table 7.1 is an example of what this data looks like in the spreadsheet. Looking at this table, it can be seen numerically when the location of the first peak, second peak, and so on either remains at the previous location or changes location due to a minor peak being found or eliminated. Tracking this activity was vitally important to the completion of this analysis. This is best conveyed by following numerically in the table the previous example the graphs displayed. To accomplish this, two events needed to be monitored: at what location peaks materialized or dematerialized, and the location of each peak as it pertains to all the files. Starting with the location of the first peak in file 1, it is located at 312.5 and remains at this value until file 25, where the first peak is found at 291.7. Again this location is continuous until file 38, when it shifts back to 312.5. The shifting from one location to another gave an observable pattern that was of great interest. This tracking of the peaks within the spreadsheet also gave valuable information as to when the materialization and disappearance of minor peaks transpired. Looking at files 1 and 2 (rows 2 and 3) in the spreadsheet, it reveals the same peak activity as the graph in Figure 7.4 shows. The graph shows peak 1 in the same location for both files, and the spreadsheet reveals the same event. It can be seen in the spreadsheet that under the "P2" column there is a location of 479.2 for file 1 and 437.5 for file 2, indicating that some change has taken place.
For file 2, "P3" is now at location 479.2, clearly indicating that another peak materialized that was not in file 1. Likewise, "P7" in file 1 is at position 1083.3, although in file 2 "P7" is at 1041.7, which is where "P6" is located for file 1. This indicates that a minor peak that was previously in file 1 is not in file 2. Now that these events were tracked numerically, instead of only being observed on a graph, automation of this process began to materialize. One last visual observation was made: when all the peak locations for all files were stored in the spreadsheet, a visual of the data as it pertained to the number of peaks increasing and decreasing was noticed. By using the "zoom" feature in Excel, a pattern can be seen as it concerns the number of peaks, giving a very unique blend of the numeric and visual environments.

Table 7.1: This table shows the data on peak location for the first 40 smaller files that were created from the full sample of a person saying the word "nine". It numerically represents the shifting of the peaks as well as the appearance and disappearance of minor peaks.

File  P1     P2     P3     P4     P5     P6      P7      P8      P9      P10
1     312.5  479.2  645.8  791.7  916.7  1041.7  1083.3  1208.3
2     312.5  437.5  479.2  645.8  791.7  916.7   1041.7  1208.3
3     312.5  479.2  645.8  791.7  916.7  1062.5  1208.3
4     312.5  479.2  625.0  791.7  916.7  1062.5  1208.3
5     312.5  479.2  625.0  791.7  916.7  1062.5  1208.3
6     312.5  479.2  625.0  791.7  916.7  1000.0  1062.5  1208.3
7     312.5  416.7  479.2  625.0  791.7  916.7   1000.0  1062.5  1229.2
8     312.5  437.5  479.2  625.0  770.8  916.7   1000.0  1062.5  1229.2
9     312.5  437.5  479.2  625.0  770.8  916.7   1062.5  1229.2
10    312.5  479.2  625.0  770.8  916.7  1062.5  1166.7  1229.2
11    312.5  458.3  625.0  770.8  916.7  1062.5  1166.7  1229.2
12    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1166.7  1229.2
13    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1166.7  1229.2
14    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1229.2
15    312.5  458.3  625.0  708.3  770.8  916.7   1062.5  1229.2
16    312.5  458.3  604.2  708.3  770.8  916.7   1062.5  1229.2
17    312.5  458.3  604.2  770.8  916.7  1062.5  1145.8  1229.2
18    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1145.8  1229.2
19    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1145.8  1208.3
20    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1208.3
21    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1208.3
22    312.5  395.8  458.3  604.2  770.8  916.7   1062.5  1208.3
23    312.5  458.3  604.2  750.0  916.7  1062.5  1208.3
24    312.5  458.3  604.2  750.0  916.7  1062.5  1208.3
25    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
26    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
27    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
28    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
29    291.7  437.5  604.2  750.0  916.7  1062.5  1208.3
30    291.7  437.5  604.2  750.0  833.3  916.7   1062.5  1208.3
31    291.7  375.0  437.5  604.2  750.0  833.3   916.7   1062.5  1208.3
32    291.7  375.0  458.3  604.2  750.0  833.3   916.7   1062.5  1208.3
33    291.7  458.3  604.2  750.0  833.3  916.7   1062.5  1208.3
34    291.7  458.3  604.2  750.0  916.7  1062.5  1208.3
35    291.7  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
36    291.7  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
37    291.7  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
38    312.5  458.3  520.8  604.2  750.0  916.7   1062.5  1208.3
39    312.5  458.3  604.2  770.8  916.7  1000.0  1062.5  1229.2
40    312.5  458.3  541.7  604.2  770.8  916.7   1000.0  1062.5  1229.2

Examination of this new data led to the consideration that, for a given participant's multiple samples of the word "nine", there was a pattern pertaining to the frequency values. It became apparent that this pattern of location values can be tracked as a thread, a notion universal to geometric analysis.
For this research, the tracking of the location values as they relate to the peaks of a sample will be called a "Frequency Location Thread" (FLT). It is this research's certainty that numerically tracking the FLT will provide a pattern for each participant's voice. Figure 7.7 shows the pattern that is formed when the location of the first peak, the location of the last peak, and the average location of all peaks are represented for all files.

Figure 7.7: Graph of numerical data indicating the FLT stored in an Excel spreadsheet

At this point in the experimentation process, the following steps had been accomplished for each of the samples:

The word "nine" was isolated from each full sample and stored in a new WAV file.
The new file was then split into multiple files of 800 bytes each (150 - 200 files).
The peak locations for each 800-byte file were established.
All locations for all files were stored in a separate text file.
The threads for the respective files were established.

Upon completion of these steps, graphical analysis of the threads was initiated. This analysis consisted of selecting one of the samples as a test case and graphically viewing each thread to determine a pattern that could be used to represent the participant's voice. As described earlier, the formation of the threads is directly connected to where peaks materialize or dematerialize as it pertains to previous peak locations. Given the structure of the peak data, a thread can have one peak location or any number up to the number of files associated with that sample. Continuing the same thought process started by looking at the location values for the individual peaks, it was considered that similar steps occurred for this process too. For an illustration of this, a graph was created in MATLAB using the "plot" function, where the location values for a particular thread were stored as the "y" values and the "x" values corresponded to the number of location values, i.e., 1 to (the number of values in that thread). The first thread's graph for a test participant had 186 values in it, and when graphed gave a stair-step graphic; see Figure 7.8. Given the pattern of this graphic, it was apparent that this type of graph can be represented by a polynomial that is a mathematical representation of the thread data. This polynomial was found by using the MATLAB function "polyfit", which yields the coefficients for a polynomial of a given order. After some experimentation to find the order for the polynomial, it was determined that for this research the third and fourth order polynomials would be calculated. An example of a fourth order polynomial calculated by MATLAB using the test case is displayed below:

0.0000003894X^4 - 0.0001753X^3 + 0.029085X^2 - 1.2867X + 320.46

The highest-order coefficient (0.0000003894) is representative of the complexity of the sample, whereas the last value (320.46) is representative of the frequency. With this polynomial, each participant's voice can be modeled as it pertains to the frequency domain. After further evaluation, the third order polynomial was calculated for each sample, as creating a fourth order polynomial did not add much to the model but did add to the calculation process.

Figure 7.8: Graph 1 is of the frequency values of the first thread of a test sample and graph 2 shows the polynomial that fits that step graph

Two tools were used to verify that these polynomial models did give viable information about the characteristics of a person and thus allow them to be categorized.
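Before turning to those tools, the fitting step just described can be sketched in MATLAB as follows; the vector name threadLocations is an illustrative assumption for one thread's frequency values.

% Minimal sketch: fit a third-order polynomial to one thread's step graph.
y = threadLocations(:);                  % one thread's frequency locations
xidx = (1:numel(y))';                    % index 1..(number of values in the thread)
c = polyfit(xidx, y, 3);                 % coefficients, highest order first
plot(xidx, y, xidx, polyval(c, xidx));   % overlay thread and fitted polynomial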
The first was an analysis tool created by Dan Ginzel, an independent software developer. A file is uploaded to this tool, in which the leading coefficient (in the example, 0.0000003894) is multiplied by an integer (20 for this example), giving the result (0.000007788). This value is rounded to the nearest integer towards negative infinity (0.000007788 gives the integer 0), and this value is stored. This integer value (the coefficient score) was then used to determine groups as they pertained to the total population. This process is done for the ten longest threads of a sample. These threads were selected because they provide a good representation of the voice; any threads beyond these contained too few values to give good-quality information. The advantage of this tool is that it gives a percentage breakdown of the analyzed data as it pertains to the category group (i.e., gender), the individual group (i.e., male), as well as the entire sample group (all participants). This gives an insight into how the coefficients represent the groups mentioned. As an example, a file was uploaded and the parameters were selected from a table that was dynamically created. For this experiment the ten longest threads were selected and treated individually. From the table created, the coefficient score was selected along with gender, and the results can be seen in Table 7.2. In viewing the results it is clear that there are two groups that stand out. One group (0 coefficient score) had a strong showing of females, with 68.28% of all females in this group. Another group (-1 coefficient score) had a strong showing of males. The information that can come from this is twofold: first, it gives a breakdown of where the strongest percentages of the category types are, i.e., male or female; and second, it gives a method for evaluating the clusters that will be given by the second tool.

Table 7.2: The results from the analysis tool showing the percentages as they pertain to male and female in each coefficient score group

Total in         Coefficient   Gender   Percentage of    Percentage of      Percentage of
Category Group   Score                  Category Group   Individual Group   Total Group
1                -6            Male     100.00%          0.15%              0.06%
1                -4            Male     100.00%          0.15%              0.06%
2                -3            Female   100.00%          0.22%              0.13%
17               -2            Female   81.00%           1.83%              1.08%
4                -2            Male     19.00%           0.62%              0.25%
265              -1            Female   45.70%           28.49%             16.77%
315              -1            Male     54.30%           48.46%             19.94%
635              0             Female   65.90%           68.28%             40.19%
328              0             Male     34.10%           50.46%             20.76%
9                1             Female   100.00%          0.97%              0.57%
1                2             Male     100.00%          0.15%              0.06%
2                3             Female   100.00%          0.22%              0.13%

The second tool used, and the one that gave the most significant results as it relates to validating the second hypothesis (that a person can be categorized by their voice), is the Applications Quest™ software developed by Dr. Juan E. Gilbert. This software is a clustering application that takes a tab-delimited file with demographic data along with analysis calculations, forming clusters using this data to determine which participants are most alike. After initial experimentation with the settings that a user enters, i.e., the number of clusters preferred or the attributes to be used, it was decided that six (6) clusters gave excellent results for this research. The coefficient data mentioned earlier was part of the data uploaded to Applications Quest™, with some changes. The average of each set of coefficients over the ten selected threads was calculated. This updated data was then combined with the demographic data for each sample and stored in a tab-delimited file.
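A rough MATLAB sketch of these two preparation steps, assuming coeffs is a matrix holding one row of polynomial coefficients per thread (highest order first) for a participant's ten longest threads; the names and layout are illustrative assumptions.

% Coefficient score: leading coefficient scaled by 20, rounded toward -infinity.
scores = floor(coeffs(:,1) * 20);   % one integer score per thread
% Per-participant representation: average each coefficient over the ten threads.
avgCoeffs = mean(coeffs, 1);        % averages uploaded with the demographic data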
The following are the attributes that were uploaded into Applications Quest™: ID, Gender, Ethnicity, State that has affected the voice most, Education, Area they live in, Height, and the three coefficients for the calculated polynomial. Once the file has been uploaded, the next step is to select the attributes that the application will use for clustering. The following shows the results where the attributes gender, ethnicity, and the three coefficient values were used for the final analysis. The AverageDifference, as talked about in (PUT Section ref), is an indication of how different the samples are from each other. Cluster 0's members are more different than cluster 4's members, which can be seen in Table 7.3, which shows six males and one female in Cluster 0 but all males in cluster 4. It needs to be stated that the primary use of this software is to form groups that are diverse; however, the developer was able to set the program to cluster the samples that are most alike. This is represented by the difference index, which is the average difference between members of a cluster; so the lower the difference index, the better the cluster representation is [20]. The difference index for the complete sample set was 28.60% (standard deviation 16). It should be noted that all of the clusters' difference indexes are under this value, which was an anticipated outcome and shows the process is valid. The clusters' information can be seen in Table 7.3.

Table 7.3: Clustering results from Applications Quest™, reset to look for samples that are alike rather than different.

Cluster 0   Ethnicity: White (4), African American (2), Native American (1)
            Gender: Male (6), Female (1)
Cluster 1   Ethnicity: White (34)
            Gender: Male (17), Female (17)
Cluster 2   Ethnicity: White (63)
            Gender: Female (63)
Cluster 3   Ethnicity: White (12), Asian (9), African American (8)
            Gender: Male (29)
Cluster 4   Ethnicity: White (15), Asian (1)
            Gender: Male (16)
Cluster 5   Ethnicity: Native American (9)
            Gender: Female (9)

Cluster 0, AverageDifference = 25.21%
Cluster 1, AverageDifference = 20.87%
Cluster 2, AverageDifference = 9.30%
Cluster 3, AverageDifference = 23.84%
Cluster 4, AverageDifference = 8.76%
Cluster 5, AverageDifference = 13.22%

One can observe that the clusters with the higher difference indexes (0, 1, 3) are not as uniform as the lower difference index clusters (2, 4, 5), which are very distinct. These distinct clustering results, and others like them, give validity to the approach of establishing threads and calculating the polynomial coefficients to represent the thread pattern for a given voice sample.

7.2 Results

A major result from this section was the thread mapping of the frequency peaks. Being able to distinguish when a new peak formed or an old peak no longer appeared was very important for tracking the threads as they were created. From these findings evolved the idea of representing these threads graphically, yet through a venue that is purely numerical. This resulted in the calculation of polynomials to represent the threads. Taking the 10 most prominent threads and averaging the coefficients then gave a general representation of the voice and allowed for clustering. The clustering validated that the polynomials did represent the voice and that, given the coefficient values for an individual, they can be put into a certain group, i.e., gender.

7.3 Conclusion

It was in this final phase of experimentation that the strongest results occurred.
First, the splitting of a single word into multiple 800-byte parts was paramount to getting a numeric representation of the voice. From the splitting of the word, to the thread representation, to the creation of polynomials corresponding to the voice, all gave validation to the hypotheses set. Upon completion of this phase, the results from the clustering application showed that hypothesis 2, "The human tone classification can be refined into human classifications that can pertain to gender, ethnicity and geographical area where their accent was most affected.", can be accomplished by modeling a person's voice as a polynomial.

Chapter 8
Findings and Future Work

The goal of this research was to confirm the following two hypotheses as they relate to speaker classification.

H1) The pitch range of the human voice could be used to create a tone classification set, such as low, medium, and high tones.

H2) The human tone classification could be refined into human classifications that could pertain to gender, ethnicity and the area where their accent was most affected.

The literature review proved to be the first obstacle, as there was very little published on the subject matter of speaker classification. The two areas of speaker verification and speaker identification had dominated most efforts in research of this kind. When literature was obtained, it was either of a theoretical nature or did not divulge the inner workings of the study attempted. Therefore the primary motivation was the thought that if humans can listen to someone speak and be able to tell certain characteristics about them, i.e., that they are male or female, it stood to reason that in some way this could be mathematically computer-generated. Given that machines most likely would do this to a lesser degree, the benefits are still numerous [40]. The results in chapters 4 - 7 document the exact progression and calculations this research undertook to obtain a mathematical representation of what a human does naturally. The following summarizes the validation of the aforementioned hypotheses.

The first hypothesis was quickly validated when the voice sample was converted from the time domain to the frequency domain. Section 5.1.4 showed that when the frequency data was bounded (250 - 1250 Hz), it could be determined where in that sample the frequency was the strongest. By using the average slope between the prominent peaks of the sample, it could be confirmed whether the frequency was stronger at the beginning (negative slope), in the middle (slope approaching 0), or at the end (positive slope) of the selected frequency range; review the graphs in Figure 5.6. This value clearly indicated that the tone for the sample could be categorized as high, medium, or low, thus validating the first hypothesis. With the first hypothesis substantiated, the research progressed to the validation of the second hypothesis. Refining the development associated with hypothesis one as it pertains to frequency was not an inconsequential task. With no previous work to act as a guide, experimentation was done in phases. Each phase added to the validation process; however, it was the final phase (Chapter 7) that gave the key to classifying a person. Through a series of experiments, the frequency of an individual was represented by a polynomial of the third order; refer to Figure 7.8. This polynomial was created by first establishing a thread that tracked the prominent peaks of a frequency sample.
Chapter 8

Findings and Future Work

The goal of this research was to confirm the following two hypotheses as they relate to speaker classification.

H1) The pitch range of the human voice could be used to create a tone classification set, such as low, medium, and high tones.

H2) The human tone classification could be refined into human classifications that could pertain to gender, ethnicity, and the area where the speaker's accent was most affected.

The literature review proved to be the first obstacle, as there was very little published on the subject of speaker classification. The two areas of speaker verification and speaker identification had dominated most research efforts of this kind. When literature was obtained, it was either theoretical in nature or did not divulge the inner workings of the study attempted. Therefore, the primary motivation was the thought that if humans can listen to someone speak and tell certain characteristics about them, i.e., that they are male or female, it stood to reason that in some way this could be mathematically computer-generated. Given that machines would most likely do this to a lesser degree, the benefits are still numerous [40]. The results from Chapters 4-7 document the exact progression and calculations this research has undertaken to obtain a mathematical representation of what a human does naturally. The following summarizes the validation of the aforementioned hypotheses.

The first hypothesis was quickly validated when the voice sample was converted from the time domain to the frequency domain. Section 5.1.4 showed that when the frequency data was bounded (250-1250 Hz), it could be determined where in that sample the frequency was the strongest. By using the average slope between the prominent peaks of the sample, it could be confirmed whether the frequency was stronger at the beginning (negative slope), in the middle (slope approaching 0), or at the end (positive slope) of the selected frequency range (review the graphs in Figure 5.6). This value clearly indicated that the tone for the sample could be categorized as high, medium, or low, thus validating the first hypothesis.
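A minimal sketch of this bounded-frequency test is given below. It assumes one part of the sample in a vector seg, a telephone-grade sampling rate fs of 8000 Hz, the Signal Processing Toolbox function findpeaks, and illustrative thresholds for "slope approaching 0"; mapping a negative slope (energy near 250 Hz) to a low tone is likewise this sketch's assumption, not a restatement of the study's code.

    % Minimal sketch of the hypothesis-1 test on one part of the sample.
    % "seg" and fs = 8000 Hz are assumptions made for this example.
    seg = seg(:).';                           % force a row vector
    fs  = 8000;
    X   = abs(fft(seg));                      % magnitude spectrum
    f   = (0:numel(seg)-1) * fs / numel(seg); % frequency of each FFT bin
    band = f >= 250 & f <= 1250;              % bound the data to 250-1250 Hz
    fb  = f(band);
    Xb  = X(band);
    [pks, locs] = findpeaks(Xb);              % prominent peaks in the band
    avgSlope = mean(diff(pks) ./ diff(fb(locs)));  % average slope between peaks
    if avgSlope < -0.01                       % thresholds assumed for illustration
        tone = 'low';     % strongest near the start of the band (negative slope)
    elseif avgSlope > 0.01
        tone = 'high';    % strongest near the end of the band (positive slope)
    else
        tone = 'medium';  % strongest in the middle (slope approaching zero)
    end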
With the first hypothesis substantiated, the research progressed to the validation of the second hypothesis.

Refining the development associated with hypothesis one, as it pertains to frequency, was no inconsequential task. With no previous work to act as a guide, experimentation was done in phases. Each phase added to the validation process; however, it was the final phase (Chapter 7) that gave the key to classifying a person. Through a series of experiments, the frequency of an individual was represented by a polynomial of the third order (refer to Figure 7.8). This polynomial was created by first establishing a thread that tracked the prominent peaks of a frequency sample. The top ten threads were then selected, and a set of polynomial coefficients was calculated for each thread. These ten sets of coefficient values were then averaged, and the polynomial that was formed was used as the representation of a person's voice. Confirmation of this was ascertained by taking these values, along with the demographic information for the participants, and uploading them to Applications Quest™, a clustering application. The results obtained gave a clear indication that the polynomial coefficients gave an appropriate representation of a person, such that individuals could be put into cluster groups that would indicate gender and ethnicity. With this conclusion, hypothesis two was validated, in that it was shown that it is possible to refine the analysis of the voice to give a predilection towards a classification of an individual.

8.1 Contributions

The use of biometrics, and of voice biometrics in particular, is increasing every day [36]. It is the goal of this research to provide to the area of voice biometrics validation that an application can take a voice sample and glean from it information that can be used to enhance the interaction between humans and machines. This could be done by finding characteristics of a person that can be used to classify that person, so that more information is available and the application can better serve the user and the community. This research will not only aid current applications, but could also be expanded into determining other attributes of an individual that will be beneficial to the continuing research of voice applications as they pertain to HCI [40].

8.2 Future Work

There is a great deal of future work planned for this research. The following is a list of planned work.

- Target data collection such that a more evenly distributed group is available as it pertains to the target attributes. One idea to accomplish this would be to set up in certain areas where a particular participant group can be found, e.g., collecting samples from a senior group at a monthly meeting.
- Utilize the other parameters established in the time domain (e.g., amount of pause) in other voice applications.
- Conduct the study in a controlled environment where all participants use the same phone and background noise is controlled.
- Incorporate speech recognition to listen for particular words that may be used by the participant, e.g., "y'all".
- Collect numerous samples from the same participant when they are healthy, sick, or have throat problems.
- Create an application that is fully automated for the processing of the voice samples.
- Investigate the use of classification as it pertains to security.

Chapter 9

Scholarly Contributions

Gilbert, J.E., Cross, E.V., McMillian, Y., Rouse, K., Mkpong-Ruffin, I., Gupta, P., & Williams, P. (2007). A Usable Security Approach to Electronic Voting. IEEE Computer.

Gilbert, J.E., McMillian, Y., Cross, E.V., Rouse, K., Williams, P., Gupta, P., Rogers, G., McClendon, J., Mkpong-Ruffin, I., & Nobles, K. (2007). Multimodal E-Voting with Older Citizens. International Journal of Human-Computer Studies.

Williams, A., Rouse, K., Seals, C.D., & Gilbert, J.E. (2007). Enhancing Reading Literacy in Elementary Children using Programming for Scientific Simulations. International Journal on E-Learning.

Cross, E.V., Rogers, G., McClendon, J., Mitchell, W., Rouse, K., Gupta, P., Williams, P., Mkpong-Ruffin, I., McMillian, Y., Neely, E., Lane, J., Blunt, H., & Gilbert, J.E. (2007). Prime III: One Machine, One Vote for Everyone. VoComp 2007, Portland, OR, July 16, 2007.

Williams, A., Seals, C., Rouse, K., & Gilbert, J. (2006). Visual Programming with Squeak SimBuilder: Techniques for E-Learning in the Creation of Science Frameworks. In Proceedings of E-Learn 2006 World Conference on E-Learning in Corporate, Government, Healthcare, & Higher Education, CD-ROM.

Bibliography

[1] ACM SIGGRAPH. (1999). Human-Centered Computing, Online Communities and Virtual Environments. Special report, Vol. 33, No. 3. Chateau de Bonas, France: ACM SIGGRAPH.
[2] Applications Quest. (2009). Retrieved March 2009, from Applications Quest, LLC: http://www.applicationsquest.org/
[3] Audacity Home. (2008). Retrieved June 2008, from Audacity: Free Audio Editor and Recorder: http://audacity.sourceforge.net/
[4] Bhattacharyya, S., & Srikanthan, T. (2004). Synthesis Journal. Retrieved November 2006, from Information Technology Standards Committee: http://www.itsc.org.sg/synthesis/2004/2 Voice.pdf
[5] Biometrics 101: Info Biometrics Technology products. (2005). Retrieved November 2006, from Biometrics 101: http://www.biometrics-101.com
[6] Biometrics History. (2002). Retrieved 2007, from National Center for State Courts: http://ctl.ncsc.dni.us/biomet%20web/BMHistory.html
[7] Biometrics Home Page. (2002). Retrieved November 2006, from National Center for State Courts: http://ctl.ncsc.dni.us/biomet%20web/BMIndex.html
[8] Campbell, J. (1997). Speaker Recognition: A Tutorial. Proceedings of the IEEE, 85(9), 1437-1462.
[9] Childers, D. (2000). Speech Processing. New York: John Wiley & Sons.
[10] Cole, R., & Schwartz, E. (2008). Virginia Tech Multimedia Music Dictionary. Retrieved May 2009, from http://www.music.vt.edu/musicdictionary/
[11] Cohen, P. R., & Oviatt, S. L. (1995). The Role of Voice Input for Human-Machine Communication. Proceedings of the National Academy of Sciences of the United States of America, 92, 9921-9927.
[12] Adobe Audition 2.0. (2009). Retrieved May 2009, from Cool Edit is now Adobe Audition: http://www.adobe.com/special/products/audition/syntrillium.html
[13] OldVersion.com. (2009). Retrieved May 2009, from Cool Edit Pro - Download at OldVersion.com: http://www.oldversion.com/Cool-Edit-Pro.html
[14] Cooley, J. W., & Tukey, J. W. (1965). An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation, 19, 297-301.
[15] Duhamel, P., & Vetterli, M. (1990). Fast Fourier Transforms: A Tutorial Review and a State of the Art. Signal Processing, 19(4), 259-299.
[16] Dunlap, D. (2005). Automated Identification and Data Capture Biometrics Web Site. Retrieved November 2006, from Western Carolina University: http://et.wcu.edu/aidc/BioWebPages/Biometrics Voice.html
[17] findBIOMETRICS. (2006). Retrieved November 2006, from findBIOMETRICS: http://www.findbiometrics.com/Pages/guide1.html
[18] Fry, D. B. (1979). The Physics of Speech. Cambridge: Cambridge University Press.
[19] Gilat, A. (2008). MATLAB: An Introduction with Applications. Hoboken, NJ: John Wiley & Sons, Inc.
[20] Gilbert, J.E. (2006). Applications Quest: Computing Diversity. Communications of the ACM, 49(3), 99-104.
[21] Ginzel, Dan. [Coach Comm.] Personal interviews: 08 August 2008; 12 December 2008; 23 January 2009; 03 March 2009.
[22] Graham, J. (2006). Windowing and the DFT. Retrieved June 2009, from the University of California, Berkeley, web page of Dr. James R. Graham: http://astro.berkeley.edu/~jrg/ngst/fft/window.html
[23] Gürel, L. (2000).
Signal-Processing Techniques to Reduce the Sinusoidal Steady-State Error in the FDTD Method. IEEE Transactions on Antennas and Propagation, 585-593.
[24] Hanselman, D., & Littlefield, B. (2005). Mastering MATLAB 7. Upper Saddle River, NJ: Pearson Education, Inc.
[25] Harris, F. J. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE, 66(1), 51-83.
[26] Hollien, H. (2002). Forensic Voice Identification. San Diego: Academic Press.
[27] Independent Biometrics Expertise. (2007). Retrieved April 2007, from International Biometric Group: http://www.biometricgroup.com/reports/public/basic reports.html
[28] James, R. C. (1992). Mathematics Dictionary (Fifth ed.). New York: Van Nostrand Reinhold.
[29] Jastrow, D. (2007, June 1). The New Fingerprint? Retrieved August 2007, from Speech Technology: http://www.speechtechmag.com/Articles/Editorial/Cover-Story/Voice-The-New-Fingerprint-36320.aspx
[30] Klevans, R. (1997). Voice Recognition. New York: Artech House.
[31] Markowitz, J. (2007, June 1). Classifying Classifications. Retrieved July 2007, from Speech Technology Magazine: http://www.speechtechmag.com/Articles/Column~Forward-Thinking~Classifying-Classifications-36313.aspx
[32] Markowitz, J. (2006). J. Markowitz Consultants. Retrieved April 2007, from J. Markowitz Consultants, The Human Side of Computing: http://www.jmarkowitz.com/information.html
[33] Markowitz, J. (2007, June 1). SpeechTechMag.com: Classifying Classifications. Retrieved July 2007, from Speech Technology Magazine: http://www.speechtechmag.com/Articles/Column~Forward-Thinking~Classifying-Classifications-36313.aspx
[34] Markowitz, J. (2007). The Many Roles of Speaker Classification in Speaker Verification and Identification. In C. Müller (Ed.), Speaker Classification I: Fundamentals, Features, and Methods (Lecture Notes in Computer Science) (pp. 218-225). Berlin/Heidelberg: Springer.
[35] Markowitz, J. (2003, November 25). Voice Biometrics - Are You Who You Say You Are? Retrieved November 2007, from Speech Technology: http://www.speechtechmag.com/Articles/Editorial~Feature~Voice-Biometrics|Are-You-Who-You-Say-You-Are-29621.aspx
[36] Markowitz, J. (2000). Voice Biometrics. Communications of the ACM, 43(9), 66-73.
[37] Martin, H. (1881). The Human Body. New York: Henry Holt and Company.
[38] MATLAB Function Reference: fft. (1984-2007). Retrieved 2007, from The MathWorks: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/fft.html
[39] Merriam-Webster Inc. (2007). Dictionary. Retrieved 2007, from Merriam-Webster's Online Dictionary: http://www.merriam-webster.com/
[40] Metze, F., Englert, R., Bub, U., Burkhardt, F., & Stegmann, J. (2008). Getting Closer: Tailored Human-Computer Speech Dialog. Universal Access in the Information Society, 8(2), 97-108.
[41] Moreno, P., & Ho, P. (2004). SVM Kernel Adaptation in Speaker Classification and Verification. INTERSPEECH 2004 - ICSLP (pp. 1413-1416). Jeju Island, Korea.
[42] Müller, C. (2007). Speaker Classification I. Berlin/Heidelberg: Springer.
[43] MySQL Enterprise. (2008). The World's Most Popular Open Source Database. Retrieved 2008, from MySQL: http://www.mysql.com/
[44] Nanavati, S., Thieme, M., & Nanavati, R. (2002). Biometrics: Identity Verification in a Networked World. New York: John Wiley & Sons, Inc.
[45] Nass, C., & Brave, S. (2005). Wired for Speech. Cambridge, MA: The MIT Press.
[46] National Instruments. (2007). Smoothing Windows for Spectral Leakage. Retrieved 2007, from National Instruments Developer Zone: http://zone.ni.com/devzone/cda/tut/p/id/4110
[47] Nuance Cafe. (1999-2007). Retrieved 2008, from Nuance Cafe: Supercharge Your Phone!: http://cafe.bevocal.com/index.html
[48] Gilbert, J.E., McMillian, Y., Rouse, K., Williams, P., Rogers, G., McClendon, J., Mitchell, W., Gupta, P., Mkpong-Ruffin, I., & Cross, E.V. (2009). Universal Access in e-Voting for the Blind. Universal Access in the Information Society Journal.
[49] Palm III, W. J. (2005). A Concise Introduction to MATLAB. New York: McGraw-Hill Higher Education.
[50] Pediatric Otolaryngology. (2000). Vocal Cord Paralysis. Retrieved November 2007, from Pediatric Otolaryngology: http://www.pediatric-ent.com/learning/problems/vocalcord.htm
[51] Roberts, W., & Sabrin, H. (2005). Speaker Classification Using Composite Hypothesis Testing and List Decoding. IEEE Transactions on Speech and Audio Processing, 211-219.
[52] Rodman, J. (2007). Judy Rodman - vocal coach, producer, songwriter, recording artist, entertainer, actor, Nashville, Tennessee. http://judyrodman.com
[53] Rodman, Judy. Personal interview. 25 June 2008.
[54] Seattle Learning Academy. American English Pronunciation. Retrieved May 2009, from http://www.pronuncian.com/stress.aspx
[55] Sharma, M., & Mammone, R. (1996, May 7-10). Subword-Based Text-Dependent Speaker Verification System with User-Selectable Passwords. Retrieved 2006, from IEEE Xplore: http://ieeexplore.ieee.org/iel3/3856/11264/00540298.pdf
[56] Smith, J. O. (2007). Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications (2nd ed.). http://books.w3k.org/: W3K Publishing.
[57] Snowball Sampling. (2007). Retrieved November 2008, from Department of Sustainability and Environment: http://www.dse.vic.gov.au/dse/wcmn203.nsf/linkview/d340630944bb2d51ca25708900062e9838c091705ea81a2fca257091000f8579
[58] Speech Analysis Tutorial. (1995). Retrieved November 2007, from Lund University: http://www.ling.lu.se/research/speechtutorial/tutorial.html
[59] The MathWorks. (2004, May). MATLAB 7. Retrieved November 2007, from https://tagteamdbserver.mathworks.com/ttserverroot/Download/18842 ML 91199v00.pdf
[60] The MathWorks. (2004, May). The MathWorks - MATLAB and Simulink for Technical Computing. Retrieved November 2007, from http://www.mathworks.com/
[61] Thomson Gale. (2005). Biometrics - How Biometrics Systems Work. Retrieved April 2007, from http://www.referencesforbusiness.com
[62] Traunmüller, H., & Eriksson, A. (1994). The Frequency Range of the Voice Fundamental in the Speech of Male and Female Adults. Retrieved November 2007, from Stockholm University: http://www.ling.su.se/staff/hartmut/f0 m&f.pdf
[63] Census Bureau Home Page. U.S. Census Bureau. Retrieved November 27, 2007, from http://www.census.gov/geo/www/us regdiv.pdf
[64] Volner, R., & Bore, P. (2001, March 2). A Human Classification System for Biometric Parameters. Retrieved October 27, 2007, from http://internet.ktu.lt/lt/mokslas/zurnalai/elektr/z62/volner.pdf
[65] Weisstein, E. W. (1999). Fast Fourier Transform. Retrieved 2007, from MathWorld - A Wolfram Web Resource: http://mathworld.wolfram.com/FastFourierTransform.html
[66] Woodward Jr., J., Orlans, N., & Higgins, P. (2003).
Identity Assurance in the Information Age: BIOMETRICS. Berkeley: McGraw-Hill/Osborne.
[67] Yudkowsky, M. (2002, November 1). Dr. Dobb's | Voice Biometrics Application Security | November 1, 2002. Retrieved October 27, 2007, from Dr. Dobb's Portal, The World of Software Development: http://www.ddj.com/security/184405193
[68] Yun, Y. W. (2003). The '123' of Biometric Technology. Retrieved 2007, from www.cp.su.ac.th/~rawitat/teaching/forensicit06/coursefiles/files/biometric.pdf

Appendix A

Breakdown of Demographics

Number of Peaks, First Half (each row is a tally over peak counts 4-13)

Gender:
Female: 2 14 22 20 15 8 8 1
Male: 1 2 16 13 6 11 6 6

Ethnicity:
African Am: 2 7 1 6 5 4 1 1
Asian: 1 4
Hispanic: 1 1 1 2 1
White: 2 10 14 26 20 8 14 6 4

State:
AL: 2 5 8 17 9 6 6 2 2
CA: 1 3 2 1 1
CO: 1
DC: 1 1
FL: 1 1 1 1 3 1
GA: 2 1 1 2 1 1
IA: 1 3 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 2
MI: 1 1 1 1
MN: 2 2 1 3
MO: 1 1 1
MS: 1
NC: 2 1 1
NE: 1
NY: 1
OH: 1 2 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 3 3 1 1
VA: 2 1 2
WA: 1 1

Education:
Bachelor: 7 7 9 10 7 8 1 1
Grammar: 1 1
High School: 2 3 4
Master: 2 9 11 4 4 6 3 1
MD: 1 1
PHD: 2 1 3 3 2 1 2 2
Some college: 2 3 5 9 6 3 1 2
Vocational: 1

Area:
Rural: 1 2 2 2 2
Small Town: 1 4 11 13 9 5 3 4 4
Suburb: 1 9 7 15 11 6 10 2 2
Urban: 1 4 6 6 3 4 1

Height:
4 to 5: 1 2 1 1
5.1 to 5.3: 2 6 5 4 1 2
5.4 to 5.6: 2 7 8 4 11 5 4 2 1
5.7 to 5.9: 6 6 14 9 3 1 1
5.10 to 6.0: 2 5 3 1 7 4 3
6.1 to 6.3: 5 1 3 3 1 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Number of Points for the Second Half (each row is a tally over point counts 4-13)

Gender:
Female: 3 4 11 19 19 13 15 2 5 1
Male: 3 6 11 15 20 5 2

Ethnicity:
African Am: 1 1 6 3 7 5 2 2
Asian: 2 3
Hispanic: 1 1 3 1
White: 3 2 11 16 23 20 21 4 4 1

State:
AL: 3 1 5 12 13 9 10 3 11
CA: 1 3 3 1
CO: 1
DC: 1 1
FL: 2 3 2 1
GA: 1 2 2 1 1 1
IA: 1 2 3 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 3 1
MN: 2 2 3 1
MO: 1 2
MS: 1
NC: 7 9 1
NE: 1
NY: 1
OH: 2 2
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 1 3 2 1 1
VA: 1 1 2 1
WA: 1 1

Education:
Bachelor: 1 1 4 8 8 8 13 3 4
Grammar: 1 1
High School: 1 1 5 1 1
Master: 1 1 2 12 12 7 8 1 1 1
MD: 1 1
PHD: 2 1 3 5 3 2
Some college: 1 2 4 3 7 5 6 1 2
Vocational: 1

Area:
Rural: 3 3 3
Small Town: 1 2 6 11 10 10 6 4 3 1
Suburb: 1 2 5 6 14 13 17 2 1 1
Urban: 1 2 5 3 4 7 1 1 1

Height:
4 to 5: 1 1 1 1 1
5.1 to 5.3: 1 10 2 2 2 1 2
5.4 to 5.6: 2 2 6 6 8 5 12 1 2
5.7 to 5.9: 2 4 7 11 10 3 2 1
5.10 to 6.0: 1 2 1 3 5 9 3 1
6.1 to 6.3: 1 5 3 4 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Slope Distance Line

Gender:
Female: -0.99365 to -0.00651
Female: 0.00085 to 0.61721
Male: -0.49033 to -0.00286
Male: 0.00308 to 0.66138

Ethnicity:
African Am (neg): -0.59546 to -0.00651
African Am (pos): 0.00308 to 0.66138
Asian (neg): -0.26139 to -0.15716
Asian (pos): 0.0885 to 0.19448
Hispanic (neg): -0.20888 to -0.03503
Hispanic (pos): 0.00433 to 0.13985
White (neg): -0.99365 to -0.00286
White (pos): 0.00085 to 0.61721
State:
AL: -0.74824 to -0.00286
AL: 0.00085 to 0.42001
CA: -0.40647 to -0.0089
CA: 0.05684
CO: 0.16646
DC: -0.23638 and 0.38993
FL: -0.49033 to -0.00651
FL: 0.10738 to 0.66138
GA: -0.09982 to -0.02608
GA: 0.03778 to 0.58616
IA: -0.29616 to -0.03173
IA: 0.00935 to 0.61721
ID: -0.46797
IL: -0.40137 and 0.56757
KY: -0.12478 to -0.0204
LA: -0.28038 and 0.01232
MA: 0.04451
MD: -0.19547 and 0.00411
MI: -0.19405 to -0.04471
MI: 0.133425
MN: -0.41642 to -0.04392
MN: 0.0197 to 0.25693
MO: -0.05815
MO: 0.10037 to 0.26848
MS: -0.16467
NC: -0.59546 to -0.10757
NC: 0.13985 to 0.16893
NE: 0.12225
NY: -0.08948
OH: -0.19234 to -0.03795
OH: 0.00308
OR: -0.20888
PA: -0.79311
SC: 0.06591
TN: -0.09841 to -0.04037
TX: -0.99365 to -0.0147
TX: 0.00798 to 0.31808
VA: -0.11912 to -0.01772
VA: 0.01027 to 0.19143
WA: -0.32368 and 0.01558

Education:
Bachelor: -0.89478 to -0.00366
Bachelor: 0.00433 to 0.66138
Grammar: 0.00935 to 0.02014
High School: -0.22005 to -0.00651
High School: 0.00085 to 0.32173
Master: -0.79311 to -0.0094
Master: 0.00411 to 0.58616
MD: -0.40137 and 0.56757
PHD: -0.49033 to -0.0204
PHD: 0.00308 to 0.40287
Some college: -0.99365 to -0.00286
Some college: 0.01027 to 0.25693
Vocational: 0.06534

Area:
Rural: -0.40137 to -0.00286
Rural: 0.01667 to 0.32173
Small Town: -0.74824 to -0.00366
Small Town: 0.00308 to 0.66138
Suburb: -0.99365 to -0.00663
Suburb: 0.00411 to 0.38993
Urban: -0.46797 to -0.01772
Urban: 0.00085 to 0.61721

Height:
4.0 to 5.0: -0.99365 to -0.02578
4.0 to 5.0: 0.04451
5.1 to 5.3: -0.59546 to -0.04471
5.1 to 5.3: 0.00433 to 0.56757
5.4 to 5.6: -0.89478 to -0.00663
5.4 to 5.6: 0.00085 to 0.66138
5.7 to 5.9: -0.79311 to -0.00651
5.7 to 5.9: 0.00308 to 0.61721
5.10 to 6.0: -0.44787 to -0.00286
5.10 to 6.0: 0.00935 to 0.31808
6.1 to 6.3: -0.46797 to -0.04863
6.1 to 6.3: 0.01558 to 0.26175
6.4 to 6.6: -0.00366
6.7 to 6.9: -0.2121
6.10 to 7.0: -0.06889

Second Half Peak (each row is a tally over peak positions 1-13)

Gender:
Female: 13 20 13 16 14 7 3 6 3 1 1
Male: 3 12 11 4 7 9 5 7 3

Ethnicity:
African Am: 3 3 6 6 1 1 6 1
Asian: 1 1 1 2
Hispanic: 2 1 1 1 1
White: 11 24 14 13 10 15 6 7 2 1 1

State:
AL: 4 13 7 13 5 6 2 5 2
CA: 1 3 3 1
CO: 1
DC: 1 1
FL: 2 1 2 1 1 1
GA: 2 1 2 1 1 1
IA: 2 2 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 1 2 2 1 1 1
MO: 1 1 1
MS: 1
NC: 1 2 1
NE: 1
NY: 1
OH: 1 1 1 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 1 2 1 1 1 1 1
VA: 2 1 1 1
WA: 1 1

Education:
Bachelor: 1 12 7 8 6 5 2 4 4 1
Grammar: 1 1
High School: 1 2 5 1
Master: 8 11 3 5 1 4 2 5 1
MD: 1 1
PHD: 1 3 4 3 2 1 2
Some college: 5 3 5 5 3 5 3 1 1
Vocational: 1

Area:
Rural: 1 3 2 1 1 1
Small Town: 5 12 13 7 4 7 3 5 2
Suburb: 5 16 9 8 8 6 1 5 4 1
Urban: 6 3 3 3 2 2 3 2 1

Height:
4.0 to 5.0: 1 1 1 1 1
5.1 to 5.3: 4 4 6 2 2 1 1
5.4 to 5.6: 13 9 8 7 2 4 4 1
5.7 to 5.9: 4 11 6 4 4 5 3 2 1
5.10 to 6.0: 4 5 2 2 5 2 2 3
6.1 to 6.3: 2 4 3 1 1 3
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

First Half Peak (each row is a tally over peak positions 1-11)

Gender:
Female: 14 16 17 12 10 5 9 2 3 2
Male: 4 10 9 5 7 8 5 6 3 1 3

Ethnicity:
African Am: 3 2 4 4 5 4 2 3
Asian: 1 2 1 1
Hispanic: 1 1 1 1 1 1
White: 11 20 18 12 9 11 9 5 3 3 3

State:
AL: 6 9 7 10 7 6 4 3 2 1 2
CA: 3 4 1
CO: 1
DC: 1 1
FL: 1 2 1 1 2 1
GA: 3 1 1 2 1
IA: 1 1 1 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 2 1
MN: 1 3 2 2
MO: 1 1 1
MS: 1
NC: 2 1 1
NE: 1
NY: 1
OH: 1 1 1 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 2 4 1 1
VA: 2 1 1 1
WA: 1 1
Education:
Bachelor: 4 8 9 5 8 5 5 2 2 2
Grammar: 1 1
High School: 1 3 2 3
Master: 10 5 4 4 4 4 4 1 2 1 1
MD: 1 1
PHD: 3 5 3 2 1 1 1
Some college: 2 7 6 7 2 5 2 1 1
Vocational: 1

Area:
Rural: 1 1 3 1 2 1
Small Town: 7 9 12 3 5 5 4 4 3 2
Suburb: 5 13 11 9 8 5 5 2 2 2 1
Urban: 5 6 2 2 3 1 5 1 1 1

Height:
4.0 to 5.0: 1 2 1 1
5.1 to 5.3: 2 2 3 3 4 3 2 1
5.4 to 5.6: 9 6 12 6 3 6 2
5.7 to 5.9: 3 12 5 5 3 5 4 1 2
5.10 to 6.0: 1 2 4 2 3 4 1 4 1 1 2
6.1 to 6.3: 3 3 2 1 1 2 1 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Shortest Distance Between Two Peaks

Gender:
Female: 20.83334 to 29.36933
Male: 20.83342 to 24.97766

Ethnicity:
African Am: 20.83343 to 24.97766
Asian: 20.91476 to 21.5333
Hispanic: 20.83353 to 21.28297
White: 20.83334 to 29.36933

State:
AL: 20.83334 to 26.0197
CA: 20.83416 to 22.48856
CO: number too high
DC: 21.33177 to 22.36116
FL: 20.83479 to 24.97766
GA: 20.84042 to 24.14853
IA: 20.83424 to 24.48203
ID: 23.00167
IL: 22.44879 to 23.95501
KY: 20.83767 to 20.99489
LA: 20.83491 to 21.6367
MA: 20.85396
MD: 20.83351 to 21.22761
MI: 20.85415 to 21.22195
MN: 20.83737 to 22.56749
MO: 20.86852 to 21.5711
MS: 21.1139
NC: 20.95352 to 24.24713
NE: 20.98844
NY: 27.95571
OH: 20.83343 to 21.21518
OR: 21.28297
PA: 26.59026
SC: 20.87853
TN: 20.8503 to 20.93396
TX: 20.834 to 29.36933
VA: 20.83443 to 21.21162
WA: 20.83586 to 21.8975

Education:
Bachelor: 20.83347 to 27.95571
Grammar: 20.83424 to 20.83756
High School: 20.83334 to 21.885
Master: 20.83351 to 26.59026
MD: 22.44879 to 23.95501
PHD: 20.83343 to 23.20301
Some college: 20.83342 to 29.36933
Vocational: 20.87776

Area:
Rural: 20.83342 to 22.44879
Small Town: 20.83343 to 26.0197
Suburb: 20.83351 to 29.36933
Urban: 20.83334 to 24.48203

Height:
4.0 to 5.0: 20.84026 to 29.36933
5.1 to 5.3: 20.83353 to 24.24713
5.4 to 5.6: 20.83334 to 27.95571
5.7 to 5.9: 20.83343 to 26.59026
5.10 to 6.0: 20.83342 to 22.82738
6.1 to 6.3: 20.83586 to 23.00167
6.4 to 6.6: 20.83347
6.7 to 6.9: 21.2968
6.10 to 7.0: 20.8827

Another Second Half Peak (each row is a tally over peak positions 1-11)

Gender:
Female: 11 14 11 14 14 6 8 4 3 1 2
Male: 10 7 11 3 6 5 10 7 1 1

Ethnicity:
African Am: 3 5 3 2 3 5 2 2 1
Asian: 1 3 1
Hispanic: 1 1 2 1 1
White: 17 18 13 10 15 8 10 9 1 1

State:
AL: 5 7 10 9 10 3 5 3 2 2
CA: 1 2 2 1 2
CO: 1
DC: 1 1
FL: 1 1 1 1 1 2 1
GA: 2 1 2 1 2
IA: 1 2 1 1 2
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 2 2 3 1
MO: 1 1 1
MS: 1
NC: 1 1 1 1
NE: 1
NY: 1
OH: 1 1 2
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 2 1 1 1 2 1 1
VA: 3 1 1
WA: 1 1
Education:
Bachelor: 8 8 8 7 9 4 1 1 2 1 1
Grammar: 1 1
High School: 2 1 4 1 1
Master: 6 3 5 6 2 4 9 3 1 1
MD: 1 1
PHD: 3 2 1 1 2 4 3
Some college: 2 6 4 4 5 1 3 2 1 1
Vocational: 1

Area:
Rural: 2 3 1 1 1 1
Small Town: 8 5 8 5 7 4 8 7 2 1 1
Suburb: 18 11 9 10 9 6 5 2 2 1
Urban: 3 2 4 3 2 1 5 3 1

Height:
4.0 to 5.0: 1 1 1 1 1
5.1 to 5.3: 2 3 2 6 2 3 1 1
5.4 to 5.6: 7 4 8 8 7 1 4 1 2 1 1
5.7 to 5.9: 5 9 7 4 3 3 6 3
5.10 to 6.0: 4 3 2 1 6 1 5 2 1
6.1 to 6.3: 2 1 3 1 1 2 4
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Another First Half Peak (each row is a tally over peak positions 1-11)

Gender:
Female: 10 14 16 13 10 9 7 6 3 2
Male: 12 7 7 2 6 4 9 7 2 3

Ethnicity:
African Am: 3 3 4 2 4 4 3 3 1
Asian: 1 1 2 1
Hispanic: 1 1 1 1 1 1
White: 17 16 18 8 13 8 9 9 1 1 3

State:
AL: 6 5 12 5 8 4 8 4 2 2
CA: 3 2 1 1 1
CO: 1
DC: 1 1
FL: 2 2 1 1 1 1
GA: 3 1 2 1 1
IA: 1 2 1 1 1 1
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 2 3 1 1 1
MO: 1 2
MS: 1
NC: 1 1 1 1
NE: 1
NY: 1
OH: 1 1 1 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 2 1 2 3 1
VA: 4 1
WA: 1 1

Education:
Bachelor: 8 9 10 6 8 3 1 2 3
Grammar: 1 1
High School: 1 3 3 1 1
Master: 7 1 5 4 2 6 5 6 2 2
MD: 1 1
PHD: 3 1 2 1 2 4 1 1 1
Some college: 2 7 3 4 4 2 5 2 1
Vocational: 1

Area:
Rural: 2 2 3 1 1
Small Town: 7 7 9 2 7 4 6 4 4 1 2
Suburb: 10 10 8 10 4 7 6 5 1 1
Urban: 3 2 3 3 5 1 4 3 1

Height:
4.0 to 5.0: 1 1 1 1 1
5.1 to 5.3: 1 3 6 2 3 1 3 1
5.4 to 5.6: 8 4 8 6 4 5 2 3 2 2
5.7 to 5.9: 4 10 6 6 2 2 5 3 2
5.10 to 6.0: 6 1 1 6 3 3 3 2
6.1 to 6.3: 2 2 2 1 1 1 3 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Second Half Number of Direction Changes (each row is a tally over 0-9 changes)

Gender:
Female: 2 4 12 15 20 18 6 8 2 3
Male: 4 6 16 16 11 13 8 2 1

Ethnicity:
African Am: 5 6 6 4 4 1 1
Asian: 1 1 1 1 1
Hispanic: 2 1 2 1
White: 1 7 11 20 24 24 8 6 1 2

State:
AL: 1 5 9 12 9 11 4 2 1 3
CA: 4 1 2 1
CO: 1
DC: 1 1
FL: 1 1 2 1 2 1
GA: 1 2 2 1 1 1
IA: 2 2 1 2
ID: 1
IL: 1 1
KY: 1 1
LA: 2
MA: 1
MD: 1 1
MI: 1 1 1 1
MN: 2 2 2 1 1
MO: 1 2
MS: 1
NC: 1 3
NE: 1
NY: 1
OH: 1 1 2
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 1 3 1 2 1
VA: 1 4
WA: 1 1

Education:
Bachelor: 4 3 12 9 8 5 5 2 2
Grammar: 1 1
High School: 1 2 2 3 1
Master: 2 10 7 7 6 4 2 2
MD: 1 1
PHD: 1 2 4 2 4 3
Some college: 2 1 1 5 10 9 2 1
Vocational: 1

Area:
Rural: 1 3 2 3
Small Town: 1 3 9 13 11 9 6 1 1
Suburb: 1 2 7 12 10 14 7 7 2 1
Urban: 2 2 3 8 5 1 2 2

Height:
4.0 to 5.0: 1 1 2 1
5.1 to 5.3: 2 4 7 3 2 2
5.4 to 5.6: 1 2 7 10 6 8 6 3 1
5.7 to 5.9: 1 3 4 8 7 10 3 3 1
5.10 to 6.0: 1 3 5 6 6 2 1
6.1 to 6.3: 2 2 3 3 1 1 1 1
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

First Half Number of Direction Changes (each row is a tally over 1-9 changes)

Gender:
Female: 2 12 19 17 23 7 8 1 1
Male: 2 10 5 12 16 12 4

Ethnicity:
African Am: 7 2 7 8 3 1
Asian: 2 1 1 1
Hispanic: 1 1 1 1 2
White: 3 11 18 21 28 11 10 1 1

State:
AL: 3 11 7 13 11 8 4
CA: 2 2 1 1 1 1
CO: 1
DC: 1 1
FL: 1 1 1 1 2 1 1
GA: 1 2 2 2 1
IA: 2 3 2
ID: 1
IL: 1 1
KY: 1 1
LA: 1 1
MA: 1
MD: 1 1
MI: 1 1 2
MN: 2 3 2 1
MO: 1 1 1
MS: 1
NC: 1 1 2
NE: 1
NY: 1
OH: 1 2 1
OR: 1
PA: 1
SC: 1
TN: 1 1
TX: 1 2 1 2 2 1
VA: 1 1 2 1
WA: 1 1
Education:
Bachelor: 1 4 12 9 11 7 6 1
Grammar: 1 1
High School: 1 1 1 1 3 2
Master: 8 4 10 10 5 3
MD: 1 1
PHD: 1 3 2 5 3 2
Some college: 1 4 6 3 10 3 3 1
Vocational: 1

Area:
Rural: 1 2 2 2 2
Small Town: 2 8 10 10 9 7 6 1
Suburb: 2 8 9 13 19 5 5 1
Urban: 3 3 3 9 5 1

Height:
4.0 to 5.0: 1 1 2 1
5.1 to 5.3: 3 3 6 4 2 1 1
5.4 to 5.6: 2 5 7 6 9 4 6
5.7 to 5.9: 1 6 9 8 10 5 1
5.10 to 6.0: 1 4 3 3 6 4 4
6.1 to 6.3: 2 6 4 2
6.4 to 6.6: 1
6.7 to 6.9: 1
6.10 to 7.0: 1

Total Second X Distance

Gender:
Female: 625 to 979.16667
Male: 687.5 to 979.16667

Ethnicity:
African Am: 645.83333 to 958.33333
Asian: 875 to 937.5
Hispanic: 875 to 937.5
White: 625 to 979.16667

State:
AL: 625 to 979.16667
CA: 729.16667 to 937.5
CO: 812.5
DC: 645.83333 to 937.5
FL: 791.66667 to 937.5
GA: 687.5 to 895.83333
IA: 729.16667 to 895.83333
ID: 958.33333
IL: 708.33333 to 916.66667
KY: 812.5 to 958.33333
LA: 833.33333 to 916.66667
MA: 833.33333
MD: 708.33333 to 770.83333
MI: 895.83333 to 937.5
MN: 770.83333 to 958.33333
MO: 812.5 to 937.5
MS: 750
NC: 812.5 to 895.83333
NE: 833.33333
NY: 895.83333
OH: 854.16667 to 895.83333
OR: too low
PA: 812.5
SC: 812.5
TN: 729.16667 to 854.16667
TX: 708.33333 to 958.33333
VA: 833.33333 to 979.16667
WA: 770.83333 to 916.66667

Education:
Bachelor: 625 to 979.16667
Grammar: 875 to 895.83333
High School: 625 to 937.5
Master: 708.33333 to 979.16667
MD: 708.33333 to 916.66667
PHD: 791.66667 to 937.5
Some college: 645.83333 to 979.16667
Vocational: 895.83333

Area:
Rural: 687.5 to 937.5
Small Town: 625 to 937.5
Suburb: 645.83333 to 979.16667
Urban: 625 to 979.16667

Height:
4.0 to 5.0: 812.5 to 937.5
5.1 to 5.3: 687.5 to 958.33333
5.4 to 5.6: 625 to 979.16667
5.7 to 5.9: 708.33333 to 979.16667
5.10 to 6.0: 625 to 937.5
6.1 to 6.3: 750 to 979.16667
6.4 to 6.6: 854.16667
6.7 to 6.9: 875
6.10 to 7.0: 916.66667

Average Second Slope

Gender:
Female: -0.09203 to -0.00038
Female: 0.00075 to 0.06744
Male: -0.04657 to -0.00139
Male: 0.00142 to 0.02074

Ethnicity:
African Am: -0.04718 to -0.00252
African Am: 0.00142 to 0.01328
Asian: -0.03222 to -0.01302
Hispanic: -0.03626 to -0.01104
Hispanic: 0.00496
White: -0.09203 to -0.00038
White: 0.00075 to 0.06744

State:
AL: -0.0599 to -0.00247
AL: 0.00077 to 0.03527
CA: -0.03874 to -0.02068
CA: 0.00637 to 0.03682
CO: -0.00361
DC: -0.04232 to -0.02347
FL: -0.04211 to -0.00038
FL: 0.00933
GA: -0.03888 to -0.00724
GA: 0.00222 to 0.02074
IA: -0.03711 to -0.01016
IA: 0.02933 to 0.06744
ID: -0.01791
IL: -0.0273 to -0.00265
KY: -0.03072 and 0.01225
LA: -0.02314 to -0.00319
MA: -0.01701
MD: -0.02163 to -0.00692
MI: -0.03305 to -0.00253
MN: -0.04381 to -0.01107
MN: 0.00238
MO: -0.04718
MO: 0.00075 to 0.03882
MS: -0.01732
NC: -0.02475 to -0.00139
NC: 0.00496
NE: -0.01892
NY: -0.0396
OH: -0.03398 to -0.0037
OR: -0.01355
PA: -0.04362
SC: 0.01177
TN: -0.01908 to -0.00572
TX: -0.0845 to -0.01101
TX: 0.01218 to 0.01369
VA: -0.09203 to -0.01039
VA: 0.00855
WA: -0.02758 to -0.01922
Education:
Bachelor: -0.09203 to -0.00263
Bachelor: 0.00075 to 0.03682
Grammar: -0.01016 and 0.06744
High School: -0.03305 to -0.00572
High School: 0.00933 to 0.02933
Master: -0.04718 to -0.00038
Master: 0.00295 to 0.03882
MD: -0.0273 to -0.00265
PHD: -0.04163 to -0.0037
Some college: -0.0845 to -0.00139
Some college: 0.00077 to 0.01218
Vocational: -0.02185

Area:
Rural: -0.04657 to -0.00265
Rural: 0.00077
Small Town: -0.09203 to -0.00038
Small Town: 0.00142 to 0.03682
Suburb: -0.0845 to -0.00139
Suburb: 0.00075 to 0.06744
Urban: -0.04718 to -0.00247
Urban: 0.00738 to 0.02933

Height:
4.0 to 5.0: -0.0845 to -0.01101
4.0 to 5.0: 0.01113
5.1 to 5.3: -0.04649 to -0.0149
5.1 to 5.3: 0.00077 to 0.03682
5.4 to 5.6: -0.09203 to -0.00038
5.4 to 5.6: 0.00238 to 0.03513
5.7 to 5.9: -0.0599 to -0.00139
5.7 to 5.9: 0.00222 to 0.06744
5.10 to 6.0: -0.03745 to -0.00253
5.10 to 6.0: 0.00075 to 0.03527
6.1 to 6.3: -0.03548 to -0.00265
6.4 to 6.6: -0.00263
6.7 to 6.9: -0.01488
6.10 to 7.0: -0.00816

Average Second Y Distance

Gender:
Female: 0.76482 to 20.11604
Male: 1.06771 to 9.62666

Ethnicity:
African Am: 1.53423 to 9.16597
Asian: 1.81319 to 8.46621
Hispanic: 1.88871 to 6.26574
White: 0.76482 to 20.11604

State:
AL: 0.76482 to 18.98124
CA: 2.19067 to 10.75513
CO: 4.46694
DC: 1.70472 to 6.59579
FL: 3.20572 to 9.62666
GA: 3.85139 to 7.79519
IA: 1.06771 to 7.37868
ID: 2.83795
IL: 3.09743 to 6.44515
KY: 4.94241 to 7.51344
LA: 1.68953 to 2.76435
MA: 3.27227
MD: 5.3831 to 9.16597
MI: 2.11142 to 6.26574
MN: 1.76756 to 7.2374
MO: 5.13771 to 8.50846
MS: 6.75352
NC: 5.08627 to 8.36667
NE: 2.02068
NY: 9.16335
OH: 1.53423 to 5.33724
OR: 1.88871
PA: 6.88837
SC: 7.37903
TN: 3.22839
TX: 2.63324 to 11.10742
VA: 1.76285 to 5.2077
WA: 3.10199 to 6.96012

Education:
Bachelor: 1.68953 to 18.98124
Grammar: 1.06771
High School: 3.22839 to 11.40787
Master: 0.76482 to 15.10384
MD: 3.09743 to 6.44515
PHD: 1.53423 to 5.33724
Some college: 1.30732 to 16.67643
Vocational: 3.93314

Area:
Rural: 2.48434 to 12.7076
Small Town: 1.09057 to 17.44381
Suburb: 1.30732 to 18.98124
Urban: 0.76482 to 11.10742

Height:
4.0 to 5.0: 3.27227 to 5.97455
5.1 to 5.3: 2.48434 to 16.67643
5.4 to 5.6: 1.30732 to 16.62409
5.7 to 5.9: 1.53423 to 20.11604
5.10 to 6.0: 1.06771 to 8.46621
6.1 to 6.3: 0.76482 to 7.2374
6.4 to 6.6: 5.65771
6.7 to 6.9: 8.42994
6.10 to 7.0: 2.82452

Total Second Y Distance

Gender:
Female: 9.22929 to 125.33082
Male: 14.6623 to 102.74155

Ethnicity:
African Am: 31.21441 to 73.10173
Asian: 21.76651 to 26.58681
Hispanic: 36.28649 to 39.22791
White: 9.22929 to 84.39188

State:
AL: 15.0501 to 83.69464
CA: 21.27234 to 36.28649
CO: 21.51626
DC: 40.44849
FL: 64.77386 to 73.10173
GA: 58.55602 to 63.5027
IA: 32.68834 to 38.24517
ID: 21.63775
IL: 9.22929
KY: 27.36293
LA: 20.66605
MA: 74.36435
MD: 31.21441 to 31.91552
MN: 58.65523 to 61.37713
MO: 75.72876 to 81.3068
MS: 25.6336
NE: 14.6623
NY: 14.52795
OH: 41.07338 to 42.37182
OR: 58.74272
PA: 58.8329
TN: 35.76295
TX: 40.51476 to 43.82436
WA: 32.10158
Education:
Bachelor: 20.66605 to 83.69464
Grammar: 63.10697 and 125.33082
High School: 35.76295 to 38.24517
Master: 15.0501 to 82.374
MD: 9.22929 and 42.43057
PHD: 32.68834 to 48.42309
Some college: 20.67707 to 74.3866
Vocational: 161.03839

Area:
Rural: 22.50552 to 24.90808
Small Town: 18.81512 to 62.91221
Suburb: 14.52795 to 84.39188
Urban: 9.22929 to 58.65074

Height:
4.0 to 5.0: 72.65738 to 74.36435
5.1 to 5.3: 9.22929 to 63.5027
5.4 to 5.6: 20.66605 to 84.30005
5.7 to 5.9: 20.67707 to 75.72876
5.10 to 6.0: 24.90808 to 59.23719
6.1 to 6.3: 21.63775 to 32.68834
6.4 to 6.6: 59.91936
6.7 to 6.9: 140.9927
6.10 to 7.0: 27.18616

Average Second X Distance

Gender:
Female: 78.18906 to 182.41839
Male: 76.49549 to 142.53808

Ethnicity:
African Am: 76.49549 to 142.83182
Asian: 97.96869 to 102.7372
Hispanic: 101.67403 to 104.70597
White: 84.47018 to 180.62882

State:
AL: 83.52587 to 180.62882
CA: 93.86788 to 97.54918
CO: 101.84681
DC: 92.63231 and 117.7726
FL: 101.67403 to 135.86438
GA: 98.65994 to 102.18956
IA: 104.38275 to 112.15444
ID: 119.96576
IL: 131.11741 and 142.0346
KY: 87.8971 and 162.67324
LA: 92.7988 and 131.01533
MA: 166.78481
MD: 142.83182 and 192.91273
MI: 101.20673 to 105.40881
MN: 99.06263 to 100.07749
MO: 95.36461 and 104.98533
MS: 108.14805 and 137.36484
NC: 112.75407 to 113.76038
NE: 104.2441
NY: 129.1797
OH: 95.10007 to 97.40779
OR: 108.4114
PA: 116.5901
SC: 136.18319
TN: 122.46823 and 170.94108
TX: 87.69269 to 91.1281
VA: 122.15614 to 123.55207
WA: 101.9961 and 128.97889

Education:
Bachelor: 83.52587 to 179.62909
Grammar: 99.57493 and 112.15444
High School: 104.38275 to 125.98373
Master: 94.0926 to 156.35884
MD: 131.11741 and 142.0346
PHD: 87.69269 to 134.07075
Some college: 93.06539 to 128.6602
Vocational: 99.70921

Area:
Rural: 99.68339 to 132.10784
Small Town: 76.49549 to 171.02154
Suburb: 85.73891 to 142.83182
Urban: 89.31884 to 123.55207

Height:
4.0 to 5.0: 106.30586 to 108.39525
5.1 to 5.3: 91.1281 to 143.74624
5.4 to 5.6: 84.47018 to 142.83182
5.7 to 5.9: 87.88233 to 137.36484
5.10 to 6.0: 93.06539 to 125.15975
6.1 to 6.3: 100.07749 to 131.11741
6.4 to 6.6: 107.47803
6.7 to 6.9: 97.96869
6.10 to 7.0: 102.01642

Average First Slope

Gender:
Female: -0.06508 to -0.00009
Female: 0.00085 to 0.06542
Male: -0.03581 to -0.00002
Male: 0.00069 to 0.03931

Ethnicity:
African Am: -0.03077 to -0.00002
African Am: 0.00172 to 0.02396
Asian: -0.02027 to -0.00341
Asian: 0.02443
Hispanic: -0.01544 to -0.0013
Hispanic: 0.00456 to 0.02385
White: -0.06508 to -0.00009
White: 0.00069 to 0.06542

State:
AL: -0.06508 to -0.00064
AL: 0.00069 to 0.05792
CA: -0.03278 to -0.00017
CA: 0.01528 to 0.03931
CO: -0.0201
DC: -0.01278 to -0.00919
FL: -0.01494 to -0.00165
FL: 0.00098 to 0.00172
GA: -0.02813 to -0.00821
GA: 0.00271
IA: -0.02583 to -0.00276
IA: 0.00086 to 0.03344
ID: -0.00708
IL: -0.01453 to -0.0063
KY: -0.01113 to -0.00061
LA: -0.01003 to -0.00009
MA: 0.0454
MD: 0.00816 to 0.01769
MI: -0.00002
MI: 0.01623 to 0.03491
MN: -0.01046 to -0.00175
MN: 0.0096 to 0.01316
MO: 0.00184 to 0.01882
MS: 0.00305
NC: -0.03077 to -0.00488
NC: 0.00456 to 0.00705
NE: 0.02295
NY: -0.03553
OH: -0.02016 to -0.00147
OH: 0.0067
OR: -0.0013
PA: -0.01717
SC: -0.01212
TN: 0.00085 to 0.01455
TX: -0.05697 to -0.00385
TX: 0.01909 to 0.02585
VA: -0.03581 to -0.00131
VA: 0.02055
WA: -0.02093 and 0.06542
Education:
Bachelor: -0.05697 to -0.00009
Bachelor: 0.00069 to 0.06542
Grammar: -0.0243 to -0.00525
High School: -0.02119 to -0.00276
High School: 0.00085 to 0.03491
Master: -0.02417 to -0.00064
Master: 0.00184 to 0.05792
MD: -0.01453 to -0.0063
PHD: -0.03581 to -0.00147
PHD: 0.00098 to 0.02443
Some college: -0.06508 to -0.00002
Some college: 0.0027 to 0.05175
Vocational: -0.02403

Area:
Rural: -0.03097 to -0.00341
Rural: 0.01476 to 0.03491
Small Town: -0.06508 to -0.00002
Small Town: 0.00085 to 0.06542
Suburb: -0.05697 to -0.00017
Suburb: 0.00069 to 0.04558
Urban: -0.02583 to -0.00147
Urban: 0.00086 to 0.03344

Height:
4.0 to 5.0: -0.02583
4.0 to 5.0: 0.02501 to 0.0454
5.1 to 5.3: -0.03278 to -0.00009
5.1 to 5.3: 0.00271 to 0.06542
5.4 to 5.6: -0.06508 to -0.00017
5.4 to 5.6: 0.00085 to 0.02385
5.7 to 5.9: -0.05697 to -0.00276
5.7 to 5.9: 0.00086 to 0.05792
5.10 to 6.0: -0.03581 to -0.00002
5.10 to 6.0: 0.00098 to 0.0194
6.1 to 6.3: -0.02093 to -0.00536
6.1 to 6.3: 0.00069 to 0.01094
6.4 to 6.6: -0.00443
6.7 to 6.9: -0.01882
6.10 to 7.0: -0.00476

Average Slope Between Points

Gender:
Female: -0.027925 to -0.000141
Female: 0.000187 to 0.066688
Male: -0.026729 to -0.000343
Male: 0.000795 to 0.013462

Ethnicity:
African Am: -0.017488 to -0.000981
African Am: 0.002743 to 0.012799
Asian: -0.017617 to -0.002208
Hispanic: -0.020726 to -0.002527
Hispanic: 0.008018
White: -0.027925 to -0.000141
White: 0.000187 to 0.066688

State:
AL: -0.026729 to -0.000141
AL: 0.000215 to 0.008716
CA: -0.020726 to -0.006276
CA: 0.000187 to 0.013462
CO: -0.013631
DC: -0.012747 to -0.012487
FL: -0.01753 to -0.002527
GA: -0.027925 to -0.003762
GA: 0.007189
IA: -0.021166 to -0.003804
ID: -0.007681
IL: -0.015346 and 0.002814
KY: -0.015761 to -0.015108
LA: -0.005995 to -0.003574
MA: -0.010597
MD: -0.009174 and 0.012799
MI: -0.022877 to -0.003417
MI: 0.008018
MN: -0.019799 to -0.001587
MO: -0.005814 to -0.001373
MS: -0.010884
NC: -0.009676 to -0.005234
NC: 0.002364
NE: 0.000795
NY: -0.00867
OH: -0.012849 to -0.002632
OR: -0.013285
PA: -0.0158
SC: -0.000981
TN: -0.002764 and 0.003036
TX: -0.018391 to -0.000344
TX: 0.001582 to 0.002319
VA: -0.010248 to -0.00246
VA: 0.066688
WA: -0.008811 and 0.002016
Education:
Bachelor: -0.027925 to -0.000141
Bachelor: 0.002319 to 0.066688
Grammar: -0.010519 to -0.003804
High School: -0.022877 to -0.006276
High School: 0.003036
Master: -0.026246 to -0.001089
Master: 0.000215 to 0.012799
MD: -0.015346 and 0.002814
PHD: -0.018391 to -0.000343
PHD: 0.000795
Some college: -0.02123 to -0.00184
Some college: 0.000187 to 0.002513
Vocational: 0.002743

Area:
Rural: -0.022877 to -0.005032
Rural: 0.002814
Small Town: -0.026246 to -0.000343
Small Town: 0.000795 to 0.066688
Suburb: -0.027925 to -0.000141
Suburb: 0.000187 to 0.013462
Urban: -0.021166 to -0.001587
Urban: 0.000215 to 0.002319

Height:
4.0 to 5.0: -0.020907 to -0.010597
4.0 to 5.0: 0.001582
5.1 to 5.3: -0.027925 to -0.000981
5.1 to 5.3: 0.002319
5.4 to 5.6: -0.026729 to -0.000344
5.4 to 5.6: 0.000215 to 0.066688
5.7 to 5.9: -0.025951 to -0.000343
5.7 to 5.9: 0.000187 to 0.002513
5.10 to 6.0: -0.026246 to -0.000141
5.10 to 6.0: 0.002743 to 0.008716
6.1 to 6.3: -0.019756 to -0.004101
6.1 to 6.3: 0.002016 to 0.002814
6.4 to 6.6: -0.002208
6.7 to 6.9: -0.00753
6.10 to 7.0: -0.012586

Average Difference Between Points

Gender:
Female: 127.2556 to 399.7623
Male: 133.392 to 482.5931

Ethnicity:
African Am: 152.3675 to 301.8281
Asian: 183.6524 to 371.154
Hispanic: 159.1971 to 265.7289
White: 133.392 to 482.5931

State:
AL: 133.392 to 482.5931
CA: 181.1088 to 327.0401
CO: 201.214
DC: 199.3032 to 277.4044
FL: 152.3675 to 259.496
GA: 127.2556 to 326.1891
IA: 168.036 to 388.1414
ID: 198.5997
IL: 195.3549 to 265.8221
KY: 205.1516 to 278.3755
LA: 207.8237 to 224.2535
MA: 186.3697
MD: 243.7944 to 256.8543
MI: 159.1971 to 254.0115
MN: 173.2406 to 278.3502
MO: 225.3363 to 356.8013
MS: 156.3135
NC: 142.1766 to 237.0423
NE: 232.1382
NY: 286.2156
OH: 170.7894 to 246.116
OR: 256.2844
PA: 227.4499
SC: 159.1133
TN: 250.8339 to 399.7623
TX: 159.6963 to 265.6611
VA: 161.8613 to 316.4243
WA: 219.7375 to 248.5473

Education:
Bachelor: 159.1133 to 371.154
Grammar: 168.036 to 242.9888
High School: 193.7717 to 388.1414
Master: 133.392 to 482.5931
MD: 195.3549 to 265.8221
PHD: 159.6963 to 306.7029
Some college: 127.2556 to 349.0001
Vocational: 247.1431

Area:
Rural: 171.9593 to 349.0001
Small Town: 127.2556 to 482.5931
Suburb: 133.392 to 356.8013
Urban: 168.036 to 399.7623

Height:
4 to 5: 186.3697 to 308.6529
5.1 to 5.3: 159.1133 to 346.3599
5.4 to 5.6: 159.1971 to 399.7623
5.7 to 5.9: 127.2556 to 388.1414
5.10 to 6.0: 145.1255 to 482.5931
6.1 to 6.3: 156.3135 to 367.2222
6.4 to 6.6: 221.1146
6.7 to 6.9: 184.2821
6.10 to 7.0: 133.392

Difference For Y

Gender:
Female: 3.1695 to 24.202
Male: 4.6963 to 25.4026

Ethnicity:
African Am: 3.1695 to 24.5502
Asian: 9.9157 to 18.0594
Hispanic: 4.3011 to 17.2095
White: 3.5783 to 25.4026

State:
AL: 3.1695 to 25.4026
CA: 9.8764 to 21.414
CO: 12.6157
DC: 12.6416 to 16.149
FL: 9.217 to 21.5992
GA: 5.8614 to 16.7694
IA: 8.2849 to 24.1218
ID: 14.29
IL: 12.6083 to 21.7575
KY: 16.3881 to 18.3383
LA: 7.6347 to 7.8482
MA: 15.7179
MD: 16.7983
MI: 4.3011 to 23.099
MN: 3.8212 to 19.9132
MO: 9.1064 to 11.3593
MS: 16.5769
NC: 6.1231 to 17.9883
NE: 19.4994
NY: 19.5033
OH: 6.7098 to 24.5502
OR: 17.2095
PA: 15.2904
SC: 16.2137
TN: 8.195 to 11.8413
TX: 6.0749 to 21.0472
VA: 6.5411 to 18.2006
WA: 7.2793 to 15.9411
Education:
Bachelor: 3.8212 to 24.202
Grammar: 12.1297 to 13.8139
High School: 8.7087 to 24.1218
Master: 3.1695 to 25.4026
MD: 12.6083 to 21.7575
PHD: 6.7098 to 24.5502
Some college: 3.5783 to 21.414
Vocational: 15.2961

Area:
Rural: 4.6963 to 23.099
Small Town: 3.5783 to 25.4026
Suburb: 3.1695 to 21.414
Urban: 3.8212 to 24.1218

Height:
4 to 5: 12.0287 to 17.1356
5.1 to 5.3: 3.1695 to 23.099
5.4 to 5.6: 3.8212 to 24.202
5.7 to 5.9: 4.6963 to 24.5502
5.10 to 6.0: 6.7098 to 25.4026
6.1 to 6.3: 9.4881 to 21.7575
6.4 to 6.6: 10.5669
6.7 to 6.9: 13.9175
6.10 to 7.0: 14.0919

Difference For X

Gender:
Female: 429.6875 to 1371.0938
Male: 542.9688 to 1328.125

Ethnicity:
African Am: 710.9375 to 1371.0938
Asian: 757.8125 to 1285.1563
Hispanic: 542.9688 to 1281.25
White: 429.6875 to 1328.125

State:
AL: 621.0938 to 1285.1563
CA: 542.9688 to 1230.4688
CO: 1207.0313
DC: 996.0938 to 1109.375
FL: 968.75 to 1296.875
GA: 429.6875 to 1371.0938
IA: 671.875 to 1226.5625
ID: 1191.4063
IL: 976.5625 to 1328.125
KY: 820.3125 to 1113.2813
LA: 1039.0625 to 1121.0938
MA: 1117.1875
MD: 1218.75 to 1281.25
MI: 636.7188 to 1125
MN: 832.0313 to 1148.4375
MO: 675.7813 to 1164.0625
MS: 1093.75
NC: 648.4375 to 1277.3438
NE: 1160.1563
NY: 1144.5313
OH: 984.375 to 1195.3125
OR: 1281.25
PA: 1136.7188
SC: 1113.2813
TN: 1199.2188 to 1253.9063
TX: 531.25 to 1195.3125
VA: 894.5313 to 1132.8125
WA: 878.9063 to 1242.1875

Education:
Bachelor: 429.6875 to 1296.875
Grammar: 671.875 to 1214.8438
High School: 828.125 to 1253.9063
Master: 675.7813 to 1371.0938
MD: 976.5625 to 1328.125
PHD: 757.8125 to 1285.1563
Some college: 621.0938 to 1277.3438
Vocational: 988.2813

Area:
Rural: 621.0938 to 1328.125
Small Town: 648.4375 to 1296.875
Suburb: 429.6875 to 1371.0938
Urban: 542.9688 to 1199.2188

Height:
4 to 5: 925.7813 to 1195.3125
5.1 to 5.3: 429.6875 to 1238.2813
5.4 to 5.6: 542.9688 to 1371.0938
5.7 to 5.9: 531.25 to 1285.1563
5.10 to 6.0: 671.875 to 1250
6.1 to 6.3: 722.6563 to 1328.125
6.4 to 6.6: 1105.4688
6.7 to 6.9: 1105.4688
6.10 to 7.0: 933.5938

Appendix B

Screen Shots of HTML Pages Used for Data Collection

Figure B.1: The information page from the data-gathering website.
Figure B.2: The Demographic Survey page from the data-gathering website.
Figure B.3: The Phone Instruction page from the data-gathering website.
Figure B.4: The User ID and PIN given on the Phone Instruction page from the data-gathering website.
Figure B.5: The four (4) digit number given on the Phone Instruction page from the data-gathering website.
Figure B.6: The message that all participants will leave, located on the Phone Instruction page from the data-gathering website.