Classifying Speakers Using Voice Biometrics In a Multimodal World




Rouse, Kenneth

Type of Degree



Computer Science


This dissertation is a research study conducted to determine whether a classification for a person is obtainable from the person's voice. The intent of this work was to investigate a collection of voice samples for trends that could yield parameters for classifying an individual. No specific classification area, such as gender or ethnicity, was sought; it was preferred to let the results dictate the characteristics that point to a particular classification group. In the data collection stage, each participant was given the same task, and the resulting voice sample was then analyzed. Analysis was conducted in phases. The first phase focused on the time domain and produced parameters approximating the speed of speech and the number of pauses in the sample. Next, the frequency domain was investigated, focusing on the complexity of speech and voice tone attributes. This inquiry concluded with the peaks in the frequency of the voice being tracked by frequency threads and represented numerically by a third-order polynomial. It is the coefficients of this polynomial that give a representation of an individual's voice, making it possible to assign the individual to a particular group. To verify this, the coefficients from these polynomials were used with a clustering application to validate the hypotheses of this study, in support of the objective of providing empirical user data to contribute to the design of future phone system communications.
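The final two steps described above (representing a tracked frequency peak by a third-order polynomial, then clustering the coefficients) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the synthetic peak tracks, the two hypothetical speaker groups, the helper names fit_thread and cluster_coefficients, and the use of a small k-means routine as a stand-in for the unspecified clustering application.

```python
import numpy as np


def fit_thread(times, freqs):
    """Fit a third-order polynomial to one tracked frequency peak.

    Returns the four coefficients, highest power first.
    """
    return np.polyfit(times, freqs, deg=3)


def cluster_coefficients(points, k, iters=20):
    """Small k-means, used here as a stand-in clustering step (assumption)."""
    # Deterministic farthest-point initialisation of the k centers.
    centers = [points[0]]
    for _ in range(1, k):
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[nearest.argmax()])
    centers = np.array(centers, dtype=float)

    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Synthetic data (illustrative only): three "speakers" per hypothetical group,
# each with one peak track sampled at 50 instants over one second.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
group_a = np.array([5.0, -10.0, 20.0, 120.0])    # assumed cubic, in Hz
group_b = np.array([-40.0, 60.0, -30.0, 220.0])  # assumed cubic, in Hz

tracks = [np.polyval(g, t) + rng.normal(0.0, 0.5, t.size)
          for g in (group_a,) * 3 + (group_b,) * 3]

# One row of four polynomial coefficients per speaker.
coeffs = np.array([fit_thread(t, f) for f in tracks])
labels = cluster_coefficients(coeffs, k=2)
print(labels)
```

In this sketch each speaker is reduced to a four-element coefficient vector, and speakers whose peak tracks share a similar shape fall into the same cluster. A real pipeline would first need to extract the peak tracks from recorded speech, which this example does not attempt.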