Post-Speech-Recognition Processing in Domain-Specific Text-Corpus-Based Distributed Listening System: Analysis, Interpretation and Selection of Speech Recognition Results
Type of Degree: Thesis
Computer Science and Software Engineering
Achieving usable recognition rates has been an almost never-ending quest in speech recognition research for more than three decades. Speech recognition rates have recently improved dramatically alongside the rapid development of computer technology, but they have never been high enough to satisfy human expectations. Many researchers have tried to demonstrate the benefit of using multiple speech recognizers to improve recognition rates. The fundamental idea behind this research trend is that recognition results agreed upon by a majority of recognizers can be considered correct. This paper challenges that long-held assumption, which may forever prevent multi-recognizer research from achieving usable recognition rates, by revealing the existence of common misrecognition (CMR) results: errors that the majority of recognizers agree upon. The CMR results are classified into several categories (contractions, missed words, spoken stop words, homophones, and combined misrecognitions) and are treated according to their characteristics. A collection of sentences that users may speak (a simple text corpus) is used to overcome the very low sentence recognition rates of speech recognition systems. The work suggests that composite information built from multiple recognition results is sufficient to identify the actual target sentence among thousands of sentences in a specific domain. Overall, the experimental results of this research (an 87% sentence recognition rate) strongly support the claim that the processes described in this paper can greatly improve the recognition rates of multi-recognizer systems.
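The abstract describes two ideas that a small example can make concrete: majority agreement among recognizers can be wrong (for instance, a homophone CMR), and a domain text corpus can still recover the intended sentence from the combined recognizer outputs. The sketch below is a minimal illustration under stated assumptions, not the thesis's actual method. The contraction table, the function names `normalize`, `similarity`, and `select_sentence`, and the use of Python's `difflib.SequenceMatcher` as the word-level similarity measure are all hypothetical choices made for illustration. Each corpus sentence is scored by its total similarity to every recognizer hypothesis, and the best-scoring sentence is selected.

```python
from difflib import SequenceMatcher

# Hypothetical contraction table (assumption): the thesis's actual
# normalization rules for the "contraction" CMR category are not given here.
CONTRACTIONS = {"what's": "what is", "i'm": "i am", "don't": "do not", "it's": "it is"}


def normalize(text: str) -> list[str]:
    """Lowercase, strip simple punctuation, and expand known contractions."""
    tokens: list[str] = []
    for word in text.lower().split():
        word = word.strip(".,?!")
        tokens.extend(CONTRACTIONS.get(word, word).split())
    return tokens


def similarity(a: list[str], b: list[str]) -> float:
    """Word-level similarity in [0, 1] from matching token subsequences."""
    return SequenceMatcher(None, a, b).ratio()


def select_sentence(hypotheses: list[str], corpus: list[str]) -> str:
    """Return the corpus sentence with the highest total similarity
    to all recognizer hypotheses (a hypothetical scoring rule)."""
    hyp_tokens = [normalize(h) for h in hypotheses]

    def score(sentence: str) -> float:
        target = normalize(sentence)
        return sum(similarity(h, target) for h in hyp_tokens)

    return max(corpus, key=score)


# Usage: two of three hypothetical recognizers agree on the homophone
# "whether", so a word-level majority vote would keep the error.
corpus = ["what is the weather today", "what time is it", "turn on the light"]
hypotheses = [
    "what's the whether today",
    "what is whether today",
    "what is the weather to day",
]
print(select_sentence(hypotheses, corpus))  # prints "what is the weather today"
```

Scoring whole corpus sentences, rather than voting word by word, means a misrecognition shared by the majority only lowers each hypothesis's similarity to the correct sentence instead of overriding it. This matches the abstract's point that composite information from multiple results can locate the target sentence even when the majority is wrong.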