Technical Program

Paper Detail

Paper ID E-2-3.5
Paper Title REDUCTION OF SPEECH DATA POSTERIORGRAMS BY COMPRESSING MAXIMUM-LIKELIHOOD STATE SEQUENCES IN QUERY BY EXAMPLE
Authors Takashi Yokota, Kazunori Kojima, Iwate Prefectural University, Japan; Shi-wook Lee, National Institute of Advanced Industrial Science and Technology, Japan; Yoshiaki Itoh, Iwate Prefectural University, Japan
Session E-2-3: Speech Recognition
Time Wednesday, 09 December, 17:15 - 19:15
Presentation Time Wednesday, 09 December, 18:15 - 18:30
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA)
Abstract Spoken term detection (STD) has been a hot topic in speech retrieval research. STD is the task of finding sections of speech data that match a query consisting of one or more words. Query-by-example (QbE), which uses spoken queries, has also been an important research area in STD. Although the use of posteriorgrams [1][2], sequences of output probabilities generated from speech data by deep neural networks, is a promising approach in QbE, it requires a long retrieval time and a large amount of memory to store the posteriorgrams of the speech data. We previously proposed a method that replaces the posteriorgrams of a spoken query with a sequence of state numbers of triphone hidden Markov models (HMMs) and omits the calculation of local distances. Although that method achieved a large reduction in retrieval time, it still requires a large amount of memory for the posteriorgrams of the speech data. This paper therefore proposes a method that reduces both memory space and retrieval time by compressing the posteriorgrams of the speech data into sets of posterior probability vectors for each utterance, each speech document, or all speech data, instead of holding a posterior probability vector for every frame.
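The compression idea in the abstract can be sketched as follows: instead of storing one posterior vector per frame, keep a small set of representative vectors per utterance and a per-frame index into that set. The abstract does not specify how the representative vectors are chosen, so this minimal sketch uses plain k-means as a hypothetical stand-in; `compress_posteriorgram` and all names below are illustrative, not the authors' implementation.

```python
import random

def compress_posteriorgram(frames, n_codewords=4, n_iters=10, seed=0):
    """Compress a frame-level posteriorgram (one posterior vector per
    frame) into a codebook of representative vectors plus a per-frame
    index sequence, using a tiny k-means. This is a hypothetical
    illustration of per-utterance compression, not the paper's method."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(frames, n_codewords)]
    assign = [0] * len(frames)
    for _ in range(n_iters):
        # Assign each frame to its nearest codeword (squared Euclidean).
        assign = [min(range(n_codewords),
                      key=lambda k: sum((f - c) ** 2
                                        for f, c in zip(frame, codebook[k])))
                  for frame in frames]
        # Recompute each codeword as the mean of its assigned frames.
        for k in range(n_codewords):
            members = [frames[i] for i, a in enumerate(assign) if a == k]
            if members:
                codebook[k] = [sum(col) / len(members)
                               for col in zip(*members)]
    return codebook, assign

# Toy posteriorgram: 8 frames of 3-dimensional posterior vectors.
frames = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05],
          [0.1, 0.8, 0.1], [0.05, 0.9, 0.05],
          [0.1, 0.1, 0.8], [0.05, 0.05, 0.9],
          [0.8, 0.1, 0.1], [0.1, 0.85, 0.05]]
codebook, indices = compress_posteriorgram(frames, n_codewords=3)
# Storage drops from 8 full vectors to 3 vectors plus 8 small indices.
```

With a real posteriorgram (thousands of frames, thousands of dimensions) the saving is much larger, since the per-frame cost shrinks from one full probability vector to a single codebook index.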