Technical Program

Paper Detail

Paper ID	E-2-3.4
Paper Title	QUERY-BY-EXAMPLE SPOKEN TERM DETECTION USING GENERATIVE ADVERSARIAL NETWORK
Authors	Neil Shah, Sreeraj R, Dhirubhai Ambani Institute of Information and Communication Technology, India; Maulik Madhavi, National University of Singapore, Singapore; Nirmesh Shah, Hemant Patil, Dhirubhai Ambani Institute of Information and Communication Technology, India
Session	E-2-3: Speech Recognition
Time	Wednesday, 09 December, 17:15 - 19:15
Presentation Time:	Wednesday, 09 December, 18:00 - 18:15
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA):
Abstract	Several neural network (NN)-based representation techniques have been proposed for the Query-by-Example Spoken Term Detection (QbE-STD) task. Recent advances in Generative Adversarial Networks (GANs) for several speech technology applications motivated us to explore GANs for QbE-STD. In this work, we propose a framework built around a GAN with a regularized cross-entropy loss, trained using Gaussian Mixture Model (GMM)-based posterior labels. The proposed GAN maps speech-specific features to these unsupervised posterior labels. The resulting representation, an unsupervised GAN posteriorgram (uGAN-PG), is used to match the query (keyword) against the utterances in the document. QbE-STD using the proposed posteriorgram is performed on the TIMIT database. Compared with the unsupervised DNN posteriorgram (uDNN-PG), the proposed uGAN-PG obtains relative improvements of 10.32 % in Mean Average Precision and 5.6 % in precision@N.
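
The abstract reports results as Mean Average Precision and precision@N but does not define either metric. The sketch below assumes the definitions commonly used in QbE-STD evaluation: for each query, the utterances are ranked by detection score, N is the number of utterances that actually contain the query, and precision@N is the fraction of the top N that are correct. The function names and the boolean relevance-list input format are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def precision_at_n(ranked_relevance: Sequence[bool]) -> float:
    """Precision among the top N results, where N is the number of
    relevant utterances for this query (assumed definition)."""
    n = sum(ranked_relevance)
    if n == 0:
        return 0.0
    return sum(ranked_relevance[:n]) / n


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Mean of the precision values measured at the rank of each
    relevant utterance in the ranked list."""
    n_relevant = sum(ranked_relevance)
    if n_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / n_relevant


def mean_average_precision(per_query: Sequence[Sequence[bool]]) -> float:
    """Average of the per-query average precision values."""
    return sum(average_precision(r) for r in per_query) / len(per_query)


# Hypothetical ranking for one query: True marks an utterance that
# really contains the query, ordered from best to worst score.
example = [True, False, True, False]
p_at_n = precision_at_n(example)   # top 2 results contain 1 hit
ap = average_precision(example)    # (1/1 + 2/3) / 2
```

For the hypothetical ranking above, two utterances are relevant, so precision@N looks at the top two results and finds one hit, giving 0.5. The relative improvements quoted in the abstract would then correspond to (new - baseline) / baseline computed on such scores, assuming the usual meaning of "relative".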