Paper ID | E-2-3.4 |
Paper Title |
QUERY-BY-EXAMPLE SPOKEN TERM DETECTION USING GENERATIVE ADVERSARIAL NETWORK |
Authors |
Neil Shah, Sreeraj R, Dhirubhai Ambani Institute of Information and Communication Technology, India; Maulik Madhavi, National University of Singapore, Singapore; Nirmesh Shah, Hemant Patil, Dhirubhai Ambani Institute of Information and Communication Technology, India |
Session |
E-2-3: Speech Recognition |
Time | Wednesday, 09 December, 17:15 - 19:15 |
Presentation Time | Wednesday, 09 December, 18:00 - 18:15 |
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA) |
Abstract |
Several Neural Network (NN)-based representation techniques have been proposed for the Query-by-Example Spoken Term Detection (QbE-STD) task. Recent advances in Generative Adversarial Networks (GANs) for several speech technology applications motivated us to explore GANs for QbE-STD. In this work, we propose a framework in which a GAN is trained with a regularized cross-entropy loss on Gaussian Mixture Model (GMM)-based posterior labels. The proposed GAN maps speech-specific features to these unsupervised posterior labels. The mapping yields an unsupervised GAN posteriorgram (uGAN-PG) representation of speech, which is used to match the query (keyword) against the utterances in the document. QbE-STD using the proposed posteriorgram is performed on the TIMIT database. Compared with the unsupervised DNN posteriorgram (uDNN-PG), the proposed uGAN-PG obtains relative improvements of 10.32 % in Mean Average Precision and 5.6 % in precision@N. |
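The abstract reports results as Mean Average Precision and precision@N but does not define them. The following is a minimal sketch of how these two metrics are commonly computed for ranked retrieval, not the authors' evaluation code. It assumes that each query yields a list of document utterances ranked by match score, that each entry is marked as a hit or a miss, and that N in precision@N is the number of relevant utterances for that query. The function names and the input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_at_n(ranked_hits: Sequence[bool], n_relevant: int) -> float:
    """Fraction of the top-N ranked utterances that are hits.

    Assumption: N is the number of relevant utterances for the query.
    """
    if n_relevant == 0:
        return 0.0
    return sum(ranked_hits[:n_relevant]) / n_relevant


def average_precision(ranked_hits: Sequence[bool], n_relevant: int) -> float:
    """Mean of the precision values measured at the rank of each hit."""
    if n_relevant == 0:
        return 0.0
    hits_so_far = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits_so_far += 1
            precision_sum += hits_so_far / rank
    return precision_sum / n_relevant


def mean_average_precision(
    per_query: List[Tuple[Sequence[bool], int]]
) -> float:
    """Average precision averaged over all queries."""
    if not per_query:
        return 0.0
    total = sum(average_precision(hits, n) for hits, n in per_query)
    return total / len(per_query)
```

For example, a query with two relevant utterances ranked first and third in the result list (`[True, False, True, False]`) has a precision@N of 1/2, because only one of the top two results is a hit, and an average precision of (1/1 + 2/3) / 2.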