Technical Program

Paper Detail

Paper ID	C-3-2.3
Paper Title	IMPROVING KEYWORDS SPOTTING PERFORMANCE IN NOISE WITH AUGMENTED DATASET FROM VOCODED SPEECH
Authors	Ruohao Li, Kaibao Nie, University of Washington Bothell, United States
Session	C-3-2: Machine Learning and Data Analysis 2
Time	Thursday, 10 December, 15:30 - 17:15
Presentation Time:	Thursday, 10 December, 16:00 - 16:15
	All times are in New Zealand Time (UTC +13)
Topic	Machine Learning and Data Analytics (MLDA)
Abstract	As more and more electronic devices ship with on-device speech recognition, producing and deploying trained models for keyword detection is becoming increasingly demanding. Dataset preparation is one of the most challenging and tedious tasks in keyword spotting (KWS), since obtaining raw or segmented speech audio takes a significant amount of time. In this paper, we propose a data augmentation strategy that uses a speech vocoder to artificially generate vocoded speech with different numbers of channels. A trained KWS system was first tested with vocoded speech, and its performance was consistent with studies of human subjects listening to vocoded speech. Furthermore, the KWS system trained with the augmented dataset showed promising improvement when evaluated at +10 dB SNR.
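
For readers unfamiliar with the "+10 dB SNR" test condition, the sketch below shows one common way to mix noise into a clean signal at a target signal-to-noise ratio. It is an illustration only and not the authors' procedure: the abstract does not describe the noise type or mixing method, so the function names (`mix_at_snr`, `measure_snr_db`), the sine-wave stand-in for speech, the Gaussian noise, and the 16 kHz sample rate are all assumptions made here.

```python
import math
import random


def mean_power(signal):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals
    `target_snr_db`, then add it to `speech`.

    Hypothetical helper for illustration; returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measure_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Stand-in signals (assumptions): one second of a 440 Hz tone at an
# assumed 16 kHz sample rate plays the role of speech; white Gaussian
# noise plays the role of the interfering noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

mixture, scaled_noise = mix_at_snr(speech, noise, 10.0)
measured = measure_snr_db(speech, scaled_noise)
```

Here `measured` should come out at the requested 10 dB up to floating-point error, because the gain is solved directly from the SNR definition. With real recordings, power is often computed over speech-active frames only, which this sketch does not attempt.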