Technical Program

Paper Detail

Paper IDC-3-2.3
Paper Title IMPROVING KEYWORDS SPOTTING PERFORMANCE IN NOISE WITH AUGMENTED DATASET FROM VOCODED SPEECH
Authors Ruohao Li, Kaibao Nie, University of Washington Bothell, United States
Session C-3-2: Machine Learning and Data Analysis 2
Time Thursday, 10 December, 15:30 - 17:15
Presentation Time: Thursday, 10 December, 16:00 - 16:15
All times are in New Zealand Time (UTC +13)
Topic Machine Learning and Data Analytics (MLDA)
Abstract As more electronic devices ship with on-device speech recognition, producing and deploying trained models for keyword detection is becoming increasingly demanding. Dataset preparation is one of the most challenging and tedious tasks in keyword spotting (KWS), since obtaining raw or segmented speech audio takes a significant amount of time. In this paper, we propose a data augmentation strategy that uses a speech vocoder to artificially generate vocoded speech with different numbers of channels. A trained KWS system was first tested with vocoded speech, and its performance was consistent with results from studies of human subjects listening to vocoded speech. Furthermore, the KWS system trained with the augmented dataset showed promising improvement when evaluated at +10 dB SNR.
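
The augmentation idea in the abstract can be illustrated with a minimal sketch of a noise-carrier channel vocoder and an SNR-controlled noise mixer. The abstract does not specify the vocoder design, so everything below is an assumption made for illustration: the noise carrier, the log-spaced band edges between 100 Hz and 7000 Hz, the 50 Hz envelope cutoff, the FFT-based (brick-wall) filtering, and the hypothetical function names `noise_vocode` and `mix_at_snr`. It is not the authors' implementation.

```python
import numpy as np

def band_edges(n_channels, f_lo=100.0, f_hi=7000.0):
    # Assumption: log-spaced analysis bands; the paper's spacing is not stated.
    return np.geomspace(f_lo, f_hi, n_channels + 1)

def fft_bandpass(x, fs, lo, hi):
    # Brick-wall filter: zero every FFT bin outside [lo, hi).
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def noise_vocode(x, fs, n_channels, env_cutoff=50.0, seed=0):
    """Return a noise-vocoded copy of x using n_channels bands."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(x))
    edges = band_edges(n_channels)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_bandpass(x, fs, lo, hi)
        # Envelope: rectify, then low-pass below env_cutoff (assumed value).
        env = np.maximum(fft_bandpass(np.abs(band), fs, 0.0, env_cutoff), 0.0)
        # Carrier: white noise limited to the same band.
        carrier = fft_bandpass(noise, fs, lo, hi)
        out += env * carrier
    # Match the RMS level of the input so loudness stays comparable.
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    if rms_out > 0:
        out *= rms_in / rms_out
    return out

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

Under these assumptions, an augmented training set could be built by calling `noise_vocode(clip, fs, n)` for several channel counts `n` (the specific counts are not given in the abstract) and adding the results to the original clips, while `mix_at_snr(clip, noise, 10.0)` produces a +10 dB SNR test condition like the one mentioned above.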