Paper ID | C-3-2.3 |
Paper Title |
IMPROVING KEYWORDS SPOTTING PERFORMANCE IN NOISE WITH AUGMENTED DATASET FROM VOCODED SPEECH |
Authors |
Ruohao Li, Kaibao Nie, University of Washington Bothell, United States |
Session |
C-3-2: Machine Learning and Data Analysis 2 |
Time | Thursday, 10 December, 15:30 - 17:15 |
Presentation Time | Thursday, 10 December, 16:00 - 16:15 |
All times are in New Zealand Time (UTC +13) |
Topic |
Machine Learning and Data Analytics (MLDA) |
Abstract |
As more electronic devices include on-device speech recognition, producing and deploying trained models for keyword detection is becoming increasingly demanding. Dataset preparation is one of the most challenging and tedious tasks in Keyword Spotting (KWS), because obtaining raw or segmented speech recordings takes a significant amount of time. In this paper, we propose a data augmentation strategy that uses a speech vocoder to artificially generate vocoded speech with different numbers of channels. A trained KWS system was first tested on vocoded speech, and its performance was consistent with results from studies of human subjects listening to vocoded speech. Furthermore, the KWS system trained on the augmented dataset showed promising improvement when evaluated at +10 dB SNR. |
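The abstract does not specify the vocoder design, so the following is only a minimal sketch of how vocoded-speech augmentation at several channel counts, plus noise mixing at a target SNR, might look. It assumes a noise-excited channel vocoder (band-pass analysis, rectified and low-passed envelopes modulating band-limited noise), log-spaced bands from 100 to 7000 Hz, a 160 Hz envelope cutoff, 16 kHz audio, and crude FFT-mask filters. The function names `noise_vocode` and `add_noise_at_snr` are hypothetical, and a synthetic tone stands in for real speech.

```python
import numpy as np

def band_edges(n_channels, f_lo=100.0, f_hi=7000.0):
    # Assumption: log-spaced analysis bands between f_lo and f_hi (Hz).
    return np.geomspace(f_lo, f_hi, n_channels + 1)

def fft_bandpass(x, fs, lo, hi):
    # Crude brick-wall filter: zero every FFT bin outside [lo, hi) Hz.
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, cutoff=160.0):
    # Half-wave rectify, low-pass below `cutoff`, and clip ringing below zero.
    return np.maximum(fft_bandpass(np.maximum(x, 0.0), fs, 0.0, cutoff), 0.0)

def noise_vocode(x, fs, n_channels, seed=0):
    # Hypothetical noise-excited vocoder: each band's envelope modulates
    # white noise filtered to the same band; the bands are then summed.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(x))
    out = np.zeros(len(x))
    edges = band_edges(n_channels)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(fft_bandpass(x, fs, lo, hi), fs)
        out += env * fft_bandpass(noise, fs, lo, hi)
    # Rescale so the vocoded signal has the same RMS level as the input.
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out

def add_noise_at_snr(speech, noise, snr_db):
    # Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db.
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Usage: a 1 s amplitude-modulated 220 Hz tone stands in for a keyword clip.
fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
augmented = {n: noise_vocode(speech_like, fs, n) for n in (2, 4, 8, 16)}
background = np.random.default_rng(1).standard_normal(fs)
noisy = add_noise_at_snr(augmented[8], background, 10.0)
```

Each vocoded copy could be added to the training set alongside the original clip. Varying the channel count trades spectral detail for envelope cues, which is why human-listener studies of vocoded speech offer a natural point of comparison.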