Paper ID | F-2-3.3 |
Paper Title |
A STUDY ON MORE REALISTIC ROOM SIMULATION FOR FAR-FIELD KEYWORD SPOTTING |
Authors |
Eric Bezzam, Sonos Inc., France; Robin Scheibler, Line Corporation, Japan; Cyril Cadoux, École Polytechnique Fédérale de Lausanne, Switzerland; Thibault Gisselbrecht, Sonos Inc., France |
Session |
F-2-3: Speech Enhancement 2 |
Time | Wednesday, 09 December, 17:15 - 19:15 |
Presentation Time: | Wednesday, 09 December, 17:45 - 18:00 |
|
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA): |
Abstract |
We investigate the impact of more realistic room simulation for training far-field keyword spotting systems without fine-tuning on in-domain data. To this end, we study the impact of incorporating the following factors in the room impulse response (RIR) generation: air absorption, surface- and frequency-dependent coefficients of real materials, and stochastic ray tracing. Through an ablation study, a wake word task is used to measure the impact of these factors in comparison with a ground-truth set of measured RIRs. On a hold-out set of re-recordings under clean and noisy far-field conditions, we demonstrate up to 35.8% relative improvement over the commonly-used (single absorption coefficient) image source method. Source code is made available in the Pyroomacoustics package, allowing others to incorporate these techniques in their work. |
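As a rough illustration of why the factors named in the abstract matter, the sketch below uses Sabine's statistical reverberation-time formula, RT60 = 0.161 V / (A + 4 m V), to compare a single absorption coefficient against per-surface, per-band coefficients with and without air absorption. This is not the paper's method (which generates full RIRs with the image source method and ray tracing) and it does not use the Pyroomacoustics API. The room dimensions, absorption coefficients, and air attenuation values are hypothetical placeholders chosen only for the example.

```python
"""Sabine reverberation time per frequency band (illustrative sketch only)."""

SABINE_CONSTANT = 0.161  # s/m, valid for a speed of sound near 343 m/s


def sabine_rt60(volume_m3, surfaces, air_attenuation=0.0):
    """Return RT60 in seconds.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    air_attenuation: air attenuation coefficient m in 1/m (0 disables it).
    """
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    return SABINE_CONSTANT * volume_m3 / (
        absorption_area + 4.0 * air_attenuation * volume_m3
    )


# Hypothetical shoebox room (metres).
LENGTH, WIDTH, HEIGHT = 5.0, 4.0, 3.0
VOLUME = LENGTH * WIDTH * HEIGHT
AREAS = {
    "floor": LENGTH * WIDTH,
    "ceiling": LENGTH * WIDTH,
    "walls": 2.0 * (LENGTH + WIDTH) * HEIGHT,
}

# Hypothetical absorption coefficients per octave band (Hz) and surface.
BAND_ALPHAS = {
    500: {"floor": 0.10, "ceiling": 0.50, "walls": 0.05},
    4000: {"floor": 0.40, "ceiling": 0.70, "walls": 0.10},
}

# Hypothetical air attenuation coefficients (1/m); air absorbs more at
# high frequencies, so the 4 kHz value is the larger one.
AIR_ATTENUATION = {500: 0.0005, 4000: 0.005}

# Single frequency-independent coefficient applied to every surface.
SINGLE_ALPHA = 0.2


def rt60_table():
    """Compare the three modelling choices for each band."""
    table = {}
    for band, alphas in BAND_ALPHAS.items():
        per_surface = [(AREAS[name], alphas[name]) for name in AREAS]
        uniform = [(area, SINGLE_ALPHA) for area in AREAS.values()]
        table[band] = {
            "single": sabine_rt60(VOLUME, uniform),
            "per_surface": sabine_rt60(VOLUME, per_surface),
            "per_surface_air": sabine_rt60(
                VOLUME, per_surface, AIR_ATTENUATION[band]
            ),
        }
    return table


if __name__ == "__main__":
    for band, row in rt60_table().items():
        print(band, {name: round(value, 3) for name, value in row.items()})
```

With a single coefficient the predicted decay time is identical in every band, whereas per-surface, per-band coefficients and air absorption make it frequency dependent. That is the kind of mismatch with measured rooms the abstract sets out to reduce.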