Paper ID | F-2-3.2
Paper Title | Deep Residual Network-Based Augmented Kalman Filter for Speech Enhancement
Authors | Sujan Kumar Roy, Kuldip K. Paliwal, Griffith University, Australia
Session | F-2-3: Speech Enhancement 2
Time | Wednesday, 09 December, 17:15 - 19:15
Presentation Time | Wednesday, 09 December, 17:30 - 17:45
All times are in New Zealand Time (UTC +13) |
Topic | Speech, Language, and Audio (SLA)
Abstract | Speech enhancement with the augmented Kalman filter (AKF) suffers from inaccurate estimates of its key parameters, the linear prediction coefficients (LPCs) of the speech and noise signals, in noisy conditions. Moreover, the existing AKF is effective mainly in colored-noise conditions. This paper proposes a deep residual network (ResNet)-based method that improves the LPC estimates used by the AKF for speech enhancement in various noise conditions. Specifically, a ResNet20 (constructed with 20 layers) estimates the noise waveform for each noisy speech frame, from which the noise LPC parameters are computed. Each noisy speech frame is then pre-whitened by a whitening filter constructed from the corresponding noise LPCs, and the speech LPC parameters are computed from the pre-whitened speech. The improved speech and noise LPC parameters enable the AKF to minimize residual noise as well as distortion in the enhanced speech. Objective and subjective tests on the NOIZEUS corpus reveal that the proposed method yields higher quality and intelligibility in the enhanced speech than several benchmark methods in various noise conditions over a wide range of SNR levels.