Paper Detail

Paper ID F-2-3.2
Paper Title Deep Residual Network-Based Augmented Kalman Filter for Speech Enhancement
Authors Sujan Kumar Roy, Kuldip K. Paliwal, Griffith University, Australia
Session F-2-3: Speech Enhancement 2
Time Wednesday, 09 December, 17:15 - 19:15
Presentation Time Wednesday, 09 December, 17:30 - 17:45
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA)
Abstract Speech enhancement using the augmented Kalman filter (AKF) suffers from inaccurate estimates of its key parameters, the linear prediction coefficients (LPCs) of the speech and noise signals, in noisy conditions. The existing AKF is particularly suited to enhancing speech in colored noise conditions. In this paper, a deep residual network (ResNet)-based method improves the LPC estimates used by the AKF for speech enhancement in various noise conditions. Specifically, a ResNet20 (constructed with 20 layers) estimates the noise waveform for each noisy speech frame, from which the noise LPC parameters are computed. Each noisy speech frame is then pre-whitened by a whitening filter constructed from the corresponding noise LPCs, and the speech LPC parameters are computed from the pre-whitened speech. The improved speech and noise LPC parameters enable the AKF to minimize both residual noise and distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the proposed method yields higher quality and intelligibility in the enhanced speech than several benchmark methods in various noise conditions over a wide range of SNR levels.
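The per-frame LPC pipeline described in the abstract (noise LPCs from the ResNet20 noise estimate, pre-whitening, then speech LPCs) can be illustrated with a minimal sketch. This is not the authors' implementation: the ResNet20 output is stood in by a placeholder array, the frame length and LPC orders are assumed values, and the AKF itself is not shown; the sketch only demonstrates the autocorrelation-method LPC fit and the FIR whitening step.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order):
    """Estimate LPC coefficients [1, a1, ..., a_order] for a frame
    via the autocorrelation method (Toeplitz normal equations)."""
    # Biased autocorrelation up to lag `order`
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    r[0] += 1e-8  # small regularizer to keep the system well conditioned
    a = solve_toeplitz(r[:order], -r[1:order + 1])
    return np.concatenate(([1.0], a))

# Hypothetical per-frame quantities: `noisy_frame` is one analysis frame of
# noisy speech; `noise_estimate` stands in for the ResNet20 output, i.e. the
# estimated noise waveform for that frame (the network is not modeled here).
frame_len, p, q = 320, 10, 4   # assumed frame length and speech/noise LPC orders
rng = np.random.default_rng(0)
noisy_frame = rng.standard_normal(frame_len)     # placeholder noisy speech frame
noise_estimate = rng.standard_normal(frame_len)  # placeholder ResNet noise estimate

# 1) Noise LPCs from the estimated noise waveform.
a_noise = lpc(noise_estimate, q)

# 2) Pre-whiten the noisy frame with the FIR whitening filter A_noise(z).
whitened = lfilter(a_noise, [1.0], noisy_frame)

# 3) Speech LPCs from the pre-whitened frame; together with the noise LPCs,
#    these would parameterize the augmented Kalman filter (not implemented here).
a_speech = lpc(whitened, p)
```

The whitening step simply filters the noisy frame with the inverse noise model A_noise(z), so that the speech LPC fit in step 3 is less biased by colored noise; the specific orders (p = 10, q = 4) are illustrative assumptions, not values from the paper.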