Technical Program

Paper Detail

Paper ID F-3-1.4
Paper Title Exploring Feature Enhancement in The Modulation Spectrum Domain via Ideal Ratio Mask for Robust Speech Recognition
Authors Bi-Cheng Yan, National Taiwan Normal University, Taiwan; Meng-Che Wu, ASUS, Taiwan; Berlin Chen, National Taiwan Normal University, Taiwan
Session F-3-1: Speech Enhancement 3
Time Thursday, 10 December, 12:30 - 14:00
Presentation Time: Thursday, 10 December, 13:15 - 13:30
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA):
Abstract Development of robustness techniques is of paramount importance to the success of automatic speech recognition (ASR) systems. In this paper, we present a novel use of the ideal ratio mask (IRM) method to improve ASR robustness. IRM was originally proposed for time-frequency (T-F) masking-based speech enhancement and has shown considerable promise in preserving the intelligibility of a noisy mixture signal. IRM is further used to normalize the intermediate representations of speech feature vector sequences, in a holistic manner, for both training and test utterances. Finally, we treat IRM as a data augmentation method, applied to the speech feature vectors of training utterances or to their intermediate representations, to generate additional data that increases the diversity of the training set. A series of experiments carried out on the standard Aurora-4 database and task confirms the effectiveness of our methods.
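For readers unfamiliar with the ideal ratio mask mentioned in the abstract, the following is a minimal sketch of the commonly used textbook definition, IRM = (S / (S + N))^beta, computed per bin from clean-speech power S and noise power N. It is not the authors' implementation: the function names `ideal_ratio_mask` and `apply_mask`, the choice beta = 0.5, the small constant `eps`, and the use of plain nested lists are all assumptions made for illustration. The abstract does not specify how the mask is computed in the modulation spectrum domain, so the sketch works on any equally shaped grids of non-negative power values.

```python
def ideal_ratio_mask(clean_power, noise_power, beta=0.5, eps=1e-12):
    """Compute a ratio mask from two equally shaped 2-D lists of power values.

    Assumed (illustrative) definition: mask = (S / (S + N)) ** beta per bin.
    eps guards against division by zero when both powers are zero.
    """
    mask = []
    for clean_row, noise_row in zip(clean_power, noise_power):
        mask.append([
            (s / (s + n + eps)) ** beta
            for s, n in zip(clean_row, noise_row)
        ])
    return mask


def apply_mask(mask, noisy_magnitude):
    """Scale each bin of the noisy magnitude grid by the corresponding mask value."""
    return [
        [m * y for m, y in zip(mask_row, noisy_row)]
        for mask_row, noisy_row in zip(mask, noisy_magnitude)
    ]


# Toy 2x2 grids (rows = frames, columns = frequency bins); values are made up.
clean = [[1.0, 0.0], [9.0, 4.0]]
noise = [[3.0, 5.0], [0.0, 4.0]]
mask = ideal_ratio_mask(clean, noise)
enhanced = apply_mask(mask, [[2.0, 2.0], [3.0, 3.0]])
```

Mask values lie between 0 and 1: bins dominated by speech are passed through almost unchanged, while bins dominated by noise are attenuated. In practice the clean and noise powers are only available for training data, so the mask is usually estimated by a model at test time.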