Technical Program

Paper Detail

Paper ID	F-3-1.4
Paper Title	Exploring Feature Enhancement in the Modulation Spectrum Domain via Ideal Ratio Mask for Robust Speech Recognition
Authors	Bi-Cheng Yan, National Taiwan Normal University, Taiwan, Taiwan; Meng-Che Wu, ASUS, Taiwan; Berlin Chen, National Taiwan Normal University, Taiwan, Taiwan
Session	F-3-1: Speech Enhancement 3
Time	Thursday, 10 December, 12:30 - 14:00
Presentation Time:	Thursday, 10 December, 13:15 - 13:30
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA):
Abstract	Development of robustness techniques is of paramount importance to the success of automatic speech recognition (ASR) systems. In this paper, we present a novel use of the ideal ratio mask (IRM) method to improve ASR robustness. IRM was originally proposed for time-frequency (T-F) masking-based speech enhancement and has shown considerable promise in preserving the intelligibility of a noisy mixture signal. Further, IRM is alternatively used to normalize the intermediate representations of speech feature vector sequences, in a holistic manner, for both training and test utterances. Finally, we instead treat IRM as a data augmentation method, conducted on speech feature vectors of training utterances or their intermediate representations, to generate additional augmented data for increasing the diversity of training data. A series of experiments carried out on the standard Aurora-4 database and task confirms the effectiveness of our methods.