Technical Program

Paper Detail

Paper ID E-3-1.2
Paper Title Optimal scale-invariant signal-to-noise ratio and curriculum learning for monaural multi-speaker speech separation in noisy environment
Authors Chao Ma, Dongmei Li, Xupeng Jia, Tsinghua University, China
Session E-3-1: Speech Separation 1
Time Thursday, 10 December, 12:30 - 14:00
Presentation Time: Thursday, 10 December, 12:45 - 13:00
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA):
Abstract In daily listening environments, speech is always distorted by background noise, room reverberation, and interfering speakers. With the development of deep learning approaches, much progress has been made on monaural multi-speaker speech separation. Nevertheless, most studies in this area focus on a simple laboratory problem setup in which background noise and room reverberation are not considered. In this paper, we develop a new objective function named optimal scale-invariant signal-to-noise ratio (OSI-SNR), which is better than the original SI-SNR under any circumstances. In addition, we propose a curriculum learning method based on Conv-TasNet to deal with the notable effects of noise and interfering speakers. By jointly using OSI-SNR and the curriculum learning method, our algorithm substantially outperforms the separation baseline.
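For readers unfamiliar with the baseline objective, the following is a minimal sketch of the standard SI-SNR measure as it is commonly defined in the speech separation literature. The abstract does not give the OSI-SNR formula, so this sketch does not attempt to reproduce it. The function name `si_snr` and the `eps` stabilizer are illustrative assumptions, not taken from the paper.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Standard SI-SNR in dB between two equal-length 1-D signals.

    Illustrative sketch only: this is the commonly used SI-SNR, not the
    paper's OSI-SNR, whose definition is not stated in the abstract.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)

    # Remove the mean so the measure ignores DC offsets.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the scaled target part.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t)
    scale = dot / (t_energy + eps)
    s_target = [scale * b for b in t]

    # Whatever remains after the projection is treated as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]

    target_energy = sum(x * x for x in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged. For example, `si_snr([1.1, -0.9, 0.9, -1.1], [1.0, -1.0, 1.0, -1.0])` is approximately 20 dB, and multiplying the estimate by 3 gives the same value.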