Paper ID | E-3-1.2 |
Paper Title |
Optimal scale-invariant signal-to-noise ratio and curriculum learning for monaural multi-speaker speech separation in noisy environment |
Authors |
Chao Ma, Dongmei Li, Xupeng Jia, Tsinghua University, China |
Session |
E-3-1: Speech Separation 1 |
Time | Thursday, 10 December, 12:30 - 14:00 |
Presentation Time: | Thursday, 10 December, 12:45 - 13:00 |
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA): |
Abstract |
In daily listening environments, speech is always distorted by background noise, room reverberation, and interfering speakers. With the development of deep learning approaches, much progress has been made on monaural multi-speaker speech separation. Nevertheless, most studies in this area focus on a simple laboratory problem setup in which background noise and room reverberation are not considered. In this paper, we develop a new objective function named optimal scale-invariant signal-to-noise ratio (OSI-SNR), which is better than the original SI-SNR under all circumstances. In addition, we propose a curriculum learning method based on Conv-TasNet to deal with the notable effects of noise and interfering speakers. By jointly using OSI-SNR and the curriculum learning method, our algorithm substantially outperforms the separation baseline. |
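For readers unfamiliar with the baseline objective named in the abstract, the following is a minimal sketch of the conventional SI-SNR measure as it is commonly defined in the speech separation literature. It is not the OSI-SNR proposed in the paper, whose definition is not given in this listing. The function name `si_snr`, its parameters, and the `eps` stabilizer are illustrative assumptions, and plain Python lists stand in for audio tensors.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Conventional SI-SNR in dB between two equal-length 1-D signals.

    Illustrative sketch only: this is the commonly used baseline measure,
    not the OSI-SNR objective described in the abstract.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps  # eps guards against silence
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Whatever is left after removing the target component counts as noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy signals (assumed for illustration): a sine "speech" signal plus a
# small cosine disturbance.
reference = [math.sin(0.1 * i) for i in range(200)]
disturbance = [0.1 * math.cos(0.37 * i) for i in range(200)]
estimate = [r + d for r, d in zip(reference, disturbance)]

score = si_snr(estimate, reference)
rescaled_score = si_snr([3.0 * x for x in estimate], reference)
```

Because the target component is obtained by projecting the estimate onto the reference, rescaling the estimate leaves the score essentially unchanged, which is the scale invariance the name refers to. When used as a training objective, the negative of this value is typically minimized.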