Technical Program

Paper Detail

Paper ID	E-3-1.2
Paper Title	Optimal scale-invariant signal-to-noise ratio and curriculum learning for monaural multi-speaker speech separation in noisy environment
Authors	Chao Ma, Dongmei Li, Xupeng Jia, Tsinghua University, China
Session	E-3-1: Speech Separation 1
Time	Thursday, 10 December, 12:30 - 14:00
Presentation Time	Thursday, 10 December, 12:45 - 13:00
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA):
Abstract	In daily listening environments, speech is always distorted by background noise, room reverberation, and interfering speakers. With the development of deep learning approaches, much progress has been made on monaural multi-speaker speech separation. Nevertheless, most studies in this area focus on a simplified laboratory setup in which background noise and room reverberation are not considered. In this paper, we develop a new objective function named optimal scale-invariant signal-to-noise ratio (OSI-SNR), which is better than the original SI-SNR in all circumstances. In addition, we propose a curriculum learning method based on Conv-TasNet to deal with the notable effects of noise and interfering speakers. By jointly using OSI-SNR and curriculum learning, our algorithm substantially outperforms the separation baseline.
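The abstract does not define OSI-SNR, so the sketch below shows only the standard SI-SNR objective it is compared against. It pairs that objective with utterance-level permutation-invariant training (PIT), which is the usual way Conv-TasNet is trained for multi-speaker separation. This is a minimal pure-Python illustration under those assumptions, not the authors' code; the function names si_snr and pit_si_snr and the eps stabiliser are hypothetical. For zero-mean signals, s_target = (<s_hat, s> / ||s||^2) * s, e_noise = s_hat - s_target, and SI-SNR = 10 * log10(||s_target||^2 / ||e_noise||^2).

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Standard (not the paper's optimal) scale-invariant SNR in dB."""
    if len(estimate) != len(target) or not target:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(target)
    # Zero-mean both signals so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in target]
    # s_target: orthogonal projection of the estimate onto the reference.
    scale = sum(a * b for a, b in zip(est, ref)) / (sum(b * b for b in ref) + eps)
    s_target = [scale * b for b in ref]
    # e_noise: everything in the estimate that the reference does not explain.
    e_noise = [a - t for a, t in zip(est, s_target)]
    target_energy = sum(t * t for t in s_target)
    noise_energy = sum(e * e for e in e_noise)
    # Hypothetical eps keeps the log finite for perfect or orthogonal estimates.
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def pit_si_snr(
    estimates: List[Sequence[float]], targets: List[Sequence[float]]
) -> Tuple[float, Tuple[int, ...]]:
    """Best mean SI-SNR over all assignments of estimates to reference speakers.

    perm[k] is the index of the estimate matched to targets[k]; a training
    loss would be the negative of the returned score.
    """
    if len(estimates) != len(targets):
        raise ValueError("need one estimate per reference speaker")
    best_score, best_perm = -math.inf, tuple(range(len(targets)))
    for perm in permutations(range(len(targets))):
        score = sum(si_snr(estimates[p], t) for p, t in zip(perm, targets)) / len(targets)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


if __name__ == "__main__":
    s1 = [1.0, -1.0, 2.0, -2.0]
    s2 = [0.5, 0.5, -0.5, -0.5]
    # The separator's outputs arrive in swapped order and at arbitrary gains.
    score, perm = pit_si_snr([[2.0 * x for x in s2], [3.0 * x for x in s1]], [s1, s2])
    print(perm)  # (1, 0)
```

Because the estimate is projected onto the reference before the energy ratio is taken, rescaling an output leaves its SI-SNR unchanged, and the permutation search removes any dependence on the order in which the network emits speakers.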