Paper Detail

Paper ID E-3-2.3
Paper Title Computer-Resource-Aware Deep Speech Separation with a Run-Time-Specified Number of BLSTM Layers
Authors Masahito Togami, LINE Corporation, Japan; Yoshiki Masuyama, Waseda University, Japan; Tatsuya Komatsu, LINE Corporation, Japan; Kazuyoshi Yoshii, Tatsuya Kawahara, Kyoto University, Japan
Session E-3-2: Speech Separation 2, Sound source separation
Time Thursday, 10 December, 15:30 - 17:15
Presentation Time: Thursday, 10 December, 16:00 - 16:15
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA)
Abstract Recently, deep neural networks (DNNs) with multiple bidirectional long short-term memory (BLSTM) layers have been successfully applied to supervised multi-channel speech separation. One shortcoming for industrial products is that the number of BLSTM layers cannot be changed according to the available computational resources once the DNN has been trained. Since the available computational resources vary from device to device, it is preferable that the number of BLSTM layers can be adapted for optimal performance. In this paper, we propose a DNN-based speech separation method in which each BLSTM layer is connected to a signal processing layer. Each layer can output a separated speech signal, which can also be fed into the successive BLSTM layer. The proposed method trains two types of BLSTM layers: the first is used to initialize the speech separation, and the second is used to enhance separation performance. The proposed method can increase the number of BLSTM layers at run time by stacking the second type of BLSTM layer to improve separation performance. Experimental results show that the proposed method is effective.
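
As a rough illustration of the run-time-depth idea described in the abstract, the following minimal PyTorch sketch pairs each BLSTM layer with a simple mask-style separation head and uses two trained blocks: an initialization block and a refinement block that can be stacked a run-time-specified number of times. All names, dimensions, the masking head, and the weight sharing across stacked refinement passes are assumptions made for illustration; they are not taken from the paper.

# Minimal sketch (assumed design, not the paper's implementation).
import torch
import torch.nn as nn


class SeparationBlock(nn.Module):
    """One BLSTM layer followed by a simple mask-estimation head
    that stands in for the paper's signal processing layer (assumption)."""

    def __init__(self, feat_dim: int, hidden: int):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, feat_dim)

    def forward(self, x):
        h, _ = self.blstm(x)
        mask = torch.sigmoid(self.head(h))
        # Separated feature estimate; also usable as input to the next block.
        return mask * x


class RuntimeDepthSeparator(nn.Module):
    """The first block initializes the separation; the refinement block is
    applied a run-time-specified number of times to trade computation for quality."""

    def __init__(self, feat_dim: int = 257, hidden: int = 300):
        super().__init__()
        self.init_block = SeparationBlock(feat_dim, hidden)
        self.refine_block = SeparationBlock(feat_dim, hidden)

    def forward(self, x, num_refinements: int = 0):
        y = self.init_block(x)
        for _ in range(num_refinements):  # depth chosen per device at inference time
            y = self.refine_block(y)
        return y


if __name__ == "__main__":
    mix = torch.randn(1, 100, 257)        # (batch, frames, features), toy input
    model = RuntimeDepthSeparator()
    fast = model(mix, num_refinements=0)  # low-resource device
    best = model(mix, num_refinements=3)  # more computation available
    print(fast.shape, best.shape)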