Paper Detail

Paper ID E-1-3.5
Paper Title FULL-SPHERE BINAURAL SOUND SOURCE LOCALIZATION USING MULTI-TASK NEURAL NETWORK
Authors Yichen Yang, Jingwei Xi, Wen Zhang, Lijun Zhang, Northwestern Polytechnical University, China
Session E-1-3: Array Processing of Microphones and Loudspeakers
Time Tuesday, 08 December, 17:15 - 19:15
Presentation Time: Tuesday, 08 December, 18:15 - 18:30
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA)
Abstract Binaural sound source localization faces the challenge of estimating azimuth and elevation simultaneously in noisy and reverberant environments. In this work, a full-sphere binaural sound source localization system is proposed in which a convolutional neural network and a multi-task neural network are connected to learn localization features. The log-magnitudes and the interaural phase difference (IPD) of the binaural signals are used as inputs to a two-branch convolutional neural network, from which interaural and monaural cues are extracted and combined. Full-sphere localization is then formulated as two subtasks, estimating azimuth and elevation separately with a multi-task neural network. To reduce the effect of reverberation, an interaural-coherence-based pre-processing step selects the direct-path-dominated time-frequency bins used for localization. The proposed system is evaluated under a variety of noise and reverberation conditions and compared with two baseline systems. The results indicate that the proposed system achieves better localization performance, especially for elevation estimation, under low-SNR and strongly reverberant conditions.
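
The abstract outlines two front-end steps: computing log-magnitude and IPD features from the binaural signals, and using interaural coherence to keep direct-path-dominated time-frequency bins. Below is a minimal illustrative sketch of these steps in Python; the STFT parameters, the recursive spectral smoothing, and the coherence threshold are assumptions for illustration, not the paper's exact pre-processing.

```python
# Hypothetical sketch of the binaural feature extraction described in the
# abstract: log-magnitudes and interaural phase difference (IPD) computed
# from the STFTs of the left/right ear signals, plus a simple interaural-
# coherence mask that keeps direct-path-dominated T-F bins. Parameter
# values are illustrative, not taken from the paper.
import numpy as np
from scipy.signal import stft

def binaural_features(x_left, x_right, fs=16000, nperseg=512, coh_threshold=0.9):
    """Return log-magnitudes, IPD, and a coherence-based T-F mask."""
    _, _, L = stft(x_left, fs=fs, nperseg=nperseg)   # complex STFT, shape (freq, time)
    _, _, R = stft(x_right, fs=fs, nperseg=nperseg)

    eps = 1e-8
    log_mag_left = np.log(np.abs(L) + eps)
    log_mag_right = np.log(np.abs(R) + eps)

    # Interaural phase difference per T-F bin.
    ipd = np.angle(L * np.conj(R))

    # Short-time interaural coherence: recursively smooth cross- and
    # auto-spectra over time, then keep bins whose coherence exceeds the
    # threshold (assumed to indicate direct-path dominance).
    def smooth(p, alpha=0.8):
        out = np.empty_like(p)
        out[:, 0] = p[:, 0]
        for t in range(1, p.shape[1]):
            out[:, t] = alpha * out[:, t - 1] + (1 - alpha) * p[:, t]
        return out

    phi_lr = smooth(L * np.conj(R))
    phi_ll = smooth(np.abs(L) ** 2)
    phi_rr = smooth(np.abs(R) ** 2)
    coherence = np.abs(phi_lr) / np.sqrt(phi_ll * phi_rr + eps)
    direct_path_mask = coherence > coh_threshold

    return log_mag_left, log_mag_right, ipd, direct_path_mask
```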
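
The localization model itself is described as a two-branch convolutional network whose combined features feed a multi-task network with separate azimuth and elevation subtasks. The following PyTorch sketch shows one plausible arrangement under that description; the layer sizes, the grid of direction classes, and the classification-style heads are assumptions rather than the authors' architecture.

```python
# A minimal, illustrative PyTorch sketch of a two-branch CNN with
# multi-task azimuth/elevation heads, following the abstract's outline.
# All dimensions and the number of direction classes are assumed.
import torch
import torch.nn as nn

class FullSphereLocalizer(nn.Module):
    def __init__(self, n_azimuth=72, n_elevation=37):
        super().__init__()
        # Branch 1: monaural cues from stacked left/right log-magnitudes.
        self.mag_branch = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # Branch 2: interaural cues from the IPD map.
        self.ipd_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # Shared trunk after combining the two branches.
        self.shared = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * 32 * 8 * 8, 256), nn.ReLU(),
        )
        # Multi-task heads: azimuth and elevation estimated separately.
        self.azimuth_head = nn.Linear(256, n_azimuth)
        self.elevation_head = nn.Linear(256, n_elevation)

    def forward(self, log_mags, ipd):
        # log_mags: (batch, 2, freq, time); ipd: (batch, 1, freq, time)
        h = torch.cat([self.mag_branch(log_mags), self.ipd_branch(ipd)], dim=1)
        h = self.shared(h)
        return self.azimuth_head(h), self.elevation_head(h)

# In a multi-task setup like this, training would typically minimize a
# weighted sum of the azimuth and elevation losses so both subtasks share
# the learned interaural and monaural cues.
```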