Paper ID | F-2-2.1
Paper Title | Context-adaptive Gaussian Attention for Text-independent Speaker Verification
Authors | Junyi Peng, Rongzhi Gu, Haoran Zhang, Yuexian Zou, Peking University Shenzhen Graduate School, China
Session | F-2-2: Speaker Recognition 2, Sound Classification
Time | Wednesday, 09 December, 15:30 - 17:00
Presentation Time | Wednesday, 09 December, 15:30 - 15:45
All times are in New Zealand Time (UTC +13) |
Topic | Speech, Language, and Audio (SLA)
Abstract | Multi-head attention (MHA) has shown its effectiveness in aggregating frame-level features for the speaker verification task. However, MHA weights each frame individually, without considering the context information that is important for modeling the speaker characteristics of speech. Based on the assumption that highly relevant context information should follow a temporal Gaussian distribution, we propose a novel variant of multi-head attention, named context-adaptive Gaussian attention (CGA), which employs a set of Gaussian functions with different parameters to dynamically model the distributions of the weights obtained from each head. Furthermore, a Gaussian clustering (GC) algorithm is designed to merge overlapping Gaussian distributions between different heads. In this way, the proposed method helps the model capture multi-span context information better than traditional multi-head attention. Experiments on the VoxCeleb1 dataset demonstrate that the proposed CGA outperforms state-of-the-art pooling approaches.