Paper ID | F-2-2.1
Paper Title | Context-adaptive Gaussian Attention for Text-independent Speaker Verification
Authors | Junyi Peng, Rongzhi Gu, Haoran Zhang, Yuexian Zou, Peking University Shenzhen Graduate School, China
Session | F-2-2: Speaker Recognition 2, Sound Classification
Time | Wednesday, 09 December, 15:30 - 17:00
Presentation Time | Wednesday, 09 December, 15:30 - 15:45
All times are in New Zealand Time (UTC +13) |
Topic | Speech, Language, and Audio (SLA)
Abstract | Multi-head attention (MHA) has shown its effectiveness in aggregating frame-level features for the speaker verification task. However, MHA weights each frame individually, without considering the context information that is important for modeling the speaker characteristics of speech. Based on the assumption that highly relevant context information should follow a temporal Gaussian distribution, we propose a novel variant of multi-head attention, named context-adaptive Gaussian attention (CGA), which employs a set of Gaussian functions with different parameters to dynamically model the distributions of the weights obtained from each head. Furthermore, a Gaussian clustering (GC) algorithm is designed to merge overlapping Gaussian distributions between different heads. In this way, the proposed method helps the model capture multi-span context information better than traditional multi-head attention. Experiments on the VoxCeleb1 dataset demonstrate that the proposed CGA outperforms state-of-the-art pooling approaches.