Paper ID | F-2-2.2 |
Paper Title |
Optimizing Speaker Embeddings using Meta-Training Sets |
Authors |
Nakamasa Inoue, Keita Goto, Tokyo Institute of Technology, Japan |
Session |
F-2-2: Speaker Recognition 2, Sound Classification |
Time | Wednesday, 09 December, 15:30 - 17:00 |
Presentation Time: | Wednesday, 09 December, 15:45 - 16:00 |
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA): |
Abstract |
This paper presents a method to learn speaker embeddings for text-independent speaker verification. The proposed method aims to optimize embeddings for unseen enrollment/test speakers by training a network with a meta-training set. The main procedure consists of two steps. The first step generates a meta-training set: a set of episodes, each consisting of a pair of intra-episode training and testing sets. The second step optimizes the network parameters so that the average verification performance over the generated episodes is maximized. An advantage of our approach is that it is complementary to studies focusing on network structure, and we demonstrate its effectiveness with recent ResNet-based models in experiments on the VoxCeleb dataset. |
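The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`make_episode`, `make_meta_training_set`, `episode_score`, `meta_objective`), the episode sizes, and the nearest-centroid accuracy used as a stand-in for "verification performance" are all assumptions made for illustration. The abstract does not specify the actual objective, and real training would presumably maximize a differentiable surrogate with a neural network as `embed`.

```python
import math
import random


def make_episode(utts_by_speaker, n_speakers, n_train, n_test, rng):
    """Step 1 (assumed form): sample speakers, then split each speaker's
    sampled utterances into intra-episode training and testing sets."""
    speakers = rng.sample(sorted(utts_by_speaker), n_speakers)
    train, test = [], []
    for spk in speakers:
        utts = rng.sample(utts_by_speaker[spk], n_train + n_test)
        train += [(spk, u) for u in utts[:n_train]]
        test += [(spk, u) for u in utts[n_train:]]
    return train, test


def make_meta_training_set(utts_by_speaker, n_episodes,
                           n_speakers=3, n_train=2, n_test=1, seed=0):
    """A meta-training set is a list of (train, test) episodes."""
    rng = random.Random(seed)
    return [make_episode(utts_by_speaker, n_speakers, n_train, n_test, rng)
            for _ in range(n_episodes)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def episode_score(embed, episode):
    """Hypothetical per-episode performance: enroll each speaker as the sum
    of its training embeddings, then score test utterances by cosine
    similarity and report nearest-centroid accuracy."""
    train, test = episode
    centroids = {}
    for spk, utt in train:
        e = embed(utt)
        prev = centroids.get(spk)
        centroids[spk] = list(e) if prev is None else [p + x for p, x in zip(prev, e)]
    correct = 0
    for spk, utt in test:
        e = embed(utt)
        pred = max(centroids, key=lambda s: cosine(centroids[s], e))
        correct += int(pred == spk)
    return correct / len(test)


def meta_objective(embed, episodes):
    """Step 2 (assumed form): the quantity to maximize is the average
    per-episode performance over the meta-training set."""
    return sum(episode_score(embed, ep) for ep in episodes) / len(episodes)
```

In this sketch, `utts_by_speaker` maps a speaker ID to a list of utterance feature vectors, and `embed` is any callable mapping an utterance to an embedding vector. Because the speakers in each episode's testing set are scored only against enrollments built from that episode's training set, the averaged objective mimics the enrollment/test conditions encountered with unseen speakers.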