Technical Program

Paper Detail

Paper ID	F-2-2.2
Paper Title	Optimizing Speaker Embeddings using Meta-Training Sets
Authors	Nakamasa Inoue, Keita Goto, Tokyo Institute of Technology, Japan
Session	F-2-2: Speaker Recognition 2, Sound Classification
Time	Wednesday, 09 December, 15:30 - 17:00
Presentation Time	Wednesday, 09 December, 15:45 - 16:00
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA)
Abstract	This paper presents a method to learn speaker embeddings for text-independent speaker verification. The proposed method aims to optimize embeddings for unseen enrollment/test speakers by training a network with a meta-training set. The procedure consists of two steps. The first step generates a meta-training set: a set of episodes, each consisting of a pair of intra-episode training and testing sets. The second step optimizes the network parameters so that the average verification performance over the generated episodes is maximized. An advantage of our approach is that it is complementary to studies focusing on network structure, and we demonstrate its effectiveness with recent ResNet-based models in experiments on the VoxCeleb dataset.
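The two-step procedure described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names (`make_episode`, `make_meta_training_set`, `episode_loss`, `meta_objective`), the episode sizes, and the loss are hypothetical. In particular, the per-episode objective below is a prototypical-network-style softmax over cosine similarities to enrollment centroids, swapped in as a differentiable surrogate for "verification performance"; the paper's actual objective may differ. Embeddings are treated as fixed vectors here, whereas in training they would be network outputs and the averaged loss would be minimized with an autodiff framework.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Episode = Tuple[Dict[str, List[Vector]], List[Tuple[str, Vector]]]


def make_episode(utterances: Dict[str, List[Vector]], n_speakers: int,
                 n_enroll: int, n_test: int, rng: random.Random) -> Episode:
    """Step 1 (sketch): sample one episode = intra-episode training set + testing set.

    Hypothetical helper: picks n_speakers speakers and splits each speaker's
    sampled utterances into disjoint enrollment and test subsets.
    """
    eligible = sorted(s for s, u in utterances.items() if len(u) >= n_enroll + n_test)
    speakers = rng.sample(eligible, n_speakers)
    enroll: Dict[str, List[Vector]] = {}
    test: List[Tuple[str, Vector]] = []
    for spk in speakers:
        utts = rng.sample(utterances[spk], n_enroll + n_test)  # without replacement
        enroll[spk] = utts[:n_enroll]
        test.extend((spk, u) for u in utts[n_enroll:])
    return enroll, test


def make_meta_training_set(utterances: Dict[str, List[Vector]], n_episodes: int,
                           n_speakers: int, n_enroll: int, n_test: int,
                           seed: int = 0) -> List[Episode]:
    """Generate a meta-training set as a list of independently sampled episodes."""
    rng = random.Random(seed)
    return [make_episode(utterances, n_speakers, n_enroll, n_test, rng)
            for _ in range(n_episodes)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vs: List[Vector]) -> Vector:
    return [sum(col) / len(vs) for col in zip(*vs)]


def episode_loss(enroll: Dict[str, List[Vector]], test: List[Tuple[str, Vector]],
                 scale: float = 10.0) -> float:
    """Assumed surrogate: cross-entropy of each test utterance over enrollment centroids."""
    speakers = sorted(enroll)
    centroids = {s: mean_vector(enroll[s]) for s in speakers}
    total = 0.0
    for true_spk, emb in test:
        logits = [scale * cosine(emb, centroids[s]) for s in speakers]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[speakers.index(true_spk)]
    return total / len(test)


def meta_objective(episodes: List[Episode]) -> float:
    """Step 2 (sketch): average episode loss; lower is better."""
    return sum(episode_loss(e, t) for e, t in episodes) / len(episodes)
```

Minimizing `meta_objective` over freshly sampled episodes plays the role of the second step. Because every episode holds out its own test utterances, the surrogate rewards embeddings that separate speakers known only through a few enrollment utterances, which matches the stated goal of optimizing for unseen enrollment/test speakers.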