Technical Program

Paper Detail

Paper ID: F-2-2.2
Paper Title: Optimizing Speaker Embeddings using Meta-Training Sets
Authors: Nakamasa Inoue, Keita Goto (Tokyo Institute of Technology, Japan)
Session: F-2-2 (Speaker Recognition 2, Sound Classification)
Time: Wednesday, 09 December, 15:30 - 17:00
Presentation Time: Wednesday, 09 December, 15:45 - 16:00
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract: This paper presents a method for learning speaker embeddings for text-independent speaker verification. The proposed method aims to optimize embeddings for unseen enrollment and test speakers by training a network on a meta-training set. The procedure consists of two steps. The first step generates a meta-training set: a set of episodes, each consisting of a pair of intra-episode training and testing sets. The second step optimizes the network parameters so that the average verification performance over the generated episodes is maximized. An advantage of this approach is that it is complementary to studies focusing on network structure, and we demonstrate its effectiveness with recent ResNet-based models in experiments on the VoxCeleb dataset.
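The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions: embeddings are given as plain vectors standing in for network outputs; each episode is sampled as a few speakers with disjoint enrollment (intra-episode training) and test utterances; and "verification performance" is replaced by a prototypical-network-style surrogate, a softmax cross-entropy over scaled cosine scores to per-speaker enrollment centroids. The function names and the `scale` parameter are hypothetical. In a real system, this averaged loss would be minimized with respect to the embedding network's parameters using an autodiff framework.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
# An episode = (enrollment set keyed by speaker, list of (speaker, test embedding)).
Episode = Tuple[Dict[str, List[Vector]], List[Tuple[str, Vector]]]


def make_episode(pool: Dict[str, List[Vector]], n_speakers: int,
                 n_enroll: int, n_test: int, rng: random.Random) -> Episode:
    """Step 1 (assumed sampling scheme): pick speakers, then split each
    speaker's sampled utterances into disjoint enrollment and test parts."""
    speakers = rng.sample(sorted(pool), n_speakers)
    enroll: Dict[str, List[Vector]] = {}
    test: List[Tuple[str, Vector]] = []
    for spk in speakers:
        utts = rng.sample(pool[spk], n_enroll + n_test)  # no utterance reused
        enroll[spk] = utts[:n_enroll]
        test.extend((spk, u) for u in utts[n_enroll:])
    return enroll, test


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def centroid(vs: List[Vector]) -> Vector:
    return [sum(col) / len(vs) for col in zip(*vs)]


def episode_loss(episode: Episode, scale: float = 10.0) -> float:
    """Hypothetical surrogate for verification performance: cross-entropy of
    each test utterance against the enrollment centroids of the episode."""
    enroll, test = episode
    cents = {spk: centroid(vs) for spk, vs in enroll.items()}
    total = 0.0
    for true_spk, emb in test:
        logits = {spk: scale * cosine(emb, c) for spk, c in cents.items()}
        m = max(logits.values())  # log-sum-exp for numerical stability
        log_z = m + math.log(sum(math.exp(z - m) for z in logits.values()))
        total += log_z - logits[true_spk]
    return total / len(test)


def meta_training_loss(pool: Dict[str, List[Vector]], n_episodes: int,
                       n_speakers: int, n_enroll: int, n_test: int,
                       seed: int = 0) -> float:
    """Step 2 objective: average episodic loss over a generated meta-training set."""
    rng = random.Random(seed)
    episodes = [make_episode(pool, n_speakers, n_enroll, n_test, rng)
                for _ in range(n_episodes)]
    return sum(episode_loss(ep) for ep in episodes) / n_episodes
```

With toy embeddings where each speaker occupies its own direction, the loss is close to zero; when every speaker has identical embeddings, it equals log(n_speakers), since the test utterance cannot be told apart from the other enrolled speakers.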