Paper ID | C-3-3.1
Paper Title | Speaker Verification System Based on Deformable CNN and Time-Frequency Attention
Authors | Yiming Zhang, Beijing University of Posts and Telecommunications, China; Hong Yu, Ludong University, China; Zhanyu Ma, Beijing University of Posts and Telecommunications, China
Session | C-3-3: Machine Learning for Small-sample Data Analysis
Time | Thursday, 10 December, 17:30 - 19:30
Presentation Time | Thursday, 10 December, 17:30 - 17:45
All times are in New Zealand Time (UTC +13) |
Topic | Machine Learning and Data Analytics (MLDA): Special Session: Machine Learning for Small-sample Data Analysis
Abstract | Speaker verification (SV), especially short-utterance SV, needs to be robust under complex noisy and far-field conditions. The majority of recent works apply an attention mechanism to the aggregation of frame-level speaker embeddings extracted by a deep neural network. In this paper, a novel speaker verification system based on a deformable convolution module and a time-frequency attention module is proposed. In the deformable convolution module, the convolutional sampling locations are adaptively adjusted by additional offsets learnt from the spectrogram. Meanwhile, to extract more effective speaker-discriminative information from short utterances, the time-frequency attention module helps the system focus on the important regions of an utterance along both the time and frequency dimensions. Experiments on the HI-MIA database show that the proposed modules improve the equal error rate (EER) of the speaker verification system by a relative 24% compared with the baseline model, achieving an EER of 8.51%.
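The deformable convolution idea described in the abstract, sampling locations shifted by per-position learned offsets with bilinear interpolation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 3x3 single-channel setting, the function names, and the dense `offsets` array are illustrative assumptions; a real system would learn the offsets with a separate convolutional branch and run on GPU tensors.

```python
import numpy as np

def bilinear_sample(x, r, c):
    """Bilinearly sample map x (H, W) at fractional coords (r, c), zero-padded."""
    H, W = x.shape
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    val = 0.0
    for dr in (0, 1):
        for dc in (0, 1):
            rr, cc = r0 + dr, c0 + dc
            if 0 <= rr < H and 0 <= cc < W:
                # Weight each corner by its distance to the sampling point.
                val += (1 - abs(r - rr)) * (1 - abs(c - cc)) * x[rr, cc]
    return val

def deformable_conv2d(x, weight, offsets):
    """3x3 deformable convolution on a single-channel map (illustrative only).

    x: (H, W) input, e.g. a spectrogram patch.
    weight: (9,) kernel taps in row-major order.
    offsets: (H, W, 9, 2) per-location (dr, dc) shifts of the 9 taps;
             in the paper these would be predicted from the spectrogram.
    """
    H, W = x.shape
    out = np.zeros((H, W))
    taps = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    for r in range(H):
        for c in range(W):
            acc = 0.0
            for k, (i, j) in enumerate(taps):
                dr, dc = offsets[r, c, k]
                # Sample at the regular grid point plus its learned offset.
                acc += weight[k] * bilinear_sample(x, r + i + dr, c + j + dc)
            out[r, c] = acc
    return out
```

With all offsets set to zero this reduces to an ordinary zero-padded 3x3 convolution, which is a convenient sanity check.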
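The time-frequency attention module can likewise be sketched as two softmax weightings of a spectrogram, one over frames and one over frequency bins. This is a simplified stand-in under stated assumptions: the projection vectors `w_t` and `w_f` are hypothetical placeholders for the learned layers in the paper, and the multiplicative reweighting is one plausible way to combine the two attention maps.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def time_freq_attention(spec, w_t, w_f):
    """Reweight a (F, T) spectrogram along time and frequency.

    spec: (F, T) log-spectrogram.
    w_t:  (F,) projection producing one score per time frame (assumption).
    w_f:  (T,) projection producing one score per frequency bin (assumption).
    """
    time_attn = softmax(w_t @ spec)   # (T,) attention over frames
    freq_attn = softmax(spec @ w_f)   # (F,) attention over bins
    # Emphasise time-frequency regions that score highly on both axes.
    return spec * freq_attn[:, None] * time_attn[None, :]
```

Because both attention vectors are softmax-normalised, each sums to one, so the module rescales rather than amplifies the overall energy; this is one design choice among several the paper could have made.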