Technical Program

Paper Detail

Paper ID	F-2-1.6
Paper Title	ADVERSARIAL POST-PROCESSING OF VOICE CONVERSION AGAINST SPOOFING DETECTION
Authors	Yi-Yang Ding, Jing-Xuan Zhang, University of Science and Technology of China, China; Li-Juan Liu, Yuan Jiang, Yu Hu, iFLYTEK Co., Ltd., China; Zhen-Hua Ling, University of Science and Technology of China, China
Session	F-2-1: Speaker Recognition 1, Language Recognition
Time	Wednesday, 09 December, 12:30 - 14:00
Presentation Time:	Wednesday, 09 December, 13:45 - 14:00
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA):
Abstract	With the development of speech synthesis and voice conversion techniques, the anti-spoofing task of detecting artificial speech signals has received increasing research attention. State-of-the-art spoofing detectors can distinguish utterances generated by voice conversion from natural ones with high accuracy. This paper proposes a method that improves the ability of voice conversion models to evade spoofing detection by post-processing the converted speech with a neural network. The network is built from long short-term memory (LSTM) layers and is trained to reduce the distance between the linear frequency cepstral coefficients (LFCC) of converted utterances and those of natural references. In our experiments, the SAS dataset was used to construct the anti-spoofing system, and the VCTK dataset was used to build the voice conversion models. Experimental results show that the proposed method significantly reduces the detection rate of the anti-spoofing system without degrading the subjective performance of the converted speech.
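The training objective described in the abstract, a distance between the LFCCs of converted and natural speech, can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the naive DFT, the triangular linear-frequency filterbank, the filter and coefficient counts, the mean squared error as the distance, and the function names (`power_spectrum`, `linear_filterbank`, `lfcc`, `lfcc_distance`). The LSTM post-processing network itself is not modelled.

```python
import math

def power_spectrum(frame):
    """Naive DFT power spectrum of one real-valued frame (bins 0..N/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def linear_filterbank(num_filters, num_bins):
    """Triangular filters whose centres are spaced linearly over the bins."""
    edges = [i * (num_bins - 1) / (num_filters + 1) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if lo <= b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b <= hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def lfcc(frame, num_filters=8, num_ceps=4):
    """LFCC of one frame: linear filterbank -> log energies -> DCT-II.
    The filter and coefficient counts are assumed values."""
    spec = power_spectrum(frame)
    bank = linear_filterbank(num_filters, len(spec))
    # A small floor avoids log(0) for empty bands.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
            for m, e in enumerate(log_e))
        for c in range(num_ceps)
    ]

def lfcc_distance(converted_frames, natural_frames):
    """Mean squared LFCC difference over time-aligned frame pairs
    (an assumed choice of distance)."""
    total, count = 0.0, 0
    for cf, nf in zip(converted_frames, natural_frames):
        for a, b in zip(lfcc(cf), lfcc(nf)):
            total += (a - b) ** 2
            count += 1
    return total / count

# Toy frames: two sinusoids standing in for converted and natural speech.
natural = [[math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]]
converted = [[math.sin(2 * math.pi * 10 * t / 32) for t in range(32)]]
same = lfcc_distance(natural, natural)
diff = lfcc_distance(converted, natural)
```

In a setting like the one the abstract describes, a quantity of this kind would serve as the loss that the post-processing network minimises; the sketch only computes the distance for fixed frames.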