Technical Program

Paper Detail

Paper ID F-2-1.6
Paper Title ADVERSARIAL POST-PROCESSING OF VOICE CONVERSION AGAINST SPOOFING DETECTION
Authors Yi-Yang Ding, Jing-Xuan Zhang, University of Science and Technology of China, China; Li-Juan Liu, Yuan Jiang, Yu Hu, iFLYTEK Co., Ltd., China; Zhen-Hua Ling, University of Science and Technology of China, China
Session F-2-1: Speaker Recognition 1, Language Recognition
Time Wednesday, 09 December, 12:30 - 14:00
Presentation Time: Wednesday, 09 December, 13:45 - 14:00
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA):
Abstract With the development of speech synthesis and voice conversion techniques, the anti-spoofing task of detecting artificial speech signals has received growing research attention. State-of-the-art spoofing detectors can distinguish utterances generated by voice conversion from natural ones with high accuracy. This paper proposes a method that improves the ability of voice conversion models to evade spoofing detection by post-processing the converted speech with a neural network. The network is built from long short-term memory (LSTM) units and trained by reducing the distance between the linear frequency cepstral coefficients (LFCC) of converted utterances and those of natural references. In our experiments, the SAS dataset was used to construct the anti-spoofing system, and the VCTK dataset was used to build the voice conversion models. Experimental results show that the proposed method significantly reduces the detection rate of the anti-spoofing system without degrading the subjective quality of the converted speech.
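The abstract names the training signal (a distance between the LFCC of converted and natural speech) but not its exact form. The following is a minimal NumPy sketch of one way such a quantity could be computed. It is an illustration only, not the authors' implementation: the function names `lfcc` and `lfcc_distance`, the frame settings, the number of filters, and the choice of mean squared error are all assumptions, and the LSTM post-processing network itself is not shown.

```python
import numpy as np

def lfcc(signal, n_fft=512, hop=160, n_filters=20, n_ceps=20):
    """Illustrative LFCC: linearly spaced triangular filters, log, DCT-II.
    All default settings are assumptions, not values from the paper."""
    # Slice the waveform into overlapping Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_bins)

    # Triangular filters whose centres are spaced linearly in frequency
    # (the "linear" in LFCC, as opposed to the mel spacing of MFCC).
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - left) / (centre - left)
        falling = (right - bins) / (right - centre)
        fbank[m] = np.clip(np.minimum(rising, falling), 0.0, None)

    # Log filterbank energies, with a small floor to avoid log(0).
    log_energy = np.log(power @ fbank.T + 1e-10)

    # Unnormalised DCT-II over the filter axis gives the cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_energy @ dct.T  # (n_frames, n_ceps)

def lfcc_distance(converted, reference):
    """Mean squared error between LFCC sequences (assumed distance measure).
    The two sequences are truncated to the shorter length; a real system
    would need a proper time alignment between the utterances."""
    a, b = lfcc(converted), lfcc(reference)
    n = min(len(a), len(b))
    return float(np.mean((a[:n] - b[:n]) ** 2))
```

In a training setup like the one the abstract describes, a quantity of this kind would be computed in a differentiable framework so that its gradient could update the post-processing network. The NumPy version above only shows what is being measured.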