Paper ID | F-3-2.2
Paper Title | ONLINE SPEAKER ADAPTATION FOR WAVENET-BASED NEURAL VOCODERS
Authors | Qiuchen Huang, Yang Ai, Zhenhua Ling, University of Science and Technology of China, China
Session | F-3-2: Speech Synthesis
Time | Thursday, 10 December, 15:30 - 17:15 |
Presentation Time | Thursday, 10 December, 15:45 - 16:00
All times are in New Zealand Time (UTC +13) |
Topic | Speech, Language, and Audio (SLA)
Abstract | In this paper, we propose an online speaker adaptation method for WaveNet-based neural vocoders to improve their performance on speaker-independent waveform generation. In this method, a speaker encoder is first constructed using a large speaker-verification dataset; it extracts a speaker embedding vector from an utterance pronounced by an arbitrary speaker. At the training stage, a speaker-aware WaveNet vocoder is then built using a multi-speaker dataset, adopting both acoustic feature sequences and speaker embedding vectors as conditions. At the generation stage, we first feed the acoustic feature sequence from a test speaker into the speaker encoder to obtain the speaker embedding vector of the utterance. Then, both the speaker embedding vector and the acoustic features are passed through the speaker-aware WaveNet vocoder to reconstruct the speech waveform. Experimental results demonstrate that our method achieves better objective and subjective performance on reconstructing waveforms of unseen speakers than the conventional speaker-independent WaveNet vocoder.
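The conditioning scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the speaker encoder yields one fixed-length embedding per utterance (a common choice, e.g. a 256-dim d-vector), which is tiled across frames and concatenated with the frame-level acoustic features (e.g. 80-dim mel-spectrograms) to form the vocoder's conditioning input. The dimensions and function names here are hypothetical.

```python
import numpy as np

def build_condition(acoustic_feats: np.ndarray,
                    speaker_embedding: np.ndarray) -> np.ndarray:
    """Tile an utterance-level speaker embedding across all frames and
    concatenate it with frame-level acoustic features, forming the
    conditioning input of a speaker-aware neural vocoder (sketch)."""
    n_frames = acoustic_feats.shape[0]
    # Repeat the single embedding vector once per acoustic frame.
    tiled = np.tile(speaker_embedding[np.newaxis, :], (n_frames, 1))
    # Frame-wise concatenation: [acoustic features | speaker embedding].
    return np.concatenate([acoustic_feats, tiled], axis=1)

# Hypothetical sizes: 200 frames of 80-dim features, 256-dim embedding.
feats = np.random.randn(200, 80)
embed = np.random.randn(256)
cond = build_condition(feats, embed)
# cond has shape (200, 336): 80 acoustic + 256 speaker dims per frame.
```

Because the embedding comes from the test utterance itself at generation time, no enrollment data or fine-tuning is needed, which is what makes the adaptation "online".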