Paper ID | F-3-2.3 |
Paper Title |
IMPLEMENTATION OF SEQUENTIAL REAL-TIME WAVEFORM GENERATOR FOR HIGH-QUALITY VOCODER |
Authors |
Masanori Morise, Meiji University, Japan |
Session |
F-3-2: Speech Synthesis |
Time | Thursday, 10 December, 15:30 - 17:15 |
Presentation Time: | Thursday, 10 December, 16:00 - 16:15 |
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA): |
Abstract |
We describe an implementation of real-time waveform generation from vocoded speech parameters. High-quality vocoders such as STRAIGHT and WORLD have been used for voice conversion and statistical parametric speech synthesis. Current implementations of such vocoders provide a function that generates the whole waveform at once from the speech parameters of all frames. Implementations such as realtime STRAIGHT have been proposed to generate short segments of the waveform sequentially, but the sound quality of the generated speech is inferior to that of the original vocoder. To achieve sequential real-time waveform generation, we implemented a struct named WorldSynthesizer (WS struct) and six functions. The implementation is based on the WORLD vocoder and generates exactly the same waveform as the original, except for a few points such as the random seed used for generating the white noise. Because the output is the same, we evaluated only its processing speed, using the real time factor (RTF). The results showed that the processing speed of the proposed implementation decreased by 14.5% compared with the original WORLD. Nevertheless, the RTF of the proposed implementation calculated from female speech was below 0.1, which suggests that the implementation is able to carry out real-time synthesis. |
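To illustrate the metric used in the abstract: the real time factor is commonly defined as the processing time divided by the duration of the generated audio, so a value below 1 means synthesis runs faster than playback. The sketch below assumes that definition. The names `real_time_factor` and `measure_rtf` and the `synthesize` callable are hypothetical and are not part of WORLD.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(synthesize, num_samples, sample_rate):
    """Time one call to a hypothetical `synthesize` callable and return its RTF.

    `num_samples` is the number of output samples the call produces and
    `sample_rate` is in Hz, so num_samples / sample_rate is the audio duration.
    """
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, num_samples / sample_rate)


if __name__ == "__main__":
    # Example: 0.5 s of processing for 5 s of audio gives an RTF of about 0.1.
    print(real_time_factor(0.5, 5.0))
```

Under this definition, an RTF below 0.1 means that one second of speech is generated in less than 0.1 seconds of processing time.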