SPE-L5: Speech Synthesis and Voice Conversion I |
Session Type: Lecture |
Time: Wednesday, 6 May, 09:00 - 11:00 |
Location: On-Demand |
Session Chair: Junichi Yamagishi, National Institute of Informatics, Japan & University of Edinburgh, UK |

SPE-L5.1: USING VAES AND NORMALIZING FLOWS FOR ONE-SHOT TEXT-TO-SPEECH SYNTHESIS OF EXPRESSIVE SPEECH |
Vatsal Aggarwal; Amazon, Inc. |
Marius Cotescu; Amazon, Inc. |
Nishant Prateek; Amazon, Inc. |
Jaime Lorenzo-Trueba; Amazon, Inc. |
Roberto Barra-Chicote; Amazon, Inc. |
|
SPE-L5.2: ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH WITH STATE-OF-THE-ART NEURAL SPEAKER EMBEDDINGS |
Erica Cooper; National Institute of Informatics |
Cheng-I Lai; Massachusetts Institute of Technology |
Yusuke Yasuda; National Institute of Informatics |
Fuming Fang; National Institute of Informatics |
Xin Wang; National Institute of Informatics |
Nanxin Chen; Johns Hopkins University |
Junichi Yamagishi; National Institute of Informatics |
|
SPE-L5.3: MELLOTRON: MULTISPEAKER EXPRESSIVE VOICE SYNTHESIS BY CONDITIONING ON RHYTHM, PITCH AND GLOBAL STYLE TOKENS |
Rafael Valle; NVIDIA |
Jason Li; NVIDIA |
Ryan Prenger; NVIDIA |
Bryan Catanzaro; NVIDIA |
|
SPE-L5.4: LOCATION-RELATIVE ATTENTION MECHANISMS FOR ROBUST LONG-FORM SPEECH SYNTHESIS |
Eric Battenberg; Google |
RJ Skerry-Ryan; Google |
Soroosh Mariooryad; Google |
Daisy Stanton; Google |
David Kao; Google |
Matt Shannon; Google |
Tom Bagby; Google |
|
SPE-L5.5: PARALLEL WAVEGAN: A FAST WAVEFORM GENERATION MODEL BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH MULTI-RESOLUTION SPECTROGRAM |
Ryuichi Yamamoto; LINE Corporation |
Eunwoo Song; Naver Corporation |
Jae-Min Kim; Naver Corporation |
|
SPE-L5.6: GAUSSIAN LPCNET FOR MULTISAMPLE SPEECH SYNTHESIS |
Vadim Popov; Huawei Technologies |
Mikhail Kudinov; Huawei Technologies |
Tasnima Sadekova; Huawei Technologies |
|