SPE-L5: Speech Synthesis and Voice Conversion I |
| Session Type: Lecture |
| Time: Wednesday, 6 May, 09:00 - 11:00 |
| Location: On-Demand |
| Session Chair: Junichi Yamagishi, National Institute of Informatics, Japan & University of Edinburgh, UK |
| SPE-L5.1: USING VAES AND NORMALIZING FLOWS FOR ONE-SHOT TEXT-TO-SPEECH SYNTHESIS OF EXPRESSIVE SPEECH |
| Vatsal Aggarwal; Amazon, Inc. |
| Marius Cotescu; Amazon, Inc. |
| Nishant Prateek; Amazon, Inc. |
| Jaime Lorenzo-Trueba; Amazon, Inc. |
| Roberto Barra-Chicote; Amazon, Inc. |
| SPE-L5.2: ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH WITH STATE-OF-THE-ART NEURAL SPEAKER EMBEDDINGS |
| Erica Cooper; National Institute of Informatics |
| Cheng-I Lai; Massachusetts Institute of Technology |
| Yusuke Yasuda; National Institute of Informatics |
| Fuming Fang; National Institute of Informatics |
| Xin Wang; National Institute of Informatics |
| Nanxin Chen; Johns Hopkins University |
| Junichi Yamagishi; National Institute of Informatics |
| SPE-L5.3: MELLOTRON: MULTISPEAKER EXPRESSIVE VOICE SYNTHESIS BY CONDITIONING ON RHYTHM, PITCH AND GLOBAL STYLE TOKENS |
| Rafael Valle; NVIDIA |
| Jason Li; NVIDIA |
| Ryan Prenger; NVIDIA |
| Bryan Catanzaro; NVIDIA |
| SPE-L5.4: LOCATION-RELATIVE ATTENTION MECHANISMS FOR ROBUST LONG-FORM SPEECH SYNTHESIS |
| Eric Battenberg; Google |
| RJ Skerry-Ryan; Google |
| Soroosh Mariooryad; Google |
| Daisy Stanton; Google |
| David Kao; Google |
| Matt Shannon; Google |
| Tom Bagby; Google |
| SPE-L5.5: PARALLEL WAVEGAN: A FAST WAVEFORM GENERATION MODEL BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH MULTI-RESOLUTION SPECTROGRAM |
| Ryuichi Yamamoto; LINE Corporation |
| Eunwoo Song; Naver Corporation |
| Jae-Min Kim; Naver Corporation |
| SPE-L5.6: GAUSSIAN LPCNET FOR MULTISAMPLE SPEECH SYNTHESIS |
| Vadim Popov; Huawei Technologies |
| Mikhail Kudinov; Huawei Technologies |
| Tasnima Sadekova; Huawei Technologies |