SLP-P22.1

GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL

Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Télécom Paris, France; Jonathan Le Roux, Mitsubishi Electric Research Laboratories (MERL), United States of America; Gaël Richard, Télécom Paris, France

Session:
SLP-P22: Text to Speech Generation - P2 Poster

Track:
Speech and Language Processing

Location:
Poster Zone 2A
Poster Board PZ-2A.1

Presentation Time:
Thu, 18 Apr, 13:10 - 15:10 (UTC +9)

Session Chair:
Midia Yousefi, Microsoft
Session SLP-P22
SLP-P22.1: GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL
Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Télécom Paris, France; Jonathan Le Roux, Mitsubishi Electric Research Laboratories (MERL), United States of America; Gaël Richard, Télécom Paris, France
SLP-P22.2: TRAINING GENERATIVE ADVERSARIAL NETWORK-BASED VOCODER WITH LIMITED DATA USING AUGMENTATION-CONDITIONAL DISCRIMINATOR
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, NTT Corporation, Japan
SLP-P22.3: PERIODGRAD: TOWARDS PITCH-CONTROLLABLE NEURAL VOCODER BASED ON A DIFFUSION PROBABILISTIC MODEL
Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda, Nagoya Institute of Technology, Japan
SLP-P22.4: ED-TTS: MULTI-SCALE EMOTION MODELING USING CROSS-DOMAIN EMOTION DIARIZATION FOR EMOTIONAL SPEECH SYNTHESIS
Haobin Tang, University of Science & Technology of China, China; Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang, Ping An Technology (Shenzhen) Co., Ltd., China
SLP-P22.5: CONSIDERING TEMPORAL CONNECTION BETWEEN TURNS FOR CONVERSATIONAL SPEECH SYNTHESIS
Kangdi Mei, Zhaoci Liu, Huipeng Du, Hengyu Li, Yang Ai, Liping Chen, Zhenhua Ling, University of Science and Technology of China, China
SLP-P22.6: HIERARCHICAL EMOTION PREDICTION AND CONTROL IN TEXT-TO-SPEECH SYNTHESIS
Sho Inoue, School of Data Science, Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China; Kun Zhou, Speech Lab of DAMO Academy, Alibaba Group, Singapore; Shuai Wang, Shenzhen Research Institute of Big Data, Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China; Haizhou Li, School of Data Science, Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China
SLP-P22.7: CONTROLLABLE SPEAKING STYLES USING A LARGE LANGUAGE MODEL
Atli Sigurgeirsson, Simon King, University of Edinburgh, United Kingdom of Great Britain and Northern Ireland
SLP-P22.8: CONCSS: CONTRASTIVE-BASED CONTEXT COMPREHENSION FOR DIALOGUE-APPROPRIATE PROSODY IN CONVERSATIONAL SPEECH SYNTHESIS
Yayue Deng, Jinlong Xue, Beijing University of Posts and Telecommunications, China; Yukang Jia, Perfect World Co., Ltd, China; Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Beijing University of Posts and Telecommunications, China; Dengfeng Ke, Beijing Language and Culture University, China; Ya Li, Beijing University of Posts and Telecommunications, China
SLP-P22.10: SponTTS: modeling and transferring spontaneous style for TTS
Hanzhao Li, Xinfa Zhu, Northwestern Polytechnical University, China; Liumeng Xue, The Chinese University of Hong Kong, China; Yang Song, None, China; Yunlin Chen, Shanghai Mobvoi Information Technology Co., Ltd, China; Lei Xie, Northwestern Polytechnical University, China
SLP-P22.11: CONTROLLABLE PROSODY GENERATION WITH PARTIAL INPUTS
Dan Andrei Iliescu, University of Cambridge, United Kingdom of Great Britain and Northern Ireland; Devang Savita Ram Mohan, Papercup Technologies Ltd, United Kingdom of Great Britain and Northern Ireland; Tian Huey Teh, Google DeepMind, United Kingdom of Great Britain and Northern Ireland; Zack Hodari, Papercup Technologies Ltd, United Kingdom of Great Britain and Northern Ireland
SLP-P22.12: STYLESPEECH: SELF-SUPERVISED STYLE ENHANCING WITH VQ-VAE-BASED PRE-TRAINING FOR EXPRESSIVE AUDIOBOOK SPEECH SYNTHESIS
Xueyuan Chen, The Chinese University of Hong Kong, Hong Kong; Xi Wang, Shaofei Zhang, Lei He, Microsoft, China; Zhiyong Wu, Tsinghua University, China; Xixin Wu, Helen Meng, The Chinese University of Hong Kong, Hong Kong