SLP-P22: Text to Speech Generation - P2
Thu, 18 Apr, 13:10 - 15:10 (UTC +9)
Location: Poster Zone 2A
Session Type: Poster
Session Chair: Midia Yousefi, Microsoft
Track: Speech and Language Processing
 

SLP-P22.1: GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL

Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Télécom Paris, France; Jonathan Le Roux, Mitsubishi Electric Research Laboratories (MERL), United States of America; Gaël Richard, Télécom Paris, France
 

SLP-P22.3: PERIODGRAD: TOWARDS PITCH-CONTROLLABLE NEURAL VOCODER BASED ON A DIFFUSION PROBABILISTIC MODEL

Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda, Nagoya Institute of Technology, Japan
 

SLP-P22.4: ED-TTS: MULTI-SCALE EMOTION MODELING USING CROSS-DOMAIN EMOTION DIARIZATION FOR EMOTIONAL SPEECH SYNTHESIS

Haobin Tang, University of Science & Technology of China, China; Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang, Ping An Technology (Shenzhen) Co., Ltd., China
 

SLP-P22.5: CONSIDERING TEMPORAL CONNECTION BETWEEN TURNS FOR CONVERSATIONAL SPEECH SYNTHESIS

Kangdi Mei, Zhaoci Liu, Huipeng Du, Hengyu Li, Yang Ai, Liping Chen, Zhenhua Ling, University of Science and Technology of China, China
 

SLP-P22.6: HIERARCHICAL EMOTION PREDICTION AND CONTROL IN TEXT-TO-SPEECH SYNTHESIS

Sho Inoue, School of Data Science, Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China; Kun Zhou, Speech Lab of DAMO Academy, Alibaba Group, Singapore; Shuai Wang, Shenzhen Research Institute of Big Data, Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China; Haizhou Li, School of Data Science, Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China
 

SLP-P22.7: CONTROLLABLE SPEAKING STYLES USING A LARGE LANGUAGE MODEL

Atli Sigurgeirsson, Simon King, University of Edinburgh, United Kingdom of Great Britain and Northern Ireland
 

SLP-P22.8: CONCSS: CONTRASTIVE-BASED CONTEXT COMPREHENSION FOR DIALOGUE-APPROPRIATE PROSODY IN CONVERSATIONAL SPEECH SYNTHESIS

Yayue Deng, Jinlong Xue, Beijing University of Posts and Telecommunications, China; Yukang Jia, Perfect World Co., Ltd, China; Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Beijing University of Posts and Telecommunications, China; Dengfeng Ke, Beijing Language and Culture University, China; Ya Li, Beijing University of Posts and Telecommunications, China
 

SLP-P22.10: SPONTTS: MODELING AND TRANSFERRING SPONTANEOUS STYLE FOR TTS

Hanzhao Li, Xinfa Zhu, Northwestern Polytechnical University, China; Liumeng Xue, The Chinese University of Hong Kong, China; Yang Song, None, China; Yunlin Chen, Shanghai Mobvoi Information Technology Co., Ltd, China; Lei Xie, Northwestern Polytechnical University, China
 

SLP-P22.11: CONTROLLABLE PROSODY GENERATION WITH PARTIAL INPUTS

Dan Andrei Iliescu, University of Cambridge, United Kingdom of Great Britain and Northern Ireland; Devang Savita Ram Mohan, Papercup Technologies Ltd, United Kingdom of Great Britain and Northern Ireland; Tian Huey Teh, Google DeepMind, United Kingdom of Great Britain and Northern Ireland; Zack Hodari, Papercup Technologies Ltd, United Kingdom of Great Britain and Northern Ireland
 

SLP-P22.12: STYLESPEECH: SELF-SUPERVISED STYLE ENHANCING WITH VQ-VAE-BASED PRE-TRAINING FOR EXPRESSIVE AUDIOBOOK SPEECH SYNTHESIS

Xueyuan Chen, The Chinese University of Hong Kong, Hong Kong; Xi Wang, Shaofei Zhang, Lei He, Microsoft, China; Zhiyong Wu, Tsinghua University, China; Xixin Wu, Helen Meng, The Chinese University of Hong Kong, Hong Kong