SLP-P11.6
IMPROVING LANGUAGE MODEL-BASED ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS WITH MULTI-SCALE ACOUSTIC PROMPTS
Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Tsinghua University, China; Xixin Wu, The Chinese University of Hong Kong, China; Shiyin Kang, Tao Jiang, Yahui Zhou, Skywork AI PTE. LTD., China; Yuxing Han, Tsinghua University, China; Helen Meng, The Chinese University of Hong Kong, China
Session:
SLP-P11: Text to Speech Generation - P1 Poster
Track:
Speech and Language Processing
Location:
Poster Zone 6A
Poster Board PZ-6A.6
Presentation Time:
Wed, 17 Apr, 13:10 - 15:10 (UTC +9)
Session Chair:
Jiangyan Yi, Institute of Automation, CAS
Session SLP-P11
SLP-P11.1: DETS: End-to-End Single-Stage Text-to-Speech via Hierarchical Diffusion GAN Models
Linqin Wang, Zhengtao Yu, Shengxiang Gao, Cunli Mao, Yuxin Huang, Kunming University of Science and Technology, China
SLP-P11.2: LATENT FILLING: LATENT SPACE DATA AUGMENTATION FOR ZERO-SHOT SPEECH SYNTHESIS
Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Samsung Research, Korea, Republic of; Chanwoo Kim, Korea University, Korea, Republic of
SLP-P11.3: AN EXPERIMENTAL COMPARISON OF NOISE-ROBUST TEXT-TO-SPEECH SYNTHESIS SYSTEMS BASED ON SELF-SUPERVISED REPRESENTATION
Xiaoying Zhao, Qiushi Zhu, University of Science and Technology of China, China; Yuchen Hu, Nanyang Technological University, Singapore
SLP-P11.4: MELS-TTS: MULTI-EMOTION MULTI-LINGUAL MULTI-SPEAKER TEXT-TO-SPEECH SYSTEM VIA DISENTANGLED STYLE TOKENS
Heejin Choi, Jae-Sung Bae, Joun Yeop Lee, Seongkyu Mun, Samsung Research, Korea, Republic of; Jihwan Lee, University of Southern California, Korea, Republic of; Hoon-Young Cho, Samsung Research, Korea, Republic of; Chanwoo Kim, Korea University, Korea, Republic of
SLP-P11.5: ENERGY-BASED MODELS FOR SPEECH SYNTHESIS
Wanli Sun, Zehai Tu, Anton Ragni, University of Sheffield, United Kingdom of Great Britain and Northern Ireland
SLP-P11.6: IMPROVING LANGUAGE MODEL-BASED ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS WITH MULTI-SCALE ACOUSTIC PROMPTS
Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Tsinghua University, China; Xixin Wu, The Chinese University of Hong Kong, China; Shiyin Kang, Tao Jiang, Yahui Zhou, Skywork AI PTE. LTD., China; Yuxing Han, Tsinghua University, China; Helen Meng, The Chinese University of Hong Kong, China
SLP-P11.7: MINIMALLY-SUPERVISED SPEECH SYNTHESIS WITH CONDITIONAL DIFFUSION MODEL AND LANGUAGE MODEL: A COMPARATIVE STUDY OF SEMANTIC CODING
Chunyu Qiang, Tianjin University, China; Hao Li, Hao Ni, He Qu, Kuaishou Technology, China; Ruibo Fu, Tao Wang, Institute of Automation, Chinese Academy of Sciences, China; Longbiao Wang, Jianwu Dang, Tianjin University, China
SLP-P11.8: High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
Chunyu Qiang, Tianjin University, China; Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Kuaishou Technology, China; Longbiao Wang, Jianwu Dang, Tianjin University, China
SLP-P11.9: REFLOW-TTS: A RECTIFIED FLOW MODEL FOR HIGH-FIDELITY TEXT-TO-SPEECH
Wenhao Guan, Qi Su, Haodong Zhou, Shiyu Miao, Xingjia Xie, Lin Li, Qingyang Hong, Xiamen University, China
SLP-P11.10: ADVERSARIAL LEARNING ON COMPRESSED POSTERIOR SPACE FOR NON-ITERATIVE SCORE-BASED END-TO-END TEXT-TO-SPEECH
Won-Gook Choi, Donghyun Seong, Joon-Hyuk Chang, Hanyang University, Korea, Republic of
SLP-P11.11: DCTTS: DISCRETE DIFFUSION MODEL WITH CONTRASTIVE LEARNING FOR TEXT-TO-SPEECH GENERATION
Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang, Nanjing University of Aeronautics and Astronautics, China
SLP-P11.12: ENHANCING MULTILINGUAL TTS WITH VOICE CONVERSION BASED DATA AUGMENTATION AND POSTERIOR EMBEDDING
Hyun-Wook Yoon, Jin-Seob Kim, NAVER Cloud, Korea, Republic of; Ryuichi Yamamoto, Ryo Terashima, LINE, Japan; Chan-Ho Song, Jae-Min Kim, Eunwoo Song, NAVER Cloud, Korea, Republic of