SLP-L13.2

DMP-TTS: DISENTANGLED MULTI-MODAL PROMPTING FOR CONTROLLABLE TEXT-TO-SPEECH WITH CHAINED GUIDANCE

Kang Yin, University of Science and Technology of China, China; Chunyu Qiang, Kuaishou Technology, China; Sirui Zhao, University of Science and Technology of China, China; Xiaopeng Wang, Yuzhe Liang, Kuaishou Technology, China; Pengfei Cai, Tong Xu, University of Science and Technology of China, China; Chen Zhang, Kuaishou Technology, China; Enhong Chen, University of Science and Technology of China, China

Session:
SLP-L13: Instruction-Guided and Preference-Aligned TTS Oral

Track:
Speech and Language Processing [SL]

Location:
Room 114

Presentation Time:
Thu, 7 May, 14:20 - 14:40

Session SLP-L13
SLP-L13.1: Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
Yi-Cheng Lin, National Taiwan University, Taiwan; Huang-Cheng Chou, University of Southern California, United States of America; Tzu-Chieh Wei, University of Michigan, United States of America; Kuan-Yu Chen, Hung-yi Lee, National Taiwan University, Taiwan
SLP-L13.2: DMP-TTS: DISENTANGLED MULTI-MODAL PROMPTING FOR CONTROLLABLE TEXT-TO-SPEECH WITH CHAINED GUIDANCE
Kang Yin, University of Science and Technology of China, China; Chunyu Qiang, Kuaishou Technology, China; Sirui Zhao, University of Science and Technology of China, China; Xiaopeng Wang, Yuzhe Liang, Kuaishou Technology, China; Pengfei Cai, Tong Xu, University of Science and Technology of China, China; Chen Zhang, Kuaishou Technology, China; Enhong Chen, University of Science and Technology of China, China
SLP-L13.3: OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech
Yong Ren, Institute of Automation, Chinese Academy of Sciences, China; Jiangyan Yi, Jianhua Tao, Tsinghua University, China; Haiyang Sun, Institute of Automation, Chinese Academy of Sciences, China; Zhengqi Wen, Tsinghua University, China; Hao Gu, Le Xu, Ye Bai, Institute of Automation, Chinese Academy of Sciences, China
SLP-L13.4: HD-PPT: HIERARCHICAL DECODING OF CONTENT- AND PROMPT-PREFERENCE TOKENS FOR INSTRUCTION-BASED TTS
Sihang Nie, Xiaofen Xing, Jingyuan Xing, South China University of Technology, China; Baiji Liu, South China University of Technology, Guangzhou Quwan Network Technology, China; Xiangmin Xu, Foshan University, South China University of Technology, China
SLP-L13.5: EMOTION-ALIGNED GENERATION IN DIFFUSION TEXT TO SPEECH MODELS VIA PREFERENCE-GUIDED OPTIMIZATION
Jiacheng Shi, Hongfei Du, College of William & Mary, United States of America; Yangfan He, University of Minnesota-Twin Cities, United States of America; Y. Alicia Hong, George Mason University, United States of America; Ye Gao, College of William & Mary, United States of America
SLP-L13.6: RRPO: ROBUST REWARD POLICY OPTIMIZATION FOR LLM-BASED EMOTIONAL TTS
Cong Wang, Beijing University of Posts and Telecommunications, China; Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Unaffiliated, China; Yingming Gao, Ya Li, Beijing University of Posts and Telecommunications, China