IEEE ICASSP 2026 || Barcelona, Spain || 4-8 May 2026

SLP-L13.2

DMP-TTS: DISENTANGLED MULTI-MODAL PROMPTING FOR CONTROLLABLE TEXT-TO-SPEECH WITH CHAINED GUIDANCE

Kang Yin, University of Science and Technology of China, China; Chunyu Qiang, Kuaishou Technology, China; Sirui Zhao, University of Science and Technology of China, China; Xiaopeng Wang, Yuzhe Liang, Kuaishou Technology, China; Pengfei Cai, Tong Xu, University of Science and Technology of China, China; Chen Zhang, Kuaishou Technology, China; Enhong Chen, University of Science and Technology of China, China

Session:

SLP-L13: Instruction-Guided and Preference-Aligned TTS Oral

Location:

Room 114

Presentation Time:

Thu, 7 May, 14:20 - 14:40

Session Chair:

Liping Chen, University of Science and Technology of China

Session SLP-L13

SLP-L13.1: Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems

Yi-Cheng Lin, National Taiwan University, Taiwan; Huang-Cheng Chou, University of Southern California, United States of America; Tzu-Chieh Wei, University of Michigan, United States of America; Kuan-Yu Chen, Hung-yi Lee, National Taiwan University, Taiwan

SLP-L13.2: DMP-TTS: DISENTANGLED MULTI-MODAL PROMPTING FOR CONTROLLABLE TEXT-TO-SPEECH WITH CHAINED GUIDANCE

Kang Yin, University of Science and Technology of China, China; Chunyu Qiang, Kuaishou Technology, China; Sirui Zhao, University of Science and Technology of China, China; Xiaopeng Wang, Yuzhe Liang, Kuaishou Technology, China; Pengfei Cai, Tong Xu, University of Science and Technology of China, China; Chen Zhang, Kuaishou Technology, China; Enhong Chen, University of Science and Technology of China, China

SLP-L13.3: OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech

Yong Ren, Institute of Automation, Chinese Academy of Sciences, China; Jiangyan Yi, Jianhua Tao, Tsinghua University, China; Haiyang Sun, Institute of Automation, Chinese Academy of Sciences, China; Zhengqi Wen, Tsinghua University, China; Hao Gu, Le Xu, Ye Bai, Institute of Automation, Chinese Academy of Sciences, China

SLP-L13.4: HD-PPT: HIERARCHICAL DECODING OF CONTENT- AND PROMPT-PREFERENCE TOKENS FOR INSTRUCTION-BASED TTS

Sihang Nie, Xiaofen Xing, Jingyuan Xing, South China University of Technology, China; Baiji Liu, South China University of Technology, Guangzhou Quwan Network Technology, China; Xiangmin Xu, Foshan University, South China University of Technology, China

SLP-L13.5: EMOTION-ALIGNED GENERATION IN DIFFUSION TEXT TO SPEECH MODELS VIA PREFERENCE-GUIDED OPTIMIZATION

Jiacheng Shi, Hongfei Du, College of William & Mary, United States of America; Yangfan He, University of Minnesota-Twin Cities, United States of America; Y. Alicia Hong, George Mason University, United States of America; Ye Gao, College of William & Mary, United States of America

SLP-L13.6: RRPO: ROBUST REWARD POLICY OPTIMIZATION FOR LLM-BASED EMOTIONAL TTS

Cong Wang, Beijing University of Posts and Telecommunications, China; Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Unaffiliated, China; Yingming Gao, Ya Li, Beijing University of Posts and Telecommunications, China


Last updated 22 April 2026.
