SLP-L19: Speech LLM: Training & Generation
Tue, 5 May, 14:00 - 16:00
Location: Room 114
Session Type: Oral
Track: Speech and Language Processing [SL]
Tue, 5 May, 14:00 - 14:20

SLP-L19.1: CROSS-MODAL KNOWLEDGE DISTILLATION FOR SPEECH LARGE LANGUAGE MODELS

Enzhi Wang, Qicheng Li, Nankai University, China; Zhiyuan Tang, Tencent Corporation, China; Yuhang Jia, Nankai University, China
Tue, 5 May, 14:20 - 14:40

SLP-L19.2: WHY DO SPEECH LANGUAGE MODELS FAIL TO GENERATE SEMANTICALLY COHERENT OUTPUTS? A MODALITY EVOLVING PERSPECTIVE

Hankun Wang, Haoran Wang, Yiwei Guo, Zhihan Li, Chenpeng Du, Kai Yu, Shanghai Jiao Tong University, China
Tue, 5 May, 14:40 - 15:00

SLP-L19.3: GELINA: UNIFIED SPEECH AND GESTURE SYNTHESIS VIA INTERLEAVED TOKEN PREDICTION

Téo Guichoux, ISIR, STMS Lab – IRCAM, Sorbonne Université, France; Théodor Lemerle, STMS Lab – IRCAM, Sorbonne Université, France; Shivam Mehta, Jonas Beskow, Gustav Eje Henter, Department of Speech, Music, and Hearing, KTH Royal Institute of Technology, Sweden; Laure Soulier, ISIR, Sorbonne Université, France; Catherine Pelachaud, ISIR, CNRS, Sorbonne Université, France; Nicolas Obin, STMS Lab – IRCAM, Sorbonne Université, France
Tue, 5 May, 15:00 - 15:20

SLP-L19.4: LEVERAGING PREDICTION ENTROPY FOR AUTOMATIC PROMPT WEIGHTING IN ZERO-SHOT AUDIO-LANGUAGE CLASSIFICATION

Karim El Khoury, Maxime Zanella, Tiffanie Godelaine, Christophe De Vleeschouwer, Benoit Macq, UCLouvain, Belgium
Tue, 5 May, 15:20 - 15:40

SLP-L19.5: GROUP RELATIVE POLICY OPTIMIZATION FOR TEXT-TO-SPEECH WITH LARGE LANGUAGE MODELS

Chang Liu, University of Science and Technology of China, China; Ya-Jun Hu, iFLYTEK, China; Ying-Ying Gao, Shi-Lei Zhang, China Mobile, China; Zhen-Hua Ling, University of Science and Technology of China, China
Tue, 5 May, 15:40 - 16:00

SLP-L19.6: PERSONAPLEX: VOICE AND ROLE CONTROL FOR FULL DUPLEX CONVERSATIONAL SPEECH MODELS

Rajarshi Roy, Jonathan Raiman, Sang-gil Lee, Teodor-Dumitru Ene, Robert Kirby, Sungwon Kim, Jaehyeon Kim, Bryan Catanzaro, Nvidia, United States of America