SLP-P12: Multimodal Modeling
Tue, 5 May, 16:30 - 18:30
Location: Poster Area 32
Session Type: Poster
Track: Speech and Language Processing [SL]

SLP-P12.1: SMOOTHCLAP: SOFT-TARGET ENHANCED CONTRASTIVE LANGUAGE-AUDIO PRETRAINING FOR AFFECTIVE COMPUTING

Xin Jing, Jiadong Wang, Andreas Triantafyllopoulos, Maurice Gerczuk, Technical University of Munich, Germany; Shahin Amiriparian, Jun Luo, Huawei, Netherlands; Björn Schuller, Technical University of Munich, Germany

SLP-P12.2: Synaspot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy

Kewei Li, Yinan Zhong, Xiaotao Liang, Tianchi Dai, Shaofei Xue, Alibaba Group, China

SLP-P12.3: VOCALNET-M2: ADVANCING LOW-LATENCY SPOKEN LANGUAGE MODELING VIA INTEGRATED MULTI-CODEBOOK TOKENIZATION AND MULTI-TOKEN PREDICTION

Yuhao Wang, Ziyang Cheng, Heyang Liu, Shanghai Jiao Tong University, China; Ronghua Wu, Qunshan Gu, Ant Group, China; Yanfeng Wang, Yu Wang, Shanghai Jiao Tong University, China

SLP-P12.4: MITIGATING LANGUAGE PRIOR-INDUCED HALLUCINATIONS VIA BI-LEVEL CONTRASTIVE DECODING

Tianze Xia, Hongcheng Liu, Lina Yang, Yu Wang, Shanghai Jiao Tong University, China

SLP-P12.5: PROTOTYPE-GUIDED CROSS-MODAL CONTRASTIVE LEARNING FOR CONTINUAL AUDIO-VISUAL SOUND SEPARATION

Wanrong Ma, Hongyu Wen, Zijian Gao, Qisheng Xu, Kele Xu, National University of Defense Technology, China

SLP-P12.6: CONDITIONAL VARIATIONAL AUTOENCODER FOR GLOSS-FREE SIGN LANGUAGE TRANSLATION

Jiannan Mao, Gifu University, Japan; Chenchen Ding, National Institute of Information and Communications Technology, Japan; Tadahiro Matsumoto, Gifu University, Japan; Hideki Tanaka, Masao Utiyama, National Institute of Information and Communications Technology, Japan

SLP-P12.7: AFFECT-JIGSAW: INTEGRATING CORE AND PERIPHERAL EMOTIONS FOR HARMONIOUS FINE-GRAINED MULTIMODAL EMOTION RECOGNITION

Shihao Gao, Zixing Zhang, Zhiqiang Gao, Hongyu Chen, Hunan University, China; Jing Han, University of Cambridge, United Kingdom of Great Britain and Northern Ireland

SLP-P12.8: SESSION-LEVEL SPOKEN LANGUAGE ASSESSMENT WITH A MULTIMODAL FOUNDATION MODEL VIA MULTI-TARGET LEARNING

Hong-Yun Lin, Jhen-Ke Lin, Chung-Chun Wang, Hao-Chien Lu, Berlin Chen, National Taiwan Normal University, Taiwan

SLP-P12.9: SLOT FILLING AS A REASONING TASK FOR SPEECHLLMS

Kadri Hacioglu, Manjunath K. E., Andreas Stolcke, Uniphore, United States of America

SLP-P12.10: Selective Hub Fusion with Modality-Heterogeneous Experts for Multimodal Emotion Recognition

Huan Zhao, Ling Xiong, Kehan Wang, Hunan University, China