SLP-P12.1

SMOOTHCLAP: SOFT-TARGET ENHANCED CONTRASTIVE LANGUAGE-AUDIO PRETRAINING FOR AFFECTIVE COMPUTING

Xin Jing, Jiadong Wang, Andreas Triantafyllopoulos, Maurice Gerczuk, Technical University of Munich, Germany; Shahin Amiriparian, Jun Luo, Huawei, Netherlands; Björn Schuller, Technical University of Munich, Germany

Session:
SLP-P12: Multimodal Modeling Poster

Track:
Speech and Language Processing [SL]

Location:
Poster Area 32

Presentation Time:
Tue, 5 May, 16:30 - 18:30

Session SLP-P12
SLP-P12.1: SMOOTHCLAP: SOFT-TARGET ENHANCED CONTRASTIVE LANGUAGE-AUDIO PRETRAINING FOR AFFECTIVE COMPUTING
Xin Jing, Jiadong Wang, Andreas Triantafyllopoulos, Maurice Gerczuk, Technical University of Munich, Germany; Shahin Amiriparian, Jun Luo, Huawei, Netherlands; Björn Schuller, Technical University of Munich, Germany
SLP-P12.2: Synaspot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy
Kewei Li, Yinan Zhong, Xiaotao Liang, Tianchi Dai, Shaofei Xue, Alibaba Group, China
SLP-P12.3: VOCALNET-M2: ADVANCING LOW-LATENCY SPOKEN LANGUAGE MODELING VIA INTEGRATED MULTI-CODEBOOK TOKENIZATION AND MULTI-TOKEN PREDICTION
Yuhao Wang, Ziyang Cheng, Heyang Liu, Shanghai Jiao Tong University, China; Ronghua Wu, Qunshan Gu, Ant Group, China; Yanfeng Wang, Yu Wang, Shanghai Jiao Tong University, China
SLP-P12.4: MITIGATING LANGUAGE PRIOR-INDUCED HALLUCINATIONS VIA BI-LEVEL CONTRASTIVE DECODING
Tianze Xia, Hongcheng Liu, Lina Yang, Yu Wang, Shanghai Jiao Tong University, China
SLP-P12.5: PROTOTYPE-GUIDED CROSS-MODAL CONTRASTIVE LEARNING FOR CONTINUAL AUDIO-VISUAL SOUND SEPARATION
Wanrong Ma, Hongyu Wen, Zijian Gao, Qisheng Xu, Kele Xu, National University of Defense Technology, China
SLP-P12.6: CONDITIONAL VARIATIONAL AUTOENCODER FOR GLOSS-FREE SIGN LANGUAGE TRANSLATION
Jiannan Mao, Gifu University, Japan; Chenchen Ding, National Institute of Information and Communications Technology, Japan; Tadahiro Matsumoto, Gifu University, Japan; Hideki Tanaka, Masao Utiyama, National Institute of Information and Communications Technology, Japan
SLP-P12.7: AFFECT-JIGSAW: INTEGRATING CORE AND PERIPHERAL EMOTIONS FOR HARMONIOUS FINE-GRAINED MULTIMODAL EMOTION RECOGNITION
Shihao Gao, Zixing Zhang, Zhiqiang Gao, Hongyu Chen, Hunan University, China; Jing Han, University of Cambridge, United Kingdom of Great Britain and Northern Ireland
SLP-P12.8: SESSION-LEVEL SPOKEN LANGUAGE ASSESSMENT WITH A MULTIMODAL FOUNDATION MODEL VIA MULTI-TARGET LEARNING
Hong-Yun Lin, Jhen-Ke Lin, Chung-Chun Wang, Hao-Chien Lu, Berlin Chen, National Taiwan Normal University, Taiwan
SLP-P12.9: SLOT FILLING AS A REASONING TASK FOR SPEECHLLMS
Kadri Hacioglu, Manjunath K. E., Andreas Stolcke, Uniphore, United States of America
SLP-P12.10: Selective Hub Fusion with Modality-Heterogeneous Experts for Multimodal Emotion Recognition
Huan Zhao, Ling Xiong, Kehan Wang, Hunan University, China