SLP-P25.3
TICL: TEXT-EMBEDDING KNN FOR SPEECH IN-CONTEXT LEARNING UNLOCKS SPEECH RECOGNITION ABILITIES OF LARGE MULTIMODAL MODELS
Haolong Zheng, Yekaterina Yegorova, Mark Hasegawa-Johnson, University of Illinois Urbana Champaign, United States of America
Session: SLP-P25: Audio-Visual Speech Recognition Poster
Track: Speech and Language Processing [SL]
Location: Poster Area 27
Presentation Time: Wed, 6 May, 16:30 - 18:30
Session SLP-P25
SLP-P25.1: LEVERAGING AUDIO-VISUAL DATA TO REDUCE THE MULTILINGUAL GAP IN SELF-SUPERVISED SPEECH MODELS
María Andrea Cruz Blandón, Tampere University, Finland; Zakaria Aldeneh, Jie Chi, Maureen de Seyssel, Apple, United States of America
SLP-P25.2: LEND A HAND: SEMI TRAINING-FREE CUED SPEECH RECOGNITION VIA MLLM-DRIVEN HAND MODELING FOR BARRIER-FREE COMMUNICATION
Guanjie Huang, Danny H.K. Tsang, The Hong Kong University of Science and Technology (Guangzhou), China; Xiao-Ping Zhang, Tsinghua University, China; Li Liu, The Hong Kong University of Science and Technology (Guangzhou), China
SLP-P25.3: TICL: TEXT-EMBEDDING KNN FOR SPEECH IN-CONTEXT LEARNING UNLOCKS SPEECH RECOGNITION ABILITIES OF LARGE MULTIMODAL MODELS
Haolong Zheng, Yekaterina Yegorova, Mark Hasegawa-Johnson, University of Illinois Urbana Champaign, United States of America
SLP-P25.4: THE CURIOUS CASE OF VISUAL GROUNDING: DIFFERENT EFFECTS FOR SPEECH- AND TEXT-BASED LANGUAGE ENCODERS
Adrian Sauter, Willem Zuidema, Marianne de Heer Kloots, University of Amsterdam, Netherlands
SLP-P25.5: MULTILINGUAL SUPERVISED PRETRAINING WITH LM-ASSISTED DECODING FOR VISUAL SPEECH RECOGNITION
Mengyang Yu, Yue Zhao, Minzu University of China, China; Haizhou Li, The Chinese University of Hong Kong, Shenzhen, China
SLP-P25.6: AISHELL6-WHISPER: A CHINESE MANDARIN AUDIO-VISUAL WHISPER SPEECH DATASET WITH SPEECH RECOGNITION BASELINES
Cancan Li, Fei Su, Juan Liu, Wuhan University, China; Hui Bu, Beijing AISHELL Technology Co., Ltd., China; Yulong Wan, Hongbin Suo, OPPO, China; Ming Li, Duke Kunshan University, China
SLP-P25.7: PURIFICATION BEFORE FUSION: TOWARD MASK-FREE SPEECH ENHANCEMENT FOR ROBUST AUDIO-VISUAL SPEECH RECOGNITION
Linzhi Wu, University of Electronic Science and Technology of China, China; Xingyu Zhang, Academy of Military Sciences, China; Hao Yuan, Peking University, China; Yakun Zhang, Changyan Zheng, Liang Xie, Academy of Military Sciences, China; Tiejun Liu, University of Electronic Science and Technology of China, China; Erwei Yin, Academy of Military Sciences, China
SLP-P25.8: NOISE-ROBUST AV-ASR USING VISUAL FEATURES BOTH IN THE WHISPER ENCODER AND DECODER
Zhengyang Li, Thomas Graave, Björn Möller, Zehang Wu, Matthias Franz, Tim Fingscheidt, Technische Universität Braunschweig, Germany
SLP-P25.9: MITIGATING ATTENTION SINKS AND MASSIVE ACTIVATIONS IN AUDIO-VISUAL SPEECH RECOGNITION WITH LLMS
Anand ., The University of British Columbia, Canada; Umberto Cappellazzo, Stavros Petridis, Maja Pantic, Imperial College London, United Kingdom of Great Britain and Northern Ireland
SLP-P25.10: CROSS-MODAL BOTTLENECK FUSION FOR NOISE ROBUST AUDIO-VISUAL SPEECH RECOGNITION
Seaone Ok, Min Jun Choi, Eungbeom Kim, Seungu Han, Kyogu Lee, Seoul National University, Korea, Republic of