SLP-P42.4

TRIVISIONTALK: MANDARIN LIP-TO-SPEECH SYNTHESIS WITH MULTIPLE VISUAL PATTERN INFORMATION AND MULTI-SCALE HYBRID ATTENTION

Hao Meng, Tianjin University, China; Qiang Fang, CASS, China; Minghao Guo, Jianguo Wei, Tianjin University, China

Session:
SLP-P42: Multimodal Understanding & Generation Poster

Track:
Speech and Language Processing [SL]

Location:
Poster Area 27

Presentation Time:
Thu, 7 May, 16:30 - 18:30

Session SLP-P42
SLP-P42.1: VIVIDVOICE: A UNIFIED FRAMEWORK FOR SCENE-AWARE VISUALLY-DRIVEN SPEECH SYNTHESIS
Chengyuan Ma, Jiawei Jin, Tsinghua University, China; Ruijie Xiong, Chunxiang Jin, Canxiang Yan, Ant Group, China; Wenming Yang, Tsinghua University, China
SLP-P42.2: CS3-BENCH: EVALUATING AND ENHANCING SPEECH-TO-SPEECH LLMS FOR MANDARIN-ENGLISH CODE-SWITCHING
Heyang Liu, Yuhao Wang, Shanghai Jiao Tong University, Ant Group, China; Ziyang Cheng, Shanghai Jiao Tong University, China; Ronghua Wu, Qunshan Gu, Ant Group, China; Yanfeng Wang, Yu Wang, Shanghai Jiao Tong University, China
SLP-P42.3: LipSody: Lip-to-Speech Synthesis with Enhanced Prosody Consistency
Jaejun Lee, Music and Audio Research Group / Seoul National University, Korea, Republic of; Yoori Oh, Seoul National University, Korea, Republic of; Kyogu Lee, Music and Audio Research Group / Seoul National University, Korea, Republic of
SLP-P42.4: TRIVISIONTALK: MANDARIN LIP-TO-SPEECH SYNTHESIS WITH MULTIPLE VISUAL PATTERN INFORMATION AND MULTI-SCALE HYBRID ATTENTION
Hao Meng, Tianjin University, China; Qiang Fang, CASS, China; Minghao Guo, Jianguo Wei, Tianjin University, China
SLP-P42.5: FROM HYPE TO INSIGHT: RETHINKING LARGE LANGUAGE MODEL INTEGRATION IN VISUAL SPEECH RECOGNITION
Rishabh Jain, Naomi Harte, Trinity College Dublin, Ireland
SLP-P42.6: No Word Left Behind: Mitigating Prefix Bias in Open-Vocabulary Keyword Spotting
Yi Liu, University of California San Diego, United States of America; Chuan-che Huang, Xiao Quan, Bose Corporation, United States of America
SLP-P42.7: STANCEMSA: A MULTIMODAL SELF-ATTENTION FRAMEWORK FOR ACCOUNT-LEVEL IMPLICIT STANCE DETECTION IN SHORT VIDEOS
Hao Hu, Keyi Song, Xiaoyu Wang, Feiyang Li, Department of Cryptogram Engineering, Information Engineering University, China; Jian Qin, Baihang Liu, Institute of Information Engineering Chinese Academy of Sciences, China; Hongwei Zhou, Department of Cryptogram Engineering, Information Engineering University, China
SLP-P42.8: DATA-BRIDGE: A MULTI-AGENT SYSTEM FOR CODE-BASED MULTIMODAL SCHEMA ALIGNMENT
Siyuan Zhou, Beijing University of Posts and Telecommunications, China; Zijun Dou, Tsinghua University, China; Fangxiang Feng, Beijing University of Posts and Telecommunications, China
SLP-P42.9: DYNAMIC MULTI-EXPERT PROJECTORS WITH STABILIZED ROUTING FOR MULTILINGUAL SPEECH RECOGNITION
Isha Pandey, IIT Bombay, India; Ashish Mittal, IBM Research/IIT Bombay, India; Vartul Bahuguna, Ganesh Ramakrishnan, IIT Bombay, India
SLP-P42.10: ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
Jianan Pan, Yuanming Zhang, Kejie Huang, Zhejiang University, China