SLP-L25.1
Conformer is all you need for visual speech recognition
Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah, Olivier Siohan, Google, United States of America
Session: SLP-L25: Audio-visual speech/intent recognition Lecture
Track: Speech and Language Processing
Location: Room 102
Presentation Time: Fri, 19 Apr, 08:20 - 08:40 (UTC +9)
Session Co-Chairs: Albert Zeyer, AppTek GmbH and Dmitriy Serdyuk, Google
Session SLP-L25
SLP-L25.1: Conformer is all you need for visual speech recognition
Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah, Olivier Siohan, Google, United States of America
SLP-L25.2: LITEVSR: EFFICIENT VISUAL SPEECH RECOGNITION BY LEARNING FROM SPEECH REPRESENTATIONS OF UNLABELED DATA
Hendrik Laux, Emil Mededovic, University Hospital RWTH Aachen, Germany; Ahmed Hallawa, Lukas Martin, Arne Peine, Clinomic Medical GmbH, Germany; Anke Schmeink, RWTH Aachen University, Germany
SLP-L25.3: MULTILINGUAL AUDIO-VISUAL SPEECH RECOGNITION WITH HYBRID CTC/RNN-T FAST CONFORMER
Maxime Burchi, University of Würzburg, Germany; Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, NVIDIA, United States of America; Radu Timofte, University of Würzburg, Germany
SLP-L25.4: LCB-NET: LONG-CONTEXT BIASING FOR AUDIO-VISUAL SPEECH RECOGNITION
Fan Yu, Speech Lab of DAMO Academy, Alibaba Group, China; Haoxu Wang, Wuhan University, China; Xian Shi, Shiliang Zhang, Speech Lab of DAMO Academy, Alibaba Group, China
SLP-L25.5: VILAS: EXPLORING THE EFFECTS OF VISION AND LANGUAGE CONTEXT IN AUTOMATIC SPEECH RECOGNITION
Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu, Institute of Automation, Chinese Academy of Sciences, China
SLP-L25.6: SDIF-DA: A SHALLOW-TO-DEEP INTERACTION FRAMEWORK WITH DATA AUGMENTATION FOR MULTI-MODAL INTENT DETECTION
Shijue Huang, Harbin Institute of Technology (Shenzhen), China; Libo Qin, Central South University, China; Bingbing Wang, Geng Tu, Ruifeng Xu, Harbin Institute of Technology (Shenzhen), China