IEEE ICASSP 2024 || Seoul, Korea || 14-19 April 2024

MMSP-L2.2

HOURGLASS-AVSR: DOWN-UP SAMPLING-BASED COMPUTATIONAL EFFICIENCY MODEL FOR AUDIO-VISUAL SPEECH RECOGNITION

Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang, Speech Lab of DAMO Academy, Alibaba Group, China

Session:

MMSP-L2: Audio-Visual Speech Processing Lecture

Location:

Room E1

Presentation Time:

Wed, 17 Apr, 08:40 - 09:00 (UTC +9)

Session Co-Chairs:

Li Liu, HKUST Guangzhou, China and Prasanta Ghosh, Indian Institute of Science (IISc), Bangalore


Session MMSP-L2

MMSP-L2.1: THE MULTIMODAL INFORMATION BASED SPEECH PROCESSING (MISP) 2023 CHALLENGE: AUDIO-VISUAL TARGET SPEAKER EXTRACTION

Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, University of Science and Technology of China, China; Chin-hui Lee, Georgia Institute of Technology, United States of America; Jingdong Chen, Northwestern Polytechnical University, China; Sabato Marco Siniscalchi, Kore University of Enna, Italy; Odette Scharenborg, Delft University of Technology, Netherlands; Zhong-Qiu Wang, Carnegie Mellon University, United States of America; Jia Pan, Jianqing Gao, iFlytek Research, China

MMSP-L2.2: HOURGLASS-AVSR: DOWN-UP SAMPLING-BASED COMPUTATIONAL EFFICIENCY MODEL FOR AUDIO-VISUAL SPEECH RECOGNITION

Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang, Speech Lab of DAMO Academy, Alibaba Group, China

MMSP-L2.3: TALKNCE: IMPROVING ACTIVE SPEAKER DETECTION WITH TALK-AWARE CONTRASTIVE LEARNING

Chaeyoung Jung, Suyeon Lee, Kihyun Nam, Kyeongha Rho, Korea Advanced Institute of Science and Technology, Korea, Republic of; You Jin Kim, Naver Cloud Corporation, Korea, Republic of; Youngjoon Jang, Joon Son Chung, Korea Advanced Institute of Science and Technology, Korea, Republic of

MMSP-L2.4: MLCA-AVSR: MULTI-LAYER CROSS ATTENTION FUSION BASED AUDIO-VISUAL SPEECH RECOGNITION

He Wang, Pengcheng Guo, Northwestern Polytechnical University, China; Pan Zhou, Li Auto, China; Lei Xie, Northwestern Polytechnical University, China

MMSP-L2.5: AUDIO-VISUAL SPEECH RECOGNITION IN-THE-WILD: MULTI-ANGLE VEHICLE CABIN CORPUS AND ATTENTION-BASED METHOD

Alexandr Axyonov, Dmitry Ryumin, Denis Ivanko, St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), Russian Federation; Alexey Kashevnik, ITMO University, Russian Federation; Alexey Karpov, St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), Russian Federation

MMSP-L2.6: GLMB 3D SPEAKER TRACKING WITH VIDEO-ASSISTED MULTI-CHANNEL AUDIO OPTIMIZATION FUNCTIONS

Xinyuan Qian, University of Science and Technology Beijing, China; Zexu Pan, National University of Singapore, Singapore; Qiquan Zhang, University of New South Wales, Australia; Kainan Chen, Eigenspace, China; Shoufeng Lin, Curtin University, Australia


©2026 IEEE – All rights reserved.

Last updated 11 April 2024.

