IEEE ICIP 2023 || Kuala Lumpur, Malaysia || 8-11 October 2023

MP2.L306: Video Captioning and Visual Question Answering

Mon, 9 Oct, 16:30 - 18:00 Malaysia Time (UTC +8)

Location: Room 306

Session Type: Lecture

Session Chair: Puneet Goyal, Indian Institute of Technology Ropar

Track: Image and Video Analysis, Synthesis, and Retrieval

Mon, 9 Oct, 16:30 - 16:48 Malaysia Time (UTC +8)

MP2.L306.1: Video Question Answering using CLIP-guided Visual-text Attention

Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, University of Nottingham Ningbo China, China; Xudong Jiang, Nanyang Technological University, Singapore

Mon, 9 Oct, 16:48 - 17:06 Malaysia Time (UTC +8)

MP2.L306.2: Multi-modal Hierarchical Attention-based Dense Video Captioning

Hemalatha Munusamy, Anna University, Indian Institute of Technology Madras, India; Chandra Sekhar C, Indian Institute of Technology Madras, India

Mon, 9 Oct, 17:06 - 17:24 Malaysia Time (UTC +8)

MP2.L306.3: Interpretable Visual Question Answering Referring to Outside Knowledge

He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan

Mon, 9 Oct, 17:24 - 17:42 Malaysia Time (UTC +8)

MP2.L306.4: A Global-local Contrastive Learning Framework for Video Captioning

Qunyue Huang, Bin Fang, Xi Ai, Chongqing University, China

Mon, 9 Oct, 17:42 - 18:00 Malaysia Time (UTC +8)

MP2.L306.5: Interpretable Visual Question Answering via Reasoning Supervision

Maria Parelli, ETH Zürich, Switzerland; Dimitrios Mallis, DeepLab, Greece; Markos Diomataris, ETH Zürich, Switzerland; Vassilis Pitsikalis, DeepLab, Greece