IEEE ICIP 2023 || Kuala Lumpur, Malaysia || 8-11 October 2023

MP2.L306.2

MULTI-MODAL HIERARCHICAL ATTENTION-BASED DENSE VIDEO CAPTIONING

Hemalatha Munusamy, Anna University, Indian Institute of Technology Madras, India; Chandra Sekhar C, Indian Institute of Technology Madras, India

Session: MP2.L306: Video Captioning and Visual Question Answering Lecture

Location: Room 306

Presentation Time: Mon, 9 Oct, 16:48 - 17:06 Malaysia Time (UTC +8)

Session Chair: Puneet Goyal, Indian Institute of Technology Ropar

Session MP2.L306

MP2.L306.1: Video Question Answering using Clip-guided Visual-text Attention

Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, University of Nottingham Ningbo China, China; Xudong Jiang, Nanyang Technological University, Singapore

MP2.L306.2: MULTI-MODAL HIERARCHICAL ATTENTION-BASED DENSE VIDEO CAPTIONING

Hemalatha Munusamy, Anna University, Indian Institute of Technology Madras, India; Chandra Sekhar C, Indian Institute of Technology Madras, India

MP2.L306.3: INTERPRETABLE VISUAL QUESTION ANSWERING REFERRING TO OUTSIDE KNOWLEDGE

He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan

MP2.L306.4: A GLOBAL-LOCAL CONTRASTIVE LEARNING FRAMEWORK FOR VIDEO CAPTIONING

Qunyue Huang, Bin Fang, Xi Ai, Chongqing University, China

MP2.L306.5: INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION

Maria Parelli, ETH Zürich, Switzerland; Dimitrios Mallis, DeepLab, Greece; Markos Diomataris, ETH Zürich, Switzerland; Vassilis Pitsikalis, DeepLab, Greece

Last updated 07 October 2023.
