MP2.L306: Video Captioning and Visual Question Answering
Mon, 9 Oct, 16:30 - 18:00 Malaysia Time (UTC +8)
Location: Room 306
Session Type: Lecture
Session Chair: Puneet Goyal, Indian Institute of Technology Ropar
Track: Image and Video Analysis, Synthesis, and Retrieval
Mon, 9 Oct, 16:30 - 16:48 Malaysia Time (UTC +8)
 

MP2.L306.1: Video Question Answering using CLIP-guided Visual-text Attention

Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, University of Nottingham Ningbo China, China; Xudong Jiang, Nanyang Technological University, Singapore
Mon, 9 Oct, 16:48 - 17:06 Malaysia Time (UTC +8)
 

MP2.L306.2: MULTI-MODAL HIERARCHICAL ATTENTION-BASED DENSE VIDEO CAPTIONING

Hemalatha Munusamy, Anna University, Indian Institute of Technology Madras, India; Chandra Sekhar C, Indian Institute of Technology Madras, India
Mon, 9 Oct, 17:06 - 17:24 Malaysia Time (UTC +8)
 

MP2.L306.3: INTERPRETABLE VISUAL QUESTION ANSWERING REFERRING TO OUTSIDE KNOWLEDGE

He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan
Mon, 9 Oct, 17:24 - 17:42 Malaysia Time (UTC +8)
 

MP2.L306.4: A GLOBAL-LOCAL CONTRASTIVE LEARNING FRAMEWORK FOR VIDEO CAPTIONING

Qunyue Huang, Bin Fang, Xi Ai, Chongqing University, China
Mon, 9 Oct, 17:42 - 18:00 Malaysia Time (UTC +8)
 

MP2.L306.5: INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION

Maria Parelli, ETH Zürich, Switzerland; Dimitrios Mallis, DeepLab, Greece; Markos Diomataris, ETH Zürich, Switzerland; Vassilis Pitsikalis, DeepLab, Greece