MP2.L306.2
MULTI-MODAL HIERARCHICAL ATTENTION-BASED DENSE VIDEO CAPTIONING
Hemalatha Munusamy, Anna University, Indian Institute of Technology Madras, India; Chandra Sekhar C, Indian Institute of Technology Madras, India
Session: MP2.L306: Video Captioning and Visual Question Answering Lecture
Track: Image and Video Analysis, Synthesis, and Retrieval
Location: Room 306
Presentation Time: Mon, 9 Oct, 16:48 - 17:06 Malaysia Time (UTC +8)
Session Chair: Puneet Goyal, Indian Institute of Technology Ropar
Session MP2.L306
MP2.L306.1: Video Question Answering using Clip-guided Visual-text Attention
Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, University of Nottingham Ningbo China, China; Xudong Jiang, Nanyang Technological University, Singapore
MP2.L306.2: MULTI-MODAL HIERARCHICAL ATTENTION-BASED DENSE VIDEO CAPTIONING
Hemalatha Munusamy, Anna University, Indian Institute of Technology Madras, India; Chandra Sekhar C, Indian Institute of Technology Madras, India
MP2.L306.3: INTERPRETABLE VISUAL QUESTION ANSWERING REFERRING TO OUTSIDE KNOWLEDGE
He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan
MP2.L306.4: A GLOBAL-LOCAL CONTRASTIVE LEARNING FRAMEWORK FOR VIDEO CAPTIONING
Qunyue Huang, Bin Fang, Xi Ai, Chongqing University, China
MP2.L306.5: INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION
Maria Parelli, ETH Zürich, Switzerland; Dimitrios Mallis, DeepLab, Greece; Markos Diomataris, ETH Zürich, Switzerland; Vassilis Pitsikalis, DeepLab, Greece