MMSP-L1: Multimodal Processing: Vision + Language 1
Tue, 16 Apr, 16:30 - 18:30 (UTC +9)
Location: Room 201
Session Type: Lecture
Session Co-Chairs: Jin Zeng, Tongji University, Shanghai, China and Fernando Pereira, IST, Portugal
Track: Multimedia Signal Processing
Tue, 16 Apr, 16:30 - 16:50 (UTC +9)
 

MMSP-L1.1: A Relation-Aware Heterogeneous Graph Transformer on Dynamic Fusion for Multimodal Classification Tasks

Yimo Ren, Jinfa Wang, Jie Liu, Peipei Liu, Hong Li, Hongsong Zhu, Limin Sun, Institute of Information Engineering, Chinese Academy of Sciences, China
Tue, 16 Apr, 16:50 - 17:10 (UTC +9)
 

MMSP-L1.2: Multi-Source Dynamic Interactive Network Collaborative Reasoning Image Captioning

Qiang Su, Zhixin Li, Guangxi Normal University, China
Tue, 16 Apr, 17:10 - 17:30 (UTC +9)
 

MMSP-L1.3: Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens

Minsu Kim, Jeongsoo Choi, KAIST, Korea, Republic of; Soumi Maiti, Carnegie Mellon University, United States of America; Jeong Hun Yeo, KAIST, Korea, Republic of; Shinji Watanabe, Carnegie Mellon University, United States of America; Yong Man Ro, KAIST, Korea, Republic of
Tue, 16 Apr, 17:30 - 17:50 (UTC +9)
 

MMSP-L1.4: Textual Tokens Classification for Multi-Modal Alignment in Vision-Language Tracking

Zhongjie Mao, Yucheng Wang, Xi Chen, Jia Yan, Wuhan University, China
Tue, 16 Apr, 17:50 - 18:10 (UTC +9)
 

MMSP-L1.5: Caption Unification for Multi-View Lifelogging Images Based on In-Context Learning with Heterogeneous Semantic Contents

Masaya Sato, Keisuke Maeda, Ren Togo, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan
Tue, 16 Apr, 18:10 - 18:30 (UTC +9)
 

MMSP-L1.6: ControlCap: Controllable Captioning via No-Fuss Lexicon

Qiujie Xie, Qiming Feng, Yuejie Zhang, Rui Feng, Fudan University, China; Tao Zhang, Shanghai University of Finance and Economics, China; Shang Gao, Deakin University, Australia