MMSP-L1.5
CAPTION UNIFICATION FOR MULTI-VIEW LIFELOGGING IMAGES BASED ON IN-CONTEXT LEARNING WITH HETEROGENEOUS SEMANTIC CONTENTS
Masaya Sato, Keisuke Maeda, Ren Togo, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan
Session: MMSP-L1: Multimodal Processing: Vision + Language 1 (Lecture)
Track: Multimedia Signal Processing
Location: Room 201
Presentation Time: Tue, 16 Apr, 17:50 - 18:10 (UTC +9)
Session Co-Chairs: Jin Zeng, Tongji University, Shanghai, China and Fernando Pereira, IST, Portugal
Session MMSP-L1
MMSP-L1.1: A Relation-Aware Heterogeneous Graph Transformer on Dynamic Fusion for Multimodal Classification Tasks
Yimo Ren, Jinfa Wang, Jie Liu, Peipei Liu, Hong Li, Hongsong Zhu, Limin Sun, Institute of Information Engineering, Chinese Academy of Sciences, China
MMSP-L1.2: Multi-Source Dynamic Interactive Network Collaborative Reasoning Image Captioning
Qiang Su, Zhixin Li, Guangxi Normal University, China
MMSP-L1.3: TOWARDS PRACTICAL AND EFFICIENT IMAGE-TO-SPEECH CAPTIONING WITH VISION-LANGUAGE PRE-TRAINING AND MULTI-MODAL TOKENS
Minsu Kim, Jeongsoo Choi, KAIST, Korea, Republic of; Soumi Maiti, Carnegie Mellon University, United States of America; Jeong Hun Yeo, KAIST, Korea, Republic of; Shinji Watanabe, Carnegie Mellon University, United States of America; Yong Man Ro, KAIST, Korea, Republic of
MMSP-L1.4: TEXTUAL TOKENS CLASSIFICATION FOR MULTI-MODAL ALIGNMENT IN VISION-LANGUAGE TRACKING
Zhongjie Mao, Yucheng Wang, Xi Chen, Jia Yan, Wuhan University, China
MMSP-L1.5: CAPTION UNIFICATION FOR MULTI-VIEW LIFELOGGING IMAGES BASED ON IN-CONTEXT LEARNING WITH HETEROGENEOUS SEMANTIC CONTENTS
Masaya Sato, Keisuke Maeda, Ren Togo, Takahiro Ogawa, Miki Haseyama, Hokkaido University, Japan
MMSP-L1.6: CONTROLCAP: CONTROLLABLE CAPTIONING VIA NO-FUSS LEXICON
Qiujie Xie, Qiming Feng, Yuejie Zhang, Rui Feng, Fudan University, China; Tao Zhang, Shanghai University of Finance and Economics, China; Shang Gao, Deakin University, Australia