MMSP-L1: Multimodal Processing: Vision + Language 1
Tue, 16 Apr, 16:30 - 18:30 (UTC +9)
Location: Room 201
Session Type: Lecture
Session Co-Chairs: Jin Zeng, Tongji University, Shanghai, China and Fernando Pereira, IST, Portugal
Track: Multimedia Signal Processing
Tue, 16 Apr, 16:30 - 16:50 (UTC +9)
MMSP-L1.1: A Relation-Aware Heterogeneous Graph Transformer on Dynamic Fusion for Multimodal Classification Tasks
Tue, 16 Apr, 16:50 - 17:10 (UTC +9)
MMSP-L1.2: Multi-Source Dynamic Interactive Network Collaborative Reasoning Image Captioning
Tue, 16 Apr, 17:10 - 17:30 (UTC +9)
MMSP-L1.3: Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens
Tue, 16 Apr, 17:30 - 17:50 (UTC +9)
MMSP-L1.4: Textual Tokens Classification for Multi-Modal Alignment in Vision-Language Tracking
Tue, 16 Apr, 17:50 - 18:10 (UTC +9)
MMSP-L1.5: Caption Unification for Multi-View Lifelogging Images Based on In-Context Learning with Heterogeneous Semantic Contents
Tue, 16 Apr, 18:10 - 18:30 (UTC +9)