IVMSP-L1: Vision and Language
Tue, 16 Apr, 13:10 - 15:10 (UTC +9)
Location: Room 103
Session Type: Lecture
Session Chair: Dimitrios Dimitriadis, Amazon
Track: Image, Video, and Multidimensional Signal Processing
Tue, 16 Apr, 13:10 - 13:30 (UTC +9)
IVMSP-L1.1: JM-CLIP: A Joint Modal Similarity Contrastive Learning Model for Video-Text Retrieval
Tue, 16 Apr, 13:30 - 13:50 (UTC +9)
IVMSP-L1.2: Language-Free Compositional Action Generation via Decoupling Refinement
Tue, 16 Apr, 13:50 - 14:10 (UTC +9)
IVMSP-L1.3: DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation
Tue, 16 Apr, 14:10 - 14:30 (UTC +9)
IVMSP-L1.4: M3SUM: A Novel Unsupervised Language-Guided Video Summarization
Tue, 16 Apr, 14:30 - 14:50 (UTC +9)
IVMSP-L1.5: WAVER: Writing-Style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
Tue, 16 Apr, 14:50 - 15:10 (UTC +9)