IVMSP-L1.1
JM-CLIP: A Joint Modal Similarity Contrastive Learning Model for Video-Text Retrieval
Mingyuan Ge, Yewen Li, Honghao Wu, Mingyong Li, Chongqing Normal University, China
Session:
IVMSP-L1: Vision and Language Lecture
Track:
Image, Video, and Multidimensional Signal Processing
Location:
Room 103
Presentation Time:
Tue, 16 Apr, 13:10 - 13:30 (UTC +9)
Session Chair:
Dimitrios Dimitriadis, Amazon
Session IVMSP-L1
IVMSP-L1.1: JM-CLIP: A Joint Modal Similarity Contrastive Learning Model for Video-Text Retrieval
Mingyuan Ge, Yewen Li, Honghao Wu, Mingyong Li, Chongqing Normal University, China
IVMSP-L1.2: LANGUAGE-FREE COMPOSITIONAL ACTION GENERATION VIA DECOUPLING REFINEMENT
Xiao Liu, Tsinghua University, China; Guangyi Chen, Carnegie Mellon University, United States of America; Yansong Tang, Tsinghua University, China; Guangrun Wang, University of Oxford, United Kingdom of Great Britain and Northern Ireland; Xiao-Ping Zhang, Tsinghua University, China; Ser-Nam Lim, University of Central Florida, United States of America
IVMSP-L1.3: DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation
Ting Liu, Yue Hu, Wansen Wu, National University of Defense Technology, China; Youkai Wang, National University of Defense Technology; Hunan Institute of Advanced Technology, China; Kai Xu, National University of Defense Technology, China; Quanjun Yin, National University of Defense Technology; Hunan Institute of Advanced Technology, China
IVMSP-L1.4: M3SUM: A NOVEL UNSUPERVISED LANGUAGE-GUIDED VIDEO SUMMARIZATION
Hongru Wang, The Chinese University of Hong Kong, Hong Kong; Baohang Zhou, Zhengkun Zhang, Nankai University, China; Yiming Du, David Ho, Kam-Fai Wong, The Chinese University of Hong Kong, Hong Kong
IVMSP-L1.5: WAVER: WRITING-STYLE AGNOSTIC TEXT-VIDEO RETRIEVAL VIA DISTILLING VISION-LANGUAGE MODELS THROUGH OPEN-VOCABULARY KNOWLEDGE
Huy Le, International University - Vietnam National University, Viet Nam; Tung Kieu, RMIT University, Viet Nam; Anh Nguyen, University of Liverpool, United Kingdom of Great Britain and Northern Ireland; Ngan Le, University of Arkansas, United States of America
IVMSP-L1.6: MTIDNET: A MULTIMODAL TEMPORAL INTEREST DETECTION NETWORK FOR VIDEO SUMMARIZATION
Xiaoyan Tian, Ye Jin, Zhao Zhang, Peng Liu, Xianglong Tang, Harbin Institute of Technology, China