MP-V1.V5.3

CONTEXT-AWARE HIERARCHICAL TRANSFORMER FOR FINE-GRAINED VIDEO-TEXT RETRIEVAL

Mingliang Chen, Yurui Ren, Ge Li, Peking University Shenzhen Graduate School, China; Weimin Zhang, AVS Industry Alliance, China

Session:
Machine Learning for Image & Video Analysis, Synthesis, and Retrieval
Virtual Poster

Track:
Applications of Machine Learning

Location:
Gather.Town 5

Presentation Time:
Mon, 3 Oct, 21:00 - 22:00 China Standard Time (UTC +8)
Mon, 3 Oct, 15:00 - 16:00 Central European Summer Time (UTC +2)
Mon, 3 Oct, 13:00 - 14:00 UTC
Mon, 3 Oct, 09:00 - 10:00 Eastern Daylight Time (UTC -4)

Session Co-Chairs:
Jean-Christophe Pesquet, CentraleSupélec; Andrea Cavallaro, Queen Mary University of London; Rebecca Willett, University of Chicago
Session MP-V1.V5
MP-V1.V5.1: Conditional RGB-T Fusion for Effective Crowd Counting
Esha Pahwa, Sanjeet Kapadia, Achleshwar Luthra, Shreyas Sheeranali, Birla Institute of Technology and Science, Pilani, India
MP-V1.V5.2: LEARNING FROM SYNTHETIC DATA FOR CROWD INSTANCE SEGMENTATION IN THE WILD
Yue Wu, Yuan Yuan, Qi Wang, Northwestern Polytechnical University, China
MP-V1.V5.3: CONTEXT-AWARE HIERARCHICAL TRANSFORMER FOR FINE-GRAINED VIDEO-TEXT RETRIEVAL
Mingliang Chen, Yurui Ren, Ge Li, Peking University Shenzhen Graduate School, China; Weimin Zhang, AVS Industry Alliance, China
MP-V1.V5.4: 3DCNN-BASED PALPATION LOCALIZATION WITH TEMPORAL ATTENTION MODULE
Guanhao Huang, Hong Lu, Jingjing Luo, Fudan University, China; Xing Zhu, Jihua Laboratory, China
MP-V1.V5.5: SELF-SUPERVISED DOMAIN ADAPTATION IN CROWD COUNTING
Pha Nguyen, Thanh-Dat Truong, Miaoqing Huang, Yi Liang, Ngan Le, Khoa Luu, University of Arkansas, United States of America
MP-V1.V5.6: REVISITING SPATIAL INDUCTIVE BIAS WITH MLP-LIKE MODEL
Akihiro Imamura, Nana Arizumi, DENSO CORPORATION, Japan
MP-V1.V5.7: 3D HUMAN MOTION GENERATION FROM THE TEXT VIA GESTURE ACTION CLASSIFICATION AND THE AUTOREGRESSIVE MODEL
Gwantae Kim, Youngsuk Ryu, Junyeop Lee, Hanseok Ko, Korea University, Korea, Republic of; David Han, Drexel University, United States of America; Jeongmin Bae, DMLab, Korea, Republic of
MP-V1.V5.8: DIFAI: Diverse Facial Inpainting using StyleGAN Inversion
Dongsik Yoon, Jeong-gi Kwak, Yuanming Li, Hanseok Ko, Korea University, Korea, Republic of; David Han, Drexel University, United States of America
MP-V1.V5.9: DISTILLING DETR-LIKE DETECTORS WITH INSTANCE-AWARE FEATURE
Honglie Wang, Shouqian Sun, Zhejiang University, China; Jian Xu, Tsinghua University, China
MP-V1.V5.10: DISPENSE MODE FOR INFERENCE TO ACCELERATE BRANCHYNET
Zhiwei Liang, Yuezhi Zhou, Tsinghua University, China
MP-V1.V5.11: Partition and Reunion: A Viewpoint-Aware Loss for Vehicle Re-identification
Haobo Chen, Yang Liu, Yang Huang, Hao Sheng, Beihang University, China; Wei Ke, Macao Polytechnic Institute, China
MP-V1.V5.12: GAUSSIAN KERNEL-BASED CROSS MODAL NETWORK FOR SPATIO-TEMPORAL VIDEO GROUNDING
Zeyu Xiong, Zhou Pan, Huazhong University of Science and Technology, China; Daizong Liu, Peking University, China
MP-V1.V5.13: SYGNET: A SVD-YOLO BASED GHOSTNET FOR REAL-TIME DRIVING SCENE PARSING
Hewei Wang, Bolun Zhu, Yijie Li, Kaiwen Gong, Ziyuan Wen, Shaofan Wang, Beijing University of Technology, China; Soumyabrata Dev, University College Dublin, Ireland
MP-V1.V5.14: VIDEO-GROUNDED DIALOGUES WITH JOINT VIDEO AND IMAGE TRAINING
Hangyu Zhang, Yingming Li, Zhejiang University, China; Zhongfei Zhang, Binghamton University, United States of America
MP-V1.V5.15: Exploring Spatial Diversity for Region-Based Active Learning
Lile Cai, Xun Xu, Lining Zhang, Chuan-Sheng Foo, Institute for Infocomm Research, Singapore