MMSP-P16: Visual Grounding and Open-Vocabulary Segmentation
Thu, 7 May, 09:00 - 11:00
Location: Poster Area 21
Session Type: Poster
Track: Multimedia Signal Processing [MM]

MMSP-P16.1: Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning

Bob Zhang, Xiaohongshu Inc., China; Haoran Li, University of Science and Technology of China, China; Tao Zhang, Wuhan University, China; Jianan Li, Technical University of Munich, Germany; Cilin Yan, Xikai Liu, Jiayin Cai, Xiaohongshu Inc., China; Yanbin Hao, Hefei University of Technology, China

MMSP-P16.2: COMPOSED VISUAL GROUNDING IN REMOTE SENSING IMAGES

Yuxi Sun, Sen Jia, Meng Xu, Shenzhen University, China; Baoquan Zhang, Harbin Institute of Technology, China; Jian Kang, Soochow University, China

MMSP-P16.3: Refining Open-Vocabulary Semantic Segmentation via Regional Semantics and Visual Prototypes

Aijing Yu, Institute of Information Engineering, Chinese Academy of Sciences, China; Zhengbo Wang, Lijun Sheng, University of Science and Technology of China, China; Jian Liang, NLPR & MAIS, Institute of Automation, Chinese Academy of Sciences, China; Xiaoyu Zhang, Institute of Information Engineering, Chinese Academy of Sciences, China

MMSP-P16.4: AUGMENTING IMAGE LLMS FOR DIVERSE VIDEO GROUNDING TASKS WITHOUT TRAINING

Mohan Chen, Chunguang Du, Qingqiu Li, Yuejie Zhang, Rui Feng, Fudan University, China; Tao Zhang, Shanghai University of Finance and Economics, China; Shang Gao, Deakin University, Australia

MMSP-P16.5: RGSC: Retrieve and then Generate Image-text Pairs from Semantic Concepts for Unsupervised Vision-Language Pre-training

Zhaopan Xu, Harbin Institute of Technology, China; Wangbo Zhao, National University of Singapore, Singapore; Sijie Ji, California Institute of Technology, United States of America; Panpan Zhang, National University of Singapore, Singapore; Kaipeng Zhang, Shanghai Artificial Intelligence Laboratory, China; Hongxun Yao, Harbin Institute of Technology, China

MMSP-P16.6: Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval

Liwei Liao, Peking University, China; Xufeng Li, City University of Hong Kong, China; Xiaoyun Zheng, Boning Liu, Pengcheng Laboratory, China; Feng Gao, Peking University, China; Ronggang Wang, Peking University, China

MMSP-P16.7: ScaleMamba: Multi-scale Context Fusion for Training-Free Open-Vocabulary Remote Sensing Segmentation

Zhicai Huang, Mingming Chen, Xiamen Huaxia University, China; Mingqiang Huang, Xiamen University of Technology, China

MMSP-P16.8: OVID: Text-Guided Open-Vocabulary Dense Object Counting and Localization

Ma Hao-Yuan, Li Zhang, Qiang Minjie, Soochow University, China

MMSP-P16.9: OPENHIER: AN OPEN-VOCABULARY HIERARCHICAL IMAGE CLASSIFICATION FRAMEWORK

Pu Yang, Dongjing Miao, Harbin Institute of Technology, China

MMSP-P16.10: DISTRIBUTION-AWARE DATA CURATION FOR SEMANTIC SEGMENTATION VIA MIXTURE OF VMFS

Zhi Hu, Kexin Yang, Aravindkumar Vijayalingam, Zhonghuan Dai, Klass Engineering and Solutions, Singapore