MMSP-P20: Efficient Multimodal Large Language Models and Evaluation
Poster
Thu, 7 May, 16:30 - 18:30
Location: Poster Area 19
Session Type: Poster
Track: Multimedia Signal Processing [MM]

MMSP-P20.1: iMathBench: Is Your Multi-modal Large Language Model Ready to Solve Mathematical Problems Embedded in Images?

Junhao Guo, Xinyi Jiang, Guoming Wang, Zhejiang University, China; Rongxing Lu, Queen's University, Canada; Siliang Tang, Zhejiang University, China

MMSP-P20.2: Trajectory-Enhanced Camera Motion Understanding for Multimodal Large Language Models

Yuanxin Liu, Sida Li, Kun Ouyang, Shicheng Li, Linli Yao, Xu Sun, Peking University, China; Weike Jin, Huawei Technologies Co., Ltd, China

MMSP-P20.3: PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models

Yingen Liu, Fan Wu, Ruihui Li, Zhuo Tang, Kenli Li, Hunan University, China

MMSP-P20.4: Enrich Visual Features by Holistic Sampling and Hierarchical Condensing in Multimodal Large Language Models

Yuting Bai, Harbin Institute of Technology, China; Suiwu Bai, Jianli Ran, B-AI Lab, China; Tonghua Su, Harbin Institute of Technology, China; Zixing Bai, Fudan University, China

MMSP-P20.6: M2FNET: Multi-Level Modality-Fused Network for Robust Fingerprint and Finger Vein Recognition

Wenyang Miao, Xionghan Zhao, Hengyi Ren, Xing Li, Jinting Ren, Nanjing Forestry University, China

MMSP-P20.7: Chain-of-Caption: Training-Free Improvement of Multimodal Large Language Model on Referring Expression Comprehension

Yik Lung Pang, Changjae Oh, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland

MMSP-P20.8: LaPrune: Layout-Aware Pruning for Efficient Multimodal Large Language Models

Hao Wu, Ke Lu, Xiuyuan Zhu, Yuqiu Li, Jian Xue, University of Chinese Academy of Sciences, China; Yi Liu, State Key Laboratory of Communication Content Cognition, Beijing, China

MMSP-P20.9: Seeing Is Believing: Comprehensive Self-Reflective Evaluation System for Large Multi-Modal Models

Guocheng Hu, Chaoqun Zheng, Hongjiao Guan, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences); Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, China; Hui Cui, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences); Shandong Provincial Key Laboratory of Industrial Network and Information System Security, Shandong Fundamental Research Center for Computer Science, China; Shiwei Wu, Evay Info Co., Ltd., China; Wenpeng Lu, Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences); Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, China

MMSP-P20.10: SVCF: Enabling Zero-Shot Correction of Reasoning Steps in Multi-Modal Large Language Models

Boyang Jiang, University of Electronic Science and Technology of China, China; Huang Tianxi, School of Humanities and General Education, Chengdu Textile College, China; Yu Yang, Yue Zhang, Guiduo Duan, Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China, China; Tao He, University of Electronic Science and Technology of China, China