MMSP-P23.2
SOUNDING HIGHLIGHTS: DUAL-PATHWAY AUDIO ENCODERS FOR AUDIO-VISUAL VIDEO HIGHLIGHT DETECTION
Seohyun Joo, Gwangju Institute of Science and Technology, Korea, Republic of; Yoori Oh, Seoul National University, Korea, Republic of
Session: MMSP-P23: Audio-Visual Video Parsing and Scene Understanding Poster
Track: Multimedia Signal Processing [MM]
Location: Poster Area 19
Presentation Time: Fri, 8 May, 09:00 - 11:00
Session MMSP-P23
MMSP-P23.1: Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing
Yaru Chen, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Ruohao Guo, Peking University, China; Liting Gao, Yang Xiang, Qingyu Luo, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Zhenbo Li, China Agricultural University, China; Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland
MMSP-P23.2: SOUNDING HIGHLIGHTS: DUAL-PATHWAY AUDIO ENCODERS FOR AUDIO-VISUAL VIDEO HIGHLIGHT DETECTION
Seohyun Joo, Gwangju Institute of Science and Technology, Korea, Republic of; Yoori Oh, Seoul National University, Korea, Republic of
MMSP-P23.3: CONSTRUCTING COMPOSITE FEATURES FOR INTERPRETABLE MUSIC-TAGGING
Chenhao Xue, University of Oxford, United Kingdom of Great Britain and Northern Ireland; Weitao Hu, Independent Researcher, United Kingdom of Great Britain and Northern Ireland; Joyraj Chakraborty, Zhijin Guo, Kang Li, University of Oxford, United Kingdom of Great Britain and Northern Ireland; Tianyu Shi, University of Toronto, Canada; Martin Reed, Nikolaos Thomos, University of Essex, United Kingdom of Great Britain and Northern Ireland
MMSP-P23.4: An End-to-End Multimodal System for Subtitle Recognition and Chinese-Japanese Translation in Short Dramas
Jing An, Beijing International Studies University, China; Haofei Chang, Renmin University of China, China; Rui-Yang Ju, Kyoto University, Japan; Jinhua Su, Renmin University of China / Simashuhui Ltd., China; Yanbing Bai, Renmin University of China, China; Xin Qu, Beijing International Studies University, China
MMSP-P23.5: Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs
Han Yin, Jung-Woo Choi, Korea Advanced Institute of Science and Technology, Korea, Republic of
MMSP-P23.6: ROVLM: REGION-AWARE OPTIMAL VISION-LANGUAGE ALIGNMENT FOR ZERO-SHOT RECOGNITION
Feng Guo, Zhongshu Chen, Yunqian Yu, Mengmeng Jing, Lin Zuo, University of Electronic Science and Technology of China, China
MMSP-P23.7: GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining
Shentong Mo, Carnegie Mellon University, United States of America; Zehua Chen, Jun Zhu, Tsinghua University, China
MMSP-P23.8: REALCOUNT: ROBUST OPEN-WORLD OBJECT COUNTING VIA DUPLEX CONTRASTIVE LEARNING
Ziqiang Shi, Rujie Liu, Fujitsu Research & Development Center Co.,LTD., China; Jun Takahashi, Shan Jiang, Fujitsu Limited, Japan
MMSP-P23.9: AVO-65: A LARGE-SCALE HIERARCHICAL AUDIO-VISUAL OBJECT DATASET
Zehao Yao, Guanghui Zhang, Lei Wang, Dongchen Zhu, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, China
MMSP-P23.10: HARMONET: MUSIC GROUNDING BY SHORT VIDEO VIA HARMONIC RESAMPLE AND DYNAMIC SPARSE ALIGNMENT
Yaomin Shen, Nanchang Research Institute, Zhejiang University, China; Wei Fan, Independent Researcher, China; Haichuan Hu, Alibaba Cloud, China; Xinqi Liu, The University of Hong Kong, Hong Kong; Min Yang, Nanchang Research Institute, Zhejiang University, China; Rui Jia, East China Normal University, China; Junbiao Cai, Independent Researcher, China