AASP-P16.2
HIERARCHICAL ACTIVITY RECOGNITION AND CAPTIONING FROM LONG-FORM AUDIO
Peng Zhang, Qingyu Luo, Philip Jackson, Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland
Session: AASP-P16: Audio Captioning, Retrieval, and Understanding Poster
Track: Audio and Acoustic Signal Processing [AA]
Location: Poster Area 25
Presentation Time: Thu, 7 May, 09:00 - 11:00
Session AASP-P16
AASP-P16.1: SEGMENTWISE PRUNING IN AUDIO-LANGUAGE MODELS
Marcel Gibier, Inria, France; Raphael Duroselle, AMIAD, France; Pierre Serrano, Olivier Boeffard, Inria, France; Jean-François Bonastre, AMIAD, France
AASP-P16.2: HIERARCHICAL ACTIVITY RECOGNITION AND CAPTIONING FROM LONG-FORM AUDIO
Peng Zhang, Qingyu Luo, Philip Jackson, Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland
AASP-P16.3: FROM CONTRAST TO COMMONALITY: AUDIO COMMONALITY CAPTIONING FOR ENHANCED AUDIO-TEXT CROSS-MODAL UNDERSTANDING IN MULTIMODAL LLMS
Yuhang Jia, Xu Zhang, Yujie Guo, Yang Chen, Shiwan Zhao, nankai.edu.cn, China
AASP-P16.4: IMPROVING AUDIO QUESTION ANSWERING WITH VARIATIONAL INFERENCE
Haolin Chen, Idiap Research Institute, Switzerland
AASP-P16.5: ONE MODEL–THREE TASKS: DISCOVERING A SHARED WINNING TICKET FOR LOW-COMPLEXITY AUDIO INTELLIGENCE
Maxim Surkov, ITMO University, Russian Federation
AASP-P16.6: GAUSSIAN LOCALITY PRIOR FOR CONTRAST–RECONSTRUCTION LEARNING: STATE–SPACE MODEL-BASED TIME–SERIES ANOMALY DETECTION
Yadong Niu, MiLM Plus, Xiaomi Inc, Beijing, China; Tianzi Wang, The Chinese University of Hong Kong, Hong Kong, China; Heinrich Dinkel, Xingwei Sun, MiLM Plus, Xiaomi Inc, Beijing, China; Jiahao Zhou, Beijing University of Posts and Telecommunications, Beijing, China; Gang Li, Jizhong Liu, Junbo Zhang, Jian Luan, MiLM Plus, Xiaomi Inc, Beijing, China
AASP-P16.7: CASTELLA: LONG AUDIO DATASET WITH CAPTIONS AND TEMPORAL BOUNDARIES
Hokuto Munakata, LY Corporation, Japan; Takehiro Imamura, Nagoya University, Japan; Taichi Nishimura, Tatsuya Komatsu, LY Corporation, Japan
AASP-P16.8: Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
Runyan Yang, Yuke Si, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang, China Mobile, China
AASP-P16.9: AUDIOSETCAPS: AN ENRICHED AUDIO-CAPTION DATASET USING AUTOMATED GENERATION PIPELINE WITH LARGE AUDIO AND LANGUAGE MODELS
Jisheng Bai, Xi'an University of Posts & Telecommunications, China; Haohe Liu, Meta, United States of America; Mou Wang, Institute of Acoustics, Chinese Academy of Sciences, China; Dongyuan Shi, Northwestern Polytechnical University, China; Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Mark Plumbley, King's College London, United Kingdom of Great Britain and Northern Ireland; Woon-seng Gan, Nanyang Technological University, Singapore; Jianfeng Chen, Northwestern Polytechnical University, China
AASP-P16.10: Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities
Jinhua Liang, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland; Xubo Liu, Wenwu Wang, Mark Plumbley, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Huy Phan, Meta, France; Emmanouil Benetos, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland