AASP-P16: Audio Captioning, Retrieval, and Understanding
Poster
Thu, 7 May, 09:00 - 11:00
Location: Poster Area 25
Session Type: Poster
Track: Audio and Acoustic Signal Processing [AA]

AASP-P16.1: SEGMENTWISE PRUNING IN AUDIO-LANGUAGE MODELS

Marcel Gibier, Inria, France; Raphael Duroselle, AMIAD, France; Pierre Serrano, Olivier Boeffard, Inria, France; Jean-François Bonastre, AMIAD, France

AASP-P16.2: HIERARCHICAL ACTIVITY RECOGNITION AND CAPTIONING FROM LONG-FORM AUDIO

Peng Zhang, Qingyu Luo, Philip Jackson, Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland

AASP-P16.4: IMPROVING AUDIO QUESTION ANSWERING WITH VARIATIONAL INFERENCE

Haolin Chen, Idiap Research Institute, Switzerland

AASP-P16.6: GAUSSIAN LOCALITY PRIOR FOR CONTRAST–RECONSTRUCTION LEARNING: STATE–SPACE MODEL-BASED TIME–SERIES ANOMALY DETECTION

Yadong Niu, MiLM Plus, Xiaomi Inc, Beijing, China; Tianzi Wang, The Chinese University of Hong Kong, Hong Kong, China; Heinrich Dinkel, Xingwei Sun, MiLM Plus, Xiaomi Inc, Beijing, China; Jiahao Zhou, Beijing University of Posts and Telecommunications, Beijing, China; Gang Li, Jizhong Liu, Junbo Zhang, Jian Luan, MiLM Plus, Xiaomi Inc, Beijing, China

AASP-P16.7: CASTELLA: LONG AUDIO DATASET WITH CAPTIONS AND TEMPORAL BOUNDARIES

Hokuto Munakata, LY Corporation, Japan; Takehiro Imamura, Nagoya University, Japan; Taichi Nishimura, Tatsuya Komatsu, LY Corporation, Japan

AASP-P16.8: Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation

Runyan Yang, Yuke Si, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang, China Mobile, China

AASP-P16.9: AUDIOSETCAPS: AN ENRICHED AUDIO-CAPTION DATASET USING AUTOMATED GENERATION PIPELINE WITH LARGE AUDIO AND LANGUAGE MODELS

Jisheng Bai, Xi'an University of Posts & Telecommunications, China; Haohe Liu, Meta, United States of America; Mou Wang, Institute of Acoustics, Chinese Academy of Sciences, China; Dongyuan Shi, Northwestern Polytechnical University, China; Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Mark Plumbley, King's College London, United Kingdom of Great Britain and Northern Ireland; Woon-seng Gan, Nanyang Technological University, Singapore; Jianfeng Chen, Northwestern Polytechnical University, China

AASP-P16.10: Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities

Jinhua Liang, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland; Xubo Liu, Wenwu Wang, Mark Plumbley, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Huy Phan, Meta, France; Emmanouil Benetos, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland