AASP-L5.3

PICOAUDIO2: TEMPORAL CONTROLLABLE TEXT-TO-AUDIO GENERATION WITH NATURAL LANGUAGE DESCRIPTION

Zihao Zheng, Zeyu Xie, Xuenan Xu, SJTU, China; Wen Wu, Chao Zhang, Shanghai AI Lab, China; Mengyue Wu, SJTU, China

Session:
AASP-L5: Audio Understanding and Generation Oral

Track:
Audio and Acoustic Signal Processing [AA]

Location:
Room 127+128

Presentation Time:
Wed, 6 May, 17:10 - 17:30

Session AASP-L5
AASP-L5.1: AUDIOGENIE-REASONER: A TRAINING-FREE MULTI-AGENT FRAMEWORK FOR COARSE-TO-FINE AUDIO DEEP REASONING
Yan Rong, The Hong Kong University of Science and Technology (Guangzhou), China; Chenxing Li, Dong Yu, Tencent AI Lab, China; Li Liu, The Hong Kong University of Science and Technology (Guangzhou), China
AASP-L5.2: LAMB: LLM-BASED AUDIO CAPTIONING WITH MODALITY GAP BRIDGING VIA CAUCHY-SCHWARZ DIVERGENCE
Hyeongkeun Lee, Jongmin Choi, KiHyun Nam, Joon Son Chung, Korea Advanced Institute of Science and Technology, Republic of Korea
AASP-L5.3: PICOAUDIO2: TEMPORAL CONTROLLABLE TEXT-TO-AUDIO GENERATION WITH NATURAL LANGUAGE DESCRIPTION
Zihao Zheng, Zeyu Xie, Xuenan Xu, SJTU, China; Wen Wu, Chao Zhang, Shanghai AI Lab, China; Mengyue Wu, SJTU, China
AASP-L5.4: FOLEYBENCH: A BENCHMARK FOR VIDEO-TO-AUDIO MODELS
Satvik Dixit, Carnegie Mellon University, United States of America; Koichi Saito, Sony AI, United States of America; Zhi Zhong, Sony Group Corporation, United States of America; Yuki Mitsufuji, Sony AI, United States of America; Chris Donahue, Carnegie Mellon University, United States of America
AASP-L5.5: SLAP: SCALABLE LANGUAGE-AUDIO PRETRAINING WITH VARIABLE-DURATION AUDIO AND MULTI-OBJECTIVE TRAINING
Xinhao Mei, Gael Le Lan, Haohe Liu, Zhaoheng Ni, Varun Nagaraja, Yang Liu, Yangyang Shi, Vikas Chandra, Meta, United States of America
AASP-L5.6: AUDIOCARDS: STRUCTURED METADATA IMPROVES AUDIO LANGUAGE MODELS FOR SOUND DESIGN
Sripathi Sridhar, New Jersey Institute of Technology, United States of America; Prem Seetharaman, Oriol Nieto, Adobe Research, United States of America; Mark Cartwright, New Jersey Institute of Technology, United States of America; Justin Salamon, Adobe Research, United States of America