SS-L10: Question Answering and Reasoning on Audio and Time-Series Data
Thu, 7 May, 14:00 - 16:00
Location: Room 111
Session Type: Oral
Track: Special Sessions
Thu, 7 May, 14:00 - 14:20

SS-L10.1: Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning

Chao-Han Huck Yang, Sreyan Ghosh, NVIDIA, United States of America; Qing Wang, USTC, China; Jaeyeon Kim, CMU, United States of America; Hengyi Hong, USTC, China; Sonal Kumar, University of Maryland, United States of America; Guirui Zhong, USTC, China; Zhifeng Kong, NVIDIA, United States of America; S Sakshi, Vaibhavi Lokegaonkar, University of Maryland, United States of America; Oriol Nieto, Adobe, United States of America; Ramani Duraiswami, Dinesh Manocha, University of Maryland, United States of America; Gunhee Kim, Seoul National University, Korea, Republic of; Jun Du, USTC, China; Rafael Valle, Bryan Catanzaro, NVIDIA, United States of America
Thu, 7 May, 14:20 - 14:40

SS-L10.2: ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models

Bohan Li, Wenbin Huang, Yuhang Qiu, Yiwei Guo, Hankun Wang, Zhihan Li, Jing Peng, Ziyang Ma, Xie Chen, Kai Yu, Shanghai Jiao Tong University, China
Thu, 7 May, 14:40 - 15:00

SS-L10.3: Exploring Audio Hallucination in Egocentric Video Understanding

Ashish Seth, University of Maryland, College Park, United States of America; Xinhao Mei, Changsheng Zhao, Varun Nagaraja, Ernie Chang, Greg Meyer, Gael Le Lan, Yunyang Xiong, Vikas Chandra, Yangyang Shi, Meta, United States of America; Dinesh Manocha, University of Maryland, College Park, United States of America; Zhipeng Cai, Meta, United States of America
Thu, 7 May, 15:00 - 15:20

SS-L10.4: Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition

Bo-Hao Su, Carnegie Mellon University, United States of America; Hui-Ying Shih, National Tsing Hua University, Taiwan; Jinchuan Tian, Jiatong Shi, Carnegie Mellon University, United States of America; Chi-Chun Lee, National Tsing Hua University, Taiwan; Carlos Busso, Shinji Watanabe, Carnegie Mellon University, United States of America
Thu, 7 May, 15:20 - 15:40

SS-L10.5: UNIPACT: A Multimodal Framework for Prognostic Question Answering on Raw ECG and Structured EHR

Jialu Tang, Eindhoven University of Technology, Netherlands; Tong Xia, Tsinghua University, China; Yuan Lu, Aaqib Saeed, Eindhoven University of Technology, Netherlands
Thu, 7 May, 15:40 - 16:00

SS-L10.6: DIFFNATOR: Generating Structured Explanations of Time-Series Differences

Kota Dohi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Yohei Kawaguchi, Hitachi, Ltd., Japan