SS-P1.1

BEYOND VIDEO-TO-SFX: VIDEO TO AUDIO SYNTHESIS WITH ENVIRONMENTALLY AWARE SPEECH

Xinlei Niu, Australian National University, Australia; Jianbo Ma, Dylan Harper-Harris, Dolby Laboratories, Australia; Xiangyu Zhang, The University of New South Wales, Australia; Charles Patrick Martin, Jing Zhang, Australian National University, Australia

Session:
SS-P1: Multimodal Ambient Scene Perception, Understanding and Modeling Poster

Track:
Special Sessions

Location:
Poster Area 2

Presentation Time:
Thu, 7 May, 14:00 - 16:00

Session SS-P1
SS-P1.1: BEYOND VIDEO-TO-SFX: VIDEO TO AUDIO SYNTHESIS WITH ENVIRONMENTALLY AWARE SPEECH
Xinlei Niu, Australian National University, Australia; Jianbo Ma, Dylan Harper-Harris, Dolby Laboratories, Australia; Xiangyu Zhang, The University of New South Wales, Australia; Charles Patrick Martin, Jing Zhang, Australian National University, Australia
SS-P1.2: SIMTOKEN: A SIMPLE BASELINE FOR REFERRING AUDIO-VISUAL SEGMENTATION
Dian Jin, Hefei University of Technology, China; Yanghao Zhou, National University of Singapore, Singapore; Jinxing Zhou, Jiaqi Ma, Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates; Ruohao Guo, Peking University, China; Dan Guo, Hefei University of Technology, China
SS-P1.3: DIFFUSION-BASED UNSUPERVISED AUDIO-VISUAL SPEECH SEPARATION IN NOISY ENVIRONMENTS WITH NOISE PRIOR
Yochai Yemini, Bar-Ilan University, Israel; Rami Ben-Ari, OriginAI, Israel; Sharon Gannot, Ethan Fetaya, Bar-Ilan University, Israel
SS-P1.4: CIP-DOA: Cross-Instance Prompted DoA Estimation via Semantic-Spatial Matching
Yu Chen, The Chinese University of Hong Kong, Shenzhen, China; Qiquan Zhang, Tongyi Speech Lab, Alibaba Group, China; Jiadong Wang, Technical University of Munich, Germany; Kainan Chen, Eigenspace GmbH, Germany; Xinyuan Qian, University of Science and Technology Beijing, China
SS-P1.5: MULTIMODAL DEEP LEARNING METHOD FOR REAL-TIME SPATIAL ROOM IMPULSE RESPONSE COMPUTING
Zhiyu Li, Xinwen Yue, Shenghui Zhao, Jing Wang, Beijing Institute of Technology, China
SS-P1.6: LUSEEL: LANGUAGE-QUERIED BINAURAL UNIVERSAL SOUND EVENT EXTRACTION AND LOCALIZATION
Zexu Pan, Shengkui Zhao, Yukun Ma, Haoxu Wang, Yiheng Jiang, Biao Tian, Bin Ma, Alibaba Group, Singapore
SS-P1.7: A Multi-View Fusion Framework for Audio-Visual Multi-Speaker Tracking
Yihan Li, Hao Guo, Zhenhuan Xu, Yidi Li, Taiyuan University of Technology, China; Weiwei Wan, Osaka University, Japan
SS-P1.8: DGF-Net: Underwater Image Enhancement via Depth Priors and Frequency-Domain Modeling
Chang Huang, Jiatong Shen, Jingyao Liu, Kaixin Chen, Jun Ma, The Hong Kong University of Science and Technology (Guangzhou), China; Huayong Yang, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), China; Kaishun Wu, The Hong Kong University of Science and Technology (Guangzhou), China
SS-P1.9: TOS: A TEAM OF SPECIALISTS ENSEMBLE FRAMEWORK FOR STEREO SOUND EVENT LOCALIZATION AND DETECTION WITH DISTANCE ESTIMATION IN VIDEO
Davide Berghi, Philip J. B. Jackson, University of Surrey, United Kingdom of Great Britain and Northern Ireland
SS-P1.10: PHYSICS-AWARE NOVEL-VIEW ACOUSTIC SYNTHESIS WITH VISION-LANGUAGE PRIORS AND 3D ACOUSTIC ENVIRONMENT MODELING
Congyi Fan, Jian Guan, Harbin Engineering University, China; Youtian Lin, Nanjing University, China; Dongli Xu, KU Leuven, Belgium; Tong Ye, Harbin Engineering University, China; Qiaoxi Zhu, University of Technology Sydney, Australia; Pengming Feng, State Key Laboratory of Space Information System and Integrated Application, China; Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland