MLSP-P8.11

POSITIVE TRANSFER OF THE WHISPER SPEECH TRANSFORMER TO HUMAN AND ANIMAL VOICE ACTIVITY DETECTION

Nianlong Gu, University of Zurich, Switzerland; Kanghwi Lee, Maris Basha, University of Zurich and ETH Zurich, Switzerland; Sumit Kumar Ram, Guanghao You, University of Zurich, Switzerland; Richard H. R. Hahnloser, University of Zurich and ETH Zurich, Switzerland

Session:
MLSP-P8: Machine Learning for Audio, Speech and Music Processing Poster

Track:
Machine Learning for Signal Processing

Location:
Poster Zone 4C
Poster Board PZ-4C.11

Presentation Time:
Tue, 16 Apr, 16:30 - 18:30 (UTC +9)

Session Chair:
Tomohiro Nakatani, NTT
Session MLSP-P8
MLSP-P8.1: AUDIO MATCH CUTTING: FINDING AND CREATING MATCHING AUDIO TRANSITIONS IN MOVIES AND VIDEOS
Dennis Fedorishin, Dolby Laboratories, University at Buffalo, United States of America; Lie Lu, Dolby Laboratories, United States of America; Srirangaraj Setlur, Venu Govindaraju, University at Buffalo, United States of America
MLSP-P8.2: ON THE OPEN PROMPT CHALLENGE IN CONDITIONAL AUDIO GENERATION
Ernie Chang, Sidd Srinivasan, Mahi Luthra, Meta AI, United States of America; Pin-Jie Lin, Saarland University, Germany; Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra, Meta AI, United States of America
MLSP-P8.3: IN-CONTEXT PROMPT EDITING FOR CONDITIONAL AUDIO GENERATION
Ernie Chang, Meta AI, Germany; Pin-Jie Lin, Saarland University, Germany; Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra, Meta AI, United States of America
MLSP-P8.4: HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks
Zhe Zhang, Taketo Akama, Sony Computer Science Laboratories, Japan
MLSP-P8.5: EXPRESSIVE ACOUSTIC GUITAR SOUND SYNTHESIS WITH AN INSTRUMENT-SPECIFIC INPUT REPRESENTATION AND DIFFUSION OUTPAINTING
Hounsu Kim, Soonbeom Choi, Juhan Nam, Korea Advanced Institute of Science and Technology, Republic of Korea
MLSP-P8.6: PERFORMANCE CONDITIONING FOR DIFFUSION-BASED MULTI-INSTRUMENT MUSIC SYNTHESIS
Ben Maman, Tel Aviv University, Israel; Johannes Zeitler, Meinard Müller, International Audio Laboratories Erlangen, Germany; Amit H. Bermano, Tel Aviv University, Israel
MLSP-P8.7: Multi-band speech tensor decomposition for interactive feature extraction in early dysphagia screening
Fei He, Yipeng Liu, Da Shen, University of Electronic Science and Technology of China, China; Yangyang Jiang, Ying Li, Sichuan University, China; Ce Zhu, University of Electronic Science and Technology of China, China
MLSP-P8.8: A SOUND APPROACH: USING LARGE LANGUAGE MODELS TO GENERATE AUDIO DESCRIPTIONS FOR EGOCENTRIC TEXT-AUDIO RETRIEVAL
Andreea-Maria Oncescu, João F. Henriques, Andrew Zisserman, University of Oxford, United Kingdom of Great Britain and Northern Ireland; Samuel Albanie, University of Cambridge, United Kingdom of Great Britain and Northern Ireland; A. Sophia Koepke, University of Tübingen, Germany
MLSP-P8.9: HIERARCHICAL METADATA INFORMATION CONSTRAINED SELF-SUPERVISED LEARNING FOR ANOMALOUS SOUND DETECTION UNDER DOMAIN SHIFT
Haiyan Lan, Harbin Engineering University, China; Qiaoxi Zhu, University of Technology Sydney, Australia; Jian Guan, Yuming Wei, Harbin Engineering University, China; Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland
MLSP-P8.10: GPT-4 DRIVEN CINEMATIC MUSIC GENERATION THROUGH TEXT PROCESSING
Muhammad Taimoor Haseeb, Ahmad Hammoudeh, Gus Xia, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), United Arab Emirates
MLSP-P8.11: POSITIVE TRANSFER OF THE WHISPER SPEECH TRANSFORMER TO HUMAN AND ANIMAL VOICE ACTIVITY DETECTION
Nianlong Gu, University of Zurich, Switzerland; Kanghwi Lee, Maris Basha, University of Zurich and ETH Zurich, Switzerland; Sumit Kumar Ram, Guanghao You, University of Zurich, Switzerland; Richard H. R. Hahnloser, University of Zurich and ETH Zurich, Switzerland