WS-3a.4
Open Implementation and Study of BEST-RQ for Speech Processing
Ryan Whetten, Avignon Université, France; Titouan Parcollet, Samsung AI, United Kingdom of Great Britain and Northern Ireland; Marco Dinarelli, Université Grenoble Alpes, France; Yannick Estève, Avignon Université, France
Session:
WS-3a: Self-supervision in Audio, Speech and Beyond I Poster
Track:
Satellite Workshops
Location:
Workshop Poster
Poster Board WSP.4
Presentation Time:
Tue, 16 Apr, 13:10 - 15:10 (UTC +9)
Session WS-3a
WS-3a.1: EXPLORING FEDERATED SELF-SUPERVISED LEARNING FOR GENERAL-PURPOSE AUDIO UNDERSTANDING
Yasar Abbas UR Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen, TCL, Hong Kong
WS-3a.2: Investigating Self-Supervised Features for Expressive, Multilingual Voice Conversion
Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Grzegorz Beringer, Iván Vallés-Pérez, Roberto Barra-Chicote, Biel Tura-Vecino, Adam Gabryś, Thomas Merritt, Piotr Biliński, Jaime Lorenzo-Trueba, Alexa AI, Spain
WS-3a.3: ACOUSTIC-TO-ARTICULATORY INVERSION FOR DYSARTHRIC SPEECH: ARE PRE-TRAINED SELF-SUPERVISED REPRESENTATIONS FAVORABLE?
Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava, University of Southern California, United States of America
WS-3a.4: Open Implementation and Study of BEST-RQ for Speech Processing
Ryan Whetten, Avignon Université, France; Titouan Parcollet, Samsung AI, United Kingdom of Great Britain and Northern Ireland; Marco Dinarelli, Université Grenoble Alpes, France; Yannick Estève, Avignon Université, France
WS-3a.5: SPEECHCLIP+: SELF-SUPERVISED MULTI-TASK REPRESENTATION LEARNING FOR SPEECH VIA CLIP AND SPEECH-IMAGE DATA
Hsuan-Fu Wang, National Taiwan University, Taiwan; Yi-Jen Shih, The University of Texas at Austin, USA; Heng-Jui Chang, Massachusetts Institute of Technology, USA; Layne Berry, Puyuan Peng, The University of Texas at Austin, USA; Hung-yi Lee, National Taiwan University, Taiwan; Hsin-Min Wang, Institute of Information Science, Academia Sinica, Taiwan; David Harwath, The University of Texas at Austin, USA
WS-3a.6: VICMUS: VARIANCE-INVARIANCE-COVARIANCE REGULARIZATION FOR MUSIC REPRESENTATION LEARNING
Sebastian Löf, Epidemic Sound, Sweden; Cody Hesse, Chalmers University of Technology, Sweden; Carl Thomé, Carlos Lordelo, Epidemic Sound, Sweden; Jens Ahrens, Chalmers University of Technology, Sweden
WS-3a.7: Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo, Seth Zhao, Jiachen Lian, Gerald Friedland, Gopala Anumanchipalli, UC Berkeley, United States of America
WS-3a.8: Noise Robust Distillation of Self-Supervised Speech Models via Correlation Metrics
Fabian Ritter-Gutierrez, Nanyang Technological University, Singapore; Kuan-Po Huang, National Taiwan University, Taiwan; Dianwen Ng, Alibaba Group / Nanyang Technological University, Singapore; Jeremy Wong, Institute for Infocomm Research, Singapore; Hung-yi Lee, National Taiwan University, Taiwan; Eng Siong Chng, Nanyang Technological University, Singapore; Nancy Chen, Institute for Infocomm Research, Singapore
WS-3a.9: Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra, Kore University of Enna, Italy; Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Politecnico di Torino, Italy; Sabato Marco Siniscalchi, Università degli Studi di Palermo, Italy
WS-3a.10: On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
Calum Heggan, University of Edinburgh, United Kingdom of Great Britain and Northern Ireland; Sam Budgett, Thales UK, United Kingdom of Great Britain and Northern Ireland; Timothy Hospedales, Mehrdad Yaghoobi, University of Edinburgh, United Kingdom of Great Britain and Northern Ireland
WS-3a.11: Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng, Brno University of Technology, Czechia; Marc Delcroix, Tsubasa Ochiai, NTT Corporation, Japan; Oldrich Plchot, Brno University of Technology, Czechia; Takanori Ashihara, Shoko Araki, NTT Corporation, Japan; Honza Černocký, Brno University of Technology, Czechia
WS-3a.12: INVESTIGATING ZERO-SHOT GENERALIZABILITY ON MANDARIN-ENGLISH CODE-SWITCHED ASR AND SPEECH-TO-TEXT TRANSLATION OF RECENT FOUNDATION MODELS WITH SELF-SUPERVISION AND WEAK SUPERVISION
Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee, National Taiwan University, Taiwan