WS-3b.1
Positive and Negative Sampling Strategies for Self-supervised Learning on Audio-Video Data
Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros, Tampere University, Finland
Session:
WS-3b: Self-supervision in Audio, Speech and Beyond II Poster
Track:
Satellite Workshops
Location:
Workshop Poster
Poster Board WSP.1
Presentation Time:
Tue, 16 Apr, 13:10 - 15:10 (UTC +9)
Session WS-3b
WS-3b.1: Positive and Negative Sampling Strategies for Self-supervised Learning on Audio-Video Data
Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros, Tampere University, Finland
WS-3b.2: Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
Jialu Li, Mark Hasegawa-Johnson, Nancy McElwain, University of Illinois Urbana-Champaign, United States of America
WS-3b.3: Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Sungho Jeon, Heidelberg Institute of Theoretical Studies, Germany; Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel, Meta, United States of America
WS-3b.4: SOA: Reducing domain mismatch in SSL Pipeline by Speech Only Adaptation for low resource ASR
Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan, University of California, Los Angeles, United States of America
WS-3b.5: Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction
Aditya Ravuri, University of Cambridge, United Kingdom of Great Britain and Northern Ireland; Erica Cooper, Junichi Yamagishi, National Institute of Informatics, Japan
WS-3b.6: Synthetic Speech Detection with wav2vec 2.0 in Various Language Settings
Branimir Dropuljić, Miljenko Šuflaj, Andrej Jertec, Leo Obadić, RealNetworks, Croatia
WS-3b.7: Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai, University of Edinburgh, United Kingdom of Great Britain and Northern Ireland
WS-3b.8: Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad, IMT Atlantique, France; Romain Serizel, University of Lorraine, INRIA, LORIA, France; Nicolas Farrugia, IMT Atlantique, France
WS-3b.9: Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
Panagiotis Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris, Samsung Electronics, Greece
WS-3b.10: A Study on the Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment
Xavier Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi, Imperial College London, United Kingdom of Great Britain and Northern Ireland
WS-3b.11: Low-Resourced Phonetic and Prosodic Feature Estimation with Self-Supervised-Learning-Based Acoustic Modeling
Kiyoshi Kurihara, Masanori Sano, NHK (Japan Broadcasting Corporation), Japan
WS-3b.12: Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model
Hung-Chieh Fang, Nai-Xuan Ye, National Taiwan University, Taiwan; Yi-Jen Shih, Puyuan Peng, The University of Texas at Austin, United States of America; Hsuan-Fu Wang, National Taiwan University, Taiwan; Layne Berry, The University of Texas at Austin, United States of America; Hung-yi Lee, National Taiwan University, Taiwan; David Harwath, The University of Texas at Austin, United States of America