WS-3b: Self-supervision in Audio, Speech and Beyond II
Tue, 16 Apr, 13:10 - 15:10 (UTC +9)
Location: Workshop Poster
Session Type: Poster
Track: Satellite Workshops

WS-3b.1: Positive and Negative Sampling Strategies for Self-supervised Learning on Audio-Video Data

Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros, Tampere University, Finland

WS-3b.2: Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations

Jialu Li, Mark Hasegawa-Johnson, Nancy McElwain, University of Illinois Urbana-Champaign, United States of America

WS-3b.3: Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

Sungho Jeon, Heidelberg Institute for Theoretical Studies, Germany; Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel, Meta, United States of America

WS-3b.4: SOA: Reducing domain mismatch in SSL Pipeline by Speech Only Adaptation for low resource ASR

Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan, University of California, Los Angeles, United States of America

WS-3b.5: Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction

Aditya Ravuri, University of Cambridge, United Kingdom of Great Britain and Northern Ireland; Erica Cooper, Junichi Yamagishi, National Institute of Informatics, Japan

WS-3b.6: Synthetic Speech Detection with wav2vec 2.0 in Various Language Settings

Branimir Dropuljić, Miljenko Šuflaj, Andrej Jertec, Leo Obadić, RealNetworks, Croatia

WS-3b.7: Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition

Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai, University of Edinburgh, United Kingdom of Great Britain and Northern Ireland

WS-3b.8: Self-Supervised Learning for Few-Shot Bird Sound Classification

Ilyass Moummad, IMT Atlantique, France; Romain Serizel, University of Lorraine, INRIA, LORIA, France; Nicolas Farrugia, IMT Atlantique, France

WS-3b.9: Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

Panagiotis Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris, Samsung Electronics, Greece

WS-3b.10: A Study on the Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment

Xavier Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi, Imperial College London, United Kingdom of Great Britain and Northern Ireland

WS-3b.11: Low-Resourced Phonetic and Prosodic Feature Estimation with Self-Supervised-Learning-Based Acoustic Modeling

Kiyoshi Kurihara, Masanori Sano, NHK (Japan Broadcasting Corporation), Japan

WS-3b.12: Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model

Hung-Chieh Fang, Nai-Xuan Ye, National Taiwan University, Taiwan; Yi-Jen Shih, Puyuan Peng, The University of Texas at Austin, United States of America; Hsuan-Fu Wang, National Taiwan University, Taiwan; Layne Berry, The University of Texas at Austin, United States of America; Hung-yi Lee, National Taiwan University, Taiwan; David Harwath, The University of Texas at Austin, United States of America