ASMSP-L3: Multimodal and Cross-Domain Audio Learning
Wed, 10 Sep, 09:00 - 10:40 Italy Time (UTC +2)
Location: Teatro del Sole
Session Type: Lecture
Session Co-Chairs: Tuomas Virtanen, Tampere University and Mark Sandler
Track: ASMSP - Acoustic, Speech and Music Signal Processing
Wed, 10 Sep, 09:00 - 09:20 Italy Time (UTC +2)

ASMSP-L3.1: Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities

Parthasaarathy Sudarsanam, Irene Martín-Morató, Tuomas Virtanen, Tampere University, Finland
Wed, 10 Sep, 09:20 - 09:40 Italy Time (UTC +2)

ASMSP-L3.2: Leveraging Wav2Vec2.0 and DistilBERT with Autoencoder-Based Dimensionality Reduction for Continuous Multimodal Emotion Recognition

Awatef Messaoudi, Hayet Boughrara, Zied Lachiri, University of Tunis El Manar, National Engineering School of Tunis, Tunisia
Wed, 10 Sep, 09:40 - 10:00 Italy Time (UTC +2)

ASMSP-L3.3: Exploring Whisper Embeddings for Stutter Detection: A Layer-Wise Study

Ashita Batra, Brajesh Kar, Prof Pradip K Das, Indian Institute of Technology Guwahati, India
Wed, 10 Sep, 10:00 - 10:20 Italy Time (UTC +2)

ASMSP-L3.4: Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings

Jason Clarke, Yoshi Gotoh, University of Sheffield, United Kingdom; Stefan Goetze, South Westphalia University of Applied Sciences, Germany
Wed, 10 Sep, 10:20 - 10:40 Italy Time (UTC +2)

ASMSP-L3.5: Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Hugo Malard, Salah Zaiem, Telecom Paris, France; Robin Algayres, Mohamed Bin Zayed University of Artificial Intelligence, France