ASMSP-L3: Multimodal and Cross-Domain Audio Learning
Wed, 10 Sep, 09:00 - 10:40 Italy Time (UTC +2)
Location: Teatro del Sole
Session Type: Lecture
Session Co-Chairs: Tuomas Virtanen, Tampere University and Mark Sandler,
Track: ASMSP - Acoustic, Speech and Music Signal Processing
Wed, 10 Sep, 09:00 - 09:20 Italy Time (UTC +2)
ASMSP-L3.1: Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities
Wed, 10 Sep, 09:20 - 09:40 Italy Time (UTC +2)
ASMSP-L3.2: Leveraging Wav2Vec2.0 and DistilBERT with Autoencoder-Based Dimensionality Reduction for Continuous Multimodal Emotion Recognition
Wed, 10 Sep, 09:40 - 10:00 Italy Time (UTC +2)
ASMSP-L3.3: Exploring Whisper Embeddings for Stutter Detection: A Layer-Wise Study
Wed, 10 Sep, 10:00 - 10:20 Italy Time (UTC +2)
ASMSP-L3.4: Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings
Wed, 10 Sep, 10:20 - 10:40 Italy Time (UTC +2)