ASMSP-L3.2
Leveraging Wav2Vec2.0 and DistilBERT with Autoencoder-Based Dimensionality Reduction for Continuous Multimodal Emotion Recognition
Awatef Messaoudi, Hayet Boughrara, Zied Lachiri, University of Tunis El Manar, National Engineering School of Tunis, Tunisia
Session:
ASMSP-L3: Multimodal and Cross-Domain Audio Learning Lecture
Track:
ASMSP - Acoustic, Speech and Music Signal Processing
Location:
Teatro del Sole
Presentation Time:
Wed, 10 Sep, 09:20 - 09:40 Italy Time (UTC +2)
Session Co-Chairs:
Tuomas Virtanen, Tampere University, and Mark Sandler
Session ASMSP-L3
ASMSP-L3.1: Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities
Parthasaarathy Sudarsanam, Irene Martín-Morató, Tuomas Virtanen, Tampere University, Finland
ASMSP-L3.2: Leveraging Wav2Vec2.0 and DistilBERT with Autoencoder-Based Dimensionality Reduction for Continuous Multimodal Emotion Recognition
Awatef Messaoudi, Hayet Boughrara, Zied Lachiri, University of Tunis El Manar, National Engineering School of Tunis, Tunisia
ASMSP-L3.3: Exploring Whisper Embeddings for Stutter Detection: A Layer-Wise Study
Ashita Batra, Brajesh Kar, Pradip K. Das, Indian Institute of Technology Guwahati, India
ASMSP-L3.4: Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings
Jason Clarke, Yoshi Gotoh, University of Sheffield, United Kingdom; Stefan Goetze, South Westphalia University of Applied Sciences, Germany
ASMSP-L3.5: Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
Hugo Malard, Salah Zaiem, Telecom Paris, France; Robin Algayres, Mohamed Bin Zayed University of Artificial Intelligence, France