EUSIPCO 2025 || Palermo, Italy || 8 - 12 September 2025

ASMSP-L3.5

Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Hugo Malard, Salah Zaiem, Telecom Paris, France; Robin Algayres, Mohamed Bin Zayed University of Artificial Intelligence, France

Session:

ASMSP-L3: Multimodal and Cross-Domain Audio Learning Lecture

Location:

Teatro del Sole

Presentation Time:

Wed, 10 Sep, 10:20 - 10:40 Italy Time (UTC +2)

Session Co-Chairs:

Tuomas Virtanen, Tampere University and Mark Sandler, Queen Mary University of London

Session ASMSP-L3

ASMSP-L3.1: Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities

Parthasaarathy Sudarsanam, Irene Martín-Morató, Tuomas Virtanen, Tampere University, Finland

ASMSP-L3.2: Leveraging Wav2Vec2.0 and DistilBERT with Autoencoder-Based Dimensionality Reduction for Continuous Multimodal Emotion Recognition

Awatef Messaoudi, Hayet Boughrara, Zied Lachiri, University of Tunis El Manar, National Engineering School of Tunis, Tunisia

ASMSP-L3.3: Exploring Whisper Embeddings for Stutter Detection: A Layer-Wise Study

Ashita Batra, Brajesh kar, Prof Pradip K Das, Indian Institute of Technology Guwahati, India

ASMSP-L3.4: Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings

Jason Clarke, Yoshi Gotoh, University of Sheffield, United Kingdom; Stefan Goetze, South Westphalia University of Applied Sciences, Germany

ASMSP-L3.5: Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Hugo Malard, Salah Zaiem, Telecom Paris, France; Robin Algayres, Mohamed Bin Zayed University of Artificial Intelligence, France