SLP-L16.4

CALM: JOINT CONTEXTUAL ACOUSTIC-LINGUISTIC MODELING FOR PERSONALIZATION OF MULTI-SPEAKER ASR

Muhammad Shakeel, Yosuke Fukumoto, Chikara Maeda, Honda Research Institute Japan Co., Ltd., Japan; Chyi-Jiunn Lin, Shinji Watanabe, Carnegie Mellon University, USA

Session:
SLP-L16: Multi-Talker & Conversational ASR Oral

Track:
Speech and Language Processing [SL]

Location:
Room 115

Presentation Time:
Fri, 8 May, 15:00 - 15:20

Session SLP-L16
SLP-L16.1: ADVANCING LLM-BASED MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION WITH GLOBAL CROSS-CHANNEL ATTENTION AND SENTENCE-ORDERED FIRST-IN FIRST-OUT SERIALIZED OUTPUT TRAINING
Genshun Wan, Lijuan Liu, University of Science and Technology of China, China; Changfeng Xi, iFlytek Research, China; Hang Chen, University of Science and Technology of China, China; Xindi Yu, Jia Pan, iFlytek Research, China; Jun Du, Zhongfu Ye, University of Science and Technology of China, China
SLP-L16.2: SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
Alexander Polok, Dominik Klement, Brno University of Technology, Czechia; Samuele Cornell, Carnegie Mellon University, United States of America; Matthew Wiesner, Johns Hopkins University, United States of America; Jan Černocký, Brno University of Technology, Czechia; Sanjeev Khudanpur, Johns Hopkins University, United States of America; Lukáš Burget, Brno University of Technology, Czechia
SLP-L16.3: ADAPTING DIARIZATION-CONDITIONED WHISPER FOR END-TO-END MULTI-TALKER SPEECH RECOGNITION
Martin Kocour, Filevine, United States of America; Martin Karafiát, Alexander Polok, Dominik Klement, Lukáš Burget, Jan Černocký, Brno University of Technology, Czechia
SLP-L16.4: CALM: JOINT CONTEXTUAL ACOUSTIC-LINGUISTIC MODELING FOR PERSONALIZATION OF MULTI-SPEAKER ASR
Muhammad Shakeel, Yosuke Fukumoto, Chikara Maeda, Honda Research Institute Japan Co., Ltd., Japan; Chyi-Jiunn Lin, Shinji Watanabe, Carnegie Mellon University, USA
SLP-L16.5: SCALING MULTI-TALKER ASR WITH SPEAKER-AGNOSTIC ACTIVITY STREAMS
Xiluo He, Johns Hopkins University, United States of America; Alexander Polok, Brno University of Technology, Czechia; Jesus Villalba, Thomas Thebaud, Matthew Maciejewski, Johns Hopkins University, United States of America
SLP-L16.6: TARGET-SPEAKER LLM-ASR WITH SPEAKER-AWARE SPEECH ENCODER
Minsoo Kim, SangHun Kim, Electronics and Telecommunications Research Institute, Republic of Korea