SLP-L16: Multi-Talker & Conversational ASR
Fri, 8 May, 14:00 - 16:00
Location: Room 115
Session Type: Oral
Track: Speech and Language Processing [SL]
Fri, 8 May, 14:00 - 14:20

SLP-L16.1: ADVANCING LLM-BASED MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION WITH GLOBAL CROSS-CHANNEL ATTENTION AND SENTENCE-ORDERED FIRST-IN FIRST-OUT SERIALIZED OUTPUT TRAINING

Genshun Wan, Lijuan Liu, University of Science and Technology of China, China; Changfeng Xi, iFlytek Research, China; Hang Chen, University of Science and Technology of China, China; Xindi Yu, Jia Pan, iFlytek Research, China; Jun Du, Zhongfu Ye, University of Science and Technology of China, China
Fri, 8 May, 14:20 - 14:40

SLP-L16.2: SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper

Alexander Polok, Dominik Klement, Brno University of Technology, Czechia; Samuele Cornell, Carnegie Mellon University, United States of America; Matthew Wiesner, Johns Hopkins University, United States of America; Jan Černocký, Brno University of Technology, Czechia; Sanjeev Khudanpur, Johns Hopkins University, United States of America; Lukáš Burget, Brno University of Technology, Czechia
Fri, 8 May, 14:40 - 15:00

SLP-L16.3: ADAPTING DIARIZATION-CONDITIONED WHISPER FOR END-TO-END MULTI-TALKER SPEECH RECOGNITION

Martin Kocour, Filevine, United States of America; Martin Karafiát, Alexander Polok, Dominik Klement, Lukáš Burget, Jan Černocký, Brno University of Technology, Czechia
Fri, 8 May, 15:00 - 15:20

SLP-L16.4: CALM: JOINT CONTEXTUAL ACOUSTIC-LINGUISTIC MODELING FOR PERSONALIZATION OF MULTI-SPEAKER ASR

Muhammad Shakeel, Yosuke Fukumoto, Chikara Maeda, Honda Research Institute Japan Co., Ltd., Japan; Chyi-Jiunn Lin, Shinji Watanabe, Carnegie Mellon University, United States of America
Fri, 8 May, 15:20 - 15:40

SLP-L16.5: SCALING MULTI-TALKER ASR WITH SPEAKER-AGNOSTIC ACTIVITY STREAMS

Xiluo He, Johns Hopkins University, United States of America; Alexander Polok, Brno University of Technology, Czechia; Jesus Villalba, Thomas Thebaud, Matthew Maciejewski, Johns Hopkins University, United States of America
Fri, 8 May, 15:40 - 16:00

SLP-L16.6: TARGET-SPEAKER LLM-ASR WITH SPEAKER-AWARE SPEECH ENCODER

Minsoo Kim, SangHun Kim, Electronics and Telecommunications Research Institute, Korea, Republic of