SLP-P15.8
VBX FOR END-TO-END NEURAL AND CLUSTERING-BASED DIARIZATION
Petr Pálka, Jiangyu Han, Brno University of Technology, Czechia; Marc Delcroix, Naohiro Tawara, NTT, Inc., Japan; Lukáš Burget, Brno University of Technology, Czechia
Session: SLP-P15: Advances in Neural Speaker Diarization Poster
Track: Speech and Language Processing [SL]
Location: Poster Area 29
Presentation Time: Wed, 6 May, 09:00 - 11:00
Session Chair: Taejin Park, Sr. Research Scientist at NVIDIA
Session SLP-P15
SLP-P15.1: Integrating Speaker Embeddings and LLM-Derived Semantic Representations for Streaming Speaker Diarization
Tianyou Cheng, University of Science and Technology of China, China; Changfeng Xi, Jia Pan, iFlytek Research, China; Ruoyu Wang, Hang Chen, University of Science and Technology of China, China; Jiangyu Han, Lukáš Burget, Brno University of Technology, Czechia; Jianqing Gao, iFlytek Research, China; Jun Du, University of Science and Technology of China, China
SLP-P15.2: Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio
Mohan Shi, University of California, Los Angeles, United States of America; Xiong Xiao, Ruchao Fan, Shaoshi Ling, Jinyu Li, Microsoft, United States of America
SLP-P15.3: SPATIALLY AWARE SELF-SUPERVISED MODELS FOR MULTI-CHANNEL NEURAL SPEAKER DIARIZATION
Jiangyu Han, Brno University of Technology, Czechia; Ruoyu Wang, University of Science and Technology of China, China; Yoshiki Masuyama, Mitsubishi Electric Research Laboratories (MERL), United States of America; Marc Delcroix, NTT, Inc., Japan; Johan Rohdin, Brno University of Technology, Czechia; Jun Du, University of Science and Technology of China, China; Lukáš Burget, Brno University of Technology, Czechia
SLP-P15.4: β-AVSDNET: A NOVEL END-TO-END NEURAL NETWORK ARCHITECTURE FOR AUDIO-VISUAL SPEAKER DIARIZATION
Changhuai You, Institute for Infocomm Research, Singapore
SLP-P15.5: AUTOMATIC ESTIMATION OF SPEAKER DIARIZATION ERROR RATE BASED ON FEATURES OF AUDIO QUALITY AND SPEAKER DISCRIMINABILITY
Kenkichi Ishizuka, Chang Zeng, Masaki Ono, Taiichi Hashimoto, RevComm Inc., Japan
SLP-P15.6: A FRAMEWORK FOR CONTROLLED MULTI-SPEAKER AUDIO SYNTHESIS FOR ROBUSTNESS EVALUATION OF SPEAKER DIARISATION SYSTEMS
Shreyas Ramoji, Vivek Kumar Thoppe Ravindranath, Thomas Hain, University of Sheffield, United Kingdom of Great Britain and Northern Ireland
SLP-P15.7: MITIGATING INTRA-SPEAKER VARIABILITY IN DIARIZATION WITH STYLE-CONTROLLABLE SPEECH AUGMENTATION
Miseul Kim, Yonsei University, Korea, Republic of; Soo Jin Park, Kyungguen Byun, Hyeon-Kyeong Shin, Sunkuk Moon, Shuhua Zhang, Erik Visser, Qualcomm Technologies Inc., United States of America
SLP-P15.8: VBX FOR END-TO-END NEURAL AND CLUSTERING-BASED DIARIZATION
Petr Pálka, Jiangyu Han, Brno University of Technology, Czechia; Marc Delcroix, Naohiro Tawara, NTT, Inc., Japan; Lukáš Burget, Brno University of Technology, Czechia
SLP-P15.9: Dual-Strategy-Enhanced ConBiMamba for Neural Speaker Diarization
Liao Zhen, Gaole Dai, Mengqiao Chen, Wenqing Cheng, Wei Xu, Huazhong University of Science and Technology, China
SLP-P15.10: ATTENTION-BASED ENCODER-DECODER TARGET-SPEAKER VOICE ACTIVITY DETECTION FOR ROBUST SPEAKER DIARIZATION
Zeyan Song, Tianyi Tan, Yushi Wang, Zheng Wang, Jing Lu, Key Laboratory of Modern Acoustics, Nanjing University, China