SLP-L23.3
AUDIO-VISUAL ACTIVE SPEAKER EXTRACTION FOR SPARSELY OVERLAPPED MULTI-TALKER SPEECH
Junjie Li, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China; Ruijie Tao, Zexu Pan, Meng Ge, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore; Shuai Wang, Haizhou Li, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China
Session:
SLP-L23: Speech separation and extraction Lecture
Track:
Speech and Language Processing
Location:
Room 104
Presentation Time:
Thu, 18 Apr, 17:10 - 17:30 (UTC +9)
Session Co-Chairs:
Gordon Wichern, Mitsubishi Electric Research Labs (MERL) and Katerina Zmolikova, Meta
Session SLP-L23
SLP-L23.1: NEUROHEED+: IMPROVING NEURO-STEERED SPEAKER EXTRACTION WITH JOINT AUDITORY ATTENTION DETECTION
Zexu Pan, Gordon Wichern, Francois Germain, Sameer Khurana, Jonathan Le Roux, Mitsubishi Electric Research Laboratories, United States of America
SLP-L23.2: TARGET SPEECH EXTRACTION WITH PRE-TRAINED SELF-SUPERVISED LEARNING MODELS
Junyi Peng, Brno University of Technology, Czechia; Marc Delcroix, Tsubasa Ochiai, NTT Corporation, Japan; Oldřich Plchot, Brno University of Technology, Czechia; Shoko Araki, NTT Corporation, Japan; Jan Černocký, Brno University of Technology, Czechia
SLP-L23.3: AUDIO-VISUAL ACTIVE SPEAKER EXTRACTION FOR SPARSELY OVERLAPPED MULTI-TALKER SPEECH
Junjie Li, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China; Ruijie Tao, Zexu Pan, Meng Ge, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore; Shuai Wang, Haizhou Li, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China
SLP-L23.4: AUDIOVISUAL SPEAKER SEPARATION WITH FULL- AND SUB-BAND MODELING IN THE TIME-FREQUENCY DOMAIN
Vahid Ahmadi Kalkhorani, Ohio State University, United States of America; Anurag Kumar, Ke Tan, Buye Xu, Meta Reality Labs, United States of America; DeLiang Wang, Ohio State University, United States of America
SLP-L23.5: Combining Conformer and Dual-Path-Transformer Networks for Single Channel Noisy Reverberant Speech Separation
William Ravenscroft, Stefan Goetze, Thomas Hain, The University of Sheffield, United Kingdom of Great Britain and Northern Ireland
SLP-L23.6: Generation-based Target Speech Extraction with Speech Discretization and Vocoder
Linfeng Yu, Wangyou Zhang, Chenpeng Du, Leying Zhang, Zheng Liang, Yanmin Qian, Shanghai Jiao Tong University, China