IEEE ICASSP 2026 || Barcelona, Spain || 4-8 May 2026

SLP-P46.10

TWO-STAGE AUDIO-VISUAL TARGET SPEAKER EXTRACTION SYSTEM FOR REAL-TIME PROCESSING ON EDGE DEVICE

Zixuan Li, Xueliang Zhang, Inner Mongolia University, China; Lei Miao, Zhipeng Yan, Ying Sun, Chong Zhu, Lenovo, China

Session: SLP-P46: Speech Enhancement for Real-World Applications Poster

Location: Poster Area 31

Presentation Time: Thu, 7 May, 16:30 - 18:30

Session Co-Chairs: John Murray-Bruce, University of South Florida; Robin Scheibler, Google DeepMind

Session SLP-P46

SLP-P46.1: TEST-TIME ADAPTATION FOR SPEECH ENHANCEMENT VIA MASK POLARIZATION

Tobias Raichle, Erfan Amini, Bin Yang, University of Stuttgart, Germany

SLP-P46.2: LExTra: Folded Prompt and Split-Role Attention for Target Speaker Extraction

Pengjie Shen, Inner Mongolia University, China; Shulin He, Southern University of Science and Technology, China; Xueliang Zhang, Inner Mongolia University, China; Zhong-Qiu Wang, Southern University of Science and Technology, China

SLP-P46.3: TOWARDS NOISE-ROBUST SPEECH INVERSION THROUGH MULTI-TASK LEARNING WITH SPEECH ENHANCEMENT

Saba Tabatabaee, Carol Espy-Wilson, University of Maryland, United States of America

SLP-P46.4: ENSEMBLE FOR REDUCING TARGET SPEECH EXTRACTION ERRORS

Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Takanori Ashihara, Naohiro Tawara, Tomohiro Nakatani, NTT, Inc., Japan

SLP-P46.5: LOW-POWER END-TO-END COCHLEAR IMPLANT SPEECH DENOISING WITH SPIKING NEURAL NETWORKS

Ludovic Boulanger, Sean U N Wood, University of Sherbrooke, Canada

SLP-P46.6: NLDSI-BWE: NON LINEAR DYNAMICAL SYSTEMS-INSPIRED MULTI RESOLUTION DISCRIMINATORS FOR SPEECH BANDWIDTH EXTENSION

Tarikul Islam Tamiti, Anomadarshi Barua, George Mason University, United States of America

SLP-P46.7: GENERATING TRAINING TARGETS FOR REAL-WORLD SPEECH ENHANCEMENT VIA CLOSE-TO-DISTANT MICROPHONE PROJECTION

Tomohiro Nakatani, Rintaro Ikeshita, Naoyuki Kamo, Marc Delcroix, Shoko Araki, NTT, Inc., Japan

SLP-P46.8: JUND-F0: A Novel Deep Learning Framework for Joint Unvoiced/Voiced Detection and F0 Estimation

Yuang Chen, Rui Feng, Yin-Long Liu, Yu Hu, Jiahong Yuan, University of Science and Technology of China, China

SLP-P46.9: Joint Enhancement and Bandwidth Extension for Radar Through-Barrier Speech Acquisition

Zhi-Wei Tan, V. G. Reju, Ritesh Chandra Tewari, Ruotong Ding, Nanyang Technological University, Singapore; Andy W. H. Khong, Nanyang Technological University; Lee Kong Chian School of Medicine, Singapore

SLP-P46.10: TWO-STAGE AUDIO-VISUAL TARGET SPEAKER EXTRACTION SYSTEM FOR REAL-TIME PROCESSING ON EDGE DEVICE

Zixuan Li, Xueliang Zhang, Inner Mongolia University, China; Lei Miao, Zhipeng Yan, Ying Sun, Chong Zhu, Lenovo, China

©2026 IEEE – All rights reserved.

Last updated 22 April 2026.
