OD-SLA-3.1

A TARGET SPEAKER SEPARATION NEURAL NETWORK WITH JOINT-TRAINING

Wenjing Yang, Jing Wang, Kai Qian, Shenghua Hu, Beijing Institute of Technology, China; Hongfeng Li, Na Xu, Fei Xiang, Xiaomi, China

Session:
Speech Enhancement and Separation

Track:
Speech, Language, and Audio (SLA)

Session Time:
Thu, 16 Dec, 09:00 - 11:00 Japan Standard Time (UTC +9)
Thu, 16 Dec, 00:00 - 02:00 Coordinated Universal Time
Wed, 15 Dec, 19:00 - 21:00 Eastern Standard Time (UTC -5)
Wed, 15 Dec, 16:00 - 18:00 Pacific Standard Time (UTC -8)

Session Chair:
Robin Scheibler, LINE Corporation
Session OD-SLA-3
TH1.OD-A.1: A TARGET SPEAKER SEPARATION NEURAL NETWORK WITH JOINT-TRAINING
Wenjing Yang, Jing Wang, Kai Qian, Shenghua Hu, Beijing Institute of Technology, China; Hongfeng Li, Na Xu, Fei Xiang, Xiaomi, China
TH1.OD-A.2: Improvement of Spatial Ambiguity in Multi-Channel Speech Separation Using Channel Attention
Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, National Cheng Kung University and Academia Sinica, Taiwan; Thanh Binh Nguyen, National Cheng Kung University, Viet Nam
TH1.OD-A.3: NOISE-TOLERANT TIME-DOMAIN SPEECH SEPARATION WITH NOISE BASES
Kohei Ozamoto, Kuniaki Uto, Koichi Shinoda, Tokyo Institute of Technology, Japan; Koji Iwano, Tokyo City University, Japan
TH1.OD-A.4: Minimum-volume regularized ILRMA for blind audio source separation
Jianyu Wang, Shanzheng Guan, Xiao-Lei Zhang, Northwestern Polytechnical University, China
TH1.OD-A.5: A comparison of handcrafted, parameterized, and learnable features for speech separation
Wenbo Zhu, Mou Wang, Xiao-Lei Zhang, Susanto Rahardja, Northwestern Polytechnical University, China
TH1.OD-A.6: Over-Determined Semi-Blind Speech Source Separation
Masahito Togami, Robin Scheibler, LINE Corporation, Japan
TH1.OD-A.7: GROUP MULTI-SCALE CONVOLUTIONAL NETWORK FOR MONAURAL SPEECH ENHANCEMENT IN TIME-DOMAIN
Juntao Yu, Jiang Ting, Jiacheng Yu, Beijing University of Posts and Telecommunications, China
TH1.OD-A.8: PRIOR DISTRIBUTION DESIGN FOR MUSIC BLEEDING-SOUND REDUCTION BASED ON NONNEGATIVE MATRIX FACTORIZATION
Yusaku Mizobuchi, Daichi Kitamura, National Institute of Technology, Kagawa College, Japan; Tomohiko Nakamura, Hiroshi Saruwatari, The University of Tokyo, Japan; Yu Takahashi, Kazunobu Kondo, Yamaha Corporation, Japan
TH1.OD-A.9: A STUDY ON SPEECH ENHANCEMENT BASED ON DIFFUSION PROBABILISTIC MODEL
Yen-Ju Lu, Yu Tsao, Academia Sinica, Taiwan; Shinji Watanabe, Carnegie Mellon University, United States of America
TH1.OD-A.10: A Deep Analysis of Speech Separation Guided Diarization Under Realistic Conditions
Xin Fang, Zhen-hua Ling, Shu-Tong Niu, Jun Du, University of Science and Technology of China, China; Lei Sun, Cong Liu, Zhi-chao Sheng, iFlytek Research, China
TH1.OD-A.11: Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting
Qijie Shao, Jingyong Hou, Yanxin Hu, Qing Wang, Lei Xie, Northwestern Polytechnical University, China; Xin Lei, Mobvoi, United States of America
TH1.OD-A.12: Time Domain Speech Enhancement With Attentive Multi-scale Approach
Chen Chen, Nana Hou, Eng Siong Chng, Nanyang Technological University, Singapore; Duo Ma, National University of Singapore, Singapore
TH1.OD-A.13: On Speech Sparsity for Computational Efficiency and Noise Reduction in Hearing Aids
Adrien Llave, Simon Leglaive, CentraleSupélec, IETR, France
TH1.OD-A.14: SPARSELY OVERLAPPED SPEECH TRAINING IN THE TIME DOMAIN: JOINT LEARNING OF TARGET SPEECH SEPARATION AND PERSONAL VAD BENEFITS
Qingjian Lin, Lin Yang, Xuyang Wang, Luyuan Xie, Chen Jia, Junjie Wang, Lenovo, China