SPE-P13: Speech Separation and Extraction III |
Session Type: Poster |
Time: Thursday, 7 May, 11:30 - 13:30 |
Location: On-Demand |
Session Chairs: Hakan Erdogan (Google) and Marc Delcroix (NTT)
|
SPE-P13.1: AN EMPIRICAL STUDY OF CONV-TASNET |
Berkan Kadioglu; Northeastern University |
Michael Horgan; Dolby Laboratories |
Xiaoyu Liu; Dolby Laboratories |
Jordi Pons; Dolby Laboratories |
Dan Darcy; Dolby Laboratories |
Vivek Kumar; Dolby Laboratories |
|
SPE-P13.2: MASK-DEPENDENT PHASE ESTIMATION FOR MONAURAL SPEAKER SEPARATION |
Zhaoheng Ni; Graduate Center, City University of New York |
Michael I Mandel; Brooklyn College, City University of New York |
|
SPE-P13.3: JOINT PHONEME ALIGNMENT AND TEXT-INFORMED SPEECH SEPARATION ON HIGHLY CORRUPTED SPEECH |
Kilian Schulze-Forster; LTCI, Télécom Paris, Institut Polytechnique de Paris |
Clement S. J. Doire; Audionamix |
Gaël Richard; LTCI, Télécom Paris, Institut Polytechnique de Paris |
Roland Badeau; LTCI, Télécom Paris, Institut Polytechnique de Paris |
|
SPE-P13.4: SINGLE-CHANNEL SPEECH SEPARATION INTEGRATING PITCH INFORMATION BASED ON A MULTI TASK LEARNING FRAMEWORK |
Xiang Li; Peking University |
Rui Liu; Peking University |
Tao Song; Peking University |
Xihong Wu; Peking University |
Jing Chen; Peking University |
|
SPE-P13.5: CONTINUOUS SPEECH SEPARATION: DATASET AND ANALYSIS |
Zhuo Chen; Microsoft |
Takuya Yoshioka; Microsoft |
Liang Lu; Microsoft |
Tianyan Zhou; Microsoft |
Zhong Meng; Microsoft |
Yi Luo; Microsoft |
Jian Wu; Microsoft |
Xiong Xiao; Microsoft |
Jinyu Li; Microsoft |
|
SPE-P13.6: THE SOUND OF MY VOICE: SPEAKER REPRESENTATION LOSS FOR TARGET VOICE SEPARATION |
Seongkyu Mun; Naver Corporation |
Soyeon Choe; Naver Corporation |
Jaesung Huh; Naver Corporation |
Joon Son Chung; Naver Corporation |
|
SPE-P13.7: SPEAKER-AWARE TARGET SPEAKER ENHANCEMENT BY JOINTLY LEARNING WITH SPEAKER EMBEDDING EXTRACTION |
Xuan Ji; Tencent |
Meng Yu; Tencent |
Chunlei Zhang; Tencent |
Dan Su; Tencent |
Tao Yu; Tencent |
Xiaoyu Liu; Tencent |
Dong Yu; Tencent |
|
SPE-P13.8: FAR-FIELD LOCATION GUIDED TARGET SPEECH EXTRACTION USING END-TO-END SPEECH RECOGNITION OBJECTIVES |
Aswin Shanmugam Subramanian; Johns Hopkins University |
Chao Weng; Tencent AI |
Meng Yu; Tencent AI |
Shi-Xiong Zhang; Tencent AI Lab |
Yong Xu; Tencent AI |
Shinji Watanabe; Johns Hopkins University |
Dong Yu; Tencent AI |
|
SPE-P13.9: A STUDY OF CHILD SPEECH EXTRACTION USING JOINT SPEECH ENHANCEMENT AND SEPARATION IN REALISTIC CONDITIONS |
Xin Wang; University of Science and Technology of China |
Jun Du; University of Science and Technology of China |
Alejandrina Cristia; Laboratoire de Sciences Cognitives et Psycholinguistique |
Lei Sun; University of Science and Technology of China |
Chin-Hui Lee; Georgia Institute of Technology |
|
SPE-P13.10: AN ANALYSIS OF SPEECH ENHANCEMENT AND RECOGNITION LOSSES IN LIMITED RESOURCES MULTI-TALKER SINGLE CHANNEL AUDIO-VISUAL ASR |
Luca Pasa; University of Padova |
Giovanni Morrone; University of Modena and Reggio Emilia |
Leonardo Badino; Istituto Italiano di Tecnologia (IIT) |
|
SPE-P13.11: DEEP AUDIO-VISUAL SPEECH SEPARATION WITH ATTENTION MECHANISM |
Chenda Li; Shanghai Jiao Tong University |
Yanmin Qian; Shanghai Jiao Tong University |
|
SPE-P13.12: ENHANCING END-TO-END MULTI-CHANNEL SPEECH SEPARATION VIA SPATIAL FEATURE LEARNING |
Rongzhi Gu; Peking University Shenzhen Graduate School |
Shi-Xiong Zhang; Tencent AI Lab |
Lianwu Chen; Tencent |
Yong Xu; Tencent |
Meng Yu; Tencent |
Dan Su; Tencent |
Yuexian Zou; Peking University Shenzhen Graduate School |
Dong Yu; Tencent |
|