SPE-P8: Robust Speech Recognition |
Session Type: Poster |
Time: Wednesday, 6 May, 11:30 - 13:30 |
Location: On-Demand |
Session Chairs: Ozlem Kalinli, Apple and Ebru Arisoy-Saraclar, MEF University |
SPE-P8.1: IMPROVING REVERBERANT SPEECH TRAINING USING DIFFUSE ACOUSTIC SIMULATION |
Zhenyu Tang; University of Maryland |
Lianwu Chen; Tencent AI Lab |
Bo Wu; Tencent AI Lab |
Dong Yu; Tencent AI Lab |
Dinesh Manocha; University of Maryland |
SPE-P8.2: LOW-FREQUENCY COMPENSATED SYNTHETIC IMPULSE RESPONSES FOR IMPROVED FAR-FIELD SPEECH RECOGNITION |
Zhenyu Tang; University of Maryland |
Hsien-Yu Meng; University of Maryland |
Dinesh Manocha; University of Maryland |
SPE-P8.3: AIPNET: GENERATIVE ADVERSARIAL PRE-TRAINING OF ACCENT-INVARIANT NETWORKS FOR END-TO-END SPEECH RECOGNITION |
Yi-Chen Chen; National Taiwan University |
Zhaojun Yang; Facebook |
Ching-Feng Yeh; Facebook |
Mahaveer Jain; Facebook |
Michael L. Seltzer; Facebook |
SPE-P8.4: AUDIO-VISUAL RECOGNITION OF OVERLAPPED SPEECH FOR THE LRS2 DATASET |
Jianwei Yu; Chinese University of Hong Kong |
Shi-Xiong Zhang; Tencent AI Lab |
Jian Wu; Tencent |
Shahram Ghorbani; University of Texas at Dallas |
Bo Wu; Tencent |
Shiyin Kang; Tencent |
Shansong Liu; Chinese University of Hong Kong |
Xunying Liu; Chinese University of Hong Kong |
Helen Meng; Chinese University of Hong Kong |
Dong Yu; Tencent |
SPE-P8.5: MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION |
Mirco Ravanelli; Université de Montréal |
Jianyuan Zhong; University of Rochester |
Santiago Pascual; Universitat Politecnica de Catalunya |
Pawel Swietojanski; University of New South Wales |
Joao Monteiro; Institut National de la Recherche Scientifique/Computer Research Institute of Montréal |
Jan Trmal; Johns Hopkins University |
Yoshua Bengio; Université de Montréal |
SPE-P8.6: END-TO-END MULTI-PERSON AUDIO/VISUAL AUTOMATIC SPEECH RECOGNITION |
Otavio Braga; Google |
Takaki Makino; Google |
Olivier Siohan; Google |
Hank Liao; Google |
SPE-P8.7: END-TO-END AUTOMATIC SPEECH RECOGNITION INTEGRATED WITH CTC-BASED VOICE ACTIVITY DETECTION |
Takenori Yoshimura; Nagoya University |
Tomoki Hayashi; Nagoya University |
Kazuya Takeda; Nagoya University |
Shinji Watanabe; Johns Hopkins University |
SPE-P8.8: END-TO-END TRAINING OF TIME DOMAIN AUDIO SEPARATION AND RECOGNITION |
Thilo von Neumann; Paderborn University |
Keisuke Kinoshita; NTT |
Lukas Drude; Paderborn University |
Christoph Boeddeker; Paderborn University |
Marc Delcroix; NTT |
Tomohiro Nakatani; NTT |
Reinhold Haeb-Umbach; Paderborn University |
SPE-P8.9: IMPROVING NOISE ROBUST AUTOMATIC SPEECH RECOGNITION WITH SINGLE-CHANNEL TIME-DOMAIN ENHANCEMENT NETWORK |
Keisuke Kinoshita; NTT |
Tsubasa Ochiai; NTT |
Marc Delcroix; NTT |
Tomohiro Nakatani; NTT |
SPE-P8.10: A PRACTICAL TWO-STAGE TRAINING STRATEGY FOR MULTI-STREAM END-TO-END SPEECH RECOGNITION |
Ruizhi Li; Johns Hopkins University |
Gregory Sell; Johns Hopkins University |
Xiaofei Wang; Microsoft |
Shinji Watanabe; Johns Hopkins University |
Hynek Hermansky; Johns Hopkins University |
SPE-P8.11: MULTI-SCALE OCTAVE CONVOLUTIONS FOR ROBUST SPEECH RECOGNITION |
Joanna Rownicka; University of Edinburgh |
Peter Bell; University of Edinburgh |
Steve Renals; University of Edinburgh |
SPE-P8.12: LEARNING NOISE INVARIANT FEATURES THROUGH TRANSFER LEARNING FOR ROBUST END-TO-END SPEECH RECOGNITION |
Shucong Zhang; University of Edinburgh |
Cong-Thanh Do; Toshiba Research Europe Limited Company |
Rama Doddipatla; Toshiba Research Europe Limited Company |
Steve Renals; University of Edinburgh |