SPE-P8: Robust Speech Recognition |
| Session Type: Poster |
| Time: Wednesday, 6 May, 11:30 - 13:30 |
| Location: On-Demand |
| Session Chairs: Ozlem Kalinli, Apple and Ebru Arisoy-Saraclar, MEF University |
| SPE-P8.1: IMPROVING REVERBERANT SPEECH TRAINING USING DIFFUSE ACOUSTIC SIMULATION |
| Zhenyu Tang; University of Maryland |
| Lianwu Chen; Tencent AI Lab |
| Bo Wu; Tencent AI Lab |
| Dong Yu; Tencent AI Lab |
| Dinesh Manocha; University of Maryland |
| SPE-P8.2: LOW-FREQUENCY COMPENSATED SYNTHETIC IMPULSE RESPONSES FOR IMPROVED FAR-FIELD SPEECH RECOGNITION |
| Zhenyu Tang; University of Maryland |
| Hsien-Yu Meng; University of Maryland |
| Dinesh Manocha; University of Maryland |
| SPE-P8.3: AIPNET: GENERATIVE ADVERSARIAL PRE-TRAINING OF ACCENT-INVARIANT NETWORKS FOR END-TO-END SPEECH RECOGNITION |
| Yi-Chen Chen; National Taiwan University |
| Zhaojun Yang; Facebook |
| Ching-Feng Yeh; Facebook |
| Mahaveer Jain; Facebook |
| Michael L. Seltzer; Facebook |
| SPE-P8.4: AUDIO-VISUAL RECOGNITION OF OVERLAPPED SPEECH FOR THE LRS2 DATASET |
| Jianwei Yu; Chinese University of Hong Kong |
| Shi-Xiong Zhang; Tencent AI Lab |
| Jian Wu; Tencent |
| Shahram Ghorbani; University of Texas at Dallas |
| Bo Wu; Tencent |
| Shiyin Kang; Tencent |
| Shansong Liu; Chinese University of Hong Kong |
| Xunying Liu; Chinese University of Hong Kong |
| Helen Meng; Chinese University of Hong Kong |
| Dong Yu; Tencent |
| SPE-P8.5: MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION |
| Mirco Ravanelli; Université de Montréal |
| Jianyuan Zhong; University of Rochester |
| Santiago Pascual; Universitat Politecnica de Catalunya |
| Pawel Swietojanski; University of New South Wales |
| Joao Monteiro; Institut National de la Recherche Scientifique/Computer Research Institute of Montréal |
| Jan Trmal; Johns Hopkins University |
| Yoshua Bengio; Université de Montréal |
| SPE-P8.6: END-TO-END MULTI-PERSON AUDIO/VISUAL AUTOMATIC SPEECH RECOGNITION |
| Otavio Braga; Google |
| Takaki Makino; Google |
| Olivier Siohan; Google |
| Hank Liao; Google |
| SPE-P8.7: END-TO-END AUTOMATIC SPEECH RECOGNITION INTEGRATED WITH CTC-BASED VOICE ACTIVITY DETECTION |
| Takenori Yoshimura; Nagoya University |
| Tomoki Hayashi; Nagoya University |
| Kazuya Takeda; Nagoya University |
| Shinji Watanabe; Johns Hopkins University |
| SPE-P8.8: END-TO-END TRAINING OF TIME DOMAIN AUDIO SEPARATION AND RECOGNITION |
| Thilo von Neumann; Paderborn University |
| Keisuke Kinoshita; NTT |
| Lukas Drude; Paderborn University |
| Christoph Boeddeker; Paderborn University |
| Marc Delcroix; NTT |
| Tomohiro Nakatani; NTT |
| Reinhold Haeb-Umbach; Paderborn University |
| SPE-P8.9: IMPROVING NOISE ROBUST AUTOMATIC SPEECH RECOGNITION WITH SINGLE-CHANNEL TIME-DOMAIN ENHANCEMENT NETWORK |
| Keisuke Kinoshita; NTT |
| Tsubasa Ochiai; NTT |
| Marc Delcroix; NTT |
| Tomohiro Nakatani; NTT |
| SPE-P8.10: A PRACTICAL TWO-STAGE TRAINING STRATEGY FOR MULTI-STREAM END-TO-END SPEECH RECOGNITION |
| Ruizhi Li; Johns Hopkins University |
| Gregory Sell; Johns Hopkins University |
| Xiaofei Wang; Microsoft |
| Shinji Watanabe; Johns Hopkins University |
| Hynek Hermansky; Johns Hopkins University |
| SPE-P8.11: MULTI-SCALE OCTAVE CONVOLUTIONS FOR ROBUST SPEECH RECOGNITION |
| Joanna Rownicka; University of Edinburgh |
| Peter Bell; University of Edinburgh |
| Steve Renals; University of Edinburgh |
| SPE-P8.12: LEARNING NOISE INVARIANT FEATURES THROUGH TRANSFER LEARNING FOR ROBUST END-TO-END SPEECH RECOGNITION |
| Shucong Zhang; University of Edinburgh |
| Cong-Thanh Do; Toshiba Research Europe Limited Company |
| Rama Doddipatla; Toshiba Research Europe Limited Company |
| Steve Renals; University of Edinburgh |