SPE-P20: Speech Recognition: Acoustic Modelling II |
| Session Type: Poster |
| Time: Friday, 8 May, 11:45 - 13:45 |
| Location: On-Demand |
| Session Chairs: Dorothea Kolossa, Ruhr University Bochum and Arun Narayanan, Google |
| SPE-P20.1: LIBRI-LIGHT: A BENCHMARK FOR ASR WITH LIMITED OR NO SUPERVISION |
| Jacob Kahn; Facebook |
| Morgane Rivière; Facebook |
| Weiyi Zheng; Facebook |
| Eugene Kharitonov; Facebook |
| Qiantong Xu; Facebook |
| Pierre-Emmanuel Mazaré; Facebook |
| Julien Karadayi; ENS |
| Vitaly Liptchinsky; Facebook |
| Ronan Collobert; Facebook |
| Christian Fuegen; Facebook |
| Tatiana Likhomanenko; Facebook |
| Gabriel Synnaeve; Facebook |
| Armand Joulin; Facebook |
| Abdelrahman Mohamed; Facebook |
| Emmanuel Dupoux; Facebook / EHESS |
| SPE-P20.2: A COMPREHENSIVE STUDY OF RESIDUAL CNNS FOR ACOUSTIC MODELING IN ASR |
| Vitalii Bozheniuk; RWTH Aachen University |
| Albert Zeyer; RWTH Aachen University |
| Ralf Schlüter; RWTH Aachen University |
| Hermann Ney; RWTH Aachen University |
| SPE-P20.3: LAYER-NORMALIZED LSTM FOR HYBRID-HMM AND END-TO-END ASR |
| Mohammad Zeineldeen; RWTH Aachen University |
| Albert Zeyer; RWTH Aachen University |
| Ralf Schlüter; RWTH Aachen University |
| Hermann Ney; RWTH Aachen University |
| SPE-P20.4: SMALL ENERGY MASKING FOR IMPROVED NEURAL NETWORK TRAINING FOR END-TO-END SPEECH RECOGNITION |
| Chanwoo Kim; Samsung Research |
| Kwangyoun Kim; Samsung Research |
| Sathish Reddy Indurthi; Samsung Research |
| SPE-P20.5: IMPROVING SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION TRAINING WITH ON-THE-FLY DATA AUGMENTATION |
| Thai Son Nguyen; Karlsruhe Institute of Technology |
| Sebastian Stueker; Karlsruhe Institute of Technology |
| Jan Niehues; Maastricht University |
| Alex Waibel; Karlsruhe Institute of Technology |
| SPE-P20.6: EFFECTIVENESS OF SELF-SUPERVISED PRE-TRAINING FOR ASR |
| Alexei Baevski; Facebook |
| Abdelrahman Mohamed; Facebook |
| SPE-P20.7: HIGH-ACCURACY AND LOW-LATENCY SPEECH RECOGNITION WITH TWO-HEAD CONTEXTUAL LAYER TRAJECTORY LSTM MODEL |
| Jinyu Li; Microsoft |
| Rui Zhao; Microsoft |
| Eric Sun; Microsoft |
| Jeremy Wong; Microsoft |
| Amit Das; Microsoft |
| Zhong Meng; Microsoft |
| Yifan Gong; Microsoft |
| SPE-P20.8: DFSMN-SAN WITH PERSISTENT MEMORY MODEL FOR AUTOMATIC SPEECH RECOGNITION |
| Zhao You; Tencent |
| Dan Su; Tencent |
| Jie Chen; Tencent |
| Chao Weng; Tencent |
| Dong Yu; Tencent |
| SPE-P20.9: DYNAMIC TEMPORAL RESIDUAL LEARNING FOR SPEECH RECOGNITION |
| Jiaqi Xie; Tsinghua University |
| Ruijie Yan; Tsinghua University |
| Shanyu Xiao; Tsinghua University |
| Liangrui Peng; Tsinghua University |
| Michael T. Johnson; University of Kentucky |
| Wei-Qiang Zhang; Tsinghua University |
| SPE-P20.10: E2E-SINCNET: TOWARD FULLY END-TO-END SPEECH RECOGNITION |
| Titouan Parcollet; University of Oxford |
| Mohamed Morchid; University of Avignon |
| Georges Linarès; University of Avignon |
| SPE-P20.11: SPEAKER AUGMENTATION FOR LOW RESOURCE SPEECH RECOGNITION |
| Chenpeng Du; Shanghai Jiao Tong University |
| Kai Yu; Shanghai Jiao Tong University |
| SPE-P20.12: CGCNN: COMPLEX GABOR CONVOLUTIONAL NEURAL NETWORK ON RAW SPEECH |
| Paul-Gauthier Noé; Avignon Université |
| Titouan Parcollet; University of Oxford |
| Mohamed Morchid; Avignon Université |