ASR-1: Automatic Speech Recognition I |
Session Type: Poster |
Time: Sunday, December 15, 10:30 - 12:00 |
Location: VHS Event Centre, Level 1 |
Session Chair: Koichi Shinoda, Tokyo Institute of Technology
|
ASR-1.1: INCREMENTAL LATTICE DETERMINIZATION FOR WFST DECODERS |
Zhehuai Chen, Shanghai Jiao Tong University, China; Mahsa Yarmohammadi, Hainan Xu, Johns Hopkins University, United States; Hang Lv, Lei Xie, Northwestern Polytechnical University, China; Daniel Povey, Sanjeev Khudanpur, Johns Hopkins University, United States |
|
ASR-1.2: A COMPARISON OF TRANSFORMER AND LSTM ENCODER DECODER MODELS FOR ASR |
Albert Zeyer, Parnia Bahar, Kazuki Irie, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany |
|
ASR-1.3: A DROPOUT-BASED SINGLE MODEL COMMITTEE APPROACH FOR ACTIVE LEARNING IN ASR |
Jiayi Fu, Kuang Ru, Zhuiyi Technology Company, China |
|
ASR-1.4: PERSONALIZATION OF END-TO-END SPEECH RECOGNITION ON MOBILE DEVICES FOR NAMED ENTITIES |
Khe Chai Sim, Francoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou, Google, United States |
|
ASR-1.5: SIMULTANEOUS SPEECH RECOGNITION AND SPEAKER DIARIZATION FOR MONAURAL DIALOGUE RECORDINGS WITH TARGET-SPEAKER ACOUSTIC MODELS |
Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Hitachi, Ltd., Japan; Shinji Watanabe, Johns Hopkins University, United States |
|
ASR-1.6: INTEGRATING SOURCE-CHANNEL AND ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION |
Qiujia Li, Chao Zhang, Phil Woodland, University of Cambridge, United Kingdom |
|
ASR-1.7: AN INVESTIGATION INTO THE EFFECTIVENESS OF ENHANCEMENT IN ASR TRAINING AND TEST FOR CHIME-5 DINNER PARTY TRANSCRIPTION |
Catalin Zorila, Toshiba Cambridge Research Laboratory, United Kingdom; Christoph Boeddeker, Paderborn University, Germany; Rama Doddipatla, Toshiba Cambridge Research Laboratory, United Kingdom; Reinhold Haeb-Umbach, Paderborn University, Germany |
|
ASR-1.8: STATE-OF-THE-ART SPEECH RECOGNITION USING MULTI-STREAM SELF-ATTENTION WITH DILATED 1D CONVOLUTIONS |
Kyu Han, Ramon Prieto, Tao Ma, ASAPP, Inc., United States |
|
ASR-1.9: HIGHLY EFFICIENT NEURAL NETWORK LANGUAGE MODEL COMPRESSION USING SOFT BINARIZATION TRAINING |
Rao Ma, Qi Liu, Kai Yu, Shanghai Jiao Tong University, China |
|
ASR-1.10: IMPROVED MULTI-STAGE TRAINING OF ONLINE ATTENTION-BASED ENCODER-DECODER MODELS |
Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim, Samsung Research, Korea (South) |
|
ASR-1.11: LEAD2GOLD: TOWARDS EXPLOITING THE FULL POTENTIAL OF NOISY TRANSCRIPTIONS FOR SPEECH RECOGNITION |
Adrien Dufraux, Facebook AI Research, France; Emmanuel Vincent, INRIA, France; Awni Hannun, Facebook AI Research, United States; Armelle Brun, Université de Lorraine, France; Matthijs Douze, Facebook AI Research, France |
|
ASR-1.12: ORTHOGONALITY CONSTRAINED MULTI-HEAD ATTENTION FOR KEYWORD SPOTTING |
Mingu Lee, Jinkyu Lee, Hye Jin Jang, Byeonggeun Kim, Wonil Chang, Kyuwoong Hwang, Qualcomm AI Research, Korea (South) |
|
ASR-1.13: LEARNING BETWEEN DIFFERENT TEACHER AND STUDENT MODELS IN ASR |
Jeremy Heng Meng Wong, Microsoft, United States; Mark John Francis Gales, Yu Wang, University of Cambridge, United Kingdom |
|
ASR-1.14: A UNIFIED ENDPOINTER USING MULTITASK AND MULTIDOMAIN TRAINING |
Shuo-Yiin Chang, Bo Li, Gabor Simko, Google, United States |
|
ASR-1.15: DOMAIN EXPANSION IN DNN-BASED ACOUSTIC MODELS FOR ROBUST SPEECH RECOGNITION |
Shahram Ghorbani, Soheil Khorram, John H.L. Hansen, University of Texas at Dallas, United States |
|
ASR-1.16: IMPROVING RNN TRANSDUCER MODELING FOR END-TO-END SPEECH RECOGNITION |
Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong, Microsoft, United States |
|
ASR-1.17: SIMPLE GATED CONVNET FOR SMALL FOOTPRINT ACOUSTIC MODELING |
Lukas Lee, Jinhwan Park, Wonyong Sung, Seoul National University, Korea (South) |
|
ASR-1.18: GANS FOR CHILDREN: A GENERATIVE DATA AUGMENTATION STRATEGY FOR CHILDREN SPEECH RECOGNITION |
Peiyao Sheng, Zhuolin Yang, Yanmin Qian, Shanghai Jiao Tong University, China |
|
ASR-1.19: ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT |
Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Johns Hopkins University, United States; Hang Lv, Northwestern Polytechnical University, China; Yiwen Shao, Johns Hopkins University, United States; Nanyun Peng, University of Southern California, United States; Lei Xie, Northwestern Polytechnical University, China; Shinji Watanabe, Sanjeev Khudanpur, Johns Hopkins University, United States |
|