SPE-P9: End-to-end Speech Recognition III: General Topics |
Session Type: Poster |
Time: Wednesday, 6 May, 16:30 - 18:30 |
Location: On-Demand |
Session Chairs: Takaaki Hori, MERL and Yifan Gong, Microsoft
|
SPE-P9.1: IMPROVING SPEECH RECOGNITION USING CONSISTENT PREDICTIONS ON SYNTHESIZED SPEECH |
Gary Wang; Simon Fraser University |
Andrew Rosenberg; Google |
Zhehuai Chen; Google |
Yu Zhang; Google |
Bhuvana Ramabhadran; Google |
Yonghui Wu; Google |
Pedro Moreno; Google |
|
SPE-P9.2: ATTENTION-BASED ASR WITH LIGHTWEIGHT AND DYNAMIC CONVOLUTIONS |
Yuya Fujita; Yahoo Japan Corporation |
Aswin Shanmugam Subramanian; Johns Hopkins University |
Motoi Omachi; Yahoo Japan Corporation |
Shinji Watanabe; Johns Hopkins University |
|
SPE-P9.3: AN ATTENTION-BASED JOINT ACOUSTIC AND TEXT ON-DEVICE END-TO-END MODEL |
Tara Sainath; Google, Inc. |
Ruoming Pang; Google, Inc. |
Ron Weiss; Google, Inc. |
Yanzhang He; Google, Inc. |
Chung-cheng Chiu; Google, Inc. |
Trevor Strohman; Google, Inc. |
|
SPE-P9.4: STRUCTURED SPARSE ATTENTION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION |
Jiabin Xue; Harbin Institute of Technology |
Tieran Zheng; Harbin Institute of Technology |
Jiqing Han; Harbin Institute of Technology |
|
SPE-P9.5: RNN-TRANSDUCER WITH STATELESS PREDICTION NETWORK |
Mohammadreza Ghodsi; Google |
Xiaofeng Liu; Google |
James Apfel; Google |
Rodrigo Cabrera; Google |
Eugene Weinstein; Google |
|
SPE-P9.6: SEQUENCE-LEVEL CONSISTENCY TRAINING FOR SEMI-SUPERVISED END-TO-END AUTOMATIC SPEECH RECOGNITION |
Ryo Masumura; NTT Corporation |
Mana Ihori; NTT Corporation |
Akihiko Takashima; NTT Corporation |
Takafumi Moriya; NTT Corporation |
Atsushi Ando; NTT Corporation |
Yusuke Shinohara; NTT Corporation |
|
SPE-P9.7: INDEPENDENT LANGUAGE MODELING ARCHITECTURE FOR END-TO-END ASR |
Van Tung Pham; Nanyang Technological University |
Haihua Xu; Nanyang Technological University |
Yerbolat Khassanov; Nazarbayev University |
Zhiping Zeng; Nanyang Technological University |
Eng Siong Chng; Nanyang Technological University |
Chongjia Ni; Alibaba Group |
Bin Ma; Alibaba Group |
Haizhou Li; National University of Singapore |
|
SPE-P9.8: SPEAKER-AWARE TRAINING OF ATTENTION-BASED END-TO-END SPEECH RECOGNITION USING NEURAL SPEAKER EMBEDDINGS |
Aku Rouhe; Aalto University |
Tuomas Kaseva; Aalto University |
Mikko Kurimo; Aalto University |
|
SPE-P9.9: GENERATING SYNTHETIC AUDIO DATA FOR ATTENTION-BASED SPEECH RECOGNITION SYSTEMS |
Nick Rossenbach; RWTH Aachen University |
Albert Zeyer; RWTH Aachen University |
Ralf Schlüter; RWTH Aachen University |
Hermann Ney; RWTH Aachen University |
|
SPE-P9.10: CORRECTION OF AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMER SEQUENCE-TO-SEQUENCE MODEL |
Oleksii Hrinchuk; Moscow Institute of Physics and Technology and NVIDIA |
Mariya Popova; Carnegie Mellon University and NVIDIA |
Boris Ginsburg; NVIDIA |
|
SPE-P9.11: EXPLORING PRE-TRAINING WITH ALIGNMENTS FOR RNN TRANSDUCER BASED END-TO-END SPEECH RECOGNITION |
Hu Hu; Georgia Institute of Technology |
Rui Zhao; Microsoft |
Jinyu Li; Microsoft |
Liang Lu; Microsoft |
Yifan Gong; Microsoft |
|
SPE-P9.12: SELF-TRAINING FOR END-TO-END SPEECH RECOGNITION |
Jacob Kahn; Facebook |
Ann Lee; Facebook |
Awni Hannun; Facebook |
|