SPE-L3: End-to-end Speech Recognition II: New Models
Session Type: Lecture
Time: Tuesday, 5 May, 16:30 - 18:30
Location: On-Demand
Session Chair: Tara Sainath, Google
SPE-L3.1: JOINT PHONEME-GRAPHEME MODEL FOR END-TO-END SPEECH RECOGNITION
Yotaro Kubo; Google |
Michiel Bacchiani; Google |
SPE-L3.2: QUARTZNET: DEEP AUTOMATIC SPEECH RECOGNITION WITH 1D TIME-CHANNEL SEPARABLE CONVOLUTIONS |
Samuel Kriman; University of Illinois at Urbana-Champaign |
Stanislav Beliaev; University of Saint Petersburg |
Boris Ginsburg; NVIDIA |
Jocelyn Huang; NVIDIA |
Oleksii Kuchaiev; NVIDIA |
Vitaly Lavrukhin; NVIDIA |
Ryan Leary; NVIDIA |
Jason Li; NVIDIA |
Yang Zhang; NVIDIA |
SPE-L3.3: END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION |
Anshuman Tripathi; Google |
Han Lu; Google |
Hasim Sak; Google |
SPE-L3.4: END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER |
Xuankai Chang; Johns Hopkins University |
Wangyou Zhang; Shanghai Jiao Tong University |
Yanmin Qian; Shanghai Jiao Tong University |
Jonathan Le Roux; Mitsubishi Electric Research Laboratories (MERL) |
Shinji Watanabe; Johns Hopkins University |
SPE-L3.5: HYBRID AUTOREGRESSIVE TRANSDUCER (HAT) |
Ehsan Variani; Google |
David Rybach; Google |
Cyril Allauzen; Google |
Michael Riley; Google |
SPE-L3.6: LIGHTWEIGHT AND EFFICIENT END-TO-END SPEECH RECOGNITION USING LOW-RANK TRANSFORMER |
Genta Indra Winata; Hong Kong University of Science and Technology |
Samuel Cahyawijaya; Hong Kong University of Science and Technology |
Zhaojiang Lin; Hong Kong University of Science and Technology |
Zihan Liu; Hong Kong University of Science and Technology |
Pascale Fung; Hong Kong University of Science and Technology |