ASR-3: Automatic Speech Recognition III
| Session Type: Poster
| Time: Wednesday, December 18, 10:30 - 12:00
| Location: VHS Event Centre, Level 1
| Session Chair: Rohan Kumar Das, National University of Singapore
| ASR-3.1: SEMI-SUPERVISED TRAINING AND DATA AUGMENTATION FOR ADAPTATION OF AUTOMATIC BROADCAST NEWS CAPTIONING SYSTEMS |
| Yinghui Huang, Samuel Thomas, Masayuki Suzuki, Zoltan Tuske, Larry Sansone, Michael Picheny, IBM, United States |
| ASR-3.2: ONLINE BATCH NORMALIZATION ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION |
| Franco Mana, Felix Weninger, Roberto Gemello, Puming Zhan, Nuance Communications, Italy
| ASR-3.3: SPEAKER ADAPTIVE TRAINING USING MODEL AGNOSTIC META-LEARNING |
| Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals, University of Edinburgh, United Kingdom |
| ASR-3.4: A COMPARISON OF END-TO-END MODELS FOR LONG-FORM SPEECH RECOGNITION |
| Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Google Inc, United States; Patrick Nguyen, Grab Technologies, United States; Hagen Soltau, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu, Google Inc, United States |
| ASR-3.5: ACOUSTIC MODEL ADAPTATION FROM RAW WAVEFORMS WITH SINCNET |
| Joachim Fainberg, Ondrej Klejch, Erfan Loweimi, Peter Bell, Steve Renals, University of Edinburgh, United Kingdom |
| ASR-3.6: RECURRENT NEURAL NETWORK TRANSDUCER FOR AUDIO-VISUAL SPEECH RECOGNITION |
| Takaki Makino, Hank Liao, Google Inc., United States; Yannis Assael, Brendan Shillingford, DeepMind, United Kingdom; Basilio Garcia, Otavio Braga, Olivier Siohan, Google Inc., United States |
| ASR-3.7: EXPLICIT ALIGNMENT OF TEXT AND SPEECH ENCODINGS FOR ATTENTION-BASED END-TO-END SPEECH RECOGNITION |
| Jennifer Drexler, James Glass, Massachusetts Institute of Technology, United States |
| ASR-3.8: RECOGNIZING LONG-FORM SPEECH USING STREAMING END-TO-END MODELS |
| Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara Sainath, Trevor Strohman, Google Inc., United States |
| ASR-3.9: LEVERAGING LANGUAGE ID IN MULTILINGUAL END-TO-END SPEECH RECOGNITION |
| Austin Waters, Neeraj Gaur, Parisa Haghani, Pedro Moreno, Zhongdi Qu, Google, United States |
| ASR-3.10: STREAMING END-TO-END SPEECH RECOGNITION WITH JOINT CTC-ATTENTION BASED MODELS |
| Niko Moritz, Takaaki Hori, Jonathan Le Roux, Mitsubishi Electric Research Laboratories (MERL), United States |
| ASR-3.11: MONOTONIC RECURRENT NEURAL NETWORK TRANSDUCER AND DECODING STRATEGIES |
| Anshuman Tripathi, Han Lu, Hasim Sak, Hagen Soltau, Google, United States |
| ASR-3.12: CHARACTER-AWARE ATTENTION-BASED END-TO-END SPEECH RECOGNITION |
| Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong, Microsoft Corporation, United States |
| ASR-3.13: ATTENTION BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH LARGE SPEECH CORPUS |
| Kwangyoun Kim, Kyungmin Lee, Dhananjaya Gowda, Junmo Park, Sungsoo Kim, Sichen Jin, Young-Yoon Lee, Jinsu Yeo, Daehyun Kim, Seokyeong Jung, Jungin Lee, Myoungji Han, Chanwoo Kim, Samsung Electronics, Korea (South) |
| ASR-3.14: ZERO-SHOT CODE-SWITCHING ASR AND TTS WITH MULTILINGUAL MACHINE SPEECH CHAIN |
| Sahoko Nakayama, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan; Andros Tjandra, Nara Institute of Science and Technology, Japan; Sakriani Sakti, Satoshi Nakamura, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan |
| ASR-3.15: END-TO-END CODE-SWITCHING ASR FOR LOW-RESOURCED LANGUAGE PAIRS |
| Xianghu Yue, Beijing Institute of Technology / National University of Singapore, Singapore; Grandee Lee, Emre Yilmaz, National University of Singapore, Singapore; Fang Deng, Beijing Institute of Technology, China; Haizhou Li, National University of Singapore, Singapore |
| ASR-3.16: UNSUPERVISED ADAPTATION OF ACOUSTIC MODELS FOR ASR USING UTTERANCE-LEVEL EMBEDDINGS FROM SQUEEZE AND EXCITATION NETWORKS |
| Hardik Sailor, Salil Deena, Md Asif Jalal, Rasa Lileikyte, Thomas Hain, University of Sheffield, United Kingdom |
| ASR-3.17: POWER-LAW NONLINEARITY WITH MAXIMALLY UNIFORM DISTRIBUTION CRITERION FOR IMPROVED NEURAL NETWORK TRAINING IN AUTOMATIC SPEECH RECOGNITION |
| Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda, Samsung Research, Korea (South) |
| ASR-3.18: SPEECH RECOGNITION WITH AUGMENTED SYNTHESIZED SPEECH |
| Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu, Google, United States |