ASR-3: Automatic Speech Recognition III |
Session Type: Poster |
Time: Wednesday, December 18, 10:30 - 12:00 |
Location: VHS Event Centre, Level 1 |
Session Chair: Rohan Kumar Das, National University of Singapore
|
ASR-3.1: SEMI-SUPERVISED TRAINING AND DATA AUGMENTATION FOR ADAPTATION OF AUTOMATIC BROADCAST NEWS CAPTIONING SYSTEMS |
Yinghui Huang, Samuel Thomas, Masayuki Suzuki, Zoltan Tuske, Larry Sansone, Michael Picheny, IBM, United States |
|
ASR-3.2: ONLINE BATCH NORMALIZATION ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION |
Franco Mana, Felix Weninger, Roberto Gemello, Puming Zhan, Nuance Communications, Italy |
|
ASR-3.3: SPEAKER ADAPTIVE TRAINING USING MODEL AGNOSTIC META-LEARNING |
Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals, University of Edinburgh, United Kingdom |
|
ASR-3.4: A COMPARISON OF END-TO-END MODELS FOR LONG-FORM SPEECH RECOGNITION |
Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Google Inc., United States; Patrick Nguyen, Grab Technologies, United States; Hagen Soltau, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu, Google Inc., United States |
|
ASR-3.5: ACOUSTIC MODEL ADAPTATION FROM RAW WAVEFORMS WITH SINCNET |
Joachim Fainberg, Ondrej Klejch, Erfan Loweimi, Peter Bell, Steve Renals, University of Edinburgh, United Kingdom |
|
ASR-3.6: RECURRENT NEURAL NETWORK TRANSDUCER FOR AUDIO-VISUAL SPEECH RECOGNITION |
Takaki Makino, Hank Liao, Google Inc., United States; Yannis Assael, Brendan Shillingford, DeepMind, United Kingdom; Basilio Garcia, Otavio Braga, Olivier Siohan, Google Inc., United States |
|
ASR-3.7: EXPLICIT ALIGNMENT OF TEXT AND SPEECH ENCODINGS FOR ATTENTION-BASED END-TO-END SPEECH RECOGNITION |
Jennifer Drexler, James Glass, Massachusetts Institute of Technology, United States |
|
ASR-3.8: RECOGNIZING LONG-FORM SPEECH USING STREAMING END-TO-END MODELS |
Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara Sainath, Trevor Strohman, Google Inc., United States |
|
ASR-3.9: LEVERAGING LANGUAGE ID IN MULTILINGUAL END-TO-END SPEECH RECOGNITION |
Austin Waters, Neeraj Gaur, Parisa Haghani, Pedro Moreno, Zhongdi Qu, Google, United States |
|
ASR-3.10: STREAMING END-TO-END SPEECH RECOGNITION WITH JOINT CTC-ATTENTION BASED MODELS |
Niko Moritz, Takaaki Hori, Jonathan Le Roux, Mitsubishi Electric Research Laboratories (MERL), United States |
|
ASR-3.11: MONOTONIC RECURRENT NEURAL NETWORK TRANSDUCER AND DECODING STRATEGIES |
Anshuman Tripathi, Han Lu, Hasim Sak, Hagen Soltau, Google, United States |
|
ASR-3.12: CHARACTER-AWARE ATTENTION-BASED END-TO-END SPEECH RECOGNITION |
Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong, Microsoft Corporation, United States |
|
ASR-3.13: ATTENTION BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH LARGE SPEECH CORPUS |
Kwangyoun Kim, Kyungmin Lee, Dhananjaya Gowda, Junmo Park, Sungsoo Kim, Sichen Jin, Young-Yoon Lee, Jinsu Yeo, Daehyun Kim, Seokyeong Jung, Jungin Lee, Myoungji Han, Chanwoo Kim, Samsung Electronics, Korea (South) |
|
ASR-3.14: ZERO-SHOT CODE-SWITCHING ASR AND TTS WITH MULTILINGUAL MACHINE SPEECH CHAIN |
Sahoko Nakayama, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan; Andros Tjandra, Nara Institute of Science and Technology, Japan; Sakriani Sakti, Satoshi Nakamura, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan |
|
ASR-3.15: END-TO-END CODE-SWITCHING ASR FOR LOW-RESOURCED LANGUAGE PAIRS |
Xianghu Yue, Beijing Institute of Technology / National University of Singapore, Singapore; Grandee Lee, Emre Yilmaz, National University of Singapore, Singapore; Fang Deng, Beijing Institute of Technology, China; Haizhou Li, National University of Singapore, Singapore |
|
ASR-3.16: UNSUPERVISED ADAPTATION OF ACOUSTIC MODELS FOR ASR USING UTTERANCE-LEVEL EMBEDDINGS FROM SQUEEZE AND EXCITATION NETWORKS |
Hardik Sailor, Salil Deena, Md Asif Jalal, Rasa Lileikyte, Thomas Hain, University of Sheffield, United Kingdom |
|
ASR-3.17: POWER-LAW NONLINEARITY WITH MAXIMALLY UNIFORM DISTRIBUTION CRITERION FOR IMPROVED NEURAL NETWORK TRAINING IN AUTOMATIC SPEECH RECOGNITION |
Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda, Samsung Research, Korea (South) |
|
ASR-3.18: SPEECH RECOGNITION WITH AUGMENTED SYNTHESIZED SPEECH |
Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu, Google, United States |
|