| ASR-2: Automatic Speech Recognition II |
| Session Type: Poster |
| Time: Monday, December 16, 10:30 - 12:00 |
| Location: VHS Event Centre, Level 1 |
| Session Chair: Hemant Patil, Dhirubhai Ambani Institute of Information and Communication Technology |
| ASR-2.1: TRAINING LANGUAGE MODELS FOR LONG-SPAN CROSS-SENTENCE EVALUATION |
| Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany |
| ASR-2.2: TRANSFORMER ASR WITH CONTEXTUAL BLOCK PROCESSING |
| Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Sony Corporation, Japan; Shinji Watanabe, Johns Hopkins University, United States |
| ASR-2.3: A DENSITY RATIO APPROACH TO LANGUAGE MODEL FUSION IN END-TO-END AUTOMATIC SPEECH RECOGNITION |
| Erik McDermott, Hasim Sak, Ehsan Variani, Google Inc, United States |
| ASR-2.4: IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES |
| Abhishek Niranjan, Mahaboob Ali Basha Shaik, Samsung Research and Development Institute, India |
| ASR-2.5: A COMPARATIVE STUDY ON TRANSFORMER VS RNN IN SPEECH APPLICATIONS |
| Shigeki Karita, NTT Communication Science Laboratories, Japan; Nanxin Chen, Johns Hopkins University, United States; Tomoki Hayashi, Nagoya University, Japan; Takaaki Hori, Mitsubishi Electric Research Laboratories (MERL), United States; Hirofumi Inaguma, Kyoto University, Japan; Ziyan Jiang, Johns Hopkins University, United States; Masao Someki, Nagoya University, Japan; Nelson Enrique Yalta Soplin, Waseda University, Japan; Ryuichi Yamamoto, LINE Corporation, Japan; Xiaofei Wang, Shinji Watanabe, Johns Hopkins University, United States; Takenori Yoshimura, Nagoya University, Japan; Wangyou Zhang, Shanghai Jiao Tong University, China |
| ASR-2.6: FROM SENONES TO CHENONES: TIED CONTEXT-DEPENDENT GRAPHEMES FOR HYBRID SPEECH RECOGNITION |
| Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer, Facebook, United States |
| ASR-2.7: ATTENTION-BASED SPEECH RECOGNITION USING GAZE INFORMATION |
| Osamu Segawa, Chubu Electric Power Co., Inc., Japan; Tomoki Hayashi, Kazuya Takeda, Nagoya University, Japan |
| ASR-2.8: LISTENING WHILE SPEAKING AND VISUALIZING: IMPROVING ASR THROUGH MULTIMODAL CHAIN |
| Johanes Effendi, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan; Andros Tjandra, Nara Institute of Science and Technology, Japan; Sakriani Sakti, Satoshi Nakamura, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan |
| ASR-2.9: EMBEDDINGS FOR DNN SPEAKER ADAPTIVE TRAINING |
| Joanna Rownicka, Peter Bell, Steve Renals, University of Edinburgh, United Kingdom |
| ASR-2.10: LANGUAGE MODEL BOOTSTRAPPING USING NEURAL MACHINE TRANSLATION FOR CONVERSATIONAL SPEECH RECOGNITION |
| Surabhi Punjabi, Harish Arsikere, Sri Garimella, Amazon, India |
| ASR-2.11: SPEAKER AND LANGUAGE AWARE TRAINING FOR END-TO-END ASR |
| Shubham Bansal, Karan Malhotra, Sriram Ganapathy, Indian Institute of Science, Bangalore, India |
| ASR-2.12: DATA AUGMENTATION BASED ON VOWEL STRETCH FOR IMPROVING CHILDREN'S SPEECH RECOGNITION |
| Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, IBM, Japan |
| ASR-2.13: MIXED BANDWIDTH ACOUSTIC MODELING LEVERAGING KNOWLEDGE DISTILLATION |
| Takashi Fukuda, Samuel Thomas, IBM, Japan |
| ASR-2.14: ON TEMPORAL CONTEXT INFORMATION FOR HYBRID BLSTM-BASED PHONEME RECOGNITION |
| Timo Lohrenz, Maximilian Strake, Tim Fingscheidt, Technische Universität Braunschweig, Germany |
| ASR-2.15: EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION |
| Mingkun Huang, YiZhou Lu, Shanghai Jiao Tong University, China; Lan Wang, Chinese Academy of Sciences, China; Yanmin Qian, Kai Yu, Shanghai Jiao Tong University, China |
| ASR-2.16: QUERY-BY-EXAMPLE ON-DEVICE KEYWORD SPOTTING |
| Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, Kyuwoong Hwang, Qualcomm, Korea (South) |
| ASR-2.17: SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK |
| Xi Chen, Shouyi Yin, Tsinghua University, China; Dandan Song, Peng Ouyang, TsingMicro Co. Ltd., China; Leibo Liu, Shaojun Wei, Tsinghua University, China |
| ASR-2.18: SIMPLIFIED LSTMS FOR SPEECH RECOGNITION |
| George Saon, Zoltan Tuske, Kartik Audhkhasi, Brian Kingsbury, Michael Picheny, Samuel Thomas, IBM, United States |
| ASR-2.19: GENERALIZED LARGE-CONTEXT LANGUAGE MODELS BASED ON FORWARD-BACKWARD HIERARCHICAL RECURRENT ENCODER-DECODER MODELS |
| Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Itsumi Saito, Kyosuke Nishida, Takanobu Oba, NTT Corporation, Japan |
| ASR-2.20: END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM |
| Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda, Samsung Research, Korea (South) |