ASR-2: Automatic Speech Recognition II |
Session Type: Poster |
Time: Monday, December 16, 10:30 - 12:00 |
Location: VHS Event Centre, Level 1 |
Session Chair: Hemant Patil, Dhirubhai Ambani Institute of Information and Communication Technology
|
ASR-2.1: TRAINING LANGUAGE MODELS FOR LONG-SPAN CROSS-SENTENCE EVALUATION |
Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney, RWTH Aachen University, Germany |
|
ASR-2.2: TRANSFORMER ASR WITH CONTEXTUAL BLOCK PROCESSING |
Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Sony Corporation, Japan; Shinji Watanabe, Johns Hopkins University, United States |
|
ASR-2.3: A DENSITY RATIO APPROACH TO LANGUAGE MODEL FUSION IN END-TO-END AUTOMATIC SPEECH RECOGNITION |
Erik McDermott, Hasim Sak, Ehsan Variani, Google Inc, United States |
|
ASR-2.4: IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES |
Abhishek Niranjan, Mahaboob Ali Basha Shaik, Samsung Research and Development Institute, India |
|
ASR-2.5: A COMPARATIVE STUDY ON TRANSFORMER VS RNN IN SPEECH APPLICATIONS |
Shigeki Karita, NTT Communication Science Laboratories, Japan; Nanxin Chen, Johns Hopkins University, United States; Tomoki Hayashi, Nagoya University, Japan; Takaaki Hori, Mitsubishi Electric Research Laboratories (MERL), United States; Hirofumi Inaguma, Kyoto University, Japan; Ziyan Jiang, Johns Hopkins University, United States; Masao Someki, Nagoya University, Japan; Nelson Enrique Yalta Soplin, Waseda University, Japan; Ryuichi Yamamoto, LINE Corporation, Japan; Xiaofei Wang, Shinji Watanabe, Johns Hopkins University, United States; Takenori Yoshimura, Nagoya University, Japan; Wangyou Zhang, Shanghai Jiao Tong University, China |
|
ASR-2.6: FROM SENONES TO CHENONES: TIED CONTEXT-DEPENDENT GRAPHEMES FOR HYBRID SPEECH RECOGNITION |
Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer, Facebook, United States |
|
ASR-2.7: ATTENTION-BASED SPEECH RECOGNITION USING GAZE INFORMATION |
Osamu Segawa, Chubu Electric Power Co., Inc., Japan; Tomoki Hayashi, Kazuya Takeda, Nagoya University, Japan |
|
ASR-2.8: LISTENING WHILE SPEAKING AND VISUALIZING: IMPROVING ASR THROUGH MULTIMODAL CHAIN |
Johanes Effendi, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan; Andros Tjandra, Nara Institute of Science and Technology, Japan; Sakriani Sakti, Satoshi Nakamura, Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project AIP, Japan |
|
ASR-2.9: EMBEDDINGS FOR DNN SPEAKER ADAPTIVE TRAINING |
Joanna Rownicka, Peter Bell, Steve Renals, University of Edinburgh, United Kingdom |
|
ASR-2.10: LANGUAGE MODEL BOOTSTRAPPING USING NEURAL MACHINE TRANSLATION FOR CONVERSATIONAL SPEECH RECOGNITION |
Surabhi Punjabi, Harish Arsikere, Sri Garimella, Amazon, India |
|
ASR-2.11: SPEAKER AND LANGUAGE AWARE TRAINING FOR END-TO-END ASR |
Shubham Bansal, Karan Malhotra, Sriram Ganapathy, Indian Institute of Science, Bangalore, India |
|
ASR-2.12: DATA AUGMENTATION BASED ON VOWEL STRETCH FOR IMPROVING CHILDREN'S SPEECH RECOGNITION |
Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, IBM, Japan |
|
ASR-2.13: MIXED BANDWIDTH ACOUSTIC MODELING LEVERAGING KNOWLEDGE DISTILLATION |
Takashi Fukuda, Samuel Thomas, IBM, Japan |
|
ASR-2.14: ON TEMPORAL CONTEXT INFORMATION FOR HYBRID BLSTM-BASED PHONEME RECOGNITION |
Timo Lohrenz, Maximilian Strake, Tim Fingscheidt, Technische Universität Braunschweig, Germany |
|
ASR-2.15: EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION |
Mingkun Huang, YiZhou Lu, Shanghai Jiao Tong University, China; Lan Wang, Chinese Academy of Sciences, China; Yanmin Qian, Kai Yu, Shanghai Jiao Tong University, China |
|
ASR-2.16: QUERY-BY-EXAMPLE ON-DEVICE KEYWORD SPOTTING |
Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, Kyuwoong Hwang, Qualcomm, Korea (South) |
|
ASR-2.17: SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK |
Xi Chen, Shouyi Yin, Tsinghua University, China; Dandan Song, Peng Ouyang, TsingMicro Co. Ltd., China; Leibo Liu, Shaojun Wei, Tsinghua University, China |
|
ASR-2.18: SIMPLIFIED LSTMS FOR SPEECH RECOGNITION |
George Saon, Zoltan Tuske, Kartik Audhkhasi, Brian Kingsbury, Michael Picheny, Samuel Thomas, IBM, United States |
|
ASR-2.19: GENERALIZED LARGE-CONTEXT LANGUAGE MODELS BASED ON FORWARD-BACKWARD HIERARCHICAL RECURRENT ENCODER-DECODER MODELS |
Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Itsumi Saito, Kyosuke Nishida, Takanobu Oba, NTT Corporation, Japan |
|
ASR-2.20: END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM |
Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda, Samsung Research, Korea (South) |
|