OD-SLA-1.9

AN INVESTIGATION OF ENHANCING CTC MODEL FOR TRIGGERED ATTENTION-BASED STREAMING ASR

Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Waseda University, Japan

Session:
Speech Recognition

Track:
Speech, Language, and Audio (SLA)

Session Time:
Wed, 15 Dec, 11:20 - 13:20 Japan Standard Time (UTC +9)
Wed, 15 Dec, 02:20 - 04:20 Coordinated Universal Time
Tue, 14 Dec, 21:20 - 23:20 Eastern Standard Time (UTC -5)
Tue, 14 Dec, 18:20 - 20:20 Pacific Standard Time (UTC -8)

Session Chair:
Eng Siong Chng, Nanyang Technological University
Session OD-SLA-1
WE1.OD-A.1: On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora
Kak Soky, Masato Mimura, Chenhui Chu, Tatsuya Kawahara, Kyoto University, Japan; Sheng Li, National Institute of Information and Communications Technology, Japan
WE1.OD-A.2: SPECTROGRAMS FUSION-BASED END-TO-END ROBUST AUTOMATIC SPEECH RECOGNITION
Hao Shi, Tatsuya Kawahara, Graduate School of Informatics, Kyoto University, Japan; Longbiao Wang, Tianjin University, China; Sheng Li, National Institute of Information and Communications Technology (NICT), Japan; Cunhang Fan, Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, China; Jianwu Dang, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
WE1.OD-A.3: Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
Shengqiang Li, Menglong Xu, Xiao-Lei Zhang, Northwestern Polytechnical University, China
WE1.OD-A.4: Efficient conformer-based speech recognition with linear attention
Shengqiang Li, Menglong Xu, Xiao-Lei Zhang, Northwestern Polytechnical University, China
WE1.OD-A.5: One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition
Zhengkun Tian, Jianhua Tao, Shuai Zhang, Zhengqi Wen, Institute of Automation, Chinese Academy of Sciences, China; Jiangyan Yi, Ye Bai, 1,2, China
WE1.OD-A.6: LARGE-CONTEXT AUTOMATIC SPEECH RECOGNITION BASED ON RNN TRANSDUCER
Atsushi Kojima, Advanced Media, Inc., Japan
WE1.OD-A.7: AN END-TO-END MODEL FROM SPEECH TO CLEAN TRANSCRIPT FOR PARLIAMENTARY MEETINGS
Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara, Kyoto University, Japan
WE1.OD-A.8: DATA AUGMENTATION BASED ON FREQUENCY WARPING FOR RECOGNITION OF CLEFT PALATE SPEECH
Kento Fujiwara, Ryoichi Takashima, Tetsuya Takiguchi, Graduate School of System Informatics, Kobe University, Japan; Chihiro Sugiyama, Nobukazu Tanaka, Kanji Nohara, Kazunori Nozaki, Graduate School of Dentistry, Osaka University, Japan
WE1.OD-A.9: AN INVESTIGATION OF ENHANCING CTC MODEL FOR TRIGGERED ATTENTION-BASED STREAMING ASR
Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Waseda University, Japan
WE1.OD-A.10: Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition
Protima Nomo Sudro, Rohit Sinha, Indian Institute of Technology Guwahati, India; Rohan Kumar Das, Fortemedia Singapore, Singapore; S R Mahadeva Prasanna, Indian Institute of Technology Dharwad, India
WE1.OD-A.11: Teager Energy Subband Filtered Features for Near and Far-Field Automatic Speech Recognition
Madhu Kamble, EURECOM, France; Shekhar Nayak, M. Ali Basha Shaik, Shakti P. Rath, Vikram Vij, SRI-B, India; Hemant Patil, DA-IICT, Gandhinagar, Gujarat, India
WE1.OD-A.12: MULTITASK-BASED JOINT LEARNING APPROACH TO ROBUST ASR FOR RADIO COMMUNICATION SPEECH
Duo Ma, National University of Singapore, Singapore; Nana Hou, Van Tung Pham, Haihua Xu, Eng Siong Chng, Nanyang Technological University, Singapore
WE1.OD-A.13: ADVANCED LANGUAGE MODEL FUSION METHOD FOR ENCODER-DECODER MODEL IN JAPANESE SPEECH RECOGNITION
Daiki Mori, Norihide Kitaoka, Toyohashi University of Technology, Japan; Kengo Ohta, Anan National College of Technology, Japan; Ryota Nishimura, Tokushima University, Japan; Atsunori Ogawa, Nippon Telegraph and Telephone Corporation, Japan
WE1.OD-A.14: CSTD-Telugu Corpus: Crowd-Sourced Approach for Large-Scale Speech data collection
Ganesh S Mirishkar, Vishnu Vidyadhara Raju V, Prakash Yalla, Anil Kumar Vuppala, IIIT Hyderabad, India; Meher Dinesh Naroju, Pacteraedge, India; Sudhamay Maity, Ozonetel, India
WE1.OD-A.15: AN EMPIRICAL STUDY ON TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH NOVEL DECODER MASKING
Shi-Yan Weng, Berlin Chen, National Taiwan Normal University, Taiwan; Hsuan-Sheng Chiu, Chunghwa Telecom Laboratories, Taiwan