SLP-P4.14

LOSS MASKING IS NOT NEEDED IN DECODER-ONLY TRANSFORMER FOR DISCRETE-TOKEN-BASED ASR

Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang, Alibaba Group, China

Session:
SLP-P4: ASR - New algorithms and approaches Poster

Track:
Speech and Language Processing

Location:
Poster Zone 2A
Poster Board PZ-2A.14

Presentation Time:
Wed, 17 Apr, 08:20 - 10:20 (UTC +9)

Session Chair:
Yifan Gong, Microsoft
Session SLP-P4
SLP-P4.1: Task vector algebra for ASR models
Gowtham Ramesh, University of Wisconsin - Madison, United States of America; Kartik Audhkhasi, Bhuvana Ramabhadran, Google, United States of America
SLP-P4.2: CIF-RNNT: Streaming ASR via Acoustic Word Embeddings with Continuous Integrate-and-Fire and RNN-Transducers
Wen Shen Teo, Yasuhiro Minami, University of Electro-Communications, Japan
SLP-P4.3: JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR AUTOMATIC SPEECH RECOGNITION VIA BILEVEL OPTIMIZATION
A F M Saif, Rensselaer Polytechnic Institute, United States of America; Xiaodong Cui, International Business Machines Corporation, United States of America; Han Shen, Rensselaer Polytechnic Institute, United States of America; Songtao Lu, Brian Kingsbury, International Business Machines Corporation, United States of America; Tianyi Chen, Rensselaer Polytechnic Institute, United States of America
SLP-P4.4: HOT-FIXING WAKE WORD RECOGNITION FOR END-TO-END ASR VIA NEURAL MODEL REPROGRAMMING
Pin-Jui Ku, Georgia Tech, United States of America; I-Fan Chen, Chao-Han Huck Yang, Anirudh Raju, Pranav Dheram, Pegah Ghahremani, Brian King, Jing Liu, Roger Ren, Phani Nidadavolu, Amazon, United States of America
SLP-P4.5: Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg, NVIDIA, United States of America
SLP-P4.6: TASK ORIENTED DIALOGUE AS A CATALYST FOR SELF-SUPERVISED AUTOMATIC SPEECH RECOGNITION
David Chan, UC Berkeley, United States of America; Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Bjorn Hoffmeister, Amazon, United States of America
SLP-P4.7: UNIMODAL AGGREGATION FOR CTC-BASED SPEECH RECOGNITION
Ying Fang, Zhejiang University; Westlake University, China; Xiaofei Li, Westlake University; Westlake Institute for Advanced Study, China
SLP-P4.8: EXPLORING SPEECH RECOGNITION, TRANSLATION, AND UNDERSTANDING WITH DISCRETE SPEECH UNITS: A COMPARATIVE STUDY
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Carnegie Mellon University, United States of America; Yuya Fujita, Takashi Maekaku, Yahoo Japan, Japan; Pengcheng Guo, Northwestern Polytechnical University, China; Yao-Fei Cheng, University of Washington, United States of America; Pavel Denisov, University of Stuttgart, Germany; Kohei Saijo, Waseda University, Japan; Hsiu-Hsuan Wang, National Taiwan University, Taiwan
SLP-P4.9: KNN-CTC: ENHANCING ASR VIA RETRIEVAL OF CTC PSEUDO LABELS
Jiaming Zhou, Nankai University, China; Shiwan Zhao, Independent Researcher, China; Yaqi Liu, Beijing University of Technology, China; Wenjia Zeng, Yong Chen, Lingxi (Beijing) Technology Co., Ltd., China; Yong Qin, Nankai University, China
SLP-P4.10: AUGMENTING CONFORMERS WITH STRUCTURED STATE-SPACE SEQUENCE MODELS FOR ONLINE SPEECH RECOGNITION
Haozhe Shan, Harvard University, United States of America; Albert Gu, Carnegie Mellon University, United States of America; Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath, Google Inc., United States of America
SLP-P4.11: A CTC ALIGNMENT-BASED NON-AUTOREGRESSIVE TRANSFORMER FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
Ruchao Fan, University of California, Los Angeles, United States of America; Wei Chu, Peng Chang, PAII Inc., United States of America; Abeer Alwan, University of California, Los Angeles, United States of America
SLP-P4.12: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma, Bytedance, China
SLP-P4.13: CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Tian-Hao Zhang, University of Science and Technology Beijing, China; Dinghao Zhou, Guiping Zhong, SenseTime Research, China; Jiaming Zhou, Nankai University, China; Baoxiang Li, SenseTime Research, China
SLP-P4.14: LOSS MASKING IS NOT NEEDED IN DECODER-ONLY TRANSFORMER FOR DISCRETE-TOKEN-BASED ASR
Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang, Alibaba Group, China