SLP-P4: ASR - New algorithms and approaches
Wed, 17 Apr, 08:20 - 10:20 (UTC +9)
Location: Poster Zone 2A
Session Type: Poster
Session Chair: Yifan Gong, Microsoft
Track: Speech and Language Processing
 

SLP-P4.1: Task vector algebra for ASR models

Gowtham Ramesh, University of Wisconsin - Madison, United States of America; Kartik Audhkhasi, Bhuvana Ramabhadran, Google, United States of America
 

SLP-P4.2: CIF-RNNT: Streaming ASR via Acoustic Word Embeddings with Continuous Integrate-and-Fire and RNN-Transducers

Wen Shen Teo, Yasuhiro Minami, University of Electro-Communications, Japan
 

SLP-P4.3: JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR AUTOMATIC SPEECH RECOGNITION VIA BILEVEL OPTIMIZATION

A F M Saif, Rensselaer Polytechnic Institute, United States of America; Xiaodong Cui, International Business Machines Corporation, United States of America; Han Shen, Rensselaer Polytechnic Institute, United States of America; Songtao Lu, Brian Kingsbury, International Business Machines Corporation, United States of America; Tianyi Chen, Rensselaer Polytechnic Institute, United States of America
 

SLP-P4.4: HOT-FIXING WAKE WORD RECOGNITION FOR END-TO-END ASR VIA NEURAL MODEL REPROGRAMMING

Pin-Jui Ku, Georgia Tech, United States of America; I-Fan Chen, Chao-Han Huck Yang, Anirudh Raju, Pranav Dheram, Pegah Ghahremani, Brian King, Jing Liu, Roger Ren, Phani Nidadavolu, Amazon, United States of America
 

SLP-P4.5: Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg, NVIDIA, United States of America
 

SLP-P4.6: TASK ORIENTED DIALOGUE AS A CATALYST FOR SELF-SUPERVISED AUTOMATIC SPEECH RECOGNITION

David Chan, UC Berkeley, United States of America; Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Bjorn Hoffmeister, Amazon, United States of America
 

SLP-P4.7: UNIMODAL AGGREGATION FOR CTC-BASED SPEECH RECOGNITION

Ying Fang, Zhejiang University and Westlake University, China; Xiaofei Li, Westlake University and Westlake Institute for Advanced Study, China
 

SLP-P4.8: EXPLORING SPEECH RECOGNITION, TRANSLATION, AND UNDERSTANDING WITH DISCRETE SPEECH UNITS: A COMPARATIVE STUDY

Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Carnegie Mellon University, United States of America; Yuya Fujita, Takashi Maekaku, Yahoo Japan, Japan; Pengcheng Guo, Northwestern Polytechnical University, China; Yao-Fei Cheng, University of Washington, United States of America; Pavel Denisov, University of Stuttgart, Germany; Kohei Saijo, Waseda University, Japan; Hsiu-Hsuan Wang, National Taiwan University, Taiwan
 

SLP-P4.9: KNN-CTC: ENHANCING ASR VIA RETRIEVAL OF CTC PSEUDO LABELS

Jiaming Zhou, Nankai University, China; Shiwan Zhao, Independent Researcher, China; Yaqi Liu, Beijing University of Technology, China; Wenjia Zeng, Yong Chen, Lingxi (Beijing) Technology Co., Ltd., China; Yong Qin, Nankai University, China
 

SLP-P4.10: AUGMENTING CONFORMERS WITH STRUCTURED STATE-SPACE SEQUENCE MODELS FOR ONLINE SPEECH RECOGNITION

Haozhe Shan, Harvard University, United States of America; Albert Gu, Carnegie Mellon University, United States of America; Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath, Google Inc., United States of America

SLP-P4.11: A CTC ALIGNMENT-BASED NON-AUTOREGRESSIVE TRANSFORMER FOR END-TO-END AUTOMATIC SPEECH RECOGNITION

Ruchao Fan, University of California, Los Angeles, United States of America; Wei Chu, Peng Chang, PAII Inc., United States of America; Abeer Alwan, University of California, Los Angeles, United States of America
 

SLP-P4.12: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma, Bytedance, China
 

SLP-P4.13: CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

Tian-Hao Zhang, University of Science and Technology Beijing, China; Dinghao Zhou, Guiping Zhong, SenseTime Research, China; Jiaming Zhou, Nankai University, China; Baoxiang Li, SenseTime Research, China
 

SLP-P4.14: LOSS MASKING IS NOT NEEDED IN DECODER-ONLY TRANSFORMER FOR DISCRETE-TOKEN-BASED ASR

Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang, Alibaba Group, China