SLP-P33.4
FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition
Kavan Fatehi, University of York, United Kingdom of Great Britain and Northern Ireland; Amir Shirian, EmergeSound.AI, United Kingdom of Great Britain and Northern Ireland; Erfan Loweimi, Cisco, United Kingdom of Great Britain and Northern Ireland
Session:
SLP-P33: Streaming and Low-Resource ASR, and Data Approaches Poster
Track:
Speech and Language Processing [SL]
Location:
Poster Area 29
Presentation Time:
Thu, 7 May, 09:00 - 11:00
Session Chair:
Naohiro Tawara, NTT
Session SLP-P33
SLP-P33.1: ONLINE REGISTER FOR DUAL-MODE SELF-SUPERVISED SPEECH MODELS: MITIGATING THE LACK OF FUTURE CONTEXT
Keita Goto, Takashi Maekaku, Jin Sakuma, LY Corporation, Japan; Jinchuan Tian, Carnegie Mellon University, United States of America; Yusuke Shinohara, LY Corporation, Japan; Shinji Watanabe, Carnegie Mellon University, United States of America
SLP-P33.2: CHUNK-WISE ATTENTION TRANSDUCERS FOR FAST AND ACCURATE STREAMING SPEECH-TO-TEXT
Hainan Xu, Vladimir Bataev, Travis Bartley, Jagadeesh Balam, NVIDIA, United States of America
SLP-P33.3: CHUNKWISE ALIGNERS FOR STREAMING SPEECH RECOGNITION
Wen Shen Teo, University of Electro-Communications, Japan; Takafumi Moriya, Masato Mimura, NTT, Inc., Japan
SLP-P33.4: FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition
Kavan Fatehi, University of York, United Kingdom of Great Britain and Northern Ireland; Amir Shirian, EmergeSound.AI, United Kingdom of Great Britain and Northern Ireland; Erfan Loweimi, Cisco, United Kingdom of Great Britain and Northern Ireland
SLP-P33.5: Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR
Xugang Lu, Peng Shen, Hisashi Kawai, National Institute of Information and Communications Technology, Japan
SLP-P33.6: HOW FAR DO SSL SPEECH MODELS LISTEN FOR TONE? TEMPORAL FOCUS OF TONE REPRESENTATION UNDER LOW-RESOURCE TRANSFER
Minu Kim, Ji Sub Um, Hoirin Kim, KAIST, Korea, Republic of
SLP-P33.7: UMA-SPLIT: UNIMODAL AGGREGATION FOR BOTH ENGLISH AND MANDARIN NON-AUTOREGRESSIVE SPEECH RECOGNITION
Ying Fang, Zhejiang University, Hangzhou, China, and Westlake University & Westlake Institute for Advanced Study, Hangzhou, China; Xiaofei Li, Westlake University & Westlake Institute for Advanced Study, Hangzhou, China
SLP-P33.8: Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting
Zhiqi Ai, Han Cheng, Yuxin Wang, Shiyi Mu, Yongjin Zhou, Shanghai University, China; Shugong Xu, Xi’an Jiaotong Liverpool University, China
SLP-P33.9: MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
Jialong Mai, South China University of Technology, China; Jinxin Ji, The Hong Kong Polytechnic University, China; Xiaofen Xing, South China University of Technology, China; Chen Yang, Shanghai Jiaotong University, China; Weidong Chen, The Chinese University of Hong Kong, China; Jingyuan Xing, Xiangmin Xu, South China University of Technology, China
SLP-P33.10: Uncertainty-Based Streaming ASR with Evidential Deep Learning
Hiroaki Sato, Tetsuji Ogawa, Asahi Sakuma, Ryuga Sugano, Tadashi Kumano, Yoshihiko Kawai,