SLP-P23.3
PHONOLOGICAL TOKENIZER: PROSODY-AWARE PHONETIC TOKEN VIA MULTI-OBJECTIVE FINE-TUNING WITH DIFFERENTIABLE K-MEANS
Kentaro Onda, The University of Tokyo / Sony Group Corporation, Japan; Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Sony Group Corporation, Japan; Shinji Watanabe, Carnegie Mellon University, United States of America
Session: SLP-P23: Discrete Representations for ASR, Tokenization, and Segmentation Poster
Track: Speech and Language Processing [SL]
Location: Poster Area 31
Presentation Time: Wed, 6 May, 14:00 - 16:00
Session Chair: Hagai Aronowitz, IBM Research
Session SLP-P23
SLP-P23.1: SED: STRUCTURAL ENTROPY BASED SPEECH DISCRETIZATION FOR DISCRETE TOKEN-BASED ASR
Ling Dong, Wenjun Wang, Yan Xiang, Yantuan Xian, Shengxiang Gao, Kunming University of Science and Technology, China
SLP-P23.2: TOKENCHAIN: A DISCRETE SPEECH CHAIN VIA SEMANTIC TOKEN MODELING
Mingxuan Wang, Satoshi Nakamura, The Chinese University of Hong Kong, Shenzhen, China
SLP-P23.3: PHONOLOGICAL TOKENIZER: PROSODY-AWARE PHONETIC TOKEN VIA MULTI-OBJECTIVE FINE-TUNING WITH DIFFERENTIABLE K-MEANS
Kentaro Onda, The University of Tokyo / Sony Group Corporation, Japan; Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Sony Group Corporation, Japan; Shinji Watanabe, Carnegie Mellon University, United States of America
SLP-P23.4: ADVANCED MODELING OF INTERLANGUAGE SPEECH INTELLIGIBILITY BENEFIT WITH L1-L2 MULTI-TASK LEARNING USING DIFFERENTIABLE K-MEANS FOR ACCENT-ROBUST DISCRETE TOKEN-BASED ASR
Kentaro Onda, The University of Tokyo / National Institute of Advanced Industrial Science and Technology (AIST), Japan; Satoru Fukayama, National Institute of Advanced Industrial Science and Technology (AIST), Japan; Daisuke Saito, Nobuaki Minematsu, The University of Tokyo, Japan
SLP-P23.5: FRONTEND TOKEN ENHANCEMENT FOR TOKEN-BASED SPEECH RECOGNITION
Takanori Ashihara, Shota Horiguchi, Kohei Matsuura, Tsubasa Ochiai, Marc Delcroix, NTT, Inc., Japan
SLP-P23.6: CONTENT-PRESERVING SPEECH REPRESENTATION LEARNING VIA ADAPTIVE SEGMENT-LEVEL ALIGNMENT
Ling Dong, Wenjun Wang, Zhengtao Yu, Yan Xiang, Yantuan Xian, Yuxin Huang, Kunming University of Science and Technology, China
SLP-P23.7: LEVERAGING SEGMENT-LEVEL SPEECH REPRESENTATIONS FOR LLM-BASED SPEECH RECOGNITION
Sanlong Jiang, Ling Dong, Wenjun Wang, Shengxiang Gao, Kunming University of Science and Technology, China
SLP-P23.8: EXPLORING SSL DISCRETE TOKENS FOR MULTILINGUAL AUTOMATIC SPEECH RECOGNITION
Mingyu Cui, The Chinese University of Hong Kong, Hong Kong SAR, China; Mengzhe Geng, National Research Council, Canada; Yiwen Shao, Tencent, China; Jiawen Kang, Lingwei Meng, Dingdong Wang, The Chinese University of Hong Kong, Hong Kong SAR, China; Chenxing Li, Meng Yu, Tencent, China; Xunying Liu, The Chinese University of Hong Kong, Hong Kong SAR, China
SLP-P23.9: ATOM: Adaptive Token-level Optimal Transport Mixup for Speech Translation
Jialing Wang, Yue Zhao, Minzu University of China, China; Yuhao Zhang, Haizhou Li, The Chinese University of Hong Kong, Shenzhen, China