SLP-P34.4
Training Flow Matching Models with Reliable Labels via Self-Purification
Hyeongju Kim, Yechan Yu, June Young Yi, Juheon Lee, Supertone, Inc., Korea, Republic of
Session: SLP-P34: Diffusion and Flow-Based Speech Synthesis Models Poster
Track: Speech and Language Processing [SL]
Location: Poster Area 30
Presentation Time: Thu, 7 May, 09:00 - 11:00
Session SLP-P34
SLP-P34.1: INT-MEANFLOW: FEW-STEP SPEECH GENERATION WITH INTEGRAL VELOCITY DISTILLATION
Wei Wang, Rong Cao, Yi Guo, Zhengyang Chen, Kuan Chen, Yuanyuan Huo, ByteDance, China
SLP-P34.2: SFM-TTS: LIGHTWEIGHT AND RAPID SPEECH SYNTHESIS WITH FLEXIBLE SHORTCUT FLOW MATCHING
Jin Shi, Yan Shi, Minchuan Chen, Shaojun Wang, Jing Xiao, Ping An Technology, China
SLP-P34.3: NCF-TTS: ENHANCING FLOW MATCHING BASED TEXT-TO-SPEECH WITH NEIGHBORHOOD CONSISTENCY FLOW
Yan Shi, Jin Shi, Minchuan Chen, Ziyang Zhuang, Ping An Technology, China; Peng Qi, Shanghai Jiao Tong University Chongqing Artificial Intelligence Research Institute, China; Shaojun Wang, Jing Xiao, Ping An Technology, China
SLP-P34.4: Training Flow Matching Models with Reliable Labels via Self-Purification
Hyeongju Kim, Yechan Yu, June Young Yi, Juheon Lee, Supertone, Inc., Korea, Republic of
SLP-P34.5: MELA-TTS: JOINT TRANSFORMER-DIFFUSION MODEL WITH REPRESENTATION ALIGNMENT FOR SPEECH SYNTHESIS
Keyu An, Alibaba, China; Zhiyu Zhang, Southeast University, China; Changfeng Gao, Yabin Li, Zhendong Peng, Haoxu Wang, Zhihao Du, Han Zhao, Zhifu Gao, Xiangang Li, Alibaba, China
SLP-P34.6: ARCHI-TTS: A FLOW-MATCHING-BASED TEXT-TO-SPEECH MODEL WITH SELF-SUPERVISED SEMANTIC ALIGNER AND ACCELERATED INFERENCE
Chunyat Wu, Jiajun Deng, Zhengxi Liu, Zheqi Dai, Haolin He, Qiuqiang Kong, The Chinese University of Hong Kong, Hong Kong
SLP-P34.7: Hierarchical Discrete Flow Matching for Multi-Codebook Codec-based Text-to-Speech
Joun Yeop Lee, Heejin Choi, Min-Kyung Kim, Ji-Hyun Lee, Hoon-Young Cho, Samsung Research, Korea, Republic of
SLP-P34.8: FRAME-STACKED LOCAL TRANSFORMERS FOR EFFICIENT MULTI-CODEBOOK SPEECH GENERATION
Roy Fejgin, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Ryan Langman, Jaehyeon Kim, Subhankar Ghosh, Shehzeen Hussain, Jason Li, NVIDIA, United States of America
SLP-P34.9: Direct Preference Optimization for Speech Autoregressive Diffusion Models
Zhijun Liu, The Chinese University of Hong Kong, Shenzhen, China; Dongya Jia, Xiaoqiang Wang, Chenpeng Du, ByteDance, China; Shuai Wang, Nanjing University, China; Zhuo Chen, ByteDance, United States of America; Haizhou Li, The Chinese University of Hong Kong, Shenzhen, China
SLP-P34.10: Speaker-conditioned phrase break prediction for text-to-speech with phoneme-level pre-trained language model
Dong Yang, Yuki Saito, Takaaki Saeki, The University of Tokyo, Japan; Tomoki Koriyama, CyberAgent, Inc., Japan; Wataru Nakata, Detai Xin, Hiroshi Saruwatari, The University of Tokyo, Japan