SLP-P13.5

Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis

Qingyu Liu, Johns Hopkins University, United States of America; Yushen Chen, Zhikang Niu, Shanghai Jiao Tong University, China; Chunhui Wang, Yunting Yang, Bowen Zhang, Jian Zhao, Pengcheng Zhu, Geely, China; Kai Yu, Xie Chen, Shanghai Jiao Tong University, China

Session:
SLP-P13: Voice Conversion and Controllable Speech Generation Poster

Track:
Speech and Language Processing [SL]

Location:
Poster Area 27

Presentation Time:
Wed, 6 May, 09:00 - 11:00

Session SLP-P13
SLP-P13.1: QE-XVC: ZERO-SHOT CROSS-LINGUAL VOICE CONVERSION VIA QUERY-ENHANCEMENT AND CONDITIONAL FLOW MATCHING
Han-Jie Guo, Hui-Peng Du, Shi-Ming Wang, Xiao-Hang Jiang, University of Science and Technology of China, China; Ying-Ying Gao, Shi-Lei Zhang, China Mobile, China; Zhen-Hua Ling, University of Science and Technology of China, China
SLP-P13.2: MEANVC: LIGHTWEIGHT AND STREAMING ZERO-SHOT VOICE CONVERSION VIA MEAN FLOWS
Guobin Ma, Jixun Yao, Ziqian Ning, Yuepeng Jiang, Northwestern Polytechnical University, China; Lingxin Xiong, Geely Automobile Research Institute (Ningbo) Company Ltd, China; Lei Xie, Northwestern Polytechnical University, China; Pengcheng Zhu, Geely Automobile Research Institute (Ningbo) Company Ltd, China
SLP-P13.3: MEANVOICEFLOW: ONE-STEP NONPARALLEL VOICE CONVERSION WITH MEAN FLOWS
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo, NTT, Inc., Japan
SLP-P13.4: MASKVCT: MASKED VOICE CODEC TRANSFORMER FOR ZERO-SHOT VOICE CONVERSION WITH INCREASED CONTROLLABILITY VIA MULTIPLE GUIDANCES
Junhyeok Lee, Helin Wang, Yaohan Guan, Thomas Thebaud, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak, Johns Hopkins University, United States of America
SLP-P13.5: Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
Qingyu Liu, Johns Hopkins University, United States of America; Yushen Chen, Zhikang Niu, Shanghai Jiao Tong University, China; Chunhui Wang, Yunting Yang, Bowen Zhang, Jian Zhao, Pengcheng Zhu, Geely, China; Kai Yu, Xie Chen, Shanghai Jiao Tong University, China
SLP-P13.6: EXPRESSIVE VOICE CONVERSION WITH CONTROLLABLE EMOTIONAL INTENSITY
Nannan Teng, Ying Hu, Xinjiang University, China; Zhijian Ou, Tsinghua University, China; Sheng Li, Institute of Science Tokyo, Japan
SLP-P13.7: PRETRAINING AND FINE-TUNING TECHNIQUES FOR ELECTROLARYNGEAL SPEECH ENHANCEMENT BASED ON SEQUENCE-TO-SEQUENCE VOICE CONVERSION
Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda, Nagoya University, Japan
SLP-P13.8: CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data
Qibing Bai, Shuhao Shi, The Chinese University of Hong Kong, Shenzhen, China; Shuai Wang, Nanjing University, China; Yukai Ju, Yannan Wang, Tencent, China; Haizhou Li, The Chinese University of Hong Kong, Shenzhen, China
SLP-P13.9: LIGHTWEIGHT AND PERCEPTUALLY-GUIDED VOICE CONVERSION FOR ELECTRO-LARYNGEAL SPEECH
Benedikt Mayrhofer, Franz Pernkopf, Graz University of Technology, Austria; Philipp Aichinger, Medical University of Vienna, Austria; Martin Hagmüller, Graz University of Technology, Austria
SLP-P13.10: MIND YOUR [m]S, CROSS YOUR [t]S: A LARGE-SCALE PHONETIC ANALYSIS OF SPEECH REPRODUCTION IN MODERN SPEECH GENERATORS
Boo Fullwood, Fabian Monrose, Georgia Institute of Technology, United States of America