SLP-P18.10
TTSOPS: A CLOSED-LOOP CORPUS OPTIMIZATION FRAMEWORK FOR TRAINING MULTI-SPEAKER TTS MODELS FROM DARK DATA
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari, The University of Tokyo, Japan
Session: SLP-P18: Speaker Modeling, Multilinguality, and Speech Resources in TTS Poster
Track: Speech and Language Processing [SL]
Location: Poster Area 32
Presentation Time: Wed, 6 May, 09:00 - 11:00
Session SLP-P18
SLP-P18.1: Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling
Huan Liao, Qinke Ni, Yuancheng Wang, Yiheng Lu, The Chinese University of Hong Kong (Shenzhen), China; Haoyue Zhan, Pengyuan Xie, Qiang Zhang, Guangzhou Quwan Network Technology, China; Zhizheng Wu, The Chinese University of Hong Kong (Shenzhen), China
SLP-P18.2: BRIDGECODE: A DUAL SPEECH REPRESENTATION PARADIGM FOR AUTOREGRESSIVE ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS
Jingyuan Xing, Mingru Yang, Zhipeng Li, Xiaofen Xing, South China University of Technology, China; Xiangmin Xu, Foshan University, China
SLP-P18.3: Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
Xinlu He, Worcester Polytechnic Institute, United States of America; Swayambhu Nath Ray, Harish Mallidi, Jia-Hong Huang, Ashwin Bellur, Chander Chandak, M. Maruf, Venkatesh Ravichandran, Amazon, United States of America
SLP-P18.4: Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
Thanathai Lertpetchpun, Yoonjeong Lee, Thanapat Trachu, Jihwan Lee, Tiantian Feng, Dani Byrd, Shrikanth Narayanan, University of Southern California, United States of America
SLP-P18.5: TMD-TTS: A UNIFIED TIBETAN MULTI-DIALECT TEXT-TO-SPEECH FRAMEWORK FOR U-TSANG, AMDO AND KHAM SPEECH DATASET GENERATION
Yutong Liu, Ziyue Zhang, BAN Ma-bao, University of Electronic Science and Technology of China, China; Renzeng Duojie, Tibet University, China; Yuqing Cai, Yongbin Yu, Xiangxiang Wang, Fan Gao, University of Electronic Science and Technology of China, China; Cheng Huang, Southern Methodist University, United States of America; Nyima Tashi, Tibet University, China
SLP-P18.6: PFLUXTTS: HYBRID FLOW-MATCHING TTS WITH ROBUST CROSS-LINGUAL VOICE CLONING AND INFERENCE-TIME MODEL FUSION
Vikentii Pankov, Artem Gribul, Oktai Tatanov, Vladislav Proskurov, Rask AI, Portugal; Yuliya Korotkova, École Polytechnique, France; Darima Mylzenova, TBC bank, Uzbekistan; Dmitrii Vypirailenko, Rask AI, Spain
SLP-P18.7: Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing
Wataru Nakata, Yuki Saito, Yota Ueda, Hiroshi Saruwatari, The University of Tokyo, Japan
SLP-P18.8: DEEP DUBBING: END-TO-END AUTO-AUDIOBOOK SYSTEM WITH TEXT-TO-TIMBRE AND CONTEXT-AWARE INSTRUCT-TTS
Ziqi Dai, Beijing University of Civil Engineering and Architecture, China; Yiting Chen, Jiacheng Xu, Liufei Xie, Yuchen Wang, Zhenchuan Yang, Tencent Music Entertainment Lyra Lab, China; Bingsong Bai, Beijing University of Posts and Telecommunications, China; Yangsheng Gao, Wenjiang Zhou, Weifeng Zhao, Tencent Music Entertainment Lyra Lab, China; Ruohua Zhou, Beijing University of Civil Engineering and Architecture, China
SLP-P18.9: ERASING YOUR VOICE BEFORE IT’S HEARD: TRAINING-FREE SPEAKER UNLEARNING FOR ZERO-SHOT TEXT-TO-SPEECH
Myungjin Lee, Eunji Shin, Jiyoung Lee, Ewha Womans University, Korea, Republic of
SLP-P18.10: TTSOPS: A CLOSED-LOOP CORPUS OPTIMIZATION FRAMEWORK FOR TRAINING MULTI-SPEAKER TTS MODELS FROM DARK DATA
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari, The University of Tokyo, Japan