SLP-P40.10
TEXT-TO-SPEECH WITH LIP SYNCHRONIZATION BASED ON SPEECH-ASSISTED TEXT-TO-VIDEO ALIGNMENT AND MASKED UNIT PREDICTION
Youngdo Ahn, Jongwook Chae, Jong Won Shin, Gwangju Institute of Science and Technology (GIST), Korea, Republic of
Session: SLP-P40: Alignment and Linguistic Modeling in TTS Poster
Track: Speech and Language Processing [SL]
Location: Poster Area 30
Presentation Time: Thu, 7 May, 14:00 - 16:00
Session SLP-P40
SLP-P40.1: UniDiff-TTS: Aligner-Free Diffusion Speech Synthesis with Duration Guidance
YangMing Chen, School of Computer Science and Cyber Engineering, Guangzhou University, China; Yutao Qi, School of Computer Engineering, Guangzhou Huali College, China; Wenbin Chen, School of Computer Science and Cyber Engineering, Guangzhou University, China
SLP-P40.2: Length-Aware Rotary Position Embedding for Text-Speech Alignment
Hyeongju Kim, Juheon Lee, Jinhyeok Yang, Jacob Morton, Supertone, Inc., Korea, Republic of
SLP-P40.3: SUPER MONOTONIC ALIGNMENT SEARCH
Junhyeok Lee, Johns Hopkins University, United States of America; Hyeongju Kim, Supertone Inc., Korea, Republic of
SLP-P40.4: CC-G2PNP: STREAMING GRAPHEME-TO-PHONEME AND PROSODY WITH CONFORMER-CTC FOR UNSEGMENTED LANGUAGES
Yuma Shirahata, Ryuichi Yamamoto, LY Corporation, Japan
SLP-P40.5: LEVERAGING LARGE LANGUAGE MODELS FOR TEXT NORMALIZATION OF NON-STANDARD WORDS IN TEXT-TO-SPEECH SYNTHESIS
Min Ma, Heiga Zen, Google DeepMind, United States of America; James Zhao, Google, United States of America
SLP-P40.6: F5E-TTS: ENHANCING SPEECH SYNTHESIS BY ALIGNING TEXT WITH RICH SEMANTIC REPRESENTATIONS
Yihang Chen, University of California San Diego, United States of America; Hualei Wang, Na Li, Tencent AI Lab, China; Zhifeng Li, Tencent, China
SLP-P40.7: Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy
Bohan Li, Zhihan Li, Haoran Wang, Hanglei Zhang, Yiwei Guo, Hankun Wang, Kai Yu, Shanghai Jiao Tong University, China
SLP-P40.8: Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
Shehzeen Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Roy Fejgin, Ryan Langman, Mikyas Desta, Leili Tavabi, Jason Li, NVIDIA Corporation, United States of America
SLP-P40.9: IPACue-TTS: Integrating Prosody and Articulatory Cues in Conditional Flow Matching for Multilingual Zero-Shot TTS
Giridhar Pamisetty, Atul Shree, Convin.ai, India
SLP-P40.10: TEXT-TO-SPEECH WITH LIP SYNCHRONIZATION BASED ON SPEECH-ASSISTED TEXT-TO-VIDEO ALIGNMENT AND MASKED UNIT PREDICTION
Youngdo Ahn, Jongwook Chae, Jong Won Shin, Gwangju Institute of Science and Technology (GIST), Korea, Republic of