SLP-P21.6
GLA-GRAD++: AN IMPROVED GRIFFIN-LIM GUIDED DIFFUSION MODEL FOR SPEECH SYNTHESIS
Teysir Baoueb, Xiaoyu Bie, Mathieu Fontaine, Gaël Richard, LTCI, Télécom Paris, Institut polytechnique de Paris, France
Session: SLP-P21: Singing Voice, Music, and Multimodal Audio Generation Poster
Track: Speech and Language Processing [SL]
Location: Poster Area 29
Presentation Time: Wed, 6 May, 14:00 - 16:00
Session SLP-P21
SLP-P21.1: THE SINGING VOICE CONVERSION CHALLENGE 2025: FROM SINGER IDENTITY CONVERSION TO SINGING STYLE CONVERSION
Lester Phillip Violeta, Nagoya University, Japan; Xueyao Zhang, The Chinese University of Hong Kong, China; Jiatong Shi, Carnegie Mellon University, United States of America; Yusuke Yasuda, National Institute of Informatics, Japan; Wen-Chin Huang, Nagoya University, Japan; Zhizheng Wu, The Chinese University of Hong Kong, China; Tomoki Toda, Nagoya University, Japan
SLP-P21.2: S²VOICE: STYLE-AWARE AUTOREGRESSIVE MODELING WITH ENHANCED CONDITIONING FOR SINGING STYLE CONVERSION
Ziqian Wang, Northwestern Polytechnical University, China; Xianjun Xia, Chuanzeng Huang, Bytedance, Australia; Lei Xie, Northwestern Polytechnical University, China
SLP-P21.3: DITSINGER: SCALING SINGING VOICE SYNTHESIS WITH DIFFUSION TRANSFORMER AND IMPLICIT ALIGNMENT
Zongcai Du, Guilin Deng, Xiaofeng Guo, Xin Gao, Linke Li, Kaichang Cheng, Fubo Han, Siyu Yang, Peng Liu, Pan Zhong, Qiang Fu, Migu Music, China Mobile Communications Corporation, China
SLP-P21.4: InstructAudio: Unified speech and music generation with natural language instruction
Chunyu Qiang, Tianjin University, China; Kang Yin, Xiaopeng Wang, Yuzhe Liang, Jiahui Zhao, Kuaishou Technology, China; Ruibo Fu, Institute of Automation, Chinese Academy of Sciences, China; Tianrui Wang, Cheng Gong, Tianjin University, China; Chen Zhang, Kuaishou Technology, China; Longbiao Wang, Jianwu Dang, Tianjin University, China
SLP-P21.5: LP-CFM: PERCEPTUAL INVARIANCE-AWARE CONDITIONAL FLOW MATCHING FOR SPEECH MODELING
Doyeop Kwak, Youngjoon Jang, Joon Son Chung, Korea Advanced Institute of Science and Technology, Korea, Republic of
SLP-P21.6: GLA-GRAD++: AN IMPROVED GRIFFIN-LIM GUIDED DIFFUSION MODEL FOR SPEECH SYNTHESIS
Teysir Baoueb, Xiaoyu Bie, Mathieu Fontaine, Gaël Richard, LTCI, Télécom Paris, Institut polytechnique de Paris, France
SLP-P21.7: PROSODY-GUIDED HARMONIC ATTENTION FOR PHASE-COHERENT NEURAL VOCODING IN THE COMPLEX SPECTRUM
Mohammed Salah Al-Radhi, Riad Larbi, Mátyás Bartalis, Géza Németh, Budapest University of Technology and Economics, Hungary
SLP-P21.8: Learning Vocal-Tract Area and Radiation with a Physics-Informed Webster Model
Minhui Lu, Joshua Reiss, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland
SLP-P21.9: MEANFLOW-ACCELERATED MULTIMODAL VIDEO-TO-AUDIO SYNTHESIS VIA ONE-STEP GENERATION
Xiaoran Yang, Wuhan University, China; Jianxuan Yang, Xinyue Guo, MiLM Plus, Xiaomi Inc., China; Haoyu Wang, Ningning Pan, Southwestern University of Finance and Economics, China; Gongping Huang, Wuhan University, China
SLP-P21.10: TAG: STRUCTURED TEMPORAL AUDIO GENERATION VIA LLM-GUIDED MANUAL SCRIPTION AND CONTROL
Hanwen Zhang, University of Southern California, China; Jinshen Zhang, Huazhong University of Science & Technology, China; Cong Zhang, Shuhui Wang, University of Chinese Academy of Sciences, China; Wei Yang, Huazhong University of Science & Technology, China