EUSIPCO 2025 || Palermo, Italy || 8 - 12 September 2025

ASMSP-P10.9

VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability

Kenichi Ogita, Reo Yoneyama, Wen-Chin Huang, Tomoki Toda, Nagoya University, Japan

Session:

ASMSP-P10: Speech Synthesis, Translation, and Assessment Poster

Location:

Poster Area C

Presentation Time:

Thu, 11 Sep, 11:00 - 12:40 Italy Time (UTC +2)

Session Chair:

Andreas Brendel, Fraunhofer IIS

Session ASMSP-P10

ASMSP-P10.1: Improving Speech Translation through Data Augmentation with Data in Similar Languages

Yu-Chien Lin, University of Illinois Urbana-Champaign, United States; Chia-Hua Wu, Yu Tsao, Hsin-Min Wang, Academia Sinica, Taiwan

ASMSP-P10.2: SOURCE TRACING OF SYNTHETIC SPEECH SYSTEMS THROUGH PARALINGUISTIC PRE-TRAINED REPRESENTATIONS

Girish ., UPES, India; Mohd Mujtaba Akhtar, Veer Bahadur Singh Purvanchal University, India; Orchid Chetia Phukan, Drishti Singh, Indraprastha Institute of Information Technology Delhi, India; Swarup Ranjan Behera, Independent Researcher, India; Pailla Balakrishna Reddy, Reliance AI, India; Arun Balaji Buduru, Indraprastha Institute of Information Technology Delhi, India; Rajesh Sharma, University of Tartu, Estonia

ASMSP-P10.3: ARE MAMBA-BASED AUDIO FOUNDATION MODELS THE BEST FIT FOR NON-VERBAL EMOTION RECOGNITION?

Mohd Mujtaba Akhtar, Veer Bahadur Singh Purvanchal University, India; Orchid Chetia Phukan, Indraprastha Institute of Information Technology Delhi, India; Girish ., UPES, India; Swarup Ranjan Behera, Independent Researcher, India; Ananda Chandra Nayak, Kendrapara Autonomous College, India; Sanjib Kumar Nayak, Veer Surendra Sai University of Technology, India; Arun Balaji Buduru, Indraprastha Institute of Information Technology Delhi, India; Rajesh Sharma, University of Tartu, Estonia

ASMSP-P10.4: QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion

Youngjun Sim, Jinsung Yoon, Wooyeol Jeong, Young-Joo Suh, Pohang University of Science and Technology, Korea (South)

ASMSP-P10.5: DIFF-DEQ: DIFFERENTIABLE DYNAMIC EQUALIZATION FOR STUDIO-QUALITY SPEECH PROCESSING

Parakrant Sarkar, Permagnus Lindborg, City University of Hong Kong, China

ASMSP-P10.6: SWAR: A LONGFORMER-BASED GAN VOCODER FOR GUJARATI LANGUAGE

Ravindrakumar Purohit, Hemant Patil, DAIICT, India

ASMSP-P10.7: IMPROVED DYSARTHRIC SPEECH TO TEXT CONVERSION VIA TTS PERSONALIZATION

Péter Mihajlik, Budapest University of Technology and Economics, Hungary; Éva Székely, KTH Royal Institute of Technology, Sweden; Piroska Barta, Budapest University of Technology and Economics, Hungary; Máté Kádár, HUN-REN Linguistic Research Center, Hungary; Gergely Dobsinszki, SpeechTex Ltd., Hungary; László Tóth, University of Szeged, Hungary

ASMSP-P10.8: Phoneme-Level Speech Intelligibility Reduction

Aine Drelingyte, Romain Serizel, Laboratoire lorrain de recherche en informatique et ses applications, France; Mathieu Lagrange, Le Laboratoire des Sciences du Numérique de Nantes, France

ASMSP-P10.9: VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability

Kenichi Ogita, Reo Yoneyama, Wen-Chin Huang, Tomoki Toda, Nagoya University, Japan

ASMSP-P10.10: CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment

Papa Séga WADE, Orange Innovation - IMT Atlantique, France; Mihai ANDRIES, Ioannis KANELLOS, IMT Atlantique, France; Thierry MOUDENC, Orange Innovation, France

ASMSP-P10.11: DISTILLATION-FREE, STABLE TRAINING OF CLARINET VOCODER WITH SPECTRAL ENERGY DISTANCE

Eirini Sisamaki, University of Crete, Greece; Yannis Pantazis, Foundation for Research and Technology - Hellas, Greece; Vassilis Tsiaras, Yannis Stylianou, University of Crete, Greece