ASMSP-P10.9
VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability
Kenichi Ogita, Reo Yoneyama, Wen-Chin Huang, Tomoki Toda, Nagoya University, Japan
Session:
ASMSP-P10: Speech Synthesis, Translation, and Assessment Poster
Track:
ASMSP - Acoustic, Speech and Music Signal Processing
Location:
Poster Area C
Presentation Time:
Thu, 11 Sep, 11:00 - 12:40 Italy Time (UTC +2)
Session Chair:
Andreas Brendel
Session ASMSP-P10
ASMSP-P10.1: Improving Speech Translation through Data Augmentation with Data in Similar Languages
Yu-Chien Lin, University of Illinois Urbana-Champaign, United States; Chia-Hua Wu, Yu Tsao, Hsin-Min Wang, Academia Sinica, Taiwan
ASMSP-P10.2: Source Tracing of Synthetic Speech Systems Through Paralinguistic Pre-Trained Representations
Girish ., UPES, India; Mohd Mujtaba Akhtar, Veer Bahadur Singh Purvanchal University, India; Orchid Chetia Phukan, Drishti Singh, Indraprastha Institute of Information Technology Delhi, India; Swarup Ranjan Behera, Independent Researcher, India; Pailla Balakrishna Reddy, Reliance AI, India; Arun Balaji Buduru, Indraprastha Institute of Information Technology Delhi, India; Rajesh Sharma, University of Tartu, Estonia
ASMSP-P10.3: Are Mamba-Based Audio Foundation Models the Best Fit for Non-Verbal Emotion Recognition?
Mohd Mujtaba Akhtar, Veer Bahadur Singh Purvanchal University, India; Orchid Chetia Phukan, Indraprastha Institute of Information Technology Delhi, India; Girish ., UPES, India; Swarup Ranjan Behera, Independent Researcher, India; Ananda Chandra Nayak, Kendrapara Autonomous College, India; Sanjib Kumar Nayak, Veer Surendra Sai University of Technology, India; Arun Balaji Buduru, Indraprastha Institute of Information Technology Delhi, India; Rajesh Sharma, University of Tartu, Estonia
ASMSP-P10.4: QR-VC: Leveraging Quantization Residuals for Linear Disentanglement in Zero-Shot Voice Conversion
Youngjun Sim, Jinsung Yoon, Wooyeol Jeong, Young-Joo Suh, Pohang University of Science and Technology, Korea (South)
ASMSP-P10.5: DIFF-DEQ: Differentiable Dynamic Equalization for Studio-Quality Speech Processing
Parakrant Sarkar, Permagnus Lindborg, City University of Hong Kong, China
ASMSP-P10.6: SWAR: A Longformer-Based GAN Vocoder for Gujarati Language
Ravindrakumar Purohit, Hemant Patil, DAIICT, India
ASMSP-P10.7: Improved Dysarthric Speech to Text Conversion via TTS Personalization
Péter Mihajlik, Budapest University of Technology and Economics, Hungary; Éva Székely, KTH Royal Institute of Technology, Sweden; Piroska Barta, Budapest University of Technology and Economics, Hungary; Máté Kádár, HUN-REN Linguistic Research Center, Hungary; Gergely Dobsinszki, SpeechTex Ltd., Hungary; László Tóth, University of Szeged, Hungary
ASMSP-P10.8: Phoneme-Level Speech Intelligibility Reduction
Aine Drelingyte, Romain Serizel, Laboratoire lorrain de recherche en informatique et ses applications, France; Mathieu Lagrange, Le Laboratoire des Sciences du Numérique de Nantes, France
ASMSP-P10.9: VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability
Kenichi Ogita, Reo Yoneyama, Wen-Chin Huang, Tomoki Toda, Nagoya University, Japan
ASMSP-P10.10: CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment
Papa Séga WADE, Orange Innovation - IMT Atlantique, France; Mihai ANDRIES, Ioannis KANELLOS, IMT Atlantique, France; Thierry MOUDENC, Orange Innovation, France
ASMSP-P10.11: Distillation-Free, Stable Training of ClariNet Vocoder with Spectral Energy Distance
Eirini Sisamaki, University of Crete, Greece; Yannis Pantazis, Foundation for Research and Technology - Hellas, Greece; Vassilis Tsiaras, Yannis Stylianou, University of Crete, Greece