IEEE IWAENC 2024 || Aalborg, Denmark || 9-12 September 2024

TH1.P2.6

PAD-VC: A PROSODY-AWARE DECODER FOR ANY-TO-FEW VOICE CONVERSION

Arunava Kr Kalita, Indian Institute of Information Technology Guwahati, India; Christian Dittmar, Paolo Sani, Frank Zalkow, Fraunhofer IIS, Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen, Germany; Rusha Patra, Indian Institute of Information Technology Guwahati, India

Session:

TH1.P2: Poster Session VII: Speech and audio coding, New and emerging topics in speech and audio processing, Special Session: Deep learning-based approaches to audio telepresence Poster

Location:

Indgangsfoyer

Presentation Time:

Thu, 12 Sep, 10:00 - 12:00 Central European Summer Time (UTC+2)

Session Co-Chairs:

Rainer Martin, Ruhr University Bochum and Mingsian Bai, National Tsing Hua University

Session TH1.P2

TH1.P2.1: HIGH-FIDELITY DIFFUSION-BASED AUDIO CODEC

Zhengpu Zhang, Jianyuan Feng, Yongjian Mao, Yehang Zhu, Junjie Shi, Xuzhou Ye, Shilei Liu, Derong Liu, Chuanzeng Huang, ByteDance, China

TH1.P2.2: A CROSS-DOMAIN APPROACH TO TEMPORAL ENVELOPE SHAPING IN PARAMETRIC STEREO CODING USING DEEP LEARNING

Patrick Kechichian, Akshaya Ravi, Erik Schuijers, Philips, Netherlands

TH1.P2.3: Non-Causal to Causal SSL-Supported Transfer Learning: Towards a High-Performance Low-Latency Speech Vocoder

Renzheng Shi, Andreas Bär, Marvin Sach, Technische Universität Braunschweig, Germany; Wouter Tirry, GOODiX Technology Belgium B.V., Germany; Tim Fingscheidt, Technische Universität Braunschweig, Germany

TH1.P2.4: Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data

Eloi Moliner Juanpere, Aalto University, Finland; Sebastian Braun, Hannes Gamper, Microsoft Research, United States of America

TH1.P2.5: Complexity Reduction for Classification of Musical Instruments Using Element Selection

Ryu Kato, Natsuki Ueno, Nobutaka Ono, Tokyo Metropolitan University, Japan; Ryo Matsuda, Kazunobu Kondo, Yamaha Corporation, Japan

TH1.P2.6: PAD-VC: A PROSODY-AWARE DECODER FOR ANY-TO-FEW VOICE CONVERSION

Arunava Kr Kalita, Indian Institute of Information Technology Guwahati, India; Christian Dittmar, Paolo Sani, Frank Zalkow, Fraunhofer IIS, Erlangen, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen, Germany; Rusha Patra, Indian Institute of Information Technology Guwahati, India

TH1.P2.7: LONG-TERM CONVERSATION ANALYSIS: PRIVACY-UTILITY TRADE-OFF UNDER NOISE AND REVERBERATION

Jule Pohlhausen, Jade University of Applied Sciences, Oldenburg, Germany; Francesco Nespoli, Imperial College, London, United Kingdom of Great Britain and Northern Ireland; Jörg Bitzer, Jade University of Applied Sciences, Oldenburg, Germany

TH1.P2.8: HARMONICS TO THE RESCUE: WHY VOICED SPEECH IS NOT A WSS PROCESS

Giovanni Bologni, Delft University of Technology, Netherlands; Richard Heusdens, Netherlands Defence Academy, Netherlands; Richard C. Hendriks, Delft University of Technology, Netherlands

TH1.P2.9: DERIVATIVE FEATURES OF SHORT-TIME HOLOMORPHIC FOURIER TRANSFORM

Iori Hashimoto, Yu Morinaga, Suehiro Shimauchi, Shigeaki Aoki, Kanazawa Institute of Technology, Japan

TH1.P2.10: FEASIBILITY OF IMAGLS-BSM - ILD INFORMED BINAURAL SIGNAL MATCHING WITH ARBITRARY MICROPHONE ARRAYS

Or Berebi, Ben-Gurion University of the Negev, Israel; Zamir Ben-Hur, David Lou Alon, Meta, United States of America; Boaz Rafaely, Ben-Gurion University of the Negev, Israel

TH1.P2.11: RGI-NET: 3D ROOM GEOMETRY INFERENCE FROM ROOM IMPULSE RESPONSES WITH HIDDEN FIRST-ORDER REFLECTIONS

Inmo Yeon, Jung-Woo Choi, Korea Advanced Institute of Science and Technology (KAIST), Korea, Republic of

TH1.P2.12: A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes

Yicheng Hsu, Mingsian Bai, National Tsing Hua University, Taiwan

TH1.P2.13: NEURAL DIRECTIONAL FILTERING: FAR-FIELD DIRECTIVITY CONTROL WITH A SMALL MICROPHONE ARRAY

Julian Wechsler, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, Emanuël A. P. Habets, International Audio Laboratories Erlangen, Germany

TH1.P2.14: Magnitude Least-Squares based Ambisonics Estimation of Head-Worn Device Microphone Measurements for Binaural Reproduction

Amy Bastine, Lachlan Birnie, Thushara Abhayapala, Prasanga Samarasinghe, The Australian National University, Australia; Vladimir Tourbabin, Reality Labs Research, Meta, United States of America


Last updated 09 August 2024.

