AASP-P11.8
GENERATIVE DE-QUANTIZATION FOR NEURAL SPEECH CODEC VIA LATENT DIFFUSION
Haici Yang, Indiana University, United States of America; Inseon Jang, Electronics and Telecommunications Research Institute, Korea, Republic of; Minje Kim, University of Illinois at Urbana-Champaign, United States of America
Session:
AASP-P11: Audio and speech modeling, coding and transmission; Spatial audio recording and reproduction Poster
Track:
Audio and Acoustic Signal Processing
Location:
Poster Zone 2A
Poster Board PZ-2A.8
Presentation Time:
Thu, 18 Apr, 08:20 - 10:20 (UTC +9)
Session Chair:
Archontis Politis, Tampere University
Session AASP-P11
AASP-P11.1: ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter
Yi-Chiao Wu, Dejan Markovic, Steven Krenn, Israel D. Gebru, Alexander Richard, Meta, United States of America
AASP-P11.2: SRCodec: Split-residual vector quantization for neural speech codec
Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu, Wuhan University, China
AASP-P11.3: FASTMANDARIN: EFFICIENT LOCAL MODELING FOR NATURAL MANDARIN SPEECH SYNTHESIS
Chenglong Jiang, Ying Gao, Hao Jin, Linrong Pan, W. Y. Ng Wing, South China University of Technology, China
AASP-P11.4: PRE-ECHO REDUCTION IN TRANSFORM AUDIO CODING VIA TEMPORAL ENVELOPE CONTROL WITH MACHINE LEARNING BASED ESTIMATION
Jae-Won Kim, Kwangwoon University, Korea, Republic of; Byeongho Jo, Seungkwon Beack, Electronics and Telecommunications Research Institute, Korea, Republic of; Hochong Park, Kwangwoon University, Korea, Republic of
AASP-P11.5: SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu, Wuhan University, China
AASP-P11.6: LIGHTCODEC: A HIGH FIDELITY NEURAL AUDIO CODEC WITH LOW COMPUTATION COMPLEXITY
Liang Xu, Jing Wang, Jianqian Zhang, Xiang Xie, Beijing Institute of Technology, China
AASP-P11.7: ULTRA-LOW DELAY LOSSLESS COMPRESSION OF HIGHER ORDER AMBISONICS
Mahmoud Namazi, Kenneth Rose, University of California, Santa Barbara, United States of America
AASP-P11.8: GENERATIVE DE-QUANTIZATION FOR NEURAL SPEECH CODEC VIA LATENT DIFFUSION
Haici Yang, Indiana University, United States of America; Inseon Jang, Electronics and Telecommunications Research Institute, Korea, Republic of; Minje Kim, University of Illinois at Urbana-Champaign, United States of America
AASP-P11.9: COMPRESSION OF HIGHER-ORDER AMBISONIC SIGNALS USING DIRECTIONAL AUDIO CODING
Christoph Hold, Ville Pulkki, Aalto University, Finland; Archontis Politis, Tampere University, Finland; Leo McCormack, Aalto University, Finland
AASP-P11.10: HRTF Recommendation Based on the Predicted Binaural Colouration Model
Nils Marggraf-Turley, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Michael Lovedee-Turner, Fraunhofer Institute for Integrated Circuits, Germany; Enzo De Sena, University of Surrey, United Kingdom of Great Britain and Northern Ireland
AASP-P11.11: LEARNING SPEAKER-LISTENER MUTUAL HEAD ORIENTATION BY LEVERAGING HRTF AND VOICE DIRECTIVITY ON HEADPHONES
Harshvardhan Takawale, Nirupam Roy, University of Maryland, College Park, United States of America
AASP-P11.12: FROM RIR TO BRIR: A SPARSE RECOVERY BEAMFORMING APPROACH FOR VIRTUAL BINAURAL SOUND RENDERING
Huiyuan Sun, The University of Sydney, Australia; Howe Y. Zhu, Minh T. D. Nguyen, Vincent Nguyen, Chin-Teng Lin, University of Technology Sydney, Australia; Craig T. Jin, The University of Sydney, Australia