SLP-P6.10

SPEAKING CLEARLY: A SIMPLIFIED WHISPER-BASED CODEC FOR LOW-BITRATE SPEECH CODING

Xin Zhang, Lin Li, Xiangni Lu, Wuhan University of Technology, China; Jianquan Liu, NEC Corporation, Japan; Kong Aik Lee, The Hong Kong Polytechnic University, Hong Kong

Session:
SLP-P6: Neural Vocoders and Codecs Poster

Track:
Speech and Language Processing [SL]

Location:
Poster Area 43

Presentation Time:
Tue, 5 May, 14:00 - 16:00

Session SLP-P6
SLP-P6.1: T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS
Haibin Wu, Bach Viet Do, Naveen Suda, Julian Chan, Madhavan C R, Gene-Ping Yang, Yi-Chiao Wu, Naoyuki Kanda, Yossef Adi, Xin Lei, Yue Liu, Florian Metze, Yuzong Liu, Meta, United States of America
SLP-P6.2: Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
Yanzhou Ren, Waseda University, Japan; Noboru Harada, Daiki Takeuchi, NTT, Inc., Japan; Siyu Chen, Wei Liu, Xiao Zhang, Liyuan Zhang, Waseda University, Japan; Takehiro Moriya, NTT, Inc., Japan; Shoji Makino, Waseda University, Japan
SLP-P6.3: FOCALCODEC-STREAM: STREAMING LOW-BITRATE SPEECH CODING VIA CAUSAL DISTILLATION
Luca Della Libera, Concordia University, Canada; Cem Subakan, Université Laval, Canada; Mirco Ravanelli, Concordia University, Canada
SLP-P6.4: WAVE-TRAINER-FIT: NEURAL VOCODER WITH TRAINABLE PRIOR AND FIXED-POINT ITERATION TOWARDS HIGH-QUALITY SPEECH GENERATION FROM SSL FEATURES
Hien Ohnaka, Nara Institute of Science and Technology, Japan; Yuma Shirahata, Masaya Kawamura, LY Corporation, Japan
SLP-P6.5: WAVENEXT 2: CONVNEXT-BASED FAST NEURAL VOCODERS WITH RESIDUAL DENOISING AND SUB-MODELING FOR GAN AND DIFFUSION MODELS
Wangzixi Zhou, Nara Institute of Science and Technology, Japan; Takuma Okamoto, Yamato Ohtani, National Institute of Information and Communications Technology, Japan; Sakriani Sakti, Nara Institute of Science and Technology, Japan; Hisashi Kawai, National Institute of Information and Communications Technology, Japan
SLP-P6.6: CODECSLIME: TEMPORAL REDUNDANCY COMPRESSION OF NEURAL SPEECH CODEC VIA DYNAMIC FRAME RATE
Hankun Wang, Yiwei Guo, Chongtian Shao, Bohan Li, Kai Yu, Shanghai Jiao Tong University, China
SLP-P6.7: DISCRETE DIFFUSION FOR GENERATIVE MODELING OF TEXT-ALIGNED SPEECH TOKENS
Pin-Jui Ku, Georgia Institute of Technology, United States of America; He Huang, Jean-Marie Lemercier, Subham Sekhar Sahoo, Zhehuai Chen, Ante Jukić, NVIDIA, United States of America
SLP-P6.8: AUV: TEACHING AUDIO UNIVERSAL VECTOR QUANTIZATION WITH SINGLE NESTED CODEBOOK
Yushen Chen, Shanghai Jiao Tong University, China; Kai Hu, Long Zhou, Shulin Feng, Tencent, China; Xusheng Yang, Peking University, China; Hangting Chen, Tencent, China; Xie Chen, Shanghai Jiao Tong University, China
SLP-P6.9: STACODEC: SEMANTIC TOKEN ASSIGNMENT FOR BALANCING ACOUSTIC FIDELITY AND SEMANTIC INFORMATION IN AUDIO CODECS
Kaiyuan Zhang, Mohan Shi, Eray Eren, Natarajan Balaji Shankar, Zilai Wang, Abeer Alwan, UCLA, United States of America
SLP-P6.10: SPEAKING CLEARLY: A SIMPLIFIED WHISPER-BASED CODEC FOR LOW-BITRATE SPEECH CODING
Xin Zhang, Lin Li, Xiangni Lu, Wuhan University of Technology, China; Jianquan Liu, NEC Corporation, Japan; Kong Aik Lee, The Hong Kong Polytechnic University, Hong Kong
SLP-P6.11: EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding
Luca Cerovaz, Sapienza University of Rome, Italy; Michele Mancusi, Moises Systems Inc., Salt Lake City, USA, Italy; Emanuele Rodolà, Sapienza University of Rome, Italy
SLP-P6.12: COMBINING MULTI-ORDER ATTENTION AND MULTI-RESOLUTION DISCRIMINATOR FOR HIGH-FIDELITY NEURAL VOCODER
Yan Shi, Jin Shi, Minchuan Chen, Ziyang Zhuang, Ping An Technology, China; Peng Qi, Shanghai Jiao Tong University Chongqing Artificial Intelligence Research Institute, China; Shaojun Wang, Jing Xiao, Ping An Technology, China
SLP-P6.13: FAC-FACODEC: CONTROLLABLE ZERO-SHOT FOREIGN ACCENT CONVERSION WITH FACTORIZED SPEECH CODEC
Yurii Halychanskyi, Cameron Churchwell, Yutong Wen, Volodymyr Kindratenko, University of Illinois Urbana-Champaign, United States of America
SLP-P6.14: QHARMA-GAN: QUASI-HARMONIC NEURAL VOCODER BASED ON AUTOREGRESSIVE MOVING AVERAGE MODEL
Shaowen Chen, Tomoki Toda, Nagoya University, Japan
SLP-P6.15: HOW TO LABEL RESYNTHESIZED AUDIO: THE DUAL ROLE OF NEURAL AUDIO CODECS IN AUDIO DEEPFAKE DETECTION
Yixuan Xiao, University of Stuttgart, Germany; Florian Lux, Alejandro Pérez-González-de-Martos, AppTek GmbH, Germany; Ngoc Thang Vu, University of Stuttgart, Germany