AASP-P2.2

A Closer Look at Wav2Vec2 Embeddings for On-device Single-channel Speech Enhancement

Ravi Shankar, Johns Hopkins University, United States of America; Ke Tan, Buye Xu, Anurag Kumar, Meta Reality Labs, United States of America

Session:
AASP-P2: Speech Enhancement 1; Music Information Retrieval 2 (Poster)

Track:
Audio and Acoustic Signal Processing

Location:
Poster Zone 2A
Poster Board PZ-2A.2

Presentation Time:
Tue, 16 Apr, 13:10 - 15:10 (UTC +9)

Session Chair:
Zhiyao Duan, University of Rochester
Session AASP-P2
AASP-P2.1: ULTRA LOW COMPLEXITY DEEP LEARNING BASED NOISE SUPPRESSION
Shrishti Saha Shetu, Soumitro Chakrabarty, Oliver Thiergart, Edwin Mabande, Fraunhofer IIS, Germany
AASP-P2.2: A Closer Look at Wav2Vec2 Embeddings for On-device Single-channel Speech Enhancement
Ravi Shankar, Johns Hopkins University, United States of America; Ke Tan, Buye Xu, Anurag Kumar, Meta Reality Labs, United States of America
AASP-P2.3: GTCRN: A Speech Enhancement Model Requiring Ultralow Computational Resources
Xiaobin Rong, Tianchi Sun, Nanjing University, China; Xu Zhang, Jiangsu Thingstar Information Technology Co., Ltd., China; Yuxiang Hu, Changbao Zhu, Horizon Robotics, China; Jing Lu, Nanjing University, China
AASP-P2.4: Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach
Bernd Meyer, Nils Westhausen, Carl von Ossietzky Universität Oldenburg, Communication Acoustics, Oldenburg, Germany
AASP-P2.5: BAE-Net: A Low complexity and high fidelity bandwidth-adaptive neural network for speech super-resolution
Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Kuaishou Technology, Beijing, China; Chengshi Zheng, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; Chen Zhang, Chao Zhou, Qi Huang, Bing Yu, Kuaishou Technology, Beijing, China
AASP-P2.6: DDD: A PERCEPTUALLY SUPERIOR LOW-RESPONSE-TIME DNN-BASED DECLIPPER
Jayeon Yi, University of Michigan, United States of America; Junghyun Koo, Kyogu Lee, Seoul National University, Republic of Korea
AASP-P2.7: ENHANCING VIOLIN FINGERING GENERATION THROUGH AUDIO-SYMBOLIC FUSION
Wei-Yang Lin, Academia Sinica, Taiwan; Yu-Chiang Frank Wang, National Taiwan University, Taiwan; Li Su, Academia Sinica, Taiwan
AASP-P2.8: MULTI-VIEW MIDIVAE: FUSING TRACK- AND BAR-VIEW REPRESENTATIONS FOR LONG MULTI-TRACK SYMBOLIC MUSIC GENERATION
Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Tsinghua University, China; Jing Yang, Yaolong Ju, Fan Fan, Huawei, China; Shiyin Kang, Skywork AI PTE. LTD., China; Zhiyong Wu, Tsinghua University, China; Helen Meng, The Chinese University of Hong Kong, China
AASP-P2.9: A SCALABLE SPARSE TRANSFORMER MODEL FOR SINGING MELODY EXTRACTION
Shuai Yu, Jun Liu, Donghua University, China; Yi Yu, National Institute of Informatics (NII), Japan; Wei Li, Fudan University, China
AASP-P2.10: BYTEHUM: FAST AND ACCURATE QUERY-BY-HUMMING IN THE WILD
Xingjian Du, Pei Zou, Mingyu Liu, Xia Liang, Minghang Chu, Bilei Zhu, ByteDance, China
AASP-P2.11: JOINT MUSIC AND LANGUAGE ATTENTION MODELS FOR ZERO-SHOT MUSIC TAGGING
Xingjian Du, Zhesong Yu, Jiaju Lin, ByteDance, China; Qiuqiang Kong, The Chinese University of Hong Kong, China; Bilei Zhu, ByteDance, China
AASP-P2.12: Dynamic Time Signature Recognition, Tempo Inference, and Beat Tracking through the Metrogram Transform
Simon Godsill, James Cozens, University of Cambridge, United Kingdom