AASP-P26.3

Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training

Naisong Zhou, École polytechnique fédérale de Lausanne, Switzerland; Saisamarth Rajesh Phaye, Milos Cernak, Andy Pearce, Tijana Stojkovic, Logitech, Singapore; Andrea Cavallaro, École polytechnique fédérale de Lausanne, Switzerland; Andrew Harper, Logitech, United Kingdom of Great Britain and Northern Ireland

Session: AASP-P26: Audio and Speech Source Separation and Signal Enhancement III Poster

Track: Audio and Acoustic Signal Processing [AA]

Location: Poster Area 24

Presentation Time: Fri, 8 May, 09:00 - 11:00

Session AASP-P26
AASP-P26.1: DISSECTING PERFORMANCE DEGRADATION IN AUDIO SOURCE SEPARATION UNDER SAMPLING FREQUENCY MISMATCH
Kanami Imamura, The University of Tokyo / National Institute of Advanced Industrial Science and Technology (AIST), Japan; Tomohiko Nakamura, National Institute of Advanced Industrial Science and Technology (AIST), Japan; Kohei Yatabe, Tokyo University of Agriculture and Technology, Japan; Hiroshi Saruwatari, The University of Tokyo, Japan
AASP-P26.2: NEURAL NETWORK-BASED TIME-FREQUENCY-BIN-WISE LINEAR COMBINATION OF BEAMFORMERS FOR UNDERDETERMINED TARGET SOURCE EXTRACTION
Changda Chen, Waseda University, Japan; Yichen Yang, Northwestern Polytechnical University, China; Wei Liu, Wuhan University, China; Shoji Makino, Waseda University, Japan
AASP-P26.3: Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training
Naisong Zhou, École polytechnique fédérale de Lausanne, Switzerland; Saisamarth Rajesh Phaye, Milos Cernak, Andy Pearce, Tijana Stojkovic, Logitech, Singapore; Andrea Cavallaro, École polytechnique fédérale de Lausanne, Switzerland; Andrew Harper, Logitech, United Kingdom of Great Britain and Northern Ireland
AASP-P26.4: TOWARDS REAL-TIME GENERATIVE SPEECH RESTORATION WITH FLOW-MATCHING
Tsun-An Hsieh, University of Illinois Urbana-Champaign, United States of America; Sebastian Braun, Microsoft, United States of America
AASP-P26.5: UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
Woongjib Choi, Sangmin Lee, Hyungseob Lim, Hong-Goo Kang, Yonsei University, Korea, Republic of
AASP-P26.6: GENERALIZABILITY OF PREDICTIVE AND GENERATIVE SPEECH ENHANCEMENT MODELS TO PATHOLOGICAL SPEAKERS
Mingchi Hou, Idiap Research Institute, Switzerland; Ante Jukic, NVIDIA, United States of America; Ina Kodrasi, Idiap Research Institute, Switzerland
AASP-P26.7: CLASS-AWARE PERMUTATION-INVARIANT SIGNAL-TO-DISTORTION RATIO FOR SEMANTIC SEGMENTATION OF SOUND SCENE WITH SAME-CLASS SOURCES
Binh Thien Nguyen, Masahiro Yasuda, Daiki Takeuchi, Daisuke Niizumi, Noboru Harada, NTT, Inc., Japan
AASP-P26.8: SPATIAL COVARIANCE MATRIX RECONSTRUCTION FOR SPEECH ENHANCEMENT IN REVERBERANT MULTI-SOURCE ENVIRONMENTS
Wei Liu, Wuhan University, China; Xueqin Luo, Jilu Jin, Northwestern Polytechnical University, China; Gongping Huang, Wuhan University, China; Jingdong Chen, Northwestern Polytechnical University, China; Jacob Benesty, University of Quebec, Canada; Shoji Makino, Waseda University, Japan
AASP-P26.9: SINGLE-STEP CONTROLLABLE MUSIC BANDWIDTH EXTENSION WITH FLOW MATCHING
Carlos Hernández Oliván, Hendrik Vincent Koops, Hao Hao Tan, Elio Quinton, Universal Music Group, United Kingdom of Great Britain and Northern Ireland
AASP-P26.10: Multi-Channel Speech Enhancement Guided by Learning-based A Posteriori Speech Presence Probability Estimation
Shuai Tao, Aalborg University, China