WS-15b.4

JOINT OPTIMIZATION OF STREAMING AND NON-STREAMING AUTOMATIC SPEECH RECOGNITION WITH MULTI-DECODER AND KNOWLEDGE DISTILLATION

Muhammad Shakeel, Yui Sudo, Honda Research Institute Japan Co. Ltd., Japan; Yifan Peng, Shinji Watanabe, Carnegie Mellon University, United States of America

Session:
WS-15b: Hands-free Speech Communication and Microphone Arrays (HSCMA 2024): Efficient and Personalized Speech Processing through Data Science II Poster

Track:
Satellite Workshops

Location:
Workshop Poster
Poster Board WSP.4

Presentation Time:
Tue, 16 Apr, 13:10 - 15:10 (UTC +9)

Session WS-15b
WS-15b.1: Deep Low-Latency Joint Speech Transmission and Enhancement over a Gaussian Channel
Mohammad Bokaei, Jesper Jensen, Aalborg University, Denmark; Simon Doclo, University of Oldenburg, Germany; Jan Østergaard, Aalborg University, Denmark
WS-15b.2: Geometrically Constrained Joint Moving Source Extraction and Dereverberation based on Constant Separating Vector Mixing Model
Mingxue Song, Tetsuya Ueda, Ruifeng Zhang, Jiahui Hu, Shoji Makino, Waseda University, Japan
WS-15b.3: CONVOIFILTER: A CASE STUDY OF DOING COCKTAIL PARTY SPEECH RECOGNITION
Thai Binh Nguyen, Karlsruhe Institute of Technology, Germany; Alexander Waibel, Carnegie Mellon University, United States of America
WS-15b.4: JOINT OPTIMIZATION OF STREAMING AND NON-STREAMING AUTOMATIC SPEECH RECOGNITION WITH MULTI-DECODER AND KNOWLEDGE DISTILLATION
Muhammad Shakeel, Yui Sudo, Honda Research Institute Japan Co. Ltd., Japan; Yifan Peng, Shinji Watanabe, Carnegie Mellon University, United States of America
WS-15b.5: TRAINING STRATEGIES FOR MODALITY DROPOUT RESILIENT MULTI-MODAL TARGET SPEAKER EXTRACTION
Srikanth Korse, Mohamed Elminshawi, Emanuël Habets, Srikanth Raj Chetupalli, International Audio Laboratories Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
WS-15b.6: A Unified Geometry-aware Source Localization and Separation Framework for Ad-hoc Microphone Array
Jingjie Fan, Southeast University, China; Rongzhi Gu, Yi Luo, Tencent, China; Cong Pang, Southeast University, China
WS-15b.7: Real-time speech extraction using spatially regularized independent low-rank matrix analysis and rank-constrained spatial covariance matrix estimation
Yuto Ishikawa, Kohei Konaka, The University of Tokyo, Japan; Tomohiko Nakamura, The National Institute of Advanced Industrial Science and Technology (AIST), Japan; Norihiro Takamune, Hiroshi Saruwatari, The University of Tokyo, Japan
WS-15b.8: Ensemble inference for diffusion model-based speech enhancement
Hao Shi, Kyoto University, Japan; Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, NTT Corporation, Japan
WS-15b.9: NEURAL STEERER: NOVEL STEERING VECTOR SYNTHESIS WITH A CAUSAL NEURAL FIELD OVER FREQUENCY AND DIRECTION
Diego Di Carlo, Aditya Arie Nugraha, Center for Advanced Intelligence Project (AIP), RIKEN, Japan; Mathieu Fontaine, LTCI, Telecom Paris, Institut Polytechnique de Paris, France; Yoshiaki Bando, National Institute of Advanced Industrial Science and Technology (AIST), Center for Advanced Intelligence Project (AIP), RIKEN, Japan; Kazuyoshi Yoshii, Graduate School of Informatics, Kyoto University, Center for Advanced Intelligence Project (AIP), RIKEN, Japan
WS-15b.10: DATA-DRIVEN JOINT DETECTION AND LOCALIZATION OF ACOUSTIC REFLECTORS
H. Nazim Bicer, Cagdas Tuna, Andreas Walther, Fraunhofer Institute for Integrated Circuits IIS, Germany; Emanuël A. P. Habets, International Audio Laboratories Erlangen, Germany
WS-15b.11: Sound Source Separation Using Latent Variational Block-Wise Disentanglement
Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin, Amazon, United States of America
WS-15b.12: Multi-channel Speech Enhancement using Beamforming and Nullforming for Severely Adverse Drone Environment
Seokhyun Kim, Won Jeong, Hyung-Min Park, Sogang University, Republic of Korea