WS-15a.11

Multichannel Voice Trigger Detection Based on Transform-average-concatenate

Takuya Higuchi, Apple, United States of America; Avamarie Brueggeman, The University of Texas at Dallas, United States of America; Masood Delfarah, Stephen Shum, Apple, United States of America

Session:
WS-15a: Hands-free Speech Communication and Microphone Arrays (HSCMA 2024): Efficient and Personalized Speech Processing through Data Science I Poster

Track:
Satellite Workshops

Location:
Workshop Poster
Poster Board WSP.11

Presentation Time:
Tue, 16 Apr, 13:10 - 15:10 (UTC +9)

Session WS-15a
WS-15a.1: Late Audio-Visual Fusion for In-The-Wild Speaker Diarization
Zexu Pan, Gordon Wichern, Francois Germain, Aswin Subramanian, Jonathan Le Roux, Mitsubishi Electric Research Labs, United States of America
WS-15a.2: Microphone Aligned Continuous Wearable Device-Related Transfer Function: Efficient Modeling and Measurements
Wageesha Manamperi, Thushara Abhayapala, Australian National University, Australia; Paul Holmberg, Dolby Laboratories, Australia
WS-15a.3: Ambisonics Encoding For Arbitrary Microphone Arrays Incorporating Residual Channels For Binaural Reproduction
Yhonatan Gayer, Ben Gurion University of the Negev, Israel; Vladimir Tourbabin, Zamir Ben-Hur, Jacob Donley, Meta, United States of America; Boaz Rafaely, Ben Gurion University of the Negev, Israel
WS-15a.4: A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings
Hyewon Han, Yonsei University, Korea, Republic of; Naveen Kumar, Disney Research Imagineering, United States of America
WS-15a.5: Fast Random Approximation of Multi-channel Room Impulse Response
Yi Luo, Rongzhi Gu, Tencent AI Lab, China
WS-15a.6: Diffusion Model-Based MIMO Speech Denoising and Dereverberation
Rino Kimura, Waseda University, NTT Corporation, Japan; Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, NTT Corporation, Japan; Tetsuya Ueda, Shoji Makino, Waseda University, Japan
WS-15a.7: Blind Estimation of Spatial Room Impulse Responses Using a Pseudo Reference Signal
Thomas Deppisch, Jens Ahrens, Chalmers University of Technology, Sweden; Sebastià V. Amengual Garí, Paul Calamia, Meta, United States of America
WS-15a.8: Joint Minimum Processing Beamforming and Near-End Listening Enhancement
Andreas Jonas Fuglsig, Jesper Jensen, Zheng-Hua Tan, Aalborg University, Denmark; Lars Søndergaard Bertelsen, Jens Christian Lindof, RTX A/S, Denmark; Jan Østergaard, Aalborg University, Denmark
WS-15a.9: A Two-Step Approach for Narrowband Source Localization in Reverberant Rooms
Wei-Ting Lai, Lachlan Birnie, Thushara Abhayapala, Amy Bastine, Shaoheng Xu, Prasanga Samarasinghe, Australian National University, Australia
WS-15a.10: Insights Into Magnitude and Phase Estimation by Masking and Mapping in DNN-based Multichannel Speaker Separation
Alexander Bohlender, Ghent University - imec, Belgium; Ann Spriet, Wouter Tirry, Goodix Technology (Belgium) B.V., Belgium; Nilesh Madhu, Ghent University - imec, Belgium
WS-15a.11: Multichannel Voice Trigger Detection Based on Transform-average-concatenate
Takuya Higuchi, Apple, United States of America; Avamarie Brueggeman, The University of Texas at Dallas, United States of America; Masood Delfarah, Stephen Shum, Apple, United States of America
WS-15a.12: External Knowledge Augmented Polyphone Disambiguation Using Large Language Model
Chen Li, Ant Group, China