SS-L11: In-Context Learning Methods for Speech and Spoken Language Processing
Thu, 18 Apr, 16:30 - 18:30 (UTC +9)
Location: Room 103
Session Type: Lecture
Session Co-Chairs: Chao Zhang, Cambridge University; Chao-Han Huck Yang, NVIDIA; Marco Siniscalchi, University of Palermo
Track: Special Sessions
Thu, 18 Apr, 16:30 - 16:50 (UTC +9)

SS-L11.1: SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg, NVIDIA, United States of America
Thu, 18 Apr, 16:50 - 17:10 (UTC +9)

SS-L11.2: Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-Based ASR

Xugang Lu, Peng Shen, National Institute of Information and Communications Technology, Japan; Yu Tsao, Research Center for Information Technology Innovation, Taiwan; Hisashi Kawai, National Institute of Information and Communications Technology, Japan
Thu, 18 Apr, 17:10 - 17:30 (UTC +9)

SS-L11.3: VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks

Soumi Maiti, Yifan Peng, CMU, United States of America; Shukjae Choi, 42dot, Korea, Republic of; Jee-weon Jung, Xuankai Chang, Shinji Watanabe, CMU, United States of America
Thu, 18 Apr, 17:30 - 17:50 (UTC +9)

SS-L11.4: Prompting Large Language Models with Speech Recognition Abilities

Yassir Fathullah, University of Cambridge, United Kingdom of Great Britain and Northern Ireland; Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer, Meta AI, United States of America
Thu, 18 Apr, 17:50 - 18:10 (UTC +9)

SS-L11.5: Can Whisper perform speech-based in-context learning?

Siyin Wang, Tsinghua University, China; Chao-Han Yang, Georgia Institute of Technology, United States of America; Ji Wu, Chao Zhang, Tsinghua University, China
Thu, 18 Apr, 18:10 - 18:30 (UTC +9)

SS-L11.6: Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

Kevin Everson, University of Washington, United States of America; Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Amazon Alexa AI, United States of America; Guan-Ting Lin, National Taiwan University, Taiwan; Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Amazon Alexa AI, United States of America; Hung-yi Lee, National Taiwan University, Taiwan; Ariya Rastrow, Amazon Alexa AI, United States of America; Andreas Stolcke, Uniphore, United States of America