SS-L11.3

VOXTLM: UNIFIED DECODER-ONLY MODELS FOR CONSOLIDATING SPEECH RECOGNITION, SYNTHESIS AND SPEECH, TEXT CONTINUATION TASKS

Soumi Maiti, Yifan Peng, CMU, United States of America; Shukjae Choi, 42dot, Korea, Republic of; Jee-weon Jung, Xuankai Chang, Shinji Watanabe, CMU, United States of America

Session:
SS-L11: In-Context Learning Methods for Speech and Spoken Language Processing Lecture

Track:
Special Sessions

Location:
Room 103

Presentation Time:
Thu, 18 Apr, 17:10 - 17:30 (UTC +9)

Session Co-Chairs:
Chao Zhang, Cambridge University and Chao-Han Huck Yang, NVIDIA and Marco Siniscalchi, University of Palermo
Session SS-L11
SS-L11.1: SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg, NVIDIA, United States of America
SS-L11.2: Hierarchical cross-modality knowledge transfer with sinkhorn attention for CTC-based ASR
Xugang Lu, Peng Shen, National Institute of Information and Communications Technology, Japan; Yu Tsao, Research Center for Information Technology Innovation, Taiwan; Hisashi Kawai, National Institute of Information and Communications Technology, Japan
SS-L11.3: VOXTLM: UNIFIED DECODER-ONLY MODELS FOR CONSOLIDATING SPEECH RECOGNITION, SYNTHESIS AND SPEECH, TEXT CONTINUATION TASKS
Soumi Maiti, Yifan Peng, CMU, United States of America; Shukjae Choi, 42dot, Korea, Republic of; Jee-weon Jung, Xuankai Chang, Shinji Watanabe, CMU, United States of America
SS-L11.4: Prompting Large Language Models with Speech Recognition Abilities
Yassir Fathullah, University of Cambridge, United Kingdom of Great Britain and Northern Ireland; Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer, Meta AI, United States of America
SS-L11.5: Can Whisper perform speech-based in-context learning?
Siyin Wang, Tsinghua University, China; Chao-Han Yang, Georgia Institute of Technology, United States of America; Ji Wu, Chao Zhang, Tsinghua University, China
SS-L11.6: Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
Kevin Everson, University of Washington, United States of America; Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Amazon Alexa AI, United States of America; Guan-Ting Lin, National Taiwan University, Taiwan; Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Amazon Alexa AI, United States of America; Hung-yi Lee, National Taiwan University, Taiwan; Ariya Rastrow, Amazon Alexa AI, United States of America; Andreas Stolcke, Uniphore, United States of America