SS-L22.5

Investigating End-to-end ASR Architectures for Long form Audio Transcription

Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg, NVIDIA, United States of America

Session:
SS-L22: Efficient Modeling of Long Sequences with Applications to Speech and Audio Lecture

Track:
Special Sessions

Location:
Room 104

Presentation Time:
Fri, 19 Apr, 14:30 - 14:50 (UTC +9)

Session Co-Chairs:
Roshan Sharma, Google and Suyoun Kim, Meta
Session SS-L22
SS-L22.1: Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing
William Chen, Carnegie Mellon University, United States of America; Takatomo Kano, Atsunori Ogawa, Marc Delcroix, NTT Corporation, Japan; Shinji Watanabe, Carnegie Mellon University, United States of America
SS-L22.2: Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox, Rev.com, United States of America; Desh Raj, Johns Hopkins University, United States of America; Natalie Delworth, Quinn McNamara, Corey Miller, Miguel Jette, Rev.com, United States of America
SS-L22.3: Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-yiin Chang, Tara Sainath, Google LLC, United States of America
SS-L22.4: Dialog Modeling in Audiobook Synthesis
Cheng-chieh Yeh, Amirreza Shirani, Weicheng Zhang, Tuomo Raitio, Ramya Rasipuram, Ladan Golipour, David Winarsky, Apple, United States of America
SS-L22.5: Investigating End-to-end ASR Architectures for Long form Audio Transcription
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg, NVIDIA, United States of America
SS-L22.6: CORAAL QA: A Dataset and Framework for Open Domain Spontaneous Speech Question Answering from Long Audio Files
Natarajan Balaji Shankar, Alexander Johnson, Christina Chance, Hariram Veeramani, Abeer Alwan, University of California Los Angeles, United States of America