SS-L22: Efficient Modeling of Long Sequences with Applications to Speech and Audio
Fri, 19 Apr, 13:10 - 15:10 (UTC +9)
Location: Room 104
Session Type: Lecture
Session Co-Chairs: Roshan Sharma, Google and Suyoun Kim, Meta
Track: Special Sessions
Fri, 19 Apr, 13:10 - 13:30 (UTC +9)
 

SS-L22.1: Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing

William Chen, Carnegie Mellon University, United States of America; Takatomo Kano, Atsunori Ogawa, Marc Delcroix, NTT Corporation, Japan; Shinji Watanabe, Carnegie Mellon University, United States of America
Fri, 19 Apr, 13:30 - 13:50 (UTC +9)
 

SS-L22.2: Updated Corpora and Benchmarks for Long-Form Speech Recognition

Jennifer Drexler Fox, Rev.com, United States of America; Desh Raj, Johns Hopkins University, United States of America; Natalie Delworth, Quinn McNamara, Corey Miller, Miguel Jette, Rev.com, United States of America
Fri, 19 Apr, 13:50 - 14:10 (UTC +9)
 

SS-L22.3: Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-yiin Chang, Tara Sainath, Google LLC, United States of America
Fri, 19 Apr, 14:10 - 14:30 (UTC +9)
 

SS-L22.4: Dialog Modeling in Audiobook Synthesis

Cheng-chieh Yeh, Amirreza Shirani, Weicheng Zhang, Tuomo Raitio, Ramya Rasipuram, Ladan Golipour, David Winarsky, Apple, United States of America
Fri, 19 Apr, 14:30 - 14:50 (UTC +9)
 

SS-L22.5: Investigating End-to-End ASR Architectures for Long-Form Audio Transcription

Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg, NVIDIA, United States of America
Fri, 19 Apr, 14:50 - 15:10 (UTC +9)
 

SS-L22.6: CORAAL QA: A Dataset and Framework for Open Domain Spontaneous Speech Question Answering from Long Audio Files

Natarajan Balaji Shankar, Alexander Johnson, Christina Chance, Hariram Veeramani, Abeer Alwan, University of California Los Angeles, United States of America