SPE-L13: Speech Recognition: Representations and Embeddings |
Session Type: Lecture |
Time: Thursday, 7 May, 16:30 - 18:30 |
Location: On-Demand |
Session Chair: Karen Livescu, Toyota Technological Institute - Chicago |
SPE-L13.1: MULTILINGUAL ACOUSTIC WORD EMBEDDING MODELS FOR PROCESSING ZERO-RESOURCE LANGUAGES |
Herman Kamper; Stellenbosch University |
Yevgen Matusevych; University of Edinburgh |
Sharon Goldwater; University of Edinburgh |
SPE-L13.2: MOCKINGJAY: UNSUPERVISED SPEECH REPRESENTATION LEARNING WITH DEEP BIDIRECTIONAL TRANSFORMER ENCODERS |
Andy T. Liu; National Taiwan University |
Shu-wen Yang; National Taiwan University |
Po-Han Chi; National Taiwan University |
Po-chun Hsu; National Taiwan University |
Hung-yi Lee; National Taiwan University |
SPE-L13.3: RECURRENT NEURAL AUDIOVISUAL WORD EMBEDDINGS FOR SYNCHRONIZED SPEECH AND REAL-TIME MRI |
Öykü Deniz Köse; Boğaziçi University |
Murat Saraçlar; Boğaziçi University |
SPE-L13.4: DEEP CONTEXTUALIZED ACOUSTIC REPRESENTATIONS FOR SEMI-SUPERVISED SPEECH RECOGNITION |
Shaoshi Ling; Amazon, Inc. |
Yuzong Liu; Amazon, Inc. |
Julian Salazar; Amazon, Inc. |
Katrin Kirchhoff; Amazon, Inc. |
SPE-L13.5: WHAT DOES A NETWORK LAYER HEAR? ANALYZING HIDDEN REPRESENTATIONS OF END-TO-END ASR THROUGH SPEECH SYNTHESIS |
Chung-Yi Li; National Taiwan University |
Pei-Chieh Yuan; National Taiwan University |
Hung-Yi Lee; National Taiwan University |
SPE-L13.6: LEARNING A SUBWORD INVENTORY JOINTLY WITH END-TO-END AUTOMATIC SPEECH RECOGNITION |
Jennifer Drexler; Massachusetts Institute of Technology |
James Glass; Massachusetts Institute of Technology |