SPE-L13: Speech Recognition: Representations and Embeddings |
| Session Type: Lecture |
| Time: Thursday, 7 May, 16:30 - 18:30 |
| Location: On-Demand |
| Session Chair: Karen Livescu, Toyota Technological Institute - Chicago |
| SPE-L13.1: MULTILINGUAL ACOUSTIC WORD EMBEDDING MODELS FOR PROCESSING ZERO-RESOURCE LANGUAGES |
| Herman Kamper; Stellenbosch University |
| Yevgen Matusevych; University of Edinburgh |
| Sharon Goldwater; University of Edinburgh |
| SPE-L13.2: MOCKINGJAY: UNSUPERVISED SPEECH REPRESENTATION LEARNING WITH DEEP BIDIRECTIONAL TRANSFORMER ENCODERS |
| Andy T. Liu; National Taiwan University |
| Shu-wen Yang; National Taiwan University |
| Po-Han Chi; National Taiwan University |
| Po-chun Hsu; National Taiwan University |
| Hung-yi Lee; National Taiwan University |
| SPE-L13.3: RECURRENT NEURAL AUDIOVISUAL WORD EMBEDDINGS FOR SYNCHRONIZED SPEECH AND REAL-TIME MRI |
| Öykü Deniz Köse; Boğaziçi University |
| Murat Saraçlar; Boğaziçi University |
| SPE-L13.4: DEEP CONTEXTUALIZED ACOUSTIC REPRESENTATIONS FOR SEMI-SUPERVISED SPEECH RECOGNITION |
| Shaoshi Ling; Amazon, Inc. |
| Yuzong Liu; Amazon, Inc. |
| Julian Salazar; Amazon, Inc. |
| Katrin Kirchhoff; Amazon, Inc. |
| SPE-L13.5: WHAT DOES A NETWORK LAYER HEAR? ANALYZING HIDDEN REPRESENTATIONS OF END-TO-END ASR THROUGH SPEECH SYNTHESIS |
| Chung-Yi Li; National Taiwan University |
| Pei-Chieh Yuan; National Taiwan University |
| Hung-yi Lee; National Taiwan University |
| SPE-L13.6: LEARNING A SUBWORD INVENTORY JOINTLY WITH END-TO-END AUTOMATIC SPEECH RECOGNITION |
| Jennifer Drexler; Massachusetts Institute of Technology |
| James Glass; Massachusetts Institute of Technology |