SS-P5: Multimodal Representation Learning for Language Generation and Understanding |
| Session Type: Poster |
| Time: Thursday, May 16, 15:30 - 17:30 |
| Location: Poster Area E, Meeting Room 1A |
| Session Chairs: Florian Metze (Carnegie Mellon University), Christian Fuegen (Facebook), and Ramon Sanabria (Carnegie Mellon University) |
| |
| SS-P5.1: MODELS OF VISUALLY GROUNDED SPEECH SIGNAL PAY ATTENTION TO NOUNS: A BILINGUAL EXPERIMENT ON ENGLISH AND JAPANESE |
| William Havard; LIG, Univ. Grenoble Alpes |
| Jean-Pierre Chevrot; LIDILEM, Univ. Grenoble Alpes |
| Laurent Besacier; LIG, Univ. Grenoble Alpes |
| |
| SS-P5.2: MULTIMODAL ONE-SHOT LEARNING OF SPEECH AND IMAGES |
| Ryan Eloff; Stellenbosch University |
| Herman Engelbrecht; Stellenbosch University |
| Herman Kamper; Stellenbosch University |
| |
| SS-P5.3: LEARNING FROM MULTIVIEW CORRELATIONS IN OPEN-DOMAIN VIDEOS |
| Nils Holzenberger; Johns Hopkins University |
| Shruti Palaskar; Carnegie Mellon University |
| Pranava Madhyastha; Imperial College London |
| Florian Metze; Carnegie Mellon University |
| Raman Arora; Johns Hopkins University |
| |
| SS-P5.4: WAV2PIX: SPEECH-CONDITIONED FACE GENERATION USING GENERATIVE ADVERSARIAL NETWORKS |
| Amanda Duarte; Barcelona Supercomputing Center |
| Francisco Roldan; Universitat Politecnica de Catalunya |
| Miquel Tubau; Universitat Politecnica de Catalunya |
| Janna Escur; Universitat Politecnica de Catalunya |
| Santiago Pascual; Universitat Politecnica de Catalunya |
| Amaia Salvador; Universitat Politecnica de Catalunya |
| Eva Mohedano; Insight Centre for Data Analytics |
| Kevin McGuinness; Insight Centre for Data Analytics |
| Jordi Torres; Barcelona Supercomputing Center |
| Xavier Giro-i-Nieto; Universitat Politecnica de Catalunya |
| |
| SS-P5.5: NEURAL CODES TO FACTOR LANGUAGE IN MULTILINGUAL SPEECH RECOGNITION |
| Markus Müller; Karlsruhe Institute of Technology |
| Sebastian Stüker; Karlsruhe Institute of Technology |
| Alex Waibel; Karlsruhe Institute of Technology |
| |
| SS-P5.6: MULTIMODAL SPEAKER ADAPTATION OF ACOUSTIC MODEL AND LANGUAGE MODEL FOR ASR USING SPEAKER FACE EMBEDDING |
| Yasufumi Moriya; Dublin City University |
| Gareth Jones; Dublin City University |
| |
| SS-P5.7: MULTIMODAL GROUNDING FOR SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION |
| Ozan Caglayan; Le Mans University |
| Ramon Sanabria; Carnegie Mellon University |
| Shruti Palaskar; Carnegie Mellon University |
| Loïc Barrault; Le Mans University |
| Florian Metze; Carnegie Mellon University |
| |