SS-P5: Multimodal Representation Learning for Language Generation and Understanding |
Session Type: Poster |
Time: Thursday, May 16, 15:30 - 17:30 |
Location: Poster Area E, Meeting Room 1A |
Session Chairs: Florian Metze, Carnegie Mellon University; Christian Fuegen, Facebook; Ramon Sanabria, Carnegie Mellon University |
|
SS-P5.1: MODELS OF VISUALLY GROUNDED SPEECH SIGNAL PAY ATTENTION TO NOUNS: A BILINGUAL EXPERIMENT ON ENGLISH AND JAPANESE |
William Havard; LIG, Univ. Grenoble Alpes |
Jean-Pierre Chevrot; LIDILEM, Univ. Grenoble Alpes |
Laurent Besacier; LIG, Univ. Grenoble Alpes |
|
SS-P5.2: MULTIMODAL ONE-SHOT LEARNING OF SPEECH AND IMAGES |
Ryan Eloff; Stellenbosch University |
Herman Engelbrecht; Stellenbosch University |
Herman Kamper; Stellenbosch University |
|
SS-P5.3: LEARNING FROM MULTIVIEW CORRELATIONS IN OPEN-DOMAIN VIDEOS |
Nils Holzenberger; Johns Hopkins University |
Shruti Palaskar; Carnegie Mellon University |
Pranava Madhyastha; Imperial College London |
Florian Metze; Carnegie Mellon University |
Raman Arora; Johns Hopkins University |
|
SS-P5.4: WAV2PIX: SPEECH-CONDITIONED FACE GENERATION USING GENERATIVE ADVERSARIAL NETWORKS |
Amanda Duarte; Barcelona Supercomputing Center |
Francisco Roldan; Universitat Politecnica de Catalunya |
Miquel Tubau; Universitat Politecnica de Catalunya |
Janna Escur; Universitat Politecnica de Catalunya |
Santiago Pascual; Universitat Politecnica de Catalunya |
Amaia Salvador; Universitat Politecnica de Catalunya |
Eva Mohedano; Insight Centre for Data Analytics |
Kevin McGuinness; Insight Centre for Data Analytics |
Jordi Torres; Barcelona Supercomputing Center |
Xavier Giro-i-Nieto; Universitat Politecnica de Catalunya |
|
SS-P5.5: NEURAL CODES TO FACTOR LANGUAGE IN MULTILINGUAL SPEECH RECOGNITION |
Markus Müller; Karlsruhe Institute of Technology |
Sebastian Stüker; Karlsruhe Institute of Technology |
Alex Waibel; Karlsruhe Institute of Technology |
|
SS-P5.6: MULTIMODAL SPEAKER ADAPTATION OF ACOUSTIC MODEL AND LANGUAGE MODEL FOR ASR USING SPEAKER FACE EMBEDDING |
Yasufumi Moriya; Dublin City University |
Gareth Jones; Dublin City University |
|
SS-P5.7: MULTIMODAL GROUNDING FOR SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION |
Ozan Caglayan; Le Mans University |
Ramon Sanabria; Carnegie Mellon University |
Shruti Palaskar; Carnegie Mellon University |
Loïc Barrault; Le Mans University |
Florian Metze; Carnegie Mellon University |
|