Paper ID | D-3-2.1
Paper Title | DIVERSE AUDIO-TO-IMAGE GENERATION VIA SEMANTICS AND FEATURE CONSISTENCY
Authors | Pei-Tse Yang, Feng-Guang Su, Yu-Chiang Frank Wang, National Taiwan University, Taiwan
Session | D-3-2: Multimedia Analysis and Others
Time | Thursday, 10 December, 15:30 - 17:15 |
Presentation Time | Thursday, 10 December, 15:30 - 15:45
All times are in New Zealand Time (UTC +13) |
Topic | Image, Video, and Multimedia (IVM)
Abstract
Humans are capable of imagining scene images when hearing ambient sounds. Audio-to-image synthesis is therefore a challenging yet practical topic for both natural language comprehension and image content understanding. In this paper, we propose an audio-to-image generation network by applying conditional generative adversarial networks. Specifically, we train such generative models with the proposed feature consistency and conditional adversarial losses, so that diverse image outputs with satisfactory visual quality can be synthesized from a single audio input. Experimental results on sports audio/visual data verify the effectiveness and practicality of the proposed method over state-of-the-art approaches to audio-to-image synthesis.
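The abstract names two training objectives, a feature consistency loss and a conditional adversarial loss. A minimal sketch of what such terms might look like is given below; all function names, vector sizes, and scores are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# Hypothetical sketch of the two objectives named in the abstract;
# shapes and values are mocked for illustration only.

def feature_consistency_loss(feat_real, feat_fake):
    """Mean squared distance between features of a real image and a
    generated image, encouraging the generator to preserve content."""
    return sum((r - f) ** 2 for r, f in zip(feat_real, feat_fake)) / len(feat_real)

def conditional_adversarial_loss(d_real, d_fake, eps=1e-8):
    """BCE-style discriminator objective: a conditional discriminator
    should score real (audio, image) pairs near 1 and generated pairs
    near 0."""
    terms = [math.log(p + eps) for p in d_real]
    terms += [math.log(1.0 - p + eps) for p in d_fake]
    return -sum(terms) / len(terms)

random.seed(0)
# One audio embedding combined with different noise vectors would yield
# diverse generated images; here we only mock the resulting image
# feature vectors and discriminator scores.
feat_real = [random.gauss(0, 1) for _ in range(16)]
feat_fake = [v + 0.1 * random.gauss(0, 1) for v in feat_real]
d_real = [0.9, 0.85, 0.92]   # mocked scores on real pairs
d_fake = [0.2, 0.15, 0.30]   # mocked scores on generated pairs

total = conditional_adversarial_loss(d_real, d_fake) \
        + feature_consistency_loss(feat_real, feat_fake)
```

In a full system these terms would be summed (possibly with weighting coefficients) and minimized over the generator and discriminator networks in alternation, as is standard for conditional GAN training.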