Paper ID: D-1-1.3
Paper Title: Attentively-Coupled Long Short-Term Memory for Audio-Visual Emotion Recognition
Authors: Jia-Hao Hsu, Chung-Hsien Wu, National Cheng Kung University, Taiwan
Session: D-1-1: Image/Video Recognition
Time: Tuesday, 08 December, 12:30 - 14:00
Presentation Time: Tuesday, 08 December, 13:00 - 13:15
All times are in New Zealand Time (UTC +13)
Topic: Image, Video, and Multimedia (IVM)
Abstract:
Emotion recognition from multiple modalities has attracted increasing research attention. Among existing audio-visual emotion recognition methods, however, few have focused on modeling emotional fluctuation in the signals, and how to fuse multimodal signals, such as audio-visual signals, remains a challenging issue. In this paper, segments of audio-visual signals are extracted and used as the recognition unit to characterize emotional fluctuation. An Attentively-Coupled long short-term memory (ACLSTM) is proposed to combine audio-based and visual-based LSTMs and improve emotion recognition performance. In the Attentively-Coupled LSTM, a Coupled LSTM serves as the fusion model, and a neural tensor network (NTN) is employed for attention estimation, measuring the segment-based emotion consistency between audio and visual segments. Experimental results showed that, compared with previous approaches, the proposed method achieved the best result, 70.1% accuracy, in multimodal emotion recognition on the BAUM-1 dataset.
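The abstract's attention mechanism can be illustrated with a short sketch. The following is a minimal NumPy sketch of a standard neural tensor network (NTN) scoring function applied to paired audio and visual segment embeddings, with softmax-normalized scores used as segment attention weights. The function names, dimensions, and the use of tanh follow the common NTN formulation; the paper's actual parameterization is not specified in the abstract, so everything here is an illustrative assumption.

```python
import numpy as np

def ntn_score(a, v, W, V, b, u):
    """Score the emotion consistency of an (audio, visual) segment pair.

    a, v : (d,)       segment embeddings (hypothetical dimensions)
    W    : (k, d, d)  bilinear tensor, one d-by-d slice per output unit
    V    : (k, 2d)    linear weights on the concatenated pair
    b    : (k,)       bias
    u    : (k,)       output projection
    """
    # a^T W[s] v for each tensor slice s, giving a k-dimensional vector
    bilinear = np.einsum('i,sij,j->s', a, W, v)
    linear = V @ np.concatenate([a, v])
    return float(u @ np.tanh(bilinear + linear + b))

def segment_attention(audio_segs, visual_segs, params):
    """Softmax-normalized attention weights over aligned segment pairs."""
    scores = np.array([ntn_score(a, v, *params)
                       for a, v in zip(audio_segs, visual_segs)])
    e = np.exp(scores - scores.max())   # stabilized softmax
    return e / e.sum()
```

In a fusion model along these lines, the resulting weights could scale the per-segment hidden states of the coupled audio and visual LSTMs before pooling, so that segments where the two modalities agree emotionally contribute more to the utterance-level prediction.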