Paper Detail

Paper ID: D-1-1.3
Paper Title: Attentively-Coupled Long Short-Term Memory for Audio-Visual Emotion Recognition
Authors: Jia-Hao Hsu, Chung-Hsien Wu, National Cheng Kung University, Taiwan
Session: D-1-1: Image/Video Recognition
Time: Tuesday, 08 December, 12:30 - 14:00
Presentation Time: Tuesday, 08 December, 13:00 - 13:15
All times are in New Zealand Time (UTC +13)
Topic: Image, Video, and Multimedia (IVM)
Abstract: Emotion recognition through multiple modalities has attracted increasing research attention. Among existing audio-visual emotion recognition methods, few have focused on modeling emotional fluctuations in the signals. Moreover, how to fuse multimodal signals, such as audio-visual signals, remains a challenging issue. In this paper, segments of audio-visual signals are extracted and used as the recognition unit to characterize emotional fluctuation. An Attentively-Coupled long short-term memory (ACLSTM) is proposed to combine the audio-based and visual-based LSTMs and improve emotion recognition performance. In the Attentively-Coupled LSTM, the Coupled LSTM serves as the fusion model, and a neural tensor network (NTN) is employed for attention estimation, capturing the segment-based emotion consistency between audio and visual segments. Experimental results showed that, compared with previous approaches, the proposed method achieved the best result of 70.1% in multimodal emotion recognition on the BAUM-1 dataset.
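
The architecture outlined in the abstract can be sketched in code. The following is a minimal PyTorch illustration, not the authors' implementation: the hidden size, the number of NTN slices, the attention-weighted pooling, and the class count are assumptions, and the coupling is simplified to two parallel LSTMs whose states are concatenated, whereas the paper's Coupled LSTM shares information between the audio and visual recurrent streams.

# Hedged sketch of an ACLSTM-style model: per-segment audio and visual LSTMs,
# an NTN scoring audio-visual consistency per segment, and attention-weighted
# fusion. All sizes and the fusion details are illustrative assumptions.
import torch
import torch.nn as nn


class NeuralTensorAttention(nn.Module):
    """Neural tensor network scoring: s(a, v) = u^T tanh(a^T W[1:k] v + V[a; v] + b)."""

    def __init__(self, dim: int, slices: int = 4):
        super().__init__()
        self.W = nn.Parameter(torch.randn(slices, dim, dim) * 0.01)  # bilinear slices
        self.V = nn.Linear(2 * dim, slices)                          # linear term + bias
        self.u = nn.Linear(slices, 1, bias=False)                    # scoring vector

    def forward(self, a: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # a, v: (batch, segments, dim) -> attention weight per segment
        bilinear = torch.einsum("bsd,kde,bse->bsk", a, self.W, v)
        score = self.u(torch.tanh(bilinear + self.V(torch.cat([a, v], dim=-1))))
        return torch.softmax(score.squeeze(-1), dim=-1)  # weights over segments


class ACLSTM(nn.Module):
    def __init__(self, audio_dim: int, visual_dim: int, hidden: int = 128,
                 num_emotions: int = 8):  # class count is dataset-dependent
        super().__init__()
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.visual_lstm = nn.LSTM(visual_dim, hidden, batch_first=True)
        self.attention = NeuralTensorAttention(hidden)
        self.classifier = nn.Linear(2 * hidden, num_emotions)

    def forward(self, audio_segs: torch.Tensor, visual_segs: torch.Tensor):
        # audio_segs: (batch, segments, audio_dim); visual_segs likewise
        a, _ = self.audio_lstm(audio_segs)    # per-segment audio states
        v, _ = self.visual_lstm(visual_segs)  # per-segment visual states
        attn = self.attention(a, v)           # (batch, segments) consistency weights
        fused = torch.cat([a, v], dim=-1)     # couple the two streams (simplified)
        pooled = (attn.unsqueeze(-1) * fused).sum(dim=1)  # attention-weighted pooling
        return self.classifier(pooled)


if __name__ == "__main__":
    # Hypothetical feature sizes: 88-dim audio and 512-dim visual segment features.
    model = ACLSTM(audio_dim=88, visual_dim=512)
    logits = model(torch.randn(2, 10, 88), torch.randn(2, 10, 512))
    print(logits.shape)  # torch.Size([2, 8])

The NTN scores here play the role the abstract assigns to attention estimation: segments where the audio and visual states agree emotionally receive higher weight in the fused representation before classification.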