Paper ID: D-1-1.3
Paper Title: Attentively-Coupled Long Short-Term Memory for Audio-Visual Emotion Recognition
Authors: Jia-Hao Hsu, Chung-Hsien Wu, National Cheng Kung University, Taiwan
Session: D-1-1: Image/Video Recognition
Time: Tuesday, 08 December, 12:30 - 14:00
Presentation Time: Tuesday, 08 December, 13:00 - 13:15
All times are in New Zealand Time (UTC +13)
Topic: Image, Video, and Multimedia (IVM)
Abstract:
Emotion recognition from multiple modalities has attracted increasing research attention. Among existing audio-visual emotion recognition methods, however, few have focused on modeling emotional fluctuation in the signals, and how to fuse multimodal signals, such as audio-visual signals, remains a challenging issue. In this paper, segments of audio-visual signals are extracted and used as the recognition unit to characterize emotional fluctuation. An Attentively-Coupled long short-term memory (ACLSTM) is proposed to combine audio-based and visual-based LSTMs and improve emotion recognition performance. In the Attentively-Coupled LSTM, a Coupled LSTM serves as the fusion model, and a neural tensor network (NTN) is employed for attention estimation, measuring the segment-based emotion consistency between audio and visual segments. Experimental results showed that, compared with previous approaches, the proposed method achieved the best result, 70.1% accuracy, in multimodal emotion recognition on the BAUM-1 dataset.
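The abstract's attention mechanism can be illustrated with a short sketch. The following is a minimal NumPy sketch of a standard neural tensor network (NTN) scoring function applied to paired audio and visual segment embeddings, with softmax-normalized scores used as segment attention weights. The function names, dimensions, and the use of tanh follow the common NTN formulation; the paper's actual parameterization is not specified in the abstract, so everything here is an illustrative assumption.

```python
import numpy as np

def ntn_score(a, v, W, V, b, u):
    """Score the emotion consistency of an (audio, visual) segment pair.

    a, v : (d,)       segment embeddings (hypothetical dimensions)
    W    : (k, d, d)  bilinear tensor, one d-by-d slice per output unit
    V    : (k, 2d)    linear weights on the concatenated pair
    b    : (k,)       bias
    u    : (k,)       output projection
    """
    # a^T W[s] v for each tensor slice s, giving a k-dimensional vector
    bilinear = np.einsum('i,sij,j->s', a, W, v)
    linear = V @ np.concatenate([a, v])
    return float(u @ np.tanh(bilinear + linear + b))

def segment_attention(audio_segs, visual_segs, params):
    """Softmax-normalized attention weights over aligned segment pairs."""
    scores = np.array([ntn_score(a, v, *params)
                       for a, v in zip(audio_segs, visual_segs)])
    e = np.exp(scores - scores.max())   # stabilized softmax
    return e / e.sum()
```

In a fusion model along these lines, the resulting weights could scale the per-segment hidden states of the coupled audio and visual LSTMs before pooling, so that segments where the two modalities agree emotionally contribute more to the utterance-level prediction.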