Paper ID | D-1-1.5
Paper Title | 3D SKELETAL MOVEMENT ENHANCED EMOTION RECOGNITION NETWORK
Authors | Jiaqi Shi, Osaka University, Japan; Chaoran Liu, Carlos Toshinori Ishi, Advanced Telecommunications Research Institute International, Japan; Hiroshi Ishiguro, Osaka University, Japan
Session | D-1-1: Image/Video Recognition
Time | Tuesday, 08 December, 12:30 - 14:00
Presentation Time | Tuesday, 08 December, 13:30 - 13:45
All times are in New Zealand Time (UTC +13) |
Topic | Image, Video, and Multimedia (IVM)
Abstract |
Automatic emotion recognition has become an important trend in natural human-computer interaction and artificial intelligence. Although gesture is one of the most important components of nonverbal communication and has a considerable impact on emotion recognition, motion modalities are rarely considered in affective computing research. An important reason is the lack of large open emotion databases containing skeletal movement data. In this paper, we extract 3D skeleton information from video and apply the method to the IEMOCAP database to add a new modality. We propose an attention-based convolutional neural network that takes the extracted data as input to predict the speaker's emotional state. We also combine our model with models using other modalities, which provide complementary information in the emotion classification task. The combined model utilizes audio signals, text information, and skeletal data simultaneously, and it significantly outperforms the bimodal model, demonstrating the effectiveness of the method.
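The abstract only sketches the architecture at a high level (a convolutional network over extracted 3D skeleton sequences, with attention pooling over time). A minimal NumPy illustration of that core idea is below; every size, weight, and layer choice here is a hypothetical stand-in, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: T frames, J joints with (x, y, z) each, C emotion classes.
T, J, C = 32, 17, 4
D = J * 3
x = rng.normal(size=(T, D))  # one skeletal sequence: frames x joint coordinates

# 1D temporal convolution (kernel width 3, 'same' padding) -> per-frame features.
K, F = 3, 16
W_conv = rng.normal(scale=0.1, size=(K, D, F))
padded = np.pad(x, ((1, 1), (0, 0)))
feat = np.stack([np.tensordot(padded[t:t + K], W_conv, axes=([0, 1], [0, 1]))
                 for t in range(T)])          # (T, F)
feat = np.maximum(feat, 0.0)                  # ReLU

# Temporal attention: score each frame, softmax, weighted pooling over frames.
w_att = rng.normal(scale=0.1, size=(F,))
scores = feat @ w_att                         # (T,)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                          # attention weights, sum to 1
pooled = alpha @ feat                         # (F,) sequence-level representation

# Linear classifier over emotion classes with a softmax output.
W_cls = rng.normal(scale=0.1, size=(F, C))
logits = pooled @ W_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)
```

In a trained model the convolution and attention weights would be learned jointly, and the pooled skeletal representation could be concatenated with audio and text features for the multimodal fusion the abstract describes.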