Technical Program

Paper Detail

Paper ID	F-1-1.6
Paper Title	DEEP MULTILAYER PERCEPTRONS FOR DIMENSIONAL SPEECH EMOTION RECOGNITION
Authors	Bagus Tris Atmaja, JAIST, Japan; Masato Akagi, Japan Advanced Institute of Science and Technology, Japan
Session	F-1-1: Emotion, Dialect, and Age Recognition
Time	Tuesday, 08 December, 12:30 - 14:00
Presentation Time:	Tuesday, 08 December, 13:45 - 14:00
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA):
Abstract	Modern deep learning architectures are ordinarily run on high-performance computing facilities because of their large input features and complex models. This paper proposes traditional multilayer perceptrons (MLPs) with deep layers and small input sizes to reduce this computational requirement. The proposed deep MLP is compared with more modern deep learning architectures, i.e., LSTM and CNN, using the same number of layers, batch size, and optimizer. The results show that the proposed deep MLP outperformed the LSTM and CNN benchmarks at the same number of layers and parameter values. Both the proposed and benchmark methods were optimized in the same way. The deep MLP exhibited the highest performance in both speaker-dependent and speaker-independent scenarios on the IEMOCAP and MSP-IMPROV datasets.