Paper ID | E-1-2.6
Paper Title | Deep Semantic Encoder-Decoder Network for Acoustic Scene Classification with Multiple Devices
Authors | Xinxin Ma, Jiangsu Normal University, China; Yunfei Shao, Tsinghua University, China; Yong Ma, Jiangsu Normal University, China; Wei-Qiang Zhang, Tsinghua University, China
Session | E-1-2: Music Information Processing 1, Audio Scene Classification
Session Time | Tuesday, 08 December, 15:30 - 17:00
Presentation Time | Tuesday, 08 December, 16:45 - 17:00 (all times are in New Zealand Time, UTC+13)
Topic | Speech, Language, and Audio (SLA)
Abstract | In this paper, we propose Mini-SegNet, a simplified encoder-decoder SegNet model that captures deep semantic information in sound events. This semantic information can effectively discriminate acoustic segments from different scenes. We also apply spectrum correction to combat mismatched frequency responses across recording devices. To prevent overfitting, we adopt mixup, ImageDataGenerator, and temporal crop augmentation. Our best single system achieved an average accuracy of 65.15% across devices on the DCASE2020 Development dataset, more than 10% above the baseline system. The results indicate that our approach achieves strong classification performance without using any supplementary data from outside the official challenge dataset.
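The mixup augmentation mentioned in the abstract blends pairs of training examples and their labels with a coefficient drawn from a Beta distribution. A minimal sketch of the standard technique is below; the function name, the `alpha=0.2` default, and the NumPy-array interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Standard mixup: convex combination of two examples and their
    one-hot labels. alpha controls how far the mix strays from the
    endpoints (illustrative default; the paper's setting is not given here)."""
    # Mixing coefficient lambda ~ Beta(alpha, alpha), so lam is in [0, 1]
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

In acoustic scene classification the inputs `x1`, `x2` would typically be log-mel spectrograms and `y1`, `y2` one-hot scene labels; the mixed label keeps the per-class weights summing to one.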