Technical Program

Paper Detail

Paper ID	C-3-3.2
Paper Title	Closed-Form Pre-Training for Small-Sample Environmental Sound Recognition
Authors	Nakamasa Inoue, Keita Goto, Tokyo Institute of Technology, Japan
Session	C-3-3: Machine Learning for Small-sample Data Analysis
Time	Thursday, 10 December, 17:30 - 19:30
Presentation Time:	Thursday, 10 December, 17:45 - 18:00
	All times are in New Zealand Time (UTC +13)
Topic	Machine Learning and Data Analytics (MLDA): Special Session: Machine Learning for Small-sample Data Analysis
Abstract	This paper presents closed-form pre-training, a new framework for pre-training neural networks, and applies it to small-sample environmental sound recognition. The main idea is to pre-train a network on a dataset generated automatically from formulas, without any real-world recordings or manual annotation. The proposed framework consists of two steps. First, an audio classification dataset is generated; for this step, we propose three types of dataset definitions based on colored noise and its extensions. Second, a network is pre-trained on the generated dataset. The resulting pre-trained network is particularly effective for fine-tuning with few examples because it helps optimization methods avoid converging prematurely to a local optimum. In experiments, we demonstrate the effectiveness of the proposed framework for small-sample environmental sound recognition on three datasets: ESC-10, ESC-50, and UrbanSound8K. We obtain performance improvements on all three datasets when only a small number of training samples is available.
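The first step, generating a labeled audio dataset from formulas alone, could be sketched roughly as follows. This is a minimal illustration and not the authors' dataset definition: the abstract does not specify the three definitions, so the sketch assumes each class is a colored noise with power spectrum proportional to 1/f^beta, one spectral exponent beta per class. The function names `colored_noise` and `make_dataset`, the beta values, and the clip length are all hypothetical.

```python
import numpy as np


def colored_noise(beta, n_samples, rng):
    """Return one clip of noise whose power spectrum falls off as 1/f**beta.

    Assumption for illustration: beta = 0 is white, 1 is pink, 2 is brown noise.
    """
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)

    # Power ~ 1/f**beta means amplitude ~ f**(-beta/2).
    scale = np.zeros_like(freqs)          # scale[0] = 0 removes the DC term
    scale[1:] = freqs[1:] ** (-beta / 2.0)

    shaped = np.fft.irfft(spectrum * scale, n=n_samples)
    return shaped / np.max(np.abs(shaped))  # peak-normalize to [-1, 1]


def make_dataset(betas, clips_per_class, n_samples, seed=0):
    """Build (clips, labels); the class label is the index of beta in `betas`."""
    rng = np.random.default_rng(seed)
    clips, labels = [], []
    for label, beta in enumerate(betas):
        for _ in range(clips_per_class):
            clips.append(colored_noise(beta, n_samples, rng))
            labels.append(label)
    return np.stack(clips).astype(np.float32), np.array(labels, dtype=np.int64)


# Hypothetical settings: three classes, four one-second clips each at 16 kHz.
X, y = make_dataset(betas=[0.0, 1.0, 2.0], clips_per_class=4, n_samples=16000)
```

Under these assumptions, `X` and `y` could be fed to any standard audio classifier for the second step (pre-training), after which the network would be fine-tuned on the few available real recordings. No recordings or manual labels enter the generation step, which is the property the abstract emphasizes.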