Technical Program

Paper Detail

Paper ID	E-1-2.2
Paper Title	Deep Neural Network Modeling of Distortion Stomp Box Using Spectral Features
Authors	Kento Yoshimoto, Hiroki Kuroda, Daichi Kitahara, Akira Hirabayashi, Ritsumeikan University, Japan
Session	E-1-2: Music Information Processing 1, Audio Scene Classification
Time	Tuesday, 08 December, 15:30 - 17:00
Presentation Time	Tuesday, 08 December, 15:45 - 16:00
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA)
Abstract	We propose a method for modeling distortion stomp boxes using a deep neural network. A state-of-the-art method uses a feedforward variant of the original autoregressive WaveNet. This modified WaveNet is trained to minimize a loss function defined as the normalized mean squared error between the high-pass filtered outputs. That method works well for stomp boxes with low distortion, but not for those with high distortion. To solve this problem, we propose a method that uses the same WaveNet with a new loss function, defined as a weighted sum of errors in the time and frequency domains. The time-domain error is the mean squared error without high-pass filtering. The frequency-domain error is the generalized Kullback-Leibler (KL) divergence between spectrograms, which are computed with a short-time Fourier transform (STFT) and a Mel filter bank. Numerical experiments using a stomp box with high distortion, the Ibanez SD9, show that the proposed method reproduces higher-quality sounds than the state-of-the-art method, especially for high-frequency components.
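The loss described in the abstract, a weighted sum of a time-domain mean squared error and a generalized KL divergence between Mel spectrograms, might be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the use of magnitude (rather than power) spectrograms, the Hann window, the FFT size, hop length, number of Mel bands, the per-element normalization, the weight value, and the choice of the target as the first argument of the divergence are all assumptions that the abstract does not specify.

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude spectrogram from a Hann-windowed STFT, shape (frames, n_fft//2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(sr, n_fft, n_mels=40):
    """Triangular Mel filters, shape (n_mels, n_fft//2 + 1)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # n_mels + 2 band edges, equally spaced on the Mel scale from 0 Hz to Nyquist
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, ctr, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def generalized_kl(p, q, eps=1e-8):
    """Generalized KL divergence sum(p*log(p/q) - p + q); zero when p equals q."""
    p = p + eps  # eps avoids log(0) and division by zero (assumed safeguard)
    q = q + eps
    return np.sum(p * np.log(p / q) - p + q)

def combined_loss(target, output, sr=44100, n_fft=1024, hop=256, n_mels=40, weight=0.5):
    """Time-domain MSE (no high-pass filtering) plus weighted Mel-spectrogram divergence."""
    time_err = np.mean((target - output) ** 2)
    fb = mel_filter_bank(sr, n_fft, n_mels)
    mel_target = stft_magnitude(target, n_fft, hop) @ fb.T
    mel_output = stft_magnitude(output, n_fft, hop) @ fb.T
    # Dividing by the number of spectrogram entries is an assumed normalization
    freq_err = generalized_kl(mel_target, mel_output) / mel_target.size
    return time_err + weight * freq_err
```

In a training setting, `target` would be the recorded stomp box output and `output` the network's prediction for the same input signal; the sketch only evaluates the loss and would need a differentiable framework to be used for optimization.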