Paper ID | E-1-2.2 |
Paper Title |
Deep Neural Network Modeling of Distortion Stomp Box Using Spectral Features |
Authors |
Kento Yoshimoto, Hiroki Kuroda, Daichi Kitahara, Akira Hirabayashi, Ritsumeikan University, Japan |
Session |
E-1-2: Music Information Processing 1, Audio Scene Classification |
Time | Tuesday, 08 December, 15:30 - 17:00 |
Presentation Time: | Tuesday, 08 December, 15:45 - 16:00 |
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA): |
Abstract |
We propose a method for modeling distortion stomp boxes with a deep neural network. A state-of-the-art method uses a feedforward variant of the original autoregressive WaveNet, trained to minimize a loss function defined as the normalized mean squared error between the high-pass filtered outputs of the stomp box and the network. This method works well for stomp boxes with low distortion but not for those with high distortion. To solve this problem, we propose a method that uses the same WaveNet with a new loss function, defined as a weighted sum of errors in the time and frequency domains. The time-domain error is the mean squared error without high-pass filtering. The frequency-domain error is the generalized Kullback-Leibler (KL) divergence between spectrograms computed with a short-time Fourier transform (STFT) and a Mel filter bank. Numerical experiments using a stomp box with high distortion, the Ibanez SD9, show that the proposed method reproduces higher-quality sound than the state-of-the-art method, especially in the high-frequency components. |
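To make the two loss functions in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. Many details are assumptions not stated in the abstract: the baseline's high-pass filter is taken to be a first-order pre-emphasis filter 1 - 0.95 z^-1, the spectrogram is a Hann-windowed STFT power spectrum (n_fft=1024, hop=256), the Mel filter bank uses 40 triangular HTK-style filters, and the weights `alpha` and `beta` and the constant `eps` are hypothetical. The generalized KL divergence is D(P||Q) = sum(P log(P/Q) - P + Q).

```python
import numpy as np


def high_pass(x, coeff=0.95):
    # Assumed first-order pre-emphasis filter H(z) = 1 - coeff * z^-1,
    # standing in for the baseline's high-pass filter (hypothetical choice).
    return np.append(x[0], x[1:] - coeff * x[:-1])


def nmse_highpass(y_true, y_pred, coeff=0.95):
    # Baseline-style loss: normalized MSE between high-pass filtered signals.
    t, p = high_pass(y_true, coeff), high_pass(y_pred, coeff)
    return np.sum((t - p) ** 2) / (np.sum(t ** 2) + 1e-12)


def stft_power(x, n_fft=1024, hop=256):
    # Power spectrogram (frames x bins) from a Hann-windowed STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters equally spaced on the mel scale (HTK mel formula).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def generalized_kl(p, q, eps=1e-10):
    # Generalized KL divergence for non-negative arrays: sum(p log(p/q) - p + q).
    p, q = p + eps, q + eps
    return np.sum(p * np.log(p / q) - p + q)


def proposed_loss(y_true, y_pred, sr=44100, alpha=1.0, beta=1.0, n_fft=1024, hop=256):
    # Weighted sum of the time-domain MSE (no high-pass filter) and the
    # generalized KL divergence between Mel spectrograms.
    # alpha and beta are hypothetical weights.
    mse = np.mean((y_true - y_pred) ** 2)
    fb = mel_filterbank(sr, n_fft)
    mel_true = stft_power(y_true, n_fft, hop) @ fb.T
    mel_pred = stft_power(y_pred, n_fft, hop) @ fb.T
    return alpha * mse + beta * generalized_kl(mel_true, mel_pred)
```

For example, with `x = np.random.default_rng(0).standard_normal(8192)`, `proposed_loss(x, x)` is zero and `proposed_loss(x, np.tanh(3 * x))` is positive. In a real training setup these functions would be written in an automatic-differentiation framework so that gradients can flow back into the WaveNet.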