Technical Program

Paper Detail

Paper ID: E-1-2.5
Paper Title: TATUM-LEVEL DRUM TRANSCRIPTION BASED ON A CONVOLUTIONAL RECURRENT NEURAL NETWORK WITH LANGUAGE MODEL-BASED REGULARIZED TRAINING
Authors: Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii (Graduate School of Informatics, Kyoto University, Japan)
Session: E-1-2, Music Information Processing 1, Audio Scene Classification
Time: Tuesday, 08 December, 15:30 - 17:00
Presentation Time: Tuesday, 08 December, 16:30 - 16:45
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract: This paper describes a neural drum transcription method that detects the onset times of drums in music signals at the tatum level, where the tatum times are assumed to be estimated in advance. Conventional studies on drum transcription have often used deep neural networks (DNNs) that take a music spectrogram as input and estimate the onset times of drums at the frame level. The major problem with such frame-to-frame DNNs, however, is that the estimated onset times often do not conform to the typical tatum-level patterns appearing in symbolic drum scores, because the long-term, musically meaningful structure of those patterns is difficult to learn at the frame level. To solve this problem, we propose a regularized training method for a frame-to-tatum DNN. In the proposed method, a tatum-level probabilistic language model (a gated recurrent unit (GRU) network or a repetition-aware bigram model) is trained on an extensive collection of drum scores. Since the musical naturalness of tatum-level onset times can be evaluated by the language model, the frame-to-tatum DNN is trained with a regularizer based on the pretrained language model. Experimental results demonstrate the effectiveness of the proposed regularized training method.
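The abstract describes training the frame-to-tatum DNN with a loss that combines a transcription term and a language-model-based regularizer. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the array shapes, the binary cross-entropy form of both terms, and the `weight` hyperparameter are all illustrative assumptions.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-9):
    """Mean binary cross-entropy between predictions p and (soft) targets y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def regularized_loss(dnn_probs, targets, lm_probs, weight=0.1):
    """Transcription loss plus a language-model regularizer (illustrative).

    dnn_probs: (T, D) tatum-level onset probabilities output by the DNN
               for T tatums and D drum instruments (e.g. bass/snare/hi-hat).
    targets:   (T, D) binary ground-truth onsets.
    lm_probs:  (T, D) onset probabilities from the pretrained language model,
               standing in for its judgment of musical naturalness.
    weight:    regularization weight (hypothetical hyperparameter).
    """
    # Standard supervised term: match the ground-truth drum score.
    transcription_loss = binary_cross_entropy(dnn_probs, targets)
    # Regularizer: pull the DNN's tatum-level output toward patterns the
    # pretrained language model considers musically natural.
    lm_regularizer = binary_cross_entropy(dnn_probs, lm_probs)
    return transcription_loss + weight * lm_regularizer

# Example with random stand-in values for the three probability arrays.
rng = np.random.default_rng(0)
T, D = 16, 3
targets = rng.integers(0, 2, size=(T, D)).astype(float)
dnn_probs = rng.uniform(0.05, 0.95, size=(T, D))
lm_probs = rng.uniform(0.05, 0.95, size=(T, D))
loss = regularized_loss(dnn_probs, targets, lm_probs, weight=0.1)
```

The point of the combined loss is that frame-level supervision alone cannot capture long-term score structure; the second term lets the pretrained tatum-level model inject that structure during training.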