Paper Detail

Paper ID: E-2-1.3
Paper Title: A VARIATIONAL AUTOENCODER FOR JOINT CHORD AND KEY ESTIMATION FROM AUDIO CHROMAGRAMS
Authors: Yiming Wu, Eita Nakamura, Kazuyoshi Yoshii (Kyoto University, Japan)
Session E-2-1: Music Information Processing 2, Voice Conversion
Time: Wednesday, 09 December, 12:30 - 14:00
Presentation Time: Wednesday, 09 December, 13:00 - 13:15
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract: This paper describes a deep generative approach to jointly estimating chords and keys from music signals. Although deep neural networks have been widely used for estimating various kinds of musical elements, the joint estimation of multiple kinds of musical elements has scarcely been investigated. Given the mutual dependency between keys and chords, which both describe the harmonic content of music, we propose a unified deep classification model for jointly estimating chords and keys. At the heart of our study is the integration of supervised multi-task learning with unsupervised variational autoencoding, which yields improved performance and enables semi-supervised learning. Specifically, we formulate a deep latent-variable model that represents the generative process of chroma vectors from discrete key classes, discrete chord classes, and continuous latent features. The deep classification model and a deep recognition model are then introduced for inferring keys, chords, and latent features from chroma vectors. These three models are trained jointly in a (semi-)supervised manner, where the generative model acts as a regularizer for the classification model. The experimental results show that multi-task learning improves the consistency between estimated keys and chords, and that the autoencoding-based regularization significantly improves estimation performance.
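
To make the described training objective concrete, the following is a minimal PyTorch sketch of one plausible reading of the model. The layer sizes, the 24-key and 25-chord label spaces, the Gaussian (mean-squared-error) reconstruction term, and the soft-label relaxation used for unlabeled data are illustrative assumptions, not the paper's exact architecture, chord vocabulary, or inference procedure.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed sizes: 12-dimensional chroma vectors, 24 keys, 25 chords (e.g. 12 major
# + 12 minor triads + "no chord"), and a 16-dimensional continuous latent feature.
N_CHROMA, N_KEY, N_CHORD, N_LATENT = 12, 24, 25, 16

class JointChordKeyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Unified classification model: chroma vector -> key and chord logits.
        self.classifier = nn.Sequential(nn.Linear(N_CHROMA, 128), nn.ReLU())
        self.key_head = nn.Linear(128, N_KEY)
        self.chord_head = nn.Linear(128, N_CHORD)
        # Recognition model: chroma vector and labels -> continuous latent z.
        self.recog = nn.Sequential(
            nn.Linear(N_CHROMA + N_KEY + N_CHORD, 128), nn.ReLU())
        self.mu = nn.Linear(128, N_LATENT)
        self.logvar = nn.Linear(128, N_LATENT)
        # Generative model: (key, chord, z) -> reconstructed chroma vector.
        self.decoder = nn.Sequential(
            nn.Linear(N_KEY + N_CHORD + N_LATENT, 128), nn.ReLU(),
            nn.Linear(128, N_CHROMA))

    def forward(self, x, key_vec, chord_vec):
        h = self.recog(torch.cat([x, key_vec, chord_vec], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        x_hat = self.decoder(torch.cat([key_vec, chord_vec, z], dim=-1))
        return x_hat, mu, logvar

def elbo_terms(model, x, key_vec, chord_vec):
    # Negative evidence lower bound: Gaussian-style reconstruction error plus
    # the KL divergence between q(z|...) and a standard normal prior.
    x_hat, mu, logvar = model(x, key_vec, chord_vec)
    recon = F.mse_loss(x_hat, x, reduction="none").sum(-1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
    return (recon + kl).mean()

def supervised_loss(model, x, key, chord):
    # Labeled branch: cross-entropy classification losses plus the ELBO
    # evaluated with the observed (one-hot) key and chord labels.
    h = model.classifier(x)
    cls = F.cross_entropy(model.key_head(h), key) \
        + F.cross_entropy(model.chord_head(h), chord)
    key_1h = F.one_hot(key, N_KEY).float()
    chord_1h = F.one_hot(chord, N_CHORD).float()
    return cls + elbo_terms(model, x, key_1h, chord_1h)

def unsupervised_loss(model, x):
    # Unlabeled branch: the classifier's softmax outputs stand in for the
    # unobserved labels (a soft relaxation of marginalizing over all key/chord
    # pairs). Reconstruction gradients flow back into the classification heads,
    # which is how the generative model regularizes the classifier in this sketch.
    h = model.classifier(x)
    key_p = F.softmax(model.key_head(h), dim=-1)
    chord_p = F.softmax(model.chord_head(h), dim=-1)
    return elbo_terms(model, x, key_p, chord_p)

A semi-supervised training step under these assumptions would simply sum the two branches, e.g. loss = supervised_loss(model, x_lab, key_lab, chord_lab) + unsupervised_loss(model, x_unlab), and backpropagate through all three sub-networks at once, matching the abstract's description of jointly training the classification, recognition, and generative models.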