Technical Program

Paper Detail

Paper IDC-3-2.2
Paper Title MPOP600: A MANDARIN POPULAR SONG DATABASE WITH ALIGNED AUDIO, LYRICS, AND MUSICAL SCORES FOR SINGING VOICE SYNTHESIS
Authors Chan-Chuan Chu, Fu-Rong Yang, Yi-Jhe Lee, Yi-Wen Liu, Shan-Hung Wu, National Tsing Hua University, Taiwan
Session C-3-2: Machine Learning and Data Analysis 2
Time Thursday, 10 December, 15:30 - 17:15
Presentation Time Thursday, 10 December, 15:45 - 16:00
All times are in New Zealand Time (UTC +13)
Topic Machine Learning and Data Analytics (MLDA)
Abstract The purpose of singing voice synthesis (SVS) is to generate a human-like singing voice from lyrics and the corresponding musical score. Nowadays, mainstream SVS approaches rely on neural networks (NNs) that map linguistic and musical contextual factors to acoustic features, from which the audio output is produced. For SVS in Mandarin or other Chinese languages in particular, a sufficiently large and adequately labeled database has not been publicly available. To pursue Mandarin SVS research, we built a singing voice database from scratch, with 600 pop songs sung by 2 male and 2 female vocalists. Each recording contains a single vocal track only, without any background music. This paper describes how the dataset was recorded and the data preprocessing steps necessary for training NNs to perform SVS. Several simple neural network architectures were adopted so that their preliminary SVS performance could be compared. Both subjective and objective evaluations show that these networks could learn from the MPop600 database to generate singing voice from unseen musical scores. MPop600 is available in both the MIDI and the MusicXML formats. In the future, we believe that more advanced, recently developed networks can be applied to model the singing behaviors in this database and help advance research in Mandarin SVS.