Technical Program

Paper Detail

Paper ID E-2-1.1
Paper Title PJS: PHONEME-BALANCED JAPANESE SINGING-VOICE CORPUS
Authors Junya Koguchi, Meiji University, Japan; Shinnosuke Takamichi, The University of Tokyo, Japan; Masanori Morise, Meiji University, Japan
Session E-2-1: Music Information Processing 2, Voice Conversion
Time Wednesday, 09 December, 12:30 - 14:00
Presentation Time: Wednesday, 09 December, 12:30 - 12:45
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA):
Abstract This paper presents a free Japanese singing-voice corpus intended for widely applicable singing-voice synthesis research. Singing-voice corpora support the development of singing-voice synthesis, but existing corpora have two critical problems: data imbalance (unlike speaking-voice corpora, singing-voice corpora do not guarantee phoneme balance) and copyright issues (the data cannot be legally shared). To avoid these problems, we constructed the phoneme-balanced Japanese singing-voice (PJS) corpus, which guarantees phoneme balance and is licensed under CC BY-SA 4.0, and we composed its melodies using a phoneme-balanced speaking-voice corpus. Furthermore, to temporally align phoneme sequences with speech feature sequences, we compared three alignment methods: Viterbi alignment of hidden Markov models, dynamic time warping using a synthesized voice, and statistical voice conversion. Experimental results demonstrate that 1) our corpus contains more unique monophones and diphones than an existing corpus, and 2) the voice-conversion-based method provides the most accurate alignment.
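The comparison of unique monophones and diphones mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that a diphone is an ordered pair of adjacent phonemes within one utterance and that each utterance is already available as a list of phoneme symbols. The function name `unique_units` and the toy sequences are hypothetical and are not taken from the PJS corpus.

```python
from typing import List, Sequence, Set, Tuple


def unique_units(
    utterances: Sequence[List[str]],
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collect the unique monophones and diphones in a set of utterances.

    Assumption: a diphone is an ordered pair of adjacent phonemes inside
    a single utterance; pairs spanning utterance boundaries are ignored.
    """
    monophones: Set[str] = set()
    diphones: Set[Tuple[str, str]] = set()
    for phones in utterances:
        monophones.update(phones)
        # zip pairs each phoneme with its successor in the same utterance.
        diphones.update(zip(phones, phones[1:]))
    return monophones, diphones


# Toy, hypothetical phoneme sequences (not real corpus data).
toy = [["k", "o", "e"], ["u", "t", "a"], ["k", "o", "t", "o"]]
mono, di = unique_units(toy)
print(len(mono), len(di))
```

Running the same function over two corpora and comparing the sizes of the returned sets gives one simple way to compare phoneme coverage, under the assumptions stated above.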