Technical Program

Paper Detail

Paper ID: F-1-1.1
Paper Title: Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
Authors: Ryo Imaizumi, Tokyo Metropolitan University, Japan; Ryo Masumura, Nippon Telegraph and Telephone Corporation, Japan; Sayaka Shiota, Hitoshi Kiya, Tokyo Metropolitan University, Japan
Session: F-1-1: Emotion, Dialect, and Age Recognition
Time: Tuesday, 08 December, 12:30 - 14:00
Presentation Time: Tuesday, 08 December, 12:30 - 12:45
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract: In this paper, we present a novel technique for building end-to-end automatic speech recognition (ASR) systems for Japanese dialects. ASR systems built for standard Japanese are known to be unsuitable for recognizing Japanese dialects, which differ from standard Japanese in both accent and vocabulary. We therefore aim to build dialect-specific end-to-end ASR systems. Large amounts of paired speech-text data are difficult to collect for each dialect. We therefore construct the dialect-specific systems from both dialect data and standard Japanese data. A simple approach is multi-condition modeling, which merges the dialect data with the standard Japanese data. However, because of the mismatch between the dialects and the standard language, simple multi-condition modeling fails to capture dialect-specific characteristics adequately. To build reliable dialect-specific end-to-end ASR systems, we propose dialect-aware modeling, which uses dialect labels as auxiliary features. The main strength of the proposed method is that it makes effective use of both the dialect data and the standard Japanese data while still capturing dialect-specific characteristics. In experiments on an in-house database of Japanese dialects, the proposed dialect-aware modeling outperformed simple multi-condition modeling and achieved an error reduction of 19.2%.
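The abstract contrasts two ways of pooling dialect and standard Japanese training data, but it does not say how the dialect labels enter the network. The following is a minimal sketch under one common assumption: each utterance's dialect label is turned into a one-hot vector and appended to every acoustic feature frame. The paper may inject the label elsewhere (for example, into the encoder or decoder). The label inventory `DIALECTS` and all function names are hypothetical and only illustrate the difference between the two data setups.

```python
from typing import List, Sequence, Tuple

# Hypothetical label inventory; the paper's actual dialect set is not listed here.
DIALECTS = ["standard", "dialect_a", "dialect_b"]

Frame = List[float]
# An utterance is assumed to be (feature frames, transcript, dialect label).
Utterance = Tuple[List[Frame], str, str]


def one_hot(label: str, labels: Sequence[str] = DIALECTS) -> List[float]:
    """Encode `label` as a one-hot vector over `labels`."""
    if label not in labels:
        raise ValueError(f"unknown dialect label: {label!r}")
    return [1.0 if name == label else 0.0 for name in labels]


def multi_condition_merge(dialect_data: Sequence[Utterance],
                          standard_data: Sequence[Utterance]):
    """Baseline: pool both corpora and discard where each utterance came from."""
    return [(frames, text) for frames, text, _ in list(dialect_data) + list(standard_data)]


def add_dialect_feature(frames: Sequence[Frame], label: str) -> List[Frame]:
    """Append the utterance-level one-hot dialect label to every frame (assumed design)."""
    code = one_hot(label)
    return [list(frame) + code for frame in frames]


def dialect_aware_merge(dialect_data: Sequence[Utterance],
                        standard_data: Sequence[Utterance]):
    """Pool both corpora, but keep the dialect label as an auxiliary input feature."""
    return [(add_dialect_feature(frames, label), text)
            for frames, text, label in list(dialect_data) + list(standard_data)]


if __name__ == "__main__":
    # Toy 2-dimensional "features" with placeholder transcripts.
    dialect = [([[0.1, 0.2], [0.3, 0.4]], "utt1", "dialect_a")]
    standard = [([[0.5, 0.6]], "utt2", "standard")]
    print(multi_condition_merge(dialect, standard)[0][0])  # [[0.1, 0.2], [0.3, 0.4]]
    print(dialect_aware_merge(dialect, standard)[0][0])
    # [[0.1, 0.2, 0.0, 1.0, 0.0], [0.3, 0.4, 0.0, 1.0, 0.0]]
```

Both setups train on the same pooled data. The only difference is that the dialect-aware variant lets the model condition on which dialect it is hearing, which is how the abstract motivates capturing dialect-specific characteristics without discarding the standard Japanese data.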