Paper ID: F-1-2.1
Paper Title: LANGUAGE MODEL ADAPTATION FOR EMOTIONAL SPEECH RECOGNITION USING TWEET DATA
Authors: Kazuya Saeki, Masaharu Kato, Tetsuo Kosaka, Yamagata University, Japan
Session: F-1-2: Natural Language and Spoken Dialogue
Time: Tuesday, 08 December, 15:30 - 17:00
Presentation Time: Tuesday, 08 December, 15:30 - 15:45
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract |
Generally, emotional speech recognition is considered more difficult than non-emotional speech recognition because the acoustic features of emotional speech vary greatly with the type and intensity of the emotion. In addition, colloquial expressions contained in emotional utterances are difficult to recognize with a language model trained on a corpus such as lecture speech. We have been studying emotional speech recognition using the Japanese Twitter-based emotional speech (JTES) corpus. In this study, we aim to improve emotional speech recognition performance on the JTES through language model adaptation, which requires a text corpus containing emotional and colloquial expressions. However, no such large-scale Japanese corpus exists. To solve this problem, we propose language model adaptation using tweet data. The sentences used for adaptation were extracted from the collected tweets by rule-based filtering, yielding a large corpus of 25.86M words. In the recognition experiments, the baseline word error rate was 36.11%, whereas that with language model adaptation was 25.68%. Combining acoustic model adaptation with language model adaptation further reduced the word error rate to 17.77%.
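The abstract does not list the paper's filtering rules, but the rule-based extraction step it describes might be sketched as follows. The specific rules here (dropping retweets, stripping URLs and @-mentions, requiring Japanese characters) are illustrative assumptions, not the authors' exact criteria.

```python
import re
from typing import Optional

# Assumed, illustrative filtering rules -- not the rules used in the paper.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Hiragana, katakana, and common CJK ideographs.
JAPANESE_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")

def clean_tweet(text: str) -> Optional[str]:
    """Return a cleaned sentence, or None if the tweet is filtered out."""
    if text.startswith("RT "):          # drop retweets (duplicated text)
        return None
    text = URL_RE.sub("", text)         # strip URLs
    text = MENTION_RE.sub("", text)     # strip @-mentions
    text = text.strip()
    if not JAPANESE_RE.search(text):    # require Japanese content
        return None
    return text

def filter_tweets(tweets):
    """Apply the rules to raw tweets, keeping the sentences that survive."""
    return [s for t in tweets if (s := clean_tweet(t)) is not None]
```

Applied at scale to collected tweets, a pipeline of this shape would produce the adaptation text; the surviving sentences could then be used to train or interpolate an in-domain n-gram language model.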