Technical Program

Paper Detail

Paper ID F-3-1.3
Paper Title ENHANCEMENT OF SPEECH INTELLIGIBILITY UNDER NOISY REVERBERANT CONDITIONS BASED ON MODULATION SPECTRUM CONCEPT
Authors Thuanvan Ngo, Tuanvu Ho, Masashi Unoki, Japan Advanced Institute of Science and Technology, Japan; Rieko Kubo, National Institute of Information and Communications Technology, Japan; Masato Akagi, Japan Advanced Institute of Science and Technology, Japan
Session F-3-1: Speech Enhancement 3
Time Thursday, 10 December, 12:30 - 14:00
Presentation Time: Thursday, 10 December, 13:00 - 13:15
All times are in New Zealand Time (UTC +13)
Topic Speech, Language, and Audio (SLA):
Abstract This study focuses on identifying effective features for controlling speech to increase its intelligibility under adverse conditions. Previous methods either reduce noise and reverberation during speech presentation or enhance speech before presenting it by controlling its intensity and/or spectral properties. Among them, a method based on modulation transfer function (MTF) theory, in which the environmental effects are inverted to compensate in advance for the attenuation of the speech modulation spectrum, shows excellent potential because it systematically and explicitly derives the intelligibility enhancement needed to counter smearing by the environment. However, directly obtaining that inversion requires estimating the MTF, and this estimation is complicated and may not be robust under realistic, variable conditions. This study takes a different approach: it analyzes how the modulation spectra smeared by the environment relate to intelligibility in order to extract effective features for modification. First, we conduct listening tests measuring the intelligibility of different types of enhanced speech in noise. Next, we extract the acoustic- and modulation-frequency components of the noise-smeared modulation spectra that are highly correlated with the intelligibility scores. Finally, we examine through listening tests the intelligibility benefit of modifying these components. The results show that modifying these components increases intelligibility by up to 20%, which supports the validity of our concept.
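To make the MTF-inversion idea in the abstract concrete, here is a minimal sketch that assumes the classical Houtgast and Steeneken MTF model for exponentially decaying reverberation (reverberation time T60) plus stationary noise (a given SNR). This is an illustration only; the prior method the abstract refers to may estimate the MTF differently. It also shows why the inversion depends on knowing T60 and SNR, which are hard to estimate under realistic conditions. The function names and the `max_gain` cap are hypothetical.

```python
import math
from typing import Dict


def mtf_noisy_reverberant(f_mod_hz: float, t60_s: float, snr_db: float) -> float:
    """Houtgast-Steeneken MTF for exponential reverberation plus stationary noise.

    Reverberation low-pass filters the temporal envelope (the factor depends on
    the modulation frequency); noise scales every modulation frequency by the
    same SNR-dependent factor.
    """
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod_hz * t60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def smear(mod_spectrum: Dict[float, float], t60_s: float, snr_db: float) -> Dict[float, float]:
    """Attenuate each modulation-spectrum component by the MTF at its frequency."""
    return {f: m * mtf_noisy_reverberant(f, t60_s, snr_db) for f, m in mod_spectrum.items()}


def inverse_mtf_gain(f_mod_hz: float, t60_s: float, snr_db: float,
                     max_gain: float = 4.0) -> float:
    """Pre-compensation gain 1/MTF, capped at a hypothetical max_gain."""
    return min(1.0 / mtf_noisy_reverberant(f_mod_hz, t60_s, snr_db), max_gain)


if __name__ == "__main__":
    # Hypothetical condition: T60 = 1.0 s, SNR = 0 dB.
    for f in (1.0, 2.0, 4.0, 8.0, 16.0):
        m = mtf_noisy_reverberant(f, 1.0, 0.0)
        g = inverse_mtf_gain(f, 1.0, 0.0)
        print(f"{f:5.1f} Hz  MTF={m:.3f}  gain={g:.2f}")
```

The cap reflects a real limitation of exact inversion: a pre-emphasized modulation index cannot exceed 1, so when the MTF is small the required gain 1/MTF cannot be fully applied.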
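The step of extracting components "showing high correlation with intelligibility scores" can be sketched as a correlation-based ranking over (acoustic-frequency band, modulation-frequency band) pairs. The abstract does not specify the correlation measure, band layout, or selection threshold. This sketch therefore assumes Pearson correlation and a hypothetical threshold of 0.8, and it uses made-up band labels and data purely for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (acoustic-frequency band, modulation-frequency band); labels are hypothetical
Component = Tuple[str, str]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def select_components(features: Dict[Component, List[float]],
                      scores: List[float],
                      threshold: float = 0.8) -> List[Tuple[Component, float]]:
    """Keep components with |r| >= threshold against the scores, strongest first."""
    ranked = [(c, pearson_r(v, scores)) for c, v in features.items()]
    kept = [(c, r) for c, r in ranked if abs(r) >= threshold]
    return sorted(kept, key=lambda cr: abs(cr[1]), reverse=True)


# Made-up example: intelligibility scores (%) for four listening conditions and
# the level of two candidate components in the noise-smeared modulation spectra.
scores = [40.0, 55.0, 62.0, 80.0]
features = {
    ("1-2 kHz", "4-8 Hz"): [0.10, 0.18, 0.22, 0.35],
    ("250-500 Hz", "16-32 Hz"): [0.30, 0.10, 0.28, 0.12],
}
result = select_components(features, scores)
```

With these made-up numbers, only the first component passes the threshold (r of roughly 0.996, versus roughly -0.56 for the second). In practice, the selected components would then be candidates for modification and evaluation in listening tests.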