Paper ID | F-1-3.6 |
Paper Title |
A DEEP LEARNING-BASED TIME-DOMAIN APPROACH FOR NON-INTRUSIVE SPEECH QUALITY ASSESSMENT |
Authors |
Xupeng Jia, Dongmei Li, Tsinghua University, China |
Session |
F-1-3: Speech Enhancement 1 |
Time | Tuesday, 08 December, 17:15 - 19:15 |
Presentation Time | Tuesday, 08 December, 18:30 - 18:45 |
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA): |
Abstract |
Objective speech quality assessment is an important component of speech processing systems. It can serve not only as an evaluation metric but also as a loss function in some deep learning-based systems. In this work, a novel deep learning-based non-intrusive speech quality assessment approach is proposed. Instead of using manually designed features or the magnitude spectrum as input, the proposed method works directly on the time-domain waveform. The perceptual evaluation of speech quality (PESQ) score is used as the learning target, and the network structure is designed with reference to the PESQ calculation procedure. A multi-task training strategy is employed to optimize the network. Experimental results show that the proposed approach can yield high correlation with PESQ in both matched and unmatched conditions. The proposed method can also be used as a non-intrusive estimation model for other speech quality or intelligibility assessment methods, such as short-time objective intelligibility (STOI). |
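The abstract evaluates the model by its correlation with PESQ. The listing does not say which correlation measure the paper reports, so the sketch below assumes Pearson's linear correlation coefficient, a common choice for comparing predicted quality scores against a reference metric. The function name `pearson_correlation` and any score lists passed to it are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Pearson linear correlation between model predictions and reference scores.

    Hypothetical helper: `predicted` would hold a non-intrusive model's outputs and
    `reference` the PESQ (or STOI) scores computed for the same utterances.
    """
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two scores")

    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n

    # Covariance and variances (the common 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined when either score list is constant")

    return cov / math.sqrt(var_p * var_r)
```

A value near 1 means the predictions track the reference metric closely; evaluating on held-out data from a different corpus or noise set would correspond to the "unmatched" case. Rank-based measures such as Spearman's correlation are sometimes reported alongside Pearson's, but whether this paper does so is not stated here.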