Technical Program

Paper ID: F-1-3.6
Paper Title: A Deep Learning-Based Time-Domain Approach for Non-Intrusive Speech Quality Assessment
Authors: Xupeng Jia, Dongmei Li (Tsinghua University, China)
Session: F-1-3: Speech Enhancement 1
Time: Tuesday, 08 December, 17:15 - 19:15
Presentation Time: Tuesday, 08 December, 18:30 - 18:45
All times are in New Zealand Time (UTC+13).
Topic: Speech, Language, and Audio (SLA)
Abstract: Objective speech quality assessment is an important component of speech processing systems. It can serve not only as an evaluation metric but also as a loss function in some deep learning-based systems. In this work, a novel deep learning-based non-intrusive speech quality assessment approach is proposed. Instead of using hand-crafted features or the magnitude spectrum as input, the proposed method works directly on the time-domain waveform. The perceptual evaluation of speech quality (PESQ) score is used as the learning target, and the network structure is designed with reference to the PESQ calculation procedure. A multi-task training strategy is employed to optimize the network. Experimental results show that the proposed approach achieves high correlation with PESQ in both matched and mismatched conditions. The proposed method can also serve as a non-intrusive estimation model for other speech quality or intelligibility measures, such as short-time objective intelligibility (STOI).
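The abstract reports agreement with PESQ as "high correlation" but does not say which statistics are used. A minimal sketch, assuming the common choice of Pearson's linear correlation, Spearman's rank correlation and RMSE computed between per-utterance predicted scores and reference (intrusive) PESQ scores. The function names and the score lists are hypothetical illustration values, not the authors' code or results.

```python
import math
from typing import List, Sequence

# Hypothetical evaluation helpers (not from the paper): agreement statistics
# between a non-intrusive model's predicted scores and reference PESQ scores.


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference scores."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty equal-length sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


if __name__ == "__main__":
    # Made-up illustration values, NOT results from the paper.
    predicted = [1.2, 2.0, 2.9, 3.6, 4.1]  # non-intrusive model output per utterance
    reference = [1.1, 2.2, 2.7, 3.8, 4.0]  # intrusive PESQ per utterance
    print(f"Pearson  r   = {pearson_r(predicted, reference):.3f}")
    print(f"Spearman rho = {spearman_rho(predicted, reference):.3f}")
    print(f"RMSE         = {rmse(predicted, reference):.3f}")
```

The same helpers apply unchanged if the reference measure is STOI instead of PESQ; for a matched versus mismatched comparison, one would presumably compute them separately on each test condition.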