Paper ID | F-2-1.2 |
Paper Title |
SIGNIFICANCE OF CMVN FOR REPLAY SPOOF DETECTION |
Authors |
Ankur T. Patil, Hemant A. Patil, Dhirubhai Ambani Institute of Information and Communication Technology, India |
Session |
F-2-1: Speaker Recognition 1, Language Recognition |
Time | Wednesday, 09 December, 12:30 - 14:00 |
Presentation Time: | Wednesday, 09 December, 12:45 - 13:00 |
|
All times are in New Zealand Time (UTC +13) |
Topic |
Speech, Language, and Audio (SLA): |
Abstract |
In this paper, the significance of Cepstral Mean and Variance Normalization (CMVN) is investigated for the replay Spoofed Speech Detection (SSD) task. The literature shows that applying CMVN produces significantly better performance on many feature sets, which is counter-intuitive for the replay SSD task. This behaviour is analyzed by performing experiments for the environment-independent and environment-dependent cases, with % Equal Error Rate (EER) as the evaluation metric. Furthermore, analysis is also performed with the help of estimated probability density functions (pdfs) of the genuine vs. spoof speech feature representations. The experiments are performed on the publicly available and statistically meaningful ASVspoof 2017 version-2 dataset using the well-known CQCC-GMM and LFCC-GMM SSD systems. This dataset comprises seven acoustic environments for replay speech. This study reveals that the performance of the SSD system is better with CMVN in the environment-independent case, whereas performance degrades drastically in the environment-dependent scenario when CMVN is applied. In the environment-dependent scenario, CMVN suppresses the transmission channel distortion, which in fact provides the discriminative cues for genuine vs. replay speech signals, and this degrades performance. In the environment-independent scenario, however, CMVN scales down the variability in feature space across the different environments, which improves performance. |
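To make the mechanism described in the abstract concrete, the following is a minimal sketch of per-utterance CMVN in plain Python. It is not the authors' implementation. The function name `cmvn`, the toy two-dimensional frames, and the modelling of the replay channel as a constant additive offset in the cepstral domain are all assumptions made purely for illustration.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance CMVN (illustrative sketch, hypothetical helper).

    frames: list of equal-length feature vectors, one per frame.
    Each coefficient is shifted to zero mean and scaled to unit
    variance over the frames of the utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant coefficients.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


# Toy "genuine" features: 3 frames, 2 coefficients (made-up values).
genuine = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# Assumption: a stationary channel adds a constant cepstral offset.
channel_offset = [5.0, -3.0]
replayed = [[x + c for x, c in zip(f, channel_offset)] for f in genuine]

norm_genuine = cmvn(genuine)
norm_replayed = cmvn(replayed)
```

Under this simplified channel model, `norm_genuine` and `norm_replayed` come out equal up to floating-point error: the offset that distinguished the two utterances is removed by the mean subtraction. This is one way to picture why CMVN could discard a channel cue that is useful in the environment-dependent case while reducing cross-environment variability in the environment-independent case.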