Technical Program

Paper Detail

Paper ID	F-2-1.2
Paper Title	SIGNIFICANCE OF CMVN FOR REPLAY SPOOF DETECTION
Authors	Ankur T. Patil, Hemant A. Patil, Dhirubhai Ambani Institute of Information and Communication Technology, India
Session	F-2-1: Speaker Recognition 1, Language Recognition
Time	Wednesday, 09 December, 12:30 - 14:00
Presentation Time:	Wednesday, 09 December, 12:45 - 13:00
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA):
Abstract	In this paper, the significance of Cepstral Mean and Variance Normalization (CMVN) is investigated for the replay Spoofed Speech Detection (SSD) task. The literature shows that applying CMVN yields significantly better performance on many feature sets, which is counter-intuitive for the replay SSD task. This behaviour is analyzed through experiments for the environment-independent and environment-dependent cases, with % Equal Error Rate (EER) as the evaluation metric. Furthermore, analysis is also performed using estimated probability density functions (pdfs) of the genuine vs. spoof speech feature representations. The experiments are performed on the publicly available and statistically meaningful ASVspoof 2017 version-2 dataset using the well-known CQCC-GMM and LFCC-GMM SSD systems. This dataset comprises seven acoustic environments for replay speech. The study reveals that the SSD system performs better with CMVN in the environment-independent case, whereas performance degrades drastically in the environment-dependent scenario when CMVN is applied. In the environment-dependent scenario, CMVN suppresses the transmission channel distortion, which is in fact the discriminative cue for genuine vs. replay speech, and this degrades performance. In the environment-independent scenario, however, CMVN scales down the variability in feature space across the different environments, which improves performance.
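
The channel-suppression effect described in the abstract can be illustrated with a minimal sketch. It assumes per-utterance CMVN and models the transmission channel as a fixed additive offset in the cepstral domain (a convolutive channel becomes approximately additive after the log and cepstral transform). The function name `cmvn`, the synthetic 13-coefficient features, and the offset are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np


def cmvn(features, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (num_frames, num_coeffs).
    Returns an array of the same shape in which each coefficient has
    approximately zero mean and unit variance over the utterance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


# Synthetic stand-in for cepstral features (assumption: 200 frames, 13 coefficients).
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))

# Hypothetical fixed channel effect, modelled as a constant cepstral offset.
channel_offset = rng.normal(size=(1, 13))
replayed = clean + channel_offset

# Before CMVN the two utterances differ by the channel offset.
offset_before = np.abs(replayed.mean(axis=0) - clean.mean(axis=0)).max()

# After CMVN the constant offset is removed, so the two become indistinguishable.
same_after = np.allclose(cmvn(clean), cmvn(replayed))
print(offset_before, same_after)
```

Under this simplified model, the constant offset that distinguishes the replayed utterance from the clean one vanishes after normalization, which is consistent with the abstract's explanation of why CMVN can remove a cue that is useful within a single environment.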