Technical Program

Paper Detail

Paper ID: E-2-3.1
Paper Title: PRIVACY PRESERVING ACOUSTIC MODEL TRAINING FOR SPEECH RECOGNITION
Authors: Yuuki Tachioka, Denso IT Laboratory, Japan
Session: E-2-3: Speech Recognition
Time: Wednesday, 09 December, 17:15 - 19:15
Presentation Time: Wednesday, 09 December, 17:15 - 17:30
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract: In-domain speech data significantly improve the speech recognition performance of acoustic models. However, such data may contain confidential information, and exposure of the transcriptions may breach speakers' privacy. In addition, speaker identification can be problematic when speakers want to hide their membership in a certain group. Thus, the in-domain data must be deleted after their period of use. However, once the data are deleted, models cannot be updated for future architectures. Privacy preservation is therefore necessary when retaining speech data: it must be impossible to reconstruct the transcriptions or to identify the speakers. This paper proposes a privacy preserving acoustic model training (PPAMT) method that satisfies these requirements and formulates the sensitivities of three features (n-grams, phoneme labels, and acoustic features) for PPAMT. A sensitivity analysis showed that phoneme labels and acoustic features were less affected by PPAMT than n-grams, which is desirable because accurate phoneme labels and acoustic features are needed for acoustic model training. Speech recognition experiments showed that, owing to this property, the word error rate degradation caused by PPAMT was less than 0.6%.
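The abstract does not specify the paper's exact privatization mechanism, but "sensitivity" is the quantity that calibrates noise in differential-privacy-style mechanisms. The sketch below is a hypothetical illustration (not the paper's method) of the standard Laplace mechanism, showing why a low-sensitivity feature (such as the acoustic features in the abstract's analysis) tolerates privatization with less distortion than a high-sensitivity one (such as n-gram counts) for the same privacy budget epsilon; the sensitivity values used here are made up.

```python
# Hypothetical sketch: sensitivity-calibrated Laplace noise, the standard
# mechanism from differential privacy. Illustrative only -- the abstract does
# not state that PPAMT uses this exact mechanism or these sensitivity values.
import numpy as np


def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Privatize a numeric feature by adding Laplace(0, sensitivity/epsilon) noise."""
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon  # higher sensitivity -> more noise required
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))


rng = np.random.default_rng(0)
# A low-sensitivity feature needs little noise at the same privacy budget...
acoustic_priv = laplace_mechanism(np.zeros(3), sensitivity=0.1, epsilon=1.0, rng=rng)
# ...while a high-sensitivity feature (e.g., n-gram counts) is distorted more.
ngram_priv = laplace_mechanism(np.zeros(3), sensitivity=5.0, epsilon=1.0, rng=rng)
```

This mirrors the abstract's finding: because phoneme labels and acoustic features have lower sensitivity, they survive privatization largely intact, which is exactly what acoustic model training needs.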