Technical Program

Paper Detail

Paper ID	F-1-1.5
Paper Title	SPEAKER AGE ESTIMATION USING AGE-DEPENDENT INSENSITIVE LOSS
Authors	Yuki Kitagishi, Hosana Kamiyama, Atsushi Ando, Naohiro Tawara, Takeshi Mori, Satoshi Kobashikawa, NTT, Japan
Session	F-1-1: Emotion, Dialect, and Age Recognition
Time	Tuesday, 08 December, 12:30 - 14:00
Presentation Time:	Tuesday, 08 December, 13:30 - 13:45
	All times are in New Zealand Time (UTC +13)
Topic	Speech, Language, and Audio (SLA):
Abstract	This paper proposes a speaker age estimation method that uses an age-dependent insensitive loss. Most conventional speaker age estimation frameworks ignore the ambiguity of a speaker's perceptual age. These "over-sensitive" frameworks can cause critical errors in which the estimate is far from the actual age. The key idea of the proposed method is that the age estimator should tolerate some ambiguity in the age labels, and that the amount of tolerated ambiguity should depend on age. The age-dependent insensitivity is learned with an ε-MAE (mean absolute error) loss for the regression problem and a soft-target cross-entropy loss for the classification problem. Experimental results showed that the proposed method improves the mean absolute error and the ratio of critical errors by 5.2% and 5.7% for the regression problem and by 9.6% and 31.5% for the classification problem.
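
The two losses named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the insensitivity depends on age, so everything specific here is an assumption made for illustration: the linear tolerance function `tolerance_for_age`, its `base` and `slope` values, and the Gaussian shape of the soft targets are hypothetical and are not taken from the paper.

```python
import math


def tolerance_for_age(age, base=1.0, slope=0.1):
    """Hypothetical age-dependent tolerance (assumption, not from the paper).

    Grows linearly with age, so older speakers get a wider insensitive zone.
    """
    return base + slope * age


def eps_mae(true_ages, predicted_ages, eps_fn=tolerance_for_age):
    """Epsilon-insensitive MAE: errors inside the tolerance band cost nothing."""
    losses = [
        max(0.0, abs(t - p) - eps_fn(t))
        for t, p in zip(true_ages, predicted_ages)
    ]
    return sum(losses) / len(losses)


def soft_targets(age, num_classes, width_fn=tolerance_for_age):
    """Soft label over integer age classes 0..num_classes-1.

    Assumed shape: a Gaussian centred on the labelled age whose width
    depends on that age. Normalised so the weights sum to 1.
    """
    sigma = width_fn(age)
    weights = [
        math.exp(-0.5 * ((k - age) / sigma) ** 2) for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def soft_target_cross_entropy(predicted_probs, targets, tiny=1e-12):
    """Cross entropy between a predicted distribution and a soft target."""
    return -sum(
        q * math.log(p + tiny) for q, p in zip(targets, predicted_probs)
    )


if __name__ == "__main__":
    # Regression: a 2-year miss at age 20 is tolerated, a 10-year miss at 60 is not.
    print(eps_mae([20, 60], [22, 50]))

    # Classification: compare a uniform prediction against a soft target for age 30.
    target = soft_targets(30, num_classes=101)
    uniform = [1.0 / 101] * 101
    print(soft_target_cross_entropy(uniform, target))
```

In this sketch the same tolerance function sets both the width of the zero-loss band in regression and the spread of the soft label in classification. That is only a convenience of the example; in practice the two could be chosen or learned separately.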