Technical Program

Paper Detail

Paper IDF-2-2.6
Paper Title ANALYSIS OF BIT SEQUENCE REPRESENTATION FOR SOUND CLASSIFICATION
Authors Yikang Wang, Masaki Okawa, Hiromitsu Nishizaki, University of Yamanashi, Japan
Session F-2-2: Speaker Recognition 2, Sound Classification
Session Time: Wednesday, 09 December, 15:30 - 17:00
Presentation Time: Wednesday, 09 December, 16:45 - 17:00
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract In sound classification, commonly used speech perceptual features, such as Mel-frequency cepstral coefficients and the Mel-spectrogram, discard information in the raw waveform other than the frequency features, and we cannot conclude that these discarded parts are meaningless. To avoid losing information in the time series, we previously proposed the bit sequence representation, which preserves the temporal characteristics of the sound waveform and improved classification performance over using the original waveform. The present study validates our findings on three datasets: two for music/speech classification and one for English speech classification. We also compared the classification performance obtained when the features were not pre-processed against that obtained when the maximum amplitude was restricted. As a result, we found that appropriately limiting the maximum amplitude effectively improves classification performance.
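The abstract does not specify the exact encoding, but a minimal sketch of one plausible bit sequence representation with amplitude limiting might look like the following. The function name, the 8-bit quantization, and the `max_amp` clipping threshold are all assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def to_bit_sequence(waveform, max_amp=0.5):
    """Hypothetical sketch: clip a waveform in [-1, 1] to [-max_amp, max_amp],
    quantize each sample to 8 bits, and unpack into a flat bit sequence
    that preserves the temporal order of the samples."""
    # Restrict the maximum amplitude (the abstract reports this helps).
    clipped = np.clip(waveform, -max_amp, max_amp)
    # Rescale to unsigned 8-bit integers in [0, 255].
    scaled = np.round((clipped + max_amp) / (2 * max_amp) * 255).astype(np.uint8)
    # Unpack each sample into its 8 binary digits, sample by sample.
    bits = np.unpackbits(scaled[:, None], axis=1)
    return bits.reshape(-1)
```

Each input sample becomes 8 bits, so a waveform of N samples yields a bit sequence of length 8N that a classifier can consume in place of the raw amplitudes.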