Technical Program

Paper Detail

Paper IDF-2-2.6
Paper Title ANALYSIS OF BIT SEQUENCE REPRESENTATION FOR SOUND CLASSIFICATION
Authors Yikang Wang, Masaki Okawa, Hiromitsu Nishizaki, University of Yamanashi, Japan
Session F-2-2: Speaker Recognition 2, Sound Classification
Session Time: Wednesday, 09 December, 15:30 - 17:00
Presentation Time: Wednesday, 09 December, 16:45 - 17:00
All times are in New Zealand Time (UTC +13)
Topic: Speech, Language, and Audio (SLA)
Abstract In sound classification, commonly used speech perceptual features, such as Mel-frequency cepstral coefficients and the Mel-spectrogram, discard information in the raw waveform other than the frequency features, and we cannot conclude that these discarded parts are meaningless. To avoid losing information in the time series, we previously proposed the bit sequence representation, which preserves the temporal characteristics of the sound waveform and improved classification performance over using the original waveform. The present study validates our findings on three datasets: two for music/speech classification and one for English speech classification. We also compared the classification performance obtained when the features were not pre-processed against that obtained when the maximum amplitude was restricted. As a result, we found that appropriately limiting the maximum amplitude effectively improves classification performance.
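The abstract does not specify the exact encoding, but a minimal sketch of one plausible bit sequence representation with amplitude limiting might look like the following. The function name, the 8-bit quantization, and the `max_amp` clipping threshold are all assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def to_bit_sequence(waveform, max_amp=0.5):
    """Hypothetical sketch: clip a waveform in [-1, 1] to [-max_amp, max_amp],
    quantize each sample to 8 bits, and unpack into a flat bit sequence
    that preserves the temporal order of the samples."""
    # Restrict the maximum amplitude (the abstract reports this helps).
    clipped = np.clip(waveform, -max_amp, max_amp)
    # Rescale to unsigned 8-bit integers in [0, 255].
    scaled = np.round((clipped + max_amp) / (2 * max_amp) * 255).astype(np.uint8)
    # Unpack each sample into its 8 binary digits, sample by sample.
    bits = np.unpackbits(scaled[:, None], axis=1)
    return bits.reshape(-1)
```

Each input sample becomes 8 bits, so a waveform of N samples yields a bit sequence of length 8N that a classifier can consume in place of the raw amplitudes.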