Technical Program

Paper Detail

Paper ID F-3-3.5
Paper Title DEEP NEURAL NETWORK COMPRESSION WITH KNOWLEDGE DISTILLATION USING CROSS-LAYER MATRIX, KL DIVERGENCE AND OFFLINE ENSEMBLE
Authors Hsing-Hung Chou, Ching-Te Chiu, Yi-Ping Liao, National Tsing Hua University, Taiwan
Session F-3-3: Signal Processing Systems for AI
Time Thursday, 10 December, 17:30 - 19:30
Presentation Time Thursday, 10 December, 18:30 - 18:45
All times are in New Zealand Time (UTC +13)
Topic Signal Processing Systems: Design and Implementation (SPS)
Abstract Knowledge distillation is an approach for compressing a Deep Neural Network (DNN): the huge number of parameters and the heavy computation of a teacher model are transferred to a smaller student model, so that the smaller model can be deployed in embedded systems. Most knowledge distillation methods transfer information only at the last stage of the DNN model. We propose an efficient compression method that can be split into three parts. First, we propose a cross-layer Gramian matrix to extract more features from the teacher model. Second, we adopt Kullback-Leibler (KL) divergence in an offline deep mutual learning (DML) environment so that the student model finds a wider, more robust minimum. Finally, we propose the use of an offline ensemble of pre-trained teachers to teach a student model. With ResNet-32 as the teacher model and ResNet-8 as the student model, experimental results showed that Top-1 accuracy increased by 4.38% with a 6.11x compression rate and a 5.27x reduction in computation.
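
The abstract names three loss components: a cross-layer Gramian matrix on intermediate features, a KL-divergence term between teacher and student outputs, and an offline ensemble of pre-trained teachers. The sketch below is a minimal, hypothetical PyTorch illustration of how such terms are commonly combined, not the authors' implementation: it assumes the cross-layer Gramian pairs feature maps from two layers with matching spatial size, that KL divergence is applied to temperature-softened logits, and that the offline ensemble simply averages the teachers' logits. All function names, weights, and the temperature are illustrative assumptions.

```python
# Hypothetical sketch of the three distillation terms described in the abstract.
# Assumptions: matching spatial sizes for paired layers, softened-logit KL term,
# logit-averaging teacher ensemble. Weights and temperature are placeholders.
import torch
import torch.nn.functional as F


def cross_layer_gram(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Gramian between two layers' feature maps: (B,C1,H,W) x (B,C2,H,W) -> (B,C1,C2).

    Assumes feat_a and feat_b share the same spatial resolution.
    """
    b, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    a = feat_a.view(b, c1, h * w)
    bb = feat_b.view(b, c2, h * w)
    return a @ bb.transpose(1, 2) / (h * w)


def gram_loss(student_pairs, teacher_pairs) -> torch.Tensor:
    """MSE between the student's and teacher's cross-layer Gramian matrices.

    Each element of student_pairs / teacher_pairs is a (feat_i, feat_j) tuple
    taken from two layers of the respective network (assumed pairing).
    """
    loss = torch.zeros(())
    for (s_i, s_j), (t_i, t_j) in zip(student_pairs, teacher_pairs):
        loss = loss + F.mse_loss(cross_layer_gram(s_i, s_j),
                                 cross_layer_gram(t_i, t_j))
    return loss


def kd_kl_loss(student_logits, teacher_logits, temperature: float = 4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    p_t = F.softmax(teacher_logits / temperature, dim=1)
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2


def ensemble_teacher_logits(teachers, x):
    """Offline ensemble: average the logits of several pre-trained teachers."""
    with torch.no_grad():
        return torch.stack([t(x) for t in teachers]).mean(dim=0)


def total_loss(student_logits, teacher_logits, labels,
               student_pairs, teacher_pairs,
               alpha: float = 0.5, beta: float = 0.1, temperature: float = 4.0):
    """Weighted sum of cross-entropy, KL distillation, and Gramian terms (illustrative weights)."""
    ce = F.cross_entropy(student_logits, labels)
    kl = kd_kl_loss(student_logits, teacher_logits, temperature)
    gram = gram_loss(student_pairs, teacher_pairs)
    return (1 - alpha) * ce + alpha * kl + beta * gram
```

In a training loop one would, for example, feed a batch through the pre-trained teachers (e.g. ResNet-32 models) to obtain ensemble logits and intermediate features, then through the student (e.g. ResNet-8), and minimize `total_loss` with respect to the student's parameters only; how the paper selects the layer pairs and balances the terms is not specified in the abstract.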