MO3.R2.1

Effect of Weight Quantization on Learning Models by Typical Case Analysis

Shuhei Kashiwamura, The University of Tokyo, Japan; Ayaka Sakata, The Institute of Statistical Mathematics, Japan; Masaaki Imaizumi, The University of Tokyo, Japan

Session:
Classification and Regression

Track:
8: Machine Learning

Location:
Ypsilon I-II-III

Presentation Time:
Mon, 8 Jul, 14:35 - 14:55

Session Chair:
Adam Krzyzak, Concordia University

Abstract
This paper examines the quantization methods used in large-scale data analysis models and their hyperparameter choices. The recent surge in data analysis scale has significantly increased computational resource requirements. To address this, quantizing model weights has become a prevalent practice in data analysis applications such as deep learning. Quantization is particularly vital for deploying large models on devices with limited computational resources. However, the selection of quantization hyperparameters, like the number of bits and value range for weight quantization, remains an underexplored area. In this study, we employ typical-case analysis from statistical physics, specifically the replica method, to explore the impact of hyperparameters on the quantization of simple learning models. Our analysis yields three key findings: (i) an unstable hyperparameter phase, known as replica symmetry breaking, occurs with a small number of bits and a large quantization width; (ii) there is an optimal quantization width that minimizes error; and (iii) quantization delays the onset of overparameterization, which mitigates overfitting, as indicated by the double descent phenomenon. We also discover that non-uniform quantization can enhance stability. Additionally, we develop an approximate message-passing algorithm to validate our theoretical results.
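The two hyperparameters discussed in the abstract, the number of bits and the quantization width (value range), can be illustrated with a minimal uniform weight quantizer. This is only a hedged sketch of the general technique, not the paper's exact model or algorithm; the function name and parameters are assumptions for illustration.

```python
import numpy as np

def quantize_uniform(w, n_bits, width):
    """Uniformly quantize weights onto 2**n_bits levels in [-width, width].

    Weights outside the range are clipped first; the remaining error
    per weight is at most half the step between adjacent levels.
    (Illustrative sketch; not the paper's quantization scheme.)
    """
    levels = 2 ** n_bits
    step = 2.0 * width / (levels - 1)        # spacing between quantization levels
    w_clipped = np.clip(w, -width, width)    # enforce the chosen value range
    # Snap each clipped weight to the nearest level on the grid.
    return np.round((w_clipped + width) / step) * step - width

# Example: quantize Gaussian weights with 3 bits and width 2.0.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
w_q = quantize_uniform(w, n_bits=3, width=2.0)
```

With fewer bits or a larger width the level spacing grows, which is the regime where the abstract reports instability (replica symmetry breaking); an intermediate width balances clipping error against rounding error.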