MO2.R5.3

Risk Bound on MDL Estimator for Simple ReLU Networks

Yoshinari Takeishi, Jun'ichi Takeuchi, Kyushu University, Japan

Session:
Estimation and Prediction

Track:
11: Information Theory and Statistics

Location:
Omikron I

Presentation Time:
Mon, 8 Jul, 12:30 - 12:50

Session Chair:
Osvaldo Simeone, King's College London
Abstract
To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse risk bounds on MDL estimators for simple two-layer neural networks (NNs) with ReLU activation. For that purpose, we construct a two-stage code with small redundancy, based on the fact that the eigenvalue distribution of the Fisher information matrix of such NNs is strongly biased, as recently shown by Takeishi et al. (2023). As a direct consequence of the theory of MDL estimators originated by Barron and Cover (1991), the MDL estimator induced by this two-stage code enjoys a tight upper bound on its risk. The target NNs consist of $d$ nodes in the input layer, $p$ nodes in the hidden layer, and one output node. The object of estimation is only the $p$ weights from the hidden layer to the output node. In the context of the large-scale neural networks of interest to us, it is assumed that $p \gg d$. Note that the leading term of our risk bound is $O(d^2 \log n / n)$, independent of $p$.
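
A minimal sketch of a model form consistent with this description is given below, with the first-layer weights $a_j$ treated as fixed and only the output weights $w = (w_1, \dots, w_p)$ estimated; the notation is illustrative, since the abstract does not fix the exact parameterization (e.g. biases or output scaling).

% Illustrative two-layer ReLU network: d inputs, p hidden units, one output.
% Only the hidden-to-output weights w are the object of estimation.
\[
  f_{w}(x) \;=\; \sum_{j=1}^{p} w_j \,\mathrm{ReLU}\!\left(a_j^{\top} x\right),
  \qquad x \in \mathbb{R}^{d},\ \ w \in \mathbb{R}^{p},\ \ p \gg d .
\]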