TH2.R1.3

Log-Concave Coupling for Sampling Neural Net Posteriors

Curtis McDonald, Andrew Barron, Yale University, United States

Session:
Sampling and Samplers

Track:
8: Machine Learning

Location:
Ballroom II & III

Presentation Time:
Thu, 11 Jul, 12:10 - 12:30

Session Chair:
Stefano Rini, National Yang Ming Chiao Tung University

Abstract
In this work, we present a sampling algorithm for single hidden layer neural networks. This algorithm is built upon a recursive series of Bayesian posteriors using a method we call Greedy Bayes. Sampling the Bayesian posterior for neuron weight vectors w of dimension d is challenging because of its multimodality. Our algorithm tackles this problem through a coupling of the posterior density for w with an auxiliary random variable. The resulting reverse conditional of the neuron weights given the auxiliary random variable is shown to be log-concave. The construction of the posterior distributions allows some freedom in the choice of the prior. In particular, for Gaussian priors on w with suitably small variance, the resulting marginal density of the auxiliary variable is proven to be strictly log-concave for all dimensions d. For a uniform prior on the unit l1 ball, evidence is given that the density of the auxiliary random variable is again strictly log-concave for sufficiently large d. The score of the marginal density of the auxiliary random variable is determined by an expectation over the reverse conditional and thus can be computed by various rapidly mixing Markov Chain Monte Carlo methods. Moreover, access to this score permits sampling of the auxiliary random variable by a stochastic diffusion (Langevin dynamics) whose drift is built from the score. With such dynamics, information-theoretic methods pioneered by Bakry and Émery show that accurate sampling of the auxiliary random variable is obtained rapidly when its density is indeed strictly log-concave. One more draw from the reverse conditional then produces neuron weights w whose marginal distribution is the desired posterior.
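
As a rough illustration of the two-level scheme outlined in the abstract, the sketch below substitutes a toy Gaussian coupling (w with a standard Gaussian prior and auxiliary variable xi equal to w plus standard Gaussian noise) for the paper's actual coupling, which is not reproduced here. Under this assumed coupling the reverse conditional p(w | xi) is log-concave and the score of the marginal of xi equals E[w | xi] - xi (Tweedie's formula), so the score can be estimated by averaging draws from an inner MCMC sampler of the conditional; an outer Langevin diffusion on xi uses that estimated score as its drift, and a final conditional draw returns a weight vector w. All function names and the coupling itself are illustrative assumptions, not the authors' construction.

```python
import numpy as np

# Toy stand-in for the paper's coupling (an assumption for illustration only):
# prior w ~ N(0, I_d) and auxiliary variable xi = w + standard Gaussian noise.
# Then the reverse conditional p(w | xi) = N(xi / 2, I_d / 2) is log-concave,
# and by Tweedie's formula the score of the marginal of xi is E[w | xi] - xi.

def grad_log_p_w_given_xi(w, xi):
    """Gradient of log p(w | xi) for the toy coupling N(xi/2, I/2)."""
    return -2.0 * (w - xi / 2.0)

def sample_w_given_xi(xi, n_steps=100, step=5e-3, rng=None):
    """Unadjusted Langevin sampler for the log-concave reverse conditional."""
    rng = np.random.default_rng() if rng is None else rng
    w = xi / 2.0  # start near the conditional mode
    for _ in range(n_steps):
        noise = rng.standard_normal(w.shape)
        w = w + step * grad_log_p_w_given_xi(w, xi) + np.sqrt(2.0 * step) * noise
    return w

def estimate_score_xi(xi, n_inner=16, rng=None):
    """Score of the marginal of xi written as an expectation over p(w | xi).

    For the toy coupling this is E[w | xi] - xi; the expectation is
    approximated with draws from the inner MCMC sampler above.
    """
    rng = np.random.default_rng() if rng is None else rng
    draws = np.stack([sample_w_given_xi(xi, rng=rng) for _ in range(n_inner)])
    return draws.mean(axis=0) - xi

def sample_weights(d=10, n_outer=300, step=5e-2, seed=0):
    """Outer Langevin diffusion on xi driven by the estimated score,
    followed by one draw from the reverse conditional to obtain w."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(d)
    for _ in range(n_outer):
        noise = rng.standard_normal(d)
        xi = xi + step * estimate_score_xi(xi, rng=rng) + np.sqrt(2.0 * step) * noise
    return sample_w_given_xi(xi, rng=rng)

if __name__ == "__main__":
    w = sample_weights()
    print("sampled neuron weight vector:", w)
```

In this toy setting the inner conditional could of course be sampled in closed form; the nested Langevin structure is kept only to mirror the algorithmic pattern the abstract describes, in which the inner chain mixes rapidly because the reverse conditional is log-concave and the outer diffusion mixes rapidly when the marginal of the auxiliary variable is strictly log-concave.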