MO3.R2.2

Sharp information-theoretic thresholds for shuffled linear regression

Leon Lufkin, Yihong Wu, Yale University, United States; Jiaming Xu, Duke University, United States

Session:
Classification and Regression

Track:
8: Machine Learning

Location:
Ypsilon I-II-III

Presentation Time:
Mon, 8 Jul, 14:55 - 15:15

Session Chair:
Adam Krzyzak, Concordia University
Abstract
This paper studies the problem of shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we consider the model $y = \Pi_* X \beta_* + w$, where $X$ is an $n \times d$ standard Gaussian design matrix, $w$ is Gaussian noise with entrywise variance $\sigma^2$, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $\beta_*$ is the regression coefficient, also unknown. Previous work has shown that, in the large $n$-limit, the minimal signal-to-noise ratio ($\mathsf{SNR}$), $\norm{\beta_*}^2/\sigma^2$, for recovering the unknown permutation exactly with high probability is between $n^2$ and $n^C$ for some absolute constant $C$ and the sharp threshold is unknown even for $d=1$. We show that this threshold is precisely $\mathsf{SNR}= n^4$ for exact recovery throughout the sublinear regime $d=o(n)$. As a by-product of our analysis, we also determine the sharp threshold of almost exact recovery to be $\mathsf{SNR}= n^2$, where all but a vanishing fraction of the permutation is reconstructed.
Resources