MO3.R9.1

Missing Mass under Random Duplications

Prafulla Chandra, Andrew Thangaraj, IIT Madras, India

Session:
Statistical Estimation and Detection

Track:
11: Information Theory and Statistics

Location:
Lamda

Presentation Time:
Mon, 8 Jul, 14:35 - 14:55

Session Chair:
Shirin Jalali, Rutgers
Abstract
Learning a distribution from error-prone or non-ideal sampling modeled as "duplication" or "sticky" channels has been of interest recently inspired by applications such as DNA computing. Missing mass, the total probability of missing letters, is an important quantity that plays a crucial role in distribution estimation, specifically in the large alphabet setting. In this work, we consider the problem of estimation of missing mass, which has been well-understood under independent and identically distributed (i.i.d) sampling, in the case when the samples involve duplications. Precisely, we consider the setting where each sample from an unknown distribution gets repeated a Bernoulli-distributed number of times creating a hidden Markov memory in the samples. We characterize the minimax rate of Mean Squared Error (MSE) of estimating missing mass from such elementary duplication sampling channels. An upper bound on the minimax rate is obtained by bounding the risk of a proposed estimator. We derive a matching lower bound by reduction to the i.i.d setting.
Resources