TH2.R3.3

Optimal Stopping Rules for Best Arm Identification in Stochastic Bandits under Uniform Sampling

Vedang Gupta, Yash Gadhia, Shivaram Kalyanakrishnan, Nikhil Karamchandani, Indian Institute of Technology Bombay, India

Session:
Multi-Armed Bandits 2

Track:
8: Bandits

Location:
Ypsilon IV-V-VI

Presentation Time:
Thu, 11 Jul, 12:10 - 12:30

Session Chair:
Vincent Tan, National University of Singapore
Abstract
We consider the problem of best arm identification in stochastic multi-armed bandits, in the setting where each arm is sampled once in each round. This \textit{uniform sampling} regime is conceptually simple and relevant to many practical applications. The aim is to stop and correctly identify the best arm with probability at least $1 - \delta$, while keeping the number of rounds low. We derive a lower bound on the sample complexity for this setting. Thereafter, we propose two natural stopping rules for Bernoulli bandits: one based on PPR martingale confidence sequences, and the other based on the GLR statistic. Both rules are shown to match the lower bound as $\delta \to 0$. Our analysis and experiments suggest that the relative performance of the two stopping rules depends on a property of the bandit instance.
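To make the GLR-based rule concrete, the sketch below checks a generalized-likelihood-ratio stopping condition for Bernoulli arms after $t$ uniform rounds. This is an illustrative implementation, not the paper's exact rule: the threshold $\log\big((1+\log t)\,K/\delta\big)$ is an assumed calibration chosen only for demonstration, and the function names (`kl_bern`, `glr_stop`) are hypothetical.

```python
import math

def kl_bern(p, q):
    """Bernoulli KL divergence KL(p || q), clamped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def glr_stop(means, t, delta):
    """GLR stopping check after t uniform rounds.

    means: empirical means of the K arms (each arm pulled t times).
    Returns (stop, best_arm_index).
    The threshold below is an assumed, illustrative calibration,
    not the calibration analyzed in the paper.
    """
    K = len(means)
    best = max(range(K), key=lambda i: means[i])
    threshold = math.log((1 + math.log(t)) * K / delta)
    glr_min = float("inf")
    for b in range(K):
        if b == best:
            continue
        # Under uniform sampling both arms have exactly t samples,
        # so the maximum-likelihood common mean is the plain average.
        m = (means[best] + means[b]) / 2
        glr = t * (kl_bern(means[best], m) + kl_bern(means[b], m))
        glr_min = min(glr_min, glr)
    return glr_min > threshold, best
```

Stopping occurs once even the hardest-to-distinguish challenger arm is ruled out at the confidence level implied by the threshold; well-separated arms trigger the rule quickly, while close means require more rounds.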