TH2.R3.3

Optimal Stopping Rules for Best Arm Identification in Stochastic Bandits under Uniform Sampling

Vedang Gupta, Yash Gadhia, Shivaram Kalyanakrishnan, Nikhil Karamchandani, Indian Institute of Technology Bombay, India

Session:
Multi-Armed Bandits 2

Track:
8: Bandits

Location:
Ypsilon IV-V-VI

Presentation Time:
Thu, 11 Jul, 12:10 - 12:30

Session Chair:
Vincent Tan, National University of Singapore
Abstract
We consider the problem of best arm identification in stochastic multi-armed bandits, in the setting where each arm is sampled once in each round. This \textit{uniform sampling} regime is conceptually simple and relevant to many practical applications. The aim is to stop and correctly identify the best arm with probability at least $1 - \delta$, while keeping the number of rounds low. We derive a lower bound on the sample complexity for this setting. Thereafter, we propose two natural stopping rules for Bernoulli bandits: one based on PPR martingale confidence sequences, and the other based on the GLR statistic. Both rules are shown to match the lower bound as $\delta \to 0$. Our analysis and experiments suggest that the relative performance of the two stopping rules depends on a property of the bandit instance.
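To make the GLR-based rule concrete, the sketch below checks a generalized-likelihood-ratio stopping condition for Bernoulli arms after $t$ uniform rounds. This is an illustrative implementation, not the paper's exact rule: the threshold $\log\big((1+\log t)\,K/\delta\big)$ is an assumed calibration chosen only for demonstration, and the function names (`kl_bern`, `glr_stop`) are hypothetical.

```python
import math

def kl_bern(p, q):
    """Bernoulli KL divergence KL(p || q), clamped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def glr_stop(means, t, delta):
    """GLR stopping check after t uniform rounds.

    means: empirical means of the K arms (each arm pulled t times).
    Returns (stop, best_arm_index).
    The threshold below is an assumed, illustrative calibration,
    not the calibration analyzed in the paper.
    """
    K = len(means)
    best = max(range(K), key=lambda i: means[i])
    threshold = math.log((1 + math.log(t)) * K / delta)
    glr_min = float("inf")
    for b in range(K):
        if b == best:
            continue
        # Under uniform sampling both arms have exactly t samples,
        # so the maximum-likelihood common mean is the plain average.
        m = (means[best] + means[b]) / 2
        glr = t * (kl_bern(means[best], m) + kl_bern(means[b], m))
        glr_min = min(glr_min, glr)
    return glr_min > threshold, best
```

Stopping occurs once even the hardest-to-distinguish challenger arm is ruled out at the confidence level implied by the threshold; well-separated arms trigger the rule quickly, while close means require more rounds.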