TH2.R3.2

Best Arm Identification with Arm Erasures

Srinivas Reddy Kota, Indian Institute of Technology Madras, India; Karthik Periyapattana Narayana Prasad, Indian Institute of Technology Hyderabad, India; Vincent Y. F. Tan, National University of Singapore, Singapore

Session:
Multi-Armed Bandits 2

Track:
8: Bandits

Location:
Ypsilon IV-V-VI

Presentation Time:
Thu, 11 Jul, 11:50 - 12:10

Session Chair:
Vincent Tan, National University of Singapore

Abstract
In this paper, we address the problem of best arm identification (BAI) with arm erasures in a multi-armed bandit setting with finitely many arms. A {\em learner} who seeks to identify the best arm---the arm with the largest mean reward---samples arms sequentially, one at each time instant, and communicates the sampled arm to an {\em agent} through an erasure channel with a known erasure probability $\varepsilon \in (0,1)$. The learner {\em does not} receive any erasure feedback, and hence does not know whether the transmitted arm was erased by the channel. When no erasure occurs and the transmitted arm is successfully received, the agent promptly pulls the received arm. When an erasure occurs, we analyse two distinct scenarios: (a) the agent randomly selects an arm, and (b) the agent selects the most recent successfully received arm. We assume that the instantaneous reward from the pulled arm is available to the learner, whose objective is to identify the best arm as quickly as possible, subject to an upper bound on the error probability. Given $\delta \in (0,1)$, we derive a problem-dependent lower bound on the expected stopping time of any algorithm whose error probability is at most $\delta$. We also propose two successive elimination-based algorithms, one for each of the aforementioned scenarios (a) and (b), and provide upper bounds on their stopping times that hold with probability at least $1-\delta$. To the best of our knowledge, this is the first work on BAI with arm erasures.
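
For a concrete picture of the interaction model, the following is a minimal simulation sketch in Python. It is not the algorithm proposed in the paper: the uniform-random choice in scenario (a), the Hoeffding-style confidence radius, and all helper names (agent_pull, run_elimination) are illustrative assumptions only.

```python
# Minimal simulation sketch of the arm-erasure interaction model (illustrative
# only; NOT the paper's algorithm). The uniform-random choice in scenario (a),
# the confidence radius, and all helper names are assumptions for illustration.
import math
import random


def agent_pull(requested_arm, last_received, eps, n_arms, scenario):
    """Transmit `requested_arm` over an erasure channel with erasure prob `eps`.

    On a successful transmission the agent pulls the requested arm. On an
    erasure (which the learner never observes), the agent pulls
      (a) a randomly selected arm (uniform here, as an assumption), or
      (b) the most recent successfully received arm.
    Returns (pulled_arm, updated_last_received).
    """
    if random.random() >= eps:                 # no erasure: arm received intact
        return requested_arm, requested_arm
    if scenario == "a":                        # (a): agent picks an arm at random
        return random.randrange(n_arms), last_received
    return last_received, last_received        # (b): repeat last received arm


def run_elimination(means, eps, delta, scenario="b", seed=0):
    """Plain successive elimination run on top of the erasure channel.

    `means` are the true Bernoulli means, unknown to the learner. The learner
    observes the reward of the *pulled* arm but, lacking erasure feedback,
    attributes it to the arm it requested.
    """
    random.seed(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    active = list(range(n_arms))
    last_received = 0                          # assume the agent starts at arm 0
    total_pulls = 0
    while len(active) > 1:
        for arm in list(active):               # request one pull of each active arm
            pulled, last_received = agent_pull(arm, last_received, eps, n_arms, scenario)
            reward = 1.0 if random.random() < means[pulled] else 0.0
            counts[arm] += 1                   # attribution to the *requested* arm
            sums[arm] += reward
            total_pulls += 1
        est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]
        rad = [math.sqrt(math.log(4.0 * n_arms * max(counts[a], 1) ** 2 / delta)
                         / (2.0 * max(counts[a], 1))) for a in range(n_arms)]
        best = max(active, key=lambda a: est[a])
        active = [a for a in active
                  if est[a] + rad[a] >= est[best] - rad[best]]
    return active[0], total_pulls


if __name__ == "__main__":
    arm, pulls = run_elimination([0.4, 0.5, 0.7], eps=0.2, delta=0.1, scenario="b")
    print(f"declared arm {arm} best after {pulls} requested pulls")
```

The key modelling point the sketch reproduces is that the learner attributes every observed reward to the arm it requested, even when an unobserved erasure caused a different arm to be pulled; handling that mismatch is precisely what the paper's algorithms and bounds address.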