TH2.R3.2

Best Arm Identification with Arm Erasures

Srinivas Reddy Kota, Indian Institute of Technology Madras, India; Karthik Periyapattana Narayana Prasad, Indian Institute of Technology Hyderabad, India; Vincent Y. F. Tan, National University of Singapore, Singapore

Session:
Multi-Armed Bandits 2

Track:
8: Bandits

Location:
Ypsilon IV-V-VI

Presentation Time:
Thu, 11 Jul, 11:50 - 12:10

Session Chair:
Vincent Tan, National University of Singapore

Abstract
In this paper, we address the problem of best arm identification (BAI) with arm erasures in a multi-armed bandit setting with finitely many arms. A {\em learner} who seeks to identify the best arm---the arm with the largest mean reward---samples arms sequentially, one at each time instant, and communicates the sampled arm to an {\em agent} through an erasure channel with a known erasure probability $\varepsilon \in (0,1)$. The learner {\em does not} receive any erasure feedback, and hence does not know whether the transmitted arm was erased by the channel. When no erasure occurs and the transmitted arm is successfully received, the agent promptly pulls the received arm. When an erasure occurs, we analyse two distinct scenarios: (a) the agent randomly selects an arm, and (b) the agent selects the most recent successfully received arm. We assume that the instantaneous reward from the pulled arm is available to the learner, whose objective is to identify the best arm as quickly as possible, subject to an upper bound on the error probability. Given $\delta \in (0,1)$, we derive a problem-dependent lower bound on the expected stopping time of any algorithm whose error probability is at most $\delta$. We also propose two successive elimination-based algorithms, one for each of the aforementioned scenarios (a) and (b), and provide upper bounds on their stopping times that hold with probability at least $1-\delta$. To the best of our knowledge, this is the first work on BAI with arm erasures.
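
For a concrete picture of the interaction model, the following is a minimal simulation sketch in Python. It is not the algorithm proposed in the paper: the uniform-random choice in scenario (a), the Hoeffding-style confidence radius, and all helper names (agent_pull, run_elimination) are illustrative assumptions only.

```python
# Minimal simulation sketch of the arm-erasure interaction model (illustrative
# only; NOT the paper's algorithm). The uniform-random choice in scenario (a),
# the confidence radius, and all helper names are assumptions for illustration.
import math
import random


def agent_pull(requested_arm, last_received, eps, n_arms, scenario):
    """Transmit `requested_arm` over an erasure channel with erasure prob `eps`.

    On a successful transmission the agent pulls the requested arm. On an
    erasure (which the learner never observes), the agent pulls
      (a) a randomly selected arm (uniform here, as an assumption), or
      (b) the most recent successfully received arm.
    Returns (pulled_arm, updated_last_received).
    """
    if random.random() >= eps:                 # no erasure: arm received intact
        return requested_arm, requested_arm
    if scenario == "a":                        # (a): agent picks an arm at random
        return random.randrange(n_arms), last_received
    return last_received, last_received        # (b): repeat last received arm


def run_elimination(means, eps, delta, scenario="b", seed=0):
    """Plain successive elimination run on top of the erasure channel.

    `means` are the true Bernoulli means, unknown to the learner. The learner
    observes the reward of the *pulled* arm but, lacking erasure feedback,
    attributes it to the arm it requested.
    """
    random.seed(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    active = list(range(n_arms))
    last_received = 0                          # assume the agent starts at arm 0
    total_pulls = 0
    while len(active) > 1:
        for arm in list(active):               # request one pull of each active arm
            pulled, last_received = agent_pull(arm, last_received, eps, n_arms, scenario)
            reward = 1.0 if random.random() < means[pulled] else 0.0
            counts[arm] += 1                   # attribution to the *requested* arm
            sums[arm] += reward
            total_pulls += 1
        est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]
        rad = [math.sqrt(math.log(4.0 * n_arms * max(counts[a], 1) ** 2 / delta)
                         / (2.0 * max(counts[a], 1))) for a in range(n_arms)]
        best = max(active, key=lambda a: est[a])
        active = [a for a in active
                  if est[a] + rad[a] >= est[best] - rad[best]]
    return active[0], total_pulls


if __name__ == "__main__":
    arm, pulls = run_elimination([0.4, 0.5, 0.7], eps=0.2, delta=0.1, scenario="b")
    print(f"declared arm {arm} best after {pulls} requested pulls")
```

The key modelling point the sketch reproduces is that the learner attributes every observed reward to the arm it requested, even when an unobserved erasure caused a different arm to be pulled; handling that mismatch is precisely what the paper's algorithms and bounds address.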