TH2.R3.1

BAI in Exponential Family: Efficiency and Optimality

Arpan Mukherjee, Ali Tajer, Rensselaer Polytechnic Institute, United States

Session:
Multi-Armed Bandits 2

Track:
8: Bandits

Location:
Ypsilon IV-V-VI

Presentation Time:
Thu, 11 Jul, 11:30 - 11:50

Session Chair:
Vincent Tan, National University of Singapore

Abstract
This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two essential metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In the stochastic BAI literature, there have been advances in designing algorithms that achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency that either do not achieve optimal performance (e.g., UCB-based methods) or achieve it only up to a gap (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieve optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially estimating the optimal allocations up to sufficient fidelity. Specifically, these estimates are accurate enough to identify the best arm (hence, achieving optimality) but not more accurate than necessary (hence, maintaining efficiency). Numerical evaluations are provided to (i) establish the optimality and efficiency of the algorithm, (ii) showcase the implicit estimation property of the proposed allocation rules, and (iii) demonstrate the superior performance of the proposed algorithm compared to existing ones.
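
To make the ingredients the abstract alludes to concrete (an allocation estimated from running statistics, a sampling rule that tracks it, and a generalized-likelihood-ratio stopping rule), the following minimal sketch implements a generic fixed-confidence BAI loop for Gaussian rewards with unit variance. It is not the authors' algorithm: the allocation estimate here is a crude inverse-squared-gap heuristic standing in for the paper's efficient sequential estimator, and the function names, forced-exploration schedule, and stopping threshold are illustrative assumptions.

```python
# Hedged sketch of a fixed-confidence BAI loop (Gaussian arms, unit variance).
# NOT the paper's algorithm: the allocation estimate is a simple heuristic
# placeholder for a principled, sequentially refined estimate of the optimal
# allocation; the threshold is a standard conservative choice.
import numpy as np


def estimate_allocation(mu_hat):
    """Heuristic allocation: weight suboptimal arms by inverse squared gaps
    to the empirical best arm (placeholder for an optimal-allocation estimate)."""
    best = int(np.argmax(mu_hat))
    gaps = np.maximum(mu_hat[best] - mu_hat, 1e-3)
    w = 1.0 / gaps ** 2
    # Common Gaussian heuristic for the best arm's weight.
    w[best] = np.sqrt(np.sum(np.delete(w, best) ** 2))
    return w / w.sum()


def glr_statistic(mu_hat, counts):
    """Chernoff-style GLR statistic: minimum Gaussian transportation cost
    between the empirical best arm and each challenger."""
    best = int(np.argmax(mu_hat))
    costs = []
    for j in range(len(mu_hat)):
        if j == best:
            continue
        num = (mu_hat[best] - mu_hat[j]) ** 2
        den = 2.0 * (1.0 / counts[best] + 1.0 / counts[j])
        costs.append(num / den)
    return min(costs)


def bai_fixed_confidence(true_means, delta=0.05, seed=None, max_samples=200_000):
    rng = np.random.default_rng(seed)
    K = len(true_means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    # Initialization: pull every arm once.
    for a in range(K):
        sums[a] += rng.normal(true_means[a], 1.0)
        counts[a] += 1
    t = K
    while t < max_samples:
        mu_hat = sums / counts
        # Stopping rule: GLR exceeds a conservative delta-dependent threshold.
        if glr_statistic(mu_hat, counts) > np.log((1 + np.log(t)) / delta):
            return int(np.argmax(mu_hat)), t
        # Sampling rule: track the estimated allocation, with forced exploration
        # of severely undersampled arms.
        w = estimate_allocation(mu_hat)
        undersampled = np.where(counts < np.sqrt(t) - K / 2)[0]
        a = int(undersampled[0]) if len(undersampled) else int(np.argmax(t * w - counts))
        sums[a] += rng.normal(true_means[a], 1.0)
        counts[a] += 1
        t += 1
    return int(np.argmax(sums / counts)), t


if __name__ == "__main__":
    arm, n = bai_fixed_confidence([0.5, 0.4, 0.3, 0.2], delta=0.05, seed=0)
    print(f"declared best arm {arm} after {n} samples")
```

The design point this loop illustrates is that the per-round cost is dominated by how the allocation is obtained: tracking an inexpensive running estimate of the allocation avoids re-solving the full max-min allocation optimization at every round, which is the computational bottleneck the abstract contrasts against optimization-based methods.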