TH2.R3.1

BAI in Exponential Family: Efficiency and Optimality

Arpan Mukherjee, Ali Tajer, Rensselaer Polytechnic Institute, United States

Session:
Multi-Armed Bandits 2

Track:
8: Bandits

Location:
Ypsilon IV-V-VI

Presentation Time:
Thu, 11 Jul, 11:30 - 11:50

Session Chair:
Vincent Tan, National University of Singapore

Abstract
This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two essential metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In the stochastic BAI literature, there have been advances in designing algorithms that achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency that either do not achieve optimal performance (e.g., UCB-based methods) or achieve it only up to a gap (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieve optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially estimating the optimal allocations up to sufficient fidelity. Specifically, these estimates are accurate enough to identify the best arm (hence, achieving optimality) but not more accurate than necessary (hence, maintaining efficiency). Numerical evaluations are provided to (i) establish the optimality and efficiency of the algorithm, (ii) showcase the implicit estimation property of the proposed allocation rules, and (iii) demonstrate the superior performance of the proposed algorithm compared to existing ones.
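
To make the ingredients the abstract alludes to concrete (an allocation estimated from running statistics, a sampling rule that tracks it, and a generalized-likelihood-ratio stopping rule), the following minimal sketch implements a generic fixed-confidence BAI loop for Gaussian rewards with unit variance. It is not the authors' algorithm: the allocation estimate here is a crude inverse-squared-gap heuristic standing in for the paper's efficient sequential estimator, and the function names, forced-exploration schedule, and stopping threshold are illustrative assumptions.

```python
# Hedged sketch of a fixed-confidence BAI loop (Gaussian arms, unit variance).
# NOT the paper's algorithm: the allocation estimate is a simple heuristic
# placeholder for a principled, sequentially refined estimate of the optimal
# allocation; the threshold is a standard conservative choice.
import numpy as np


def estimate_allocation(mu_hat):
    """Heuristic allocation: weight suboptimal arms by inverse squared gaps
    to the empirical best arm (placeholder for an optimal-allocation estimate)."""
    best = int(np.argmax(mu_hat))
    gaps = np.maximum(mu_hat[best] - mu_hat, 1e-3)
    w = 1.0 / gaps ** 2
    # Common Gaussian heuristic for the best arm's weight.
    w[best] = np.sqrt(np.sum(np.delete(w, best) ** 2))
    return w / w.sum()


def glr_statistic(mu_hat, counts):
    """Chernoff-style GLR statistic: minimum Gaussian transportation cost
    between the empirical best arm and each challenger."""
    best = int(np.argmax(mu_hat))
    costs = []
    for j in range(len(mu_hat)):
        if j == best:
            continue
        num = (mu_hat[best] - mu_hat[j]) ** 2
        den = 2.0 * (1.0 / counts[best] + 1.0 / counts[j])
        costs.append(num / den)
    return min(costs)


def bai_fixed_confidence(true_means, delta=0.05, seed=None, max_samples=200_000):
    rng = np.random.default_rng(seed)
    K = len(true_means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    # Initialization: pull every arm once.
    for a in range(K):
        sums[a] += rng.normal(true_means[a], 1.0)
        counts[a] += 1
    t = K
    while t < max_samples:
        mu_hat = sums / counts
        # Stopping rule: GLR exceeds a conservative delta-dependent threshold.
        if glr_statistic(mu_hat, counts) > np.log((1 + np.log(t)) / delta):
            return int(np.argmax(mu_hat)), t
        # Sampling rule: track the estimated allocation, with forced exploration
        # of severely undersampled arms.
        w = estimate_allocation(mu_hat)
        undersampled = np.where(counts < np.sqrt(t) - K / 2)[0]
        a = int(undersampled[0]) if len(undersampled) else int(np.argmax(t * w - counts))
        sums[a] += rng.normal(true_means[a], 1.0)
        counts[a] += 1
        t += 1
    return int(np.argmax(sums / counts)), t


if __name__ == "__main__":
    arm, n = bai_fixed_confidence([0.5, 0.4, 0.3, 0.2], delta=0.05, seed=0)
    print(f"declared best arm {arm} after {n} samples")
```

The design point this loop illustrates is that the per-round cost is dominated by how the allocation is obtained: tracking an inexpensive running estimate of the allocation avoids re-solving the full max-min allocation optimization at every round, which is the computational bottleneck the abstract contrasts against optimization-based methods.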