TH1.R3.2

Multi-armed Bandits with Probing

Eray Can Elumar, Carnegie Mellon University, United States; Cem Tekin, Bilkent University, Turkey; Osman Yagan, Carnegie Mellon University, United States

Session:
Multi-Armed Bandits 1

Track:
8: Bandits

Location:
Ypsilon IV-V-VI

Presentation Time:
Thu, 11 Jul, 10:05 - 10:25

Session Chair:
Ali Tajer, Rensselaer Polytechnic Institute

Abstract
We examine a multi-armed bandit problem with probing, where at each round the agent may pay a cost $c \geq 0$ to probe one arm and observe its reward before deciding which arm to pull. We identify the optimal strategy for deciding whether to probe or to pull an arm directly and, when an arm is probed, which arm to pull after observing the probe's outcome. Additionally, we introduce a novel regret definition based on the expected reward of the optimal action. We propose UCBP, a novel algorithm that employs this strategy and achieves a gap-independent regret upper bound of $\mathcal{O}(\sqrt{KT \log T})$ and an order-optimal gap-dependent upper bound of $\mathcal{O}(K \log T)$. As a baseline, we provide UCB-naive-probe, a naive UCB-based approach with a gap-independent regret upper bound of $\mathcal{O}(\sqrt{KT \log T})$ and a gap-dependent regret bound of $\mathcal{O}(K^2 \log T)$. We present empirical simulations verifying the utility of UCBP in practical settings and showing that it outperforms UCB-naive-probe.
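The probe-or-pull decision described in the abstract can be illustrated with a short simulation. The sketch below runs UCB1-style indices over Bernoulli arms and, each round, compares the index of the top arm against a rough estimate of the value of probing it first (paying $c$, observing the reward, and falling back to the best alternative if the probe disappoints). The index definition, the probe-value estimate, and the fallback rule are assumptions made for illustration, not the paper's exact UCBP algorithm.

```python
import math
import random


def simulate_ucbp(means, T, c, seed=0):
    """Run a UCB-with-probing loop on Bernoulli arms; return total reward.

    Illustrative sketch only: the probe/pull rule below is a plausible
    stand-in, not the UCBP algorithm from the paper.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K    # observations per arm (probes are observations too)
    sums = [0.0] * K    # sum of observed rewards per arm
    total = 0.0

    def draw(i):
        # Bernoulli reward for arm i.
        return 1.0 if rng.random() < means[i] else 0.0

    # Initialization: pull each arm once.
    for i in range(K):
        r = draw(i)
        counts[i] += 1
        sums[i] += r
        total += r

    for t in range(K + 1, T + 1):
        # UCB1-style indices.
        ucb = [sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
               for i in range(K)]
        best = max(range(K), key=lambda i: ucb[i])
        alt = max((i for i in range(K) if i != best), key=lambda i: ucb[i])

        mu_hat = sums[best] / counts[best]
        m = min(ucb[alt], 1.0)  # fallback value, clipped to the reward range
        # For a Bernoulli probe X with mean mu_hat and fallback value m,
        # E[max(X, m)] = mu_hat * 1 + (1 - mu_hat) * m, so probing is worth
        # roughly this expectation minus the probing cost c.
        probe_value = mu_hat + (1.0 - mu_hat) * m - c
        pull_value = min(ucb[best], 1.0)

        if probe_value > pull_value:
            x = draw(best)            # probe: observe this round's reward
            counts[best] += 1
            sums[best] += x
            total -= c                # pay the probing cost
            if x >= m:
                total += x            # keep the probed arm's observed reward
            else:
                r = draw(alt)         # fall back to the best alternative
                counts[alt] += 1
                sums[alt] += r
                total += r
        else:
            r = draw(best)            # pull the top-index arm without probing
            counts[best] += 1
            sums[best] += r
            total += r

    return total


if __name__ == "__main__":
    print(simulate_ucbp([0.3, 0.5, 0.7], T=10_000, c=0.05))
```

Lowering $c$ in this sketch makes probing more frequent, matching the intuition that cheap probes let the agent hedge pulls of uncertain arms before committing to them.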