TP2b.4: Reward Attack on Stochastic Bandits with Non-stationary Rewards
Chenye Yang, Guanlin Liu, Lifeng Lai, University of California, Davis, United States
TP2b.5: Multi-Agent Recurrent Deterministic Policy Gradient with Inter-Agent Communication
Joohyun Cho, Mingxi Liu, Yi Zhou, Rong-Rong Chen, University of Utah, United States