MLSP-P50.10

BOOSTING CONTEXTUAL ADAPTIVE POLICY LEARNING WITH FOUNDATION MODEL GUIDANCE

Yuanfei Wang, Hao Dong, Peking University, China

Session:
MLSP-P50: Reinforcement Learning Algorithms and Applications II Poster

Track:
Machine Learning for Signal Processing [ML]

Location:
Poster Area 10

Presentation Time:
Thu, 7 May, 09:00 - 11:00

Session MLSP-P50
MLSP-P50.1: GLUCOAPRL: AHEAD-PLANNING REINFORCEMENT LEARNING MECHANISM FOR SAFE BLOOD GLUCOSE REGULATION
Liangliang Liu, Yi Guan, Rujia Shen, Chaoran Kong, Guowei Zheng, Yanming Li, Jingchi Jiang, Harbin Institute of Technology, China; Yi Lin, Harbin Medical University, China
MLSP-P50.2: SYMBOLIC GOAL-GUIDED INTRINSIC CURRICULA FOR LONG-HORIZON REINFORCEMENT LEARNING
Tong Wu, Yi Wen, Lingfu Wang, Guangchun Luo, Dayong Zhu, University of Electronic Science and Technology of China, China
MLSP-P50.3: DYNAPREDICT: ALTERNATING PREDICTIVE AND REAL ITERATION FOR EFFICIENT DEEP REINFORCEMENT LEARNING TRAINING
Yiran Liu, Peng Qiao, Rongchun Li, Zhouyu He, Tao Sun, National University of Defense Technology, China
MLSP-P50.4: LONG CHAIN-OF-THOUGHT COMPRESSION VIA FINE-GRAINED GROUP POLICY OPTIMIZATION
Xinchen Han, Hossam Afifi, Michel Marot, Institut Polytechnique de Paris, France; Xilu Wang, Lu Yin, University of Surrey, United Kingdom of Great Britain and Northern Ireland
MLSP-P50.5: F2G-AMD: Feature-to-Graph Affinity with Large-Kernel Attention for AMD Grading using Fundus Images
Mohamed Elsharkawy, Ibrahim Abdelhalim, Moumen El-Melegy, Asem Ali, University of Louisville, United States of America; Mohammed Ghazal, Abu Dhabi University, United Arab Emirates; Ali Mahmoud, University of Louisville, United States of America; Ashraf Sewelam, Mansoura University, Egypt; Harpal Sandhu, Ayman El-Baz, University of Louisville, United States of America
MLSP-P50.6: DWC-PO: Dynamic Weight Constraints for Model-Based Policy Optimization via Wasserstein Policy Improvement Bounds
Yuetian Wang, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China; Dianxi Shi, Intelligent Game and Decision Lab (IGDL), China; Huanhuan Yang, Beijing Academy of Science and Technology, China
MLSP-P50.7: MDPO: MULTI-DIMENSIONAL LABEL ENHANCED DIRECT PREFERENCE OPTIMIZATION FOR EFFICIENT MULTIMODAL LLM FINE-TUNING
Hanwen Hu, Beijing University of Posts and Telecommunications, China; Ningyuan Guo, Tsinghua University, China; Xinghan Li, Shanghai Jiao Tong University, China
MLSP-P50.8: ADAPTIVE WORLD MODEL WITH LATENT GENERATION ALGORITHM FOR DEEP REINFORCEMENT LEARNING IN PORTFOLIO OPTIMIZATION
Fengchen Gu, Zhengyong Jiang, Xi’an Jiaotong-Liverpool University, China; Ángel F. García-Fernández, Universidad Politécnica de Madrid, Spain; Jionglong Su, Huakang Li, Xi’an Jiaotong-Liverpool University, China
MLSP-P50.9: ALLEVIATING OVERTHINKING IN LARGE REASONING MODELS VIA SELF-ITERATIVE PREFERENCE OPTIMIZATION
Shen Chen, Jin Wang, Xuejie Zhang, Yunnan University, China
MLSP-P50.10: BOOSTING CONTEXTUAL ADAPTIVE POLICY LEARNING WITH FOUNDATION MODEL GUIDANCE
Yuanfei Wang, Hao Dong, Peking University, China