SLP-L17.6
TESTAGENT: AUTOMATIC BENCHMARKING AND EXPLORATORY INTERACTION FOR EVALUATING LLMS IN VERTICAL DOMAINS
Wanying Wang, Zeyu Ma, Shanghai Development Center of Computer Software Technology, China; Xuhong Wang, Shanghai Artificial Intelligence Lab, China; Yangchun Zhang, Shanghai University, China; Pengfei Liu, Shanghai Jiao Tong University, China; Mingang Chen, Shanghai Development Center of Computer Software Technology, China
Session: SLP-L17: Question Answering: Reasoning and Evaluation (Oral)
Track: Speech and Language Processing [SL]
Location: Room 115
Presentation Time: Thu, 7 May, 18:10 - 18:30
Session SLP-L17
SLP-L17.1: DIAGNOSE-REFLECTIVE PLANNING: FAITHFUL KG REASONING VIA LLM-GUIDED MCTS WITH STRATEGIC SELF-CORRECTION
Mingyu Zhao, University of Electronic Science and Technology of China, China; Tianxi Huang, Chengdu Textile College, China; Guiduo Duan, Songmin Wang, University of Electronic Science and Technology of China, China
SLP-L17.2: Enhancing Knowledge Base Question Answering with Reinforced Hop-wise Logical Form Generation
Jinghua Tang, Jian Cao, Jianqi Gao, Ranran Bu, Shanghai Jiao Tong University, China
SLP-L17.3: BIOMED-R2: JOINT DIVERSITY RETRIEVAL AND EVIDENCE REASONING FOR BIOMEDICAL QUESTION ANSWERING
Hongjiao Guan, Chuanlong Li, Weiyu Zhang, Qilu University of Technology (Shandong Academy of Sciences), China; Ying Lian, The First Affiliated Hospital of Shandong First Medical University, China; Jianbin Guo, Beijing Wenge Technology Co., Ltd., China; Wenpeng Lu, Qilu University of Technology (Shandong Academy of Sciences), China
SLP-L17.4: DIVERSITY IS ALL YOU NEED: SELF-SUPERVISED HYPERGRAPH LEARNING FOR MITIGATING POPULARITY BIAS IN CONVERSATIONAL RECOMMENDER SYSTEM
Yongsen Zheng, Nanyang Technological University, Singapore; Ruilin Xu, Ye Ma, Sun Yat-sen University, China; Guohua Wang, South China Agricultural University, China; Liang Lin, Sun Yat-sen University, China; Kwok-Yan Lam, Nanyang Technological University, Singapore
SLP-L17.5: BENCHMARKING HUMANS AND MACHINES ON COMPLEX MULTILINGUAL SPEECH UNDERSTANDING TASKS
Sai Samrat Kankanala, Ram Chandra, Sriram Ganapathy, Indian Institute of Science, India
SLP-L17.6: TESTAGENT: AUTOMATIC BENCHMARKING AND EXPLORATORY INTERACTION FOR EVALUATING LLMS IN VERTICAL DOMAINS
Wanying Wang, Zeyu Ma, Shanghai Development Center of Computer Software Technology, China; Xuhong Wang, Shanghai Artificial Intelligence Lab, China; Yangchun Zhang, Shanghai University, China; Pengfei Liu, Shanghai Jiao Tong University, China; Mingang Chen, Shanghai Development Center of Computer Software Technology, China