TH1.R1.2

Predicting Uncertainty of Generative LLMs with MARS: Meaning-Aware Response Scoring

Yavuz Faruk Bakman, Duygu Nur Yaldiz, Baturalp Buyukates, University of Southern California, United States; Chenyang Tao, Dimitrios Dimitriadis, Amazon, United States; Salman Avestimehr, University of Southern California, United States

Session:
Language Models

Track:
8: Machine Learning

Location:
Ballroom II & III

Presentation Time:
Thu, 11 Jul, 10:05 - 10:25

Session Chair:
Homa Esfahanizadeh

Abstract
Generative Large Language Models (LLMs) have recently been widely utilized for their unprecedented capabilities across many tasks. Given their use in high-stakes environments and mission-critical applications, the fact that LLMs can generate inaccurate or misleading results is potentially harmful, which motivates us to study the correctness of generative LLM outputs. Uncertainty Estimation (UE) in generative LLMs is a developing area, with state-of-the-art probability-based techniques frequently using length-normalized scoring. As an alternative to length-normalized scoring in UE, in this work, we propose Meaning-Aware Response Scoring (MARS). The key idea of MARS is to consider the semantic contribution of each token of the generated sequence to the context of the question during UE. Through extensive experiments on three question-answering datasets across five pre-trained LLMs, we show that utilizing MARS during UE results in a universal and significant improvement in UE performance.
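The contrast the abstract draws can be sketched numerically: length-normalized scoring averages token log-probabilities uniformly, while a MARS-style score reweights them by each token's semantic contribution to the answer. The sketch below is illustrative only; the importance weights are hypothetical placeholders, not the weighting scheme defined in the paper (which derives token importance from the question context).

```python
import math

def length_normalized_score(token_logprobs):
    # Standard length-normalized confidence:
    # geometric mean of the token probabilities.
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def meaning_aware_score(token_logprobs, importance):
    # MARS-style sketch: aggregate token log-probabilities weighted by
    # each token's (normalized) semantic importance to the answer.
    total = sum(importance)
    weights = [w / total for w in importance]
    return math.exp(sum(w * lp for w, lp in zip(weights, token_logprobs)))

# Hypothetical answer: one semantically critical token ("Paris" in
# "The capital is Paris .") happens to be the least confident one.
logprobs = [-0.1, -0.05, -2.0, -0.2]   # log p of each generated token
importance = [0.1, 0.1, 1.0, 0.1]      # assumed importance weights (illustrative)

ln_score = length_normalized_score(logprobs)
mars_score = meaning_aware_score(logprobs, importance)
# The uncertain, high-importance token drags the meaning-aware score
# down further than uniform averaging does.
assert mars_score < ln_score
```

Under this toy weighting, the filler tokens' high confidence can no longer mask the model's uncertainty on the token that actually answers the question, which is the intuition behind replacing uniform length normalization with meaning-aware weights.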