TH1.R1.1

Asymptotics of Language Model Alignment

Joy Yang, University of Sydney, Australia; Salman Salamatian, MIT, United States; Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami, Google, Australia

Session:
Language Models

Track:
8: Machine Learning

Location:
Ballroom II & III

Presentation Time:
Thu, 11 Jul, 09:45 - 10:05

Session Chair:
Homa Esfahanizadeh

Abstract
Let $p$ denote a reference generative language model. Let $r$ denote a reward model that returns a scalar capturing the degree to which a draw from $p$ is preferred. The goal of {\em language model alignment} is to alter $p$ to a new distribution $\phi$ that results in a higher expected reward while keeping $\phi$ close to $p$. A popular alignment method is {\em KL-constrained reinforcement learning (RL)}, which chooses a distribution $\phi_\Delta$ that maximizes $E_{y \sim \phi_\Delta}[r(y)]$ subject to a relative entropy constraint $D_{\text{KL}}(\phi_\Delta \| p) \leq \Delta$. Another simple alignment method is {\em best-of-$N$}, where $N$ samples are drawn from $p$ and the one with the highest reward is selected. In this paper, we offer a closed-form characterization of the optimal KL-constrained RL solution. We then demonstrate that any alignment method achieving a comparable trade-off between KL divergence and expected reward must approximate the optimal KL-constrained RL solution in terms of relative entropy. To analyze the properties of alignment methods, we introduce two simplifying assumptions: the language model is memoryless, and the reward model is linear. Although these assumptions may not reflect complex real-world scenarios, they enable a precise characterization of the asymptotic (in the sequence length) behavior of both the best-of-$N$ and the KL-constrained RL methods in terms of information-theoretic quantities.
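
For readers who want to experiment with the two alignment methods compared in the abstract, below is a minimal, self-contained sketch (not taken from the paper's code). It implements best-of-$N$ sampling and the exponentially tilted distribution $\phi_\beta(y) \propto p(y)\exp(r(y)/\beta)$, the standard closed-form solution of KL-regularized reward maximization. The memoryless reference model and linear reward mirror the abstract's simplifying assumptions; all names and numbers here (TOKEN_PROBS, TOKEN_REWARD, SEQ_LEN, beta) are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Memoryless (i.i.d.-token) reference model p and a linear reward r,
# mirroring the abstract's simplifying assumptions. Values are illustrative.
TOKEN_PROBS = np.array([0.4, 0.3, 0.2, 0.1])   # p over a 4-token vocabulary
TOKEN_REWARD = np.array([0.0, 1.0, 2.0, 3.0])  # per-token scores
SEQ_LEN = 5

def sample_sequence():
    """Draw one length-SEQ_LEN sequence i.i.d. from the reference model p."""
    return rng.choice(len(TOKEN_PROBS), size=SEQ_LEN, p=TOKEN_PROBS)

def reward(seq):
    """Linear reward: sum of per-token scores."""
    return TOKEN_REWARD[seq].sum()

def best_of_n(n):
    """Best-of-N: draw n samples from p and keep the highest-reward one."""
    return max((sample_sequence() for _ in range(n)), key=reward)

# Monte Carlo estimate of the expected reward achieved by best-of-N.
N = 8
bon_avg = np.mean([reward(best_of_n(N)) for _ in range(2000)])
print(f"best-of-{N}: estimated expected reward = {bon_avg:.3f}")

# Exponentially tilted distribution phi_beta(y) proportional to p(y) exp(r(y)/beta).
# Because p is memoryless and r is linear, the tilt factorizes per token,
# so expected reward and KL(phi_beta || p) can be computed exactly.
beta = 1.0
tilted = TOKEN_PROBS * np.exp(TOKEN_REWARD / beta)
tilted /= tilted.sum()
exp_reward = SEQ_LEN * float(tilted @ TOKEN_REWARD)
kl = SEQ_LEN * float((tilted * np.log(tilted / TOKEN_PROBS)).sum())
print(f"tilted (beta={beta}): expected reward = {exp_reward:.3f}, KL = {kl:.3f} nats")
```

Because $p$ is memoryless and $r$ is linear, the tilt factorizes across token positions, which is why the sketch evaluates the tilted distribution's expected reward and $D_{\text{KL}}(\phi_\beta \| p)$ exactly while best-of-$N$ is estimated by Monte Carlo; sweeping $N$ and $\beta$ traces out the reward-versus-KL trade-off that the paper analyzes asymptotically in the sequence length.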