S. Roy

Blog Post·2026-06-20·10 min read

The Bradley–Terry Model: From Elo Scores to Reward Models

Chatbot Arena ranks LLMs with Elo ratings, InstructGPT trains a reward model on pairwise preferences, and chess has rated players for seventy years. All three rest on the same one-line probabilistic model, Bradley–Terry, which turns out to be logistic regression over comparisons.

preference-learning rlhf reward-models elo ranking
