Elo Rating

The Elo rating system, originally developed for chess, measures relative performance through pairwise comparisons. In DetectArena, each AI detection tool has an Elo rating that increases when it wins a blind comparison and decreases when it loses. Higher-rated tools that lose to lower-rated tools experience larger rating drops, making the system self-correcting over time.

How the Elo System Works

The Elo system was developed by physicist Arpad Elo in the 1960s to rate chess players. It works by comparing two competitors head-to-head and adjusting their ratings based on the outcome relative to the expected result.

The key formula: when two tools with ratings $R_A$ and $R_B$ compete, the expected win probability for tool A is $E_A = 1 / (1 + 10^{(R_B - R_A)/400})$. After the comparison, the rating update is $R'_A = R_A + K(S_A - E_A)$, where $S_A$ is 1 for a win, 0.5 for a tie, and 0 for a loss, and $K$ is the adjustment factor.
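The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not DetectArena's actual implementation: the function names, the default K of 32, and the symmetric update applied to tool B are assumptions made for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score E_A for tool A against tool B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 32.0) -> tuple[float, float]:
    """Return both new ratings after one comparison.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses.
    k=32 is an assumed value chosen only for illustration.
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = 1.0 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    # Assumption: B receives the mirror-image update with the same K.
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b


# An upset: the 1600-rated tool loses to the 1500-rated tool.
favorite, underdog = update_ratings(1600, 1500, score_a=0.0)
print(round(favorite, 2), round(underdog, 2))
```

With these assumed values, the favorite loses about 20 points in the upset, whereas a 1500-rated tool losing to a 1600-rated one would lose only about 12. That asymmetry is what makes the system self-correcting.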

The K-factor controls how quickly ratings change. DetectArena uses higher K-factors for new tools (which have fewer data points and need to reach their true rating faster) and lower K-factors for established tools (which have more stable, reliable ratings).
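One simple way to realize such a schedule is a step function on the number of comparisons a tool has completed. The sketch below is hypothetical: the threshold of 30 comparisons and the K values of 40 and 16 are illustrative assumptions, not DetectArena's published settings.

```python
def k_factor(num_comparisons: int,
             provisional_limit: int = 30,
             k_new: float = 40.0,
             k_established: float = 16.0) -> float:
    """Pick a K-factor from a tool's comparison count.

    All three defaults are assumed values for illustration only.
    """
    return k_new if num_comparisons < provisional_limit else k_established


def rating_change(rating: float, opponent: float, score: float,
                  num_comparisons: int) -> float:
    """Rating delta for one result, using the tool's own K-factor."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k_factor(num_comparisons) * (score - expected)


# The same win against an equal opponent moves a new tool further
# than an established one.
print(rating_change(1500, 1500, 1.0, num_comparisons=5))
print(rating_change(1500, 1500, 1.0, num_comparisons=200))
```

Under these assumptions a new tool gains 20 points for beating an equally rated opponent, while an established tool gains 8.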

Why DetectArena Uses Elo

The Elo system is well-suited for DetectArena because:

Interpreting Elo Ratings

Elo ratings are relative, not absolute. A tool with a rating of 1600 is not "good" in an absolute sense; it is stronger than tools with lower ratings in the same pool. Only the difference between two ratings carries meaning: a 100-point difference corresponds to roughly a 64% expected win rate for the higher-rated tool, since 1 / (1 + 10^(-100/400)) ≈ 0.64.
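To get a feel for what other rating gaps mean, the expected-score formula can be evaluated for a few differences. This is a small sketch; the function name `win_probability` and the chosen gaps are illustrative.

```python
def win_probability(rating_gap: float) -> float:
    """Expected win rate of the higher-rated tool for a given rating gap."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# Print the expected win rate for a handful of rating gaps.
for gap in (0, 50, 100, 200, 400):
    print(f"{gap:>3} points -> {win_probability(gap):.1%}")
```

A gap of 0 gives 50%, 200 points gives about 76%, and 400 points gives about 91%. Because only the gap enters the formula, adding the same constant to every rating in the pool would leave all predictions unchanged.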