AI Detector Leaderboard

Ranking based on 12 community battles

Detectors: 11
Total Battles: 12
Top Elo: 1058

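Rank  Detector        Elo Score  Accuracy  W / L / T  Battles
🥇    [name missing]  1058       100%      4 / 0 / 4  8
🥈    [name missing]  1016       100%      1 / 0 / 0  1
🥉    [name missing]  1000       0%        0 / 0 / 0  0
4     [name missing]  1000       0%        0 / 0 / 0  0
5     [name missing]  1000       0%        0 / 0 / 0  0
6     [name missing]  1000       0%        0 / 0 / 0  0
7     [name missing]  1000       0%        0 / 0 / 0  0
8     [name missing]  998        80%       1 / 1 / 3  5
9     [name missing]  984        33%       1 / 2 / 0  3
10    [name missing]  972        50%       0 / 2 / 2  4
11    [name missing]  972        33%       0 / 2 / 1  3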

How the Elo Leaderboard Works

The Arena

In the Arena, users are shown an image, either AI-generated or a real photograph, along with the verdicts from two randomly selected detectors. The user votes for whichever detector gave the more accurate answer, or calls a tie if neither did better. This single vote becomes a "battle" that updates both detectors' Elo ratings.
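
One Arena round can be sketched roughly as follows. The detector names and the vote encoding ("a", "b", or "tie") are hypothetical and chosen only for illustration; the site does not document how it represents a battle internally.

```python
import random

# Hypothetical detector names, used only for illustration.
DETECTORS = ["detector_a", "detector_b", "detector_c", "detector_d"]

# Assumed vote encoding: "a" or "b" picks a winner, "tie" is a draw.
SCORES = {"a": (1.0, 0.0), "b": (0.0, 1.0), "tie": (0.5, 0.5)}


def pick_opponents(detectors, rng=random):
    """Draw two distinct detectors uniformly at random."""
    first, second = rng.sample(detectors, 2)
    return first, second


def vote_to_scores(vote):
    """Map a user's vote to (score_a, score_b) for the two detectors."""
    if vote not in SCORES:
        raise ValueError(f"unknown vote: {vote!r}")
    return SCORES[vote]


if __name__ == "__main__":
    a, b = pick_opponents(DETECTORS)
    print(a, "vs", b, "->", vote_to_scores("tie"))
```

The score pair (1 for a win, 0 for a loss, 0.5 each for a tie) is the conventional input to an Elo update.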

Elo Rating System

The Elo system was originally developed for chess and is now widely used in competitive ranking. Every detector starts at a base rating of 1000. When a detector wins a battle against a higher-rated opponent, it gains more points than it would against a lower-rated opponent, because the result was less expected. This means the rankings self-correct over time: detectors that keep winning votes rise, while inconsistent ones fall.
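
The update rule can be sketched with the standard Elo formulas. This is a minimal illustration, not the site's actual implementation: the K-factor of 32 and the 400-point scale are assumptions (the page does not state them), and the function names are hypothetical. A K of 32 would be consistent with the 1016 rating shown for a detector with a single win, provided its opponent was also rated 1000 at the time.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expectation that A beats B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k is an assumed K-factor; the leaderboard does not publish its value.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Points gained by one side are lost by the other, so the total is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(update_elo(1000, 1000, 1.0))  # win between equals: (1016.0, 984.0)
    print(update_elo(1000, 1100, 1.0))  # upset win earns more than 16 points
```

Because every update moves the same number of points from one detector to the other, the ratings in such a scheme always sum to 1000 times the number of detectors.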

Leaderboard vs Benchmark

The Leaderboard and the Benchmark measure detector quality differently. The Benchmark runs automated tests on a curated dataset and calculates accuracy, false positive rate, and false negative rate. The Leaderboard reflects community judgment through head-to-head comparisons. A detector can rank differently on each — for example, a detector with high benchmark accuracy might lose Arena battles on edge cases that another detector handles better.
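
For comparison, the three Benchmark metrics can be computed from labelled predictions as in the sketch below. It assumes "AI-generated" is the positive class and that each verdict is a plain boolean; the function name and data layout are hypothetical, not the Benchmark's actual code.

```python
def benchmark_metrics(labels, predictions):
    """Compute accuracy, false positive rate, and false negative rate.

    labels and predictions are equal-length sequences of booleans,
    where True means "AI-generated" (assumed positive class).
    """
    tp = fp = tn = fn = 0
    for is_ai, flagged in zip(labels, predictions):
        if is_ai and flagged:
            tp += 1      # AI image correctly flagged
        elif is_ai:
            fn += 1      # AI image missed
        elif flagged:
            fp += 1      # real photo wrongly flagged
        else:
            tn += 1      # real photo correctly passed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


if __name__ == "__main__":
    labels = [True, True, False, False]
    predictions = [True, False, False, False]
    print(benchmark_metrics(labels, predictions))
```

Unlike an Elo rating, these numbers depend only on the detector's own verdicts and the dataset, not on which opponents it happened to face.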

Win Rate vs Elo

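Win rate (the Accuracy column) is the raw percentage of a detector's battles that it won or tied; a detector with no battles shows 0%. Elo rating also accounts for opponent strength: beating a strong detector is worth more than beating a weak one. Two detectors with the same win rate can therefore have different Elo scores depending on whom they faced. In the table above, the detectors ranked 9 and 11 both show 33%, yet their ratings are 984 and 972.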
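
The Accuracy column can be reproduced from the W / L / T counts with a small helper. The function name is hypothetical, and treating zero battles as 0% is inferred from the table rather than documented.

```python
def win_rate(wins: int, losses: int, ties: int) -> float:
    """Percentage of battles won or tied; 0.0 when no battles were played."""
    battles = wins + losses + ties
    if battles == 0:
        return 0.0
    return 100.0 * (wins + ties) / battles


if __name__ == "__main__":
    # W / L / T values taken from the leaderboard rows ranked 1, 8, and 10.
    for record in [(4, 0, 4), (1, 1, 3), (0, 2, 2)]:
        print(record, f"{win_rate(*record):.0f}%")
```

Applied to the rows in the table, this gives 100% for 4 / 0 / 4, 80% for 1 / 1 / 3, and 50% for 0 / 2 / 2, matching the published figures.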

Frequently Asked Questions

What is the AI Detector Leaderboard?
The AI Detector Leaderboard ranks AI image detectors using an Elo rating system based on community battles. Users compare two detectors on the same image and vote for the one that gave the better answer. Over time, consistently accurate detectors rise to the top.

How does the Elo rating system work?
Elo is a rating system originally developed for chess. Each detector starts at a base rating (1000). When two detectors are compared on the same image, the winner gains Elo points and the loser drops points. The amount gained or lost depends on the expected outcome: beating a higher-rated detector earns more points than beating a lower-rated one.

What is the difference between the Leaderboard and the Benchmark?
The Benchmark tests detectors automatically against a curated dataset and measures accuracy, false positive rate, and false negative rate. The Leaderboard uses community votes from head-to-head battles in the Arena. Both rankings complement each other: the Benchmark provides controlled testing, while the Leaderboard reflects real-world comparative judgment.

How can I contribute to the Leaderboard?
Go to the Arena page and start a battle. You will be shown an image along with two detector verdicts. Vote for the detector that gave the more accurate answer. Each vote updates the Elo ratings in real time.

What do wins, losses, and ties mean?
A win means the detector was chosen as more accurate in a battle. A loss means the other detector was chosen. A tie means both detectors gave equally correct (or equally wrong) answers and the user voted it a draw.

How often are Leaderboard rankings updated?
Rankings update in real time after every battle vote. Elo ratings are recalculated immediately, so the Leaderboard always reflects the latest community results.