AI Detector Rankings
Combined benchmark accuracy & community arena ratings
Detector Rankings
| Rank | Detector | Score |
|---|---|---|
| 🥇 | | 97.7 |
| 🥈 | | 96.2 |
| 🥉 | | 88.7 |
| 4 | | 87.8 |
| 5 | | 87.3 |
| 6 | | 81.1 |
| 7 | | 77.2 |
| 8 | | 72.3 |
| 9 | | 50.7 |
AI Model Detection Rates
How often each AI model's images get detected. Lower rates mean that model is harder to detect.
How Rankings Work
Combined Score Formula
The Combined Score balances three key metrics: F1 Score (50%), False Positive Rate (30%), and False Negative Rate (20%).
Score = 0.5 × F1 + 0.3 × (1 - FPR) + 0.2 × (1 - FNR)
This formula rewards detectors that balance precision (avoiding false positives) with recall (catching AI images).
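As a rough illustration, here is a minimal Python sketch of that formula. The function name `combined_score` and the scaling to a 0-100 score are assumptions for illustration, not part of the published methodology.

```python
def combined_score(f1: float, fpr: float, fnr: float) -> float:
    """Combined Score from F1, FPR and FNR, all given as fractions in [0, 1].
    Weights (0.5 / 0.3 / 0.2) follow the formula above; the 0-100 scaling is assumed."""
    return 100 * (0.5 * f1 + 0.3 * (1 - fpr) + 0.2 * (1 - fnr))

# Hypothetical detector: F1 = 0.96, FPR = 0.02, FNR = 0.05
print(round(combined_score(0.96, 0.02, 0.05), 1))  # 96.4
```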
F1 Score
F1 is the harmonic mean of Precision and Recall. A high F1 means the detector is good at both catching AI images (high recall, low FNR) and avoiding false alarms on real images (high precision). Formula: F1 = 2 × Precision × Recall / (Precision + Recall).
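For example, a detector with hypothetical precision 0.95 and recall 0.90 would have an F1 of about 0.92, as the sketch below shows; the numbers are made up for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.95, 0.90), 3))  # 0.924
```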
FPR & FNR
FPR (False Positive Rate) — percentage of real images incorrectly flagged as AI. Lower is better. FNR (False Negative Rate) — percentage of AI images missed. Lower is better.
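The sketch below shows how both rates fall out of a confusion matrix, where "positive" means "flagged as AI-generated". The counts are hypothetical and only illustrate the definitions above.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of real images incorrectly flagged as AI."""
    return fp / (fp + tn)

def false_negative_rate(fn: int, tp: int) -> float:
    """Fraction of AI images the detector missed."""
    return fn / (fn + tp)

# Hypothetical run: 1,000 real images with 30 flagged; 1,000 AI images with 80 missed.
print(false_positive_rate(fp=30, tn=970))  # 0.03
print(false_negative_rate(fn=80, tp=920))  # 0.08
```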
Dataset
Our benchmark uses images from Midjourney, Stable Diffusion (SDXL, SD 3.5), DALL-E 3, Flux, Adobe Firefly, Leonardo.ai, Runway, Google Imagen, and Ideogram. Real images are sourced from photography databases to test for false positives.
