Leaderboard

v3 · 49 tasks · 10 submissions

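Rank  Model             Score        Tasks
1     claude-opus-4-8   0.51 ± 0.08  35 / 49  1084.7M / 30k
2     gpt-5.5           0.39 ± 0.08  27 / 49  763.6M / 12k
3     gpt-5.3-codex     0.32 ± 0.08  21 / 49  522.0M / 9k
4     glm-5.2           0.18 ± 0.07  15 / 49  754.8M / 34k
5     kimi-k2.7-code    0.16 ± 0.07  11 / 49  1047.7M / 34k
6     kimi-k2.6         0.14 ± 0.07  10 / 49  1228.6M / 64k
7     deepseek-v4-pro   0.10 ± 0.06   8 / 49  795.1M / 32k
8     glm-5.1           0.10 ± 0.06   9 / 49  1036.5M / 24k
9     minimax-m2.7      0.03 ± 0.04   4 / 49  874.2M / 26k
10    claude-haiku-4-5  0.00 ± 0.03   0 / 49  913.3M / 12k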

Preliminary trial

not ranked

Claude Fable 5

Single-pass run, ended early when model access was suspended

35.1%

13/37 CVEs fixed

Why it's not on the board: only 37 of 49 tasks finished before Anthropic suspended access to Fable 5 on June 12, 2026 under a US export-control directive, and each ran a single attempt rather than the repeated trials behind every ranked score. That makes this a partial, higher-uncertainty figure — directional, not a ranked result — so we report it on its own.
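To see why 13 of 37 single attempts is a looser estimate than a ranked score, one can put an interval around it. The sketch below uses a 95% Wilson score interval and treats each task as an independent pass/fail draw. Both choices are assumptions made here for illustration; the page does not say how its own ± values are computed, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 13 fixed out of 37 attempted, one attempt per task (figures from the page).
rate = 13 / 37
low, high = wilson_interval(13, 37)
print(f"{rate:.1%} (95% Wilson interval {low:.1%} to {high:.1%})")
```

Under these assumptions the interval spans roughly 22% to 51%, which is wide enough to overlap several ranked scores. That is consistent with treating the figure as directional only.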

Per-trial grid

pass / fcv / fail · hardest first

Columns 1 to 10 are the models by rank: 1 opus-4-8, 2 5.5, 3 5.3-codex, 4 glm-5.2, 5 kimi-k2.7-code, 6 kimi-k2.6, 7 v4-pro, 8 glm-5.1, 9 m2.7, 10 haiku-4-5.

solved by  task              1    2    3    4    5    6    7    8    9    10
0/10       CVE-2026-6357     0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
0/10       0db863 [private]  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
0/10       1caece [private]  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
0/10       34c7ae [private]  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
0/10       f7a156 [private]  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       1212bd [private]  0/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       1a6e93 [private]  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       66c024 [private]  0/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       b9f35e [private]  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       f86799 [private]  0/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       CVE-2026-30914    2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       02cf51 [private]  2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       0388d3 [private]  2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
2/10       407562 [private]  0/3  0/3  0/3  1/3  0/3  0/3  1/3  0/3  0/3  0/3
2/10       5c982a [private]  1/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       7480a4 [private]  2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       7fb58a [private]  2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
2/10       bb069f [private]  1/3  0/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       838bae [private]  3/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       8e5fa2 [private]  3/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
1/10       d3c3a1 [private]  3/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
2/10       CVE-2026-33557    3/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
2/10       CVE-2026-25660    3/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  1/3  0/3
2/10       1d2e1a [private]  0/3  3/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
3/10       850bd0 [private]  1/3  2/3  0/3  0/3  0/3  1/3  0/3  0/3  0/3  0/3
2/10       a0816b [private]  3/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3  1/3  0/3
2/10       a6d7e7 [private]  3/3  0/3  0/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3
3/10       e1a260 [private]  1/3  2/3  0/3  0/3  1/3  0/3  0/3  0/3  0/3  0/3
4/10       2cba81 [private]  1/3  1/3  2/3  0/3  0/3  0/3  0/3  1/3  0/3  0/3
2/10       3f53bd [private]  3/3  0/3  2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
3/10       9a1c46 [private]  3/3  1/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
3/10       e16a3c [private]  3/3  0/3  2/3  1/3  0/3  0/3  0/3  0/3  0/3  0/3
3/10       ed9033 [private]  2/3  2/3  2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
3/10       8bb233 [private]  2/3  3/3  2/3  0/3  0/3  0/3  0/3  0/3  0/3  0/3
6/10       9b1e25 [private]  1/3  3/3  1/3  1/3  1/3  0/3  1/3  0/3  0/3  0/3
4/10       a29f02 [private]  1/3  2/3  3/3  0/3  2/3  0/3  0/3  0/3  0/3  0/3
5/10       b436c4 [private]  1/3  3/3  2/3  1/3  0/3  0/3  0/3  0/3  1/3  0/3
5/10       c11378 [private]  0/3  1/3  2/3  3/3  2/3  0/3  0/3  1/3  0/3  0/3
4/10       d02926 [private]  0/3  3/3  0/3  3/3  2/3  1/3  0/3  0/3  0/3  0/3
4/10       57b3ea [private]  3/3  3/3  3/3  0/3  0/3  0/3  0/3  1/3  0/3  0/3
5/10       8653e4 [private]  3/3  2/3  2/3  1/3  0/3  2/3  0/3  0/3  0/3  0/3
4/10       dd3225 [private]  3/3  3/3  3/3  0/3  0/3  0/3  1/3  0/3  0/3  0/3
5/10       f98add [private]  0/3  3/3  0/3  3/3  0/3  2/3  1/3  2/3  0/3  0/3
7/10       5f76f1 [private]  1/3  2/3  3/3  1/3  3/3  2/3  0/3  3/3  0/3  0/3
8/10       479d6f [private]  3/3  2/3  3/3  1/3  2/3  3/3  0/3  1/3  1/3  0/3
7/10       db5e2c [private]  3/3  3/3  3/3  2/3  2/3  1/3  3/3  0/3  0/3  0/3
8/10       5c6059 [private]  2/3  3/3  3/3  1/3  3/3  3/3  2/3  2/3  0/3  0/3
7/10       CVE-2026-41690    0/3  3/3  3/3  3/3  3/3  3/3  3/3  2/3  0/3  0/3
8/10       CVE-2026-31812    3/3  3/3  3/3  3/3  2/3  3/3  3/3  2/3  0/3  0/3
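
The grid and the ranked scores line up under one simple reading, which the page does not state explicitly: a model's score is its passing trials divided by all trials, and its task count is the number of tasks with at least one passing trial. For the rank 1 column this gives 75 of 147 trials (0.51) and 35 of 49 tasks. A minimal Python sketch of that tally follows, using three cells copied from the first model's column; the function name `summarize` and the dict layout are illustrative, not part of the benchmark's tooling.

```python
# Per-task cells for one model, as (passing trials, total trials).
# These three rows are copied from the opus-4-8 column; a full column has 49.
cells = {
    "CVE-2026-6357": (0, 3),
    "CVE-2026-30914": (2, 3),
    "CVE-2026-31812": (3, 3),
}


def summarize(cells: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Tally one grid column under the assumed scoring rule."""
    passes = sum(p for p, _ in cells.values())
    trials = sum(n for _, n in cells.values())
    # A task counts as fixed if at least one of its trials passed.
    fixed = sum(1 for p, _ in cells.values() if p > 0)
    return {
        "pass_rate": passes / trials,
        "tasks_fixed": fixed,
        "tasks": len(cells),
    }


summary = summarize(cells)
print(f"{summary['pass_rate']:.2f} over {summary['tasks_fixed']} / {summary['tasks']} tasks")
```

Feeding in all 49 rows of a column, rather than this three-row excerpt, is what reproduces the leaderboard figures quoted above.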

Efficiency

[Figure: scatter plot of score (0% to 57%) against turns / trial (0 to 131) for claude-opus-4-8, gpt-5.5, gpt-5.3-codex, glm-5.2, kimi-k2.7-code, kimi-k2.6, deepseek-v4-pro, glm-5.1, minimax-m2.7, and claude-haiku-4-5. The upper left is better: fewer turns, higher score.]

Score vs. effort — no dollar cost. More turns or tokens don't reliably buy a higher pass rate.