The Joule Index
An Independent Benchmark · Reported May 2026
Runs

Every Verified run, every observational trace, every billed number.

The current release contains 24 Verified runs across 3 real OSS bug-fix tasks and 8 model tiers from Anthropic, Blankline, and Google, plus one retired task kept on disk for full audit. Sanitized public_trace.json files are published per the disclosure policy.

Per-run detail · 24 Verified runs
| Task | Tier | Vendor | F1 | $ / task | J / task | Tokens in / out | Wall s | Trace |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| joule-001 | Claude Haiku 4.5 | Anthropic | 1.000 | $0.1660 | 70 | 880,778 / 4,783 | 299 | public_trace.json · pending host |
| joule-001 | Claude Opus 4.7 | Anthropic | 1.000 | $0.1318 | 62 | 113,883 / 588 | 16 | public_trace.json · pending host |
| joule-001 | Claude Sonnet 4.6 | Anthropic | 1.000 | $0.1169 | 40 | 103,889 / 657 | 19 | public_trace.json · pending host |
| joule-001 | Dropstone Fast (Current leader) | Blankline | 1.000 | $0.0117 | 44 | 239,079 / 3,297 | 147 | public_trace.json · pending host |
| joule-001 | Dropstone Heavy | Blankline | 1.000 | $0.2755 | 545 | 358,394 / 3,973 | 186 | public_trace.json · pending host |
| joule-001 | Dropstone Pro | Blankline | 1.000 | $0.2494 | 107 | 325,882 / 2,545 | 161 | public_trace.json · pending host |
| joule-001 | Gemini 3.1 Flash | Google | 1.000 | $0.0356 | 36 | 433,737 / 1,180 | 44 | public_trace.json · pending host |
| joule-001 | Gemini 3.1 Pro | Google | 1.000 | $0.2172 | 101 | 154,547 / 5,047 | 75 | public_trace.json · pending host |
| joule-002 | Claude Haiku 4.5 | Anthropic | 1.000 | $0.1592 | 72 | 997,281 / 5,419 | 375 | public_trace.json · pending host |
| joule-002 | Claude Opus 4.7 | Anthropic | 1.000 | $0.2001 | 97 | 173,397 / 1,807 | 39 | public_trace.json · pending host |
| joule-002 | Claude Sonnet 4.6 | Anthropic | 1.000 | $0.2075 | 47 | 138,780 / 2,562 | 95 | public_trace.json · pending host |
| joule-002 | Dropstone Fast (Current leader) | Blankline | 1.000 | $0.0297 | 74 | 376,501 / 7,982 | 3108 | public_trace.json · pending host |
| joule-002 | Dropstone Heavy | Blankline | 1.000 | $0.2449 | 484 | 316,201 / 4,036 | 195 | public_trace.json · pending host |
| joule-002 | Dropstone Pro | Blankline | 1.000 | $0.3190 | 149 | 419,502 / 5,756 | 346 | public_trace.json · pending host |
| joule-002 | Gemini 3.1 Flash | Google | 1.000 | $0.0264 | 31 | 229,787 / 1,872 | 31 | public_trace.json · pending host |
| joule-002 | Gemini 3.1 Pro | Google | 1.000 | $0.2500 | 107 | 247,638 / 4,601 | 87 | public_trace.json · pending host |
| joule-004 | Claude Haiku 4.5 | Anthropic | 1.000 | $0.6279 | 297 | 4,387,841 / 20,647 | 328 | public_trace.json · pending host |
| joule-004 | Claude Opus 4.7 | Anthropic | 1.000 | $2.7440 | 1375 | 2,820,013 / 17,985 | 403 | public_trace.json · pending host |
| joule-004 | Claude Sonnet 4.6 | Anthropic | 0.182 | $1.2448 | 453 | 1,622,105 / 23,668 | 563 | public_trace.json · pending host |
| joule-004 | Dropstone Fast (Current leader) | Blankline | 1.000 | $0.2045 | 555 | 3,062,902 / 39,620 | 3798 | public_trace.json · pending host |
| joule-004 | Dropstone Heavy | Blankline | 1.000 | $2.0507 | 4051 | 2,692,119 / 24,243 | 1503 | public_trace.json · pending host |
| joule-004 | Dropstone Pro | Blankline | 1.000 | $0.5172 | 444 | 1,273,631 / 15,672 | 806 | public_trace.json · pending host |
| joule-004 | Gemini 3.1 Flash | Google | 0.000 | $0.1200 | 140 | 1,275,389 / 7,679 | 100 | public_trace.json · pending host |
| joule-004 | Gemini 3.1 Pro | Google | 0.462 | $2.2700 | 998 | 2,538,983 / 27,130 | 369 | public_trace.json · pending host |

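
One way to read the per-run table is to keep only the runs that reached F1 = 1.000 on a task and order them by billed cost. The sketch below does this for the joule-004 rows, using the F1 and $ / task values listed above. The `Run` class and the `cheapest_full_score` helper are hypothetical names introduced for illustration, and the filter-then-sort rule is an assumption about how a reader might compare tiers, not the benchmark's documented ranking method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    tier: str
    f1: float
    usd_per_task: float


# joule-004 rows copied from the per-run table (tier, F1, $ / task).
JOULE_004 = [
    Run("Claude Haiku 4.5", 1.000, 0.6279),
    Run("Claude Opus 4.7", 1.000, 2.7440),
    Run("Claude Sonnet 4.6", 0.182, 1.2448),
    Run("Dropstone Fast", 1.000, 0.2045),
    Run("Dropstone Heavy", 1.000, 2.0507),
    Run("Dropstone Pro", 1.000, 0.5172),
    Run("Gemini 3.1 Flash", 0.000, 0.1200),
    Run("Gemini 3.1 Pro", 0.462, 2.2700),
]


def cheapest_full_score(runs, threshold=1.0):
    """Return runs with F1 >= threshold, cheapest first (illustrative rule only)."""
    passing = [r for r in runs if r.f1 >= threshold]
    return sorted(passing, key=lambda r: r.usd_per_task)


if __name__ == "__main__":
    for run in cheapest_full_score(JOULE_004):
        print(f"{run.tier:<18} ${run.usd_per_task:.4f}")
```

Filtering on F1 first matters here: Gemini 3.1 Flash has the lowest billed cost on joule-004, but its F1 of 0.000 means it would not count as a solved run under this reading.
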
Retired tasks

Excluded from leaderboard, kept for audit

Methodology §3.3 specifies that tasks with inter-rater disagreement are discarded rather than adjudicated. A benchmark that documents its retired tasks publicly is more trustworthy than one that hides them.

| Task | Reason for retirement |
| --- | --- |
| joule-003 | The merged Pull Request's description listed scope (a backend proxy endpoint) that did not appear in the merged diff. Every evaluated tier followed the description in good faith and produced changes the diff did not reward. The Blankline Research Team identified the mismatch during inter-rater review and retired the task per methodology §3.3. |
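
The discard-rather-than-adjudicate rule can be sketched as a simple partition over rater verdicts. Everything in this sketch is an assumption made for illustration: the `split_by_agreement` helper, the shape of the `ratings` mapping, the verdict strings, and the placeholder task IDs are hypothetical, and the actual procedure in methodology §3.3 may differ.

```python
def split_by_agreement(ratings):
    """Partition task IDs into (kept, retired).

    `ratings` maps a task ID to the list of verdicts given by independent
    raters (hypothetical structure). A task is kept only when every rater
    gave the same verdict; any disagreement retires the task instead of
    resolving it by vote or adjudication.
    """
    kept, retired = [], []
    for task_id, verdicts in sorted(ratings.items()):
        if len(set(verdicts)) == 1:
            kept.append(task_id)
        else:
            retired.append(task_id)
    return kept, retired


if __name__ == "__main__":
    # Placeholder task IDs and verdicts, not benchmark data.
    example = {
        "task-a": ["valid", "valid"],
        "task-b": ["valid", "scope-mismatch"],
    }
    kept, retired = split_by_agreement(example)
    print("kept:", kept)
    print("retired:", retired)
```

The point of the design is that no tie-breaking step exists: a single dissenting verdict is enough to move a task out of the leaderboard set and into the audit list.
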