LLM Cost Calculator — Real Cost Per Outcome, Not Per Token

01 / Control Panel

Calculator Parameters

1. Task Category

2. Retry Rate Multiplier: 1.1x

0.5x (Optimistic) | 1.1x (Task Default) | 3.0x (Pessimistic)

3. Compare Models

Calculation Formula

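Prices are quoted in dollars per million tokens ($/M), with separate rates for input and output tokens. The base cost of one attempt is scaled by the retry rate to give the total outcome cost:

outcome cost = (in_tokens × input $/M + out_tokens × output $/M) ÷ 1,000,000 × retry_rate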
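The formula above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual code: the `Pricing` class and `outcome_cost` function are hypothetical names, and the $15 and $75 per-million rates are assumed values that the page does not state. They were picked only because they happen to reproduce the Claude 4.7 Opus row at the 1.1x task default.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (assumed values, not from the page)."""
    input_per_million: float
    output_per_million: float


def outcome_cost(in_tokens: int, out_tokens: int, pricing: Pricing,
                 retry_rate: float = 1.1) -> float:
    """Cost of one successful outcome: single-attempt cost times retry rate."""
    if retry_rate <= 0:
        raise ValueError("retry_rate must be positive")
    # Prices are per million tokens, so divide the weighted sum by 1,000,000.
    base = (in_tokens * pricing.input_per_million
            + out_tokens * pricing.output_per_million) / TOKENS_PER_MILLION
    return base * retry_rate


# Assumed rates; token counts are taken from the Claude 4.7 Opus row.
assumed_pricing = Pricing(input_per_million=15.0, output_per_million=75.0)
cost = outcome_cost(4_200, 2_200, assumed_pricing, retry_rate=1.1)
print(f"${cost:.4f}")  # prints $0.2508
```

Moving the retry rate to the pessimistic 3.0x end of the slider triples the single-attempt cost of $0.228 to $0.684, which is why the multiplier matters more than small per-token price differences.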


05 / Ground Evidence

Stack-Ranked Outcomes

Sorted by cost ascending

LLM Model	Estimated Tokens	Quality Score	Outcome Cost ▲	Token Source
Gemini 3 Flash (Cheapest)	In: 3,800 Out: 1,200	74% LMSys Arena	$0.0007	Aider Workload
GPT-4o mini	In: 3,900 Out: 1,400	76.5% Aider Leaderboard	$0.0016	Aider Workload
DeepSeek V4 Pro	In: 4,000 Out: 1,850	82.1% LMSys Arena	$0.0037	Aider Workload
Claude 4.5 Haiku	In: 3,800 Out: 1,300	79.1% Aider Leaderboard	$0.0091	Aider Workload
o3-mini	In: 4,500 Out: 2,500	85.6% Aider Leaderboard	$0.0175	Aider Workload
Gemini 3.5 Flash	In: 3,850 Out: 1,450	78.9% LMSys Arena	$0.0207	Aider Workload
GPT-4o	In: 4,100 Out: 1,900	84.8% Aider Leaderboard	$0.0322	Aider Workload
Gemini 3.1 Pro	In: 4,300 Out: 2,000	83.5% LMSys Arena	$0.0359	Aider Workload
Claude 4.6 Sonnet	In: 4,000 Out: 1,800	85.2% Aider Leaderboard	$0.0429	Aider Workload
Claude 4.7 Opus	In: 4,200 Out: 2,200	86.8% Aider Leaderboard	$0.2508	Aider Workload
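The stack ranking is a plain ascending sort on outcome cost. A minimal sketch, using four rows copied from the table (the tuple layout and variable names are illustrative assumptions, not the page's data model):

```python
# (model, quality score in %, outcome cost in USD), deliberately unsorted.
rows = [
    ("Claude 4.7 Opus", 86.8, 0.2508),
    ("Gemini 3 Flash", 74.0, 0.0007),
    ("o3-mini", 85.6, 0.0175),
    ("GPT-4o mini", 76.5, 0.0016),
]

# Sort by outcome cost, cheapest first, matching the table's ordering.
ranked = sorted(rows, key=lambda row: row[2])

for rank, (model, quality, cost) in enumerate(ranked, start=1):
    print(f"{rank}. {model:<16} {quality:5.1f}%  ${cost:.4f}")
```

Sorting on cost alone ignores the quality column, so the cheapest row is not necessarily the best value; the table shows both so the reader can weigh them.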

The Ground Truth Registry (Sorted by True Cost)

What is measured: The raw calculations behind the final costs. Each row maps the workload token counts and the verified accuracy (Quality Score) to the calculated True Cost, shown in the Outcome Cost column.

Verifiability & Dual Sourcing: (1) Quality Source: Click any highlighted link under Quality Score to audit the accuracy evaluation (e.g. LMSys battles or AgentNoah OWASP audits). (2) Token Source: The rightmost badge identifies the benchmark workload used to audit raw input/output token counts.
