Built by AgentNoah BUILD ⚡
This calculator was built end-to-end using AgentNoah's BYOL BUILD pipeline on Google Antigravity (Gemini 3.5 Flash) in one evening. Read the full case study → including the 6 fabrications the review loop caught before this site went live.
The Formula: Cheap ≠ Value
Standard vendor tables show pricing in simple "Dollars per Million input/output tokens". This promotes models that look extremely inexpensive (e.g. Gemini Flash or GPT-4o mini) while hiding the real economic cost of completing tasks in software development workflows.
When a model has lower quality, it makes mistakes, leading to failed builds, lint errors, or broken tests. In practice, a developer or orchestrator must retry the call (often multiple times) until a successful outcome is achieved.
Calculation Formula
Where the Retry Rate acts as a linear multiplier representing the average attempts needed to pass a verification loop (tests, compile, manual validation).
Proven Data & Source URLs
| Vendor / Source | Provenance Link | Last Verified |
|---|---|---|
| Anthropic Claude | anthropic.com/pricing | 2026-05-23 |
| OpenAI GPT & o-series | openai.com/pricing | 2026-05-23 |
| Google Gemini | ai.google.dev/pricing | 2026-05-23 |
| DeepSeek (V4 Pro) | api-docs.deepseek.com/quick_start/pricing | 2026-05-25 |
| Aider Leaderboard | aider.chat/docs/leaderboards | 2026-05-23 |
| AgentNoah Benchmark | AgentNoah OWASP K=3 security sweep | 2026-05-23 |
Limits + Caveats
- Token estimate provenance: Cells tagged
aiderare calibrated to typical patterns from the Aider coding leaderboard — they are our best estimate of typical input/output token usage per model per task, NOT direct per-cell measurements (Aider does not publish per-task token counts in this format). Cells taggedagentnoah-owaspfor the security-audit task come from AgentNoah's K=3 BYOL benchmark. PRs welcome at github.com/guevae2/llm-cost-per-outcome/issues to refine specific cells with real measurements. - Token estimates vary: Real usage varies wildly depending on prompt templates, system instructions, few-shot examples, and framework overhead. The estimates are static baseline measurements.
- Retry Rate is a proxy: In actual systems, failed attempts are not always complete retries. They might be smaller recovery steps, although the overall cost contribution trends similarly.
- Open source contributed: Data is contributed and updated by the community. If you notice any outdated or incorrect price cells, please open an Issue.