Plan AI usage budgets with detailed token accounting. Adjust pricing, context, caching, and batch factors. See totals instantly, then download reports in seconds.
1) Split prompt tokens into cached and fresh portions using the cache hit rate.
2) Compute base costs per request (rates are per 1,000,000 tokens).
3) Add retry usage, apply batch savings, then add overhead.
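One way to write the three steps as formulas, offered as a sketch rather than the calculator's exact method. P and O are prompt and output tokens per request, N is the number of requests, h is the cache hit rate, R_p and R_o are the prompt and output rates per 1M tokens, and r, b, and o are the retry rate, batch discount, and overhead, all as fractions. R_c, the rate for cached prompt tokens, is an assumption: the example table lists no separate cached price, so use your provider's cached-input rate, or set R_c = R_p if cached tokens are not discounted.

```latex
\begin{aligned}
P_c &= h\,P, \qquad P_f = (1 - h)\,P \\
C_{\text{req}} &= \frac{P_f\,R_p + P_c\,R_c + O\,R_o}{10^{6}} \\
C_{\text{tokens}} &= N\,(1 + r)\,C_{\text{req}}\,(1 - b) \\
C_{\text{total}} &= C_{\text{tokens}}\,(1 + o)
\end{aligned}
```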
| Scenario | Prompt Tokens | Output Tokens | Requests | Prompt Rate / 1M | Output Rate / 1M | Cache Hit | Total Cost (est.) |
|---|---|---|---|---|---|---|---|
| Support chatbot batch | 1,800 | 650 | 10,000 | $3.00 | $15.00 | 30% | $??.?? |
| RAG search answers | 3,200 | 900 | 2,500 | $2.50 | $10.00 | 15% | $??.?? |
| Evaluation runs | 900 | 150 | 50,000 | $1.00 | $4.00 | 60% | $??.?? |
Token costs become predictable when you track them like any other compute bill. This calculator converts per‑request token usage into a ledger you can audit by prompt, cached context, and output. Teams can compare experiments, production traffic, and evaluation runs using the same unit economics. The per‑request view highlights where a small template change increases total spend at scale, even when quality looks unchanged, and shows how costs move when request volume jumps unexpectedly overnight.
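A minimal sketch of why a small template change matters at scale, assuming the extra tokens are billed at the full (uncached) rate. The helper `added_spend` is hypothetical; the rates and request count reuse the support chatbot row from the example table.

```python
# Hypothetical helper for illustration; not part of the calculator.
def added_spend(extra_tokens: int, requests: int, rate_per_million: float) -> float:
    """Cost of adding `extra_tokens` to every request, with the rate quoted per 1M tokens."""
    return extra_tokens * requests * rate_per_million / 1_000_000


# Support chatbot row: 10,000 requests, $3.00 / 1M prompt tokens, $15.00 / 1M output tokens.
per_request = added_spend(200, 1, 3.00)            # 200 extra prompt tokens on a single request
per_batch = added_spend(200, 10_000, 3.00)         # the same change across the whole batch
longer_answers = added_spend(100, 10_000, 15.00)   # 100 extra output tokens per reply

print(f"per request: ${per_request:.4f}")     # per request: $0.0006
print(f"per batch:   ${per_batch:.2f}")       # per batch:   $6.00
print(f"output:      ${longer_answers:.2f}")  # output:      $15.00
```

A change that costs a fraction of a cent per request becomes a visible line item once it is multiplied by traffic, and the same number of extra output tokens costs more than prompt tokens whenever the output rate is higher.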
Prompt tokens are not all equal when caching is available. Reused system prompts, tool schemas, and stable reference context can be discounted or billed differently. By splitting prompt tokens into cached and fresh portions, you can estimate how much spend is tied to dynamic user content versus reusable scaffolding. That clarity supports decisions such as trimming long instructions, moving static text into cacheable blocks, and reusing retrieval results across turns, which can also reduce latency.
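A minimal sketch of the cached/fresh split, assuming the cache hit rate applies uniformly to a request's prompt tokens and that cached tokens are billed at a separate rate. The helpers `split_prompt` and `prompt_cost` are hypothetical, and the $0.30 / 1M cached rate in the usage lines is a made-up placeholder (10% of the $3.00 prompt rate); substitute your provider's actual cached-input price.

```python
# Hypothetical helpers for illustration; not part of the calculator.
def split_prompt(prompt_tokens: int, cache_hit: float):
    """Split prompt tokens into (cached, fresh) using a cache hit rate in [0, 1]."""
    if not 0.0 <= cache_hit <= 1.0:
        raise ValueError("cache_hit must be between 0 and 1")
    cached = round(prompt_tokens * cache_hit)  # round to whole tokens
    return cached, prompt_tokens - cached


def prompt_cost(prompt_tokens: int, cache_hit: float, fresh_rate: float, cached_rate: float):
    """Per-request prompt cost split into cached and fresh portions (rates per 1M tokens)."""
    cached, fresh = split_prompt(prompt_tokens, cache_hit)
    return {
        "cached_tokens": cached,
        "fresh_tokens": fresh,
        "cached_cost": cached * cached_rate / 1_000_000,
        "fresh_cost": fresh * fresh_rate / 1_000_000,
    }


# Support chatbot row: 1,800 prompt tokens, 30% cache hit, $3.00 / 1M prompt rate.
# The $0.30 / 1M cached rate is HYPOTHETICAL.
row = prompt_cost(1_800, 0.30, fresh_rate=3.00, cached_rate=0.30)
print(row["cached_tokens"], row["fresh_tokens"])  # 540 1260
print(f"fresh ${row['fresh_cost']:.6f}, cached ${row['cached_cost']:.6f}")
```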
Output tokens often drive variance because completions expand with reasoning depth, tool logs, citations, and structured formats. The breakdown isolates output cost so you can set caps, apply truncation rules, or switch response templates. Monitoring output tokens per request alongside cost per request helps you spot regressions after prompt edits or model upgrades. Add a retry percentage to cover re-asks, tool failures, and timeout recoveries without masking the underlying unit economics.
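A minimal sketch that isolates output cost per request for the three example rows. For simplicity it ignores caching and retries (an assumption that slightly overstates prompt cost), and the helper `output_share` is hypothetical.

```python
# Hypothetical helper for illustration; not part of the calculator.
def output_share(prompt_tokens, output_tokens, prompt_rate, output_rate):
    """Return (prompt_cost, output_cost, output share of per-request cost); rates per 1M tokens."""
    prompt_cost = prompt_tokens * prompt_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    return prompt_cost, output_cost, output_cost / (prompt_cost + output_cost)


# Rows from the example table; caching and retries are ignored here.
scenarios = {
    "Support chatbot batch": (1_800, 650, 3.00, 15.00),
    "RAG search answers": (3_200, 900, 2.50, 10.00),
    "Evaluation runs": (900, 150, 1.00, 4.00),
}
for name, args in scenarios.items():
    _, _, share = output_share(*args)
    print(f"{name}: output is {share:.1%} of per-request cost")
# Support chatbot batch: output is 64.4% of per-request cost
# RAG search answers: output is 52.9% of per-request cost
# Evaluation runs: output is 40.0% of per-request cost
```

For the support chatbot, output is only about 27% of tokens (650 of 2,450) but about 64% of per-request cost, because its output rate is five times its prompt rate.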
Batch discounts can materially lower the subtotal when workloads are flexible and latency is less critical. Apply the batch reduction to the token subtotal after retries, then add overhead for routing, monitoring, storage, and observability. Overhead is not a token charge, but it is real money spent running production systems. Keeping it explicit avoids underpricing internal services and improves forecasting when traffic spikes or when you launch new features globally.
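A minimal sketch of the final step in the order described above: retries, then the batch discount, then overhead as its own line item. It assumes all percentages are entered as fractions and that overhead is a percentage of the post-discount token cost. The helper `finalize` and the sample inputs are hypothetical.

```python
# Hypothetical helper for illustration; not part of the calculator.
def finalize(subtotal, retry, batch_discount, overhead):
    """Apply retries, then the batch discount, then overhead. Rates are fractions (0.05 = 5%)."""
    with_retries = subtotal * (1 + retry)
    batch_savings = with_retries * batch_discount
    token_cost = with_retries - batch_savings
    overhead_cost = token_cost * overhead  # non-token spend, kept as a separate line
    return {
        "with_retries": with_retries,
        "batch_savings": batch_savings,
        "token_cost": token_cost,
        "overhead": overhead_cost,
        "total": token_cost + overhead_cost,
    }


# HYPOTHETICAL inputs: $200.00 token subtotal, 5% retries, 50% batch discount, 10% overhead.
for line, amount in finalize(200.00, 0.05, 0.50, 0.10).items():
    print(f"{line:>13}: ${amount:,.2f}")
```

Because overhead here is a percentage of the discounted token cost, the total equals subtotal × (1 + retry) × (1 − batch) × (1 + overhead). Computing overhead on the pre-discount amount instead would give a higher figure ($126.00 rather than $115.50 for these inputs).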
Once you know the total cost, derive actionable ratios: cost per request, cost per 1,000 users, and cost per successful task. Compare scenarios by adjusting rates and token counts to model tradeoffs between quality and budget. The CSV and PDF exports support reviews with finance, engineering, and vendors. Use the example table as a starting point, then replace the placeholders with figures from your production logs to validate estimates before committing to long-term contracts.
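A minimal sketch of the derived ratios. User and success counts are not produced by the cost calculation itself; they are assumptions you would pull from your own analytics, and the sample figures and the helper `unit_ratios` are hypothetical.

```python
# Hypothetical helper for illustration; not part of the calculator.
def unit_ratios(total_cost, requests, users, successful_tasks):
    """Derive unit economics from a total cost; every count must be positive."""
    if min(requests, users, successful_tasks) <= 0:
        raise ValueError("counts must be positive")
    return {
        "cost_per_request": total_cost / requests,
        "cost_per_1000_users": total_cost / users * 1_000,
        "cost_per_successful_task": total_cost / successful_tasks,
    }


# HYPOTHETICAL figures: $150.00 total, 10,000 requests, 2,500 users, 9,200 successful tasks.
for name, value in unit_ratios(150.00, 10_000, 2_500, 9_200).items():
    print(f"{name}: ${value:.4f}")
```

Cost per successful task is always at least cost per request when some requests fail, because failed attempts still incur token charges.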
Enter your provider’s input and output prices expressed per 1,000,000 tokens. If you have per‑1K pricing, multiply by 1,000. Keep currency consistent across fields so totals and exports remain comparable.
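A small helper for normalizing quoted prices, assuming each quote states the number of tokens it covers. The name `per_million` is hypothetical.

```python
# Hypothetical helper for illustration; not part of the calculator.
def per_million(price: float, per_tokens: int) -> float:
    """Convert a price quoted per `per_tokens` tokens into a price per 1,000,000 tokens."""
    if per_tokens <= 0:
        raise ValueError("per_tokens must be positive")
    return price * (1_000_000 / per_tokens)


print(per_million(0.003, 1_000))     # $0.003 per 1K tokens -> 3.0 per 1M
print(per_million(3.00, 1_000_000))  # already per 1M       -> 3.0
```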
The cache hit rate is the share of prompt tokens served from reusable context, templates, or stored system blocks. Higher cache hit rates shift spend from fresh prompt tokens to discounted cached tokens, lowering total cost without changing outputs.
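A minimal sketch for estimating the hit rate from your own request logs, assuming each log entry records total prompt tokens and the cached portion for one request. Some providers report a cached-token count in their usage data, but field names vary, so the tuple layout and the helper `cache_hit_rate` are hypothetical.

```python
# Hypothetical helper for illustration; not part of the calculator.
def cache_hit_rate(records):
    """Token-weighted hit rate from (prompt_tokens, cached_tokens) pairs, one per request."""
    total_prompt = 0
    total_cached = 0
    for prompt_tokens, cached_tokens in records:
        total_prompt += prompt_tokens
        total_cached += cached_tokens
    return total_cached / total_prompt if total_prompt else 0.0


# HYPOTHETICAL log sample: three requests with their cached-token counts.
logs = [(1_800, 600), (1_800, 0), (2_000, 800)]
print(f"{cache_hit_rate(logs):.1%}")  # 25.0%
```

Weighting by tokens keeps long prompts proportionally influential, which matches how they are billed; a simple average of the per-request rates would give about 24.4% for the same sample.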
Use your logs to estimate the retry percentage from extra requests caused by failures, re-prompts, tool errors, or timeouts. If 3 out of 100 requests repeat once, start with 3%. Revisit the figure after incident fixes or model changes.
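A minimal sketch of that log calculation, assuming every API call is tagged with a logical request ID so that repeated IDs mark retries. The ID scheme and the helper `retry_rate` are hypothetical.

```python
from collections import Counter


# Hypothetical helper for illustration; not part of the calculator.
def retry_rate(call_request_ids):
    """Extra calls per logical request, given one request ID per API call."""
    calls = Counter(call_request_ids)
    if not calls:
        return 0.0
    extra = sum(count - 1 for count in calls.values())
    return extra / len(calls)


# HYPOTHETICAL log: 100 logical requests, 3 of which were repeated once.
ids = [f"req-{i}" for i in range(100)] + ["req-0", "req-1", "req-2"]
print(f"{retry_rate(ids):.0%}")  # 3%
```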
Apply the batch discount when your workload can be queued or processed asynchronously and your provider offers a reduced rate for batch jobs. Leave it at 0% for interactive chat flows where latency matters.
Token charges do not cover supporting costs such as gateways, vector stores, monitoring, and incident response. An overhead percentage helps you price internal APIs realistically and prevents budget surprises as usage grows.
The CSV and PDF exports contain the most recent calculation stored in your session. Run a calculation first, then click Export CSV or Export PDF to download a snapshot of the inputs, per‑request breakdown, and totals.
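If you want a similar snapshot from your own tooling, Python's standard `csv` module can write inputs and totals as rows. This is a hypothetical sketch: the calculator's actual export columns are not specified, so the `section, field, value` layout, the helper `snapshot_csv`, and the placeholder total are assumptions.

```python
import csv
import io


# Hypothetical helper for illustration; not the calculator's export code.
def snapshot_csv(inputs, totals):
    """Write inputs and totals as (section, field, value) rows and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["section", "field", "value"])
    for field, value in inputs.items():
        writer.writerow(["input", field, value])
    for field, value in totals.items():
        writer.writerow(["total", field, value])
    return buffer.getvalue()


# Inputs from the support chatbot row; the total is a HYPOTHETICAL placeholder value.
text = snapshot_csv(
    {"prompt_tokens": 1800, "output_tokens": 650, "requests": 10000, "cache_hit": 0.30},
    {"total_cost": 123.45},
)
print(text)
```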
Important Note: All calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.