| Scenario | Requests/day | Input tokens | Output tokens | Cache hit | Token buffer | Notes |
|---|---|---|---|---|---|---|
| Support assistant | 2,500 | 850 + 120 overhead | 350 | 35% | 8% | Typical RAG prompts, moderate reuse. |
| Document summarization | 400 | 7,500 + 250 overhead | 1,200 | 10% | 12% | Long inputs, higher variance per file. |
| Agentic workflow | 900 | 1,600 + 300 overhead | 650 | 25% | 15% | Tool calls increase overhead and retries. |
effective_requests = base_requests × (1 + retry_rate%)
input_tokens_per_request = avg_input_tokens + overhead_tokens
billable_input_tokens = effective_requests × input_tokens_per_request × (1 + token_buffer%)
billable_output_tokens = effective_requests × avg_output_tokens × (1 + token_buffer%)
cached_input_tokens = billable_input_tokens × cache_hit%
standard_input_tokens = billable_input_tokens − cached_input_tokens
input_cost = (standard_input_tokens ÷ 1,000,000) × input_price
+ (cached_input_tokens ÷ 1,000,000) × cached_input_price
output_cost = (billable_output_tokens ÷ 1,000,000) × output_price
subtotal = input_cost + output_cost
after_discount = subtotal × (1 − batch_discount%)
total = after_discount × (1 + contingency%)
display_total = total × exchange_rate
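The steps above can be sketched as one Python function. The name `llm_cost` and the default values are illustrative, and the prices in the usage example are placeholders rather than any provider's actual rates:

```python
def llm_cost(
    base_requests: float,
    retry_rate: float,         # 0.02 means 2% of requests are retried
    avg_input_tokens: float,
    overhead_tokens: float,    # system prompts, tools, formatting
    avg_output_tokens: float,
    token_buffer: float,       # variance cushion, e.g. 0.08 for 8%
    cache_hit: float,          # share of billable input served from cache
    input_price: float,        # USD per 1M standard input tokens
    cached_input_price: float, # USD per 1M cached input tokens
    output_price: float,       # USD per 1M output tokens
    batch_discount: float = 0.0,
    contingency: float = 0.0,
    exchange_rate: float = 1.0,
) -> float:
    """Return display_total, following the formula steps in order."""
    effective_requests = base_requests * (1 + retry_rate)
    input_per_request = avg_input_tokens + overhead_tokens
    billable_input = effective_requests * input_per_request * (1 + token_buffer)
    billable_output = effective_requests * avg_output_tokens * (1 + token_buffer)
    cached_input = billable_input * cache_hit
    standard_input = billable_input - cached_input
    input_cost = (standard_input / 1_000_000) * input_price \
               + (cached_input / 1_000_000) * cached_input_price
    output_cost = (billable_output / 1_000_000) * output_price
    subtotal = input_cost + output_cost
    after_discount = subtotal * (1 - batch_discount)
    total = after_discount * (1 + contingency)
    return total * exchange_rate
```

With every rate set to zero and exactly 1M input tokens, the result collapses to the input price, which is a quick sanity check when wiring the model into a spreadsheet.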
- Measure tokens using logs or a token counter for your prompts and responses.
- Set volumes (requests/day and days) to match your forecast period.
- Enter token prices from your provider and plan type.
- Model real-world effects like retries, overhead, and caching.
- Add buffers for variance, then apply discounts and contingency.
- Export a CSV for finance, or a PDF for approvals.
Token drivers and measurement
Accurate forecasting starts with measuring tokens, not characters. Capture median and p95 input and output tokens per request from logs, then separate user text from retrieved context and guardrail overhead. A 10% increase in retrieval size can raise input tokens more than a comparable rise in request volume, and it is easier to miss. Track tool calls, system prompts, and formatting templates because they often add stable overhead that scales linearly with traffic.
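A minimal sketch of pulling median and p95 from logged per-request token counts, using only the standard library (the function name is illustrative):

```python
from statistics import median, quantiles

def token_stats(token_counts: list[int]) -> dict[str, float]:
    """Median and p95 of per-request token counts sampled from logs."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = quantiles(token_counts, n=100)[94]
    return {"median": median(token_counts), "p95": p95}
```

Run this separately for input, output, and overhead tokens so each driver gets its own median/p95 pair.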
Pricing inputs and cached rates
Providers usually bill input and output at different rates, so the split matters. Enter prices per million tokens and, when available, a discounted cached input rate. Cache hit rate should reflect repeated prefixes such as policies, instructions, or shared conversation state. If caching is uncertain, model conservative and optimistic scenarios to bracket risk, then revisit after a week of production telemetry.
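One way to reason about caching is to fold the hit rate into a single blended per-million input price. This helper (name is illustrative) assumes the cache hit rate applies uniformly across billable input tokens:

```python
def blended_input_price(input_price: float,
                        cached_price: float,
                        cache_hit: float) -> float:
    """Effective per-1M input price given a cache hit rate (0.0-1.0)."""
    return (1 - cache_hit) * input_price + cache_hit * cached_price
```

With hypothetical prices of $3.00 standard and $0.30 cached, a 35% hit rate yields a blended $2.055 per million input tokens.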
Throughput forecasting for budgets
Requests per day should be tied to product metrics: active users, sessions, and features that trigger calls. Use days in period to align with billing cycles and planned launches. For bursty workloads, consider using a higher daily average during peak campaigns. When you forecast growth, update both volume and token averages because prompt complexity often increases as capabilities expand.
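Tying requests/day to product metrics can be sketched as a simple driver model; the parameter names are assumptions about what your analytics exposes, not a fixed schema:

```python
def daily_requests(active_users: float,
                   sessions_per_user: float,
                   calls_per_session: float,
                   peak_multiplier: float = 1.0) -> float:
    """Forecast requests/day from product drivers; raise the
    multiplier above 1.0 for campaign or launch periods."""
    return active_users * sessions_per_user * calls_per_session * peak_multiplier
```

Keeping the drivers separate makes it easy to update volume and token averages independently as features expand.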
Buffers, retries, and risk controls
Retry rate captures hidden cost from timeouts, rate limits, and user re-asks. Even a 2% retry rate becomes meaningful at scale. Token buffer protects you from variance in long documents, multilingual content, and atypical agent loops. Add a separate contingency margin for budgeting approvals; it supports procurement planning and avoids mid-quarter surprises when product usage shifts. Use dashboards to validate assumptions and to quickly catch drift after releases.
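In this model, retries, token buffer, and contingency compound multiplicatively rather than adding, which is why modest percentages stack up. A quick sketch with illustrative values:

```python
def risk_multiplier(retry_rate: float,
                    token_buffer: float,
                    contingency: float) -> float:
    """Combined multiplier applied on top of the base token estimate."""
    return (1 + retry_rate) * (1 + token_buffer) * (1 + contingency)

# 2% retries, 8% buffer, 10% contingency compound to ~21% over base
m = risk_multiplier(0.02, 0.08, 0.10)  # ≈ 1.212
```

Seeing the combined factor helps when a stakeholder asks why the budget exceeds the raw token math.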
Reporting for stakeholder alignment
Finance teams respond well to unit economics. Use cost per request, cost per 1K tokens, and daily burn to compare models and features. Share the token breakdown to explain why caching or prompt refactors reduce spend. Export CSV for spreadsheets and a PDF for review packets. Keep a versioned snapshot of assumptions so engineering and product can iterate responsibly. For teams managing multiple applications, maintain a small cost registry that lists model, feature owner, target metrics, and monthly cap, then review it in planning meetings to adjust assumptions.
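The unit metrics above are straightforward to derive from a period total; the numbers in the test are hypothetical monthly figures, not benchmarks:

```python
def unit_economics(total_cost: float,
                   requests: int,
                   total_tokens: int,
                   days: int) -> dict[str, float]:
    """Cost per request, cost per 1K tokens, and daily burn for a period."""
    return {
        "cost_per_request": total_cost / requests,
        "cost_per_1k_tokens": total_cost / total_tokens * 1000,
        "daily_burn": total_cost / days,
    }
```

Computing all three from the same inputs keeps model-to-model comparisons consistent in the cost registry.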
1) What token numbers should I use?
Start with real logs. Use the median for typical traffic and the 95th percentile for stress testing. Separate input, output, and overhead tokens so you can reduce cost by targeting the biggest driver.
2) How do I estimate cache hit rate?
Look for repeated prompt prefixes and stable conversation scaffolding. If you do not have telemetry yet, test 10%, 30%, and 50% scenarios. Replace assumptions after collecting a few days of production traces.
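The 10%/30%/50% bracketing can be scripted so the scenarios stay in sync as prices change; token volumes and prices below are placeholders:

```python
def cache_scenarios(input_tokens_millions: float,
                    input_price: float,
                    cached_price: float,
                    hit_rates: tuple[float, ...] = (0.10, 0.30, 0.50)) -> dict[float, float]:
    """Input cost under each assumed cache hit rate."""
    return {
        h: input_tokens_millions * ((1 - h) * input_price + h * cached_price)
        for h in hit_rates
    }
```

Once production traces arrive, replace the assumed rates with measured prefix reuse and retire the bracket.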
3) Why include overhead tokens?
System prompts, safety policies, formatting, and tool routing can be a consistent share of every request. Ignoring overhead can understate spend and hide optimization opportunities like shorter templates or fewer tool calls.
4) How should I model retries?
Use incident history and rate-limit behavior. Count both automatic retries and user resubmissions. A small retry rate can add substantial cost at high volume, so treat it as a reliability and budgeting metric.
5) What does token buffer represent?
It is a cushion for variability in prompts, retrieved context, and long outputs. Buffers reduce the chance of missing budget targets when inputs change, languages vary, or new features increase prompt length.
6) Can this calculator compare two models?
Yes. Run one model’s pricing and token assumptions, export results, then repeat for the alternative. Compare cost per request and cost per 1K tokens to evaluate tradeoffs alongside quality and latency.
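A minimal way to run that comparison, assuming hypothetical per-million prices for two candidate models (the figures are not real quotes):

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Per-request cost for one model's pricing (prices per 1M tokens)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Same token assumptions, two hypothetical price points:
model_a = cost_per_request(1000, 350, 3.0, 15.0)   # 0.00825
model_b = cost_per_request(1000, 350, 0.5, 2.0)    # 0.00120
```

Pair the cost delta with quality and latency measurements before deciding; the cheaper model is only a win if it meets the bar.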