Estimator inputs
Example data table
| Scenario | Requests | Prompt tokens | Completion tokens | Reuse rate | Estimated savings |
|---|---|---|---|---|---|
| Support bot, stable policy header | 50,000 | 1,600 | 420 | 60% | Lower prompt spend, same output spend |
| RAG Q&A, shared retrieval template | 20,000 | 1,200 | 280 | 45% | Moderate savings, depends on cache hit rate |
| Agent workflow, reusable tool instructions | 10,000 | 2,300 | 650 | 35% | Savings grow with repeated runs |
Formula used
The estimator assumes token reuse applies to prompt tokens only, while completion tokens remain unchanged. A safety margin is applied to both prompt and completion.
prompt_eff = prompt_tokens × (1 + overhead%)
completion_eff = completion_tokens × (1 + overhead%)
baseline_cost = (N × prompt_eff / 1e6) × input_price + (N × completion_eff / 1e6) × output_price
cached_prompt = (N × prompt_eff) × reuse_rate%
fresh_prompt = (N × prompt_eff) × (1 − reuse_rate%)
output_cost = (N × completion_eff / 1e6) × output_price
reuse_cost = (fresh_prompt / 1e6) × input_price + (cached_prompt / 1e6) × cached_input_price + output_cost
savings = baseline_cost − reuse_cost
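The formula above can be sketched as a small Python function. This is an illustrative implementation, not the calculator's actual code; the function and parameter names are ours, and the output price of 10.00 in the example is an assumed placeholder (it cancels out of the savings figure):

```python
def estimate_savings(n_requests, prompt_tokens, completion_tokens,
                     reuse_rate, input_price, cached_input_price,
                     output_price, overhead=0.0):
    """Estimate savings from prompt-token reuse.

    Prices are per million tokens; reuse_rate and overhead are fractions.
    Reuse applies to prompt tokens only; completions are billed normally.
    """
    prompt_eff = prompt_tokens * (1 + overhead)
    completion_eff = completion_tokens * (1 + overhead)

    total_prompt = n_requests * prompt_eff
    output_cost = (n_requests * completion_eff / 1e6) * output_price

    baseline_cost = (total_prompt / 1e6) * input_price + output_cost

    fresh_prompt = total_prompt * (1 - reuse_rate)
    cached_prompt = total_prompt * reuse_rate
    reuse_cost = ((fresh_prompt / 1e6) * input_price
                  + (cached_prompt / 1e6) * cached_input_price
                  + output_cost)

    return baseline_cost - reuse_cost


# 10,000 requests, 1,200-token prompts, 70% reuse, input 3.00 / cached 1.50
print(round(estimate_savings(10_000, 1_200, 280, 0.70, 3.00, 1.50, 10.00), 2))
# → 12.6
```

Because output_cost appears in both baseline_cost and reuse_cost, savings depend only on the prompt side, which matches the assumption stated above.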
How to use this calculator
- Estimate average prompt and completion tokens per request from logs.
- Choose a realistic reuse rate based on stable prompt segments.
- Enter your pricing for input, cached input, and output tokens.
- Add an overhead margin if usage fluctuates across requests.
- Submit to see savings above the form, then export results.
Token reuse as a measurable spend lever
Token reuse reduces billed prompt tokens when repeated instructions, policies, or tool schemas stay identical. In a 10,000‑request month, a 1,200‑token prompt produces about 12.0 million input tokens before overhead. If 55% of that prompt is cached, the estimator shifts roughly 6.6 million tokens to the cached tier. At 70% reuse, 12.0 million prompt tokens become 8.4 million cached and 3.6 million fresh. With input at 3.00 and cached input at 1.50 per million tokens, each cached million saves 1.50, so the 70% case cuts prompt spend by about 12.6 currency units. This matters most when prompts exceed outputs, such as long policy headers or tool manifests in high-volume customer support systems.
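The token split in that paragraph is quick to reproduce. A minimal sketch using the same figures (10,000 requests, 1,200-token prompts, input 3.00 / cached 1.50 per million):

```python
requests, prompt_tokens = 10_000, 1_200
input_price, cached_price = 3.00, 1.50   # per million tokens

total = requests * prompt_tokens          # 12,000,000 prompt tokens/month
for reuse in (0.55, 0.70):
    cached = total * reuse
    fresh = total - cached
    saved = (cached / 1e6) * (input_price - cached_price)
    print(f"{reuse:.0%}: {cached/1e6:.1f}M cached, "
          f"{fresh/1e6:.1f}M fresh, saves {saved:.2f}")
# 55%: 6.6M cached, 5.4M fresh, saves 9.90
# 70%: 8.4M cached, 3.6M fresh, saves 12.60
```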
Deriving reuse rate from real traffic
Start with logs and isolate stable segments: system rules, formatting templates, and shared retrieval headers. A practical method is sampling 200–500 requests, computing the repeated portion, then averaging the share. When prompts vary by user text, reuse rate often stays below 30%; with standardized workflows, 50–80% is common.
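One rough way to run that sampling is below. This is a toy sketch under simplifying assumptions: logged prompts are plain strings, the stable segment is a known fixed prefix, and character share stands in for token share; the function name and toy data are invented for illustration:

```python
import random

def estimate_reuse_rate(prompts, stable_header, sample_size=300):
    """Approximate reuse rate as the stable header's share of each prompt,
    averaged over a random sample of logged requests."""
    sample = random.sample(prompts, min(sample_size, len(prompts)))
    shares = [len(stable_header) / len(p)
              for p in sample if p.startswith(stable_header)]
    return sum(shares) / len(shares) if shares else 0.0

# Toy logs: a fixed 600-character policy header plus short variable user text
header = "SYSTEM POLICY: " + "x" * 585
logs = [header + f"\nUser: question {i}" for i in range(1000)]
print(round(estimate_reuse_rate(logs, header), 2))
# → 0.97
```

With long user messages or heavy retrieval inserts, the same measurement drops quickly, which matches the sub-30% figure above for user-dominated traffic.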
Blended prompt pricing and sensitivity checks
The calculator reports a blended prompt price per million tokens after reuse. For example, standard input at 3.00 and cached input at 1.50 blends to 2.25 per million at 50% reuse, the midpoint of the two tiers. Try a sensitivity sweep: reuse at 30%, 50%, and 70% while keeping outputs fixed. You will see savings scale linearly with cached prompt tokens, not with completion tokens.
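The sweep takes a few lines; the prices are the example figures from this section:

```python
input_price, cached_price = 3.00, 1.50   # per million prompt tokens

for reuse in (0.30, 0.50, 0.70):
    # Blended price: weighted average of fresh and cached tiers
    blended = (1 - reuse) * input_price + reuse * cached_price
    print(f"reuse {reuse:.0%}: blended prompt price {blended:.2f}/M")
# reuse 30%: 2.55   reuse 50%: 2.25   reuse 70%: 1.95
```

Each 10-point increase in reuse lowers the blended price by the same 0.15, which is the linear behavior described above.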
Overhead margin for variance and burstiness
Overhead accounts for longer prompts during edge cases, retries, or expanded tool arguments. A 5% margin turns a 1,200‑token prompt into 1,260 effective tokens and keeps forecasts from under‑budgeting. For noisy agent workloads, 10–15% is safer, especially when multi‑step reasoning or extra retrieval chunks appear.
Actions that increase reuse without harming quality
Move stable rules into a single header, keep tool instructions constant, and version templates deliberately. Use short, consistent system messages and avoid injecting dynamic timestamps into cached segments. If retrieval is needed, cache the query plan and keep only the document excerpts variable. These changes can raise reuse by 10–25 points and improve cost predictability.
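A sketch of that prompt layout follows. The header text, tool names, and helper are all hypothetical; the point is the structure: one versioned, byte-identical header first, all variable content after it:

```python
# Hypothetical stable header: versioned, no timestamps or per-user data
STABLE_HEADER = (
    "You are a support assistant. Follow policy template v3.\n"
    "Tools: search_kb(query), create_ticket(summary)\n"
    "Answer in the standard response format.\n"
)

def build_prompt(user_text, excerpts):
    """Cacheable header first; retrieval excerpts and user text last."""
    variable = "\n".join(excerpts) + "\n\nUser: " + user_text
    return STABLE_HEADER + variable

p = build_prompt("How do I reset my password?", ["[doc 12] Reset steps"])
assert p.startswith(STABLE_HEADER)  # identical prefix across requests
```

Only the prefix needs to stay identical; edits to the excerpts or user text do not invalidate the cached segment.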
FAQs
1) What does “prompt reuse rate” represent?
It is the percentage of prompt tokens served from a cache because the prompt segment is identical. The estimator applies reuse only to prompt tokens, not completions.
2) Why are completion tokens not discounted here?
Most caching approaches target repeated input. Outputs depend on user intent and model variation, so they are billed normally. This keeps forecasts conservative and comparable.
3) How do I estimate prompt and completion tokens?
Use usage fields from your provider logs or SDK responses. Average over a representative sample, then add a small overhead margin to cover spikes, retries, and longer tool arguments.
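A minimal averaging sketch, assuming usage records shaped like the `prompt_tokens` / `completion_tokens` fields most provider SDKs report (the function name and sample data are illustrative):

```python
def average_usage(records, overhead=0.05):
    """Average logged token counts and inflate by an overhead margin."""
    n = len(records)
    avg_prompt = sum(r["prompt_tokens"] for r in records) / n
    avg_completion = sum(r["completion_tokens"] for r in records) / n
    return avg_prompt * (1 + overhead), avg_completion * (1 + overhead)

sample = [{"prompt_tokens": 1_150, "completion_tokens": 260},
          {"prompt_tokens": 1_250, "completion_tokens": 300}]
print(average_usage(sample))  # averages 1,200 and 280, plus a 5% margin
```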
4) What if cached input costs the same as standard input?
Set cached input price equal to input price. Savings will approach zero, and the calculator becomes a token budget planner that still helps size volumes and per‑request costs.
5) Can reuse exceed 80% in production?
Yes, when system rules, templates, and tool schemas are stable and user text is short. Heavy retrieval inserts, large user messages, or frequent prompt edits typically reduce reuse.
6) What is the fastest way to raise reuse safely?
Standardize instruction blocks and keep them unchanged across requests. Avoid dynamic content inside reusable headers, and version templates so only intentional changes invalidate cached segments.