| Scenario | Input Tokens | Output Tokens | Window (min) | Requests | Tokens/Min | Cost ($) |
|---|---|---|---|---|---|---|
| Chat support burst | 90,000 | 55,000 | 30 | 180 | 4,833.33 | 0.51 |
| Batch summarization | 250,000 | 120,000 | 120 | 400 | 3,083.33 | 1.22 |
| Agent workflow | 140,000 | 140,000 | 60 | 220 | 4,666.67 | 1.12 |
- BaseTokens = InputTokens + OutputTokens
- TotalTokens = BaseTokens × (1 + Overhead% / 100)
- WindowMinutes = TimeValue × UnitToMinutes
- TokensPerMinute = TotalTokens / WindowMinutes
- TokensPerRequest = TotalTokens / Requests
- TotalCost = (InputTokens/1000 × InputPrice) + (OutputTokens/1000 × OutputPrice)
- CostPerMinute = TotalCost / WindowMinutes
- ProjectedMonthlyTokens = TokensPerMinute × RuntimePerDay × DaysPerMonth
- AdjustedProjection = ProjectedMonthlyTokens × (1 + Growth% / 100) × (1 + Buffer% / 100)
- ProjectedMonthlyCost ≈ AdjustedProjection × (TotalCost / TotalTokens)
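The formulas above can be sketched in a few lines of Python. The prices here are illustrative assumptions ($0.002/1K input, $0.006/1K output, consistent with the example table), not rates from any specific billing plan:

```python
# Minimal sketch of the burn-rate formulas. Prices are illustrative
# assumptions; substitute your plan's per-1K rates.

def burn_metrics(input_tokens, output_tokens, requests, window_minutes,
                 input_price_per_1k, output_price_per_1k, overhead_pct=0.0):
    base = input_tokens + output_tokens                    # BaseTokens
    total = base * (1 + overhead_pct / 100)                # TotalTokens
    cost = (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)  # TotalCost
    return {
        "tokens_per_minute": total / window_minutes,       # TokensPerMinute
        "tokens_per_request": total / requests,            # TokensPerRequest
        "total_cost": cost,
        "cost_per_minute": cost / window_minutes,          # CostPerMinute
    }

# "Chat support burst" row from the table above:
m = burn_metrics(90_000, 55_000, requests=180, window_minutes=30,
                 input_price_per_1k=0.002, output_price_per_1k=0.006)
print(round(m["tokens_per_minute"], 2))  # 4833.33
print(round(m["total_cost"], 2))         # 0.51
```

Reproducing a table row this way is also a quick sanity check that the prices you entered match your actual invoice.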
- Collect input and output tokens for a measured workload window.
- Enter the request count and the exact window duration.
- Provide your input and output prices per 1K tokens.
- Set overhead, growth, and safety buffer to match reality.
- Optionally add runtime per day to forecast monthly usage.
- Press submit to see burn rate, cost rates, and warnings.
- Enable saving to export your recent runs as CSV or PDF.
Operational meaning of burn rate
Token burn rate is the pace at which your workload consumes tokens during a window. Enter input tokens, output tokens, requests, and duration, then compute tokens per minute and tokens per request. These metrics separate throughput pressure from prompt size in practice. If tokens per request rises while requests stay steady, prompts, context, or traces are expanding. If requests rise while tokens per request stays flat, traffic or concurrency is driving spend.
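The diagnostic above can be sketched as a comparison of two measured windows. The 10% thresholds and all token figures are illustrative assumptions, not tuned values:

```python
# Compare two measured windows to see which component moved.
# Each window is (total_tokens, requests); thresholds are illustrative.

def diagnose(prev, curr):
    prev_tpr = prev[0] / prev[1]   # tokens per request, earlier window
    curr_tpr = curr[0] / curr[1]   # tokens per request, current window
    if curr_tpr > prev_tpr * 1.1 and curr[1] <= prev[1] * 1.1:
        return "prompt/context growth"   # request size expanded, traffic flat
    if curr[1] > prev[1] * 1.1 and curr_tpr <= prev_tpr * 1.1:
        return "traffic growth"          # more requests, same request size
    return "mixed or stable"

print(diagnose((145_000, 180), (190_000, 185)))  # prompt/context growth
print(diagnose((145_000, 180), (200_000, 250)))  # traffic growth
```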
Cost translation for budgeting
Burn becomes actionable when converted into money. The calculator applies your input and output prices per 1K tokens, then derives cost per minute, hour, and day. This lets budget owners set operational caps such as “cost per hour under $2.00” or “daily spend under $25.00.” Compare cost per request across features: a change from $0.004 to $0.006 per request is a 50% increase if volume is unchanged.
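As a quick sketch of the money conversion, using the “Chat support burst” window from the table above (total cost and window are taken from that row; the prices behind it are assumed):

```python
# Translate a measured window into money rates and per-request cost.
# Figures mirror the "Chat support burst" example row.
total_cost = 0.51
window_minutes, requests = 30, 180

cost_per_minute = total_cost / window_minutes
cost_per_hour = cost_per_minute * 60
cost_per_request = total_cost / requests

print(round(cost_per_hour, 2))     # 1.02
print(round(cost_per_request, 4))  # 0.0028
```

At roughly $1.02/hour, this window sits comfortably under a “cost per hour under $2.00” cap.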
Monthly forecasting with runtime and variance
Forecasting is strongest when you pair measured burn with runtime. The calculator projects monthly tokens using tokens per minute × runtime per day × days per month, then applies growth and safety buffer multipliers. Use growth for expected adoption and buffer for peak loads, retries, and long responses. If you run 180 minutes daily, a burn of 4,000 tokens per minute yields 720,000 tokens per day. Over 30 days, that is 21.6 million tokens before adjustments.
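The worked example above, as a short calculation (the growth and buffer settings are illustrative, not recommendations):

```python
# Worked forecast: 4,000 tokens/min for 180 minutes/day over 30 days,
# then growth and buffer multipliers applied.
tokens_per_minute = 4_000
runtime_min_per_day = 180
days_per_month = 30

projected = tokens_per_minute * runtime_min_per_day * days_per_month
print(projected)  # 21600000  (21.6M tokens before adjustments)

growth_pct, buffer_pct = 20, 15   # illustrative settings
adjusted = projected * (1 + growth_pct / 100) * (1 + buffer_pct / 100)
print(round(adjusted))  # 29808000
```

Note how the two multipliers compound: 20% growth plus a 15% buffer adds 38% to the raw projection, not 35%.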
Efficiency levers and diagnostic signals
To reduce burn, target the component that moved. If tokens per request is high, shorten prompts, trim retrieved context, cap tool output, and enforce response length. If output dominates, add structured instructions, stop sequences, or concise templates. If input dominates, compress system instructions and avoid repeating guidance text. The context-limit warning is a governance guardrail: an average request above your set limit indicates truncation risk, latency spikes, or runaway tool traces.
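A minimal sketch of the context-limit guardrail, assuming a hypothetical 8,000-token limit (set yours to the model's context size minus headroom for responses):

```python
# Flag windows whose average tokens per request exceed a configured limit.
# The 8,000-token default is an illustrative assumption.

def context_limit_warning(total_tokens, requests, limit=8_000):
    avg = total_tokens / requests
    return avg > limit, avg

flagged, avg = context_limit_warning(1_900_000, 200)
print(flagged, round(avg))  # True 9500
```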
Governance and reporting workflows
Professional reporting favors repeatable snapshots. Save runs to the session log, export CSV for spreadsheets, and export PDF for stakeholders. Track token burn during peak and off-peak windows, then benchmark changes after releases. Pair the burn report with a decision rule: if projected monthly cost exceeds budget, reduce runtime, reduce tokens per request, or adjust feature rollout. Over time, the saved log becomes a lightweight audit trail for spend reviews and capacity planning.
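A hypothetical sketch of the CSV snapshot step described above; the field names and file name are illustrative, not the tool’s actual export schema:

```python
# Export a session log of saved runs to CSV for trend analysis.
# Field names and file name are illustrative assumptions.
import csv

runs = [
    {"scenario": "Chat support burst", "tokens_per_minute": 4833.33, "cost": 0.51},
    {"scenario": "Agent workflow", "tokens_per_minute": 4666.67, "cost": 1.12},
]

with open("burn_report.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["scenario", "tokens_per_minute", "cost"])
    writer.writeheader()
    writer.writerows(runs)
```

Keeping one row per saved run makes release-over-release benchmarking a simple spreadsheet diff.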
FAQs
1) What is the difference between tokens per request and tokens per minute?
Tokens per request measures average request size. Tokens per minute measures throughput over time. Together they distinguish prompt expansion from rising traffic or concurrency.
2) Why do I need separate input and output prices?
Many billing plans price input and output differently. Using both rates improves cost estimates and highlights whether prompts or responses are driving spend.
3) What should I set for overhead percentage?
Use overhead for retries, tooling metadata, and logging. Start with 3–10% for stable workloads, then tune using real measurements from peak windows.
4) How does the calculator estimate projected monthly cost?
It converts measured burn into monthly tokens using runtime and days, applies growth and buffer, then multiplies by observed cost per token from your window.
5) What does the context-limit warning mean?
If average tokens per request exceed your limit, requests may truncate, slow down, or fail. Reduce context, compress prompts, or enforce shorter tool outputs.
6) Can I use the exports for ongoing reporting?
Yes. Enable saving, then export CSV for trend analysis and PDF for stakeholder updates. The log keeps the most recent 50 calculations per session.