| Scenario | Input Tokens | Output Tokens | Window (min) | Requests | Tokens/Min | Cost ($) |
|---|---|---|---|---|---|---|
| Chat support burst | 90,000 | 55,000 | 30 | 180 | 4,833.33 | 0.51 |
| Batch summarization | 250,000 | 120,000 | 120 | 400 | 3,083.33 | 1.22 |
| Agent workflow | 140,000 | 140,000 | 60 | 220 | 4,666.67 | 1.12 |
- BaseTokens = InputTokens + OutputTokens
- TotalTokens = BaseTokens × (1 + Overhead% / 100)
- WindowMinutes = TimeValue × UnitToMinutes
- TokensPerMinute = TotalTokens / WindowMinutes
- TokensPerRequest = TotalTokens / Requests
- TotalCost = (InputTokens/1000 × InputPrice) + (OutputTokens/1000 × OutputPrice)
- CostPerMinute = TotalCost / WindowMinutes
- ProjectedMonthlyTokens = TokensPerMinute × RuntimePerDay × DaysPerMonth
- AdjustedProjection = ProjectedMonthlyTokens × (1 + Growth% / 100) × (1 + Buffer% / 100)
- ProjectedMonthlyCost ≈ AdjustedProjection × (TotalCost / TotalTokens)
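The formulas above can be sketched in a few lines of Python. The prices here are illustrative assumptions ($0.002/1K input, $0.006/1K output, consistent with the example table), not rates from any specific billing plan:

```python
# Minimal sketch of the burn-rate formulas. Prices are illustrative
# assumptions; substitute your plan's per-1K rates.

def burn_metrics(input_tokens, output_tokens, requests, window_minutes,
                 input_price_per_1k, output_price_per_1k, overhead_pct=0.0):
    base = input_tokens + output_tokens                    # BaseTokens
    total = base * (1 + overhead_pct / 100)                # TotalTokens
    cost = (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)  # TotalCost
    return {
        "tokens_per_minute": total / window_minutes,       # TokensPerMinute
        "tokens_per_request": total / requests,            # TokensPerRequest
        "total_cost": cost,
        "cost_per_minute": cost / window_minutes,          # CostPerMinute
    }

# "Chat support burst" row from the table above:
m = burn_metrics(90_000, 55_000, requests=180, window_minutes=30,
                 input_price_per_1k=0.002, output_price_per_1k=0.006)
print(round(m["tokens_per_minute"], 2))  # 4833.33
print(round(m["total_cost"], 2))         # 0.51
```

Reproducing a table row this way is also a quick sanity check that the prices you entered match your actual invoice.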
- Collect input and output tokens for a measured workload window.
- Enter the request count and the exact window duration.
- Provide your input and output prices per 1K tokens.
- Set overhead, growth, and safety buffer to match reality.
- Optionally add runtime per day to forecast monthly usage.
- Press submit to see burn rate, cost rates, and warnings.
- Enable saving to export your recent runs as CSV or PDF.
Operational meaning of burn rate
Token burn rate is the pace at which your workload consumes tokens during a window. Enter input tokens, output tokens, requests, and duration, then compute tokens per minute and tokens per request. These metrics separate throughput pressure from prompt size in practice. If tokens per request rises while requests stay steady, prompts, context, or traces are expanding. If requests rise while tokens per request stays flat, traffic or concurrency is driving spend.
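The diagnostic above can be sketched as a comparison of two measured windows. The 10% thresholds and all token figures are illustrative assumptions, not tuned values:

```python
# Compare two measured windows to see which component moved.
# Each window is (total_tokens, requests); thresholds are illustrative.

def diagnose(prev, curr):
    prev_tpr = prev[0] / prev[1]   # tokens per request, earlier window
    curr_tpr = curr[0] / curr[1]   # tokens per request, current window
    if curr_tpr > prev_tpr * 1.1 and curr[1] <= prev[1] * 1.1:
        return "prompt/context growth"   # request size expanded, traffic flat
    if curr[1] > prev[1] * 1.1 and curr_tpr <= prev_tpr * 1.1:
        return "traffic growth"          # more requests, same request size
    return "mixed or stable"

print(diagnose((145_000, 180), (190_000, 185)))  # prompt/context growth
print(diagnose((145_000, 180), (200_000, 250)))  # traffic growth
```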
Cost translation for budgeting
Burn becomes actionable when converted into money. The calculator applies your input and output prices per 1K tokens, then derives cost per minute, hour, and day. This lets budget owners set operational caps such as “cost per hour under $2.00” or “daily spend under $25.00.” Compare cost per request across features: a change from $0.004 to $0.006 per request is a 50% increase if volume is unchanged.
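As a quick sketch of the money conversion, using the “Chat support burst” window from the table above (total cost and window are taken from that row; the prices behind it are assumed):

```python
# Translate a measured window into money rates and per-request cost.
# Figures mirror the "Chat support burst" example row.
total_cost = 0.51
window_minutes, requests = 30, 180

cost_per_minute = total_cost / window_minutes
cost_per_hour = cost_per_minute * 60
cost_per_request = total_cost / requests

print(round(cost_per_hour, 2))     # 1.02
print(round(cost_per_request, 4))  # 0.0028
```

At roughly $1.02/hour, this window sits comfortably under a “cost per hour under $2.00” cap.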
Monthly forecasting with runtime and variance
Forecasting is strongest when you pair measured burn with runtime. The calculator projects monthly tokens using tokens per minute × runtime per day × days per month, then applies growth and safety buffer multipliers. Use growth for expected adoption and buffer for peak loads, retries, and long responses. If you run 180 minutes daily, a burn of 4,000 tokens per minute yields 720,000 tokens per day. Over 30 days, that is 21.6 million tokens before adjustments.
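The worked example above, as a short calculation (the growth and buffer settings are illustrative, not recommendations):

```python
# Worked forecast: 4,000 tokens/min for 180 minutes/day over 30 days,
# then growth and buffer multipliers applied.
tokens_per_minute = 4_000
runtime_min_per_day = 180
days_per_month = 30

projected = tokens_per_minute * runtime_min_per_day * days_per_month
print(projected)  # 21600000  (21.6M tokens before adjustments)

growth_pct, buffer_pct = 20, 15   # illustrative settings
adjusted = projected * (1 + growth_pct / 100) * (1 + buffer_pct / 100)
print(round(adjusted))  # 29808000
```

Note how the two multipliers compound: 20% growth plus a 15% buffer adds 38% to the raw projection, not 35%.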
Efficiency levers and diagnostic signals
To reduce burn, target the component that moved. If tokens per request is high, shorten prompts, trim retrieved context, cap tool output, and enforce response length. If output dominates, add structured instructions, stop sequences, or concise templates. If input dominates, compress system instructions and avoid repeating guidance text. The context-limit warning is a governance guardrail: an average request above your set limit indicates truncation risk, latency spikes, or runaway tool traces.
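A minimal sketch of the context-limit guardrail, assuming a hypothetical 8,000-token limit (set yours to the model's context size minus headroom for responses):

```python
# Flag windows whose average tokens per request exceed a configured limit.
# The 8,000-token default is an illustrative assumption.

def context_limit_warning(total_tokens, requests, limit=8_000):
    avg = total_tokens / requests
    return avg > limit, avg

flagged, avg = context_limit_warning(1_900_000, 200)
print(flagged, round(avg))  # True 9500
```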
Governance and reporting workflows
Professional reporting favors repeatable snapshots. Save runs to the session log, export CSV for spreadsheets, and export PDF for stakeholders. Track token burn during peak and off-peak windows, then benchmark changes after releases. Pair the burn report with a decision rule: if projected monthly cost exceeds budget, reduce runtime, reduce tokens per request, or adjust feature rollout. Over time, the saved log becomes a lightweight audit trail for spend reviews and capacity planning.
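A hypothetical sketch of the CSV snapshot step described above; the field names and file name are illustrative, not the tool’s actual export schema:

```python
# Export a session log of saved runs to CSV for trend analysis.
# Field names and file name are illustrative assumptions.
import csv

runs = [
    {"scenario": "Chat support burst", "tokens_per_minute": 4833.33, "cost": 0.51},
    {"scenario": "Agent workflow", "tokens_per_minute": 4666.67, "cost": 1.12},
]

with open("burn_report.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["scenario", "tokens_per_minute", "cost"])
    writer.writeheader()
    writer.writerows(runs)
```

Keeping one row per saved run makes release-over-release benchmarking a simple spreadsheet diff.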
FAQs
1) What is the difference between tokens per request and tokens per minute?
Tokens per request measures average request size. Tokens per minute measures throughput over time. Together they distinguish prompt expansion from rising traffic or concurrency.
2) Why do I need separate input and output prices?
Many billing plans price input and output differently. Using both rates improves cost estimates and highlights whether prompts or responses are driving spend.
3) What should I set for overhead percentage?
Use overhead for retries, tooling metadata, and logging. Start with 3–10% for stable workloads, then tune using real measurements from peak windows.
4) How does the calculator estimate projected monthly cost?
It converts measured burn into monthly tokens using runtime and days, applies growth and buffer, then multiplies by observed cost per token from your window.
5) What does the context-limit warning mean?
If average tokens per request exceed your limit, requests may truncate, slow down, or fail. Reduce context, compress prompts, or enforce shorter tool outputs.
6) Can I use the exports for ongoing reporting?
Yes. Enable saving, then export CSV for trend analysis and PDF for stakeholder updates. The log keeps the most recent 50 calculations per session.