Formula used
This calculator separates request-driven tokens from background usage, then applies seasonality, peak load, and a safety buffer.
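$$\mathrm{InReqTok}_m = \mathrm{Requests}_m \times \mathrm{AvgInputTok} \times \mathrm{Seasonality} \times \mathrm{Peak}$$
$$\mathrm{OutReqTok}_m = \mathrm{Requests}_m \times \mathrm{AvgOutputTok} \times \mathrm{Seasonality} \times \mathrm{Peak}$$
$$\mathrm{InputTok}_m = \mathrm{InReqTok}_m + (\mathrm{Users}_m \times \mathrm{EmbedTokPerUser}) + \mathrm{BatchTok}$$
$$\mathrm{TotalTok}_m = (\mathrm{InputTok}_m + \mathrm{OutReqTok}_m) \times (1 + \mathrm{Buffer\%})$$
$$\mathrm{Cost}_m = \frac{\mathrm{InputTok}_m \times \mathrm{BlendedIn} + \mathrm{OutReqTok}_m \times \mathrm{BlendedOut}}{1{,}000{,}000}$$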
- Users evolve monthly using growth then churn.
- Seasonality and Peak adjust only request-driven tokens.
- Buffer adds headroom for bursts, retries, and variance.
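The formulas above can be sketched for a single month in Python. This is a minimal illustration, not the calculator's source. The `MonthInputs` fields and the `month_tokens_and_cost` function are hypothetical names chosen to mirror the formula symbols. `buffer_pct` is a percentage, so 10 means 10%. Rates are in USD per million tokens.

```python
from dataclasses import dataclass


@dataclass
class MonthInputs:
    # Hypothetical field names that mirror the formula symbols.
    requests: float            # Requests_m
    users: float               # Users_m
    avg_input_tok: float       # AvgInputTok per request
    avg_output_tok: float      # AvgOutputTok per request
    embed_tok_per_user: float  # EmbedTokPerUser
    batch_tok: float           # BatchTok for the month
    seasonality: float = 1.0   # Seasonality multiplier for this month
    peak: float = 1.0          # Peak multiplier (1.0 outside the peak month)
    buffer_pct: float = 0.0    # Buffer% as a percentage, e.g. 10.0
    blended_in: float = 0.0    # BlendedIn, USD per 1M input tokens
    blended_out: float = 0.0   # BlendedOut, USD per 1M output tokens


def month_tokens_and_cost(m: MonthInputs) -> dict:
    scale = m.seasonality * m.peak
    # Seasonality and Peak scale only request-driven tokens.
    in_req = m.requests * m.avg_input_tok * scale
    out_req = m.requests * m.avg_output_tok * scale
    # Background (embedding + batch) load is booked on the input side, unscaled.
    input_tok = in_req + m.users * m.embed_tok_per_user + m.batch_tok
    total = (input_tok + out_req) * (1 + m.buffer_pct / 100)
    # Cost follows the formula as written: the buffer is not applied here.
    cost = (input_tok * m.blended_in + out_req * m.blended_out) / 1_000_000
    return {"input": input_tok, "output": out_req, "total": total, "cost": cost}


example = MonthInputs(
    requests=22_500, users=500, avg_input_tok=550, avg_output_tok=750,
    embed_tok_per_user=1_200, batch_tok=300_000, buffer_pct=10.0,
    blended_in=3.0, blended_out=15.0,  # hypothetical rates
)
print(month_tokens_and_cost(example))
```

The example uses 22,500 requests from 500 users, a 300,000-token batch job, a 10% buffer, and made-up rates of 3.00 and 15.00 USD per million tokens. With those inputs the sketch gives 13,275,000 input tokens, 16,875,000 output tokens, about 33,165,000 buffered tokens, and about USD 292.95.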
How to use this calculator
- Set your start month and forecast horizon.
- Enter users, growth, and churn to model adoption.
- Define requests per user and typical input/output tokens.
- Add embedding and batch usage for background demand.
- Use seasonality for predictable swings and a peak month for launches.
- Apply a safety buffer for operational headroom.
- Fill in mix and rates to estimate monthly and total spend.
- Press Calculate forecast, then export CSV or PDF.
Demand inputs you can measure
Start with active users, requests per user per day, and typical tokens per request. For example, 500 users at 1.5 requests per day produce 22,500 requests in a 30‑day month (500 × 1.5 × 30). If each request averages 550 input tokens and 750 output tokens, request traffic alone is roughly 29.3 million tokens (22,500 × 1,300). These values are observable from logs, gateway metrics, or vendor dashboards, so your forecast can be audited and updated monthly.
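The arithmetic in this paragraph can be checked with a few lines of Python. The function name `request_tokens` is hypothetical. It multiplies users, requests per user per day, and days, then applies the per-request token averages.

```python
def request_tokens(users, requests_per_user_per_day, days, avg_in, avg_out):
    """Return (requests, input tokens, output tokens) for request traffic only."""
    requests = users * requests_per_user_per_day * days
    return requests, requests * avg_in, requests * avg_out


requests, tok_in, tok_out = request_tokens(500, 1.5, 30, 550, 750)
print(f"{requests:,.0f} requests, {tok_in + tok_out:,.0f} request tokens")
```

This prints 22,500 requests and 29,250,000 request tokens, which rounds to the 29.3 million quoted above.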
Turning usage into monthly requests
The calculator projects users forward by applying growth, then churn. With 8% growth and 2% churn, 500 users become about 529 the next month (500 × 1.08 × 0.98 ≈ 529.2), then about 559 the month after. Monthly requests scale with the number of days in the month, so February often forecasts fewer requests than January even though it has more users. This matters for quota planning because shorter months can hide ramp risk until a longer month arrives.
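Below is a sketch of the user projection and the day-count effect. It assumes users are rounded down to whole users each month. That assumption reproduces the 529 and 559 figures, but it is not a confirmed detail of the calculator. `project_users` and `monthly_requests` are hypothetical names. `calendar.monthrange` from the Python standard library supplies the number of days in each month.

```python
import calendar
import math


def project_users(start_users, growth, churn, months):
    """Apply growth, then churn, each month; round down to whole users (assumption)."""
    users = [start_users]
    for _ in range(months - 1):
        grown = users[-1] * (1 + growth)
        users.append(math.floor(grown * (1 - churn)))
    return users


def monthly_requests(users, requests_per_user_per_day, year, month):
    days = calendar.monthrange(year, month)[1]  # number of days in that month
    return users * requests_per_user_per_day * days


users = project_users(500, 0.08, 0.02, 3)  # Jan, Feb, Mar 2026
for month, u in zip((1, 2, 3), users):
    print(2026, month, u, monthly_requests(u, 1.5, 2026, month))
```

At 1.5 requests per user per day, February 2026 (28 days) yields 22,218 requests from 529 users. That is fewer than January's 23,250 requests from 500 users.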
Separating input, output, and background load
Request-driven tokens are split into input and output to mirror real billing and latency patterns. Background usage is added separately as embedding tokens per user plus fixed batch tokens per month. For example, 1,200 embedding tokens per user adds 600,000 tokens at 500 users, while a 300,000-token monthly batch job adds a predictable floor. Seasonality adjusts only request traffic, and a single peak-month multiplier models launches or marketing bursts.
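To make the split concrete, here is a hypothetical sketch that computes per-month request multipliers separately from background load. The input shapes are assumptions, not the calculator's documented interface: a `seasonality` map keyed by calendar month and a single `peak_month` given as a `(year, month)` tuple.

```python
def request_multipliers(months, seasonality, peak_month, peak_factor):
    """Per-month multiplier applied to request-driven tokens only.

    months:      list of (year, month) tuples in the forecast horizon
    seasonality: hypothetical {calendar_month: factor} map; missing months use 1.0
    peak_month:  the single (year, month) that receives peak_factor
    """
    factors = []
    for year, month in months:
        factor = seasonality.get(month, 1.0)
        if (year, month) == peak_month:
            factor *= peak_factor
        factors.append(factor)
    return factors


def background_tokens(users, embed_tok_per_user, batch_tok):
    # Added after the multipliers, so seasonality and peak never touch it.
    return users * embed_tok_per_user + batch_tok


horizon = [(2026, 1), (2026, 2), (2026, 3)]
print(request_multipliers(horizon, {2: 0.9, 12: 1.3}, (2026, 3), 1.5))
print(background_tokens(500, 1_200, 300_000))
```

With the paragraph's example values, background load stays at 900,000 tokens in every month. The request-driven tokens, by contrast, are multiplied by each month's factor.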
Cost modeling with blended rates
Many teams route traffic across multiple models. The mix section blends prices using shares that total 100%: each blended rate is the share-weighted average of the per-model rates. If 70% of traffic uses standard rates and 30% uses premium rates, the calculator computes blended input and output prices per million tokens. You can choose whether the safety buffer also applies to cost. In practice, apply the buffer to tokens for capacity planning, and to cost when budgeting conservatively.
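The blending step is a share-weighted average. The sketch below assumes a hypothetical mix format of `(share, input_rate, output_rate)` tuples, with rates in USD per million tokens. The standard and premium prices are made up for illustration. The `buffer_on_cost` flag models the buffer choice described above.

```python
def blended_rates(mix):
    """mix: list of (share, input_rate, output_rate); shares must total 1.0."""
    total_share = sum(share for share, _, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("model shares must total 100%")
    blended_in = sum(share * r_in for share, r_in, _ in mix)
    blended_out = sum(share * r_out for share, _, r_out in mix)
    return blended_in, blended_out


def monthly_cost(input_tok, output_tok, blended_in, blended_out,
                 buffer_pct=0.0, buffer_on_cost=False):
    cost = (input_tok * blended_in + output_tok * blended_out) / 1_000_000
    if buffer_on_cost:
        cost *= 1 + buffer_pct / 100  # conservative budgeting option
    return cost


standard = (1.0, 4.0)   # hypothetical USD per 1M input / output tokens
premium = (5.0, 20.0)   # hypothetical
b_in, b_out = blended_rates([(0.7, *standard), (0.3, *premium)])
print(round(b_in, 2), round(b_out, 2))
print(round(monthly_cost(10_000_000, 10_000_000, b_in, b_out, 10.0, True), 2))
```

With these made-up prices, a 70/30 mix blends to about 2.20 (input) and 8.80 (output) USD per million tokens. Moving to a 50/50 mix raises both rates, to 3.00 and 12.00.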
Scenario review and export workflow
After calculation, review totals, monthly averages, and the peak month identified by buffered tokens. Compare scenarios by changing one driver at a time: growth, output length, or seasonality. Export CSV for full month-by-month analysis in spreadsheets, and export PDF for stakeholder reviews. Treat the forecast as a living plan, updating inputs after each billing cycle to tighten variance. For high-volume apps, track p95 tokens per request so that unusually long responses do not cause you to underestimate usage.
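Once monthly rows exist, the review step reduces to a few aggregates and a CSV export. This sketch uses only the standard `csv` and `io` modules. The row schema (`month`, `total_tokens`, `cost`) and the numbers are hypothetical. The peak month is picked by buffered total tokens, as described above.

```python
import csv
import io


def summarize(rows):
    """Total cost, average monthly cost, and peak month by buffered total tokens."""
    total_cost = sum(r["cost"] for r in rows)
    peak = max(rows, key=lambda r: r["total_tokens"])
    return {
        "total_cost": total_cost,
        "avg_monthly_cost": total_cost / len(rows),
        "peak_month": peak["month"],
    }


def to_csv(rows):
    """Month-by-month CSV text suitable for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["month", "total_tokens", "cost"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [  # hypothetical forecast rows
    {"month": "2026-01", "total_tokens": 30_000_000, "cost": 150.0},
    {"month": "2026-02", "total_tokens": 28_000_000, "cost": 140.0},
    {"month": "2026-03", "total_tokens": 34_000_000, "cost": 170.0},
]
print(summarize(rows))
print(to_csv(rows))
```

To compare scenarios one driver at a time, rebuild `rows` with a single input changed and compare the two summaries.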
FAQs
What is the difference between input and output tokens?
Input tokens cover prompts, system text, and retrieved context. Output tokens are the generated responses. Tracking them separately improves cost accuracy, because prices and lengths often differ between what you send and what the model returns.
How should I choose the safety buffer?
Use historical variance between forecast and actual usage. Many teams start with 10% to 25% for steady traffic, then adjust after two billing cycles. Increase the buffer for launches, unstable prompts, or frequent retries.
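One simple way to turn historical variance into a buffer is to take the largest past overrun of actual usage over forecast, with a floor. This is a hedged heuristic, not a rule from the calculator. The function `suggest_buffer_pct` and its 10% `floor_pct` default, taken from the low end of the range above, are hypothetical.

```python
def suggest_buffer_pct(forecast, actual, floor_pct=10.0):
    """Heuristic buffer: the largest past overrun in percent, never below floor_pct."""
    overruns = [(a / f - 1) * 100 for f, a in zip(forecast, actual) if f > 0]
    worst = max(overruns, default=0.0)
    return max(floor_pct, worst)


# Hypothetical monthly totals (tokens) from past billing cycles.
print(suggest_buffer_pct([100, 100, 100], [104, 118, 97]))
print(suggest_buffer_pct([100, 100], [105, 95]))
```

A single outlier month can dominate this heuristic. A percentile of past overruns is a gentler alternative once several cycles of history exist.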
Why does February sometimes forecast fewer requests?
Requests depend on the number of days in the month. Even with growing users, a 28‑day month can produce fewer total requests than a 31‑day month. Account for this effect when setting monthly quotas and alert thresholds.
How do model shares affect the cost estimate?
Shares blend the input and output prices into a single effective rate. If premium traffic rises from 30% to 50%, blended costs increase even if tokens stay constant. Keep shares aligned with routing rules and product tiers.
Should embedding and batch tokens be treated as input tokens?
Often yes, because they are typically sent to endpoints that return little or no generated text (embedding calls return vectors, not tokens of output). This calculator books them on the input side for budgeting. If your provider bills them differently, adjust the rates or interpret those costs separately.
How do I validate the forecast against real usage?
Compare monthly totals and peak days to your logs or provider reports. Update requests per user, token averages, and background jobs using measured medians and p95 values. Re-run scenarios after each release that changes prompt size or output length.
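Measured medians, p95 values, and a month's forecast error can be computed from logged per-request token counts with Python's `statistics` module (`quantiles` requires Python 3.8 or later). `token_stats` and `forecast_error_pct` are hypothetical helper names, and the sample data is synthetic.

```python
import statistics


def token_stats(samples):
    """Median and p95 of per-request token counts pulled from logs."""
    median = statistics.median(samples)
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20)[-1]
    return median, p95


def forecast_error_pct(forecast_total, actual_total):
    """Positive means actual usage exceeded the forecast."""
    return (actual_total - forecast_total) / forecast_total * 100


output_tokens = list(range(1, 101))  # synthetic per-request output lengths
print(token_stats(output_tokens))
print(forecast_error_pct(33_000_000, 34_650_000))
```

Feeding the p95 value rather than the median into the output-token average gives a more conservative forecast for apps with long-tailed responses.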
Example data table
Illustrative numbers only. Your results will differ based on inputs.
| Month | Users | Requests | Input tokens | Output tokens | Total tokens | Cost |
|---|---|---|---|---|---|---|
| Jan 2026 | 500 | 23,250 | 13,337,500 | 17,437,500 | 33,852,500 | USD 179.10 |
| Feb 2026 | 529 | 22,218 | 13,168,080 | 16,663,500 | 32,814,738 | USD 173.64 |
| Mar 2026 | 559 | 26,020 | 15,311,000 | 19,515,000 | 38,308,600 | USD 203.31 |