Monthly Token Forecast Calculator

Inputs

Tune growth, usage, token mix, and pricing to forecast demand.

Scenario name

Used in exports and result header.

Start month

Forecast begins on this month.

Forecast months

Choose 1–24 months.

Starting active users

Users in the first month.

Monthly user growth %

Applied before churn each month.

Monthly churn %

Percentage of users lost each month.

Requests per user per day

Average daily requests per active user.

Avg input tokens per request

Prompt + context tokens on average.

Avg output tokens per request

Completion tokens on average.

Embedding tokens per user per month

Search, indexing, or retrieval usage.

Batch tokens per month

Offline jobs, evals, fine-tuning prep, etc.

Seasonality % (requests)

Applies to request-driven tokens.

Peak multiplier

Extra load for one peak month.

Peak month index

0 = none, 1 = first month, etc.

Safety buffer %

Applied to total tokens each month.

Apply buffer to cost estimate

Use buffered tokens for billing projection.

Pricing and mix

Enter your blended rates and model share. Shares must total 100%.

Standard share %

Premium share %

Currency (3 letters)

Example: USD, EUR, PKR.

Standard input price per 1M

Applied to input-side tokens.

Standard output price per 1M

Premium input price per 1M

Premium output price per 1M

Formula used

This calculator separates request-driven tokens from background usage, then applies seasonality, peak load, and a safety buffer.

Requests_m = Users_m × ReqPerUserPerDay × DaysInMonth
InReqTok_m = Requests_m × AvgInputTok × Seasonality × Peak
OutReqTok_m = Requests_m × AvgOutputTok × Seasonality × Peak
InputTok_m = InReqTok_m + (Users_m × EmbedTokPerUser) + BatchTok
TotalTok_m = (InputTok_m + OutReqTok_m) × (1 + Buffer%)
Cost_m = (InputTok_m × BlendedIn + OutReqTok_m × BlendedOut) / 1,000,000

Users evolve monthly using growth then churn.
Seasonality and Peak adjust only request-driven tokens.
Buffer adds headroom for bursts, retries, and variance.
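
The formulas above can be sketched in Python as follows. This is a minimal illustration rather than the calculator's actual code: the names MonthInputs and forecast_month are hypothetical, Seasonality and Peak are taken as ready-made multipliers (1.0 means no adjustment), and the cost line uses unbuffered tokens, as in the formula.

```python
from dataclasses import dataclass


@dataclass
class MonthInputs:
    users: float                 # Users_m
    req_per_user_per_day: float  # ReqPerUserPerDay
    days_in_month: int           # DaysInMonth
    avg_input_tok: float         # AvgInputTok
    avg_output_tok: float        # AvgOutputTok
    embed_tok_per_user: float    # EmbedTokPerUser
    batch_tok: float             # BatchTok
    seasonality: float = 1.0     # assumed to be a multiplier, 1.0 = neutral
    peak: float = 1.0            # assumed to be a multiplier, 1.0 = no peak
    buffer_pct: float = 0.0      # Buffer%


def forecast_month(m: MonthInputs, blended_in: float, blended_out: float) -> dict:
    """Apply the listed formulas to one month; prices are per 1M tokens."""
    requests = m.users * m.req_per_user_per_day * m.days_in_month
    in_req_tok = requests * m.avg_input_tok * m.seasonality * m.peak
    out_req_tok = requests * m.avg_output_tok * m.seasonality * m.peak
    # Background usage (embeddings and batch) is booked on the input side.
    input_tok = in_req_tok + m.users * m.embed_tok_per_user + m.batch_tok
    total_tok = (input_tok + out_req_tok) * (1 + m.buffer_pct / 100)
    cost = (input_tok * blended_in + out_req_tok * blended_out) / 1_000_000
    return {
        "requests": requests,
        "input_tok": input_tok,
        "output_tok": out_req_tok,
        "total_tok": total_tok,
        "cost": cost,
    }
```

Calling forecast_month once per month, with Users_m updated by growth and churn between calls, would reproduce the month-by-month structure of the forecast.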

How to use this calculator

Set your start month and forecast horizon.
Enter users, growth, and churn to model adoption.
Define requests per user and typical input/output tokens.
Add embedding and batch usage for background demand.
Use seasonality for predictable swings and a peak month for launches.
Apply a safety buffer for operational headroom.
Fill in mix and rates to estimate monthly and total spend.
Press Calculate forecast, then export CSV or PDF.

Demand inputs you can measure

Start with active users, daily requests, and typical tokens per request. For example, 500 users at 1.5 requests per day produce about 22,500 requests in a 30‑day month. If each request averages 550 input tokens and 750 output tokens, request traffic alone is roughly 29.3 million tokens. These values are observable from logs, gateway metrics, or vendor dashboards, so your forecast can be audited and updated monthly.
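
As a rough sketch of how these inputs might be derived from logs, the snippet below summarizes a small set of request records. The record layout (user id, day, input tokens, output tokens) and the function name summarize are assumptions for illustration; real gateway or vendor exports will look different.

```python
from statistics import mean

# Hypothetical log records: (user_id, day, input_tokens, output_tokens)
logs = [
    ("u1", "2026-01-01", 500, 700),
    ("u1", "2026-01-01", 600, 800),
    ("u2", "2026-01-01", 550, 750),
    ("u1", "2026-01-02", 550, 750),
]


def summarize(records):
    """Derive the calculator's measurable demand inputs from raw records."""
    users = {r[0] for r in records}
    days = {r[1] for r in records}
    return {
        "active_users": len(users),
        # Average daily requests per active user over the observed days.
        "req_per_user_per_day": len(records) / (len(users) * len(days)),
        "avg_input_tok": mean(r[2] for r in records),
        "avg_output_tok": mean(r[3] for r in records),
    }


summary = summarize(logs)
```

For this sample, the summary yields 2 active users, 1.0 requests per user per day, and averages of 550 input and 750 output tokens.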

Turning usage into monthly requests

The calculator projects users forward by applying growth, then churn. With 8% growth and 2% churn, 500 users become about 529 users next month, then about 559 the month after. Monthly requests scale with the number of days in the month, so February often forecasts fewer requests than January even though it has more users. This matters for quota planning because shorter months can hide ramp risk until a longer month arrives.
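
A minimal sketch of the user projection and the days-in-month effect, using Python's standard calendar module. The function names are hypothetical, and rounding down to whole users each month is an assumption chosen because it reproduces the 529 and 559 figures; the calculator's own rounding rule is not documented here.

```python
import calendar
import math


def project_users(start_users, growth_pct, churn_pct, months):
    """Apply growth, then churn, each month; floor to whole users (assumed)."""
    users = [start_users]
    for _ in range(months - 1):
        grown = users[-1] * (1 + growth_pct / 100)
        users.append(math.floor(grown * (1 - churn_pct / 100)))
    return users


def monthly_requests(users, req_per_user_per_day, year, month):
    """Requests_m = Users_m x ReqPerUserPerDay x DaysInMonth."""
    days = calendar.monthrange(year, month)[1]
    return users * req_per_user_per_day * days


users = project_users(500, 8, 2, 3)           # [500, 529, 559]
jan = monthly_requests(users[0], 1.5, 2026, 1)  # 31 days
feb = monthly_requests(users[1], 1.5, 2026, 2)  # 28 days
```

Here January yields 23,250 requests and February 22,218, so the shorter month produces fewer requests despite the larger user base.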

Separating input, output, and background load

Request-driven tokens are split into input and output to mirror real billing and latency patterns. Background usage is added separately as embedding tokens per user plus fixed batch tokens per month. Example: 1,200 embedding tokens per user adds 600,000 tokens at 500 users, while a 300,000-token batch job adds a predictable floor. Seasonality adjusts only request traffic, and a single peak month multiplier models launches or marketing bursts.
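
The split between adjusted request traffic and unadjusted background load might look like the sketch below. The function names are hypothetical, the seasonality value is assumed to be a multiplier, and the peak index follows the input convention of 0 = none, 1 = first month.

```python
def peak_factor(month_index, peak_month_index, peak_multiplier):
    """Return the peak multiplier for a 1-based month; index 0 disables it."""
    if peak_month_index != 0 and month_index == peak_month_index:
        return peak_multiplier
    return 1.0


def split_tokens(request_tok, users, embed_tok_per_user, batch_tok,
                 seasonality, peak):
    """Adjust only request-driven tokens; background load is left as-is."""
    adjusted_request_tok = request_tok * seasonality * peak
    background_tok = users * embed_tok_per_user + batch_tok
    return adjusted_request_tok, background_tok


# Month 2 is the peak month with a hypothetical 1.5x multiplier.
peak = peak_factor(month_index=2, peak_month_index=2, peak_multiplier=1.5)
request_tok, background_tok = split_tokens(
    request_tok=29_250_000, users=500, embed_tok_per_user=1200,
    batch_tok=300_000, seasonality=1.0, peak=peak,
)
```

With these inputs the background load stays at 900,000 tokens while request traffic scales to 43,875,000, which shows why a peak month moves the request side only.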

Cost modeling with blended rates

Many teams route traffic across multiple models. The mix section blends prices using shares that total 100%. If 70% uses standard rates and 30% uses premium rates, the calculator computes blended input and output costs per million tokens. You can decide whether the safety buffer impacts cost. Operationally, apply the buffer to tokens for capacity, and apply it to cost when budgeting conservatively.
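
A blended rate is a share-weighted average of the standard and premium prices. The sketch below assumes two tiers and uses made-up prices per 1M tokens; the function name blended_rate is hypothetical.

```python
def blended_rate(standard_price, premium_price,
                 standard_share_pct, premium_share_pct):
    """Share-weighted price per 1M tokens; shares must total 100%."""
    if abs(standard_share_pct + premium_share_pct - 100) > 1e-9:
        raise ValueError("Standard and premium shares must total 100%")
    weighted = (standard_price * standard_share_pct
                + premium_price * premium_share_pct)
    return weighted / 100


# Hypothetical input prices per 1M tokens: standard 1.00, premium 5.00.
mix_70_30 = blended_rate(1.00, 5.00, 70, 30)
mix_50_50 = blended_rate(1.00, 5.00, 50, 50)
```

With these illustrative prices, a 70/30 mix blends to about 2.20 per 1M input tokens and a 50/50 mix to about 3.00, so shifting traffic toward premium raises cost even when token volume is unchanged. The same function would be applied separately to output prices.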

Scenario review and export workflow

After calculation, review totals, monthly averages, and the peak month identified by buffered tokens. Compare scenarios by changing one driver at a time: growth, output length, or seasonality. Export CSV for full month-by-month analysis in spreadsheets, and export PDF for stakeholder reviews. Treat the forecast as a living plan, updating inputs after each billing cycle to tighten variance. For high-volume apps, track p95 tokens per request so that long responses are not underestimated.

FAQs

What is the difference between input and output tokens?

Input tokens cover prompts, system text, and retrieved context. Output tokens are the generated responses. Tracking them separately improves cost accuracy, because prices and lengths often differ between what you send and what the model returns.

How should I choose the safety buffer?

Use historical variance between forecast and actual usage. Many teams start with 10% to 25% for steady traffic, then adjust after two billing cycles. Increase the buffer for launches, unstable prompts, or frequent retries.
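
One possible heuristic, offered as an assumption rather than the calculator's method, is to size the buffer from the largest historical overrun of actual usage over forecast. The function name and the sample figures are hypothetical.

```python
def suggested_buffer_pct(forecast, actual):
    """Largest percentage by which actual tokens exceeded the forecast."""
    if not forecast or len(forecast) != len(actual):
        raise ValueError("Provide matching, non-empty forecast and actual lists")
    overruns = [(a - f) / f * 100 for f, a in zip(forecast, actual)]
    # Never suggest a negative buffer when every month came in under forecast.
    return max(0.0, max(overruns))


# Two hypothetical billing cycles of forecast vs. actual total tokens.
buffer_pct = suggested_buffer_pct(
    forecast=[30_000_000, 32_000_000],
    actual=[33_000_000, 31_000_000],
)
```

In this sample the first cycle ran about 10% over forecast, so the suggested buffer is about 10%. Teams expecting launches or frequent retries may want to add margin on top of this figure.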

Why does February sometimes forecast fewer requests?

Requests depend on the number of days in the month. Even with growing users, a 28‑day month can produce fewer total requests than a 31‑day month. Account for this effect when setting monthly quotas and alert thresholds.

How do model shares affect the cost estimate?

Shares blend the standard and premium prices into one effective input rate and one effective output rate. If premium traffic rises from 30% to 50%, blended costs increase even if tokens stay constant. Keep shares aligned with routing rules and product tiers.

Should embedding and batch tokens be treated as input tokens?

Often yes, because these workloads mostly send text to an endpoint and return little or no generated text. This calculator books them on the input side for budgeting. If your provider bills them differently, adjust the rates or interpret costs separately.

How do I validate the forecast against real usage?

Compare monthly totals and peak days to your logs or provider reports. Update requests per user, token averages, and background jobs using measured medians and p95 values. Re-run scenarios after each release that changes prompt size or output length.
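
Median and p95 tokens per request can be computed from logged samples with a short helper. This sketch uses the nearest-rank percentile definition, which is one of several conventions; the function name and sample data are hypothetical.

```python
from statistics import median


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value covering pct% of samples."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical output-token counts for 100 logged requests.
samples = list(range(1, 101))
typical = median(samples)
p95 = percentile_nearest_rank(samples, 95)
```

A p95 well above the median suggests a long tail of responses, in which case forecasting from the average alone is likely to understate output tokens.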

Example data table

Illustrative numbers only. Your results will differ based on inputs.

Month	Users	Requests	Input tokens	Output tokens	Total tokens	Cost
Jan 2026	500	23,250	13,337,500	17,437,500	33,852,500	USD 179.10
Feb 2026	529	22,218	13,168,080	16,663,500	32,814,738	USD 173.64
Mar 2026	559	26,020	15,311,000	19,515,000	38,307,600	USD 203.31

Example totals assume a 10% buffer and blended rates.
