This sample shows how inputs influence outputs across typical usage levels.
| Scenario | Requests | Duration (ms) | Memory (MB) | GB-s (est.) | Notes |
|---|---|---|---|---|---|
| API light | 250,000 | 120 | 256 | 7,500 | Small payloads, short handlers |
| API steady | 2,000,000 | 250 | 512 | 250,000 | Common backend workload |
| Batch burst | 10,000,000 | 900 | 1024 | 9,000,000 | ETL bursts, higher memory usage |
- Effective Requests = Requests × Concurrency/Retry Factor
- GB-seconds = (MemoryMB ÷ 1024) × (DurationMs ÷ 1000) × Effective Requests
- Billable Requests = max(0, Effective Requests − Free Requests)
- Billable GB-seconds = max(0, GB-seconds − Free GB-seconds)
- Request Cost = (Billable Requests ÷ 1,000,000) × Price per Million
- Compute Cost = Billable GB-seconds × Price per GB-second
- Extras = (EgressGB × Rate) + (StorageGB-month × Rate) + (LogsGB × Rate)
- Subtotal = Request Cost + Compute Cost + Extras
- Overhead = Subtotal × Overhead%
- Tax = (Subtotal + Overhead) × Tax%
- Grand Total = Subtotal + Overhead + Tax
- Enter monthly requests, average duration, and allocated memory.
- Set request and compute rates to match your pricing sheet.
- Fill free-tier allowances to estimate real billable usage.
- Add optional egress, storage, and logs if they apply.
- Use overhead and tax fields for a more complete budget view.
- Press Calculate to see totals, breakdown, and chart above.
- Download CSV or PDF to share assumptions with stakeholders.
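The formulas listed above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual implementation: the `CostInputs` and `estimate` names are hypothetical, extras are passed in as one pre-summed amount, percentages are assumed to be entered as whole numbers (8 means 8%), and the 0.0000166667 per GB-second rate is a placeholder to replace with the figure from your own pricing sheet.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Hypothetical container for the calculator's input fields."""
    requests: float
    duration_ms: float
    memory_mb: float
    retry_factor: float = 1.0
    free_requests: float = 0.0
    free_gb_seconds: float = 0.0
    price_per_million: float = 0.0
    price_per_gb_second: float = 0.0
    extras: float = 0.0        # egress + storage + logs, already priced
    overhead_pct: float = 0.0  # assumed to be entered as 8 for 8%
    tax_pct: float = 0.0


def estimate(i: CostInputs) -> dict:
    """Apply the listed formulas in order and return each intermediate value."""
    effective = i.requests * i.retry_factor
    gb_seconds = (i.memory_mb / 1024) * (i.duration_ms / 1000) * effective
    billable_requests = max(0.0, effective - i.free_requests)
    billable_gb_seconds = max(0.0, gb_seconds - i.free_gb_seconds)
    request_cost = billable_requests / 1_000_000 * i.price_per_million
    compute_cost = billable_gb_seconds * i.price_per_gb_second
    subtotal = request_cost + compute_cost + i.extras
    overhead = subtotal * i.overhead_pct / 100
    tax = (subtotal + overhead) * i.tax_pct / 100
    return {
        "effective_requests": effective,
        "gb_seconds": gb_seconds,
        "billable_requests": billable_requests,
        "billable_gb_seconds": billable_gb_seconds,
        "request_cost": request_cost,
        "compute_cost": compute_cost,
        "subtotal": subtotal,
        "overhead": overhead,
        "tax": tax,
        "grand_total": subtotal + overhead + tax,
    }


# "API steady" row from the sample table, with illustrative rates and allowances.
steady = estimate(CostInputs(
    requests=2_000_000, duration_ms=250, memory_mb=512,
    free_requests=1_000_000, free_gb_seconds=400_000,
    price_per_million=0.20, price_per_gb_second=0.0000166667,
))
print(steady["gb_seconds"], steady["billable_gb_seconds"], round(steady["grand_total"], 2))
```

With these inputs the 250,000 GB-seconds fall under the assumed allowance, so only the request charge on the 1,000,000 billable requests remains.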
Cost drivers and why they matter
Serverless spend tends to cluster around two levers: execution volume and memory-time. A workload with 2,000,000 monthly requests, 512 MB memory, and 250 ms duration produces roughly 250,000 GB-seconds. Because cost is linear in billable usage, a 15% traffic lift means about a 15% lift in request and compute charges when free-tier allowances are negligible; with a sizable allowance, the billable portion grows faster than traffic. Optimizing runtime offsets the increase. Use the cost-per-million figure to benchmark services across teams and prioritize optimization work during capacity planning cycles.
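To see how a 15% traffic lift flows through to the bill, the sketch below compares the same workload with and without free-tier allowances. The `monthly_bill` helper, the 0.0000166667 per GB-second rate, and the 100,000 GB-second allowance are illustrative assumptions, not values taken from the calculator.

```python
def monthly_bill(requests, free_requests=0, free_gb_seconds=0):
    """Request plus compute cost for a 512 MB, 250 ms workload."""
    price_per_million = 0.20            # example request rate
    price_per_gb_second = 0.0000166667  # placeholder compute rate (assumption)
    gb_seconds = (512 / 1024) * (250 / 1000) * requests
    billable_requests = max(0, requests - free_requests)
    billable_gb_seconds = max(0, gb_seconds - free_gb_seconds)
    return (billable_requests / 1_000_000 * price_per_million
            + billable_gb_seconds * price_per_gb_second)


base = 2_000_000
lifted = base * 1.15  # 15% more traffic

# Ratio of the lifted bill to the base bill, without and with allowances.
lift_without_allowance = monthly_bill(lifted) / monthly_bill(base)
lift_with_allowance = (monthly_bill(lifted, 1_000_000, 100_000)
                       / monthly_bill(base, 1_000_000, 100_000))

print(f"no free tier:   {lift_without_allowance:.3f}x")
print(f"with free tier: {lift_with_allowance:.3f}x")
```

Without allowances the bill scales by the same 1.15 as traffic. With them, the fixed allowance shields a smaller share of the larger workload, so the billable amount rises by more than 15%.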
Compute usage as GB-seconds
The calculator converts memory and duration into GB-seconds using (MB/1024)×(ms/1000)×effective requests. Doubling memory from 512 MB to 1024 MB doubles compute usage, even if requests stay fixed. Cutting duration from 250 ms to 200 ms reduces compute by 20%. For teams tracking p95 latency, test a “high-load” duration, like 600 ms, to estimate worst-month exposure.
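The sensitivity described here can be checked with a few lines of Python. This is a sketch of the stated formula only; the `gb_seconds` function name and the scenario labels are hypothetical.

```python
def gb_seconds(memory_mb, duration_ms, effective_requests):
    """(MB / 1024) x (ms / 1000) x effective requests."""
    return (memory_mb / 1024) * (duration_ms / 1000) * effective_requests


REQUESTS = 2_000_000
baseline = gb_seconds(512, 250, REQUESTS)

scenarios = {
    "baseline (512 MB, 250 ms)": baseline,
    "double memory (1024 MB)": gb_seconds(1024, 250, REQUESTS),
    "faster handler (200 ms)": gb_seconds(512, 200, REQUESTS),
    "high-load duration (600 ms)": gb_seconds(512, 600, REQUESTS),
}

# Print each scenario next to its size relative to the baseline.
for name, value in scenarios.items():
    print(f"{name}: {value:,.0f} GB-s ({value / baseline:.0%} of baseline)")
```

Because every term is multiplied, each input scales compute usage proportionally: the 600 ms case lands at 2.4 times the baseline.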
Request charges and free tiers
Request pricing is commonly quoted per million invocations. At 0.20 per million, 3,000,000 billable requests cost 0.60. A 1,000,000 request allowance can eliminate request charges for small APIs, but compute can still be billable. Likewise, a 400,000 GB-second allowance may cover dev environments, while production quickly exceeds it.
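A short sketch of how allowances interact with the two charge types, using the rates and allowances quoted above. The `billable` and `request_cost` helpers are hypothetical names, and the low-traffic, memory-heavy API (900,000 requests at 1024 MB and 600 ms) is an invented example.

```python
def billable(usage, allowance):
    """Usage left over after the free allowance, never below zero."""
    return max(0, usage - allowance)


def request_cost(requests, free_requests, price_per_million):
    return billable(requests, free_requests) / 1_000_000 * price_per_million


# 4,000,000 requests with a 1,000,000 allowance leave 3,000,000 billable.
print(round(request_cost(4_000_000, 1_000_000, 0.20), 2))

# Hypothetical low-traffic API: under the request allowance, over the compute one.
small_requests = 900_000
small_gb_seconds = (1024 / 1024) * (600 / 1000) * small_requests
small_request_cost = request_cost(small_requests, 1_000_000, 0.20)
small_billable_gb_seconds = billable(small_gb_seconds, 400_000)

print(small_request_cost, round(small_billable_gb_seconds))
```

The second case pays nothing for requests yet still has about 140,000 billable GB-seconds, which is why the two allowances should be checked separately.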
Retries, fan-out, and effective requests
Modern architectures amplify invocations through retries, queues, and map-style fan-out. The concurrency/retry factor models this as a multiplier. A factor of 1.10 represents about 10% extra executions; 1.30 can reflect aggressive parallelism or noisy dependencies. That uplift applies to both request and compute charges. Pair this with error budgets and idempotency to reduce wasted work.
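The multiplier can be illustrated with the 2,000,000-request, 512 MB, 250 ms workload used earlier. This is a minimal sketch; `with_uplift` is a hypothetical helper, not part of the calculator.

```python
def with_uplift(requests, factor, memory_mb=512, duration_ms=250):
    """Return (effective requests, GB-seconds) after applying the factor."""
    effective = requests * factor
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * effective
    return effective, gb_seconds


# Compare no uplift, mild retries, and aggressive fan-out.
for factor in (1.00, 1.10, 1.30):
    effective, gb_s = with_uplift(2_000_000, factor)
    print(f"factor {factor:.2f}: {effective:,.0f} executions, {gb_s:,.0f} GB-s")
```

Both columns grow by the same percentage, so a 1.30 factor raises request and compute usage by 30% each before free tiers are applied.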
Network, storage, and observability extras
Extras often surprise budgets because they scale with data, not invocations. For example, 500 GB of egress at 0.09 per GB adds 45.00 monthly. Storing 2,000 GB-month at 0.023 adds 46.00. Logging 50 GB at 0.50 adds 25.00. If your system emits 5 KB of logs per request, 2,000,000 requests yield about 10 GB of logs, before compression and sampling.
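The extras arithmetic above can be reproduced as follows. The function names are hypothetical, and the log-volume helper assumes decimal units (1 GB = 1,000,000 KB), which is why the result is "about" 10 GB.

```python
def extras_cost(egress_gb, egress_rate,
                storage_gb_month, storage_rate,
                logs_gb, logs_rate):
    """Extras = egress + storage + logs, each priced per GB."""
    return (egress_gb * egress_rate
            + storage_gb_month * storage_rate
            + logs_gb * logs_rate)


def log_volume_gb(requests, kb_per_request):
    """Estimated log volume, assuming 1 GB = 1,000,000 KB."""
    return requests * kb_per_request / 1_000_000


total_extras = extras_cost(500, 0.09, 2_000, 0.023, 50, 0.50)
print(round(total_extras, 2))        # 45.00 + 46.00 + 25.00
print(log_volume_gb(2_000_000, 5))   # GB of logs before compression
```

Estimating log volume from request counts first makes it easier to fill the logs field when no measured figure is available.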
Budgeting with overhead and tax
Finance reviews often require all-in numbers, not just provider line items. Overhead can represent tooling subscriptions, on-call coverage, and reliability work. On a 300 subtotal, 8% overhead adds 24.00. Applying 16% tax on subtotal plus overhead adds 51.84, producing a 375.84 total. Re-run the model after each release to compare cost per million executions and keep a stable monthly baseline.
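The overhead and tax steps can be sketched as below, using the same 300 subtotal. The `all_in` name is hypothetical, and percentages are assumed to be whole numbers (8 for 8%).

```python
def all_in(subtotal, overhead_pct, tax_pct):
    """Return (overhead, tax, grand total); tax applies to subtotal plus overhead."""
    overhead = subtotal * overhead_pct / 100
    tax = (subtotal + overhead) * tax_pct / 100
    return overhead, tax, subtotal + overhead + tax


overhead, tax, grand_total = all_in(300, 8, 16)
print(round(overhead, 2), round(tax, 2), round(grand_total, 2))
```

Note the order of operations: because tax is applied after overhead, the two percentages compound rather than simply adding up to 24%.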
What does GB-seconds represent here?
It is memory-time usage. The calculator multiplies memory (GB) by runtime (seconds) for each execution, then sums across effective requests. The free GB-second allowance is subtracted from that total to get billable compute.
Why include a concurrency/retry factor?
Retries, fan-out, and duplicate events increase executions beyond raw traffic. The factor scales requests and GB-seconds together, giving a practical estimate when you only know average uplift.
How do I match a specific provider’s pricing?
Replace the request, GB-second, egress, storage, and logs rates with your provider's published prices, and set the free-tier allowances to the correct monthly values.
Should I use average duration or p95 duration?
Use average for expected spend and p95 for risk planning. Running both shows the sensitivity of compute cost to latency spikes during peak load.
What costs are not covered?
This model targets common serverless line items: requests, compute, egress, storage, and logs. Managed databases, API gateways, queues, build pipelines, and third-party monitoring may add separate charges you should budget outside this estimate.
How accurate is the estimate?
Accuracy depends on input quality. With measured requests, memory, and duration, it can be close for compute and request lines. Validate monthly by exporting the report and comparing to invoices.