Size clusters from traffic, latency, and SLAs. Compare node shapes and set utilization targets quickly. Export results to share plans with your team today.
| Scenario | Peak RPS | CPU ms/req | p95 latency (ms) | Node shape | Buffers | Result nodes |
|---|---|---|---|---|---|---|
| API + cache | 750 | 12 | 180 | 8 vCPU / 32 GB | Safety 20%, Growth 15%, N+1 | 6 |
| Batch + worker | 120 | 55 | 600 | 16 vCPU / 64 GB | Safety 25%, Growth 10% | 3 |
| Stateful store | 350 | 18 | 250 | 8 vCPU / 32 GB / 2 TB | Rep 3×, Retention 14d | 8 |
1) Concurrency (optional auto-estimate)
If concurrency is set to 0, the calculator uses: Concurrency ≈ ceil(RPS × p95LatencySeconds).
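The auto-estimate is an application of Little's law (in-flight requests ≈ arrival rate × time in system). A minimal sketch, with an illustrative function name and inputs taken from the first table row:

```python
import math

def estimate_concurrency(rps: float, p95_latency_ms: float) -> int:
    """Little's law: in-flight requests ≈ arrival rate × time in system."""
    return math.ceil(rps * p95_latency_ms / 1000.0)

# First table row: 750 RPS at 180 ms p95
print(estimate_concurrency(750, 180))  # → 135 in-flight requests
```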
2) CPU cores required
CPUcores = (RPS × CPUms/1000) / (TargetCPU × (1−CPUOverhead))
Then it applies buffers: × (1+Safety) × (1+Growth).
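Steps 2's formula with buffers applied can be sketched as follows. The default values here (60% target CPU, 10% overhead, 20% safety, 15% growth) are illustrative assumptions echoing the first table row, not fixed calculator defaults:

```python
def cpu_cores(rps: float, cpu_ms: float, target_cpu: float = 0.6,
              cpu_overhead: float = 0.1, safety: float = 0.2,
              growth: float = 0.15) -> float:
    """Cores needed at the target utilization, plus safety/growth buffers."""
    # Raw CPU-seconds of work per second, divided by usable capacity per core
    base = (rps * cpu_ms / 1000.0) / (target_cpu * (1 - cpu_overhead))
    return base * (1 + safety) * (1 + growth)

# First table row: 750 RPS × 12 ms/req
print(round(cpu_cores(750, 12), 1))  # → 23.0 cores before rounding to nodes
```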
3) Memory required
InflightGB = (Concurrency × MBperInflight) / 1024
MemGB = (InflightGB + WorkingSetGB) / (TargetMem × (1−MemOverhead))
Then it applies the same buffers.
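A matching sketch for step 3; the utilization target (70%), overhead (15%), per-request memory (4 MB), and working set (8 GB) are assumed example values, not calculator defaults:

```python
def memory_gb(concurrency: int, mb_per_inflight: float, working_set_gb: float,
              target_mem: float = 0.7, mem_overhead: float = 0.15,
              safety: float = 0.2, growth: float = 0.15) -> float:
    """GB needed at the target memory utilization, plus buffers."""
    inflight_gb = concurrency * mb_per_inflight / 1024.0
    base = (inflight_gb + working_set_gb) / (target_mem * (1 - mem_overhead))
    return base * (1 + safety) * (1 + growth)

# 135 in-flight requests × 4 MB each, plus an 8 GB working set
print(round(memory_gb(135, 4, 8), 1))
```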
4) Storage required
StorageGB = IngestPerDay × RetentionDays × (1+StorageOverhead) × Replication, then apply buffers.
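Step 4 in the same form. The replication factor (3×) and 14-day retention echo the stateful-store table row; the ingest rate (50 GB/day) and 30% overhead are made-up inputs for illustration:

```python
def storage_gb(ingest_gb_per_day: float, retention_days: int,
               storage_overhead: float = 0.3, replication: int = 3,
               safety: float = 0.2, growth: float = 0.15) -> float:
    """Raw retained data, inflated for overhead, replication, and buffers."""
    base = ingest_gb_per_day * retention_days * (1 + storage_overhead) * replication
    return base * (1 + safety) * (1 + growth)

# 50 GB/day retained 14 days, replicated 3×
print(round(storage_gb(50, 14), 1))  # → 3767.4 GB
```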
5) Network required
Mbps = RPS × (ReqKB+RespKB) × 8 / 1024, then apply buffers.
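And step 5, converting per-request payloads to megabits per second. The 2 KB request / 16 KB response sizes are assumed example payloads:

```python
def network_mbps(rps: float, req_kb: float, resp_kb: float,
                 safety: float = 0.2, growth: float = 0.15) -> float:
    """KB/s of traffic converted to Mbps (×8 bits, ÷1024), plus buffers."""
    base = rps * (req_kb + resp_kb) * 8 / 1024.0
    return base * (1 + safety) * (1 + growth)

# 750 RPS with 2 KB requests and 16 KB responses
print(round(network_mbps(750, 2, 16), 2))  # → 145.55 Mbps
```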
The recommended node count is the maximum of the node counts required by CPU, memory, storage, and network, with availability rules (such as N+1) applied on top.
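The steps above combine as in the sketch below. The node shape (8 vCPU / 32 GB) matches the first table row, while the per-node storage and network capacities and the N+1 rule shown here are illustrative assumptions:

```python
import math

def recommended_nodes(need_cores: float, need_mem_gb: float,
                      need_storage_gb: float, need_mbps: float,
                      node_cores: float, node_mem_gb: float,
                      node_storage_gb: float, node_mbps: float,
                      n_plus_one: bool = True) -> int:
    """Max over per-dimension node counts; the binding constraint wins."""
    by_dimension = [
        math.ceil(need_cores / node_cores),
        math.ceil(need_mem_gb / node_mem_gb),
        math.ceil(need_storage_gb / node_storage_gb),
        math.ceil(need_mbps / node_mbps),
    ]
    nodes = max(by_dimension)
    return nodes + 1 if n_plus_one else nodes  # N+1 availability rule

# 23 cores / 20 GB / 500 GB / 146 Mbps on 8 vCPU, 32 GB nodes
# (assuming 500 GB disk and 1 Gbps per node): CPU binds at 3 nodes, +1 → 4
print(recommended_nodes(23, 20, 500, 146, 8, 32, 500, 1000))  # → 4
```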
**Is the result production-accurate?**
No. It’s a planning baseline that uses simplified resource models. Validate with load tests, real traces, and production dashboards, then adjust node shape, overhead, and buffers accordingly.

**Why does latency matter for sizing?**
Concurrency drives inflight memory and can expose contention. If latency rises under load, concurrency increases, which raises memory and CPU needs. Using p95 latency helps capture typical peak behavior.

**Where should the CPU ms/req value come from?**
Use measured on-CPU time from profiling or distributed tracing at steady load. Avoid local dev numbers. If workloads vary, use a weighted average or size for the most expensive critical endpoints.

**What should the overhead settings include?**
Include OS, agents, sidecars, runtime services, and reserved capacity. In many clusters, 5–20% CPU and 10–25% memory is common, but measure your baseline node usage to be sure.

**Why target utilization below 100%?**
Headroom reduces tail latency and helps absorb bursts, rebalancing, and background work. Running near 100% risks queueing, retries, and cascading failures during peak or partial outages.

**How can I reduce the storage requirement?**
Lower retention, reduce replication, enable compression, tier cold data, or shrink payloads. Also review overhead assumptions for indexing and compaction; stateful systems often require extra free space for stability.

**Why add an N+1 node?**
It’s a simple, conservative rule for small to mid clusters. For larger fleets, you might model a percentage reserve instead. Use your SLOs and failure history to pick an availability strategy.

**What does the calculator leave out?**
Consider autoscaling behavior, bin packing efficiency, pod limits, disk IOPS, shard counts, cache hit rates, and maintenance windows. For multi-region systems, include cross-region replication bandwidth and failover traffic.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.