Size clusters from traffic, latency, and SLAs. Compare node shapes and set utilization targets quickly. Export results to share plans with your team today.
| Scenario | Peak RPS | CPU ms/req | p95 latency (ms) | Node shape | Buffers | Result nodes |
|---|---|---|---|---|---|---|
| API + cache | 750 | 12 | 180 | 8 vCPU / 32 GB | Safety 20%, Growth 15%, N+1 | 6 |
| Batch + worker | 120 | 55 | 600 | 16 vCPU / 64 GB | Safety 25%, Growth 10% | 3 |
| Stateful store | 350 | 18 | 250 | 8 vCPU / 32 GB / 2 TB | Rep 3×, Retention 14d | 8 |
1) Concurrency (optional auto-estimate)
If concurrency is set to 0, the calculator uses: Concurrency ≈ ceil(RPS × p95LatencySeconds).
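The auto-estimate is an application of Little's law (in-flight requests ≈ arrival rate × time in system). A minimal sketch, with an illustrative function name and inputs taken from the first table row:

```python
import math

def estimate_concurrency(rps: float, p95_latency_ms: float) -> int:
    """Little's law: in-flight requests ≈ arrival rate × time in system."""
    return math.ceil(rps * p95_latency_ms / 1000.0)

# First table row: 750 RPS at 180 ms p95
print(estimate_concurrency(750, 180))  # → 135 in-flight requests
```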
2) CPU cores required
CPUcores = (RPS × CPUms/1000) / (TargetCPU × (1−CPUOverhead))
Then it applies buffers: × (1+Safety) × (1+Growth).
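Steps 2's formula with buffers applied can be sketched as follows. The default values here (60% target CPU, 10% overhead, 20% safety, 15% growth) are illustrative assumptions echoing the first table row, not fixed calculator defaults:

```python
def cpu_cores(rps: float, cpu_ms: float, target_cpu: float = 0.6,
              cpu_overhead: float = 0.1, safety: float = 0.2,
              growth: float = 0.15) -> float:
    """Cores needed at the target utilization, plus safety/growth buffers."""
    # Raw CPU-seconds of work per second, divided by usable capacity per core
    base = (rps * cpu_ms / 1000.0) / (target_cpu * (1 - cpu_overhead))
    return base * (1 + safety) * (1 + growth)

# First table row: 750 RPS × 12 ms/req
print(round(cpu_cores(750, 12), 1))  # → 23.0 cores before rounding to nodes
```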
3) Memory required
InflightGB = (Concurrency × MBperInflight) / 1024
MemGB = (InflightGB + WorkingSetGB) / (TargetMem × (1−MemOverhead))
Then it applies the same buffers.
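A matching sketch for step 3; the utilization target (70%), overhead (15%), per-request memory (4 MB), and working set (8 GB) are assumed example values, not calculator defaults:

```python
def memory_gb(concurrency: int, mb_per_inflight: float, working_set_gb: float,
              target_mem: float = 0.7, mem_overhead: float = 0.15,
              safety: float = 0.2, growth: float = 0.15) -> float:
    """GB needed at the target memory utilization, plus buffers."""
    inflight_gb = concurrency * mb_per_inflight / 1024.0
    base = (inflight_gb + working_set_gb) / (target_mem * (1 - mem_overhead))
    return base * (1 + safety) * (1 + growth)

# 135 in-flight requests × 4 MB each, plus an 8 GB working set
print(round(memory_gb(135, 4, 8), 1))
```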
4) Storage required
StorageGB = IngestPerDay × RetentionDays × (1+StorageOverhead) × Replication, then apply buffers.
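Step 4 in the same form. The replication factor (3×) and 14-day retention echo the stateful-store table row; the ingest rate (50 GB/day) and 30% overhead are made-up inputs for illustration:

```python
def storage_gb(ingest_gb_per_day: float, retention_days: int,
               storage_overhead: float = 0.3, replication: int = 3,
               safety: float = 0.2, growth: float = 0.15) -> float:
    """Raw retained data, inflated for overhead, replication, and buffers."""
    base = ingest_gb_per_day * retention_days * (1 + storage_overhead) * replication
    return base * (1 + safety) * (1 + growth)

# 50 GB/day retained 14 days, replicated 3×
print(round(storage_gb(50, 14), 1))  # → 3767.4 GB
```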
5) Network required
Mbps = RPS × (ReqKB+RespKB) × 8 / 1024, then apply buffers.
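And step 5, converting per-request payloads to megabits per second. The 2 KB request / 16 KB response sizes are assumed example payloads:

```python
def network_mbps(rps: float, req_kb: float, resp_kb: float,
                 safety: float = 0.2, growth: float = 0.15) -> float:
    """KB/s of traffic converted to Mbps (×8 bits, ÷1024), plus buffers."""
    base = rps * (req_kb + resp_kb) * 8 / 1024.0
    return base * (1 + safety) * (1 + growth)

# 750 RPS with 2 KB requests and 16 KB responses
print(round(network_mbps(750, 2, 16), 2))  # → 145.55 Mbps
```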
The recommended node count is the maximum of the node counts required by CPU, memory, storage, and network, with availability rules (such as N+1) applied on top.
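The steps above combine as in the sketch below. The node shape (8 vCPU / 32 GB) matches the first table row, while the per-node storage and network capacities and the N+1 rule shown here are illustrative assumptions:

```python
import math

def recommended_nodes(need_cores: float, need_mem_gb: float,
                      need_storage_gb: float, need_mbps: float,
                      node_cores: float, node_mem_gb: float,
                      node_storage_gb: float, node_mbps: float,
                      n_plus_one: bool = True) -> int:
    """Max over per-dimension node counts; the binding constraint wins."""
    by_dimension = [
        math.ceil(need_cores / node_cores),
        math.ceil(need_mem_gb / node_mem_gb),
        math.ceil(need_storage_gb / node_storage_gb),
        math.ceil(need_mbps / node_mbps),
    ]
    nodes = max(by_dimension)
    return nodes + 1 if n_plus_one else nodes  # N+1 availability rule

# 23 cores / 20 GB / 500 GB / 146 Mbps on 8 vCPU, 32 GB nodes
# (assuming 500 GB disk and 1 Gbps per node): CPU binds at 3 nodes, +1 → 4
print(recommended_nodes(23, 20, 500, 146, 8, 32, 500, 1000))  # → 4
```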
**Is the result production-accurate?**
No. It’s a planning baseline that uses simplified resource models. Validate with load tests, real traces, and production dashboards, then adjust node shape, overhead, and buffers accordingly.

**Why does latency matter for sizing?**
Concurrency drives inflight memory and can expose contention. If latency rises under load, concurrency increases, which raises memory and CPU needs. Using p95 latency helps capture typical peak behavior.

**Where should the CPU ms/req value come from?**
Use measured on-CPU time from profiling or distributed tracing at steady load. Avoid local dev numbers. If workloads vary, use a weighted average or size for the most expensive critical endpoints.

**What should the overhead settings include?**
Include OS, agents, sidecars, runtime services, and reserved capacity. In many clusters, 5–20% CPU and 10–25% memory is common, but measure your baseline node usage to be sure.

**Why target utilization below 100%?**
Headroom reduces tail latency and helps absorb bursts, rebalancing, and background work. Running near 100% risks queueing, retries, and cascading failures during peak or partial outages.

**How can I reduce the storage requirement?**
Lower retention, reduce replication, enable compression, tier cold data, or shrink payloads. Also review overhead assumptions for indexing and compaction; stateful systems often require extra free space for stability.

**Why add an N+1 node?**
It’s a simple, conservative rule for small to mid clusters. For larger fleets, you might model a percentage reserve instead. Use your SLOs and failure history to pick an availability strategy.

**What does the calculator leave out?**
Consider autoscaling behavior, bin packing efficiency, pod limits, disk IOPS, shard counts, cache hit rates, and maintenance windows. For multi-region systems, include cross-region replication bandwidth and failover traffic.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.