Estimator Inputs
Enter realistic averages from monitoring, load tests, or profiling. Then compare capacity and bottlenecks before scaling.
Example Data Table
These example scenarios show how workload costs shape capacity. Use them as starting points, then replace with your measured values.
| Scenario | Cores | Memory (GB) | IOPS | Net (Gbps) | CPU ms/req | IO/req | KB/req | Cache hit % | Peak factor |
|---|---|---|---|---|---|---|---|---|---|
| Light REST API | 4 | 8 | 60,000 | 1 | 3 | 0.6 | 12 | 50 | 1.4 |
| Search Service | 16 | 64 | 250,000 | 10 | 9 | 2.0 | 45 | 35 | 1.8 |
| Write-heavy Ingest | 8 | 32 | 120,000 | 2.5 | 7 | 3.5 | 22 | 15 | 2.2 |
Formula Used
- eff_disk_ios = avg_disk_ios × (1 − cache_hit)
- eff_net_kb = avg_net_kb × (1 − 0.6×cache_hit)
- eff_cpu_ms = avg_cpu_ms × (1 − 0.15×cache_hit)
- over_factor = 1 + replication_overhead
- CPU = cores × (1000 / (eff_cpu_ms×over_factor)) × util × efficiency
- Disk = (IOPS / (eff_disk_ios×over_factor)) × util
- Net = (KB/s / (eff_net_kb×over_factor)) × util
- concurrency = usable_mem_mb / mem_per_req_mb
- Memory = concurrency / (avg_resp_ms/1000)
- bottleneck = min(CPU, Disk, Net, Memory)
- sustained = bottleneck × (1 − headroom)
- safe_avg = sustained / peak_factor
- p95_est ≈ avg_resp × (1 + 2.2×u/(1−u)), where u is the utilization target (heuristic)
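The formula list above can be sketched in Python. Parameter names here are illustrative, not the calculator's actual field names:

```python
def estimate_capacity(cores, iops, net_kbps, mem_mb, mem_per_req_mb,
                      cpu_ms, disk_ios, net_kb, resp_ms,
                      cache_hit, replication_overhead,
                      util, efficiency, headroom, peak_factor):
    """Sketch of the estimator's formulas; returns safe average req/s."""
    # Cache reduces disk work fully, network and CPU work only partially
    eff_disk_ios = disk_ios * (1 - cache_hit)
    eff_net_kb = net_kb * (1 - 0.6 * cache_hit)
    eff_cpu_ms = cpu_ms * (1 - 0.15 * cache_hit)
    over = 1 + replication_overhead

    # Per-resource ceilings in requests per second
    cpu = cores * (1000 / (eff_cpu_ms * over)) * util * efficiency
    disk = (iops / (eff_disk_ios * over)) * util
    net = (net_kbps / (eff_net_kb * over)) * util
    concurrency = mem_mb / mem_per_req_mb
    memory = concurrency / (resp_ms / 1000)     # Little's Law

    bottleneck = min(cpu, disk, net, memory)    # smallest ceiling wins
    sustained = bottleneck * (1 - headroom)
    return sustained / peak_factor              # safe average throughput
```

Feeding it a scenario row from the table shows which resource limits that workload.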
How to Use This Calculator
- Start with measured averages: CPU ms, IOs, payload KB, and response time.
- Set utilization targets below saturation to protect tail latency.
- Add realistic overhead for replication, retries, encryption, and logging.
- Use a peak factor that matches your traffic burstiness.
- Review the bottleneck, then change one variable at a time.
- Export CSV for sharing, and PDF for quick reviews.
Baseline Capacity Signals
A single instance can be summarized by four ceilings: CPU, disk I/O, network bandwidth, and memory concurrency. Use measured averages from load tests or production telemetry. If CPU time averages 6 ms per request at a 70% utilization target, one 8-core node budgets about 8 × (1000/6) × 0.70 ≈ 933 requests per second before efficiency and overhead adjustments. That number is directional, not a benchmark.
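As a quick arithmetic check of the example above (a sketch, not the calculator itself):

```python
cores, cpu_ms_per_req, util_target = 8, 6.0, 0.70
req_per_core = 1000 / cpu_ms_per_req        # ms in one second / ms per request
ceiling = cores * req_per_core * util_target
print(round(ceiling))  # 933 req/s before efficiency and overhead
```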
CPU Budgeting with Overhead
Efficiency captures virtualization, throttling, and noisy neighbors. Overhead aggregates encryption, retries, observability, and replication. With 85% efficiency and 10% overhead, the CPU ceiling becomes 933 × 0.85 / 1.10 ≈ 721 req/s. Cutting CPU time from 6 ms to 4 ms raises the ceiling by 50%, often cheaper than adding cores. Set lower utilization targets for latency-sensitive work, such as 50–60%.
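The same adjustment, continued from the baseline figure:

```python
raw_ceiling = 8 * (1000 / 6.0) * 0.70       # ≈ 933 req/s from the previous step
efficiency, overhead = 0.85, 0.10
adjusted = raw_ceiling * efficiency / (1 + overhead)
print(round(adjusted))  # 721 req/s after efficiency and overhead
```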
Storage Pressure and Caching
Disk work is modeled as IOPS divided by I/Os per request, then scaled by a utilization target. A 250,000 IOPS NVMe device at a 65% target yields 162,500 usable IOPS. If each request performs 1.2 I/Os and cache hit is 35%, effective I/Os are 0.78, producing about 208,333 req/s, far above the CPU ceiling. When disk is the bottleneck, raise cache hit, batch writes, and avoid random I/O.
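The disk-side numbers above work out as follows (values taken from the paragraph, not measured):

```python
device_iops, util_target = 250_000, 0.65
ios_per_req, cache_hit = 1.2, 0.35
usable_iops = device_iops * util_target     # 162,500 usable IOPS
eff_ios = ios_per_req * (1 - cache_hit)     # 0.78 effective I/Os per request
disk_ceiling = usable_iops / eff_ios
print(round(disk_ceiling))  # 208333 req/s, far above the CPU ceiling
```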
Network Throughput Realism
Bandwidth is converted to KB/s; the calculator treats 1 Gbps as 128,000 KB/s (1 Gbps taken as 1,024 Mbps, 8 bits per byte). With 18 KB per request and a 65% target, usable bandwidth is 83,200 KB/s, supporting about 4,622 req/s before overhead. Payload compression, smaller responses, and caching are direct levers. Watch retransmits and TLS record sizes, because they inflate effective KB per request during congestion.
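The bandwidth arithmetic, using the calculator's stated Gbps-to-KB/s conversion:

```python
gbps, kb_per_req, util_target = 1, 18, 0.65
kb_per_s = gbps * 128_000          # calculator's convention: 1 Gbps = 128,000 KB/s
usable_kb = kb_per_s * util_target # 83,200 KB/s at a 65% target
net_ceiling = usable_kb / kb_per_req
print(round(net_ceiling))  # 4622 req/s before overhead
```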
Memory Concurrency via Little’s Law
Usable memory equals total minus reserve for the OS. With 32 GB and 15% reserve, usable is 27,852 MB. At 14 MB per request, maximum concurrency is roughly 1,989 in-flight requests. With 55 ms average response time, memory suggests ~36,164 req/s, again not limiting. If memory limits first, reduce per-request allocations and cap concurrency.
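The memory ceiling above, with concurrency rounded down to whole in-flight requests before applying Little’s Law:

```python
total_gb, reserve = 32, 0.15
usable_mb = total_gb * (1 - reserve) * 1024   # ≈ 27,852 MB after the OS reserve
mb_per_req, avg_resp_s = 14, 0.055
concurrency = usable_mb // mb_per_req         # whole in-flight requests
throughput = concurrency / avg_resp_s         # Little's Law: throughput = L / W
print(int(concurrency), round(throughput))    # 1989 36164
```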
Planning Targets and Safety Margins
The estimator reports sustained throughput after headroom, then a safe average after the peak factor. For example, 20% headroom and a 1.5× peak factor turn 721 req/s of CPU capacity into about 384 req/s safe average. Iterate one change at a time and re-check the limiting resource. Validate sizing with a load test and an error-budget review.
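The final two steps, applied to the CPU figure from earlier:

```python
cpu_capacity = 721                          # req/s ceiling from the CPU example
headroom, peak_factor = 0.20, 1.5
sustained = cpu_capacity * (1 - headroom)   # 576.8 req/s after 20% headroom
safe_avg = sustained / peak_factor
print(int(safe_avg))  # 384 req/s safe average
```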
FAQs
1) Why does the calculator show multiple capacities?
Each resource can become the limiter. The calculator estimates capacity for CPU, disk, network, and memory concurrency, then selects the smallest value as the practical ceiling.
2) How should I choose utilization targets?
Lower targets reduce queueing and tail latency. For user-facing APIs, 50–70% is common. For batch jobs, higher targets can be acceptable if latency is not critical.
3) What does “overhead” include?
Overhead is extra work beyond the core request: replication, retries, encryption, logging, metrics, service mesh, and middleware. Use observed CPU or network deltas to calibrate it.
4) Why does caching affect disk and network differently?
Caching often eliminates backend reads entirely, but payload reduction can be partial due to headers, auth, or dynamic fields. The model applies a stronger reduction to disk than network.
5) Is the p95 latency estimate accurate?
It is a heuristic that rises as utilization increases. Use it to compare scenarios, not to replace load testing. Validate with measured percentiles under representative traffic.
6) How do I model horizontal scaling?
Estimate safe average throughput per instance, then multiply by instance count. Account for shared bottlenecks such as databases, caches, and network egress limits at the cluster level.
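A minimal sketch of that multiplication, with a cluster-level cap standing in for a shared bottleneck (the instance count and database limit here are hypothetical):

```python
per_instance_safe = 384       # req/s safe average per instance, from the estimator
instances = 12
shared_db_ceiling = 4_000     # hypothetical cluster-wide database limit, req/s

# The cluster delivers the lesser of scaled instance capacity and shared limits
cluster = min(per_instance_safe * instances, shared_db_ceiling)
print(cluster)  # 4000: the shared database, not instance count, limits here
```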