Inputs
Example data table
| Scenario | Peak RPS | Req KB | Resp KB | Peak Conns | vCPU | CPU ms | Max Mbps | Max Conns/node | Util cap | Spare |
|---|---|---|---|---|---|---|---|---|---|---|
Formula used
Throughput
We approximate network demand from average request and response sizes:
throughput_mbps = RPS × (req_kb + resp_kb) × 1024 × 8 ÷ 1,000,000
This is a steady-state estimate; real traffic is bursty.
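As a sanity check, the estimate can be written as a small function; the name `throughput_mbps` and the sample numbers are illustrative:

```python
def throughput_mbps(rps: float, req_kb: float, resp_kb: float) -> float:
    """Steady-state network demand in Mbps from average per-request sizes."""
    bits_per_second = rps * (req_kb + resp_kb) * 1024 * 8
    return bits_per_second / 1_000_000
```

At 2,000 RPS with 80 KB responses and negligible request bodies, this gives about 1,310.72 Mbps, matching the worked example in the bandwidth section.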
CPU cores
CPU cores required equals total CPU seconds per second:
cores_needed = RPS × cpu_ms_eff ÷ 1000
Here cpu_ms_eff is the per-request CPU time including TLS and observability overheads. We then apply the safety margin and utilization cap.
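A minimal sketch of the same arithmetic (function name illustrative):

```python
def cores_needed(rps: float, cpu_ms_eff: float) -> float:
    """Raw cores required, before safety margin and utilization cap."""
    return rps * cpu_ms_eff / 1000
```

For example, 4,000 RPS at 0.25 ms per request yields exactly 1.0 core.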
Nodes required
For each constraint, we compute nodes and take the maximum:
nodes_cpu = ceil( cores_needed × (1 + safety%/100) ÷ (vCPU_per_node × util) )
nodes_net = ceil( throughput_mbps × (1 + safety%/100) ÷ (max_mbps_per_node × util) )
nodes_conn = ceil( conns_peak × surge × (1 + safety%/100) ÷ (max_conns_per_node × util) )
base_nodes = max(nodes_cpu, nodes_net, nodes_conn, 1)
recommended_nodes = base_nodes + spare_nodes
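Putting the three constraints together, a hedged sketch of the full calculation (parameter names follow the formulas above; the values in the test comment are illustrative, not recommendations):

```python
import math

def recommended_nodes(cores_needed: float, throughput_mbps: float,
                      conns_peak: float, surge: float, safety_pct: float,
                      vcpu_per_node: float, max_mbps_per_node: float,
                      max_conns_per_node: float, util: float,
                      spare_nodes: int) -> int:
    """Max of CPU-, bandwidth-, and connection-driven node counts, plus spares."""
    margin = 1 + safety_pct / 100
    nodes_cpu = math.ceil(cores_needed * margin / (vcpu_per_node * util))
    nodes_net = math.ceil(throughput_mbps * margin / (max_mbps_per_node * util))
    nodes_conn = math.ceil(conns_peak * surge * margin / (max_conns_per_node * util))
    base_nodes = max(nodes_cpu, nodes_net, nodes_conn, 1)
    return base_nodes + spare_nodes
```

Because each constraint is rounded up independently before taking the maximum, the binding constraint is whichever term produces the largest node count.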
How to use this calculator
- Enter peak RPS, sizes, and peak concurrent connections.
- Set per-node limits using lab tests or vendor guidance.
- Choose utilization cap and safety margin for headroom.
- Click Calculate to see recommended node counts.
- Export CSV or PDF for documentation and review.
Workload signals
Capacity planning starts with peak requests per second, concurrent connections, and payload sizes. Multiply peak RPS by a surge factor to cover bursts and cache misses. Add request and response kilobytes to estimate per-request transfer. When these inputs are measured from logs, use the 95th percentile rather than the mean. A small increase in response size can dominate Mbps demand during fan-out traffic windows. Track peaks per endpoint.
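For instance, a nearest-rank p95 over measured samples can be computed with no dependencies (a sketch; the function name and input units are assumptions):

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile; prefer this over the mean for sizing inputs."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]
```

Over the values 1 through 100 this returns 95, whereas the mean is 50.5, which illustrates how much averaging can understate sizing inputs.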
CPU overhead
CPU sizing converts per-request processing time into cores. If routing, header parsing, and observability consume 0.25 ms per request, 4,000 effective RPS needs roughly one core before margins. TLS termination adds overhead that scales with handshake rate and cipher choice, so model it as a percentage factor. Health checks and logging add cost. Keep a utilization cap, such as 70%, to avoid queueing spikes during deployments and failovers.
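Modeling TLS as a percentage factor on base CPU time and provisioning against the utilization cap might look like this (a sketch; the function name and sample percentages are illustrative):

```python
def provisioned_cores(rps: float, base_cpu_ms: float,
                      tls_overhead_pct: float, util_cap: float) -> float:
    """Cores to provision so the utilization cap is not exceeded at peak."""
    cpu_ms_eff = base_cpu_ms * (1 + tls_overhead_pct / 100)
    raw_cores = rps * cpu_ms_eff / 1000
    return raw_cores / util_cap
```

With no TLS overhead and a 100% cap this reproduces the one-core example above; at a 70% cap the same load needs about 1.43 cores.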
Bandwidth sensitivity
Network sizing uses throughput_mbps = RPS × (req KB + resp KB) × 1024 × 8 ÷ 1,000,000. This highlights why gzip, image resizing, and API pagination matter. If responses average 80 KB at 2,000 RPS, demand is about 1,310 Mbps, exceeding the network capacity of many single nodes. Apply a safety margin for protocol overhead and retransmits. The calculator compares demand to per-node Mbps times utilization to estimate required nodes under CDN-bypass conditions.
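The compression sensitivity is easy to demonstrate; the function repeats the formula so the snippet stands alone, and the 50% gzip ratio is an assumption for illustration:

```python
def demand_mbps(rps: float, req_kb: float, resp_kb: float) -> float:
    """Throughput demand in Mbps for a given request rate and payload sizes."""
    return rps * (req_kb + resp_kb) * 1024 * 8 / 1_000_000

uncompressed = demand_mbps(2000, 0, 80)  # about 1310.72 Mbps
gzipped = demand_mbps(2000, 0, 40)       # halving payloads halves demand
```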
Connection limits
Connection limits can bind before CPU or bandwidth, especially with keep-alive, HTTP/2 multiplexing, and WebSockets. A peak of 80,000 connections with a 1.2 surge becomes 96,000, and the safety margin increases it further. Per-node max connections depend on memory, file descriptors, and ephemeral port tuning. Session affinity and long timeouts raise connection counts. The calculator converts the connection demand into nodes using the utilization cap and safety margin.
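Applying the surge factor and safety margin to connections (a sketch; the per-node limit of 50,000 in the test values is hypothetical, not vendor guidance):

```python
import math

def connection_nodes(conns_peak: int, surge: float, safety_pct: float,
                     max_conns_per_node: int, util: float) -> int:
    """Nodes needed to stay under the per-node connection limit at peak."""
    # 80,000 peak x 1.2 surge = 96,000 connections before the safety margin
    demand = conns_peak * surge * (1 + safety_pct / 100)
    return math.ceil(demand / (max_conns_per_node * util))
```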
Headroom validation
Recommended nodes equal the maximum of CPU, bandwidth, and connection driven counts, plus spare nodes for redundancy. Safety margin protects against forecasting error, uneven distribution, and hot partitions. Use N+1 at minimum within each failure domain, and consider extra spares for patching. After sizing, validate with load tests that reproduce real headers, TLS, and latency. Revisit inputs regularly to track growth, verify assumptions, and monitor p95 utilization weekly.
FAQs
1) What does “binding constraint” mean here?
The binding constraint is the limit that forces the highest node count: CPU cores, bandwidth Mbps, or concurrent connections. Improving that bottleneck typically reduces required nodes more than tuning the other inputs.
2) How should I estimate CPU time per request?
Start with profiling or synthetic tests on one node and measure incremental CPU under known RPS. Use the p95 or p99 CPU cost for busy endpoints, then include TLS and observability overhead percentages.
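One way to back out per-request CPU cost from such a test (all names illustrative; cpu_util_delta is the utilization increase, 0 to 1, attributable to the test load):

```python
def cpu_ms_per_request(cpu_util_delta: float, cores: int, rps: float) -> float:
    """Incremental CPU milliseconds per request measured under a known load."""
    return cpu_util_delta * cores * 1000 / rps
```

If a 4-core node gains 25 percentage points of utilization at 4,000 RPS, the per-request cost is 0.25 ms.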
3) What utilization cap should I choose?
For steady workloads, 70% is a common ceiling. For bursty traffic, latency targets, or uneven distribution, use 60–65%. Higher caps increase risk of queues and retries during incidents or deploys.
4) How do I model TLS termination accurately?
If the load balancer terminates TLS, include a realistic overhead factor based on your cipher suites and handshake rate. If TLS is terminated upstream or at clients and passed through, set the TLS overhead to zero.
5) Why do connections matter if RPS is low?
Long-lived connections consume memory, file descriptors, and kernel resources even when idle. WebSockets, slow clients, and long timeouts can push connection counts high, making connection limits the primary sizing driver.
6) How should I validate the recommended nodes?
Run load tests that match real headers, request sizes, TLS settings, and latency distribution. Verify per-node CPU, Mbps, and connection utilization stays under the chosen cap during peak and failure scenarios, then adjust inputs.