Inputs
Example data table
| Scenario | Peak RPS | Req KB | Resp KB | Peak Conns | vCPU | CPU ms | Max Mbps | Max Conns/node | Util cap | Spare |
|---|---|---|---|---|---|---|---|---|---|---|
Formula used
Throughput
We approximate network demand from average request and response sizes:
throughput_mbps = RPS × (req_kb + resp_kb) × 1024 × 8 ÷ 1,000,000
This is a steady-state estimate; real traffic is bursty.
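As a sanity check, the estimate can be written as a small function; the name `throughput_mbps` and the sample numbers are illustrative:

```python
def throughput_mbps(rps: float, req_kb: float, resp_kb: float) -> float:
    """Steady-state network demand in Mbps from average per-request sizes."""
    bits_per_second = rps * (req_kb + resp_kb) * 1024 * 8
    return bits_per_second / 1_000_000
```

At 2,000 RPS with 80 KB responses and negligible request bodies, this gives about 1,310.72 Mbps, matching the worked example in the bandwidth section.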
CPU cores
CPU cores required equals total CPU seconds per second:
cores_needed = RPS × cpu_ms_eff ÷ 1000
Here cpu_ms_eff is the per-request CPU time including TLS and observability overheads. We then apply the safety margin and utilization cap.
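A minimal sketch of the same arithmetic (function name illustrative):

```python
def cores_needed(rps: float, cpu_ms_eff: float) -> float:
    """Raw cores required, before safety margin and utilization cap."""
    return rps * cpu_ms_eff / 1000
```

For example, 4,000 RPS at 0.25 ms per request yields exactly 1.0 core.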
Nodes required
For each constraint, we compute nodes and take the maximum:
nodes_cpu = ceil( cores_needed × (1 + safety%/100) ÷ (vCPU_per_node × util) )
nodes_net = ceil( throughput_mbps × (1 + safety%/100) ÷ (max_mbps_per_node × util) )
nodes_conn = ceil( conns_peak × surge × (1 + safety%/100) ÷ (max_conns_per_node × util) )
base_nodes = max(nodes_cpu, nodes_net, nodes_conn, 1)
recommended_nodes = base_nodes + spare_nodes
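Putting the three constraints together, a hedged sketch of the full calculation (parameter names follow the formulas above; the values in the test comment are illustrative, not recommendations):

```python
import math

def recommended_nodes(cores_needed: float, throughput_mbps: float,
                      conns_peak: float, surge: float, safety_pct: float,
                      vcpu_per_node: float, max_mbps_per_node: float,
                      max_conns_per_node: float, util: float,
                      spare_nodes: int) -> int:
    """Max of CPU-, bandwidth-, and connection-driven node counts, plus spares."""
    margin = 1 + safety_pct / 100
    nodes_cpu = math.ceil(cores_needed * margin / (vcpu_per_node * util))
    nodes_net = math.ceil(throughput_mbps * margin / (max_mbps_per_node * util))
    nodes_conn = math.ceil(conns_peak * surge * margin / (max_conns_per_node * util))
    base_nodes = max(nodes_cpu, nodes_net, nodes_conn, 1)
    return base_nodes + spare_nodes
```

Because each constraint is rounded up independently before taking the maximum, the binding constraint is whichever term produces the largest node count.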
How to use this calculator
- Enter peak RPS, sizes, and peak concurrent connections.
- Set per-node limits using lab tests or vendor guidance.
- Choose utilization cap and safety margin for headroom.
- Click Calculate to see recommended node counts.
- Export CSV or PDF for documentation and review.
Workload signals
Capacity planning starts with peak requests per second, concurrent connections, and payload sizes. Multiply peak RPS by a surge factor to cover bursts and cache misses. Add request and response kilobytes to estimate per-request transfer. When these inputs are measured from logs, use the 95th percentile rather than the mean. A small increase in response size can dominate Mbps demand during fan-out traffic windows. Track peaks per endpoint.
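For instance, a nearest-rank p95 over measured samples can be computed with no dependencies (a sketch; the function name and input units are assumptions):

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile; prefer this over the mean for sizing inputs."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]
```

Over the values 1 through 100 this returns 95, whereas the mean is 50.5, which illustrates how much averaging can understate sizing inputs.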
CPU overhead
CPU sizing converts per-request processing time into cores. If routing, header parsing, and observability consume 0.25 ms per request, 4,000 effective RPS needs roughly one core before margins. TLS termination adds overhead that scales with handshake rate and cipher choice, so model it as a percentage factor. Health checks and logging add cost. Keep a utilization cap, such as 70%, to avoid queueing spikes during deployments and failovers.
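Modeling TLS as a percentage factor on base CPU time and provisioning against the utilization cap might look like this (a sketch; the function name and sample percentages are illustrative):

```python
def provisioned_cores(rps: float, base_cpu_ms: float,
                      tls_overhead_pct: float, util_cap: float) -> float:
    """Cores to provision so the utilization cap is not exceeded at peak."""
    cpu_ms_eff = base_cpu_ms * (1 + tls_overhead_pct / 100)
    raw_cores = rps * cpu_ms_eff / 1000
    return raw_cores / util_cap
```

With no TLS overhead and a 100% cap this reproduces the one-core example above; at a 70% cap the same load needs about 1.43 cores.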
Bandwidth sensitivity
Network sizing uses throughput_mbps = RPS × (req KB + resp KB) × 1024 × 8 ÷ 1,000,000. This highlights why gzip, image resizing, and API pagination matter. If responses average 80 KB at 2,000 RPS, demand is about 1,310 Mbps, exceeding the network capacity of many single nodes. Apply a safety margin for protocol overhead and retransmits. The calculator compares demand to per-node Mbps times utilization to estimate required nodes under CDN-bypass conditions.
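The compression sensitivity is easy to demonstrate; the function repeats the formula so the snippet stands alone, and the 50% gzip ratio is an assumption for illustration:

```python
def demand_mbps(rps: float, req_kb: float, resp_kb: float) -> float:
    """Throughput demand in Mbps for a given request rate and payload sizes."""
    return rps * (req_kb + resp_kb) * 1024 * 8 / 1_000_000

uncompressed = demand_mbps(2000, 0, 80)  # about 1310.72 Mbps
gzipped = demand_mbps(2000, 0, 40)       # halving payloads halves demand
```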
Connection limits
Connection limits can bind before CPU or bandwidth, especially with keep-alive, HTTP/2 multiplexing, and WebSockets. A peak of 80,000 connections with a 1.2 surge becomes 96,000, and the safety margin increases it further. Per-node max connections depend on memory, file descriptors, and ephemeral port tuning. Session affinity and long timeouts raise connection counts. The calculator converts the connection demand into nodes using the utilization cap and safety margin.
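Applying the surge factor and safety margin to connections (a sketch; the per-node limit of 50,000 in the test values is hypothetical, not vendor guidance):

```python
import math

def connection_nodes(conns_peak: int, surge: float, safety_pct: float,
                     max_conns_per_node: int, util: float) -> int:
    """Nodes needed to stay under the per-node connection limit at peak."""
    # 80,000 peak x 1.2 surge = 96,000 connections before the safety margin
    demand = conns_peak * surge * (1 + safety_pct / 100)
    return math.ceil(demand / (max_conns_per_node * util))
```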
Headroom validation
Recommended nodes equal the maximum of CPU, bandwidth, and connection driven counts, plus spare nodes for redundancy. Safety margin protects against forecasting error, uneven distribution, and hot partitions. Use N+1 at minimum within each failure domain, and consider extra spares for patching. After sizing, validate with load tests that reproduce real headers, TLS, and latency. Revisit inputs regularly to track growth, verify assumptions, and monitor p95 utilization weekly.
FAQs
1) What does “binding constraint” mean here?
The binding constraint is the limit that forces the highest node count: CPU cores, bandwidth Mbps, or concurrent connections. Improving that bottleneck typically reduces required nodes more than tuning the other inputs.
2) How should I estimate CPU time per request?
Start with profiling or synthetic tests on one node and measure incremental CPU under known RPS. Use the p95 or p99 CPU cost for busy endpoints, then include TLS and observability overhead percentages.
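One way to back out per-request CPU cost from such a test (all names illustrative; cpu_util_delta is the utilization increase, 0 to 1, attributable to the test load):

```python
def cpu_ms_per_request(cpu_util_delta: float, cores: int, rps: float) -> float:
    """Incremental CPU milliseconds per request measured under a known load."""
    return cpu_util_delta * cores * 1000 / rps
```

If a 4-core node gains 25 percentage points of utilization at 4,000 RPS, the per-request cost is 0.25 ms.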
3) What utilization cap should I choose?
For steady workloads, 70% is a common ceiling. For bursty traffic, latency targets, or uneven distribution, use 60–65%. Higher caps increase risk of queues and retries during incidents or deploys.
4) How do I model TLS termination accurately?
If the load balancer terminates TLS, include a realistic overhead factor based on your cipher suites and handshake rate. If TLS is terminated upstream or at clients and passed through, set the TLS overhead to zero.
5) Why do connections matter if RPS is low?
Long-lived connections consume memory, file descriptors, and kernel resources even when idle. WebSockets, slow clients, and long timeouts can push connection counts high, making connection limits the primary sizing driver.
6) How should I validate the recommended nodes?
Run load tests that match real headers, request sizes, TLS settings, and latency distribution. Verify per-node CPU, Mbps, and connection utilization stays under the chosen cap during peak and failure scenarios, then adjust inputs.