Enter workload and capacity assumptions
Example data table
Use these sample rows to sanity-check assumptions and compare scenarios. All rows assume a 0.70 utilization target.
| Throughput (req/s) | Response (ms) | Think (ms) | Peak | Margin | Servers | Target in-flight | Workers/server |
|---|---|---|---|---|---|---|---|
| 50 | 180 | 900 | 1.2 | 15% | 3 | 12.42 | 6 |
| 120 | 250 | 1000 | 1.5 | 20% | 4 | 54.00 | 20 |
| 300 | 400 | 0 | 2.0 | 25% | 6 | 300.00 | 72 |
| 900 | 120 | 200 | 2.5 | 35% | 10 | 364.50 | 53 |
Formula used
This calculator is based on Little’s Law, L = λ × W, where L is concurrency, λ is throughput, and W is time-in-system.
- Base in-flight requests = throughput(req/s) × response time(s)
- Concurrent sessions = throughput(req/s) × (response + think)(s)
- Peak-adjusted = base × peak multiplier
- Target = peak-adjusted × (1 + safety margin)
- Workers per server = ceil(target in-flight / servers / utilization)
- Capacity check compares target to worker limits, if set
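The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the 0.70 utilization default is an assumption matching the worked examples later in this page.

```python
import math

def capacity_plan(throughput_rps, response_s, think_s, peak, margin, servers,
                  utilization=0.70):
    """Sketch of the sizing formulas; 0.70 utilization is an assumed default."""
    base_in_flight = throughput_rps * response_s         # Little's Law: L = lambda * W
    sessions = throughput_rps * (response_s + think_s)   # concurrency including think time
    peak_adjusted = base_in_flight * peak                # scale for predictable surges
    target = peak_adjusted * (1 + margin)                # add safety headroom
    workers = math.ceil(target / servers / utilization)  # per-server worker count
    return target, sessions, workers

# First sample row: 50 req/s, 180 ms response, 900 ms think, 1.2x peak, 15% margin, 3 servers
target, sessions, workers = capacity_plan(50, 0.180, 0.900, 1.2, 0.15, 3)
```

Running this against the first sample row reproduces the table's target of 12.42 in-flight requests and 6 workers per server.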
How to use this calculator
- Enter expected throughput and choose the correct time unit.
- Use a realistic latency percentile for response time.
- Add think time for interactive users; set zero for bots.
- Apply a peak multiplier for bursts and known seasonality.
- Add a safety margin to cover unknown spikes and variance.
- Set server count and a conservative utilization target.
- Optionally set max workers per server for capacity checks.
- Click Calculate, then export CSV or download a PDF.
Engineering notes
- Latency varies; size using p95 or p99 for safety.
- Queueing increases response time; add more headroom early.
- Workers are a proxy for concurrent in-flight work units.
- If databases limit connections, align pool sizes with workers.
Throughput, Latency, and Little's Law
Concurrency is estimated with Little's Law: L = λ × W. Normalize throughput to requests per second; 7,200 requests per minute becomes 120 per second. If average response time is 0.25 seconds, base in-flight work is 30. Prefer p95 or p99 latency when planning, because tail delays inflate W and raise L. Use consistent measurement windows and remove warmup anomalies first.
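The normalization step above is simple arithmetic, shown here with the paragraph's own numbers:

```python
# Normalize throughput to requests per second before applying Little's Law
throughput_rps = 7200 / 60               # 7,200 req/min -> 120 req/s
base_in_flight = throughput_rps * 0.25   # W = 0.25 s average response -> L = 30
```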
Peak Factors and Safety Margins
Real demand is rarely flat. A peak multiplier models predictable surges, such as 1.5× lunchtime spikes or 2.0× batch windows. A safety margin adds headroom for variance, retries, cache misses, or noisy neighbors. With a 20% margin, a peak in-flight target of 45 becomes 54. When margins grow large, revisit assumptions and address the root causes instead.
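The peak-and-margin adjustment works out as follows, using the 1.5× spike and 20% margin from the paragraph above:

```python
base = 30.0                  # base in-flight from Little's Law
peak_target = base * 1.5     # 1.5x predictable lunchtime spike -> 45
target = peak_target * 1.20  # 20% safety margin -> 54
```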
Sessions, Think Time, and User Behavior
Interactive usage includes pauses between actions. Think time expands cycle time, so concurrency for sessions can exceed raw in-flight requests even when servers are calm. For example, 120 requests per second, 0.25 seconds of response time, and 1.0 second of think time give a 1.25-second cycle, producing 150 concurrent sessions before peaks and headroom. Session estimates help size connection pools, rate limits, and frontend limits.
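The session example above is Little's Law applied to the full cycle time:

```python
cycle_time = 0.25 + 1.0       # response + think, in seconds
sessions = 120 * cycle_time   # L = lambda * W over the full cycle -> 150
```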
Worker Sizing and Utilization Targets
Workers approximate parallel work units for application, thread, or async processing models. Sizing workers from the target in-flight count ensures enough processing capacity without saturating CPUs. Utilization targets, such as 70%, reserve room for burst handling, garbage collection, and background tasks. If per-server in-flight demand is 17 and utilization is 0.70, the recommended workers per server is ceil(17 / 0.70) = 25. Track error rates and queue depth to validate the chosen utilization.
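The rounding-up step matters here; plain division would under-provision. A one-line check of the example above:

```python
import math

per_server_in_flight = 17
utilization = 0.70
workers = math.ceil(per_server_in_flight / utilization)  # 24.29 rounds up to 25
```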
Capacity Checks and Practical Validation
Optional worker caps reveal bottlenecks early. If each server supports 200 workers, at 70% utilization the effective in-flight ceiling is 140 per server. Combine this with response time to estimate maximum sustainable throughput: λmax ≈ Lmax / W. Validate with load tests that reproduce realistic payloads, warm caches, and downstream latency. If queueing rises sharply, add servers, lower utilization targets, or reduce response time. Record results for multiple scenarios, choose the highest required sizing, and monitor production continually to refine inputs.
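Inverting Little's Law gives the throughput ceiling described above. The 0.25-second response time below is an assumed value for illustration; the 200-worker cap and 70% utilization come from the example in the text.

```python
max_workers = 200
utilization = 0.70
response_s = 0.25                  # assumed W for illustration

l_max = max_workers * utilization  # effective in-flight ceiling -> 140
lambda_max = l_max / response_s    # lambda_max = L_max / W -> 560 req/s per server
```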
FAQs
1) What does target in-flight requests mean?
It is the estimated number of requests being processed simultaneously at the sizing target, after peak and safety headroom are applied. Use it to plan worker limits, connection pools, and queue thresholds.
2) Why should I use p95 or p99 response time?
Concurrency scales with time-in-system, so tail latency drives capacity. Using p95 or p99 helps prevent under-sizing when occasional slow calls dominate in-flight work during peaks.
3) How do peak multiplier and safety margin differ?
Peak multiplier models predictable demand spikes compared with baseline. Safety margin adds additional headroom for uncertainty such as retries, cache misses, and uneven load distribution across servers.
4) How is workers per server calculated?
The calculator divides target in-flight requests by server count, then divides by the utilization target, and rounds up. This estimates the parallel work units needed while keeping headroom to avoid saturation.
5) When should I set think time to zero?
Set it to zero for machine-to-machine traffic, cron jobs, and streaming workloads where requests arrive independently. Keep a nonzero value for interactive users who pause between actions.
6) What if the capacity check says undersized?
Increase server count, raise worker limits, reduce response time, or lower the utilization target. Then rerun the calculation and confirm with a load test that matches real traffic and payloads.