Calculator
Formulas used
| Quantity | Formula | Meaning |
|---|---|---|
| Average RPS | avg_rps = total_requests / window_seconds | Baseline throughput during the measurement window. |
| Successful RPS | successful_rps = avg_rps × (success_rate / 100) | Adjusts for failures/timeouts before retries. |
| Effective RPS | effective_rps = successful_rps × (1 + retry_rate / 100) | Adds retry overhead to estimate real load. |
| Peak RPS | peak_rps = effective_rps × peak_multiplier | Models burst traffic over the average rate. |
| Target Capacity RPS | target_rps = peak_rps × (1 + safety_margin / 100) | Headroom for variance, deployments, and noise. |
| Concurrency (optional) | concurrency ≈ target_rps × (p95_latency_ms / 1000) | Little’s Law approximation for in‑flight requests. |
| Instances (optional) | instances = ceil(target_rps / per_instance_max_rps) | Quick sizing when per‑node capacity is known. |
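The formulas chain together in order, so they can be sketched as one function. This is an illustrative Python sketch; `capacity_plan` and its parameter names are made up for this example and are not the calculator's actual API.

```python
import math

def capacity_plan(total_requests, window_seconds, success_rate=100.0,
                  retry_rate=0.0, peak_multiplier=1.0, safety_margin=0.0,
                  p95_latency_ms=None, per_instance_max_rps=None):
    """Apply the formulas above in order; return every intermediate value."""
    avg_rps = total_requests / window_seconds
    successful_rps = avg_rps * (success_rate / 100)
    effective_rps = successful_rps * (1 + retry_rate / 100)
    peak_rps = effective_rps * peak_multiplier
    target_rps = peak_rps * (1 + safety_margin / 100)
    plan = {"avg_rps": avg_rps, "successful_rps": successful_rps,
            "effective_rps": effective_rps, "peak_rps": peak_rps,
            "target_rps": target_rps}
    if p95_latency_ms is not None:
        # Little's Law: in-flight requests ~ throughput * response time
        plan["concurrency"] = target_rps * (p95_latency_ms / 1000)
    if per_instance_max_rps is not None:
        plan["instances"] = math.ceil(target_rps / per_instance_max_rps)
    return plan

# Login burst row from the example table below: 45,000 requests in 60 s
plan = capacity_plan(45_000, 60, peak_multiplier=2.0, safety_margin=25)
print(plan["target_rps"])  # 1875.0
```

Each intermediate value is returned so the output can be pasted into a sizing document alongside the final number.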
How to use this calculator
- Capture a request count and its exact time window from logs or metrics.
- Enter the totals, then set peak multiplier based on historical bursts.
- Add retry rate if clients or upstreams retry on failures.
- Choose a safety margin for deploys, cache misses, and noisy neighbors.
- Optionally enter p95 latency to estimate required concurrency.
- Optionally enter per‑instance max RPS to estimate instance count.
- Export CSV or PDF to attach to sizing or SLO documents.
Example data table
| Scenario | Requests | Window (s) | Average RPS | Peak x | Margin % | Target RPS |
|---|---|---|---|---|---|---|
| Login burst | 45,000 | 60 | 750.00 | 2.00 | 25 | 1,875.00 |
| API batch import | 180,000 | 900 | 200.00 | 1.20 | 20 | 288.00 |
| Search traffic | 720,000 | 3,600 | 200.00 | 1.80 | 30 | 468.00 |
| Webhook fan-out | 12,000 | 15 | 800.00 | 3.00 | 35 | 3,240.00 |
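The rows above can be reproduced directly from the formulas. This sketch assumes a 100% success rate and 0% retries, which is what the table implies since it lists no success or retry columns.

```python
scenarios = [
    # (name, requests, window_s, peak_multiplier, margin_pct)
    ("Login burst",       45_000,    60, 2.00, 25),
    ("API batch import", 180_000,   900, 1.20, 20),
    ("Search traffic",   720_000, 3_600, 1.80, 30),
    ("Webhook fan-out",   12_000,    15, 3.00, 35),
]

results = {}
for name, requests, window_s, peak_x, margin in scenarios:
    avg_rps = requests / window_s
    # With 100% success and 0% retries, effective RPS equals average RPS.
    target_rps = avg_rps * peak_x * (1 + margin / 100)
    results[name] = (round(avg_rps, 2), round(target_rps, 2))
    print(f"{name}: avg={avg_rps:.2f} RPS, target={target_rps:.2f} RPS")
```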
1) Establish a clean baseline window
Start with a measurement window that matches the workload pattern you want to size for. For steady APIs, 300–900 seconds often smooths noise while still reacting to traffic shifts. For bursty endpoints, use shorter windows, such as 10–60 seconds, and compare several adjacent windows. The calculator’s Average RPS is simply requests divided by seconds, so accuracy depends on precise timestamps and consistent log sampling.
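Comparing adjacent short windows against the long-window average can be sketched as follows; the per-10-second counts below are hypothetical log data for a bursty endpoint.

```python
# Hypothetical request counts per 10-second window, taken from logs.
counts_10s = [7_200, 8_100, 21_500, 19_800, 8_400, 7_600]
window_seconds = 10

window_rps = [c / window_seconds for c in counts_10s]
overall_rps = sum(counts_10s) / (len(counts_10s) * window_seconds)

print(f"per-window RPS: {window_rps}")
print(f"60 s average:   {overall_rps:.1f} RPS")  # the burst vanishes in the long window
```

Here the 60-second average is 1,210 RPS while the hottest 10-second window runs at 2,150 RPS, which is exactly the gap the peak multiplier in the next step is meant to capture.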
2) Translate bursts into peak demand
Peak multiplier models the ratio between your average and your worst realistic spike. Many consumer services see 1.3×–2.5× daily peaks, while event-driven webhooks can hit 3×–6× in short intervals. Apply safety margin after the peak multiplier to protect rollouts and cache cold-starts; 15%–35% is common in shared clusters. Target Capacity RPS becomes a practical number for autoscaling thresholds and load-test targets.
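With illustrative numbers, applying the peak multiplier and then the safety margin:

```python
effective_rps = 1_000.0   # illustrative effective throughput
peak_multiplier = 1.8     # worst realistic spike over average
safety_margin = 25        # percent headroom for rollouts and cold caches

peak_rps = effective_rps * peak_multiplier         # 1800.0 RPS
target_rps = peak_rps * (1 + safety_margin / 100)  # 2250.0 RPS
print(target_rps)
```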
3) Quantify retries and hidden load
Retries amplify load even when user traffic is flat. A 2% retry rate means 1.02 attempts per successful request; at 1,000 RPS that is 20 extra calls every second. During incident conditions, retries can exceed 50%, creating positive feedback that worsens latency. Use your client, gateway, and queue metrics to set retry rate, and align it with your error budget policy so “success” reflects what your system must actually serve.
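The retry arithmetic in this paragraph, as a small helper (the function name is illustrative):

```python
def extra_calls_per_second(base_rps, retry_rate_pct):
    """Additional attempts per second generated by retries."""
    return base_rps * retry_rate_pct / 100

print(extra_calls_per_second(1_000, 2))   # 20.0, the steady-state case above
print(extra_calls_per_second(1_000, 50))  # 500.0, incident-level retry storm
```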
4) Connect latency to concurrency
Concurrency is estimated using Little’s Law: in-flight requests ≈ throughput × response time. If Target Capacity is 2,000 RPS and p95 latency is 250 ms, you should plan for roughly 500 concurrent in-flight requests. This number informs connection pools, thread limits, and upstream quotas. If latency rises under load, re-run the calculator with the degraded p95 so you size for the true operating point, not the idle benchmark.
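The worked example above, as a sketch:

```python
def concurrency_estimate(target_rps, p95_latency_ms):
    # Little's Law: L = lambda * W, with W converted to seconds
    return target_rps * (p95_latency_ms / 1000)

print(concurrency_estimate(2_000, 250))  # 500.0 in-flight requests
# Re-run with the degraded p95 observed under load, not the idle benchmark:
print(concurrency_estimate(2_000, 400))  # 800.0
```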
5) Convert capacity into infrastructure
When you know per-instance max RPS from profiling, instance count is a straightforward ceiling division. Record the date, region, and cache state of each profiling run; these details explain why measured per-instance RPS differs materially between tests. Keep utilization below 70%–80% for CPU-bound services to avoid tail-latency blowups. Bandwidth estimation uses payload size and Target Capacity; for example, 2,000 RPS at 16 KB per response is about 262 Mbps of raw transfer, before TLS and headers. Use the export to document assumptions alongside your SLO and scaling rules.
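The ceiling division and the raw-bandwidth figure can be checked directly; `per_instance_max_rps` is an illustrative profiling result, not a recommendation.

```python
import math

target_rps = 2_000
per_instance_max_rps = 350  # illustrative ceiling from profiling
payload_kib = 16            # response payload size, KiB

instances = math.ceil(target_rps / per_instance_max_rps)
mbps = target_rps * payload_kib * 1024 * 8 / 1_000_000  # raw payload only

print(instances)     # 6
print(round(mbps))   # 262 Mbps, before TLS and headers
```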
FAQs
What is the difference between Average RPS and Target Capacity RPS?
Average RPS describes observed throughput during your window. Target Capacity RPS adds success adjustment, retry overhead, peak multiplier, and safety margin, producing a sizing number for autoscaling, load tests, and capacity reservations.
How do I choose a realistic peak multiplier?
Compare the highest short-window rate to a longer baseline, using the same endpoint mix. A common approach is 95th-percentile minute RPS divided by 15‑minute average RPS, then round up slightly for seasonality.
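That heuristic can be sketched with hypothetical per-minute counts, using the nearest-rank method for the 95th percentile:

```python
import math

# Hypothetical per-minute request counts over 15 minutes, same endpoint mix.
minute_counts = [9_000, 9_600, 8_800, 9_200, 14_500, 9_100, 9_300, 8_900,
                 9_400, 9_000, 13_800, 9_200, 9_100, 9_500, 9_000]

minute_rps = sorted(c / 60 for c in minute_counts)
rank = math.ceil(0.95 * len(minute_rps)) - 1   # nearest-rank 95th percentile
p95_minute_rps = minute_rps[rank]

baseline_rps = sum(minute_counts) / (len(minute_counts) * 60)
peak_multiplier = p95_minute_rps / baseline_rps
print(f"peak multiplier ~ {peak_multiplier:.2f}")  # then round up for seasonality
```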
Where should I get per-instance max RPS?
Use controlled load tests or profiling on production-like hardware. Measure the RPS where CPU, memory, or downstream limits first cause p95 latency or error rates to breach your SLO, then use that as the per-instance ceiling.
Why use p95 latency for concurrency instead of the mean?
Tail latency drives queueing and saturation in real systems. Using p95 produces a safer concurrency estimate that better matches worst-case user experience and prevents under-provisioning during spikes or partial outages.
Does the bandwidth estimate include headers and encryption overhead?
No. It uses payload size only, so it is a lower bound. Add extra headroom for TLS, HTTP headers, compression behavior, and retransmits, especially across regions or through service meshes.
What should I include in exports for reviews?
Export results, recent runs, and example scenarios, then add your assumptions: time window, traffic source, peak multiplier rationale, retry behavior, and p95 latency context. Reviewers can then validate sizing choices quickly.