Calculator
Formulas used
| Quantity | Formula | Meaning |
|---|---|---|
| Average RPS | avg_rps = total_requests / window_seconds | Baseline throughput during the measurement window. |
| Successful RPS | successful_rps = avg_rps × (success_rate / 100) | Adjusts for failures/timeouts before retries. |
| Effective RPS | effective_rps = successful_rps × (1 + retry_rate / 100) | Adds retry overhead to estimate real load. |
| Peak RPS | peak_rps = effective_rps × peak_multiplier | Models burst traffic over the average rate. |
| Target Capacity RPS | target_rps = peak_rps × (1 + safety_margin / 100) | Headroom for variance, deployments, and noise. |
| Concurrency (optional) | concurrency ≈ target_rps × (p95_latency_ms / 1000) | Little’s Law approximation for in‑flight requests. |
| Instances (optional) | instances = ceil(target_rps / per_instance_max_rps) | Quick sizing when per‑node capacity is known. |
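The formulas chain together in order, so they can be sketched as one function. This is an illustrative Python sketch; `capacity_plan` and its parameter names are made up for this example and are not the calculator's actual API.

```python
import math

def capacity_plan(total_requests, window_seconds, success_rate=100.0,
                  retry_rate=0.0, peak_multiplier=1.0, safety_margin=0.0,
                  p95_latency_ms=None, per_instance_max_rps=None):
    """Apply the formulas above in order; return every intermediate value."""
    avg_rps = total_requests / window_seconds
    successful_rps = avg_rps * (success_rate / 100)
    effective_rps = successful_rps * (1 + retry_rate / 100)
    peak_rps = effective_rps * peak_multiplier
    target_rps = peak_rps * (1 + safety_margin / 100)
    plan = {"avg_rps": avg_rps, "successful_rps": successful_rps,
            "effective_rps": effective_rps, "peak_rps": peak_rps,
            "target_rps": target_rps}
    if p95_latency_ms is not None:
        # Little's Law: in-flight requests ~ throughput * response time
        plan["concurrency"] = target_rps * (p95_latency_ms / 1000)
    if per_instance_max_rps is not None:
        plan["instances"] = math.ceil(target_rps / per_instance_max_rps)
    return plan

# Login burst row from the example table below: 45,000 requests in 60 s
plan = capacity_plan(45_000, 60, peak_multiplier=2.0, safety_margin=25)
print(plan["target_rps"])  # 1875.0
```

Each intermediate value is returned so the output can be pasted into a sizing document alongside the final number.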
How to use this calculator
- Capture a request count and its exact time window from logs or metrics.
- Enter the totals, then set peak multiplier based on historical bursts.
- Add retry rate if clients or upstreams retry on failures.
- Choose a safety margin for deploys, cache misses, and noisy neighbors.
- Optionally enter p95 latency to estimate required concurrency.
- Optionally enter per‑instance max RPS to estimate instance count.
- Export CSV or PDF to attach to sizing or SLO documents.
Example data table
| Scenario | Requests | Window (s) | Average RPS | Peak x | Margin % | Target RPS |
|---|---|---|---|---|---|---|
| Login burst | 45,000 | 60 | 750.00 | 2.00 | 25 | 1,875.00 |
| API batch import | 180,000 | 900 | 200.00 | 1.20 | 20 | 288.00 |
| Search traffic | 720,000 | 3,600 | 200.00 | 1.80 | 30 | 468.00 |
| Webhook fan-out | 12,000 | 15 | 800.00 | 3.00 | 35 | 3,240.00 |
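The rows above can be reproduced directly from the formulas. This sketch assumes a 100% success rate and 0% retries, which is what the table implies since it lists no success or retry columns.

```python
scenarios = [
    # (name, requests, window_s, peak_multiplier, margin_pct)
    ("Login burst",       45_000,    60, 2.00, 25),
    ("API batch import", 180_000,   900, 1.20, 20),
    ("Search traffic",   720_000, 3_600, 1.80, 30),
    ("Webhook fan-out",   12_000,    15, 3.00, 35),
]

results = {}
for name, requests, window_s, peak_x, margin in scenarios:
    avg_rps = requests / window_s
    # With 100% success and 0% retries, effective RPS equals average RPS.
    target_rps = avg_rps * peak_x * (1 + margin / 100)
    results[name] = (round(avg_rps, 2), round(target_rps, 2))
    print(f"{name}: avg={avg_rps:.2f} RPS, target={target_rps:.2f} RPS")
```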
1) Establish a clean baseline window
Start with a measurement window that matches the workload pattern you want to size for. For steady APIs, 300–900 seconds often smooths noise while still reacting to traffic shifts. For bursty endpoints, use shorter windows, such as 10–60 seconds, and compare several adjacent windows. The calculator’s Average RPS is simply requests divided by seconds, so accuracy depends on precise timestamps and consistent log sampling.
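Comparing adjacent short windows against the long-window average can be sketched as follows; the per-10-second counts below are hypothetical log data for a bursty endpoint.

```python
# Hypothetical request counts per 10-second window, taken from logs.
counts_10s = [7_200, 8_100, 21_500, 19_800, 8_400, 7_600]
window_seconds = 10

window_rps = [c / window_seconds for c in counts_10s]
overall_rps = sum(counts_10s) / (len(counts_10s) * window_seconds)

print(f"per-window RPS: {window_rps}")
print(f"60 s average:   {overall_rps:.1f} RPS")  # the burst vanishes in the long window
```

Here the 60-second average is 1,210 RPS while the hottest 10-second window runs at 2,150 RPS, which is exactly the gap the peak multiplier in the next step is meant to capture.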
2) Translate bursts into peak demand
Peak multiplier models the ratio between your average and your worst realistic spike. Many consumer services see 1.3×–2.5× daily peaks, while event-driven webhooks can hit 3×–6× in short intervals. Apply safety margin after the peak multiplier to protect rollouts and cache cold-starts; 15%–35% is common in shared clusters. Target Capacity RPS becomes a practical number for autoscaling thresholds and load-test targets.
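With illustrative numbers, applying the peak multiplier and then the safety margin:

```python
effective_rps = 1_000.0   # illustrative effective throughput
peak_multiplier = 1.8     # worst realistic spike over average
safety_margin = 25        # percent headroom for rollouts and cold caches

peak_rps = effective_rps * peak_multiplier         # 1800.0 RPS
target_rps = peak_rps * (1 + safety_margin / 100)  # 2250.0 RPS
print(target_rps)
```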
3) Quantify retries and hidden load
Retries amplify load even when user traffic is flat. A 2% retry rate means 1.02 attempts per successful request; at 1,000 RPS that is 20 extra calls every second. During incident conditions, retries can exceed 50%, creating positive feedback that worsens latency. Use your client, gateway, and queue metrics to set retry rate, and align it with your error budget policy so “success” reflects what your system must actually serve.
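The retry arithmetic in this paragraph, as a small helper (the function name is illustrative):

```python
def extra_calls_per_second(base_rps, retry_rate_pct):
    """Additional attempts per second generated by retries."""
    return base_rps * retry_rate_pct / 100

print(extra_calls_per_second(1_000, 2))   # 20.0, the steady-state case above
print(extra_calls_per_second(1_000, 50))  # 500.0, incident-level retry storm
```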
4) Connect latency to concurrency
Concurrency is estimated using Little’s Law: in-flight requests ≈ throughput × response time. If Target Capacity is 2,000 RPS and p95 latency is 250 ms, you should plan for roughly 500 concurrent in-flight requests. This number informs connection pools, thread limits, and upstream quotas. If latency rises under load, re-run the calculator with the degraded p95 so you size for the true operating point, not the idle benchmark.
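The worked example above, as a sketch:

```python
def concurrency_estimate(target_rps, p95_latency_ms):
    # Little's Law: L = lambda * W, with W converted to seconds
    return target_rps * (p95_latency_ms / 1000)

print(concurrency_estimate(2_000, 250))  # 500.0 in-flight requests
# Re-run with the degraded p95 observed under load, not the idle benchmark:
print(concurrency_estimate(2_000, 400))  # 800.0
```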
5) Convert capacity into infrastructure
When you know per-instance max RPS from profiling, instance count is a straightforward ceiling division. Record the date, region, and cache state of each profiling run; these details explain why measured per-instance RPS differs materially between tests. Keep utilization below 70%–80% for CPU-bound services to avoid tail-latency blowups. Bandwidth estimation uses payload size and Target Capacity; for example, 2,000 RPS at 16 KB per response is about 262 Mbps of raw transfer, before TLS and headers. Use the export to document assumptions alongside your SLO and scaling rules.
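The ceiling division and the raw-bandwidth figure can be checked directly; `per_instance_max_rps` is an illustrative profiling result, not a recommendation.

```python
import math

target_rps = 2_000
per_instance_max_rps = 350  # illustrative ceiling from profiling
payload_kib = 16            # response payload size, KiB

instances = math.ceil(target_rps / per_instance_max_rps)
mbps = target_rps * payload_kib * 1024 * 8 / 1_000_000  # raw payload only

print(instances)     # 6
print(round(mbps))   # 262 Mbps, before TLS and headers
```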
FAQs
What is the difference between Average RPS and Target Capacity RPS?
Average RPS describes observed throughput during your window. Target Capacity RPS adds success adjustment, retry overhead, peak multiplier, and safety margin, producing a sizing number for autoscaling, load tests, and capacity reservations.
How do I choose a realistic peak multiplier?
Compare the highest short-window rate to a longer baseline, using the same endpoint mix. A common approach is 95th-percentile minute RPS divided by 15‑minute average RPS, then round up slightly for seasonality.
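That heuristic can be sketched with hypothetical per-minute counts, using the nearest-rank method for the 95th percentile:

```python
import math

# Hypothetical per-minute request counts over 15 minutes, same endpoint mix.
minute_counts = [9_000, 9_600, 8_800, 9_200, 14_500, 9_100, 9_300, 8_900,
                 9_400, 9_000, 13_800, 9_200, 9_100, 9_500, 9_000]

minute_rps = sorted(c / 60 for c in minute_counts)
rank = math.ceil(0.95 * len(minute_rps)) - 1   # nearest-rank 95th percentile
p95_minute_rps = minute_rps[rank]

baseline_rps = sum(minute_counts) / (len(minute_counts) * 60)
peak_multiplier = p95_minute_rps / baseline_rps
print(f"peak multiplier ~ {peak_multiplier:.2f}")  # then round up for seasonality
```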
Where should I get per-instance max RPS?
Use controlled load tests or profiling on production-like hardware. Measure the RPS where CPU, memory, or downstream limits first cause p95 latency or error rates to breach your SLO, then use that as the per-instance ceiling.
Why use p95 latency for concurrency instead of the mean?
Tail latency drives queueing and saturation in real systems. Using p95 produces a safer concurrency estimate that better matches worst-case user experience and prevents under-provisioning during spikes or partial outages.
Does the bandwidth estimate include headers and encryption overhead?
No. It uses payload size only, so it is a lower bound. Add extra headroom for TLS, HTTP headers, compression behavior, and retransmits, especially across regions or through service meshes.
What should I include in exports for reviews?
Export results, recent runs, and example scenarios, then add your assumptions: time window, traffic source, peak multiplier rationale, retry behavior, and p95 latency context. Reviewers can then validate sizing choices quickly.