Planner Inputs
Example Data Table
Use this sample scenario to validate expected behavior.
| Scenario | Average rps | Peak rps | Capacity rps/instance | Target util | Buffer | Min | Max | Recommended range |
|---|---|---|---|---|---|---|---|---|
| API service launch week | 200 | 800 | 120 | 65% | 20% | 2 | 30 | 4 to 13 |
The example applies the buffer to traffic, limits effective capacity by the utilization target, and rounds the result up to whole instances.
Formula Used
The planner uses a simple, deployment-friendly sizing model: required instances = ceil(traffic × (1 + buffer) ÷ (per-instance capacity × target utilization)), computed for both average and peak traffic and clamped to the configured min and max bounds.
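A minimal sketch of this sizing model in Python (the function name and signature are illustrative, not the planner's actual API), reproducing the example row above:

```python
import math

def recommended_instances(rps: float, capacity_rps: float,
                          target_util: float, buffer: float,
                          min_inst: int, max_inst: int) -> int:
    """Instances = ceil(buffered traffic / usable per-instance capacity),
    clamped to the configured min/max bounds."""
    planned_load = rps * (1 + buffer)       # e.g. 800 * 1.2 = 960 rps
    usable = capacity_rps * target_util     # e.g. 120 * 0.65 = 78 rps
    needed = math.ceil(planned_load / usable)
    return max(min_inst, min(max_inst, needed))

# Example row: avg 200 rps -> 4 instances, peak 800 rps -> 13 instances.
low = recommended_instances(200, 120, 0.65, 0.20, 2, 30)
high = recommended_instances(800, 120, 0.65, 0.20, 2, 30)
print(low, high)  # 4 13
```

Running the same function on average and peak traffic yields the "4 to 13" recommended range shown in the table.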
How to Use This Calculator
- Measure throughput per instance at your latency target to fill “Per-instance capacity”.
- Enter average and sustained peak traffic based on logs, forecasts, or load tests.
- Choose utilization and buffer to reflect jitter, retries, and noisy neighbors.
- Set min and max bounds from availability targets and budget constraints.
- Review the recommended range and check the headroom warning for peak limits.
- Export CSV or PDF to share plans and keep deployment notes.
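The export step can also be reproduced outside the tool; a sketch that writes one scenario row to CSV (the column layout here is an assumption for illustration, not the tool's actual export schema):

```python
import csv
import io

# Hypothetical column layout mirroring the example table above.
FIELDS = ["scenario", "avg_rps", "peak_rps", "capacity_rps",
          "target_util", "buffer", "min", "max", "recommended_range"]

row = {"scenario": "API service launch week", "avg_rps": 200,
       "peak_rps": 800, "capacity_rps": 120, "target_util": 0.65,
       "buffer": 0.20, "min": 2, "max": 30, "recommended_range": "4 to 13"}

# Write to an in-memory buffer; swap in open("plan.csv", "w") for a file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
```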
Workload inputs and measurement
Use average requests per second for steady demand and sustained peak for stress windows. Measure per-instance capacity from load tests at your latency SLO, not single-thread microbenchmarks. For example, a sustainable 120 rps that meets your p95 latency target is a better planning number than 180 rps that breaches your error budget. Record the test duration, payload mix, and cache state so later comparisons stay valid.
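One way to turn load-test results into a per-instance capacity number is to pick the highest tested rate whose p95 latency still meets the SLO; a sketch (the sample results and the 100 ms threshold are made up for illustration):

```python
# Hypothetical load-test results: offered rate (rps) -> observed p95 latency (ms).
results = {60: 42, 90: 55, 120: 78, 150: 130, 180: 240}
SLO_P95_MS = 100

# Per-instance capacity = highest rate that still meets the latency SLO.
capacity = max(rate for rate, p95 in results.items() if p95 <= SLO_P95_MS)
print(capacity)  # 120
```

Note that 180 rps is achievable in this sample, but at 240 ms p95 it violates the SLO, so 120 rps is the number to plan with.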
Utilization targets and headroom
Target utilization converts raw capacity into usable capacity. With a 65% target, an instance rated at 120 rps delivers 78 rps of planned capacity, leaving room for retries, GC pauses, and noisy neighbors. Raising the target reduces cost but amplifies tail latency and queueing risk. Keep a consistent target across services so fleet sizing remains comparable.
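The queueing risk of a high utilization target can be illustrated with an M/M/1 approximation (an assumption for illustration only; real services are not M/M/1), where mean time in system is W = S / (1 − ρ) for service time S and utilization ρ:

```python
def mm1_time_in_system_ms(service_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho).
    Grows without bound as utilization approaches 1."""
    return service_ms / (1 - utilization)

# With a 10 ms service time, latency roughly doubles from 80% to 90%,
# and doubles again from 90% to 95% utilization.
for util in (0.65, 0.80, 0.90, 0.95):
    print(util, round(mm1_time_in_system_ms(10.0, util), 1))
```

This is why a 65% target is a common compromise: the cost of the unused headroom buys a steep reduction in queueing delay.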
Safety buffer and burst behavior
The buffer adds an uncertainty margin on top of measured traffic. A 20% buffer turns an 800 rps peak into 960 rps of planned load. Buffers are helpful when traffic is spiky, when upstream retries are common, or when instance performance varies. If you already have strong rate limiting and even load distribution, you can lower the buffer, but validate with incident data.
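The buffer's effect on fleet size can be checked directly with the example numbers (the helper below is a sketch, not the planner's API):

```python
import math

def instances_for(peak_rps: float, capacity_rps: float = 120,
                  util: float = 0.65, buffer: float = 0.20) -> int:
    # Usable capacity per instance after applying the utilization target.
    usable = capacity_rps * util
    return math.ceil(peak_rps * (1 + buffer) / usable)

print(instances_for(800, buffer=0.20))  # 960 rps planned -> 13 instances
print(instances_for(800, buffer=0.10))  # 880 rps planned -> 12 instances
```

Here dropping the buffer from 20% to 10% saves one instance at peak, which is the kind of trade to validate against incident history before committing.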
Cooldowns and step policies
Cooldown values prevent rapid oscillation when metrics lag. A short scale‑out cooldown reacts fast, while a longer scale‑in cooldown avoids flapping after a burst. Step sizes define how aggressively capacity changes. Adding two instances per action can clear backlog quickly, but may overshoot if demand falls. Pair steps with alarms on error rate and saturation.
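A sketch of a scaling decision that combines step sizes with separate scale-out and scale-in cooldowns (the function, its parameters, and the cooldown values are illustrative assumptions, not the planner's defaults):

```python
def scaling_action(now: float, desired: int, current: int,
                   last_out: float, last_in: float,
                   out_cooldown: float = 60, in_cooldown: float = 300,
                   step: int = 2) -> int:
    """Return a capacity delta: positive to scale out, negative to scale in,
    zero when within a cooldown or already at the desired count."""
    if desired > current and now - last_out >= out_cooldown:
        return min(step, desired - current)    # scale out, capped by step size
    if desired < current and now - last_in >= in_cooldown:
        return -min(step, current - desired)   # scale in, capped by step size
    return 0                                   # hold

# A burst just ended: desired count dropped from 8 to 4, but the longer
# scale-in cooldown blocks the change until 300 s have passed.
print(scaling_action(now=100, desired=4, current=8, last_out=0, last_in=0))  # 0
print(scaling_action(now=400, desired=4, current=8, last_out=0, last_in=0))  # -2
```

The asymmetric cooldowns implement the advice above: react quickly on the way up, and shrink cautiously on the way down to avoid flapping.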
Growth projection and capacity limits
The projection table compounds growth to show when peak demand outgrows your maximum. At 10% weekly growth, an 800 rps peak becomes about 1,417 rps by week six. The planner recomputes required instances using the same utilization and buffer, which exposes when quotas or budgets will break. Use this view to schedule scaling tests and procurement.
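The projection can be reproduced by compounding the growth rate and re-applying the sizing model each week (a sketch with the example's numbers; the helper name is illustrative):

```python
import math

def projected_peaks(peak_rps: float, weekly_growth: float, weeks: int,
                    capacity_rps: float = 120, util: float = 0.65,
                    buffer: float = 0.20):
    """Yield (week, projected peak rps, required instances) rows."""
    usable = capacity_rps * util
    rows = []
    for week in range(1, weeks + 1):
        projected = peak_rps * (1 + weekly_growth) ** week   # compound growth
        needed = math.ceil(projected * (1 + buffer) / usable)
        rows.append((week, round(projected), needed))
    return rows

for week, rps, needed in projected_peaks(800, 0.10, 6):
    print(week, rps, needed)
# Week 6: ~1417 rps peak -> 22 instances (buffered ~1701 rps / 78 rps usable)
```

Comparing the week-six requirement against the configured maximum of 30 shows this scenario still fits, but at 10% weekly growth it would not fit for long.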
Operational review checklist
Review results before deployment. Confirm the recommended minimum covers warm capacity and availability zones. Confirm the recommended maximum stays within account limits and that the peak requirement does not exceed it. Cross-check CPU sizing when CPU dominates request cost. Finally, test scaling behavior in staging with realistic traffic ramps and cooldowns. Document assumptions, share exports with stakeholders, and revisit the numbers after each major release.
FAQs
What is “per-instance capacity” and how do I measure it?
Run a load test that matches real endpoints, payload sizes, and cache behavior. Increase traffic until p95 latency or error rate breaches your SLO, then record the sustainable requests per second for one instance.
Why does the calculator use utilization instead of 100% capacity?
Operating below saturation reduces queueing delay and absorbs jitter from retries, GC pauses, and uneven load balancing. A utilization target converts theoretical throughput into planned throughput that is safer during bursts.
How should I choose the safety buffer percentage?
Start with 10–30%. Use higher buffers for spiky traffic, variable instance performance, or heavy retry amplification. Use lower buffers if you have strong rate limiting, stable workloads, and proven autoscaling responsiveness.
What do cooldown settings change in practice?
Cooldowns limit how often scaling actions occur. Short scale-out cooldowns react faster to rising demand, while longer scale-in cooldowns prevent oscillation after transient spikes and metric delays.
Why does the projection sometimes exceed my maximum instances?
The projection applies your growth rate to peak traffic and recomputes required instances. If the required count is above your maximum, you need a higher quota, better per-instance capacity, or revised targets.
How do I use the CSV and PDF exports?
Export after each scenario run to share assumptions, ranges, and projections with teams. Keep exports with deployment notes so you can compare planned versus observed scaling during incidents and releases.