Autoscaling Threshold Calculator

Inputs

Metric type

Current utilization (%)

Use a rolling average (e.g., last 60s).

Target utilization (%)

Target tracking attempts to keep this stable.

Current instances

Min instances

Max instances

Capacity buffer (%)

Adds headroom for bursts, retries, and uneven load.

Sample window (seconds)

Longer windows reduce noise but react slower.

Scale-out hysteresis (%)

Trigger when metric rises above target band.

Scale-in hysteresis (%)

Avoid rapid oscillations during steady load.

Scale-out cooldown (s)

Scale-in cooldown (s)

Notes (optional)

Tip: Use real production data, not single snapshots.

Example scenarios

Use these to sanity-check outputs and compare strategies.

Scenario	Metric	Current	Target	Instances	Buffer	Hysteresis
API steady load	CPU	62%	55%	6	10%	±10%
Batch spikes	Memory	78%	60%	8	15%	±12%
Traffic burst	RPS	1200	150/inst	7	10%	±8%

Formula used

1) Target tracking (proportional scaling)

desired_instances = ceil( current_instances × (current_metric / target_metric) × buffer_factor )

Here buffer_factor is 1 + buffer ÷ 100, so a 10% buffer gives 1.10. Then clamp the result to the min/max instance limits.
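
As a minimal sketch of the target-tracking formula above, the Python function below computes the proportional recommendation and clamps it. The function name, parameter names, and the use of exact Fraction arithmetic are illustrative assumptions, not the calculator's actual implementation. The buffer is taken as a percent and converted to a factor of 1 + buffer ÷ 100, which matches the worked examples on this page.

```python
import math
from fractions import Fraction


def desired_instances(current_instances, current_metric, target_metric,
                      buffer_percent, min_instances, max_instances):
    """Proportional (target-tracking) recommendation, clamped to limits."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Fractions keep the arithmetic exact, so a result that lands exactly on a
    # whole number is not nudged upward by floating-point error before ceil().
    ratio = Fraction(current_metric) / Fraction(target_metric)
    buffer_factor = 1 + Fraction(buffer_percent) / 100
    raw = math.ceil(current_instances * ratio * buffer_factor)
    # Clamp to the configured min/max instance limits.
    return max(min_instances, min(max_instances, raw))


# 6 instances at 72% CPU against a 55% target with a 10% buffer.
print(desired_instances(6, 72, 55, 10, min_instances=2, max_instances=20))  # 9
```

With max_instances set to 8, the same inputs would return 8, because the clamp is applied after rounding up.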

2) Threshold band with hysteresis

scale_out_threshold = target_metric × (1 + scale_out_hysteresis)
scale_in_threshold = target_metric × (1 − scale_in_hysteresis)

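Hysteresis values are entered as percentages and applied as fractions (10% becomes 0.10). The resulting band reduces thrashing when the metric fluctuates near the target.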
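
The band can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, hysteresis is passed as a fraction (0.10 for 10%), and a reading exactly on a threshold is treated as "hold", which the page does not specify.

```python
def threshold_band(target_metric, scale_out_hysteresis, scale_in_hysteresis):
    """Return (scale_in_threshold, scale_out_threshold) around the target."""
    scale_out = target_metric * (1 + scale_out_hysteresis)
    scale_in = target_metric * (1 - scale_in_hysteresis)
    return scale_in, scale_out


def scaling_action(current_metric, target_metric,
                   scale_out_hysteresis, scale_in_hysteresis):
    """Classify a reading as 'scale_out', 'scale_in', or 'hold'."""
    scale_in, scale_out = threshold_band(
        target_metric, scale_out_hysteresis, scale_in_hysteresis)
    if current_metric > scale_out:
        return "scale_out"
    if current_metric < scale_in:
        return "scale_in"
    # Inside the band (or exactly on an edge): take no action.
    return "hold"


# 55% target with a 10% margin on each side gives a band of roughly 49.5 to 60.5.
print(scaling_action(62, 55, 0.10, 0.10))  # scale_out
print(scaling_action(58, 55, 0.10, 0.10))  # hold
```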

3) Traffic mode (RPS)

desired_instances = ceil( (total_RPS ÷ target_RPS_per_instance) × buffer_factor )

Traffic mode uses per-instance RPS as the target instead of a utilization percentage.
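
A short Python sketch of the traffic-mode calculation, assuming the buffer is applied as a factor of 1 + buffer ÷ 100 in the same way as in target tracking. The function and parameter names are hypothetical, and clamping to min/max limits is left out for brevity.

```python
import math
from fractions import Fraction


def desired_instances_rps(total_rps, target_rps_per_instance, buffer_percent):
    """Instances needed to serve total_rps at the per-instance target."""
    if target_rps_per_instance <= 0:
        raise ValueError("target_rps_per_instance must be positive")
    # Exact arithmetic avoids rounding surprises right at integer boundaries.
    base = Fraction(total_rps) / Fraction(target_rps_per_instance)
    buffer_factor = 1 + Fraction(buffer_percent) / 100
    return math.ceil(base * buffer_factor)


# "Traffic burst" scenario: 1200 RPS, 150 RPS per instance, 10% buffer.
print(desired_instances_rps(1200, 150, 10))  # 9
```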

How to use this calculator

Pick a metric: CPU, memory, or traffic (RPS).
Enter current readings averaged over your chosen window.
Set a target that matches your SLOs and latency goals.
Add buffer for spikes, retries, and uneven request spread.
Use hysteresis and cooldowns to prevent oscillation.
Press calculate, then export CSV or PDF for review.

Threshold band design

A practical band starts with a target (for example, 55% CPU) and a hysteresis margin (such as ±10%). That yields a scale-in trigger near 49.5% and a scale-out trigger near 60.5%. The wider the band, the fewer oscillations you see during small metric swings.

Buffer and burst handling

Buffer factor adds headroom on top of proportional scaling. With 10% buffer, a raw recommendation of 10 instances becomes 11 after rounding. This reduces risk when traffic arrives in bursts, when retries occur, or when load is unevenly balanced across nodes.

Target tracking math

Target tracking uses a ratio of current to target. If current CPU is 72% and target is 55%, the ratio is 1.309. With 6 instances and 10% buffer, desired ≈ ceil(6 × 1.309 × 1.10) = 9. This is then clamped to your min and max limits.

Traffic mode capacity planning

In traffic mode, per-instance throughput matters. If total RPS is 950, you run 6 instances, and the target is 140 RPS per instance, current load is about 158 RPS per instance. With an 8% scale-out margin, the scale-out trigger is 140 × 1.08 = 151.2 RPS per instance. Current load exceeds that trigger, which indicates scale-out pressure.
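
The check described above can be expressed as a small Python helper. This is a sketch only: the function name and return shape are assumptions, and the margin is passed as a fraction (0.08 for 8%).

```python
def traffic_pressure(total_rps, current_instances,
                     target_rps_per_instance, scale_out_hysteresis):
    """Return (per_instance_rps, scale_out_trigger, over_trigger)."""
    if current_instances <= 0:
        raise ValueError("current_instances must be positive")
    per_instance = total_rps / current_instances
    trigger = target_rps_per_instance * (1 + scale_out_hysteresis)
    return per_instance, trigger, per_instance > trigger


per_instance, trigger, over = traffic_pressure(950, 6, 140, 0.08)
# Roughly 158.3 RPS per instance against a trigger of about 151.2.
print(round(per_instance, 1), round(trigger, 1), over)
```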

Sampling window and cooldowns

Sampling stabilizes noisy telemetry. A 60–180 second window often smooths short spikes without hiding sustained demand. Cooldowns prevent repeated actions before new capacity warms up. Many teams use a shorter scale-out cooldown (e.g., 120s) and a longer scale-in cooldown (e.g., 300s).
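
One way to model the sampling window and cooldowns is sketched below in Python. Both classes are hypothetical illustrations, not part of the calculator. Timestamps are passed in explicitly (in seconds), and the cooldown is measured from the most recent scaling action of either kind, which is an assumption; some platforms track scale-out and scale-in cooldowns separately.

```python
from collections import deque


class RollingAverage:
    """Average of the samples seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self._samples.append((timestamp, value))
        # Drop samples that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def value(self):
        if not self._samples:
            return None
        return sum(v for _, v in self._samples) / len(self._samples)


class CooldownGate:
    """Allows an action only once its cooldown has elapsed since the last action."""

    def __init__(self, scale_out_cooldown=120, scale_in_cooldown=300):
        self.cooldowns = {"scale_out": scale_out_cooldown,
                          "scale_in": scale_in_cooldown}
        self._last_action_at = None

    def allow(self, action, now):
        if (self._last_action_at is not None
                and now - self._last_action_at < self.cooldowns[action]):
            return False
        self._last_action_at = now
        return True


gate = CooldownGate(scale_out_cooldown=120, scale_in_cooldown=300)
print(gate.allow("scale_out", now=0))    # True: no earlier action
print(gate.allow("scale_out", now=60))   # False: still inside the 120s cooldown
print(gate.allow("scale_in", now=200))   # False: scale-in waits 300s
```

The longer scale-in cooldown means capacity is removed more cautiously than it is added, in line with the guidance above.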

Operational review checklist

After computing thresholds, validate them against incident timelines. Confirm that the scale-out trigger occurs before latency breaches, that scale-in does not cut capacity during steady high percentiles, and that min instances cover background jobs. Revisit settings after releases, traffic shifts, or cache changes.

Typical starting points include CPU targets of 50% to 65%, memory targets of 55% to 70%, hysteresis of 8% to 12%, and buffer of about 10%. Calibrate with production traces and review weekly.

If your platform supports step scaling, limit each action to a small percent change, such as 10% to 25%, and add a minimum step of one instance. This keeps the system responsive without overshooting. Always confirm that scaling events align with request saturation, not with transient background activity during backups, deployments, or noisy batch intervals.
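
The step limit can be sketched in Python as below. The function name is hypothetical, and rounding the percentage step down before applying the one-instance minimum is an assumption; a platform may round differently.

```python
import math


def limit_step(current_instances, desired_instances, max_step_percent=25):
    """Move toward desired_instances by at most max_step_percent per action."""
    # Largest allowed change, with a floor of one instance so that small
    # fleets can still scale.
    max_step = max(1, math.floor(current_instances * max_step_percent / 100))
    delta = desired_instances - current_instances
    if delta > 0:
        return current_instances + min(delta, max_step)
    if delta < 0:
        return current_instances - min(-delta, max_step)
    return current_instances


print(limit_step(10, 20, max_step_percent=25))  # 12
print(limit_step(3, 10, max_step_percent=10))   # 4
```

In the second call, 10% of 3 instances rounds down to zero, so the minimum step of one instance applies.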

Engineering guidance

Prefer a longer cooldown for scale-in than for scale-out.
Keep min instances above one for critical services.
Validate targets using percentile latency, not average latency.
Use separate alarms for saturation and errors.
Re-check thresholds after major releases or traffic shifts.

FAQ

Why use hysteresis?

It creates a buffer zone around the target. That reduces rapid scale-in and scale-out cycles when your metric fluctuates near the target.

What buffer percent is typical?

Many teams start with 5–15% for stable workloads. Spiky traffic, retries, or uneven request routing often need higher headroom.

CPU target vs memory target?

CPU targets fit compute-bound services. Memory targets help when caching, heap growth, or leaks dominate. Use whichever correlates better with latency.

How do cooldowns help?

Cooldowns give new instances time to warm up and metrics time to stabilize. Scale-in cooldown should usually be longer to avoid shrinking too early.

Why can desired instances jump?

Proportional scaling reacts to the ratio of current to target. Large spikes, low targets, or high buffer can increase the ratio and raise the recommendation.

Can I use this for custom metrics?

Yes. Treat your metric like a utilization signal: it should rise with load, and lower values should mean more headroom. Choose a sensible target, apply hysteresis, and validate with real incident history.
