Set safer scaling triggers from CPU, memory, and traffic metrics. See thresholds, target tracking, and instance recommendations. Share results with your team and improve stability today.
| Scenario | Metric | Current | Target | Instances | Buffer | Hysteresis |
|---|---|---|---|---|---|---|
| API steady load | CPU | 62% | 55% | 6 | 10% | ±10% |
| Batch spikes | Memory | 78% | 60% | 8 | 15% | ±12% |
| Traffic burst | RPS | 1200 | 150/inst | 7 | 10% | ±8% |
1) Target tracking (proportional scaling)
2) Threshold band with hysteresis
3) Traffic mode (RPS) uses per-instance RPS. Desired instances come from total_RPS ÷ target_RPS_per_instance, multiplied by (1 + buffer) and rounded up.
A practical band starts with a target (for example, 55% CPU) and a hysteresis margin (such as ±10%). The margin is applied as a percentage of the target, not as percentage points: 55 × 0.90 gives a scale-in trigger near 49.5%, and 55 × 1.10 gives a scale-out trigger near 60.5%. The wider the band, the fewer oscillations you see during small metric swings.
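The band arithmetic above can be sketched in a few lines of Python. The function names `hysteresis_band` and `band_decision` are hypothetical, chosen for illustration, and the sketch assumes the margin is a percentage of the target, as in the 49.5% / 60.5% example.

```python
def hysteresis_band(target, margin_pct):
    """Return (scale_in_trigger, scale_out_trigger).

    Assumption: margin_pct is relative to the target, so a target of 55
    with a 10% margin gives 49.5 and 60.5.
    """
    scale_in = target * (100 - margin_pct) / 100
    scale_out = target * (100 + margin_pct) / 100
    return scale_in, scale_out


def band_decision(metric, target, margin_pct):
    """Classify a metric reading against the band."""
    scale_in, scale_out = hysteresis_band(target, margin_pct)
    if metric > scale_out:
        return "scale_out"
    if metric < scale_in:
        return "scale_in"
    return "hold"  # inside the band: take no action


low, high = hysteresis_band(55, 10)
print(low, high)
print(band_decision(62, 55, 10))
```

Readings inside the band return `"hold"`, which is what suppresses oscillation when the metric hovers near the target.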
Buffer factor adds headroom on top of proportional scaling. With 10% buffer, a raw recommendation of 10 instances becomes 11 after rounding. This reduces risk when traffic arrives in bursts, when retries occur, or when load is unevenly balanced across nodes.
Target tracking uses a ratio of current to target. If current CPU is 72% and target is 55%, the ratio is 1.309. With 6 instances and 10% buffer, desired ≈ ceil(6 × 1.309 × 1.10) = 9. This is then clamped to your min and max limits.
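As a minimal sketch of the target-tracking calculation just described, the following Python function applies the ratio, the buffer, rounding up, and the min/max clamp. The name `desired_instances` and its parameters are assumptions for illustration, not the API of any particular autoscaler. The buffer is passed as a whole percent and the division is done last, which keeps cases such as 10 instances with a 10% buffer at 11 rather than letting floating-point error push the ceiling to 12.

```python
import math


def desired_instances(current_instances, current_metric, target_metric,
                      buffer_pct, min_instances, max_instances):
    """Proportional (target tracking) recommendation with buffer and clamp."""
    # ratio = current_metric / target_metric, scaled by (1 + buffer).
    # Multiplying first and dividing once avoids rounding surprises
    # when the inputs are whole numbers.
    raw = (current_instances * current_metric * (100 + buffer_pct)
           / (target_metric * 100))
    recommended = math.ceil(raw)
    # Clamp to the configured limits.
    return max(min_instances, min(max_instances, recommended))


# Worked example from the text: 6 instances, 72% CPU, 55% target, 10% buffer.
print(desired_instances(6, 72, 55, 10, min_instances=2, max_instances=20))
```

With a lower `max_instances`, for example 8, the same inputs would be clamped to 8.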
In traffic mode, per-instance throughput matters. If total RPS is 950, you run 6 instances, and the target is 140 RPS per instance, current load is about 158 RPS per instance. With an 8% scale-out margin, the scale-out trigger is 151.2 RPS per instance (140 × 1.08). Because 158 exceeds 151.2, the fleet is under scale-out pressure.
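The traffic-mode arithmetic can be sketched as follows. The helper names are hypothetical. The first two helpers reproduce the 950 RPS example, and the third applies the total_RPS ÷ target_RPS_per_instance formula with a buffer, using the values from the traffic row of the table (1200 RPS, 150 RPS per instance, 10% buffer).

```python
import math


def per_instance_rps(total_rps, instances):
    """Current load carried by each instance, assuming even balancing."""
    return total_rps / instances


def scale_out_trigger(target_rps_per_instance, margin_pct):
    """Per-instance RPS above which scale-out is indicated."""
    return target_rps_per_instance * (100 + margin_pct) / 100


def desired_instances_for_traffic(total_rps, target_rps_per_instance, buffer_pct):
    """ceil(total / target * (1 + buffer)); min/max clamping is omitted here."""
    return math.ceil(total_rps * (100 + buffer_pct)
                     / (target_rps_per_instance * 100))


load = per_instance_rps(950, 6)          # about 158 RPS per instance
trigger = scale_out_trigger(140, 8)      # 140 * 1.08
print(round(load, 1), trigger, load > trigger)
print(desired_instances_for_traffic(1200, 150, 10))
```

For the table's traffic row this yields 9 instances: 1200 ÷ 150 = 8, and the 10% buffer raises 8.8 to the next whole instance.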
Sampling stabilizes noisy telemetry. A 60–180 second window often smooths short spikes without hiding sustained demand. Cooldowns prevent repeated actions before new capacity warms up. Many teams use a shorter scale-out cooldown (e.g., 120s) and a longer scale-in cooldown (e.g., 300s).
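One way to picture sampling windows and cooldowns is the sketch below. It is an illustration under assumptions, not how any specific platform implements them: the class `SmoothedMetric` and the function `cooldown_elapsed` are hypothetical, timestamps are plain seconds, and smoothing is a simple average over the window.

```python
from collections import deque


class SmoothedMetric:
    """Average of the samples seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))
        # Evict samples that fell out of the window.
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def average(self):
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


def cooldown_elapsed(now, last_action_time, cooldown_seconds):
    """True when enough time has passed since the last scaling action."""
    return last_action_time is None or now - last_action_time >= cooldown_seconds


cpu = SmoothedMetric(window_seconds=120)
for t, value in [(0, 40), (60, 60), (120, 80), (180, 100)]:
    cpu.add(t, value)
print(cpu.average())

# Shorter scale-out cooldown (120s) and longer scale-in cooldown (300s).
print(cooldown_elapsed(now=200, last_action_time=50, cooldown_seconds=120))
print(cooldown_elapsed(now=200, last_action_time=50, cooldown_seconds=300))
```

In this run, 150 seconds after the last action a scale-out would be allowed but a scale-in would still be held back, which matches the asymmetric cooldowns described above.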
After computing thresholds, validate them against incident timelines. Confirm that the scale-out trigger occurs before latency breaches, that scale-in does not cut capacity during steady high percentiles, and that min instances cover background jobs. Revisit settings after releases, traffic shifts, or cache changes.
Typical starting points include CPU targets of 50% to 65%, memory targets of 55% to 70%, hysteresis of 8% to 12%, and buffer of about 10%. Calibrate with production traces and review weekly.
If your platform supports step scaling, limit each action to a small percent change, such as 10% to 25%, and add a minimum step of one instance. This keeps the system responsive without overshooting. Always confirm that scaling events align with request saturation, not with transient background activity during backups, deployments, or noisy batch intervals.
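The step limit can be sketched as a small clamp applied to whatever recommendation the scaler produced. The function `limit_step` is hypothetical, and rounding the percent step down (with a floor of one instance) is an assumption; a platform may round differently.

```python
def limit_step(current_instances, desired_instances, max_step_pct=25):
    """Move toward `desired_instances` by at most max_step_pct of the fleet.

    The step is never smaller than one instance, so small fleets can
    still scale.
    """
    max_step = max(1, current_instances * max_step_pct // 100)
    delta = desired_instances - current_instances
    clamped = max(-max_step, min(max_step, delta))
    return current_instances + clamped


print(limit_step(6, 9))        # small fleet: minimum step of one instance
print(limit_step(20, 30))      # 25% of 20 allows a step of 5
print(limit_step(20, 12, 10))  # scale-in limited to 10% of 20
```

Repeated evaluations then approach the recommendation over several actions instead of in one jump.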
Why use hysteresis?
It creates a buffer zone around the target. That reduces rapid scale-in and scale-out cycles when your metric fluctuates near the target.
What buffer percent is typical?
Many teams start with 5–15% for stable workloads. Spiky traffic, retries, or uneven request routing often need higher headroom.
CPU target vs memory target?
CPU targets fit compute-bound services. Memory targets help when caching, heap growth, or leaks dominate. Use whichever correlates better with latency.
How do cooldowns help?
Cooldowns give new instances time to warm up and metrics time to stabilize. Scale-in cooldown should usually be longer to avoid shrinking too early.
Why can desired instances jump?
Proportional scaling reacts to the ratio of current to target. Large spikes, low targets, or high buffer can increase the ratio and raise the recommendation.
Can I use this for custom metrics?
Yes. Treat your metric like a utilization signal where lower is better. Choose a sensible target, apply hysteresis, and validate with real incident history.