Calculator inputs
Set a global step, pick a decay style, optionally add warmup, then export the computed schedule.
Example data table
These examples show how different settings change the resulting schedule expression.
| Run | Initial LR | Mode | Decay rate | Decay steps | Warmup steps | Step | LR expression |
|---|---|---|---|---|---|---|---|
| A | 0.001 | Power | 0.96 | 1000 | 0 | 5000 | 0.001 * 0.96^(5) |
| B | 0.0005 | Natural | 0.12 | 2000 | 500 | 6000 | 0.0005 * e^(-0.12 * 2.75) |
| C | 0.01 | Power | 0.90 | 500 | 1000 | 2500 | 0.01 * 0.90^(3), after warmup |
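To turn the expressions in the table into numbers, the exponents can be recomputed from ratio = (step - warmup_steps) / decay_steps. The short Python sketch below does this for runs A to C. It assumes that the 0.12 in run B is the whole exponent coefficient (gamma · k taken together); the variable names are illustrative only.

```python
import math

# Exponent for each run: ratio = (step - warmup_steps) / decay_steps.
ratio_a = (5000 - 0) / 1000      # 5.0
ratio_b = (6000 - 500) / 2000    # 2.75
ratio_c = (2500 - 1000) / 500    # 3.0

# Power mode multiplies by decay_rate^ratio; natural mode by exp(-rate * ratio).
# Assumption: for run B the table's 0.12 stands for gamma * k as a single number.
lr_a = 0.001 * 0.96 ** ratio_a
lr_b = 0.0005 * math.exp(-0.12 * ratio_b)
lr_c = 0.01 * 0.90 ** ratio_c

for name, lr in [("A", lr_a), ("B", lr_b), ("C", lr_c)]:
    print(f"Run {name}: lr = {lr:.6g}")
```

The three rates come out at roughly 8.15e-4, 3.59e-4 and 7.29e-3. In every run the step is already past warmup, so only the decay formula applies.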
Formula used
This tool supports two common exponential schedules with optional warmup and a minimum clamp.
- Warmup (optional), for step < warmup_steps:
  lr(step) = warmup_init_lr + (initial_lr - warmup_init_lr) · (step / warmup_steps)
- Power exponential decay, for step ≥ warmup_steps:
  ratio = (step - warmup_steps) / decay_steps
  lr(step) = initial_lr · decay_rate^(ratio)
- Natural exponential decay, for step ≥ warmup_steps:
  ratio = (step - warmup_steps) / decay_steps
  lr(step) = initial_lr · exp(-gamma · k · ratio)
- Staircase option: replaces ratio with floor(ratio) in both decay modes.
- Minimum clamp: the final output is max(min_lr, lr(step)).
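The formulas above can be sketched in Python as a single function. This is a minimal illustration under stated assumptions, not the calculator's actual implementation: the function name `exponential_lr`, the keyword names, and the default values are all hypothetical and chosen only to mirror the symbols in the formulas.

```python
import math


def exponential_lr(step, initial_lr, decay_steps, *, mode="power",
                   decay_rate=0.96, gamma=1.0, k=1.0,
                   warmup_steps=0, warmup_init_lr=0.0,
                   staircase=False, min_lr=0.0):
    """Learning rate at `step` (hypothetical helper mirroring the formulas)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup from warmup_init_lr up to initial_lr.
        lr = warmup_init_lr + (initial_lr - warmup_init_lr) * (step / warmup_steps)
    else:
        # Decay counts only the steps beyond warmup.
        ratio = (step - warmup_steps) / decay_steps
        if staircase:
            ratio = math.floor(ratio)
        if mode == "power":
            lr = initial_lr * decay_rate ** ratio
        elif mode == "natural":
            lr = initial_lr * math.exp(-gamma * k * ratio)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    # The minimum clamp is applied to the final output.
    return max(min_lr, lr)


# Example: power decay with no warmup, evaluated at step 5000.
print(exponential_lr(5000, 0.001, 1000, decay_rate=0.96))
```

At step == warmup_steps the ratio is 0, so the decay branch returns initial_lr and the schedule is continuous across the warmup boundary.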
These formulas are widely used in training loops, schedulers, and experiment tracking.
How to use this calculator
- Enter your initial learning rate and the current global step.
- Choose a decay type, then set decay steps and decay rate.
- Add warmup steps and warmup start rate if needed.
- Enable staircase if you want piecewise constant drops.
- Adjust preview rows and spacing to match your logging cadence.
- Press Submit to view results directly below the header.
- Download CSV or PDF to attach to your experiment notes.
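A preview table of this kind can also be generated offline. The sketch below builds evenly spaced (step, lr) rows for plain power decay and writes them as CSV text. The helper `preview_rows`, the column names, and the number formatting are assumptions for illustration; the calculator's own export layout may differ.

```python
import csv
import io


def preview_rows(initial_lr, decay_rate, decay_steps, start_step, rows, spacing):
    """Yield (step, lr) pairs at a fixed spacing (power decay, no warmup)."""
    for i in range(rows):
        step = start_step + i * spacing
        yield step, initial_lr * decay_rate ** (step / decay_steps)


# Write the preview into an in-memory buffer; swap in open("schedule.csv", "w",
# newline="") to write a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["step", "lr"])  # hypothetical column names
for step, lr in preview_rows(0.001, 0.96, 1000, start_step=0, rows=5, spacing=1000):
    writer.writerow([step, f"{lr:.8g}"])

csv_text = buffer.getvalue()
print(csv_text)
```

Setting `spacing` to the logging interval makes each row line up with a logged step.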
FAQs
1) What is exponential decay learning rate scheduling?
It reduces the learning rate multiplicatively over training, either smoothly or in staircase steps. This often stabilizes late-stage updates while keeping early learning aggressive.
2) When should I choose power decay versus natural decay?
Power decay is convenient when you think in “multiply by 0.96 every block.” Natural decay is useful when you prefer an e-based constant and want fine control using gamma and k.
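The two modes describe the same family of curves, because decay_rate^ratio = exp(ratio · ln(decay_rate)). A power schedule therefore matches a natural schedule whenever gamma · k = -ln(decay_rate). The sketch below checks this numerically; `equivalent_gamma` is a hypothetical helper name, not part of the calculator.

```python
import math


def equivalent_gamma(decay_rate, k=1.0):
    """gamma such that exp(-gamma * k * ratio) equals decay_rate ** ratio."""
    return -math.log(decay_rate) / k


gamma = equivalent_gamma(0.96)
print(f"gamma for decay_rate=0.96, k=1: {gamma:.6f}")

# Both forms give the same multiplier at any ratio.
for ratio in (0.5, 1.0, 5.0):
    power = 0.96 ** ratio
    natural = math.exp(-gamma * ratio)
    assert math.isclose(power, natural)
```

For a base of 0.96 the matching gamma is about 0.0408 when k = 1.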
3) What does decay steps actually control?
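Decay steps sets the time scale of the reduction: the rate is multiplied by the decay rate once per decay_steps steps. Larger values slow the decay and keep rates higher for longer. Smaller values lower the rate faster, which can speed up convergence but risks underfitting if updates become small too early.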
4) What is staircase decay and why use it?
Staircase decay uses a floor on the ratio, producing stepwise drops instead of continuous change. It can match epoch-based training or simplify debugging when you want predictable update points.
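The difference is easiest to see side by side. The following sketch evaluates power decay with and without the floor at a few steps; the function `power_decay` and its default values are assumptions picked for illustration.

```python
import math


def power_decay(step, initial_lr=0.001, decay_rate=0.96,
                decay_steps=1000, staircase=False):
    """Power exponential decay without warmup; staircase floors the ratio."""
    ratio = step / decay_steps
    if staircase:
        ratio = math.floor(ratio)
    return initial_lr * decay_rate ** ratio


for step in (0, 500, 999, 1000, 1500, 2000):
    smooth = power_decay(step)
    stepped = power_decay(step, staircase=True)
    print(f"{step:>5}  smooth={smooth:.6e}  staircase={stepped:.6e}")
```

With staircase enabled the rate stays flat within each block of decay_steps steps and drops only at multiples of decay_steps, where it coincides with the smooth curve.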
5) How does warmup interact with decay?
Warmup runs first and ramps linearly from the warmup start rate to the initial rate. Once warmup finishes, the decay formula counts only the steps beyond warmup (step - warmup_steps), so decay begins from the full initial rate.
6) Why set a minimum learning rate?
A minimum clamp prevents the rate from becoming too small, which can stall learning due to tiny updates. It’s often used with long training runs or when fine-tuning needs a stable floor.
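To see when the floor takes over, solve initial_lr · decay_rate^ratio = min_lr for ratio, which gives ratio = ln(min_lr / initial_lr) / ln(decay_rate). The sketch below applies this to continuous power decay; `clamp_step` is a hypothetical helper and the example values are assumptions.

```python
import math


def clamp_step(initial_lr, min_lr, decay_rate, decay_steps, warmup_steps=0):
    """Fractional step at which continuous power decay reaches min_lr."""
    if not (0 < decay_rate < 1) or not (0 < min_lr < initial_lr):
        raise ValueError("expects 0 < decay_rate < 1 and 0 < min_lr < initial_lr")
    ratio = math.log(min_lr / initial_lr) / math.log(decay_rate)
    return warmup_steps + ratio * decay_steps


step = clamp_step(initial_lr=0.001, min_lr=1e-5, decay_rate=0.96, decay_steps=1000)
print(f"clamp becomes active near step {step:,.0f}")
```

With these example values the rate falls by a factor of 100 after roughly 113 decay blocks, so the clamp only matters in runs longer than about 113,000 steps.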
7) Can this calculator help reproduce experiment logs?
Yes. Set preview spacing to match how you log steps and export CSV or PDF. This creates a lightweight record of your schedule parameters and the resulting learning rate values.
8) What settings are common starting points?
For power decay, bases of 0.95–0.99 with decay steps aligned to an epoch or a fixed step block are common. Add a short warmup for large batches, and set a minimum rate when training is long.