Exponential Decay Learning Rate Calculator

Tune learning rates smoothly across long training runs. Compare warmup, staircase, and minimum rate limits. Download tables, visualize steps, and validate optimizer settings today.

Calculator inputs

Set a global step, pick a decay style, optionally add warmup, then export the computed schedule.

Initial learning rate: Start rate used after warmup completes.
Global step: Current training iteration to evaluate.
Decay steps: Controls how fast the rate decreases.
Decay type: Choose power decay (a base) or natural decay (an e-based constant).
Decay rate: Power: base in (0,1). Natural: non-negative constant (gamma).
k: Only affects natural mode: exp(-gamma·k·ratio).
Minimum learning rate: Clamps the output to avoid vanishing rates.
Warmup steps: Linear ramp before decay begins.
Warmup initial rate: Start of warmup ramp. Use 0 for cold start.
Preview rows: Number of steps shown in preview and exports.
Preview spacing: Distance between preview points, starting at 0.
Staircase: Applies floor() to the decay ratio.

Example data table

These examples show how different settings change the resulting schedule expression.

Run  Initial LR  Mode     Decay rate  Decay steps  Warmup steps  Step  LR
A    0.001       Power    0.96        1000         0             5000  0.001 * 0.96^(5)
B    0.0005      Natural  0.12        2000         500           6000  0.0005 * e^(-0.12*2.75)
C    0.01        Power    0.90        500          1000          2500  Warmup then 0.01*0.90^(3)
Tip: Use the calculator for numeric values and export the schedule for logs.
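
As a quick check, the three table expressions can be evaluated directly. This is a minimal sketch: the variable names are illustrative, and run B assumes k = 1 because the table expression shows only the decay rate multiplied by the ratio.

```python
import math

# ratio = (step - warmup_steps) / decay_steps, as in the table expressions
ratio_a = (5000 - 0) / 1000      # 5.0
ratio_b = (6000 - 500) / 2000    # 2.75
ratio_c = (2500 - 1000) / 500    # 3.0

lr_a = 0.001 * 0.96 ** ratio_a               # power mode, no warmup
lr_b = 0.0005 * math.exp(-0.12 * ratio_b)    # natural mode, assuming k = 1
lr_c = 0.01 * 0.90 ** ratio_c                # power mode, step is past warmup

for name, lr in (("A", lr_a), ("B", lr_b), ("C", lr_c)):
    print(f"Run {name}: lr = {lr:.3e}")
```

The results come out at roughly 8.15e-4, 3.59e-4, and 7.29e-3 for runs A, B, and C.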

Formula used

This tool supports two common exponential schedules with optional warmup and a minimum clamp.

  • Warmup (optional): for step < warmup_steps

lr(step) = warmup_init_lr + (initial_lr - warmup_init_lr) · (step / warmup_steps)

  • Power exponential decay: after warmup

ratio = (step - warmup_steps) / decay_steps
lr(step) = initial_lr · decay_rate^(ratio)

  • Natural exponential decay: after warmup

ratio = (step - warmup_steps) / decay_steps
lr(step) = initial_lr · exp( -gamma · k · ratio )

  • Staircase option: replaces ratio with floor(ratio) in both decay modes.
  • Minimum clamp: final output is max(min_lr, lr(step)).

These formulas are widely used in training loops, schedulers, and experiment tracking.
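
The formulas above can be combined into a single function. The sketch below is one possible reading, not the calculator's actual source: the function name and argument defaults are assumptions, `decay_rate` doubles as gamma in natural mode, and the minimum clamp is applied during warmup as well, since the formula says it acts on the final output.

```python
import math

def exp_decay_lr(step, initial_lr, decay_rate, decay_steps, mode="power",
                 k=1.0, warmup_steps=0, warmup_init_lr=0.0,
                 min_lr=0.0, staircase=False):
    """Learning rate at `step`, following the formulas in this section."""
    if step < warmup_steps:
        # Linear ramp from warmup_init_lr up to initial_lr.
        lr = warmup_init_lr + (initial_lr - warmup_init_lr) * (step / warmup_steps)
    else:
        ratio = (step - warmup_steps) / decay_steps
        if staircase:
            ratio = math.floor(ratio)  # piecewise-constant drops
        if mode == "power":
            lr = initial_lr * decay_rate ** ratio
        elif mode == "natural":
            # decay_rate is interpreted as gamma here (assumption)
            lr = initial_lr * math.exp(-decay_rate * k * ratio)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    # Minimum clamp on the final output.
    return max(min_lr, lr)

# Run A from the example table: 0.001 * 0.96^5
print(exp_decay_lr(5000, 0.001, 0.96, 1000))
```

With `warmup_steps=0` the warmup branch is never taken for non-negative steps, so there is no division by zero.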

How to use this calculator

  1. Enter your initial learning rate and the current global step.
  2. Choose a decay type, then set decay steps and decay rate.
  3. Add warmup steps and warmup start rate if needed.
  4. Enable staircase if you want piecewise constant drops.
  5. Adjust preview rows and spacing to match your logging cadence.
  6. Press Submit to view results directly below the header.
  7. Download the CSV or PDF to attach to your experiment notes.

FAQs

1) What is exponential decay learning rate scheduling?

It reduces the learning rate multiplicatively over training, either smoothly or in staircase steps. This often stabilizes late‑stage updates while keeping early learning aggressive.

2) When should I choose power decay versus natural decay?

Power decay is convenient when you think in “multiply by 0.96 every block.” Natural decay is useful when you prefer an e-based constant and want fine control using gamma and k.
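
The two modes describe the same curve under a change of constant: decay_rate^ratio equals exp(ratio · ln(decay_rate)), so a power base matches a natural constant of gamma · k = -ln(decay_rate). A small sketch of that conversion, with illustrative variable names and k fixed at 1:

```python
import math

decay_rate = 0.96                 # power-mode base
gamma = -math.log(decay_rate)     # equivalent natural-mode constant when k = 1
k = 1.0

for ratio in (0.0, 0.5, 1.0, 5.0):
    power = decay_rate ** ratio
    natural = math.exp(-gamma * k * ratio)
    # Both modes give the same multiplier at every ratio.
    print(f"ratio={ratio:<4} power={power:.6f} natural={natural:.6f}")

print(f"gamma equivalent to base 0.96: {gamma:.4f}")
```

For a base of 0.96 the equivalent gamma is about 0.0408.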

3) What does decay steps actually control?

Decay steps sets the time scale of the reduction. Larger values slow the decay, keeping rates higher for longer. Smaller values reduce the rate faster, which can speed up convergence but risks underfitting if the rate shrinks before the model has learned enough.
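
One way to get a feel for the time scale is to ask how many post-warmup steps it takes for the rate to halve. The sketch below solves decay_rate^(n / decay_steps) = 0.5 for n, assuming smooth power decay; the helper name is hypothetical.

```python
import math

def steps_to_halve(decay_rate, decay_steps):
    """Steps after warmup for smooth power decay to halve the rate."""
    # decay_rate ** (n / decay_steps) = 0.5  =>  n = decay_steps * ln(0.5) / ln(decay_rate)
    return decay_steps * math.log(0.5) / math.log(decay_rate)

for decay_steps in (500, 1000, 2000):
    n = steps_to_halve(0.96, decay_steps)
    print(f"decay_steps={decay_steps:>4}: rate halves after about {n:,.0f} steps")
```

With a base of 0.96 and 1000 decay steps the rate halves after roughly 17,000 steps, and the halving time scales linearly with decay steps.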

4) What is staircase decay and why use it?

Staircase decay uses a floor on the ratio, producing stepwise drops instead of continuous change. It can match epoch-based training or simplify debugging when you want predictable update points.
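
A short comparison makes the difference visible. This sketch assumes power decay with no warmup, and the function name is illustrative:

```python
import math

def power_decay(step, initial_lr, decay_rate, decay_steps, staircase=False):
    ratio = step / decay_steps          # no warmup in this sketch
    if staircase:
        ratio = math.floor(ratio)       # hold the rate constant within each block
    return initial_lr * decay_rate ** ratio

for step in (0, 500, 999, 1000, 1500, 2000):
    smooth = power_decay(step, 0.001, 0.96, 1000)
    stair = power_decay(step, 0.001, 0.96, 1000, staircase=True)
    print(f"{step:>5}  smooth={smooth:.6f}  staircase={stair:.6f}")
```

The staircase value stays at the initial rate through step 999 and drops to 0.00096 exactly at step 1000, while the smooth value changes at every step.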

5) How does warmup interact with decay?

Warmup runs first and linearly ramps from warmup initial rate to the initial rate. After warmup finishes, the decay formula starts using the remaining steps beyond warmup.
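
The handoff can be traced with the settings from run C in the example table. This is a sketch only: the function name is hypothetical, and the warmup initial rate of 0 is an assumption because the table does not list it.

```python
def warmup_then_decay(step, initial_lr=0.01, decay_rate=0.90, decay_steps=500,
                      warmup_steps=1000, warmup_init_lr=0.0):
    if step < warmup_steps:
        # Linear ramp toward initial_lr.
        return warmup_init_lr + (initial_lr - warmup_init_lr) * (step / warmup_steps)
    # Decay counts only the steps beyond warmup.
    ratio = (step - warmup_steps) / decay_steps
    return initial_lr * decay_rate ** ratio

for step in (0, 500, 1000, 1500, 2500):
    print(f"step {step:>4}: lr = {warmup_then_decay(step):.5f}")
```

The rate climbs to 0.01 at step 1000, where the ratio is 0, and only then starts to fall: 0.009 at step 1500 and 0.00729 at step 2500.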

6) Why set a minimum learning rate?

A minimum clamp prevents the rate from becoming too small, which can stall learning due to tiny updates. It’s often used with long training runs or when fine-tuning needs a stable floor.
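
It can help to know when the floor takes over. The sketch below solves initial_lr · decay_rate^ratio = min_lr for the step, assuming smooth power decay and min_lr below initial_lr; the helper name is hypothetical.

```python
import math

def clamp_start_step(initial_lr, decay_rate, decay_steps, min_lr, warmup_steps=0):
    """First step at which the unclamped smooth power-decay rate is <= min_lr."""
    ratio = math.log(min_lr / initial_lr) / math.log(decay_rate)
    return warmup_steps + math.ceil(ratio * decay_steps)

step = clamp_start_step(initial_lr=0.001, decay_rate=0.96, decay_steps=1000, min_lr=1e-4)
print(f"clamp becomes active at step {step}")
```

For these example values the clamp takes over after roughly 56,000 steps; from then on the schedule is flat at min_lr.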

7) Can this calculator help reproduce experiment logs?

Yes. Set preview spacing to match how you log steps and export CSV or PDF. This creates a lightweight record of your schedule parameters and the resulting learning rate values.
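
A similar record can be produced in a script. The sketch below writes a step and rate table with Python's `csv` module, assuming smooth power decay; the column names and layout are illustrative and not the calculator's actual export format.

```python
import csv
import io

def schedule_rows(initial_lr, decay_rate, decay_steps, rows, spacing):
    """Yield (step, lr) pairs for smooth power decay, starting at step 0."""
    for i in range(rows):
        step = i * spacing
        yield step, initial_lr * decay_rate ** (step / decay_steps)

# In-memory buffer; swap for open("schedule.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["step", "learning_rate"])
for step, lr in schedule_rows(0.001, 0.96, 1000, rows=5, spacing=500):
    writer.writerow([step, f"{lr:.8f}"])

print(buffer.getvalue())
```

Setting `spacing` to the logging interval keeps the exported rows aligned with the steps recorded in the training logs.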

8) What settings are common starting points?

For power decay, bases like 0.95–0.99 with decay steps aligned to an epoch or fixed step block are common. Add a short warmup for large batches, and clamp min rate when training is long.

Related Calculators

  • cosine annealing scheduler
  • step decay learning rate
  • learning rate range test

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.