Exponential Decay Learning Rate Calculator

Calculator inputs

Set a global step, pick a decay style, optionally add warmup, then export the computed schedule.

Initial learning rate

Rate reached when warmup completes; decay starts from this value.

Global step

Current training iteration to evaluate.

Decay steps

Number of steps over which the decay factor is applied once. Larger values decay more slowly.

Decay type

Choose power mode (a base raised to the ratio) or natural mode (an e-based constant k).

Decay rate (base or k)

Power: base in (0,1). Natural: non‑negative constant.

Gamma multiplier

Only affects natural mode: exp(-gamma·k·ratio).

Minimum learning rate

Clamps the output to avoid vanishing rates.

Warmup steps

Linear ramp before decay begins.

Warmup initial learning rate

Start of warmup ramp. Use 0 for cold start.

Schedule preview rows

Number of steps shown in preview and exports.

Preview spacing (steps)

Distance between preview points, starting at 0.

Enable staircase decay

Replaces the decay ratio with floor(ratio), so the rate drops once per decay_steps block.

Example data table

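These examples show how different settings change the resulting schedule expression. The exponent in each expression is the ratio (step - warmup_steps) / decay_steps, and run B assumes gamma = 1.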

Run	Initial LR	Mode	Decay rate	Decay steps	Warmup steps	Step	LR expression
A	0.001	Power	0.96	1000	0	5000	0.001 * 0.96^(5)
B	0.0005	Natural	0.12	2000	500	6000	0.0005 * e^(-0.12 * 2.75)
C	0.01	Power	0.90	500	1000	2500	Warmup, then 0.01 * 0.90^(3)

Tip: Use the calculator for numeric values and export the schedule for logs.
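
The expressions in the table can be evaluated numerically with a few lines of Python. This is a minimal sketch rather than the calculator's own code: the `runs` dictionary and the `GAMMA` constant are names made up for illustration, and gamma = 1 is an assumption because the table does not list it.

```python
import math

# (initial_lr, mode, decay_rate, decay_steps, warmup_steps, step) from the table.
runs = {
    "A": (0.001, "power", 0.96, 1000, 0, 5000),
    "B": (0.0005, "natural", 0.12, 2000, 500, 6000),
    "C": (0.01, "power", 0.90, 500, 1000, 2500),
}

GAMMA = 1.0  # assumption: the table does not list gamma, so 1.0 is used here

results = {}
for name, (lr0, mode, rate, decay_steps, warmup, step) in runs.items():
    # All three example steps lie past warmup, so only the decay branch is needed.
    ratio = (step - warmup) / decay_steps
    if mode == "power":
        lr = lr0 * rate ** ratio
    else:
        lr = lr0 * math.exp(-GAMMA * rate * ratio)
    results[name] = lr
    print(f"Run {name}: ratio={ratio:g}, lr={lr:.6g}")
```

The ratios come out as 5, 2.75, and 3, matching the exponents shown in the table.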

Formula used

This tool supports two common exponential schedules with optional warmup and a minimum clamp.

Warmup (optional): for step < warmup_steps

lr(step) = warmup_init_lr + (initial_lr - warmup_init_lr) · (step / warmup_steps)
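
As a rough illustration, the warmup ramp translates directly into a small function. The function name `warmup_lr` is hypothetical; the parameters mirror the symbols in the formula above.

```python
def warmup_lr(step, warmup_steps, warmup_init_lr, initial_lr):
    """Linear ramp from warmup_init_lr toward initial_lr, for step < warmup_steps."""
    fraction = step / warmup_steps  # goes from 0 toward 1 as warmup progresses
    return warmup_init_lr + (initial_lr - warmup_init_lr) * fraction

# Cold start (warmup_init_lr = 0) over 1000 warmup steps.
for step in (0, 250, 500, 750):
    print(step, warmup_lr(step, 1000, 0.0, 0.01))
```

At step 0 the result is the warmup start rate, and halfway through warmup it is the midpoint between the two rates.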

Power exponential decay: after warmup

ratio = (step - warmup_steps) / decay_steps
lr(step) = initial_lr · decay_rate^(ratio)

Natural exponential decay: after warmup

ratio = (step - warmup_steps) / decay_steps
lr(step) = initial_lr · exp( -gamma · k · ratio )

Staircase option: replaces ratio with floor(ratio) in both decay modes.
Minimum clamp: final output is max(min_lr, lr(step)).
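
Putting the pieces together, one possible Python sketch of the full schedule is shown below. The function name `exp_decay_lr` and its default values are assumptions made for illustration. The clamp is applied to every output, including warmup steps, which follows the "final output" wording above.

```python
import math

def exp_decay_lr(step, initial_lr, decay_rate, decay_steps, mode="power",
                 gamma=1.0, min_lr=0.0, warmup_steps=0, warmup_init_lr=0.0,
                 staircase=False):
    """Warmup, then power or natural exponential decay, then minimum clamp."""
    if step < warmup_steps:
        # Linear warmup ramp.
        lr = warmup_init_lr + (initial_lr - warmup_init_lr) * (step / warmup_steps)
    else:
        ratio = (step - warmup_steps) / decay_steps
        if staircase:
            ratio = math.floor(ratio)  # piecewise constant drops
        if mode == "power":
            lr = initial_lr * decay_rate ** ratio
        elif mode == "natural":
            # In natural mode decay_rate plays the role of k.
            lr = initial_lr * math.exp(-gamma * decay_rate * ratio)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return max(min_lr, lr)

print(exp_decay_lr(5000, 0.001, 0.96, 1000))
print(exp_decay_lr(6000, 0.0005, 0.12, 2000, mode="natural", warmup_steps=500))
```

Because the ratio is 0 at step = warmup_steps, the decay branch starts at exactly initial_lr, so the schedule is continuous where warmup hands over to decay.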

These formulas are widely used in training loops, schedulers, and experiment tracking.

How to use this calculator

Enter your initial learning rate and the current global step.
Choose a decay type, then set decay steps and decay rate.
Add warmup steps and warmup start rate if needed.
Enable staircase if you want piecewise constant drops.
Adjust preview rows and spacing to match your logging cadence.
Press Submit to view results directly below the header.
Download CSV or PDF to attach to your experiment notes.
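
The preview and CSV steps above can be approximated offline with the standard library. This is only a sketch: it uses power decay without warmup to stay short, and the helper names and the CSV column names (`step`, `learning_rate`) are hypothetical, not the calculator's actual export format.

```python
import csv
import io

def lr_at(step, initial_lr=0.001, base=0.96, decay_steps=1000):
    # Power decay only, no warmup, to keep the sketch short.
    return initial_lr * base ** (step / decay_steps)

def preview_rows(rows, spacing):
    # Preview points start at step 0 and advance by `spacing`.
    return [(i * spacing, lr_at(i * spacing)) for i in range(rows)]

def to_csv(points):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["step", "learning_rate"])  # hypothetical column names
    for step, lr in points:
        writer.writerow([step, f"{lr:.8g}"])
    return buffer.getvalue()

points = preview_rows(rows=5, spacing=500)
print(to_csv(points))
```

Setting `spacing` to your logging interval lines the preview rows up with the steps recorded in training logs.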

FAQs

1) What is exponential decay learning rate scheduling?

It reduces the learning rate multiplicatively over training, either smoothly or in staircase steps. This often stabilizes late‑stage updates while keeping early learning aggressive.

2) When should I choose power decay versus natural decay?

Power decay is convenient when you think in “multiply by 0.96 every block.” Natural decay is useful when you prefer an e-based constant and want fine control using gamma and k.
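
The two modes describe the same family of curves: since base^ratio = exp(ratio · ln(base)), a power base matches natural mode whenever gamma · k = -ln(base). A short check under that identity (the variable names are illustrative):

```python
import math

base = 0.96
gamma = 1.0
k = -math.log(base) / gamma  # k that makes natural mode reproduce the power base

ratio = 2.5
power_lr = 0.001 * base ** ratio
natural_lr = 0.001 * math.exp(-gamma * k * ratio)

print(f"k = {k:.6f}")
print(f"power = {power_lr:.10f}, natural = {natural_lr:.10f}")
```

Choosing between the modes is therefore a matter of which parameterization is easier to reason about, not of which curves are reachable.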

3) What does decay steps actually control?

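Decay steps sets the time scale of reduction: each decay_steps block multiplies the rate by one decay factor. Larger values slow down decay, keeping rates higher for longer. Smaller values reduce the rate faster, which can speed convergence but may underfit if the rate shrinks before the model has learned enough.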

4) What is staircase decay and why use it?

Staircase decay uses a floor on the ratio, producing stepwise drops instead of continuous change. It can match epoch-based training or simplify debugging when you want predictable update points.
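
A small comparison makes the difference visible. The sketch below assumes power decay with no warmup, and the helper name `power_lr` is made up for illustration.

```python
import math

def power_lr(step, initial_lr, base, decay_steps, staircase):
    ratio = step / decay_steps
    if staircase:
        ratio = math.floor(ratio)  # hold the rate constant within each block
    return initial_lr * base ** ratio

for step in (0, 500, 999, 1000, 1500, 2000):
    smooth = power_lr(step, 0.01, 0.5, 1000, staircase=False)
    stair = power_lr(step, 0.01, 0.5, 1000, staircase=True)
    print(f"{step:>5}  smooth={smooth:.6f}  staircase={stair:.6f}")
```

With staircase enabled, the rate stays at 0.01 through step 999 and halves exactly at step 1000, while the smooth curve has already been shrinking throughout the block.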

5) How does warmup interact with decay?

Warmup runs first and linearly ramps from the warmup initial rate to the initial rate. After warmup finishes, the decay formula counts only the steps beyond warmup (step - warmup_steps), so decay begins at exactly the initial rate.

6) Why set a minimum learning rate?

A minimum clamp prevents the rate from becoming too small, which can stall learning due to tiny updates. It’s often used with long training runs or when fine-tuning needs a stable floor.
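
To see when the floor takes over in smooth power mode, solve initial_lr · base^ratio = min_lr for the ratio and convert back to steps. The helper below is a hypothetical sketch of that arithmetic, assuming 0 < base < 1 and 0 < min_lr < initial_lr.

```python
import math

def clamp_start_step(initial_lr, base, decay_steps, min_lr, warmup_steps=0):
    """Step at which unclamped power decay reaches min_lr (non-staircase)."""
    ratio = math.log(min_lr / initial_lr) / math.log(base)
    return warmup_steps + ratio * decay_steps

step = clamp_start_step(initial_lr=0.001, base=0.96, decay_steps=1000, min_lr=1e-5)
print(f"clamp becomes active near step {step:.0f}")
```

If that step lies beyond the end of training, the minimum has no effect on the run.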

7) Can this calculator help reproduce experiment logs?

Yes. Set preview spacing to match how you log steps and export CSV or PDF. This creates a lightweight record of your schedule parameters and the resulting learning rate values.

8) What settings are common starting points?

For power decay, bases like 0.95–0.99 with decay steps aligned to an epoch or fixed step block are common. Add a short warmup for large batches, and clamp min rate when training is long.