Learning Rate Finder Calculator

Calculator Inputs

Use observed points from a learning rate range test. Submit the form to place calculated results above this section.

Start learning rate

End learning rate

Scan mode

Epochs in scan

Steps per epoch

Optimizer family

Batch size

Reference batch size

Gradient accumulation

Warmup fraction

Loss smoothing beta

Safety margin

Learning rate at minimum loss

Learning rate at steepest descent

Learning rate at divergence

Loss value at minimum point

Loss value at divergence point

Example Data Table

This sample range test shows how loss often improves, reaches a low zone, and later rises as the learning rate becomes too aggressive.

Iteration	Learning Rate	Smoothed Loss	Observation
1	1.00e-05	1.284	Very small updates. Training moves slowly.
120	4.20e-04	0.688	Loss drops consistently. Stability looks strong.
210	2.50e-03	0.412	Minimum loss point. Good conservative anchor.
260	6.00e-03	0.458	Steepest useful descent. Good balanced anchor.
320	1.20e-02	0.771	Loss becomes noisier. Risk starts increasing.
370	2.00e-02	1.920	Divergence begins. Ceiling should remain below this point.
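
As a minimal sketch, the minimum-loss and divergence anchors can be read off these rows programmatically. The rows below are copied from the table; the function name `find_anchors` and the `divergence_ratio` threshold of 4.0 are illustrative assumptions, not part of the calculator.

```python
# Rows copied from the sample table: (iteration, learning_rate, smoothed_loss).
ROWS = [
    (1, 1.00e-05, 1.284),
    (120, 4.20e-04, 0.688),
    (210, 2.50e-03, 0.412),
    (260, 6.00e-03, 0.458),
    (320, 1.20e-02, 0.771),
    (370, 2.00e-02, 1.920),
]


def find_anchors(rows, divergence_ratio=4.0):
    """Return (min_loss_row, divergence_row) from range-test rows.

    divergence_ratio is an assumed threshold, not a calculator setting:
    the first row after the minimum whose loss exceeds
    divergence_ratio * minimum loss is treated as the divergence point.
    divergence_row is None if no row crosses the threshold.
    """
    min_index = min(range(len(rows)), key=lambda i: rows[i][2])
    min_row = rows[min_index]
    divergence_row = None
    for row in rows[min_index + 1:]:
        if row[2] > divergence_ratio * min_row[2]:
            divergence_row = row
            break
    return min_row, divergence_row


min_row, divergence_row = find_anchors(ROWS)
print(f"minimum loss {min_row[2]} at lr={min_row[1]:.2e}")
if divergence_row is not None:
    print(f"divergence at lr={divergence_row[1]:.2e}")
```

A lower ratio flags divergence earlier and therefore yields a lower ceiling.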

Formula Used

1. Total scan iterations
Total iterations = Epochs × Steps per epoch

2. Effective batch size
Effective batch = Batch size × Gradient accumulation

3. Batch scaling factor
Batch scale = √(Effective batch ÷ Reference batch)

4. Adjustment factor
Adjustment factor = Batch scale × Optimizer factor × Warmup factor × Smoothing factor
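
Formulas 1 to 4 can be sketched in a few lines of Python. The page does not state how the optimizer, warmup, and smoothing factors are derived, so this sketch takes them as plain inputs defaulting to 1.0; the function names and the example numbers are assumptions for illustration.

```python
import math


def total_iterations(epochs, steps_per_epoch):
    # Formula 1: length of the scan in optimizer steps.
    return epochs * steps_per_epoch


def effective_batch(batch_size, grad_accumulation):
    # Formula 2: samples contributing to each optimizer update.
    return batch_size * grad_accumulation


def batch_scale(effective, reference_batch):
    # Formula 3: square-root scaling relative to the reference batch.
    return math.sqrt(effective / reference_batch)


def adjustment_factor(scale, optimizer_factor=1.0, warmup_factor=1.0,
                      smoothing_factor=1.0):
    # Formula 4. The three extra factors are not defined on this page,
    # so they are passed in as-is (assumed neutral at 1.0).
    return scale * optimizer_factor * warmup_factor * smoothing_factor


# Hypothetical inputs chosen for round numbers.
iters = total_iterations(epochs=2, steps_per_epoch=185)
eff = effective_batch(batch_size=32, grad_accumulation=4)
scale = batch_scale(eff, reference_batch=32)
adj = adjustment_factor(scale)
print(iters, eff, scale, adj)
```

With these inputs the effective batch is four times the reference batch, so the square-root rule doubles the rate rather than quadrupling it.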

5. Sweep growth metric
Exponential multiplier = (End rate ÷ Start rate)^(1 ÷ (Iterations − 1))
Linear increase = (End rate − Start rate) ÷ (Iterations − 1)
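
The two growth metrics translate directly into a per-iteration schedule. The sketch below assumes the scan modes are named `"exponential"` and `"linear"`; those strings, the function names, and the endpoints in the example are illustrative choices.

```python
def exponential_multiplier(start_lr, end_lr, iterations):
    # Constant ratio between consecutive learning rates.
    return (end_lr / start_lr) ** (1.0 / (iterations - 1))


def linear_increase(start_lr, end_lr, iterations):
    # Constant difference between consecutive learning rates.
    return (end_lr - start_lr) / (iterations - 1)


def sweep(start_lr, end_lr, iterations, mode="exponential"):
    """Return the list of learning rates visited during the scan."""
    if iterations < 2:
        raise ValueError("a sweep needs at least two iterations")
    if mode == "exponential":
        m = exponential_multiplier(start_lr, end_lr, iterations)
        return [start_lr * m ** i for i in range(iterations)]
    if mode == "linear":
        step = linear_increase(start_lr, end_lr, iterations)
        return [start_lr + step * i for i in range(iterations)]
    raise ValueError(f"unknown scan mode: {mode}")


# Hypothetical scan: five points from 1e-5 to 1e-1.
exp_rates = sweep(1e-5, 1e-1, 5, mode="exponential")
lin_rates = sweep(1e-5, 1e-1, 5, mode="linear")
print(exp_rates)
print(lin_rates)
```

An exponential sweep spends equal numbers of steps in each decade of learning rate, which is why it is the usual choice when the start and end rates differ by several orders of magnitude.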

6. Safe ceiling
Safe ceiling = Divergence rate × Adjustment factor × (1 − Safety margin)

7. Recommended rates
Conservative rate = min(Minimum-loss rate × 0.45 × Adjustment factor, Safe ceiling × 0.55)
Balanced rate = min(√(Minimum-loss rate × Steepest-descent rate) × Adjustment factor × (1 − 0.4 × Safety margin), Safe ceiling × 0.72)
Aggressive rate = min(Steepest-descent rate × 1.05 × Adjustment factor × (1 − 0.25 × Safety margin), Safe ceiling × 0.88)

These formulas produce practical schedule anchors rather than strict theoretical guarantees. They work best when the range test was run with clean, monotonic rate growth and reliable smoothed loss tracking.
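
As a worked sketch, formulas 6 and 7 can be applied to the anchors from the example table (minimum loss at 2.50e-03, steepest descent at 6.00e-03, divergence at 2.00e-02). The adjustment factor of 1.0 and safety margin of 0.2 are assumed values chosen for illustration, and the function name `recommend` is hypothetical.

```python
import math


def recommend(min_loss_lr, steepest_lr, divergence_lr,
              adjustment=1.0, safety_margin=0.2):
    """Apply formulas 6 and 7 and return the four rates as a dict."""
    ceiling = divergence_lr * adjustment * (1 - safety_margin)
    conservative = min(min_loss_lr * 0.45 * adjustment, ceiling * 0.55)
    balanced = min(
        math.sqrt(min_loss_lr * steepest_lr) * adjustment
        * (1 - 0.4 * safety_margin),
        ceiling * 0.72,
    )
    aggressive = min(
        steepest_lr * 1.05 * adjustment * (1 - 0.25 * safety_margin),
        ceiling * 0.88,
    )
    return {
        "safe_ceiling": ceiling,
        "conservative": conservative,
        "balanced": balanced,
        "aggressive": aggressive,
    }


rates = recommend(min_loss_lr=2.5e-3, steepest_lr=6.0e-3, divergence_lr=2.0e-2)
for name, value in rates.items():
    print(f"{name:>12}: {value:.3e}")
```

With these inputs none of the three recommendations hits its ceiling cap; the caps only bind when the divergence rate sits close to the other anchors or the safety margin is large.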

How to Use This Calculator

1. Run a learning rate range test from a very small rate to a clearly unstable one.
2. Record the learning rate where smoothed loss is lowest, where descent is strongest, and where divergence starts.
3. Enter scan length, batch details, gradient accumulation, warmup share, optimizer family, and safety settings.
4. Submit the form and review the conservative, balanced, aggressive, and safe-ceiling recommendations.
5. Use the balanced value as a strong default, or start with the conservative value for noisier datasets.
6. Download the result as CSV for records or PDF for sharing with your training team.
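
The first two steps depend on a smoothed loss curve. One common way to produce it is an exponential moving average with bias correction, sketched below. This assumes the "Loss smoothing beta" input plays the role of `beta` here; the page does not confirm how the calculator itself smooths losses, and the sample values are made up.

```python
def smooth_losses(raw_losses, beta=0.98):
    """Exponential moving average of raw losses with bias correction.

    beta must be in [0, 1). Higher beta gives a smoother but more
    lagged curve. Dividing by (1 - beta ** step) removes the bias
    toward zero during the first few steps.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    smoothed = []
    avg = 0.0
    for step, loss in enumerate(raw_losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        smoothed.append(avg / (1 - beta ** step))
    return smoothed


# Hypothetical noisy losses from the start of a range test.
raw = [1.30, 1.21, 1.34, 1.18, 1.25, 1.10, 1.16]
smoothed = smooth_losses(raw, beta=0.9)
print([round(value, 3) for value in smoothed])
```

Record the minimum-loss and divergence rates from the smoothed values, not the raw ones, so that a single noisy batch does not set an anchor.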

Frequently Asked Questions

1. What does this calculator estimate?

It estimates practical learning rate anchors from a range test. You get conservative, balanced, aggressive, and ceiling values plus warmup and schedule guidance.

2. Which observed rate matters most?

The minimum-loss rate gives a safer anchor, while the steepest-descent rate often supports faster learning. The divergence rate defines the upper limit you should respect.

3. Why include batch size and accumulation?

Larger effective batches can usually tolerate higher rates. The calculator applies square-root scaling, √(Effective batch ÷ Reference batch), so recommendations reflect the number of samples behind each optimizer update.

4. What is the safety margin used for?

Safety margin pushes the ceiling downward. Increase it when training is noisy, labels are messy, regularization is heavy, or model stability is uncertain.

5. Should I always choose the balanced rate?

Balanced is usually a strong default. Conservative is better for fragile runs, while aggressive can help when your data pipeline and optimization are already stable.

6. How does warmup affect the result?

Warmup slightly lowers the recommended active rate, and the calculator also reports the warmup length as a step count. This helps avoid early instability while gradients are still settling.
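
A minimal sketch of how a warmup fraction might translate into a step count and a ramp. The page does not give the calculator's exact warmup formula, so the rounding rule and the linear ramp below are assumptions, as are the function names and example numbers.

```python
def warmup_steps(total_steps, warmup_fraction):
    # Assumed rule: warmup length is the rounded share of total steps.
    return max(0, round(total_steps * warmup_fraction))


def warmup_lr(step, target_lr, n_warmup):
    """Linear ramp from target_lr / n_warmup up to target_lr.

    step is zero-based. Once warmup is over, the target rate is returned
    unchanged and any decay schedule would take over from there.
    """
    if n_warmup <= 0 or step >= n_warmup:
        return target_lr
    return target_lr * (step + 1) / n_warmup


# Hypothetical run: 1000 training steps, 5% warmup, target rate 3e-3.
n_warmup = warmup_steps(total_steps=1000, warmup_fraction=0.05)
first = warmup_lr(0, target_lr=3e-3, n_warmup=n_warmup)
last = warmup_lr(n_warmup - 1, target_lr=3e-3, n_warmup=n_warmup)
print(n_warmup, first, last)
```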

7. Can I use this for one-cycle schedules?

Yes. The aggressive recommendation and one-cycle peak are useful upper anchors. The conservative value also helps define a safer starting level.
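
One way to turn these anchors into a schedule is a linear ramp to the peak followed by cosine decay, sketched below in plain Python. The rates passed in are hypothetical stand-ins for a conservative start and an aggressive peak; `final_lr` and the 30% ramp share (`pct_up`) are assumptions, not calculator outputs.

```python
import math


def one_cycle_lr(step, total_steps, start_lr, peak_lr, final_lr, pct_up=0.3):
    """Learning rate at a zero-based step of a simple one-cycle schedule.

    Phase 1: linear increase from start_lr to peak_lr.
    Phase 2: cosine decay from peak_lr down to final_lr.
    """
    up_steps = max(1, int(total_steps * pct_up))
    if step < up_steps:
        t = step / up_steps
        return start_lr + t * (peak_lr - start_lr)
    down_steps = max(1, total_steps - up_steps - 1)
    t = min((step - up_steps) / down_steps, 1.0)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * t))


# Hypothetical anchors: conservative start, aggressive peak, small final rate.
schedule = [
    one_cycle_lr(s, total_steps=100, start_lr=1.0e-3,
                 peak_lr=6.0e-3, final_lr=1.0e-4)
    for s in range(100)
]
print(f"start={schedule[0]:.1e} peak={max(schedule):.1e} end={schedule[-1]:.1e}")
```

Keeping the peak at or below the safe ceiling preserves the margin the calculator builds in below the divergence rate.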

8. Does this replace real validation experiments?

No. It narrows the search quickly, but you should still validate against real training curves, final accuracy, generalization, and repeatability across runs.