Calculator Inputs
Plotly Graph
The chart visualizes how the chosen schedule changes the learning rate across epochs. This helps you inspect warmup, decay, and convergence behavior.
Example Data Table
This sample output table shows selected checkpoints from the training plan. It makes schedule progression easier to review at a glance.
| Epoch | Learning Rate | Training Phase |
|---|---|---|
| 0 | 0.00208000 | Warmup |
| 5 | 0.01000000 | Exploration |
| 25 | 0.01000000 | Exploration |
| 50 | 0.00100000 | Refinement |
| 100 | 0.00010000 | Convergence |
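The checkpoints above are consistent with a warmup-plus-step-decay plan. A minimal sketch that reproduces the table (the parameter values baseLR = 0.01, minLR = 0.0001, warmupEpochs = 5, stepSize = 50, decayRate = 0.1 are an assumption chosen to match the rows, not the calculator's confirmed defaults):

```python
import math

# Assumed parameters that reproduce the sample table.
base_lr, min_lr, warmup_epochs = 0.01, 1e-4, 5
step_size, decay_rate = 50, 0.1

def planned_lr(epoch):
    # Linear warmup ramp for the first few epochs.
    if epoch < warmup_epochs:
        return min_lr + (base_lr - min_lr) * ((epoch + 1) / warmup_epochs)
    # Step decay afterwards: drop by decay_rate every step_size epochs.
    return base_lr * decay_rate ** math.floor(epoch / step_size)

for e in (0, 5, 25, 50, 100):
    print(e, round(planned_lr(e), 8))
```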
Formula Used
Dynamic learning rate methods adjust the step size during training rather than keeping it fixed. This improves convergence speed and reduces instability.
1. Warmup:
LR(epoch) = minLR + (baseLR - minLR) × ((epoch + 1) / warmupEpochs)
2. Step Decay:
LR(epoch) = baseLR × decayRate^floor(epoch / stepSize)
3. Exponential Decay:
LR(epoch) = baseLR × e^(-decayRate × epoch)
4. Inverse Time:
LR(epoch) = baseLR / (1 + decayRate × epoch)
5. Polynomial Decay:
LR(epoch) = (baseLR - minLR) × (1 - progress)^power + minLR
6. Cosine Annealing:
LR(epoch) = minLR + 0.5 × (baseLR - minLR) × (1 + cos(2π × cycles × progress))
7. Adaptive Plateau Proxy:
LR(epoch) = baseLR / (1 + decayRate × √(epoch + 1))
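The seven formulas above can be collected into a single dispatch function. This is a minimal sketch; the parameter defaults and the definition of progress as epoch / (totalEpochs − 1) are assumptions, since the calculator's internals are not shown:

```python
import math

def lr_schedule(epoch, total_epochs, method, base_lr=0.01, min_lr=1e-4,
                decay_rate=0.1, step_size=50, power=2.0, cycles=1.0,
                warmup_epochs=0):
    # Warmup ramps linearly toward base_lr and takes precedence.
    if epoch < warmup_epochs:
        return min_lr + (base_lr - min_lr) * ((epoch + 1) / warmup_epochs)
    # Assumed definition of training progress in [0, 1].
    progress = epoch / max(total_epochs - 1, 1)
    if method == "step":
        return base_lr * decay_rate ** math.floor(epoch / step_size)
    if method == "exponential":
        return base_lr * math.exp(-decay_rate * epoch)
    if method == "inverse_time":
        return base_lr / (1 + decay_rate * epoch)
    if method == "polynomial":
        return (base_lr - min_lr) * (1 - progress) ** power + min_lr
    if method == "cosine":
        return min_lr + 0.5 * (base_lr - min_lr) * (
            1 + math.cos(2 * math.pi * cycles * progress))
    if method == "plateau_proxy":
        return base_lr / (1 + decay_rate * math.sqrt(epoch + 1))
    raise ValueError(f"unknown method: {method}")
```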
Additional metrics: the relative rate compares the current rate to the initial rate, the effective update scale combines the rate with batch size, and the stability score summarizes schedule smoothness together with recent loss movement.
How to Use This Calculator
- Enter the base learning rate used at training start.
- Set the current epoch and intended total training epochs.
- Choose a schedule such as step decay, cosine, or polynomial.
- Provide decay controls, minimum learning rate, and warmup epochs.
- Add batch size and recent loss values for practical diagnostics.
- Press Calculate Dynamic Rate to generate the results.
- Review the results above the form and inspect the graph.
- Export the summary as CSV or PDF for reporting.
Frequently Asked Questions
1. What is a dynamic learning rate?
A dynamic learning rate changes during training instead of staying fixed. It helps models learn quickly early on, then refine weights more carefully later.
2. Why use warmup epochs?
Warmup starts with smaller rates and gradually increases them. This reduces unstable jumps at the beginning of training, especially with large batches or deeper models.
3. When is step decay useful?
Step decay is useful when you want predictable drops at selected milestones. It works well for many standard training loops and is easy to tune.
4. How does cosine annealing help?
Cosine annealing lowers the rate smoothly instead of sharply. That smoother transition can improve convergence stability and reduce late-stage oscillations.
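To see that smoothness, here is a sketch of the cosine formula listed earlier with cycles = 0.5, which gives a single gentle decay from the base rate to the minimum (all parameter values are illustrative):

```python
import math

base_lr, min_lr, cycles, total_epochs = 0.01, 1e-4, 0.5, 100

def cosine_lr(epoch):
    progress = epoch / (total_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (
        1 + math.cos(2 * math.pi * cycles * progress))

# The rate falls slowly near the start and end, and fastest mid-training.
```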
5. What does the stability score mean?
The stability score is a practical indicator built from rate smoothness and recent loss improvement. Higher values generally suggest calmer schedule transitions.
6. Why include batch size in this calculator?
Batch size influences the effective size of parameter updates. Combining it with learning rate offers a more useful operational view of optimization intensity.
7. Which schedule should I choose?
Choose based on your model, dataset, and training goals. Step decay is simple, cosine is smooth, and polynomial gives flexible end-of-training control.
8. Can I use this for real experiments?
Yes. This tool is designed for planning and comparison. You should still validate the chosen schedule through actual training logs and model performance.