Build robust backtests with configurable rolling training windows. Add gaps, steps, and fold limits easily. Compare expanding versus fixed windows across every split run.
Enter sample counts or index positions. The output lists each fold’s train and test index ranges.
This sample shows a simple time series. The calculator works on index ranges, regardless of your feature columns.
| Index | Date | Value |
|---|---|---|
| 1 | 2026-01-01 | 101.2 |
| 2 | 2026-01-02 | 100.6 |
| 3 | 2026-01-03 | 102.1 |
| 4 | 2026-01-04 | 103.0 |
| 5 | 2026-01-05 | 102.4 |
| 6 | 2026-01-06 | 104.2 |
| 7 | 2026-01-07 | 103.7 |
| 8 | 2026-01-08 | 105.1 |
| 9 | 2026-01-09 | 104.8 |
| 10 | 2026-01-10 | 106.0 |
Let indices run from S to S+N−1.
A fold is valid when test_start ≤ S+N−1. The full test range must also fit within the data; if partial output is enabled, the final test end is instead clipped to S+N−1 rather than the fold being dropped.
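The index rules above can be sketched as a small fold generator. This is an illustrative sketch, not the calculator's internal code; the parameter names (`train_size`, `test_size`, `step`, `gap`) are assumptions chosen for clarity.

```python
def rolling_folds(S, N, train_size, test_size, step=1, gap=0):
    """Yield fixed-window folds as (train_start, train_end, test_start, test_end).

    Indices run from S to S+N-1 inclusive, matching the convention above.
    A fold is kept only when its full test range fits within the data.
    """
    folds = []
    train_start = S
    last = S + N - 1                      # highest valid index
    while True:
        train_end = train_start + train_size - 1
        test_start = train_end + gap + 1  # gap buffers train end from test start
        test_end = test_start + test_size - 1
        if test_end > last:               # test range must fit entirely
            break
        folds.append((train_start, train_end, test_start, test_end))
        train_start += step               # slide the window forward
    return folds

# With the 10-row sample above (S=1, N=10), train 4, test 2, step 2:
print(rolling_folds(1, 10, 4, 2, step=2))
# [(1, 4, 5, 6), (3, 6, 7, 8), (5, 8, 9, 10)]
```

Each tuple maps directly onto the index column of the sample table, so you can trace exactly which dates fall into training and testing for every fold.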
Rolling window splitting protects time-ordered evaluation by ensuring training observations precede test observations. The calculator outputs index ranges for each fold, letting teams document how models were trained and scored. By controlling window sizes and step movement, you can match business rhythms such as weekly retrains or monthly rebalances while avoiding look‑ahead bias. Confirm your index convention and align window edges with feature pipelines. If you resample, do it consistently per fold to preserve horizons. This improves reproducibility and reduces silent evaluation drift.
In fixed rolling mode, both the training and testing windows slide forward together. This emphasizes recent data and suits nonstationary signals, common in demand forecasting, fraud detection, and markets. Expanding mode grows the training window over time, improving parameter stability for models that benefit from more history, such as gradient boosting with rich seasonality or baseline regressions. For classification, fit any imbalance handling (class weights, resampling) within the training window only. For forecasting, keep the forecast horizon consistent with the test length and evaluate using metrics that match business losses.
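The difference between the two modes comes down to whether the training start is pinned at the first index. A minimal sketch, assuming an `expanding` flag and 1-based inclusive ranges (hypothetical names, not the calculator's API):

```python
def make_folds(N, train_size, test_size, step, expanding=False):
    """Return folds as inclusive 1-based (train_start, train_end, test_start, test_end)."""
    out = []
    train_end = train_size
    while train_end + test_size <= N:
        # Expanding: training always starts at index 1 and grows each fold.
        # Fixed: training keeps a constant length and slides with the test window.
        train_start = 1 if expanding else train_end - train_size + 1
        out.append((train_start, train_end, train_end + 1, train_end + test_size))
        train_end += step
    return out

# N=10, initial train 4, test 2, step 2:
print(make_folds(10, 4, 2, 2))                  # fixed rolling
# [(1, 4, 5, 6), (3, 6, 7, 8), (5, 8, 9, 10)]
print(make_folds(10, 4, 2, 2, expanding=True))  # expanding
# [(1, 4, 5, 6), (1, 6, 7, 8), (1, 8, 9, 10)]
```

Note that both modes produce identical test ranges; only the amount of history available to each training run changes, which is what makes the two configurations directly comparable.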
The optional gap parameter creates a buffer between training end and test start. Gaps reduce leakage when features use trailing aggregations, labels arrive with delay, or post‑event corrections exist. For example, a seven‑day gap can prevent future information from appearing in lagged features, and it can mimic production latency in logging pipelines.
Step size determines how frequently folds are created. Smaller steps produce many overlapping folds and more performance estimates, but they increase computation. Larger steps reduce overlap and training cost, but they may miss regime changes. The calculator also supports a maximum fold limit and partial last windows, useful when you need a bounded schedule or want to keep the most recent evaluation.
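The trade-off between step size and fold count follows from simple arithmetic: with N samples and fixed train and test sizes, the window can start at `N − train_size − test_size + 1` positions, spaced `step` apart. A small sketch under those assumptions:

```python
def count_folds(N, train_size, test_size, step):
    """Number of full (non-partial) folds for a fixed rolling window."""
    n_positions = N - train_size - test_size   # last valid zero-based offset
    return n_positions // step + 1 if n_positions >= 0 else 0

# 10 samples, train 4, test 2: fold count shrinks as step grows.
for step in (1, 2, 4):
    print(step, count_folds(10, 4, 2, step))
# 1 5
# 2 3
# 4 2
```

A maximum fold limit, as described above, would simply cap this count from the most recent fold backward or the earliest forward, depending on which end of the schedule you need to keep.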
Use the generated table to align model training scripts, experiment tracking, and governance. Exporting CSV supports reproducible reviews, while PDF is helpful for audits and stakeholder sign‑off. When comparing algorithms, keep the split configuration constant, report mean and dispersion across folds, and confirm that any preprocessing, scaling, or target encoding is fit only within each training window. When overlap between folds is high, log fold IDs in experiments so predictions remain traceable during reviews and audits.
The calculator generates sequential train and test ranges so training always precedes testing. Each fold moves forward by a step, producing time‑aware evaluation for ordered data.
Use expanding mode when additional history improves stability or captures long seasonal patterns. Training grows each fold, while the test window remains forward‑looking.
A gap reduces leakage when labels arrive late or features use trailing aggregations. It mimics production latency and prevents near‑future information from influencing training.
Smaller steps create more folds and smoother estimates but cost more computation. Larger steps reduce overlap and training time, but may miss fast regime shifts.
Yes, enable the partial option to keep the final fold when remaining data is limited. The test end is clipped to the last index.
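The clipping behavior described here can be sketched in a few lines (an illustrative helper, not the calculator's code):

```python
def clip_partial(test_start, test_size, last_index, allow_partial):
    """Return the (test_start, test_end) range, clipped or dropped at the data edge."""
    test_end = test_start + test_size - 1
    if test_end > last_index:
        if not allow_partial:
            return None            # fold dropped: test window does not fit
        test_end = last_index      # partial fold kept, clipped to the last index
    return (test_start, test_end)

# Last index is 10; a 4-wide test window starting at 9 overruns the data:
print(clip_partial(9, 4, 10, True))    # (9, 10) — kept as a partial fold
print(clip_partial(9, 4, 10, False))   # None — fold dropped
```

Keeping the clipped fold is usually worthwhile when the most recent period is the one stakeholders care about most.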
Keep the same split configuration for every model, fit preprocessing within each training window, and summarize performance across folds using mean and dispersion, not a single fold.
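Fitting preprocessing inside each training window, as advised above, means statistics such as the mean and standard deviation are computed from training rows only and then applied to the test rows. A minimal standard-library sketch (the helper name and the per-fold calling pattern are assumptions for illustration):

```python
from statistics import mean, pstdev

def standardize_fold(values, train_idx, test_idx):
    """Fit mean/std on the training slice only, then scale both slices."""
    train = [values[i] for i in train_idx]
    mu, sigma = mean(train), pstdev(train)          # fit on training data only
    scale = lambda idx: [(values[i] - mu) / sigma for i in idx]
    return scale(train_idx), scale(test_idx)        # apply to both slices

# Values from the sample table, zero-based: fold with train rows 1-4, test rows 5-6.
values = [101.2, 100.6, 102.1, 103.0, 102.4, 104.2, 103.7, 105.1, 104.8, 106.0]
train_z, test_z = standardize_fold(values, range(0, 4), range(4, 6))
```

After scaling, the training slice has mean 0 and unit standard deviation by construction, while the test slice generally does not; recomputing the statistics on test data (or on the full series) is exactly the leakage this guidance warns against.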
Important Note: All the calculators listed on this site are for educational purposes only and we do not guarantee the accuracy of results. Please consult other sources as well.