How do we know whether a forecasting model is accurate and reliable, and how do we evaluate its performance on unseen data? This is where backtesting comes in.
To evaluate the performance of a forecasting model, we use a procedure called backtesting (also known as time-series cross-validation). Backtesting is essentially a way of testing how a model would have performed if it had been used in the past.
How does it work?
To backtest a time-series forecasting model, we start by splitting the data into two parts: a training set and a validation set (sometimes also called a test set, but we’ll clarify the difference in the next sections). The training set is used to train the model, while the validation set is used to evaluate how well the model performs on unseen data. Once the model has been trained, we use it to make predictions on the validation set and compare these predictions to the actual values to see how well the model performs.
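As a minimal sketch of this split, the snippet below uses a synthetic series (invented for illustration) and a naive last-value forecast as a stand-in for a trained model. The crucial point is that the split is chronological: the validation set always comes after the training set.

```python
import numpy as np

# Synthetic example series: a linear trend plus noise (illustrative only).
rng = np.random.default_rng(0)
y = np.arange(100, dtype=float) + rng.normal(0, 2, 100)

# Chronological split: never shuffle a time series.
split = 80
y_train, y_val = y[:split], y[split:]

# Stand-in for a trained model: a naive forecast that repeats
# the last observed training value over the validation horizon.
y_pred = np.full(len(y_val), y_train[-1])
```

Any real model would replace the naive forecast here; the train/validate mechanics stay the same.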
How do we measure the performance of a model?
There are several metrics that can be used to evaluate the performance of a time-series forecasting model, such as Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE). These metrics measure how close the predicted values are to the actual values.
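Both metrics mentioned above are straightforward to compute from the predicted and actual values; a minimal sketch:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of the errors.
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root Mean Squared Error: like MAE, but penalizes large errors more.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])
print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # ~1.29
```

Because RMSE squares the errors before averaging, a single large miss inflates it more than it does the MAE, which is why the two metrics can rank models differently.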
The procedure is typically repeated multiple times, allowing us to:
- Have a good estimate of the model performance
- Visualize how the performance evolves over time
Below we show a graphical representation of the backtesting process, using 3 splits:
In the figure above we show only 3 non-overlapping validation periods. However, nothing prevents us from using more, partially overlapping windows, potentially one per time step.
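The repeated procedure can be sketched as an expanding-window backtest. The series, the window boundaries, and the naive last-value forecast below are all assumptions for illustration; each split trains on everything before its validation window, mirroring the 3-split figure.

```python
import numpy as np

# Synthetic series (illustrative only): linear trend plus noise.
rng = np.random.default_rng(1)
y = np.arange(60, dtype=float) + rng.normal(0, 2, 60)

# Three non-overlapping validation windows, one per split,
# given as (validation start, validation end) index pairs.
splits = [(30, 40), (40, 50), (50, 60)]

scores = []
for start, end in splits:
    # Expanding window: train on all observations before the window.
    y_train, y_val = y[:start], y[start:end]
    y_pred = np.full(len(y_val), y_train[-1])  # naive last-value forecast
    score = np.mean(np.abs(y_val - y_pred))    # MAE for this split
    scores.append(score)
    print(f"validation window [{start}, {end}): MAE = {score:.2f}")
```

Averaging the per-split scores gives the overall performance estimate, while the individual scores show how the error evolves over time; shifting the windows by one time step each would give the partially overlapping variant mentioned above.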