Hitting Time Forecasting: The Other Way for Time Series Probabilistic Forecasting | by Marco Cerliani | Jun, 2023


How long does it take to reach a specific value?

Photo by Mick Haupt on Unsplash

The ability to make accurate predictions is fundamental to every time series forecasting application. To that end, data scientists usually choose the model that minimizes error from a point-forecast perspective. That is sound practice, but it may not always be the most effective approach.

Data scientists should also consider developing probabilistic forecasting models. Together with point estimates, these models produce upper and lower reliability bands within which future observations are likely to fall. Although probabilistic forecasting may seem to be a prerogative of statistical or deep learning solutions, any model can be used to produce probabilistic forecasts. The concept is explained in one of my previous posts, where I introduced conformal prediction as a way to estimate prediction intervals with any scikit-learn model.

A point forecast is certainly easier to communicate to non-technical stakeholders. At the same time, being able to generate KPIs on the reliability of our predictions adds value. A probabilistic output may carry more information to support decision-making: communicating that there is a 60% chance of rain in the next few hours may be more informative than reporting how many millimeters of rain will fall.

In this post, we propose a forecasting technique, known as hitting time forecasting, used to estimate when a specific event or condition will occur. It proves to be accurate because it is based on conformal prediction, interpretable because its output is a probability, and reproducible with any forecasting technique.

Forecasting hitting time is a concept commonly used in various fields. It refers to predicting or estimating the time it takes for a certain event or condition to occur, often in the context of reaching a specific threshold or level.

Simulated seasonality and trend [image by the author]
Simulated time series (seasonality + trend) with an example of hitting time level [image by the author]

The best-known applications of hitting time are in fields like reliability analysis and survival analysis, where it means estimating the time it takes for a system or process to experience a specific event, such as a failure or reaching a particular state. In finance, hitting time is often used to estimate the probability that a signal or index moves in a desired direction.

Overall, forecasting hitting time involves making predictions about the time it takes for a particular event, which follows temporal dynamics, to occur.
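
To make the definition concrete, here is a minimal sketch that computes the first hitting time of an already observed series. It is only an illustration of the concept, not part of the forecasting workflow: the helper name `first_hitting_time` and the toy trend-plus-seasonality series are assumptions made for this example.

```python
import numpy as np

def first_hitting_time(series, level):
    """Return the index of the first value strictly above `level`, or None."""
    hits = np.flatnonzero(np.asarray(series) > level)
    return int(hits[0]) if hits.size else None

# Toy series (assumed): linear trend plus a 12-step seasonal cycle
t = np.arange(48)
y = 0.5 * t + 5 * np.sin(2 * np.pi * t / 12)

hit_20 = first_hitting_time(y, level=20)       # first step where y exceeds 20
hit_never = first_hitting_time(y, level=1000)  # level is never reached -> None
```

On observed data the hitting time is a single number. The rest of the post deals with the harder case where the future is unknown and the answer has to be expressed as a probability.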

To correctly estimate hitting times, we have to start from point forecasting. As a first step, we choose the desired forecasting algorithm. For this article, we adopt a simple recursive estimator, available with a scikit-learn-style interface from tspiral.

Predicted vs real data points on test set [image by the author]
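from sklearn.linear_model import Ridge
from tspiral.forecasting import ForecastingCascade

# Recursive forecaster: Ridge regression on the previous 24*7 = 168 observations
model = ForecastingCascade(
    Ridge(),
    lags=range(1, 24*7+1),
    use_exog=False,
)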
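
For readers unfamiliar with recursive forecasting, the sketch below shows the general idea using NumPy only: fit a linear model on lagged values, then predict one step at a time while feeding each prediction back in as the newest lag. This is not tspiral's implementation. The helpers `make_lag_matrix` and `recursive_forecast`, the toy sine series, and the use of plain least squares in place of Ridge are all assumptions made to keep the example dependency-free.

```python
import numpy as np

def make_lag_matrix(y, n_lags):
    """Build rows [y[t-1], ..., y[t-n_lags]] with target y[t]."""
    X = np.column_stack(
        [y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)]
    )
    return X, y[n_lags:]

def recursive_forecast(y, coef, n_lags, horizon):
    """Predict step by step, appending each prediction to the history."""
    history = list(y[-n_lags:])
    preds = []
    for _ in range(horizon):
        lags = np.array(history[::-1][:n_lags])   # most recent value first
        next_value = float(lags @ coef[:-1] + coef[-1])
        preds.append(next_value)
        history.append(next_value)
    return np.array(preds)

# Toy data (assumed): a noiseless signal with a 24-step seasonal period
t = np.arange(24 * 10)
y = np.sin(2 * np.pi * t / 24)

n_lags = 24
X, target = make_lag_matrix(y, n_lags)
X_design = np.column_stack([X, np.ones(len(X))])  # last column is the intercept
coef, *_ = np.linalg.lstsq(X_design, target, rcond=None)

forecast = recursive_forecast(y, coef, n_lags, horizon=48)
```

Because each step reuses earlier predictions, errors can accumulate over the horizon, which is one more reason to attach a measure of uncertainty to the point forecasts.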

Our aim is to produce a forecast distribution for each predicted point, from which we can extract probabilistic insights. This is done with a three-step approach that relies on the theory behind conformal prediction:

  • Forecasts are collected on the training set through cross-validation and then averaged together.
import numpy as np
from tspiral.model_selection import TemporalSplit

CV = TemporalSplit(n_splits=10, test_size=y_test.shape[0])

# One column per split; NaN marks rows that a split did not predict
pred_val_matrix = np.full(
    shape=(X_train.shape[0], CV.get_n_splits(X_train)),
    fill_value=np.nan,
    dtype=float,
)

for i, (id_train, id_val) in enumerate(CV.split(X_train)):

    pred_val = model.fit(
        X_train[id_train],
        y_train[id_train]
    ).predict(X_train[id_val])

    pred_val_matrix[id_val, i] = np.array(
        pred_val, dtype=float
    )

# Average the out-of-fold predictions available for each training row
pred_val = np.nanmean(pred_val_matrix, axis=1)

  • Conformity scores are calculated on the training data as the absolute residuals between cross-validated predictions and real values.
conformity_scores = np.abs(
    np.subtract(
        y_train[~np.isnan(pred_val)],
        pred_val[~np.isnan(pred_val)]
    )
)
  • Future forecast distributions are obtained by adding conformity scores to test predictions.
pred_test = model.fit(
    X_train,
    y_train
).predict(X_test)

estimated_test_distributions = np.add(
    pred_test[:, None], conformity_scores
)
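
The three steps above can be reproduced end to end on toy data with NumPy alone. The sketch below is a hedged illustration of the mechanics, not the article's actual experiment: `expanding_splits` is a hand-written stand-in that only mimics the idea of a temporal split, `mean_forecaster` is a deliberately naive hypothetical model, and the random series is made up.

```python
import numpy as np

def expanding_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, val_idx); each validation block follows its training window."""
    for i in range(n_splits, 0, -1):
        val_end = n_samples - (i - 1) * test_size
        val_start = val_end - test_size
        yield np.arange(val_start), np.arange(val_start, val_end)

def mean_forecaster(y_history, horizon):
    """Hypothetical stand-in model: repeat the mean of the last 5 observations."""
    return np.full(horizon, y_history[-5:].mean())

rng = np.random.default_rng(0)
y_train = 10 + rng.normal(size=60)   # toy training series (assumed)
n_splits, test_size = 4, 10

# Step 1: out-of-fold predictions, one column per split, NaN where not predicted
pred_val_matrix = np.full((len(y_train), n_splits), np.nan)
splits = expanding_splits(len(y_train), n_splits, test_size)
for i, (id_train, id_val) in enumerate(splits):
    pred_val_matrix[id_val, i] = mean_forecaster(y_train[id_train], len(id_val))

# Keep only rows predicted at least once, then average across splits
covered = ~np.isnan(pred_val_matrix).all(axis=1)
pred_val = np.nanmean(pred_val_matrix[covered], axis=1)

# Step 2: conformity scores are absolute out-of-fold residuals
conformity_scores = np.abs(y_train[covered] - pred_val)

# Step 3: broadcast every score onto every test prediction
pred_test = mean_forecaster(y_train, horizon=test_size)
estimated_test_distributions = pred_test[:, None] + conformity_scores
```

The result has one row per future time point and one column per conformity score. Since the scores are absolute values, every simulated value in a row sits at or above the corresponding point forecast.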

Predicted distribution on test data [image by the author]

Following the procedure described above, we end up with a collection of plausible trajectories that future values may follow. We now have everything we need to provide a probabilistic representation of our forecasts.

For each future time point, we count how many values in the estimated test distribution exceed a predefined threshold (our hit target level). This count is turned into a probability by dividing it by the number of values in each estimated test distribution.

Finally, a cumulative maximum is applied to the array of probabilities to obtain a monotonically non-decreasing series of probabilities.

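import pandas as pd

THRESHOLD = 40

# Share of simulated values above the threshold at each future time point
prob_test = np.mean(estimated_test_distributions > THRESHOLD, axis=1)

# Running maximum keeps the probabilities from decreasing over the horizon
prob_test = pd.Series(prob_test).expanding(1).max()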
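
A tiny worked example may help to see what the two operations do. The numbers below are invented for illustration, and `np.maximum.accumulate` is used as a NumPy-only equivalent of the expanding maximum.

```python
import numpy as np

# Hypothetical toy distributions: 4 future steps x 5 simulated values each
estimated = np.array([
    [30, 32, 35, 38, 39],
    [33, 36, 39, 41, 43],
    [31, 34, 37, 39, 42],
    [38, 41, 44, 46, 48],
], dtype=float)
threshold = 40

# Share of simulated values above the threshold at each step
raw_prob = np.mean(estimated > threshold, axis=1)   # 0.0, 0.4, 0.2, 0.8

# Running maximum: a probability already reached is never revised downwards
hit_prob = np.maximum.accumulate(raw_prob)          # 0.0, 0.4, 0.4, 0.8
```

The third step shows why the running maximum matters: its raw probability drops to 0.2, but if the level had a 40% chance of being reached by step two, the chance of it having been reached by step three cannot be lower.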

Predicted vs real data points on test set plus hitting time probabilities [image by the author]

Whatever the event we are trying to forecast, we can generate a curve of probabilities starting simply from the point forecasts. The interpretation remains straightforward: for each forecasted time point, we can derive the probability of our target series reaching a predefined level.
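
One possible way to turn such a probability curve into a single answer is to read off the first forecast step at which the probability reaches a chosen confidence level. This is a sketch of one reasonable convention rather than a step taken in the analysis above: the helper `hitting_step`, the confidence levels, and the example curve are all hypothetical.

```python
import numpy as np

def hitting_step(hit_prob, confidence):
    """Return the first forecast step whose probability reaches `confidence`, else None."""
    idx = np.flatnonzero(np.asarray(hit_prob) >= confidence)
    return int(idx[0]) if idx.size else None

# Hypothetical non-decreasing hitting probabilities over a 7-step horizon
hit_prob = np.array([0.0, 0.1, 0.1, 0.35, 0.6, 0.6, 0.9])

step_50 = hitting_step(hit_prob, confidence=0.5)    # index 4
step_95 = hitting_step(hit_prob, confidence=0.95)   # None: never that confident
```

A stricter confidence level pushes the estimated hitting time later, or returns no estimate at all when the horizon is too short to be that sure.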

In this post, we introduced a way to add probabilistic outcomes to our forecasting models. It doesn’t require exotic or computationally intensive additional estimation techniques. Starting from a point forecasting problem, it’s possible to add a probabilistic view of the task by applying a hitting time approach.


