The ability to make accurate predictions is fundamental to every time series forecasting application. To this end, **data scientists usually choose the models that minimize errors from a point forecast perspective**. That’s reasonable, but it may not always be the most effective approach.

Data scientists should also consider developing probabilistic forecasting models. Together with point estimates, these models produce upper and lower reliability bands within which future observations are likely to fall. Although probabilistic forecasting may seem to be a prerogative of statistical or deep learning solutions, **any model can be used to produce probabilistic forecasts**. The concept is explained in one of my previous posts, where **I introduced conformal prediction as a way to estimate prediction intervals with any scikit-learn model**.

A point forecast is certainly easier to communicate to non-technical stakeholders. At the same time, being able to generate KPIs on the reliability of our predictions adds value. **A probabilistic output may carry more information to support decision-making**. Communicating that there is a 60% chance of rain in the next few hours may be more informative than reporting how many millimeters of rain will fall.

In this post, **we propose a forecasting technique, known as forecasting hitting time, used to estimate when a specific event or condition will occur**. It proves to be **accurate** since it’s based on conformal prediction, **interpretable** because its output reads directly as a probability, and **reproducible** with any forecasting technique.

**Forecasting hitting time** is a concept commonly used in various fields. It **refers to predicting or estimating the time it takes for a certain event or condition to occur**, often in the context of reaching a specific threshold or level.

The best-known applications of hitting time are in fields like reliability analysis and survival analysis, where it involves estimating the time it takes for a system or process to experience a specific event, such as a failure or reaching a particular state. In finance, hitting time is often used to estimate the probability that a signal or index moves in a desired direction.

Overall, forecasting hitting time involves making predictions about the time it takes for a particular event, which follows temporal dynamics, to occur.
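As a minimal illustration of the definition, the hitting time of a single trajectory is the first step at which it crosses the threshold. The toy `path` values and the `first_hitting_time` helper below are hypothetical and only serve the example.

```python
import numpy as np

def first_hitting_time(path, threshold):
    """Return the first index where path exceeds threshold, or None if it never does."""
    hits = np.asarray(path) > threshold
    # argmax gives the position of the first True; guard the all-False case
    return int(np.argmax(hits)) if hits.any() else None

path = [31.0, 35.5, 38.2, 41.0, 39.4, 44.1]  # toy trajectory

hit_at_40 = first_hitting_time(path, 40)  # index 3, where the value is 41.0
hit_at_50 = first_hitting_time(path, 50)  # None, the level is never exceeded
```

With real forecasts the trajectory is uncertain, which is why the rest of the post works with probabilities of having crossed the level rather than a single index.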

**To correctly estimate hitting times we have to start from point forecasting**. As a first step, we choose the desired forecasting algorithm. For this article, we adopt a simple recursive estimator with a scikit-learn-style interface, available in **tspiral**.

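```python
from sklearn.linear_model import Ridge
from tspiral.forecasting import ForecastingCascade

# Recursive forecaster: Ridge regression on lags 1..168 (24 * 7)
model = ForecastingCascade(
    Ridge(),
    lags=range(1, 24 * 7 + 1),
    use_exog=False,
)
```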
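To make "recursive estimator" concrete, here is a rough NumPy-only sketch of the idea: fit a linear model on lagged values, then forecast one step at a time, feeding each prediction back in as a lag. This is not tspiral's implementation; the helpers `make_lag_matrix`, `fit_ridge` and `recursive_forecast`, as well as the toy sine series, are assumptions made for illustration.

```python
import numpy as np

def make_lag_matrix(y, n_lags):
    """Build rows [y[t-1], ..., y[t-n_lags]] with target y[t]."""
    X = np.column_stack(
        [y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)]
    )
    return X, y[n_lags:]

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(
        Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean)
    )
    return coef, y_mean - x_mean @ coef

def recursive_forecast(y, coef, intercept, horizon):
    """Predict step by step, appending each prediction to the history."""
    n_lags = len(coef)
    history = list(y[-n_lags:])
    preds = []
    for _ in range(horizon):
        lags = np.array(history[::-1][:n_lags])  # most recent value first
        next_val = float(lags @ coef + intercept)
        preds.append(next_val)
        history.append(next_val)
    return np.array(preds)

# Toy series with a 24-step cycle
t = np.arange(200)
y = 10 + np.sin(2 * np.pi * t / 24)

X_lag, target = make_lag_matrix(y, n_lags=24)
coef, intercept = fit_ridge(X_lag, target)
preds = recursive_forecast(y, coef, intercept, horizon=48)
```

Because later steps are built on earlier predictions, errors can compound along the horizon, which is one more reason to attach a measure of uncertainty to each forecasted point.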

**Our aim is to produce a forecast distribution for each predicted point, from which to extract probabilistic insights**. This is done with a three-step approach that makes use of the theory behind conformal prediction:

- Forecasts are collected on the training set through cross-validation and then averaged together.

```python
import numpy as np
from tspiral.model_selection import TemporalSplit

CV = TemporalSplit(n_splits=10, test_size=y_test.shape[0])

# One column per fold; rows not validated in a fold stay NaN
pred_val_matrix = np.full(
    shape=(X_train.shape[0], CV.get_n_splits(X_train)),
    fill_value=np.nan,
    dtype=float,
)

for i, (id_train, id_val) in enumerate(CV.split(X_train)):
    pred_val = model.fit(
        X_train[id_train],
        y_train[id_train]
    ).predict(X_train[id_val])
    pred_val_matrix[id_val, i] = np.array(pred_val, dtype=float)

# Average across folds, ignoring the NaN placeholders
pred_val = np.nanmean(pred_val_matrix, axis=1)
```
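A small toy example may help to see what the NaN-filled matrix and `np.nanmean` are doing. The numbers are made up, and the overlap between validation windows is purely illustrative: whether folds overlap in practice depends on the splitter.

```python
import warnings
import numpy as np

# Toy matrix: 6 training rows, 3 folds.
# NaN means "this row was not in that fold's validation window".
pred_val_matrix = np.full((6, 3), np.nan)
pred_val_matrix[2:4, 0] = [10.0, 11.0]
pred_val_matrix[3:5, 1] = [13.0, 14.0]
pred_val_matrix[4:6, 2] = [15.0, 16.0]

with warnings.catch_warnings():
    # Rows never validated are all-NaN: nanmean emits a RuntimeWarning
    # for them and returns NaN, which we silence here.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    pred_val = np.nanmean(pred_val_matrix, axis=1)

# Rows 0-1 stay NaN; the others become 10.0, 12.0, 14.5, 16.0
```

The earliest rows are only ever used for training, so they have no cross-validated prediction and remain NaN. They need to be masked out before residuals are computed.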

- Conformity scores are calculated on the training data as the absolute residuals between actual values and cross-validated predictions.

```python
# Keep only the rows that received at least one cross-validated prediction
conformity_scores = np.abs(
    np.subtract(
        y_train[~np.isnan(pred_val)],
        pred_val[~np.isnan(pred_val)]
    )
)
```
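For context, this is how conformity scores are commonly turned into a prediction interval in a simplified split-conformal setting: take an empirical quantile of the scores and use it as the half-width around the point forecast. The sketch below uses synthetic data, a hypothetical `point_forecast`, and ignores the finite-sample correction, so treat it as an illustration rather than the exact procedure of this post.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(loc=20.0, scale=2.0, size=500)  # toy actual values
y_pred = y_true + rng.normal(scale=1.5, size=500)   # toy cross-validated predictions

conformity_scores = np.abs(y_true - y_pred)

alpha = 0.1  # target miscoverage, i.e. roughly 90% intervals
half_width = np.quantile(conformity_scores, 1 - alpha)

point_forecast = 22.0  # hypothetical new prediction
lower = point_forecast - half_width
upper = point_forecast + half_width

# Share of calibration residuals that fall inside the band
empirical_coverage = np.mean(conformity_scores <= half_width)
```

The hitting time approach reuses the same scores, but instead of collapsing them into a single quantile it keeps all of them to build a distribution around each forecast.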

- Future forecast distributions are obtained by adding conformity scores to test predictions.

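```python
# Refit on the full training set and forecast the test horizon
pred_test = model.fit(
    X_train,
    y_train
).predict(X_test)

# Broadcast: one row of plausible values per forecasted step
estimated_test_distributions = np.add(
    pred_test[:, None], conformity_scores
)
```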
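The `pred_test[:, None]` indexing relies on NumPy broadcasting. A toy example with made-up numbers shows the resulting shape:

```python
import numpy as np

pred_test = np.array([38.0, 39.5, 41.0])            # toy point forecasts, shape (3,)
conformity_scores = np.array([0.5, 1.0, 2.5, 4.0])  # toy absolute residuals, shape (4,)

# (3, 1) + (4,) broadcasts to (3, 4):
# row i holds forecast i shifted by every conformity score
estimated_test_distributions = pred_test[:, None] + conformity_scores

first_row = estimated_test_distributions[0]  # 38.5, 39.0, 40.5, 42.0
```

Each row is the empirical distribution for one forecasted step. Since the scores are absolute residuals, every value in a row sits at or above the corresponding point forecast.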

Following the procedure depicted above, we end up with a collection of plausible trajectories that future values may follow. We have all that we need to provide a probabilistic representation of our forecasts.

For each future time point, we count **how many of the values in its estimated test distribution exceed a predefined threshold (our hit target level)**. Dividing this count by the number of values in the distribution turns it into a probability.

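Finally, a running maximum is applied to the array of probabilities so that they are monotonically non-decreasing over the horizon: once the target is likely to have been hit by a given step, that probability should not drop at later steps.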

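```python
import pandas as pd

THRESHOLD = 40

# Share of plausible values above the threshold at each forecasted step
prob_test = np.mean(estimated_test_distributions > THRESHOLD, axis=1)

# Running maximum makes the curve monotonically non-decreasing
prob_test = pd.Series(prob_test).expanding(1).max()
```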
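The same computation can be traced by hand on a toy array, and the resulting curve can be read as a hitting time by picking the first step where the probability reaches a chosen level. In this sketch, `np.maximum.accumulate` plays the role of the expanding maximum, and the `CONFIDENCE` cutoff and all numbers are hypothetical.

```python
import numpy as np

THRESHOLD = 40
CONFIDENCE = 0.7  # hypothetical probability level for declaring a "hit"

# Toy distributions: 4 forecasted steps, 4 plausible values each
estimated_test_distributions = np.array([
    [38.5, 39.0, 40.5, 42.0],
    [40.0, 40.5, 42.0, 43.5],
    [39.0, 39.5, 41.0, 42.5],
    [41.5, 42.0, 43.5, 45.0],
])

# Per-step exceedance probability: 0.5, 0.75, 0.5, 1.0
prob_test = np.mean(estimated_test_distributions > THRESHOLD, axis=1)

# Running maximum: 0.5, 0.75, 0.75, 1.0
prob_monotone = np.maximum.accumulate(prob_test)

# First step at which the probability reaches the chosen level
reached = prob_monotone >= CONFIDENCE
hitting_step = int(np.argmax(reached)) if reached.any() else None  # step 1
```

Choosing the probability level is a business decision: a lower cutoff flags the event earlier but with less certainty, while a higher one waits for stronger evidence.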

**Whatever the event we are trying to forecast, we can generate a curve of probabilities simply starting from the point forecasts**. The interpretation remains straightforward, i.e. for each forecasted time point we can derive the probability of our target series reaching a predefined level.

In this post, we introduced a way to add probabilistic outcomes to our forecasting models. It doesn’t require exotic or computationally intensive additional estimation techniques. Starting from a simple point forecasting problem, it’s possible to add a probabilistic view of the task by applying a hitting time approach.