Expected Calibration Error (ECE) – a step by step visual explanation | by Maja Pavlovic | Jul, 2023


First we will set up the same example from above:

import numpy as np

# Binary Classification
samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0,1,0,0,0,1,1,0,1])

We then define the ECE function as follows:

def expected_calibration_error(samples, true_labels, M=3):
    # uniform binning approach with M number of bins
    bin_boundaries = np.linspace(0, 1, M + 1)
    bin_lowers = bin_boundaries[:-1]
    bin_uppers = bin_boundaries[1:]

    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary class predictions from confidences
    predicted_label = (samples>0.5).astype(float)

    # get a boolean list of correct/false predictions
    accuracies = predicted_label==true_labels

    ece = np.zeros(1)
    for bin_lower, bin_upper in zip(bin_lowers, bin_uppers):
        # determine if sample is in bin m (between bin lower & upper)
        in_bin = np.logical_and(confidences > bin_lower.item(), confidences <= bin_upper.item())
        # can calculate the empirical probability of a sample falling into bin m: (|Bm|/n)
        prop_in_bin = in_bin.astype(float).mean()

        if prop_in_bin.item() > 0:
            # get the accuracy of bin m: acc(Bm)
            accuracy_in_bin = accuracies[in_bin].astype(float).mean()
            # get the average confidence of bin m: conf(Bm)
            avg_confidence_in_bin = confidences[in_bin].mean()
            # calculate |acc(Bm) - conf(Bm)| * (|Bm|/n) for bin m and add to the total ECE
            ece += np.abs(avg_confidence_in_bin - accuracy_in_bin) * prop_in_bin
    return ece

Calling the function on the binary example returns the same value we calculated above: 0.23778 (rounded).

expected_calibration_error(samples, true_labels)
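To see where that number comes from, it can help to print what each bin contributes. The following is a minimal sketch, not part of the original function: the helper name `ece_bin_breakdown` and its return format are assumptions made for illustration, while the binning and the 0.5 threshold follow the code above.

```python
import numpy as np

# Same binary example as above
samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0, 1, 0, 0, 0, 1, 1, 0, 1])

def ece_bin_breakdown(samples, true_labels, M=3):
    """Hypothetical helper: one tuple per non-empty bin,
    (lower, upper, count, accuracy, avg_confidence, contribution)."""
    boundaries = np.linspace(0, 1, M + 1)
    # correct/false predictions, using the same 0.5 threshold as above
    correct = (samples > 0.5).astype(float) == true_labels
    rows = []
    for lower, upper in zip(boundaries[:-1], boundaries[1:]):
        in_bin = (samples > lower) & (samples <= upper)
        if not in_bin.any():
            continue  # empty bins contribute nothing
        acc = correct[in_bin].mean()      # acc(Bm)
        conf = samples[in_bin].mean()     # conf(Bm)
        weight = in_bin.mean()            # |Bm| / n
        rows.append((lower, upper, int(in_bin.sum()), acc, conf,
                     abs(conf - acc) * weight))
    return rows

rows = ece_bin_breakdown(samples, true_labels)
for lower, upper, count, acc, conf, contribution in rows:
    print(f"({lower:.2f}, {upper:.2f}]  n={count}  acc={acc:.3f}  "
          f"conf={conf:.3f}  contribution={contribution:.5f}")

total_ece = sum(row[-1] for row in rows)
print(f"ECE = {total_ece:.5f}")
```

With M=3 the three bins hold 2, 4 and 3 samples and contribute roughly 0.07, 0.11778 and 0.05, which add up to the 0.23778 reported above.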

You should now know how to calculate ECE for binary classification, both by hand and using numpy.

In addition to the binary example, we can also add the option for multi-class classification with a few lines of extra code. Let’s use James D. McCaffrey’s example. This gives us 5 target classes and the associated sample confidences. We really only need the target indices for our calculation: [0,1,2,3,4] and can, with regard to ECE, ignore the labels that they correspond to. Looking at sample i=1, we can see that instead of just one estimated probability we now have an estimate associated with each class: [0.25,0.2,0.22,0.18,0.15].

# Multi-class Classification
samples_multi = np.array([[0.25,0.2,0.22,0.18,0.15],

true_labels_multi = np.array([0,2,3,4,2,0,1,3,3,2])

We now have to change the ‘confidences’ variable in our code to take the maximum value, as that one will now determine the predicted label. For sample i=1 the maximum estimated probability is 0.25.

if binary:
    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary predictions from confidences
    predicted_label = (samples>0.5).astype(float)
else:
    # get max probability per sample i
    confidences = np.max(samples, axis=1)
    # get predictions from confidences (positional in this case)
    predicted_label = np.argmax(samples, axis=1).astype(float)

In order to get the predicted label we now have to change the ‘predicted_label’ variable to take the argmax over the samples, which for i=1 would give us the index 0 corresponding to the label ‘democrat’.
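Putting the two branches together, a combined function might look like the sketch below. Several things here are assumptions rather than the article's code: the function name `expected_calibration_error_multi`, the choice to pass `binary` as a keyword argument, and the small `toy_probs` / `toy_labels` arrays, which are made-up data for illustration (only the first row is sample i=1 from the text).

```python
import numpy as np

def expected_calibration_error_multi(samples, true_labels, M=3, binary=False):
    """Sketch of ECE with uniform binning for binary or multi-class inputs."""
    bin_boundaries = np.linspace(0, 1, M + 1)

    if binary:
        # one estimated probability per sample, thresholded at 0.5
        confidences = samples
        predicted_label = (samples > 0.5).astype(float)
    else:
        # one probability per class: confidence is the max, label is its index
        confidences = np.max(samples, axis=1)
        predicted_label = np.argmax(samples, axis=1).astype(float)

    accuracies = predicted_label == true_labels

    ece = 0.0
    for lower, upper in zip(bin_boundaries[:-1], bin_boundaries[1:]):
        in_bin = (confidences > lower) & (confidences <= upper)
        prop_in_bin = in_bin.mean()  # |Bm| / n
        if prop_in_bin > 0:
            gap = abs(confidences[in_bin].mean() - accuracies[in_bin].mean())
            ece += gap * prop_in_bin
    return float(ece)

# Made-up illustration data (NOT the article's full example);
# the first row is sample i=1 from the text.
toy_probs = np.array([
    [0.25, 0.20, 0.22, 0.18, 0.15],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.80, 0.05, 0.05],
    [0.40, 0.30, 0.10, 0.10, 0.10],
])
toy_labels = np.array([0, 1, 3, 1])

first_confidence = np.max(toy_probs, axis=1)[0]    # 0.25 for sample i=1
first_prediction = np.argmax(toy_probs, axis=1)[0]  # index 0 for sample i=1
toy_ece = expected_calibration_error_multi(toy_probs, toy_labels)
print(first_confidence, first_prediction, toy_ece)
```

Calling the same function with `binary=True` on the one-dimensional binary example should reproduce the 0.23778 from earlier, since that branch is unchanged.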

Give the Google Colab Notebook a go and try it out for yourself in numpy or PyTorch (see below).

Now you can also calculate ECE for multi-class classification 🙂
