# Expected Calibration Error (ECE) – a step by step visual explanation | by Maja Pavlovic | Jul, 2023

## Numpy

First we will set up the same example from above:

```python
import numpy as np

# Binary Classification
samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0, 1, 0, 0, 0, 1, 1, 0, 1])
```

We then define the ECE function as follows:

```python
def expected_calibration_error(samples, true_labels, M=3):
    # uniform binning approach with M number of bins
    bin_boundaries = np.linspace(0, 1, M + 1)
    bin_lowers = bin_boundaries[:-1]
    bin_uppers = bin_boundaries[1:]

    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary class predictions from confidences
    predicted_label = (samples > 0.5).astype(float)

    # get a boolean list of correct/false predictions
    accuracies = predicted_label == true_labels

    ece = np.zeros(1)
    for bin_lower, bin_upper in zip(bin_lowers, bin_uppers):
        # determine if sample is in bin m (between bin lower & upper)
        in_bin = np.logical_and(confidences > bin_lower.item(), confidences <= bin_upper.item())
        # can calculate the empirical probability of a sample falling into bin m: (|Bm|/n)
        prop_in_bin = in_bin.astype(float).mean()

        if prop_in_bin.item() > 0:
            # get the accuracy of bin m: acc(Bm)
            accuracy_in_bin = accuracies[in_bin].astype(float).mean()
            # get the average confidence of bin m: conf(Bm)
            avg_confidence_in_bin = confidences[in_bin].mean()
            # calculate |acc(Bm) - conf(Bm)| * (|Bm|/n) for bin m and add to the total ECE
            ece += np.abs(avg_confidence_in_bin - accuracy_in_bin) * prop_in_bin
    return ece
```

Calling the function on the binary example returns the same value we calculated above: 0.23778 (rounded).

```python
expected_calibration_error(samples, true_labels)
```
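To see where that number comes from, it can help to look at each bin separately. The snippet below is a minimal sketch rather than part of the original function: `ece_per_bin` is a hypothetical helper name, and it reuses the same uniform binning and the same 0.5 threshold as the binary code in this section.

```python
import numpy as np

samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0, 1, 0, 0, 0, 1, 1, 0, 1])

def ece_per_bin(samples, true_labels, M=3):
    # hypothetical helper: one (count, accuracy, avg confidence, contribution) tuple per bin
    bin_boundaries = np.linspace(0, 1, M + 1)
    # same thresholding as the binary ECE function in this section
    accuracies = (samples > 0.5).astype(float) == true_labels
    rows = []
    for lower, upper in zip(bin_boundaries[:-1], bin_boundaries[1:]):
        in_bin = np.logical_and(samples > lower, samples <= upper)
        count = int(in_bin.sum())
        if count == 0:
            # empty bins contribute nothing to the ECE
            rows.append((0, None, None, 0.0))
            continue
        acc = float(accuracies[in_bin].mean())
        conf = float(samples[in_bin].mean())
        # |acc(Bm) - conf(Bm)| * (|Bm|/n)
        contribution = abs(conf - acc) * count / len(samples)
        rows.append((count, acc, conf, contribution))
    return rows

rows = ece_per_bin(samples, true_labels)
for m, (count, acc, conf, contribution) in enumerate(rows, start=1):
    print(f"bin {m}: n={count}, acc={acc}, conf={conf}, contribution={contribution}")

# summing the per-bin contributions gives the ECE
total = sum(row[3] for row in rows)
print(total)
```

With M=3 the three bins hold 2, 4 and 3 samples, and their contributions add up to the ECE returned by the function.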

You should now know how to calculate ECE for binary classification, both by hand and using NumPy.

In addition to the binary example, we can add support for multi-class classification with a few extra lines of code. Let’s use James D. McCaffrey’s example. This gives us 5 target classes and the associated sample confidences. For our calculation we only need the target indices [0,1,2,3,4]; with regard to ECE, we can ignore the labels they correspond to. Looking at sample i=1, we can see that instead of a single estimated probability we now have one estimate per class: [0.25,0.2,0.22,0.18,0.15].

```python
# Multi-class Classification
samples_multi = np.array([[0.25, 0.2, 0.22, 0.18, 0.15],
                          [0.16, 0.06, 0.5, 0.07, 0.21],
                          [0.06, 0.03, 0.8, 0.07, 0.04],
                          [0.02, 0.03, 0.01, 0.04, 0.9],
                          [0.4, 0.15, 0.16, 0.14, 0.15],
                          [0.15, 0.28, 0.18, 0.17, 0.22],
                          [0.07, 0.8, 0.03, 0.06, 0.04],
                          [0.1, 0.05, 0.03, 0.75, 0.07],
                          [0.25, 0.22, 0.05, 0.3, 0.18],
                          [0.12, 0.09, 0.02, 0.17, 0.6]])
true_labels_multi = np.array([0, 2, 3, 4, 2, 0, 1, 3, 3, 2])
```

We now have to change the ‘confidences’ variable in our code to take the maximum value, as that one will now determine the predicted label. For sample i=1 the maximum estimated probability is 0.25.
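As a quick check on that number, here is a small sketch for the first sample only. The variable names `first_sample`, `confidence` and `predicted_index` are illustrative and not part of the ECE function.

```python
import numpy as np

# estimated probabilities for sample i=1, one per class
first_sample = np.array([0.25, 0.2, 0.22, 0.18, 0.15])

# the confidence is the largest estimated probability
confidence = np.max(first_sample)
# the predicted class is the position of that largest value
predicted_index = np.argmax(first_sample)

print(confidence, predicted_index)
```

The same two calls with `axis=1` apply this row by row to the whole array of samples.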

```python
if binary:
    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary predictions from confidences
    predicted_label = (samples > 0.5).astype(float)
else:
    # get max probability per sample i
    confidences = np.max(samples, axis=1)
    # get predictions from confidences (positional in this case)
    predicted_label = np.argmax(samples, axis=1).astype(float)
```

In order to get the predicted label we now have to change the ‘predicted_label’ variable to take the argmax over the samples, which for i=1 would give us the index 0 corresponding to the label ‘democrat’.
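Putting the pieces together, the sketch below shows one way the branch on `binary` might slot into the full function. Exposing `binary` as a keyword argument with a default of `True` is an assumption made here for illustration; the snippets in this section only show the `if binary:` branch itself.

```python
import numpy as np

def expected_calibration_error(samples, true_labels, M=3, binary=True):
    # `binary` as a keyword argument (default True) is an assumption of this sketch
    bin_boundaries = np.linspace(0, 1, M + 1)
    bin_lowers = bin_boundaries[:-1]
    bin_uppers = bin_boundaries[1:]

    if binary:
        # keep confidences as they are and threshold at 0.5
        confidences = samples
        predicted_label = (samples > 0.5).astype(float)
    else:
        # max probability per sample, and the index where it occurs
        confidences = np.max(samples, axis=1)
        predicted_label = np.argmax(samples, axis=1).astype(float)

    accuracies = predicted_label == true_labels

    ece = np.zeros(1)
    for bin_lower, bin_upper in zip(bin_lowers, bin_uppers):
        in_bin = np.logical_and(confidences > bin_lower, confidences <= bin_upper)
        prop_in_bin = in_bin.astype(float).mean()
        if prop_in_bin > 0:
            accuracy_in_bin = accuracies[in_bin].astype(float).mean()
            avg_confidence_in_bin = confidences[in_bin].mean()
            # |acc(Bm) - conf(Bm)| * (|Bm|/n)
            ece += np.abs(avg_confidence_in_bin - accuracy_in_bin) * prop_in_bin
    return ece

samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0, 1, 0, 0, 0, 1, 1, 0, 1])

samples_multi = np.array([[0.25, 0.2, 0.22, 0.18, 0.15],
                          [0.16, 0.06, 0.5, 0.07, 0.21],
                          [0.06, 0.03, 0.8, 0.07, 0.04],
                          [0.02, 0.03, 0.01, 0.04, 0.9],
                          [0.4, 0.15, 0.16, 0.14, 0.15],
                          [0.15, 0.28, 0.18, 0.17, 0.22],
                          [0.07, 0.8, 0.03, 0.06, 0.04],
                          [0.1, 0.05, 0.03, 0.75, 0.07],
                          [0.25, 0.22, 0.05, 0.3, 0.18],
                          [0.12, 0.09, 0.02, 0.17, 0.6]])
true_labels_multi = np.array([0, 2, 3, 4, 2, 0, 1, 3, 3, 2])

ece_binary = expected_calibration_error(samples, true_labels)
ece_multi = expected_calibration_error(samples_multi, true_labels_multi, binary=False)
print(ece_binary, ece_multi)
```

With M=3, the binary call reproduces the 0.23778 from before, and the multi-class call comes out at roughly 0.192 for this data.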

Give the Google Colab Notebook a go and try it out for yourself in NumPy or PyTorch (see below).

Now you can also calculate ECE for multi-class classification 🙂