## Numpy

First we will set up the same example from above:

```python
import numpy as np

# Binary Classification
samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0, 1, 0, 0, 0, 1, 1, 0, 1])
```

We then define the ECE function as follows:

```python
def expected_calibration_error(samples, true_labels, M=3):
    # uniform binning approach with M number of bins
    bin_boundaries = np.linspace(0, 1, M + 1)
    bin_lowers = bin_boundaries[:-1]
    bin_uppers = bin_boundaries[1:]

    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary class predictions from confidences
    predicted_label = (samples > 0.5).astype(float)

    # get a boolean list of correct/false predictions
    accuracies = predicted_label == true_labels

    ece = np.zeros(1)
    for bin_lower, bin_upper in zip(bin_lowers, bin_uppers):
        # determine if sample is in bin m (between bin lower & upper)
        in_bin = np.logical_and(confidences > bin_lower.item(), confidences <= bin_upper.item())
        # can calculate the empirical probability of a sample falling into bin m: (|Bm|/n)
        prop_in_bin = in_bin.astype(float).mean()

        if prop_in_bin.item() > 0:
            # get the accuracy of bin m: acc(Bm)
            accuracy_in_bin = accuracies[in_bin].astype(float).mean()
            # get the average confidence of bin m: conf(Bm)
            avg_confidence_in_bin = confidences[in_bin].mean()
            # calculate |acc(Bm) - conf(Bm)| * (|Bm|/n) for bin m and add to the total ECE
            ece += np.abs(avg_confidence_in_bin - accuracy_in_bin) * prop_in_bin
    return ece
```

Calling the function on the **binary example** returns the same value as we calculated above: **0.23778** (rounded).

`expected_calibration_error(samples, true_labels)`
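To see where that number comes from, it can help to print the per-bin terms separately. The following is a minimal sketch, assuming the same data and `M=3` as above; `ece_bin_breakdown` is a hypothetical helper name used only for illustration and is not part of the original code.

```python
import numpy as np

samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0, 1, 0, 0, 0, 1, 1, 0, 1])

def ece_bin_breakdown(samples, true_labels, M=3):
    """Hypothetical helper: one (lower, upper, count, acc, conf, term) tuple per non-empty bin."""
    bin_boundaries = np.linspace(0, 1, M + 1)
    # a prediction is correct when the thresholded confidence matches the label
    correct = (samples > 0.5).astype(float) == true_labels
    rows = []
    for lower, upper in zip(bin_boundaries[:-1], bin_boundaries[1:]):
        in_bin = (samples > lower) & (samples <= upper)
        if in_bin.any():
            acc = correct[in_bin].mean()    # acc(Bm)
            conf = samples[in_bin].mean()   # conf(Bm)
            weight = in_bin.mean()          # |Bm| / n
            rows.append((lower, upper, int(in_bin.sum()), acc, conf, abs(conf - acc) * weight))
    return rows

rows = ece_bin_breakdown(samples, true_labels)
for lower, upper, count, acc, conf, term in rows:
    print(f"({lower:.2f}, {upper:.2f}]  n={count}  acc={acc:.3f}  conf={conf:.3f}  term={term:.5f}")
print("ECE =", round(sum(r[5] for r in rows), 5))
```

For this data the three bins hold 2, 4 and 3 samples, and their terms (roughly 0.07, 0.118 and 0.05) add up to the ECE of about 0.23778.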

You should now know how to calculate ECE for binary classification both by hand and using NumPy.

In addition to the binary example, we can also add the **option** for multi-class classification with a few lines of extra code. Let’s use *James D. McCaffrey’s* example. This gives us 5 target classes and the associated sample confidences. We really only need the target indices for our calculation, [0,1,2,3,4], and can, with regard to ECE, ignore the labels that they correspond to. Looking at sample *i=1*, we can see that instead of just one estimated probability we now have an estimate associated with each class: **[0.25,0.2,0.22,0.18,0.15]**.

```python
# Multi-class Classification
samples_multi = np.array([[0.25, 0.2, 0.22, 0.18, 0.15],
                          [0.16, 0.06, 0.5, 0.07, 0.21],
                          [0.06, 0.03, 0.8, 0.07, 0.04],
                          [0.02, 0.03, 0.01, 0.04, 0.9],
                          [0.4, 0.15, 0.16, 0.14, 0.15],
                          [0.15, 0.28, 0.18, 0.17, 0.22],
                          [0.07, 0.8, 0.03, 0.06, 0.04],
                          [0.1, 0.05, 0.03, 0.75, 0.07],
                          [0.25, 0.22, 0.05, 0.3, 0.18],
                          [0.12, 0.09, 0.02, 0.17, 0.6]])

true_labels_multi = np.array([0, 2, 3, 4, 2, 0, 1, 3, 3, 2])
```

We now have to change the **‘confidences’** variable in our code to take the maximum value, as that one will now determine the predicted label. For sample *i=1* the maximum estimated probability is *0.25*.

```python
if binary:
    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary predictions from confidences
    predicted_label = (samples > 0.5).astype(float)
else:
    # get max probability per sample i
    confidences = np.max(samples, axis=1)
    # get predictions from confidences (positional in this case)
    predicted_label = np.argmax(samples, axis=1).astype(float)
```

In order to get the predicted label we now have to change the **‘predicted_label’** variable to take the *argmax* over the samples, which for *i=1* would give us the index *0*, corresponding to the label ‘democrat’.

Give the **Google Colab Notebook** a go and try it out for yourself in NumPy or PyTorch (see below).
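Putting the two branches together, the whole function might look like the sketch below. It assumes that `binary` is passed in as a keyword argument (the snippet above does not show where the flag comes from) and that a plain float is returned instead of a one-element array. Both choices are assumptions made for illustration, and the name `expected_calibration_error_multi` is hypothetical.

```python
import numpy as np

def expected_calibration_error_multi(samples, true_labels, M=3, binary=False):
    """Sketch of ECE with M uniform bins; `binary` (assumed flag) selects the case."""
    bin_boundaries = np.linspace(0, 1, M + 1)

    if binary:
        # one probability per sample: use it directly and threshold at 0.5
        confidences = samples
        predicted_label = (samples > 0.5).astype(float)
    else:
        # one probability per class: confidence is the max, prediction its index
        confidences = np.max(samples, axis=1)
        predicted_label = np.argmax(samples, axis=1).astype(float)

    accuracies = predicted_label == true_labels

    ece = 0.0
    for bin_lower, bin_upper in zip(bin_boundaries[:-1], bin_boundaries[1:]):
        in_bin = (confidences > bin_lower) & (confidences <= bin_upper)
        prop_in_bin = in_bin.mean()                              # |Bm| / n
        if prop_in_bin > 0:
            accuracy_in_bin = accuracies[in_bin].mean()          # acc(Bm)
            avg_confidence_in_bin = confidences[in_bin].mean()   # conf(Bm)
            ece += abs(avg_confidence_in_bin - accuracy_in_bin) * prop_in_bin
    return float(ece)

samples_multi = np.array([[0.25, 0.2, 0.22, 0.18, 0.15],
                          [0.16, 0.06, 0.5, 0.07, 0.21],
                          [0.06, 0.03, 0.8, 0.07, 0.04],
                          [0.02, 0.03, 0.01, 0.04, 0.9],
                          [0.4, 0.15, 0.16, 0.14, 0.15],
                          [0.15, 0.28, 0.18, 0.17, 0.22],
                          [0.07, 0.8, 0.03, 0.06, 0.04],
                          [0.1, 0.05, 0.03, 0.75, 0.07],
                          [0.25, 0.22, 0.05, 0.3, 0.18],
                          [0.12, 0.09, 0.02, 0.17, 0.6]])
true_labels_multi = np.array([0, 2, 3, 4, 2, 0, 1, 3, 3, 2])

ece_multi = expected_calibration_error_multi(samples_multi, true_labels_multi)
print(round(ece_multi, 3))
```

With `M=3`, the ten maximum confidences fall 3/3/4 across the bins, and working through the terms by hand gives roughly 0.117 + 0.05 + 0.025 = 0.192, which is what the call should print. Passing `binary=True` with the earlier one-dimensional `samples` reproduces the binary result.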

Now you can also calculate ECE for multi-class classification 🙂