Here we’ll look at some code that demonstrates the rough equivalence of the two approaches.

We’ll feed 2000 batches of 20 randomly generated 1×1 images with 3 channels (40,000 samples in total) through a `BatchNorm2d` layer, and check whether the manually computed mean and variance are similar to the running statistics tracked by PyTorch’s `BatchNorm2d`.

```python
import torch
import torch.nn as nn

torch.manual_seed(21)

num_channels = 3

# Example tensor so that we can use randn_like() below.
y = torch.randn(20, num_channels, 1, 1)

model = nn.BatchNorm2d(num_channels)

# nb is a dict containing the buffers (non-trainable parameters)
# of the BatchNorm2d layer. Since these are non-trainable
# parameters, we don't need to run a backward pass to update
# these values. They will be updated during the forward pass itself.
nb = dict(model.named_buffers())
print(f"Buffers in BatchNorm2d: {nb.keys()}\n")

stacked = torch.tensor([]).reshape(0, num_channels, 1, 1)
for i in range(2000):
    x = torch.randn_like(y)
    y_hat = model(x)
    # Save all the input tensors into 'stacked' so that
    # we can compute the mean and variance later.
    stacked = torch.cat([stacked, x], dim=0)
# end for

print(f"Shape of stacked tensor: {stacked.shape}\n")

smean = stacked.mean(dim=(0, 2, 3))
svar = stacked.var(dim=(0, 2, 3))

print("Manually Computed:")
print("------------------")
print(f"Mean: {smean}\nVariance: {svar}\n")

print("Computed by BatchNorm2d:")
print("------------------------")
rm, rv = nb['running_mean'], nb['running_var']
print(f"Mean: {rm}\nVariance: {rv}\n")

print("Mean Absolute Differences:")
print("--------------------------")
print(f"Mean: {(smean - rm).abs().mean():.4f}, Variance: {(svar - rv).abs().mean():.4f}")
```

You can see the output of the code cell below.

```
Buffers in BatchNorm2d: dict_keys(['running_mean', 'running_var', 'num_batches_tracked'])

Shape of stacked tensor: torch.Size([40000, 3, 1, 1])

Manually Computed:
------------------
Mean: tensor([0.0039, 0.0015, 0.0095])
Variance: tensor([1.0029, 1.0026, 0.9947])

Computed by BatchNorm2d:
------------------------
Mean: tensor([-0.0628, 0.0649, 0.0600])
Variance: tensor([1.0812, 1.0318, 1.0721])

Mean Absolute Differences:
--------------------------
Mean: 0.0602, Variance: 0.0616
```
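It’s worth noting what these buffers are actually used for: in `eval()` mode, `BatchNorm2d` normalizes inputs with `running_mean` and `running_var` instead of the current batch’s statistics. Here is a quick sketch verifying this, assuming the layer’s affine parameters are still at their initial values (weight = 1, bias = 0) since we never run a backward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(21)
bn = nn.BatchNorm2d(3)

# A few training-mode forward passes to populate the running statistics.
for _ in range(100):
    bn(torch.randn(20, 3, 1, 1))

bn.eval()  # eval mode: normalize with running stats, not batch stats
x = torch.randn(20, 3, 1, 1)

rm = bn.running_mean.view(1, 3, 1, 1)
rv = bn.running_var.view(1, 3, 1, 1)
# With weight=1 and bias=0 (their init values), eval-mode output is just
# the input standardized by the running statistics.
expected = (x - rm) / torch.sqrt(rv + bn.eps)
assert torch.allclose(bn(x), expected, atol=1e-5)
```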

We generated the inputs with `torch.randn_like()`, which draws from a standard normal distribution, so over a sufficiently large number of samples (40,000 here) we expect the mean and variance to tend to 0.0 and 1.0 respectively.

We see that the difference between the mean and variance computed manually over the entire input and those tracked by BatchNorm2d’s running-average method is close enough for all practical purposes. The per-channel means from BatchNorm2d can differ from the manually computed ones by a large relative factor (up to roughly 40x here), because the running estimates are exponential moving averages that weight recent batches more heavily than the full-dataset statistics do. In absolute terms, however, the discrepancy is tiny and should not matter in practice.
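The running-average update itself is easy to reproduce by hand. A minimal sketch, assuming PyTorch’s default momentum of 0.1 and the documented update rule (note that PyTorch uses the *unbiased* batch variance for the running estimate, even though it normalizes with the biased one):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
momentum = 0.1  # PyTorch's default for BatchNorm2d
bn = nn.BatchNorm2d(3, momentum=momentum)

running_mean = torch.zeros(3)  # buffers start at mean=0, var=1
running_var = torch.ones(3)

for _ in range(50):
    x = torch.randn(20, 3, 1, 1)
    bn(x)  # training mode: updates the layer's buffers in place
    batch_mean = x.mean(dim=(0, 2, 3))
    # Unbiased batch variance, as used for the running estimate.
    batch_var = x.var(dim=(0, 2, 3), unbiased=True)
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var

# Our hand-rolled EMA matches the layer's buffers.
assert torch.allclose(bn.running_mean, running_mean, atol=1e-5)
assert torch.allclose(bn.running_var, running_var, atol=1e-5)
```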