Product VQ splits the input vector x of dimension D into M independent subspaces of dimension D/M, and applies one independent VQ module to each subspace. Product VQ then quantizes the input vector x as the concatenation of the M closest codewords (one per codebook). The figure below shows Product VQ for M=3.
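To make the split-and-concatenate idea concrete, here is a minimal, illustrative sketch of Product VQ in plain Python (the codebooks and input below are hypothetical toy values, not from the paper):

```python
def nearest(codebook, v):
    """Return the codeword in `codebook` closest to `v` (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((ci - vi) ** 2 for ci, vi in zip(c, v)))

def product_vq(x, codebooks):
    """Product VQ sketch: split x into M subvectors of dimension D/M, quantize
    each with its own codebook, and concatenate the M chosen codewords."""
    M = len(codebooks)
    d = len(x) // M  # subspace dimension D/M
    out = []
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        out.extend(nearest(codebooks[m], sub))
    return out

# Toy example: D=6, M=3 sub-codebooks with 2 codewords each
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[2.0, 2.0], [3.0, 3.0]],
    [[4.0, 4.0], [5.0, 5.0]],
]
x = [0.9, 1.1, 2.2, 1.8, 5.1, 4.9]
print(product_vq(x, codebooks))  # → [1.0, 1.0, 2.0, 2.0, 5.0, 5.0]
```

Note that each sub-codebook only ever sees its own D/M dimensions, which is what makes the M quantizers independent.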

Training a vector quantizer (VQ) means optimizing the codebook(s) so that they model the data distribution, i.e., so that the quantization error (e.g., mean squared error) between data points and codebook elements is minimized. There are different approaches to optimizing the codebooks for the three above-mentioned VQ variants (Residual VQ, Additive VQ, and Product VQ), which we discuss in the following.
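As a reminder of how one of these variants operates, the sketch below illustrates Residual VQ in plain Python with hypothetical two-stage codebooks: each stage quantizes the residual left over by the previous stages, and the reconstruction is the sum of the selected codewords.

```python
def nearest(codebook, v):
    """Return the codeword in `codebook` closest to `v` (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((ci - vi) ** 2 for ci, vi in zip(c, v)))

def residual_vq(x, codebooks):
    """Residual VQ sketch: stage m quantizes the residual left by stages 1..m-1;
    the final approximation is the sum of the chosen codewords."""
    residual = list(x)
    approx = [0.0] * len(x)
    for cb in codebooks:
        c = nearest(cb, residual)
        approx = [a + ci for a, ci in zip(approx, c)]
        residual = [r - ci for r, ci in zip(residual, c)]
    return approx

# Toy example: a coarse stage followed by a refinement stage
codebooks = [
    [[0.0, 0.0], [2.0, 2.0]],        # stage 1 (coarse)
    [[0.5, 0.5], [-0.5, -0.5]],      # stage 2 (refines the residual)
]
print(residual_vq([1.4, 1.6], codebooks))  # → [1.5, 1.5]
```

The same loop with all stages quantizing the original vector (rather than the running residual) and codewords chosen jointly would correspond to Additive VQ.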

1. K-means Algorithm (traditional approach):
Based on our literature review, in most papers the codebooks for these three VQ methods have been optimized with the k-means algorithm.
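For reference, a minimal k-means codebook training loop looks roughly as follows (a plain-Python sketch, not the optimized implementations used in the literature): alternate nearest-codeword assignment with centroid updates.

```python
import random

def kmeans_codebook(data, k, iters=20, seed=0):
    """Minimal k-means sketch for codebook training: assign each vector to its
    nearest codeword, then move each codeword to the mean of its cluster."""
    rng = random.Random(seed)
    codebook = rng.sample(data, k)  # initialize codewords from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(codebook[j], v)))
            clusters[i].append(v)
        for j, cl in enumerate(clusters):
            if cl:  # keep the old codeword if its cluster went empty
                codebook[j] = [sum(col) / len(cl) for col in zip(*cl)]
    return codebook
```

Each iteration can only decrease (or keep) the total squared quantization error, which is exactly the training objective described above.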

2. Stochastic Optimization (machine learning algorithms):
Machine learning optimization algorithms are based on gradient computation. Therefore, vector quantization cannot be optimized with them directly, since the argmin in the vector quantization function (first equation above) is not differentiable. In other words, we cannot pass gradients through the vector quantization function in backpropagation. Below we mention two solutions to this problem.

2.1. Straight Through Estimator (STE)
STE [10] solves the problem by simply copying the gradients unchanged over the VQ module in backpropagation. Hence, it does not take the influence of vector quantization into account, which leads to a mismatch between the gradients and the true behavior of the VQ function.
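Conceptually, STE pairs a non-differentiable forward pass with an identity backward pass. A minimal sketch (plain Python, scalar case for clarity):

```python
def quantize_forward(x, codebook):
    """Forward pass: non-differentiable nearest-codeword quantization."""
    return min(codebook, key=lambda c: (c - x) ** 2)

def quantize_backward_ste(grad_output):
    """Backward pass with STE: the incoming gradient is copied through
    unchanged, as if the quantizer were the identity function."""
    return grad_output

# In PyTorch-style autograd, the same idea is commonly written as
#   x_q = x + (quantize(x) - x).detach()
# so the forward value is quantized while gradients flow straight to x.
```

The mismatch mentioned above is visible here: the backward pass pretends the quantizer is the identity, regardless of which codeword the forward pass actually selected.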

2.2. Noise Substitution in Vector Quantization (NSVQ):
The NSVQ technique [11] is our recently proposed method, in which the vector quantization error is simulated by adding noise to the input vector, such that the simulated noise follows the distribution of the original VQ error (you can read a short introduction to NSVQ in this post).
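A simplified sketch of the noise substitution idea, in plain Python (this is an illustrative reading of NSVQ, not the paper's reference implementation): the true quantization error is replaced by random noise rescaled to the same Euclidean norm, and the noise is treated as a constant in backpropagation so gradients flow through x unchanged.

```python
import math, random

def nsvq_simulate(x, x_q, rng=random):
    """NSVQ sketch: x_tilde = x + ||x_q - x|| * v / ||v||, with v Gaussian noise.
    The output mimics the magnitude of the true VQ error while remaining
    differentiable with respect to x (the noise term is a constant)."""
    err_norm = math.sqrt(sum((q - a) ** 2 for q, a in zip(x_q, x)))
    v = [rng.gauss(0.0, 1.0) for _ in x]
    v_norm = math.sqrt(sum(vi ** 2 for vi in v)) or 1.0
    return [a + err_norm * vi / v_norm for a, vi in zip(x, v)]
```

By construction, the simulated output lies at exactly the same distance from x as the true quantized vector x_q, which is what lets it stand in for the quantization error during training.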

The NSVQ technique [11] has several advantages over the STE method [10]: 1) NSVQ yields more accurate gradients for the VQ function; 2) NSVQ converges faster during VQ training (codebook optimization); 3) NSVQ does not require any additional hyper-parameter tuning for VQ training (no extra VQ loss term needs to be added to the global optimization loss function).

In our paper, we used our recently proposed NSVQ technique [11] to optimize the three above-mentioned VQ variants with machine learning optimization. To evaluate the performance of these three VQ methods and study the trade-offs between their accuracy, bitrate, and complexity, we conducted experiments in four different scenarios, explained in the following.

1. Approximate Nearest Neighbor (ANN) Search
In this experiment, we modeled the distribution of the SIFT1M dataset [12] (128-dimensional image descriptors) by training the three VQ methods on its learning set. The SIFT1M dataset [12] includes 10⁶ base vectors, 10⁵ learning vectors, and 10⁴ query vectors for testing purposes. The ground truth contains the actual nearest neighbors, among the base vectors, of the query vectors. In ANN search, we first compress the base vectors using the codebooks trained on the learning set. Then, for each query vector, we find the approximate nearest neighbors among the compressed base vectors by exhaustive search. To assess the quality of the compression, we calculate the recall at different values of the parameter T, which indicates whether the actual nearest neighbor (from the ground truth) appears among the first T computed nearest neighbors. The figure below compares the three VQ variants optimized by our proposed NSVQ technique with the baseline methods under the recall metric. In general, all three machine learning-optimized VQ methods achieve recall values comparable to the baselines (and even slightly better in the case of Residual VQ).
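The recall@T metric used above can be computed as follows (a plain-Python sketch with hypothetical candidate ids, assuming one true nearest neighbor per query):

```python
def recall_at_T(retrieved, ground_truth, T):
    """recall@T sketch: the fraction of queries whose true nearest neighbor
    appears among the first T retrieved (approximate) neighbors."""
    hits = sum(1 for r, gt in zip(retrieved, ground_truth) if gt in r[:T])
    return hits / len(ground_truth)

# Hypothetical example: 3 queries, ranked candidate ids per query
retrieved = [[5, 2, 9], [1, 7, 3], [4, 8, 6]]
ground_truth = [2, 1, 6]  # true nearest-neighbor id per query
print(recall_at_T(retrieved, ground_truth, 1))  # recall@1: 1 of 3 queries hit
print(recall_at_T(retrieved, ground_truth, 3))  # → 1.0
```

Recall is non-decreasing in T: enlarging the candidate list can only add hits, which is why recall curves are reported over a range of T values.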

2. Image Compression using VQ-VAE
In this experiment, we trained a vector quantized variational autoencoder (VQ-VAE) on the training set of the CIFAR10 dataset to compress it. We used each of the three VQ methods to apply vector quantization in the bottleneck of the VQ-VAE. After training, we reconstructed the CIFAR10 test images using the trained encoder, decoder, and learned codebooks for each VQ method. To evaluate the quality of the reconstructed images, we employed the Peak Signal-to-Noise Ratio (PSNR) metric. In addition, we measured the complexity of each VQ method in Weighted Million Operations Per Second (WMOPS), an ITU-T standard metric. The following figure shows the results of this experiment.
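For reference, PSNR is a simple function of the mean squared error between the original and reconstructed images. A minimal sketch (images flattened to plain lists of pixel values, 8-bit peak assumed):

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """PSNR sketch: 10 * log10(peak^2 / MSE) between two images,
    flattened here to plain lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

print(round(psnr([0, 128, 255], [0, 120, 250]), 2))  # → 33.41
```

Higher PSNR means lower reconstruction error, so in the figures above a higher curve indicates better reconstruction quality at the given bitrate.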

According to the complexity figure (on the right), for the same use of computational resources (left vertical red line), Product VQ reaches a higher bitrate and performs better than Residual VQ. In addition, for the same use of computational resources (right vertical red line), Residual VQ reaches a higher bitrate and performs better than Additive VQ. Therefore, the available computational resources determine which VQ method is the best choice.

3. Speech Coding
In this experiment, we modeled the spectral envelope of speech signals with the three VQ methods using the speech codec presented in [13]. To evaluate the quality of the decoded speech signals, we used the perceptual evaluation of speech quality (PESQ) and the perceptually weighted signal-to-noise ratio (pSNR) as objective metrics. The following figure shows the performance of all three VQ methods under the PESQ and pSNR criteria. According to the results, Additive VQ attains a higher mean and lower variance than both Residual VQ and Product VQ on both metrics.

4. Toy Examples
In this experiment, we compare the performance of the three VQ methods with respect to correlation in the data. Hence, we prepared two 64-dimensional datasets, one correlated and one uncorrelated. Then, we compressed these datasets with the three VQ methods. To evaluate the performance, we computed the mean squared error (MSE) between each dataset and its quantized version. The following figure shows the results of this experiment.
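To illustrate the setup, here is one hypothetical way such toy datasets could be constructed (not the paper's exact generation procedure), together with the MSE evaluation: correlated vectors share one latent value across all dimensions plus small noise, while uncorrelated vectors are i.i.d. Gaussian.

```python
import random

def make_datasets(n=1000, d=64, seed=0):
    """Hypothetical toy data: correlated vectors repeat one shared latent value
    (plus small noise) across all d dimensions; uncorrelated vectors are i.i.d."""
    rng = random.Random(seed)
    correlated, uncorrelated = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # latent value shared by all dimensions
        correlated.append([z + rng.gauss(0.0, 0.1) for _ in range(d)])
        uncorrelated.append([rng.gauss(0.0, 1.0) for _ in range(d)])
    return correlated, uncorrelated

def mse(data, quantized):
    """Mean squared error between a dataset and its quantized version."""
    n = sum(len(v) for v in data)
    return sum((a - b) ** 2
               for v, q in zip(data, quantized)
               for a, b in zip(v, q)) / n
```

In the correlated case almost all the variance lies along one direction of the 64-dimensional space, which joint quantizers can exploit, whereas per-subspace quantizers cannot.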

For the correlated dataset, since Residual VQ and Additive VQ take the correlation among all data dimensions into account, they have a much lower quantization error than Product VQ, as expected. On the other hand, Product VQ performs better than Additive VQ and Residual VQ on uncorrelated data, since there is no correlation among the data dimensions, which is exactly what Product VQ presumes.

Variants of vector quantization (VQ) such as Residual VQ, Additive VQ, and Product VQ make VQ applicable to high bitrates and high-dimensional data. So far, these VQ methods have been optimized with classic expectation maximization and the k-means algorithm. In this paper, we optimize them by machine learning optimization using our recently proposed Noise Substitution in Vector Quantization (NSVQ) technique [11]. In addition, NSVQ allows end-to-end optimization of VQ methods in neural networks. We also study the trade-offs between the bitrate, accuracy, and complexity of these three VQ methods. Hence, our open-source implementation [14] helps to make the best choice of VQ method for a particular use case.

We provide the PyTorch implementation of these VQ methods in the following webpage.