Vector quantization and compression for efficient memory usage and search performance
Efficient management of high-dimensional vector data is crucial for scalable search and retrieval. Advanced methods for vector quantization and compression, such as LVQ (Locally-adaptive Vector Quantization) and LeanVec, can substantially reduce memory usage and improve search speed without sacrificing much accuracy. This page describes practical approaches to quantizing and compressing vectors for scalable search.
When the Intel LVQ and LeanVec optimizations are not compiled in, SVS-VAMANA with COMPRESSION falls back to a basic 8-bit scalar quantization implementation: all values in a vector are scaled using the global minimum and maximum, and then each dimension is quantized independently into 256 levels using 8-bit precision.

For LeanVec, the reduced dimensionality is set with the REDUCE argument. The default is typically input dimension / 2, but more aggressive reduction (such as input dimension / 4) is possible for greater efficiency.

| Compression type | Best for | Observations |
|---|---|---|
| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search |
| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
| LVQ4 | Maximum memory saving | Recall might be insufficient |
| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 |
| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall |
| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings |
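
The 8-bit scalar quantization fallback described above the table can be illustrated with a minimal sketch. It assumes that "global" means the minimum and maximum over the whole dataset; the function names (`fit_global_range`, `sq8_encode`, `sq8_decode`) are hypothetical and this is not the Redis or SVS implementation.

```python
import random


def fit_global_range(vectors):
    """Return the minimum and maximum over every value in the dataset."""
    values = [x for vec in vectors for x in vec]
    return min(values), max(values)


def sq8_encode(vec, lo, hi):
    """Quantize each dimension independently into one of 256 levels."""
    step = (hi - lo) / 255 if hi > lo else 1.0
    # Clamp so that values outside [lo, hi] still map to a valid byte.
    return bytes(min(255, max(0, round((x - lo) / step))) for x in vec)


def sq8_decode(codes, lo, hi):
    """Map 8-bit codes back to approximate float values."""
    step = (hi - lo) / 255 if hi > lo else 1.0
    return [lo + c * step for c in codes]


random.seed(0)
vectors = [[random.gauss(0.0, 1.0) for _ in range(128)] for _ in range(100)]
lo, hi = fit_global_range(vectors)
codes = [sq8_encode(vec, lo, hi) for vec in vectors]
restored = [sq8_decode(c, lo, hi) for c in codes]
max_error = max(
    abs(x - y) for v, r in zip(vectors, restored) for x, y in zip(v, r)
)
print(f"bytes per vector: {len(codes[0])}, max error: {max_error:.4f}")
```

Each dimension is stored in one byte instead of four (for float32 input), and the reconstruction error per value is bounded by half a quantization step. Because the range is shared by all vectors, a few outliers widen the step for everyone, which is the limitation that locally adaptive schemes such as LVQ aim to avoid.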
Both LVQ and LeanVec support two-level compression schemes. LVQ's two-level compression works by first quantizing each vector individually to capture its main structure, then encoding the residual error—the difference between the original and quantized vector—using a second quantization step. This allows fast search using only the first level, with the second level used for re-ranking to boost accuracy when needed.
Similarly, LeanVec uses a two-level approach: the first level reduces dimensionality and applies LVQ to speed up candidate retrieval, while the second level applies LVQ to the original high-dimensional vectors for accurate re-ranking.
Note that the original full-precision embeddings are never used by either LVQ or LeanVec, as both operate entirely on compressed representations.
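
The two-level idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Intel's LVQ implementation: it scales each vector by its own minimum and maximum, quantizes the residual uniformly over half a first-level step on either side of zero, and uses hypothetical names (`encode_two_level`, `decode`).

```python
import random


def quantize(values, lo, step, bits):
    """Round each value to the nearest of 2**bits levels starting at lo."""
    top = (1 << bits) - 1
    return [min(top, max(0, round((x - lo) / step))) for x in values]


def encode_two_level(vec, bits1=4, bits2=8):
    # Level 1: scale with this vector's own min and max (locally adaptive).
    lo, hi = min(vec), max(vec)
    step1 = (hi - lo) / ((1 << bits1) - 1) if hi > lo else 1.0
    codes1 = quantize(vec, lo, step1, bits1)
    approx1 = [lo + c * step1 for c in codes1]
    # Level 2: the residual lies within half a level-1 step of zero,
    # so quantize it over the range [-step1 / 2, step1 / 2].
    residual = [x - a for x, a in zip(vec, approx1)]
    step2 = step1 / ((1 << bits2) - 1)
    codes2 = quantize(residual, -step1 / 2, step2, bits2)
    return {"lo": lo, "step1": step1, "step2": step2,
            "codes1": codes1, "codes2": codes2}


def decode(enc, use_residual):
    """Reconstruct from level 1 only, or from level 1 plus the residual."""
    out = [enc["lo"] + c * enc["step1"] for c in enc["codes1"]]
    if use_residual:
        half = enc["step1"] / 2
        out = [a + (c * enc["step2"] - half)
               for a, c in zip(out, enc["codes2"])]
    return out


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


random.seed(0)
vec = [random.gauss(0.0, 1.0) for _ in range(128)]
enc = encode_two_level(vec)
err_level1 = max_abs_error(vec, decode(enc, use_residual=False))
err_both = max_abs_error(vec, decode(enc, use_residual=True))
print(f"level 1 only: {err_level1:.4f}, with residual: {err_both:.6f}")
```

Decoding with `use_residual=False` corresponds to the fast first-level pass; adding the residual gives the finer reconstruction that a re-ranking step would use.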
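This two-level approach allows for fast candidate retrieval using the first-level representation and accurate re-ranking using the second level.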
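The naming convention used for the configurations reflects the number of bits allocated per dimension at each level of compression. For example, LVQ8 is single-level compression with 8 bits per dimension, and LVQ4x8 is two-level compression with 4 bits per dimension at the first level and 8 bits at the second.
The same notation is used for LeanVec.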
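
As a rough illustration of what the names imply for memory, the sketch below parses a compression type and estimates the bytes needed for the codes of one vector. It is a back-of-the-envelope estimate under assumptions: both levels are stored, LeanVec's first level uses the reduced dimensionality (defaulting to input dimension / 2, as described for REDUCE), and per-vector scaling constants, padding, and graph overhead are ignored. The helper names are hypothetical.

```python
import re


def parse_compression(name):
    """Split a name such as 'LVQ4x8' into (family, level-1 bits, level-2 bits)."""
    match = re.fullmatch(r"(LVQ|LeanVec)(\d+)(?:x(\d+))?", name)
    if match is None:
        raise ValueError(f"unrecognized compression type: {name}")
    family, bits1, bits2 = match.groups()
    # A single-level name such as 'LVQ8' has no second level: 0 bits.
    return family, int(bits1), int(bits2 or 0)


def estimated_bytes_per_vector(name, dim, reduced_dim=None):
    """Estimate code size only; constants and index overhead are ignored."""
    family, bits1, bits2 = parse_compression(name)
    first_level_dim = dim
    if family == "LeanVec":
        # Assumed default from the text above: input dimension / 2.
        first_level_dim = reduced_dim if reduced_dim is not None else dim // 2
    return (first_level_dim * bits1 + dim * bits2) / 8


for name in ["LVQ4", "LVQ8", "LVQ4x4", "LVQ4x8", "LeanVec4x8", "LeanVec8x8"]:
    print(name, estimated_bytes_per_vector(name, dim=768))
```

For comparison, an uncompressed float32 vector with 768 dimensions takes 768 × 4 = 3072 bytes.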
The strong performance of LVQ and LeanVec stems from their ability to adapt to the structure of the input vectors. By learning compression parameters directly from the data, they achieve more accurate representations with fewer bits.
By default, Redis Open Source with the Redis Query Engine supports SVS-VAMANA indexing with global 8-bit quantization. To compile Redis with the Intel SVS-VAMANA optimizations (LeanVec and LVQ) for Intel platforms, follow the instructions below.
Follow the Redis Open Source build instructions. Before executing make, define the following environment variable.
export BUILD_INTEL_SVS_OPT=yes
Alternatively, you can define the BUILD_INTEL_SVS_OPT variable as part of the make command:
make BUILD_INTEL_SVS_OPT=yes