Missing ANN regime: 2–4-bit scalar-quantized FastScan search index #520

Open

@shaal

Description

@shaal

shaal

opened

on May 30, 2026

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index

Summary

ruvector has strong ANN coverage but is missing the 2–4-bit scalar-quantized search regime that every mainstream production vector DB lives in (FAISS IndexPQFastScan, Qdrant, Milvus IVF-SQ). This issue makes the case for adding it as a small, self-contained crate (ruvector-turbovec, proposed in ADR-194). A working scalar-reference milestone (M1) is implemented and validated — see the companion PR #521 .

The gap

Crate	Index family	Quantization	Suitable for "search 10M vectors for nearest 10"?
`ruvector-core`	HNSW (graph)	none (f32)	yes, but full f32 memory
`ruvector-diskann`	DiskANN (graph)	none / PQ-ish	yes
`ruvector-rairs`	IVF (ADR-193)	optional	yes
`ruvector-rabitq`	Flat + rotation	1-bit only	yes, but 1-bit caps recall
`ruvllm` `turbo_quant.rs`	— (value codec)	2.5–4 bit	no — KV-cache compressor, not an index

Two things sit near this regime but neither closes it:

ruvector-rabitq is 1-bit. It's excellent when memory dominates, but on 1536–3072-dim production embeddings (OpenAI text-embedding-3, Cohere, Voyage) you need an exact-rerank pass over the raw f32 originals to hit production R@1 — which re-inflates memory back toward the f32 footprint, partly defeating the point.
ruvllm/src/quantize/turbo_quant.rs is a TurboQuant value codec for transformer KV-cache / embedding compression. It has no inverted lists, no top-k heap, no FastScan LUT kernel, no filtered/allowlist search, no stable IDs/deletion. Wrong abstraction for search.

What's missing, concretely

2–4 bits per dimension (not 1, not 32).
Codebook-free scalar quantization — no k-means training, online ingest.
A FastScan-style nibble-LUT SIMD kernel that scores 16/32 candidates per vector instruction without ever materializing f32.
Competitive recall without a mandatory f32 rerank pass.

Why it matters

Memory/recall trade-off that's actually new. The per-vector length-renormalization scalar (c_x) makes the inner-product estimator unbiased, so 2/4-bit recall holds without the rerank tax that 1-bit RaBitQ pays. The memory win is real, not borrowed-back at query time.
Parity with the field. It puts ruvector in the same regime as FAISS IndexPQFastScan / Milvus IVF-SQ, which is what most teams reach for when they outgrow flat f32 but can't afford 1-bit's recall hit.
Reuse, not sprawl. It implements the existing ruvector_rabitq::AnnIndex trait and reuses RandomRotation — so it drops into the same dispatcher/consumers with no new plumbing, and adds exactly one workspace crate.
Roadmap fit. The same block-SoA codes can later back an IVF posting list (IVF-SQ-FastScan) — a natural composition with ruvector-rairs (ADR-193). It's also adjacent to the in-flight SymphonyQG research (research(nightly): SymphonyQG — graph-coupled 4-bit FastScan neighbor scoring #443 , research(nightly): symphonyqg — co-designed 1-bit graph quantization (SIGMOD 2025) #428 : graph-coupled 4-bit FastScan scoring), and complements the merged TurboQuant KV-cache work (feat(ruvllm): TurboQuant KV cache & vector compression #297 ) by covering the search-index side those didn't.

Evidence it works (M1, measured)

n=5,000 uniform-random vectors (worst case for ANN), dim=256, k=10, no f32 rerank, vs exact brute-force L2:

Width	recall@10	compression	mean cosine bias
1-bit	0.308	×ばつ	+0.0005
2-bit	0.561	×ばつ	+0.0001
4-bit	0.879	×ばつ	−0.0000

Recall rises monotonically with bit-width; near-zero bias confirms the unbiased estimator. 16 unit + 1 doc-test pass; clippy clean. Reproduce with cargo run --release -p ruvector-turbovec.

Proposed path

Tracked by ADR-194. Incremental milestones:

M1 — scalar reference (this PR): rotation reuse + Lloyd–Max 2/4-bit SQ + TQ+ calibration + unbiased scoring + IdMapIndex (O(1) delete, filtered search). ✅ implemented & validated.
M2 — FastScan nibble-LUT SIMD kernel (AVX2 + NEON), fuzzed bit-identical to the scalar oracle.
M3 — .tv persistence.
M4 — AVX-512BW kernel + ruvector-rulake dispatcher registration.

The scalar M1 scorer is deliberately the determinism oracle the SIMD kernels must match bit-for-bit, so the perf work lands without risking correctness.

Ask

Adopt ADR-194 and land M1 (#521) as the foundation, then iterate on the SIMD milestones. Feedback welcome on the reuse boundary (rabitq trait/rotation), the ADR-193 IVF composition, and whether the Lloyd–Max math should be hoisted into a shared ruvector-math crate rather than living in the codec.

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index #520

Description

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index

Summary

The gap

What's missing, concretely

Why it matters

Evidence it works (M1, measured)

Proposed path

Ask

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions