KV cache pre-build for RLV: question latency 35s → 5s #83

Status: Open
Labels: enhancement (New feature or request)

Description

Summary

Pre-compute and persist KV caches for each document chunk during indexing, eliminating prefill overhead at query time. This is the single highest-impact speed optimization for RLV.

Current State

Each RLV question requires:

Locator (BM25): 0.01s ← fast
Lookup (LLM): 15-20s ← prefill 8s + generate 7s
Verifier: 0-8s
Total: ~35s/question

The 8s prefill is spent re-reading the same chunk text every time. For 20 questions on the same document, we prefill the same chunks ~20 times.
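To put a number on the repeated work, here is a small back-of-the-envelope sketch. It assumes the worst case in which every question lands on the same chunk, and it uses only the 8s prefill figure and the 20-question count quoted above; the function name is made up for illustration.

```python
# Rough estimate of repeated prefill work, using the figures quoted above.
PREFILL_S = 8    # prefill time per lookup, from the breakdown
QUESTIONS = 20   # questions asked against the same document


def redundant_prefill_seconds(questions: int, prefill_s: float) -> float:
    """Seconds of prefill that repeat earlier work.

    Assumes every question hits the same chunk, so only the first
    prefill does new work and the remaining ones are redundant.
    """
    return max(questions - 1, 0) * prefill_s


total_prefill = QUESTIONS * PREFILL_S                     # 160 s
wasted = redundant_prefill_seconds(QUESTIONS, PREFILL_S)  # 152 s
print(f"total prefill: {total_prefill}s, redundant: {wasted}s")
```

With a persisted cache, the redundant share is what disappears from the query path.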

Proposed Solution

# One-time indexing (slow, ~5min for 1.3MB doc)
quantcpp index document.txt --output document.kv/
# Per-question (fast, ~5s)
quantcpp rlv --index document.kv/ "Who directed Mercury Fur?"

Implementation:

# During indexing:
for chunk in gist.chunks:
    ctx = quant_new(model, config)
    quant_generate(ctx, chunk.text, null_callback, null)  # prefill only
    quant_save_context(ctx, f"document.kv/chunk_{chunk.id}.kv")

# During query:
ctx = quant_new(model, config)
quant_load_context(ctx, f"document.kv/chunk_{best_id}.kv")  # instant
quant_generate(ctx, question, on_token, data)  # generate only (~5s)
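One question the pseudocode above leaves open is what happens when the document or the model changes after indexing: a file named only by chunk id could be loaded even though it no longer matches. The sketch below shows one possible way to handle that, by keying each cache file on a hash of the model identifier and the chunk text. It is not quant.cpp code. `fake_prefill`, `cache_key`, and `load_or_build` are hypothetical names, and `pickle` stands in for whatever on-disk format the real save/load calls use.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cache_key(model_id: str, chunk_text: str) -> str:
    """Short content hash so a changed model or chunk gets a new file."""
    h = hashlib.sha256()
    h.update(model_id.encode("utf-8"))
    h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    h.update(chunk_text.encode("utf-8"))
    return h.hexdigest()[:16]


def fake_prefill(chunk_text: str) -> dict:
    # Hypothetical stand-in for the expensive prefill step.
    return {"n_tokens": len(chunk_text.split())}


def load_or_build(index_dir: Path, model_id: str, chunk_text: str):
    """Return (state, cache_hit). Builds and persists the state on a miss."""
    path = index_dir / f"chunk_{cache_key(model_id, chunk_text)}.kv"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f), True
    state = fake_prefill(chunk_text)
    index_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(state, f)
    tmp.replace(path)  # rename into place so readers never see a partial file
    return state, False


index_dir = Path(tempfile.mkdtemp()) / "document.kv"
state1, hit1 = load_or_build(index_dir, "model-a", "example chunk text")
state2, hit2 = load_or_build(index_dir, "model-a", "example chunk text")
_, hit3 = load_or_build(index_dir, "model-b", "example chunk text")
print(hit1, hit2, hit3)
```

The second call is a hit because nothing changed; switching the model identifier forces a rebuild instead of reusing a stale file.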

Impact

| Metric | Before | After |
| Per-question latency | 35s | ~5s |
| 20-question benchmark | 12min | ~2min |
| First-question latency | 35s | 35s (indexing amortized) |
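The figures above can be checked with a few lines of arithmetic, which also shows how many questions it takes for the one-time indexing cost to pay for itself. This is only a sketch built from the numbers in this issue (35s before, ~5s after, ~5min indexing for the 1.3MB document); the helper names are made up for illustration.

```python
import math

INDEX_S = 5 * 60   # one-time indexing cost, ~5 min
BEFORE_S = 35      # per-question latency today
AFTER_S = 5        # per-question latency with pre-built KV caches


def total_seconds(questions: int, per_question_s: float, one_time_s: float = 0) -> float:
    """Wall-clock time for a batch of questions plus any one-time cost."""
    return one_time_s + questions * per_question_s


def break_even_questions(index_s: float, before_s: float, after_s: float) -> int:
    """Smallest n where indexing plus n fast questions is no slower than n slow ones."""
    return math.ceil(index_s / (before_s - after_s))


before_20 = total_seconds(20, BEFORE_S)                  # 700 s, about 12 min
after_20 = total_seconds(20, AFTER_S)                    # 100 s, about 2 min
after_20_with_index = total_seconds(20, AFTER_S, INDEX_S)  # 400 s including indexing
break_even = break_even_questions(INDEX_S, BEFORE_S, AFTER_S)
print(before_20, after_20, after_20_with_index, break_even)
```

Under these numbers the ~2min benchmark figure excludes indexing, and the index pays for itself from the tenth question on the same document.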

quant.cpp Advantage

quant.cpp exposes save_context/load_context, so persisting a prefilled KV cache and restoring it later takes one call each. Combined with KV compression (6.4x), each chunk's cache is only a few hundred KB on disk.
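As a rough sanity check on the disk footprint, the uncompressed KV cache size follows from the standard formula 2 (keys and values) x layers x KV heads x head dimension x tokens x bytes per element. The model dimensions and chunk length below are hypothetical placeholders, not the configuration used in RLV; only the 6.4x ratio comes from this issue.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int = 2) -> int:
    """Uncompressed KV cache size: keys and values for every layer and token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem


# Hypothetical small model and chunk length, chosen only for illustration.
raw = kv_cache_bytes(layers=16, kv_heads=2, head_dim=64, tokens=512)

COMPRESSION_RATIO = 6.4  # ratio quoted in this issue
compressed = round(raw / COMPRESSION_RATIO)

print(f"raw: {raw / 1024:.0f} KiB, compressed: {compressed / 1024:.0f} KiB")
```

Plugging in the real layer count, KV head count, head dimension, and chunk length gives the expected per-chunk file size for a given model.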

Priority: P0

This is the difference between "demo" and "usable product". 35s/question is a demo; 5s/question is a tool people actually use.


Proposed by ClawTeam based on RLV Day 5 benchmarking
