ctx ai backend #92
ctx ai backend
Problem
ctx today has no AI backend abstraction. Users who want local-first
AI capabilities (running on vLLM, Ollama, LM Studio, or a
self-hosted OpenAI-compatible server) have no first-class wiring
through ctx — they configure their downstream AI tool (Claude Code,
OpenCode, etc.) by hand. Two user-visible pains follow:
- Users may not know what they do not know. Inference is opaque
  to non-ML users; "set `ANTHROPIC_BASE_URL` to your vLLM endpoint"
  is not a recipe non-experts can act on without help.
- Cloud LLMs break air-gap. ctx's thesis (Invariant 5,
  local-first / air-gap capable) is currently honoured only for the
  persistence layer. AI-shaped capabilities are off-limits to
  classified, isolated, or constrained-environment projects because
  the only paths are cloud APIs.
Block A is the foundation. It provides the abstraction that blocks
B (structured extraction) and C (embedding-backed recall) build on
later, while staying inside the six invariants (Markdown-on-filesystem,
zero runtime deps for core functionality, deterministic assembly,
human authority, local-first/air-gap, no default telemetry).
Approach
ctx grows an optional, local-first AI backend layer that talks
to any OpenAI-compatible HTTP endpoint. vLLM is the canonical local
backend (it restores air-gap capability, supports
schema-constrained structured outputs, and ships prefix caching that
rewards stable-prefix prompt structure). The same contract works
against OpenAI, Anthropic, Ollama, LM Studio, and any other
OpenAI-compatible server.
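A minimal sketch of what a schema-constrained request against an OpenAI-compatible `/v1/chat/completions` endpoint could look like. The type names, the `newExtractionRequest` helper, and the example schema are hypothetical and not part of ctx. The `response_format` / `json_schema` field follows the OpenAI chat-completions request shape; whether a given backend accepts it should be verified per backend. The system message is placed first so the prompt prefix stays stable across calls, which is the structure prefix caching rewards.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is one chat turn in an OpenAI-compatible request.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// JSONSchema names the schema the backend should constrain its output to.
type JSONSchema struct {
	Name   string          `json:"name"`
	Strict bool            `json:"strict"`
	Schema json.RawMessage `json:"schema"`
}

// ResponseFormat asks the backend for schema-constrained JSON output.
type ResponseFormat struct {
	Type       string     `json:"type"`
	JSONSchema JSONSchema `json:"json_schema"`
}

// ChatRequest is the body POSTed to /v1/chat/completions.
type ChatRequest struct {
	Model          string         `json:"model"`
	Messages       []Message      `json:"messages"`
	ResponseFormat ResponseFormat `json:"response_format"`
}

// newExtractionRequest is a hypothetical helper: it puts the fixed system
// prompt first (stable prefix) and the variable input last.
func newExtractionRequest(model, system, input string, schema json.RawMessage) ChatRequest {
	return ChatRequest{
		Model: model,
		Messages: []Message{
			{Role: "system", Content: system},
			{Role: "user", Content: input},
		},
		ResponseFormat: ResponseFormat{
			Type:       "json_schema",
			JSONSchema: JSONSchema{Name: "extraction", Strict: true, Schema: schema},
		},
	}
}

func main() {
	// Illustrative schema only; the real extraction schema belongs to block B.
	schema := json.RawMessage(`{"type":"object","properties":{"decisions":{"type":"array","items":{"type":"string"}}},"required":["decisions"],"additionalProperties":false}`)
	req := newExtractionRequest("openai/gpt-oss-120b", "Extract decisions as JSON.", "We chose TOML for .ctxrc.", schema)
	body, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```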
The layer is strictly additive:
- Existing commands (`ctx status`, `ctx agent`, ceremonies, hooks)
  keep working with no backend configured. The deterministic core
  is untouched.
- AI commands fail closed with a clear "no backend reachable"
  error rather than degrading silently to a non-AI path. There is
  no "use vLLM if available, fall back to deterministic" behaviour.
- The contract floor is OpenAI-compatible HTTP. Anthropic Messages
  is treated as a strict superset that some backends also support,
  not as a competing contract.
Block A delivers four things:
- A backend registry with one entrypoint per known backend type
  (`vllm`, `openai`, `anthropic`, `ollama`, `lmstudio`, generic
  `openai-compatible`).
- A `ctx setup --backend <name>` extension to the existing setup
  family that templates endpoint + auth wiring into `.ctxrc` and
  (where applicable) downstream AI-tool configs.
- An AI command surface (shape TBD; see Open Questions) that
  exposes one or more verbs against the configured backend. The
  minimum viable verb set is `ping` (reachability) plus one
  structured-output consumer from block B chosen during A's
  implementation to validate the pattern end-to-end (see Testing).
- Reachability/health checks and fail-closed error reporting.
Behavior
Happy Path
- User runs `ctx setup --backend vllm --endpoint http://localhost:8000`
  (exact flag shape TBD). ctx writes the backend definition to
  `.ctxrc` under a new `[backends.vllm]` table (or equivalent;
  exact key TBD).
- User runs `ctx ai ping` (or equivalent). ctx reads the backend
  definition, performs a reachability check against the endpoint
  (HTTP GET on `/v1/models` for OpenAI-compatible servers), and
  reports the backend name, endpoint, and first model listed.
- User runs the validation consumer command from block B (e.g.,
  `ctx compact --emit decisions,learnings,tasks <input>`; exact
  shape lives in the B+C supplementary spec). ctx routes the
  request through the configured backend, receives a
  schema-constrained JSON response, and writes a proposed-patch
  artifact under the proposal queue (location TBD; see Open
  Questions), never directly to `.context/*.md`.
- Existing ceremonies (`/ctx-remember`, `/ctx-wrap-up`, etc.) work
  exactly as before. `ctx status` and `ctx agent` are untouched.
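The reachability step can be sketched roughly as follows, assuming the usual OpenAI-compatible `/v1/models` response shape (`{"data":[{"id":...}]}`). The function names `ping` and `firstModel` are hypothetical, and the demo runs against an in-process `httptest` server rather than a real backend.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// modelList mirrors only the part of the /v1/models response used here.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// firstModel extracts the first model ID from a /v1/models response body.
func firstModel(body []byte) (string, error) {
	var ml modelList
	if err := json.Unmarshal(body, &ml); err != nil {
		return "", fmt.Errorf("decode model list: %w", err)
	}
	if len(ml.Data) == 0 {
		return "", errors.New("backend listed no models")
	}
	return ml.Data[0].ID, nil
}

// ping GETs <endpoint>/v1/models and returns the first model listed.
// Any transport error or non-200 status is returned as-is: no fallback.
func ping(ctx context.Context, endpoint string) (string, error) {
	url := strings.TrimRight(endpoint, "/") + "/v1/models"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", fmt.Errorf("backend unreachable at %s: %w", endpoint, err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("backend at %s returned %s: %s", endpoint, resp.Status, body)
	}
	return firstModel(body)
}

func main() {
	// Stand-in for a local vLLM server.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/models" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `{"object":"list","data":[{"id":"demo-model","object":"model"}]}`)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	model, err := ping(ctx, srv.URL)
	if err != nil {
		fmt.Println("ping failed:", err)
		return
	}
	fmt.Println("first model:", model)
}
```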
Edge Cases
| Case | Expected behavior |
|---|---|
| Backend unreachable | Fail closed with a clear error naming the configured endpoint and suggesting `ctx setup --backend <name> --endpoint <url>`. No fallback. |
| Backend reachable but model unavailable | Surface upstream 4xx verbatim; do not retry with a different model. |
| Multiple backends configured (e.g., `vllm` + `openai`) | User must specify `--backend <name>` on the AI command, or set a default via `.ctxrc` `[backends].default`. No implicit selection. |
| No backend configured | AI commands print: "no backend configured; run `ctx setup --backend <name>`" and exit non-zero. Non-AI commands are unaffected. |
| API key missing or invalid | Surface upstream auth error verbatim; do not retry. Suggest the env-var or `.ctxrc` key the backend reads. |
| Slow backend | Respect timeout from `.ctxrc` `[backends.<name>].timeout` (default TBD). No infinite waits. |
| AI command invoked from inside a deterministic ceremony hook | Fail closed. Coupling deterministic-core hooks to AI availability would violate Invariant 2 ("zero runtime deps for core functionality"). |
| Existing `ANTHROPIC_BASE_URL` already set in user env | Honour it; do not overwrite. `ctx setup --backend` prints a warning if the env vars it would template conflict with what's already set. |
| `.ctxrc` malformed (e.g., missing required key) | Refuse with a clear parse error naming the offending key; do not silently default. |
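The selection rules for "no backend", "one backend", and "multiple backends" could be sketched as below. `selectBackend` and its parameters are hypothetical names: `flag` stands for the `--backend` value, `def` for `[backends].default`, and `configured` for the set of backend names found in `.ctxrc`. The sketch assumes a single configured backend may be used without a default, since the default is described as required only when more than one backend is configured.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNoBackend = errors.New("no backend configured; run ctx setup --backend <name>")
	errAmbiguous = errors.New("multiple backends configured; pass --backend <name> or set [backends].default in .ctxrc")
)

// selectBackend picks the backend name to dispatch through, or fails closed.
// Precedence: explicit flag, then configured default, then the sole backend.
func selectBackend(flag, def string, configured map[string]bool) (string, error) {
	if len(configured) == 0 {
		return "", errNoBackend
	}
	pick := flag
	if pick == "" {
		pick = def
	}
	if pick == "" {
		if len(configured) > 1 {
			return "", errAmbiguous // no implicit selection
		}
		for name := range configured {
			return name, nil // exactly one backend configured
		}
	}
	if !configured[pick] {
		return "", fmt.Errorf("backend %q is not configured", pick)
	}
	return pick, nil
}

func main() {
	both := map[string]bool{"vllm": true, "openai": true}
	cases := []struct {
		flag, def string
		cfg       map[string]bool
	}{
		{"", "", nil},
		{"", "", map[string]bool{"vllm": true}},
		{"", "", both},
		{"", "vllm", both},
		{"openai", "vllm", both},
	}
	for _, tc := range cases {
		name, err := selectBackend(tc.flag, tc.def, tc.cfg)
		fmt.Printf("flag=%q default=%q backends=%d -> %q err=%v\n", tc.flag, tc.def, len(tc.cfg), name, err)
	}
}
```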
Validation Rules
| Field | Rule | Enforced where |
|---|---|---|
| Backend name | Alphanumeric + hyphen; must match a registered backend type | At setup time and at AI-command dispatch |
| Endpoint URL | Must parse as `http://` or `https://`; localhost recommended (not required) for vllm-canonical backend | At setup time |
| API key (if any) | Read from env-var (preferred) or `.ctxrc`; never allowed in a committed `.ctxrc` if the project's git config marks it as such | Setup-time warning; commit hook (out of scope here, but flagged in Non-Goals) |
| Default backend | If `[backends].default` is set, must reference a configured backend | At AI-command dispatch |
| Determinism boundary | `ctx ai` commands must not be invoked by `ctx agent`, `ctx status`, or any hook that fires during deterministic ceremony paths | Unit test guard (see Testing) |
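A minimal sketch of the name and endpoint rules from the table, using only the standard library. `validateName`, `validateEndpoint`, and the `registered` set are hypothetical; the set simply lists the backend types this spec names.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// backendNameRE encodes "alphanumeric + hyphen".
var backendNameRE = regexp.MustCompile(`^[A-Za-z0-9-]+$`)

// registered lists the backend types named in this spec (illustrative).
var registered = map[string]bool{
	"vllm": true, "openai": true, "anthropic": true,
	"ollama": true, "lmstudio": true, "openai-compatible": true,
}

// validateName checks the character rule first, then registry membership.
func validateName(name string) error {
	if !backendNameRE.MatchString(name) {
		return fmt.Errorf("backend name %q: only letters, digits, and hyphens are allowed", name)
	}
	if !registered[name] {
		return fmt.Errorf("backend name %q: not a registered backend type", name)
	}
	return nil
}

// validateEndpoint requires an http:// or https:// URL with a host.
func validateEndpoint(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("endpoint %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("endpoint %q: scheme must be http or https", raw)
	}
	if u.Host == "" {
		return fmt.Errorf("endpoint %q: missing host", raw)
	}
	return nil
}

func main() {
	for _, n := range []string{"vllm", "my_backend", "mistral"} {
		fmt.Printf("name %q: %v\n", n, validateName(n))
	}
	for _, e := range []string{"http://localhost:8000", "localhost:8000", "ftp://example.com"} {
		fmt.Printf("endpoint %q: %v\n", e, validateEndpoint(e))
	}
}
```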
Error Handling
| Error condition | User-facing message | Recovery |
|---|---|---|
| No backend configured | `no backend configured; run ctx setup --backend <name>` | Run setup |
| Backend unreachable | `backend <name> unreachable at <endpoint>: <error>` | Check endpoint; verify vllm/ollama/etc. is running |
| Model not found | (relay upstream 4xx body verbatim) + `backend <name> rejected the model selection; check /v1/models on the endpoint` | Pick a listed model |
| Auth failed | (relay upstream 401/403 verbatim) + `backend <name> rejected the credential; check <env-var or .ctxrc key>` | Update credential |
| Timeout | `backend <name> timed out after <timeout>; tune [backends.<name>].timeout in .ctxrc` | Increase timeout or use a faster model |
| Multiple backends, none specified | `multiple backends configured; pass --backend <name> or set [backends].default in .ctxrc` | Pass flag or set default |
| AI command called from deterministic hook | (developer-only) `ctx ai called from deterministic context; this would violate Invariant 2` | Restructure hook |
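One way to keep these conditions distinguishable in code is a typed-string sentinel per condition, wrapped together with the upstream body. The sketch below is an assumption-laden illustration: `Sentinel`, `classify`, and `isSentinel` are hypothetical names (this is not ctx's `entity.Sentinel`), the message wording is approximate, and treating HTTP 404 as "model not found" is an assumption about how OpenAI-compatible servers report an unknown model.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel is a typed-string error, so sentinels can be declared as constants.
type Sentinel string

func (s Sentinel) Error() string { return string(s) }

const (
	ErrModelNotFound Sentinel = "rejected the model selection; check /v1/models on the endpoint"
	ErrAuthFailed    Sentinel = "rejected the credential"
	ErrUpstream      Sentinel = "returned an error"
)

// classify maps an upstream HTTP status to a sentinel and relays the upstream
// body verbatim in front of it. It never retries or picks another model.
func classify(name string, status int, body string) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status == http.StatusUnauthorized || status == http.StatusForbidden:
		return fmt.Errorf("%s\nbackend %q %w", body, name, ErrAuthFailed)
	case status == http.StatusNotFound: // assumption: 404 means unknown model
		return fmt.Errorf("%s\nbackend %q %w", body, name, ErrModelNotFound)
	default:
		return fmt.Errorf("%s\nbackend %q %w (HTTP %d)", body, name, ErrUpstream, status)
	}
}

// isSentinel reports whether err wraps the given sentinel.
func isSentinel(err error, s Sentinel) bool {
	return errors.Is(err, s)
}

func main() {
	err := classify("vllm", http.StatusUnauthorized, `{"error":"invalid api key"}`)
	fmt.Println(err)
	fmt.Println("auth failure:", isSentinel(err, ErrAuthFailed))
}
```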
Interface
CLI
Open question — the brief explicitly punts this to A's spec
author. Two shapes were enumerated; A's implementation must
commit to one. The decision is expensive to unwind (users will
script against whichever ships).
Option 1 — new top-level ctx ai namespace:
ctx ai ping [--backend <name>]
ctx ai <verb> [--backend <name>] [verb-specific flags]
Option 2 — flags on existing commands:
ctx compact <input> --emit <kinds> --use-ai [--backend <name>]
ctx ingest <input> --extract <kinds> [--backend <name>]
| Flag | Type | Default | Description |
|---|---|---|---|
| `--backend` | string | (resolved from `.ctxrc` `[backends].default`) | Selects which configured backend to dispatch through |
| `--endpoint` (setup only) | URL | (per-backend) | Endpoint override at setup time |
| `--api-key-env` (setup only) | string | (per-backend) | Name of the env-var the backend reads for auth |
| (additional flags TBD with interface decision) | | | |
Skill (if applicable)
TBD. Likely a /ctx-ai-setup companion skill that wraps
ctx setup --backend, but the brief is silent and the
existing ctx setup family already has skills (/ctx-setup,
per-tool variants) that could absorb this. Decide during
implementation.
Implementation
Files to Create/Modify
| File | Change |
|---|---|
| `internal/backend/` | New package. Backend registry, contract types (`Backend`, `Request`, `Response`), per-backend implementations (`vllm.go`, `openai.go`, `anthropic.go`, `ollama.go`, `lmstudio.go`, `openaicompat.go`) |
| `internal/cli/ai/` | New package (if Option 1). Command surface for the `ai` namespace |
| `internal/cli/setup/cmd/root/` | Extend with `--backend` handling; templating into `.ctxrc` |
| `internal/cli/setup/core/backend/` | New subpackage. Setup-time wiring per backend type (env-var templates, downstream-tool config writes) |
| `internal/rc/` | Add `[backends]` table parsing and validation |
| `.ctxrc` (project-init template) | Add commented-out `[backends]` skeleton |
| `internal/assets/context/AGENT_PLAYBOOK.md` | (TBD) note that `ctx ai <verb>` exists and when agents should call it vs. hand-rolling against the AI tool |
| `docs/recipes/` | New recipe `local-inference-with-vllm.md` (or `ai-backend-setup.md`); the user explicitly carved out recipe-restructuring work, so this is one recipe added, not a recipe-surface rework |
| `docs/cli/` | New page documenting `ctx ai` (or the chosen flag surface) |
Key Functions
TBD pending interface decision. Skeleton contract:
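```go
// internal/backend/contract.go
package backend

import "context"

// Backend is the contract every configured backend implements.
// Request, Response, and Config are this package's contract types (shapes TBD).
type Backend interface {
	Name() string
	Ping(ctx context.Context) error
	Complete(ctx context.Context, req Request) (Response, error)
}

// Registry maps backend names to factories and resolves configured backends.
type Registry interface {
	Register(name string, factory func(cfg Config) (Backend, error))
	Resolve(name string) (Backend, error)
	Default() (Backend, error) // honours .ctxrc [backends].default
}
```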
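To make the skeleton concrete, here is a small in-memory registry that satisfies the same method set. It is a sketch under assumptions: the `Config`, `Request`, and `Response` fields are placeholders, `registry`, `newRegistry`, and `stub` are hypothetical, and `Default` here handles only the explicit `[backends].default` case.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Placeholder contract types; the real shapes are TBD in the spec.
type Config struct{ Endpoint string }
type Request struct{ Prompt string }
type Response struct{ Text string }

type Backend interface {
	Name() string
	Ping(ctx context.Context) error
	Complete(ctx context.Context, req Request) (Response, error)
}

// registry keeps factories per backend type and configs per configured backend.
type registry struct {
	factories   map[string]func(cfg Config) (Backend, error)
	configs     map[string]Config // stand-in for parsed [backends.<name>] tables
	defaultName string            // stand-in for [backends].default
}

func newRegistry(configs map[string]Config, defaultName string) *registry {
	return &registry{
		factories:   map[string]func(cfg Config) (Backend, error){},
		configs:     configs,
		defaultName: defaultName,
	}
}

func (r *registry) Register(name string, factory func(cfg Config) (Backend, error)) {
	r.factories[name] = factory
}

// Resolve fails closed when the type is unknown or the backend is unconfigured.
func (r *registry) Resolve(name string) (Backend, error) {
	factory, ok := r.factories[name]
	if !ok {
		return nil, fmt.Errorf("backend %q: not a registered backend type", name)
	}
	cfg, ok := r.configs[name]
	if !ok {
		return nil, fmt.Errorf("backend %q: not configured in .ctxrc", name)
	}
	return factory(cfg)
}

func (r *registry) Default() (Backend, error) {
	if r.defaultName == "" {
		return nil, errors.New("no default backend; set [backends].default in .ctxrc")
	}
	return r.Resolve(r.defaultName)
}

// stub is a fake backend used only to exercise the registry.
type stub struct {
	name string
	cfg  Config
}

func (s stub) Name() string               { return s.name }
func (s stub) Ping(context.Context) error { return nil }
func (s stub) Complete(_ context.Context, req Request) (Response, error) {
	return Response{Text: "echo: " + req.Prompt}, nil
}

func main() {
	r := newRegistry(map[string]Config{"vllm": {Endpoint: "http://localhost:8000"}}, "vllm")
	r.Register("vllm", func(cfg Config) (Backend, error) { return stub{name: "vllm", cfg: cfg}, nil })

	b, err := r.Default()
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	resp, _ := b.Complete(context.Background(), Request{Prompt: "hello"})
	fmt.Println(b.Name(), resp.Text)
}
```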
Helpers to Reuse
- `internal/cli/setup/`: existing setup family (Claude Code,
  OpenCode, Cursor, etc.). The `--backend` extension follows the
  same templating-into-config-files pattern.
- `internal/rc/`: `.ctxrc` parsing and validation. Adding a
  `[backends]` table follows the existing TOML pattern.
- `internal/err/`: typed-string sentinels for backend errors
  (per the recent `entity.Sentinel` convention).
- `internal/assets/commands/text/errors.yaml`: externalised
  error strings.
Configuration
.ctxrc additions (proposed shape — final key names TBD):
```toml
[backends]
default = "vllm"  # optional; required only if more than one backend is configured

[backends.vllm]
endpoint      = "http://localhost:8000"
api_key_env   = ""  # vllm typically runs without auth; empty means none
timeout       = "30s"
default_model = "openai/gpt-oss-120b"

[backends.openai]
endpoint      = "https://api.openai.com"
api_key_env   = "OPENAI_API_KEY"
timeout       = "60s"
default_model = "gpt-4o"
```
Environment variables: only read from env-vars named in
api_key_env. ctx never writes credentials to .ctxrc.
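A sketch of how one `[backends.<name>]` entry might be consumed once parsed. `BackendConfig`, `apiKey`, and `timeout` are hypothetical names; the lookup function is injected so the credential rule can be exercised without touching the real environment. Because the default timeout is still TBD, this sketch treats a missing or non-positive timeout as an error instead of inventing a default.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// BackendConfig mirrors one [backends.<name>] table (proposed keys).
type BackendConfig struct {
	Endpoint     string
	APIKeyEnv    string
	Timeout      string
	DefaultModel string
}

// apiKey reads the credential from the env-var named in api_key_env.
// An empty name means the backend runs without auth.
func apiKey(cfg BackendConfig, lookup func(string) (string, bool)) (string, error) {
	if cfg.APIKeyEnv == "" {
		return "", nil
	}
	v, ok := lookup(cfg.APIKeyEnv)
	if !ok || v == "" {
		return "", fmt.Errorf("credential missing: set the %s environment variable", cfg.APIKeyEnv)
	}
	return v, nil
}

// timeout parses values such as "30s" and refuses anything unusable.
func timeout(cfg BackendConfig) (time.Duration, error) {
	d, err := time.ParseDuration(cfg.Timeout)
	if err != nil {
		return 0, fmt.Errorf("timeout %q: %w", cfg.Timeout, err)
	}
	if d <= 0 {
		return 0, errors.New("timeout must be positive")
	}
	return d, nil
}

func main() {
	cfg := BackendConfig{
		Endpoint:     "http://localhost:8000",
		APIKeyEnv:    "", // vllm typically runs without auth
		Timeout:      "30s",
		DefaultModel: "openai/gpt-oss-120b",
	}
	key, err := apiKey(cfg, os.LookupEnv)
	fmt.Printf("auth configured: %v (err=%v)\n", key != "", err)

	d, err := timeout(cfg)
	fmt.Printf("timeout: %v (err=%v)\n", d, err)
}
```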
Testing
- Unit: Backend registry resolution (single, multiple, default,
  missing); `.ctxrc` parsing (well-formed, missing required key,
  malformed table); error sentinels match expected strings.
- Integration: Spin up a fake OpenAI-compatible HTTP server (a
  tiny `httptest.Server` per backend test) and drive each backend's
  `Ping` + one `Complete` call against it. Verify fail-closed
  behaviour when the server returns 401/404/500/timeout.
- Edge cases: Backend unreachable; multiple backends without
  selector; no backend configured; AI-command-from-deterministic-hook
  guard (compile-time or test-time check that `ctx agent` /
  `ctx status` / canonical hooks do not import `internal/backend`).
- Validation consumer: One end-to-end test that exercises the
  chosen B-block validation command
  (`ctx compact ... --emit decisions,learnings,tasks` or equivalent)
  against the fake server, asserts a proposed-patch artifact is
  written to the proposal queue, and asserts `.context/*.md` files
  are unchanged.
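The test-time import guard mentioned above could be built on `go/parser`, which can parse just the import block of a file. The sketch below checks a single source string; `forbiddenImports` is a hypothetical helper, and `example.com/ctx` stands in for the real module path, which this spec does not state. A real guard would walk the files under the deterministic packages and call this for each one.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenImports returns the imports in src that are the forbidden package
// or one of its subpackages. Only the import block is parsed.
func forbiddenImports(filename, src, forbidden string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value) // import paths are quoted literals
		if err != nil {
			return nil, err
		}
		if path == forbidden || strings.HasPrefix(path, forbidden+"/") {
			hits = append(hits, path)
		}
	}
	return hits, nil
}

func main() {
	// Hypothetical module path; substitute the project's real one.
	const backendPkg = "example.com/ctx/internal/backend"

	src := `package agent

import (
	"fmt"

	"example.com/ctx/internal/backend"
)
`
	hits, err := forbiddenImports("agent.go", src, backendPkg)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if len(hits) > 0 {
		fmt.Println("deterministic package imports the AI backend:", hits)
	}
}
```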
Non-Goals
This spec explicitly does not cover:
- Cost management. No usage tracking, no billing integration,
  no cost attribution. Out of manifesto; would dilute the product.
- Secrets management. ctx reads from env-vars or `.ctxrc`; it
  does not encrypt, rotate, or vault credentials. Vault/Doppler/etc.
  are separate products.
- Observability backends. No metrics emission, no time-series
  storage, no dashboards. The thesis's "no default telemetry"
  (Invariant 6) governs.
- Multi-backend routing daemon / HTTP proxy. Killed in the
  brief: it structurally cannot solve the failure modes it was
  proposed for, and forces ctx into service-shape.
- Embeddings, semantic search, vector storage. Block C.
  Supplementary spec.
- Structured extraction commands. Block B (other than the one
  validation consumer needed to prove A works end-to-end).
  Supplementary spec.
- Automatic application of AI-generated content to `.context/*.md`.
  All AI-produced edits land as proposed patches in a review
  queue; ratification is human (or agent) via existing ceremony
  paths.
- Replacing `/ctx-wrap-up` or any existing ceremony skill. AI
  commands augment ceremonies via the proposal queue; they do not
  short-circuit them.
- Touching `ctx agent` or its deterministic assembly. Sibling,
  not replacement.
- Recipe-surface restructuring. Adding one recipe for AI
  backend setup is in scope; mapping vLLM's index-style taxonomy
  onto `docs/recipes/` is explicitly out (rejected in the brief).
Open Questions
These were left open in the brief and must be settled during A's
implementation:
1. CLI namespace shape: `ctx ai <verb>` (Option 1) vs. flags
   on existing commands (Option 2). Expensive to unwind. Pick once.
2. Proposal queue location: `.context/proposals/`? Kb-closeout-style
   under `.context/ingest/`? Belongs to B+C supplementary, but A's
   validation consumer must write somewhere, so pick a provisional
   location that B+C can confirm or relocate.
3. Default extraction model: A-spec leaves model choice to the
   user; recommended models per task type can be a recipe.
4. Companion skill: `/ctx-ai-setup` or absorb into existing
   `/ctx-setup`?
5. Validation consumer: `ctx compact` is the cheapest validator
   per the brief, but the exact command/flag shape lands in the
   B+C supplementary spec. A's implementation needs to pick one
   B-block consumer to ship alongside A; the others wait.
Task Breakdown
Paste-ready rows for .context/TASKS.md. Each is a single block.
Ordering reflects what blocks what — earlier rows must land before
later rows. The Spec: reference points at this file.
- Decide the AI command CLI namespace: `ctx ai <verb>` (new
  top-level) vs. flags on existing commands (`--use-ai`, `--emit`,
  etc.). Foundational; expensive to unwind once shipped. Record
  the call as a `.context/DECISIONS.md` entry naming the chosen
  shape and the rejected alternative with rationale. Blocks every
  other task in this group. Spec: `specs/ctx-ai-backend.md` Open
  Question #1. #priority:medium #added:2026-05-21
- Implement the backend contract and registry: new
  `internal/backend/` package with `Backend` interface (`Name`,
  `Ping`, `Complete`), `Request`/`Response` types, and a
  `Registry` (`Register`, `Resolve`, `Default`). No per-backend
  implementations yet; this is the abstraction surface that later
  tasks plug into. Unit tests cover single/multiple/default/missing
  backend resolution. Spec: `specs/ctx-ai-backend.md` §Implementation.
  #priority:medium #added:2026-05-21
- Extend `internal/rc/` to parse and validate the `.ctxrc`
  `[backends]` table per the spec's Configuration section:
  per-backend `endpoint`, `api_key_env`, `timeout`, `default_model`,
  plus optional `[backends].default`. Refuse malformed tables with
  a clear parse error naming the offending key. Add fixtures and
  round-trip tests. Spec: `specs/ctx-ai-backend.md` §Configuration.
  #priority:medium #added:2026-05-21
- Implement the minimum viable backend set: `vllm` (canonical
  local) and generic `openai-compatible` (the contract floor) in
  `internal/backend/vllm.go` and `internal/backend/openaicompat.go`.
  Both must implement `Ping` (HTTP GET on `/v1/models`) and
  `Complete` (POST `/v1/chat/completions`). Fail closed on
  unreachable / 4xx / 5xx / timeout; never retry with a different
  model. Spec: `specs/ctx-ai-backend.md` §Approach and §Edge Cases.
  #priority:medium #added:2026-05-21
- Add the named-backend implementations: `openai`, `anthropic`,
  `ollama`, `lmstudio` in `internal/backend/`. Each is a thin
  wrapper over `openaicompat` with backend-specific defaults
  (endpoint, auth header shape, env-var name). Anthropic uses the
  Messages API endpoint where supported but inherits the
  OpenAI-compatible floor for `/v1/chat/completions`. Spec:
  `specs/ctx-ai-backend.md` §Approach. #priority:medium
  #added:2026-05-21
- Extend the `ctx setup` family with `--backend <name>`:
  templates endpoint + auth wiring into `.ctxrc` and (where
  applicable) downstream AI-tool configs (`ANTHROPIC_BASE_URL`,
  `OPENAI_BASE_URL`). Honours existing env-var values: warn but
  do not overwrite. Lives in new `internal/cli/setup/core/backend/`
  subpackage. Spec: `specs/ctx-ai-backend.md` §Implementation.
  #priority:medium #added:2026-05-21
- Build the AI command surface per the namespace decision from
  the first task. Minimum verbs: `ping` (reachability + first model
  listed) plus the validation consumer chosen below. All AI
  commands honour the `--backend` flag (falling back to
  `[backends].default`), fail closed when no backend is configured,
  and surface upstream errors verbatim. Spec:
  `specs/ctx-ai-backend.md` §Interface. #priority:medium
  #added:2026-05-21
- Add the deterministic-core boundary guard: a unit test (or
  lint check) that fails if `internal/cli/agent/`,
  `internal/cli/status/`, or any deterministic-ceremony hook
  imports `internal/backend/`. This is the structural enforcement
  for Invariant 2; without it, the additive/optional discipline
  is honour-system only. Spec: `specs/ctx-ai-backend.md` §Validation
  Rules and §Testing. #priority:medium #added:2026-05-21
- Ship the validation consumer from block B: pick one
  extraction command (the spec recommends
  `ctx compact <input> --emit decisions,learnings,tasks,open-questions`
  as the cheapest per the brief). Implements the full pattern
  end-to-end: schema-constrained dispatch through the backend,
  JSON validation, proposal artifact written to the provisional
  proposal queue location (settled later by the B+C spec).
  `.context/*.md` files must remain unchanged. Integration test
  confirms the round-trip against a fake OpenAI-compatible
  httptest server. Spec: `specs/ctx-ai-backend.md` §Testing and
  Open Question #5. #priority:medium #added:2026-05-21
- Write the documentation deliverables: one new recipe
  (`docs/recipes/local-inference-with-vllm.md` or
  `docs/recipes/ai-backend-setup.md`) covering the
  `ctx setup --backend vllm` flow end-to-end, plus a CLI reference
  page under `docs/cli/` for whichever command surface the
  namespace decision produced. The recipe is one file, not a
  recipe-surface rework; that scope was explicitly rejected in
  the brief. Spec: `specs/ctx-ai-backend.md` §Non-Goals.
  #priority:medium #added:2026-05-21
After all ten land, the B + C supplementary spec
(specs/ctx-ai-extraction-and-recall.md) gets re-debated with
A's surface as ground truth, then promoted to contract specs of
its own before B/C implementation work is broken into tasks.