diff --git a/.env.example b/.env.example index f96b135..6a4e090 100644 --- a/.env.example +++ b/.env.example @@ -445,6 +445,17 @@ TOON_MIN_BYTES=4096 TOON_FAIL_OPEN=true TOON_LOG_STATS=true +# Model price overrides: pin per-1M-token USD prices for models the pricing +# registry doesn't know (otherwise their cost is recorded as null/unknown). +# JSON object keyed by model name. Example: +# MODEL_PRICE_OVERRIDES={"my-model":{"input":0.5,"output":1.5}} + +# Caveman terse-output injection (opt-in): append a brevity instruction to the +# system prompt to reduce OUTPUT tokens. Off by default — changes model style. +# Levels: lite | full | ultra +CAVEMAN_ENABLED=false +CAVEMAN_LEVEL=lite + # ============================================================================== # Tiered Model Routing (REQUIRED) # ============================================================================== diff --git a/README.md b/README.md index 9935f54..765b430 100644 --- a/README.md +++ b/README.md @@ -545,6 +545,28 @@ TOOL_INJECTION_ENABLED=false CODE_MODE_ENABLED=true ``` +Always-on (no config): **smart tool selection** (server mode), **RTK tool-result +compression** (test/git/grep/lint/build/JSON output), **MCP tool dedup** (drops +built-in WebSearch/WebFetch when an Exa/Tavily MCP tool is present), and +**request bypass** (Claude CLI Warmup / title-extraction calls are answered +locally, never hitting a provider). + +Optional **terse-output mode** to cut *output* tokens: +```bash +CAVEMAN_ENABLED=true # off by default — nudges the model to be concise +CAVEMAN_LEVEL=lite # lite | full | ultra +``` + +### Cost tracking & model pricing +Per-request cost is computed from a model-pricing registry (LiteLLM → models.dev, +cached 24h) and recorded in telemetry. Models the registry doesn't know record +`cost_usd=null` (logged once) rather than a fabricated price. Pin prices for +unknown models: +```bash +# Per-1M-token USD prices, JSON keyed by model name +MODEL_PRICE_OVERRIDES={"my-model":{"input":0.5,"output":1.5}} +``` + ### Memory System (Titans-inspired) ```bash MEMORY_ENABLED=true @@ -652,35 +674,45 @@ npm start ## Benchmark Results -Measured on real agentic coding workloads (Claude Code / Cursor sessions) with Ollama, Moonshot, and Azure OpenAI backends. Run with `node benchmark-tier-routing.js`. +Head-to-head against **LiteLLM** on the **same backends** (Ollama `minimax-m2.5`, Moonshot, Azure OpenAI), 9 scenarios across 4 feature categories. Apples-to-apples comparison is Lynkr vs LiteLLM **billed tokens on the same scenario**. Run with `node benchmark-tier-routing.js`. -### Token compression +> _Run: June 5, 2026 · Lynkr v9.3.2 · LiteLLM v1.87.1 · macOS, Apple Silicon._ -| Scenario | Tokens without Lynkr | Tokens with Lynkr | Reduction | +### Token reduction (vs LiteLLM, same model & prompt) + +| Mechanism | Lynkr | LiteLLM | Result | |---|---|---|---| -| 14-tool request (read task) | 1,042 | **547** | **47%** | -| 14-tool request (write task) | 1,043 | **412** | **60%** | -| Large JSON grep result (60 items) | 3,458 | **427** | **87.6%** | +| Smart tool selection (14 tools) | **959** tokens · 0ドル.0044 | 2,085 tokens · 0ドル.0091 | **53% fewer tokens, 52% cheaper** | +| TOON compression (60-item grep JSON) | **427** tokens · 0ドル.009 | 3,458 tokens · 0ドル.018 | **87.6% fewer tokens, 50% cheaper** | -Lynkr strips irrelevant tool schemas before forwarding (smart tool selection) and binary-compresses large JSON tool results (TOON) — both happen in-process with no added latency. +Lynkr strips irrelevant tool schemas (smart tool selection) and binary-compresses large JSON tool results (TOON) — both in-process, no added latency. ### Semantic cache | | Tokens billed | Response time | |---|---|---| | First call (cold) | 2,857 | 1,891ms | -| **Second call — paraphrased, cache hit** | **0** | **171ms** | +| **Second call — paraphrased, cache hit** | **0** (served from cache) | **171ms (×ばつ faster)** | -Near-identical prompts return cached responses in 171ms. Zero tokens billed on a cache hit. +Near-identical prompts return cached responses in 171ms. Zero model tokens billed on a cache hit. ### Tier routing -| Request | Routed to | -|---|---| -| "What does git stash do?" | SIMPLE → local model (free) | -| JWT vs cookies security analysis | COMPLEX → cloud model (correct) | +| Request | Lynkr routes to | LiteLLM routes to | +|---|---|---| +| "What does git stash do?" | `minimax-m2.5` (local, free) | Ollama (local) | +| JWT vs cookies security analysis | `moonshot` (cloud — correct) | **Ollama (local — wrong call)** | + +Lynkr scores each request on 15 dimensions (token count, code complexity, reasoning markers, risk signals, agentic patterns) and escalates automatically. LiteLLM's `cost-based-routing` sends everything to the cheapest model regardless of complexity. + +### Cost projection (100,000 requests/month, same backend) + +| | Monthly cost | vs LiteLLM | +|---|---|---| +| LiteLLM | ~818ドル | baseline | +| **Lynkr** | **~409ドル** | **~50% cheaper** | -Lynkr scores each request on 15 dimensions (token count, code complexity, reasoning markers, risk signals, agentic patterns) and routes automatically. No caller changes needed. +_Based on a tool-heavy agentic session (TOON scenario). On equal footing — same provider, same model — Lynkr is cheaper due to token optimization._ → [Full benchmark report with methodology](BENCHMARK_REPORT.md) diff --git a/docs/index.html b/docs/index.html index 1c2c025..66570b6 100644 --- a/docs/index.html +++ b/docs/index.html @@ -34,7 +34,7 @@ "description": "Self-hosted LLM gateway for Claude Code, Cursor, and Codex. Compresses tokens before they hit the model.", "url": "https://github.com/Fast-Editor/Lynkr", "downloadUrl": "https://www.npmjs.com/package/lynkr", - "softwareVersion": "9.3.2", + "softwareVersion": "9.4.6", "author": { "@type": "Person", "name": "Vishal Veera Reddy", "url": "https://github.com/vishalveerareddy123" }, "offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD" }, "keywords": "LLM gateway, Claude Code, Cursor, Ollama, AWS Bedrock, AI coding, self-hosted" @@ -72,7 +72,7 @@
-
v9.3.2 — benchmarked in production
+
v9.4.6 — benchmarked in production

The LLM gateway
diff --git a/docs/index.md b/docs/index.md index 4457b3c..02aa934 100644 --- a/docs/index.md +++ b/docs/index.md @@ -50,7 +50,7 @@ "description": "Self-hosted LLM gateway server that enables Claude Code, Cursor, and AI coding tools to work with any LLM provider with 60-80% cost reduction.", "url": "https://github.com/Fast-Editor/Lynkr", "downloadUrl": "https://www.npmjs.com/package/lynkr", - "softwareVersion": "9.3.2", + "softwareVersion": "9.4.6", "author": { "@type": "Person", "name": "Vishal Veera Reddy", @@ -107,7 +107,7 @@
- v9.3.2 — Production Ready + v9.4.6 — Production Ready

diff --git a/documentation/token-optimization.md b/documentation/token-optimization.md index 8e19a93..3fcaf4c 100644 --- a/documentation/token-optimization.md +++ b/documentation/token-optimization.md @@ -12,6 +12,7 @@ Lynkr reduces tokens sent to the model through multiple independent mechanisms. |---|---|---| | **Smart tool selection** | **47–60%** | 14-tool request (read or write task) | | **TOON JSON compression** | **87.6%** | Large grep/file-read tool result (60-item array) | +| **Tool-result compression (RTK)** | up to **87.6%** | grep/test/git/lint/build/log/JSON tool output | | **Semantic cache** | **100% on hit, 171ms** | Paraphrased repeat query | | MCP Code Mode | **96%** | 100+ MCP tool schemas → 4 meta-tools | | History compression | up to 80% | Long multi-turn sessions | @@ -45,7 +46,7 @@ At 100,000 requests/month on a tool-heavy agentic workload, this translates to * --- -## 7 Optimization Phases +## Optimization Phases ### Phase 0: MCP Code Mode (96% reduction for MCP tools) @@ -283,6 +284,58 @@ HISTORY_SUMMARIZE_OLDER=true # Summarize older turns (default: true) --- +### Phase 7: Tool-Result Compression (up to 87.6% on tool output) + +**Problem:** Tool results dominate agentic token usage. A single `grep`, test run, `git diff`, or JSON API response can be thousands of tokens — most of it boilerplate the model doesn't need to reason over. + +Lynkr compresses `tool_result` blocks **in-process before forwarding** (no added latency), via two complementary mechanisms. + +#### 7a. RTK pattern compression + +Detects the *shape* of a tool result and rewrites it to a compact, information-preserving summary. Each detector only fires when it recognizes the format; unrecognized text passes through unchanged. + +| Detector | What it compresses | Example outcome | +|----------|--------------------|-----------------| +| `test_output` | jest/vitest/pytest/cargo/go test logs | Keep the summary line + failures, drop passing-test noise | +| `git_diff` | `git diff` | Per-file `+adds/-dels` with capped change lines | +| `git_status` | `git status` | Branch + staged/modified/untracked lists | +| `git_log` | `git log` | One line per commit (` (author, date)`) | +| `lint_output` | eslint/tsc/ruff/clippy/biome | Counts grouped by rule, not every occurrence | +| `build_output` | npm/cargo/webpack | Errors + capped warnings + success line | +| `container_output` | docker/kubectl tables | Header + first N rows + "+M more" | +| `json_response` | large JSON objects | Structural skeleton (search/fetch results preserved) | +| `grep_output` | `grep`/`rg` (`file:line:content`) | Grouped by file, capped at 10 matches/file | +| `directory_listing` | `ls`/`find`/`tree` | Grouped by directory with counts | +| `large_file` | long source files | Imports + signatures skeleton | +| `dedup_log` | repetitive logs | Collapses consecutive duplicate lines | +| `smart_truncate` | very long unmatched output | Keeps head + tail, drops the middle | + +**Tier-aware thresholds** — compression only kicks in above a size that scales with the routing tier, so cheap models get aggressive compression and reasoning models get the full picture: + +| Tier | Compress if result exceeds | +|------|----------------------------| +| SIMPLE | 300 chars | +| MEDIUM | 800 chars | +| COMPLEX | 2,000 chars | +| REASONING | never | + +**Lossless recovery (tee):** the full original is stashed for 5 minutes and a pointer (`[full: tee_...]`) is appended to the compressed result. The model — or you — can fetch the original via `GET /tee/:id` if the detail is actually needed. + +Always on (no configuration). Metrics: `GET /metrics/tool-compression`. + +#### 7b. TOON compression (binary JSON encoding) + +For large JSON tool results (arrays of objects, API payloads), TOON re-encodes the structure into a far denser representation than pretty-printed JSON — **87.6% reduction** on a 60-item grep array in benchmarks. Plain text and small payloads are left untouched. + +```bash +TOON_ENABLED=true # opt-in (default: false) +TOON_MIN_BYTES=4096 # only compress payloads larger than this +TOON_FAIL_OPEN=true # on any encode error, forward the original (default: true) +TOON_LOG_STATS=true # log per-call compression stats +``` + +--- + ### Phase 8: Headroom Context Compression (Optional, 47-92% reduction) **Problem:** Even with all other optimizations, large requests can still exceed context limits. @@ -308,7 +361,7 @@ HEADROOM_ENABLED=true ## Combined Savings -When all 8 phases work together: +When all phases work together: **Example Request Flow:** diff --git a/install.sh b/install.sh index bf34dfc..d51c235 100755 --- a/install.sh +++ b/install.sh @@ -108,8 +108,24 @@ clone_or_update() { install_dependencies() { print_info "Installing dependencies..." cd "$INSTALL_DIR" - npm install --production + # --omit=dev keeps optionalDependencies (better-sqlite3, hnswlib-node, + # tree-sitter) which back telemetry, the memory store and routing ML. + # The postinstall hook (scripts/check-native.js) verifies the native ABI + # and rebuilds if Node was upgraded — best-effort, never fails the install. + npm install --omit=dev print_success "Dependencies installed" + + # Native optional modules need a C/C++ toolchain only if no prebuilt binary + # is available for this platform. They degrade gracefully if absent. + if ! node -e "const D=require('better-sqlite3'); new D(':memory:').close()">/dev/null 2>&1; then + print_warning "Native module 'better-sqlite3' is not loadable." + echo " Telemetry, the memory store and sessions need it. To enable:" + echo " - Ensure a build toolchain is present (Xcode CLT on macOS, build-essential + python3 on Linux), then:" + echo " - ${BLUE}cd $INSTALL_DIR && npm run rebuild-native${NC}" + echo " Lynkr still runs without it (those features stay disabled)." + else + print_success "Native modules OK (telemetry, memory, sessions enabled)" + fi } # Create default .env file @@ -131,7 +147,7 @@ create_env_file() { MODEL_PROVIDER=ollama # Server Configuration -PORT=8080 +PORT=8081 # Ollama Configuration (default for local development) OLLAMA_MODEL=qwen2.5-coder:7b @@ -161,7 +177,7 @@ EOF print_info "📝 Configuration ready! Key settings:" echo " • Default provider: Ollama (local, offline)" echo " • Memory system: Enabled (learns from conversations)" - echo " • Port: 8080" + echo " • Port: 8081" echo "" print_warning "To use cloud providers (Databricks/OpenAI/Azure):" echo " Edit: ${BLUE}nano $INSTALL_DIR/.env${NC}" @@ -220,7 +236,7 @@ print_next_steps() { echo " ${BLUE}lynkr${NC}" echo "" echo " 3. Configure Claude Code CLI:" - echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8080${NC}" + echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8081${NC}" echo " ${BLUE}claude${NC}" echo "" echo " ${YELLOW}Option B: Use Cloud Providers (Databricks/OpenAI/Azure)${NC}" @@ -238,7 +254,7 @@ print_next_steps() { echo " ${BLUE}lynkr${NC}" echo "" echo " 3. Configure Claude Code CLI:" - echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8080${NC}" + echo " ${BLUE}export ANTHROPIC_BASE_URL=http://localhost:8081${NC}" echo " ${BLUE}export ANTHROPIC_API_KEY=any-non-empty-value${NC} ${GREEN}← Placeholder${NC}" echo " ${BLUE}claude${NC}" echo "" diff --git a/package.json b/package.json index e5cb491..9d0305a 100644 --- a/package.json +++ b/package.json @@ -8,13 +8,15 @@ "lynkr-setup": "scripts/setup.js" }, "scripts": { + "postinstall": "node scripts/check-native.js", + "rebuild-native": "node scripts/check-native.js", "prestart": "node -e \"if(process.env.HEADROOM_ENABLED==='true'&&process.env.HEADROOM_DOCKER_ENABLED!=='false'){process.exit(0)}else{process.exit(1)}\" && docker compose --profile headroom up -d --build headroom 2>/dev/null || echo 'Headroom skipped (disabled or Docker not running)'", "start": "node index.js 2>&1 | npx pino-pretty --sync", "stop": "node -e \"if(process.env.HEADROOM_ENABLED==='true'&&process.env.HEADROOM_DOCKER_ENABLED!=='false'){process.exit(0)}else{process.exit(1)}\" && docker compose --profile headroom down || echo 'Headroom skipped (disabled or Docker not running)'", "dev": "nodemon index.js", "lint": "eslint src index.js", "test": "npm run test:unit && npm run test:performance", - "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js test/openai-integration.test.js test/toon-compression.test.js test/llamacpp-integration.test.js test/resilience.test.js test/telemetry-routing.test.js test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js test/distill.test.js test/large-payload.test.js test/code-mode.test.js test/prompt-cache-injection.test.js test/risk-analyzer.test.js test/interaction-block.test.js test/preflight.test.js", + "test:unit": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/routing.test.js test/hybrid-routing-integration.test.js test/web-tools.test.js test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js test/azure-openai-config.test.js test/azure-openai-format-conversion.test.js test/azure-openai-routing.test.js test/azure-openai-streaming.test.js test/azure-openai-error-resilience.test.js test/azure-openai-integration.test.js test/openai-integration.test.js test/toon-compression.test.js test/llamacpp-integration.test.js test/resilience.test.js test/telemetry-routing.test.js test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js test/distill.test.js test/large-payload.test.js test/code-mode.test.js test/prompt-cache-injection.test.js test/risk-analyzer.test.js test/interaction-block.test.js test/preflight.test.js test/token-reduction.test.js test/session-affinity.test.js test/model-registry-cost.test.js", "test:memory": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/memory/store.test.js test/memory/surprise.test.js test/memory/extractor.test.js test/memory/search.test.js test/memory/retriever.test.js", "test:new-features": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node --test test/passthrough-mode.test.js test/openrouter-error-resilience.test.js test/format-conversion.test.js", "test:performance": "DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/hybrid-routing-performance.test.js && DATABRICKS_API_KEY=test-key DATABRICKS_API_BASE=http://test.com node test/performance-tests.js", diff --git a/public/dashboard.html b/public/dashboard.html index 83bef39..5cd6ea8 100644 --- a/public/dashboard.html +++ b/public/dashboard.html @@ -244,6 +244,7 @@ const t = d.today; const s = d.stats; + const tierLabel = t => t === 'default' ? 'default' : String(t).toLowerCase(); const providerCards = d.providers.length === 0 ? `

No providers configured

` : d.providers.map(p => ` @@ -251,10 +252,21 @@
${p.name} + ${(p.tiers || []).map(t => `${tierLabel(t)}`).join('')}
${p.type}

`).join(''); + const providerWarnings = (d.providerWarnings || []).map(w => ` +
+
+ + ${w.name} + ${(w.tiers || []).map(t => `${tierLabel(t)}`).join('')} +
+ no credentials +
`).join(''); + const recentRows = (d.recentRequests || []).map(r => `
${fmt.ago(r.timestamp)} @@ -279,7 +291,7 @@ ${card(`

Configured Providers

-
${providerCards}
+
${providerCards}${providerWarnings}
`)} diff --git a/scripts/check-native.js b/scripts/check-native.js new file mode 100644 index 0000000..eba34ca --- /dev/null +++ b/scripts/check-native.js @@ -0,0 +1,97 @@ +#!/usr/bin/env node +/** + * Native module ABI guard (postinstall). + * + * better-sqlite3 (and the other native optionalDependencies) are compiled + * against a specific Node ABI. When Node is upgraded, the prebuilt/compiled + * binary stops loading with: + * + * "was compiled against a different Node.js version using + * NODE_MODULE_VERSION 115. This version of Node.js requires + * NODE_MODULE_VERSION 141." + * + * The failure is silent at runtime — telemetry, request logs, and the memory + * store all sit behind try/catch and simply go empty. This probe detects the + * mismatch and rebuilds the native modules so it self-heals on `npm install`. + * + * It is intentionally best-effort: it NEVER exits non-zero, so it can't break + * `npm install` on machines without a build toolchain (the modules are + * optional and the app degrades gracefully without them). + */ + +const { execSync } = require("child_process"); + +// Native optionalDependencies that are ABI-sensitive. If Node changed, all of +// them are stale, so we rebuild the set in one pass. +const NATIVE_DEPS = [ + "better-sqlite3", + "hnswlib-node", + "tree-sitter", + "tree-sitter-javascript", + "tree-sitter-python", + "tree-sitter-typescript", +]; + +function log(msg) { + console.log(`[check-native] ${msg}`); +} + +/** + * Probe better-sqlite3 — the canary. `require()` alone is not enough: the + * native addon only loads when a Database is instantiated. + * @returns {"ok"|"absent"|"mismatch"} + */ +function probe() { + let Database; + try { + Database = require("better-sqlite3"); + } catch (err) { + if (err && err.code === "MODULE_NOT_FOUND") return "absent"; + return "mismatch"; + } + try { + const db = new Database(":memory:"); + db.close(); + return "ok"; + } catch (err) { + if (/NODE_MODULE_VERSION|different Node\.js version|invalid ELF|dlopen|\.node/i.test(err.message || "")) { + return "mismatch"; + } + // Some other instantiation error — not an ABI issue we can fix by rebuild. + return "ok"; + } +} + +function main() { + const status = probe(); + + if (status === "absent") { + // Optional dependency not installed (e.g. build skipped). Nothing to do. + return; + } + if (status === "ok") { + return; + } + + log("native module ABI mismatch detected (Node was likely upgraded). Rebuilding native modules..."); + try { + execSync(`npm rebuild ${NATIVE_DEPS.join(" ")}`, { stdio: "inherit" }); + } catch { + log("rebuild did not complete (a build toolchain may be missing). Continuing — native features will be disabled until you run: npm rebuild better-sqlite3"); + return; + } + + // Re-probe to report the outcome. + if (probe() === "ok") { + log("native modules rebuilt successfully."); + } else { + log("native modules still not loadable after rebuild. Run `npm rebuild better-sqlite3` manually."); + } +} + +try { + main(); +} catch (err) { + // Never fail the install. + log(`skipped (${err.message})`); +} diff --git a/src/clients/databricks.js b/src/clients/databricks.js index ef9e244..5d31c79 100644 --- a/src/clients/databricks.js +++ b/src/clients/databricks.js @@ -1506,10 +1506,16 @@ async function invokeMoonshot(body) { "claude-haiku-4-5-20251001": "kimi-k2-turbo-preview", "claude-haiku-4-5": "kimi-k2-turbo-preview", "claude-3-haiku": "kimi-k2-turbo-preview", + // moonshot-v1-auto 400s with "tokenization failed" (its server-side auto + // context-size pass fails on large tool-bearing payloads). Remap to a + // fixed model that's broadly available on api.moonshot.ai. + "moonshot-v1-auto": "moonshot-v1-128k", }; const requestedModel = body._tierModel || body.model || config.moonshot.model; - const mappedModel = modelMap[requestedModel] || config.moonshot.model || "kimi-k2-turbo-preview"; + let mappedModel = modelMap[requestedModel] || config.moonshot.model || "kimi-k2-turbo-preview"; + // Guard against the deprecated auto model arriving via config too. + if (mappedModel === "moonshot-v1-auto") mappedModel = "moonshot-v1-128k"; // Convert messages using existing utility const messages = convertAnthropicMessagesToOpenRouter(body.messages || []); @@ -1522,12 +1528,18 @@ async function invokeMoonshot(body) { messages.unshift({ role: "system", content: systemContent }); } + // kimi-k2.x (k2.5 / k2.6 ...) are thinking models that only accept + // temperature: 1 — any other value 400s with "invalid temperature". + const isKimiThinking = /^kimi-k2/i.test(mappedModel); + const moonshotBody = { model: mappedModel, messages, max_tokens: body.max_tokens || 16384, - temperature: body.temperature ?? 0.7, - top_p: body.top_p ?? 1.0, + // kimi-k2.x thinking models pin sampling params: temperature must be 1 + // and top_p must be 0.95 — any other value 400s. + temperature: isKimiThinking ? 1 : (body.temperature ?? 0.7), + top_p: isKimiThinking ? 0.95 : (body.top_p ?? 1.0), stream: false, // Force non-streaming - OpenAI SSE to Anthropic SSE conversion not implemented }; @@ -2027,6 +2039,65 @@ async function invokeCodex(body) { }; } +/** + * Compute request cost in USD from model pricing ×ばつ token usage. + * Registry returns per-1M-token prices ({ input, output }); returns null when + * pricing is unknown so we don't record misleading zeros. + */ +const _unknownCostWarned = new Set(); +function computeCostUsd(model, inputTokens, outputTokens) { + try { + const { getModelRegistrySync } = require("../routing/model-registry"); + const reg = getModelRegistrySync && getModelRegistrySync(); + const cost = reg?.getCost?.(model); + if (!cost) return null; + // Unknown model → record null (not a fabricated default), warn once so the + // gap is visible and can be fixed via MODEL_PRICE_OVERRIDES. + if (cost.unknown) { + if (model && !_unknownCostWarned.has(model)) { + _unknownCostWarned.add(model); + logger.warn({ model }, "[Cost] No pricing for model — recording cost_usd=null. Set MODEL_PRICE_OVERRIDES to fix."); + } + return null; + } + if (cost.input == null && cost.output == null) return null; + const inUsd = ((inputTokens || 0) / 1e6) * (cost.input || 0); + const outUsd = ((outputTokens || 0) / 1e6) * (cost.output || 0); + return Number((inUsd + outUsd).toFixed(6)); + } catch { + return null; + } +} + +// Telemetry prompt/response text is always captured (truncated) to build the +// routing ML training corpus. Stored locally in .lynkr/telemetry.db only. +const TELEMETRY_TEXT_MAXLEN = 2000; + +/** Flatten the latest user message to plain text (for telemetry capture). */ +function captureRequestText(body) { + const messages = body?.messages; + if (!Array.isArray(messages)) return null; + for (let i = messages.length - 1; i>= 0; i--) { + const m = messages[i]; + if (m?.role !== "user") continue; + let text = ""; + if (typeof m.content === "string") text = m.content; + else if (Array.isArray(m.content)) { + text = m.content.filter((b) => b?.type === "text").map((b) => b.text || "").join(" "); + } + if (text) return text.slice(0, TELEMETRY_TEXT_MAXLEN); + } + return null; +} + +/** Flatten an Anthropic response's text blocks to plain text (for telemetry). */ +function captureResponseText(resultJson) { + const content = resultJson?.content; + if (!Array.isArray(content)) return null; + const text = content.filter((b) => b?.type === "text").map((b) => b.text || "").join(" "); + return text ? text.slice(0, TELEMETRY_TEXT_MAXLEN) : null; +} + async function invokeModel(body, options = {}) { const { determineProviderSmart, isFallbackEnabled, getFallbackProvider } = require("./routing"); const metricsCollector = getMetricsCollector(); @@ -2233,6 +2304,9 @@ async function invokeModel(body, options = {}) { circuit_breaker_state: breaker.state, quality_score: qualityScore, tokens_per_second: outputTokens && latency> 0 ? outputTokens / (latency / 1000) : null, + cost_usd: computeCostUsd(routingDecision.model || body._tierModel, inputTokens, outputTokens), + request_text: captureRequestText(body), + response_text: captureResponseText(result.json), }); // Return result with provider info and routing decision for headers @@ -2394,6 +2468,9 @@ async function invokeModel(body, options = {}) { { status_code: 200, output_tokens: fbOutputTokens, tool_calls_made: fbToolCalls, was_fallback: true, retry_count: 0, latency_ms: Date.now() - startTime } ), tokens_per_second: fbOutputTokens && fallbackLatency> 0 ? fbOutputTokens / (fallbackLatency / 1000) : null, + cost_usd: computeCostUsd(routingDecision.model || body._tierModel, fbInputTokens, fbOutputTokens), + request_text: captureRequestText(body), + response_text: captureResponseText(fallbackResult.json), }); // Return result with actual provider used (fallback provider) and routing decision diff --git a/src/clients/openrouter-utils.js b/src/clients/openrouter-utils.js index 1a2daba..7978f8c 100644 --- a/src/clients/openrouter-utils.js +++ b/src/clients/openrouter-utils.js @@ -176,6 +176,21 @@ function convertAnthropicMessagesToOpenRouter(anthropicMessages) { } } + // Kimi/Moonshot (and some OpenAI-compatible APIs) reject a message whose + // content is an empty string with "Invalid request: tokenization failed". + // This happens when a turn had only non-text blocks (thinking / image / + // stripped content) and flattened to "". Replace empty/whitespace-only + // content with a single space — but never touch an assistant message that + // carries tool_calls, where content: null is intentional and required. + for (const m of converted) { + if (m.role === 'tool') continue; + const hasToolCalls = Array.isArray(m.tool_calls) && m.tool_calls.length> 0; + if (hasToolCalls) continue; + if (typeof m.content !== 'string' || m.content.trim() === '') { + m.content = ' '; + } + } + // Log the converted messages for debugging logger.debug({ inputCount: anthropicMessages.length, diff --git a/src/config/index.js b/src/config/index.js index 729f2fc..e4ac410 100644 --- a/src/config/index.js +++ b/src/config/index.js @@ -208,6 +208,11 @@ const tokenBudgetWarning = Number.parseInt(process.env.TOKEN_BUDGET_WARNING ?? " const tokenBudgetMax = Number.parseInt(process.env.TOKEN_BUDGET_MAX ?? "180000", 10); const tokenBudgetEnforcement = process.env.TOKEN_BUDGET_ENFORCEMENT !== "false"; // default true +// Caveman terse-output injection (opt-in, off by default) +const cavemanEnabled = process.env.CAVEMAN_ENABLED === "true"; +const cavemanLevel = (process.env.CAVEMAN_LEVEL ?? "lite").toLowerCase(); + + // TOON payload compression (opt-in) const toonEnabled = process.env.TOON_ENABLED === "true"; // default false const toonMinBytes = Number.parseInt(process.env.TOON_MIN_BYTES ?? "4096", 10); @@ -641,6 +646,10 @@ var config = { toolResultCompression: { enabled: true, }, + caveman: { + enabled: cavemanEnabled, + level: cavemanLevel, + }, server: { jsonLimit: process.env.REQUEST_JSON_LIMIT ?? "1gb", }, diff --git a/src/context/caveman.js b/src/context/caveman.js new file mode 100644 index 0000000..550b201 --- /dev/null +++ b/src/context/caveman.js @@ -0,0 +1,94 @@ +/** + * Caveman Terse-Output Injector + * + * Appends a brevity instruction to the system prompt so the model produces + * terser responses, reducing OUTPUT tokens. Opt-in and off by default — it + * changes model behavior, so it's only applied when explicitly enabled. + * + * Enable with CAVEMAN_ENABLED=true. Level via CAVEMAN_LEVEL=lite|full|ultra + * (default: lite). Adapted from 9router's caveman injector / the caveman skill + * (https://github.com/JuliusBrussee/caveman). + * + * @module context/caveman + */ + +const config = require("../config"); +const logger = require("../logger"); + +const LEVELS = ["lite", "full", "ultra"]; + +// Shared guardrails so brevity never corrupts the substance that matters. +const BOUNDARIES = + "Code blocks, file paths, commands, errors, URLs: keep exact. " + + "Security warnings, irreversible-action confirmations, and multi-step ordered " + + "sequences: write in full normal prose. Resume terse style afterward."; + +const EXAMPLES = + 'Not: "Sure! I\'d be happy to help. The issue is likely caused by..." ' + + 'Yes: "Bug in auth middleware. Token expiry uses `<` not `<=`. Fix:"'; + +const PERSISTENCE = "Apply this to every response unless a guardrail above applies."; + +const PROMPTS = { + lite: [ + "Respond tersely. Keep grammar and full sentences but drop filler, hedging, and pleasantries (just/really/basically/sure/of course/I'd be happy to).", + "Pattern: state the thing, the action, the reason. Then the next step.", + EXAMPLES, + BOUNDARIES, + PERSISTENCE, + ].join(" "), + + full: [ + "Respond like a terse caveman. All technical substance stays exact; only fluff dies.", + "Drop articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries, and hedging. Fragments OK. Prefer short synonyms (big not extensive, fix not implement a solution for).", + "Pattern: [thing] [action] [reason]. [next step].", + EXAMPLES, + BOUNDARIES, + PERSISTENCE, + ].join(" "), + + ultra: [ + "Respond ultra-terse. Maximum compression. Telegraphic.", + "Abbreviate (DB/auth/config/req/res/fn/impl), strip conjunctions, use arrows for causality (X → Y). One word when one word is enough.", + "Pattern: [thing] → [result]. [fix].", + EXAMPLES, + BOUNDARIES, + PERSISTENCE, + ].join(" "), +}; + +const MARKER = "[brevity]"; + +/** Resolve the configured level, falling back to "lite". */ +function resolveLevel(level) { + const l = String(level || config.caveman?.level || "lite").toLowerCase(); + return LEVELS.includes(l) ? l : "lite"; +} + +/** + * Append the brevity instruction to a system prompt string. + * Idempotent — won't double-inject if the marker is already present. + * + * @param {string} system - Existing system prompt (may be empty). + * @param {object} [opts] + * @param {boolean} [opts.enabled] - Override config enablement. + * @param {string} [opts.level] - Override level. + * @returns {string} system prompt, possibly with brevity instruction appended. + */ +function injectCaveman(system, opts = {}) { + const enabled = opts.enabled ?? config.caveman?.enabled === true; + if (!enabled) return system || ""; + + const base = system || ""; + if (base.includes(MARKER)) return base; + + const level = resolveLevel(opts.level); + const instruction = `\n\n${MARKER} ${PROMPTS[level]}`; + logger.debug({ level }, "[Caveman] Injected brevity instruction into system prompt"); + return base + instruction; +} + +module.exports = { + injectCaveman, + LEVELS, +}; diff --git a/src/context/tool-dedup.js b/src/context/tool-dedup.js new file mode 100644 index 0000000..65f0aba --- /dev/null +++ b/src/context/tool-dedup.js @@ -0,0 +1,95 @@ +/** + * MCP-aware Tool Dedup + * + * Strips built-in tool definitions when an equivalent MCP tool is present in + * the request. Sending both wastes tool-schema tokens and gives the model + * redundant choices. Rule-based and deterministic. + * + * Example: if the Exa or Tavily MCP search tools are present, the built-in + * WebSearch/WebFetch tools are redundant and dropped. + * + * Ported from 9router's toolDeduper. Always on — purely removes redundant + * tool definitions, never adds. + * + * @module context/tool-dedup + */ + +const logger = require("../logger"); + +// Each rule: if any `triggers` tool is present, strip any tools matching +// `strip`. Patterns may be exact strings or RegExp (matched against the name). +const DEDUP_RULES = [ + { + // Exa MCP present → drop built-in web tools (Exa is preferred). + triggers: ["mcp__exa__web_search_exa", "mcp__exa__web_fetch_exa"], + strip: ["WebSearch", "WebFetch", "web_search", "web_fetch", "mcp__workspace__web_fetch"], + }, + { + // Tavily MCP present → drop built-in web tools. + triggers: ["mcp__tavily__tavily_search", "mcp__tavily__tavily_extract"], + strip: ["WebSearch", "WebFetch", "web_search", "web_fetch", "mcp__workspace__web_fetch"], + }, + { + // Browser MCP present → drop a duplicate Chrome-connector tool family. + triggers: [/^mcp__browsermcp__/], + strip: [/^mcp__Claude_in_Chrome__/], + }, +]; + +function getToolName(t) { + return t?.name || t?.function?.name || ""; +} + +function matches(name, pattern) { + if (typeof pattern === "string") return name === pattern; + return pattern instanceof RegExp ? pattern.test(name) : false; +} + +/** + * Remove redundant built-in tools that are superseded by present MCP tools. + * + * @param {Array} tools - Tool definitions (Anthropic or OpenAI shape). + * @returns {{tools: Array, stripped: string[]}} filtered tools + names removed. + */ +function dedupeTools(tools) { + if (!Array.isArray(tools) || tools.length === 0) return { tools, stripped: [] }; + + const names = tools.map(getToolName); + const toStrip = new Set(); + + for (const rule of DEDUP_RULES) { + const hasTrigger = names.some((n) => rule.triggers.some((p) => matches(n, p))); + if (!hasTrigger) continue; + for (const n of names) { + // Never strip a tool that is itself a trigger. + if (rule.triggers.some((p) => matches(n, p))) continue; + if (rule.strip.some((p) => matches(n, p))) toStrip.add(n); + } + } + + if (toStrip.size === 0) return { tools, stripped: [] }; + + const out = tools.filter((t) => !toStrip.has(getToolName(t))); + return { tools: out, stripped: Array.from(toStrip) }; +} + +/** + * Apply tool dedup to a payload in place. No-op when nothing is stripped. + * + * @param {object} payload - Request body with a `tools` array. + * @returns {string[]} names of stripped tools. + */ +function applyToolDedup(payload) { + if (!payload || !Array.isArray(payload.tools)) return []; + const { tools, stripped } = dedupeTools(payload.tools); + if (stripped.length> 0) { + payload.tools = tools; + logger.debug({ stripped }, "[ToolDedup] Stripped redundant built-in tools (MCP equivalents present)"); + } + return stripped; +} + +module.exports = { + dedupeTools, + applyToolDedup, +}; diff --git a/src/context/tool-result-compressor.js b/src/context/tool-result-compressor.js index c538d5b..9171b16 100644 --- a/src/context/tool-result-compressor.js +++ b/src/context/tool-result-compressor.js @@ -455,6 +455,107 @@ function compressContainerOutput(text) { return `${header}\n${dataLines.slice(0, 10).join("\n")}\n... +${dataLines.length - 10} more (${dataLines.length} total)`; } +// 11. Grep / ripgrep output ("file:lineno:content"), per-file match cap. +// Ported from 9router RTK grep filter (rtk/src/cmds/system/pipe_cmd.rs). +const GREP_PER_FILE_MAX = 10; +function compressGrep(text) { + const byFile = new Map(); + let total = 0; + + for (const line of text.split("\n")) { + // splitn(3, ':') — only split on the first two colons. + const first = line.indexOf(":"); + if (first === -1) continue; + const second = line.indexOf(":", first + 1); + if (second === -1) continue; + const file = line.slice(0, first); + const lineNumStr = line.slice(first + 1, second); + const content = line.slice(second + 1); + if (!/^\d+$/.test(lineNumStr)) continue; + total++; + if (!byFile.has(file)) byFile.set(file, []); + byFile.get(file).push([lineNumStr, content]); + } + + // Require a meaningful number of matches so we don't mangle prose that + // happens to contain a "word:123:..." line. + if (total < 5) return null; + + const files = Array.from(byFile.keys()).sort(); + let out = `${total} matches in ${files.length}F:\n\n`; + for (const file of files) { + const matches = byFile.get(file); + out += `[file] ${file} (${matches.length}):\n`; + for (const [lineNum, content] of matches.slice(0, GREP_PER_FILE_MAX)) { + out += ` ${lineNum.padStart(4)}: ${content.trim()}\n`; + } + if (matches.length> GREP_PER_FILE_MAX) { + out += ` +${matches.length - GREP_PER_FILE_MAX}\n`; + } + out += "\n"; + } + return out; +} + +// 12. Generic log de-duplication: collapse consecutive duplicate lines and +// runs of blank lines, with a hard line cap. Ported from 9router RTK dedupLog. +const DEDUP_LINE_MAX = 2000; +function compressDedupLog(text) { + const lines = text.split("\n"); + const out = []; + let prev = null; + let runCount = 0; + let blankStreak = 0; + + const flushRun = () => { + if (prev !== null && runCount> 1) { + out.push(` ... (${runCount - 1} duplicate lines)`); + } + }; + + for (const line of lines) { + if (line.trim() === "") { + if (blankStreak < 1) out.push(line); + blankStreak += 1; + flushRun(); + prev = null; + runCount = 0; + continue; + } + blankStreak = 0; + if (line === prev) { + runCount += 1; + continue; + } + flushRun(); + out.push(line); + prev = line; + runCount = 1; + if (out.length>= DEDUP_LINE_MAX) { + out.push(`... (truncated at ${DEDUP_LINE_MAX} lines)`); + return out.join("\n"); + } + } + flushRun(); + return out.join("\n"); +} + +// 13. Last-resort generic truncation: keep head + tail lines, drop the middle. +// Only kicks in for very long output no specific compressor matched. +// Ported from 9router RTK smartTruncate. +const SMART_TRUNCATE_HEAD = 120; +const SMART_TRUNCATE_TAIL = 60; +const SMART_TRUNCATE_MIN_LINES = 250; +function compressSmartTruncate(text) { + const lines = text.split("\n"); + if (lines.length < SMART_TRUNCATE_MIN_LINES) return null; + + const head = lines.slice(0, SMART_TRUNCATE_HEAD); + const tail = lines.slice(lines.length - SMART_TRUNCATE_TAIL); + const cut = lines.length - head.length - tail.length; + return [...head, `... +${cut} lines truncated`, ...tail].join("\n"); +} + // ── Compression Pipeline ───────────────────────────────────────────── const COMPRESSORS = [ @@ -466,8 +567,13 @@ const COMPRESSORS = [ { name: "build_output", fn: compressBuildOutput }, { name: "container_output", fn: compressContainerOutput }, { name: "json_response", fn: compressJSON }, + { name: "grep_output", fn: compressGrep }, { name: "directory_listing", fn: compressDirectoryListing }, { name: "large_file", fn: compressLargeFile }, + // Generic fallbacks last: dedup exact-duplicate spam, then hard head/tail + // truncation only if nothing more specific applied. + { name: "dedup_log", fn: compressDedupLog }, + { name: "smart_truncate", fn: compressSmartTruncate }, ]; // Compression levels tied to routing tiers diff --git a/src/dashboard/api.js b/src/dashboard/api.js index 5e0399c..58c4373 100644 --- a/src/dashboard/api.js +++ b/src/dashboard/api.js @@ -5,24 +5,74 @@ const metrics = require('../metrics'); const { getMetricsCollector } = require('../observability/metrics'); const { TIER_DEFINITIONS } = require('../routing/model-tiers'); -function getConfiguredProviders() { +// Per-provider type + whether its credentials/endpoint are actually present. +function providerMeta() { const c = config; - const providers = []; - const add = (name, type, ok) => ok && providers.push({ name, type }); - - add('databricks', 'cloud', c.databricks?.url && c.databricks?.apiKey); - add('azure-anthropic','cloud', c.azureAnthropic?.endpoint && c.azureAnthropic?.apiKey); - add('bedrock', 'cloud', c.bedrock?.apiKey); - add('openrouter', 'cloud', c.openrouter?.apiKey); - add('openai', 'cloud', c.openai?.apiKey); - add('azure-openai', 'cloud', c.azureOpenAI?.endpoint && c.azureOpenAI?.apiKey); - add('vertex', 'cloud', c.vertex?.projectId); - add('moonshot', 'cloud', c.moonshot?.apiKey); - add('ollama', 'local', c.ollama?.endpoint); - add('llamacpp', 'local', c.llamacpp?.endpoint); - add('lmstudio', 'local', c.lmstudio?.endpoint); - - return providers; + return { + databricks: { type: 'cloud', configured: !!(c.databricks?.url && c.databricks?.apiKey) }, + 'azure-anthropic': { type: 'cloud', configured: !!(c.azureAnthropic?.endpoint && c.azureAnthropic?.apiKey) }, + bedrock: { type: 'cloud', configured: !!c.bedrock?.apiKey }, + openrouter: { type: 'cloud', configured: !!c.openrouter?.apiKey }, + openai: { type: 'cloud', configured: !!c.openai?.apiKey }, + 'azure-openai': { type: 'cloud', configured: !!(c.azureOpenAI?.endpoint && c.azureOpenAI?.apiKey) }, + vertex: { type: 'cloud', configured: !!c.vertex?.projectId }, + moonshot: { type: 'cloud', configured: !!c.moonshot?.apiKey }, + ollama: { type: 'local', configured: !!c.ollama?.endpoint }, + llamacpp: { type: 'local', configured: !!c.llamacpp?.endpoint }, + lmstudio: { type: 'local', configured: !!c.lmstudio?.endpoint }, + }; +} + +// Providers the active routing config actually points at: the provider prefix +// of each TIER_* value (format `provider:model[:variant]`) plus the base +// MODEL_PROVIDER. Returns Map. +function getReferencedProviders() { + const refs = new Map(); + const note = (provider, label) => { + const key = String(provider || '').trim().toLowerCase(); + if (!key) return; + if (!refs.has(key)) refs.set(key, []); + if (label && !refs.get(key).includes(label)) refs.get(key).push(label); + }; + + const tiers = config.modelTiers || {}; + for (const [tier, val] of Object.entries(tiers)) { + if (typeof val === 'string' && val.trim()) { + note(val.split(':')[0], tier); + } + } + note(config.modelProvider?.type, 'default'); + + return refs; +} + +// Providers used by the routing config that have credentials/endpoints set. +// Unknown providers (no metadata) are included optimistically since we can't +// verify their credentials. +function getConfiguredProviders() { + const meta = providerMeta(); + const out = []; + for (const [name, tiers] of getReferencedProviders()) { + const m = meta[name]; + if (!m || m.configured) { + out.push({ name, type: m?.type || 'cloud', tiers }); + } + } + return out; +} + +// Tiers pointing at a known provider whose credentials/endpoint are missing — +// surfaced as a warning so a misconfigured tier is visible. +function getProviderWarnings() { + const meta = providerMeta(); + const out = []; + for (const [name, tiers] of getReferencedProviders()) { + const m = meta[name]; + if (m && !m.configured) { + out.push({ name, type: m.type, tiers }); + } + } + return out; } // Noise provider names injected by unit tests — filter them out of UI @@ -92,7 +142,8 @@ function overview(req, res) { port: config.port, version: process.env.npm_package_version || '9.0.2', modelProvider: config.modelProvider?.type || 'unknown', - providers: getConfiguredProviders(), + providers: getConfiguredProviders(), + providerWarnings: getProviderWarnings(), statsWindow: win.label, metrics: { requestsTotal: snap.requestsTotal, diff --git a/src/orchestrator/bypass.js b/src/orchestrator/bypass.js new file mode 100644 index 0000000..b47a567 --- /dev/null +++ b/src/orchestrator/bypass.js @@ -0,0 +1,135 @@ +/** + * Request Bypass + * + * Short-circuits Claude Code CLI housekeeping requests that don't need a real + * model call: + * - "Warmup" pings the CLI sends to prime a connection + * - Topic/title extraction (the CLI asks for {"isNewTopic":..,"title":..}) + * - Single-word "count" / "Warmup" probes + * + * Returning a canned response here saves a full provider round-trip (latency + * and tokens) on every session. Inspired by 9router's bypassHandler. + * + * Always on — only ever returns a canned response for unambiguous Claude CLI + * housekeeping traffic, never for real work. + * + * @module orchestrator/bypass + */ + +const logger = require("../logger"); + +/** Flatten Anthropic content (string | block[]) into plain text. */ +function getText(content) { + if (typeof content === "string") return content; + if (Array.isArray(content)) { + return content + .filter((b) => b && b.type === "text" && typeof b.text === "string") + .map((b) => b.text) + .join(" "); + } + return ""; +} + +/** Flatten the top-level Anthropic `system` field (string | block[]). */ +function getSystemText(system) { + if (typeof system === "string") return system; + if (Array.isArray(system)) { + return system + .filter((s) => s && s.type === "text" && typeof s.text === "string") + .map((s) => s.text) + .join(" "); + } + return ""; +} + +/** + * Decide whether a request is a bypassable Claude CLI housekeeping call. + * + * @param {object} args + * @param {object} args.payload - The Anthropic request body. + * @param {object} [args.headers] - Lowercased request headers. + * @returns {{kind: string, text: string}|null} bypass descriptor or null. + */ +function detectBypass({ payload, headers = {} }) { + if (!payload || !Array.isArray(payload.messages) || payload.messages.length === 0) { + return null; + } + + // Only bypass Claude CLI traffic — other clients use these endpoints for + // real work and must never receive a canned response. + const ua = String(headers["user-agent"] || "").toLowerCase(); + if (!ua.includes("claude-cli")) return null; + + const messages = payload.messages; + const lastMsg = messages[messages.length - 1]; + + // Pattern 1: Title prefill — the CLI seeds an assistant turn with just "{" + // to coax a JSON object out of the model. + if (lastMsg?.role === "assistant") { + const firstBlockText = + Array.isArray(lastMsg.content) && lastMsg.content[0]?.type === "text" + ? lastMsg.content[0].text + : typeof lastMsg.content === "string" + ? lastMsg.content + : ""; + if (firstBlockText.trim() === "{") { + return { kind: "title_prefill", text: "{}" }; + } + } + + // Pattern 2: Topic/title extraction — system prompt asks for isNewTopic. + // Synthesize a title from the first user message instead of calling a model. + const systemText = getSystemText(payload.system); + if (systemText.includes("isNewTopic")) { + const userMsg = messages.find((m) => m.role === "user"); + const userText = getText(userMsg?.content).trim(); + const title = userText.split(/\s+/).filter(Boolean).slice(0, 3).join(" "); + return { + kind: "title_extraction", + text: JSON.stringify({ isNewTopic: true, title }), + }; + } + + // Pattern 3: Warmup / count probes — a single short user message. + if (messages.length === 1 && messages[0]?.role === "user") { + const firstText = getText(messages[0].content).trim(); + if (firstText === "Warmup" || firstText === "count") { + return { kind: firstText.toLowerCase(), text: "OK" }; + } + } + + return null; +} + +/** + * Build the processMessage-shaped response for a bypass descriptor. + * Matches the `{ status, body, terminationReason }` contract the router + * consumes (same shape as the prompt-cache early returns). + * + * @param {{kind: string, text: string}} bypass + * @param {string} model - Model id to echo back. + * @returns {{status: number, body: object, terminationReason: string}} + */ +function buildBypassResponse(bypass, model) { + logger.info({ kind: bypass.kind }, "[Bypass] Short-circuiting CLI housekeeping request"); + return { + status: 200, + body: { + id: `msg_bypass_${Date.now()}`, + type: "message", + role: "assistant", + content: [{ type: "text", text: bypass.text }], + model: model || "claude-3-unknown", + stop_reason: "end_turn", + stop_sequence: null, + usage: { input_tokens: 1, output_tokens: 1 }, + lynkr_bypass: { kind: bypass.kind }, + }, + terminationReason: `bypass_${bypass.kind}`, + }; +} + +module.exports = { + detectBypass, + buildBypassResponse, +}; diff --git a/src/orchestrator/index.js b/src/orchestrator/index.js index f1144b6..87d2cce 100644 --- a/src/orchestrator/index.js +++ b/src/orchestrator/index.js @@ -18,6 +18,7 @@ const { createAuditLogger } = require("../logger/audit-logger"); const { getResolvedIp, runWithDnsContext } = require("../clients/dns-logger"); const { getShuttingDown } = require("../api/health"); const { tryPreflight, buildSatisfiedResponse: buildPreflightResponse } = require("./preflight"); +const { detectBypass, buildBypassResponse } = require("./bypass"); const crypto = require("crypto"); const { asyncClone, asyncTransform, getPoolStats } = require("../workers/helpers"); const { getSemanticCache, isSemanticCacheEnabled } = require("../cache/semantic"); @@ -1362,8 +1363,12 @@ function sanitizePayload(payload) { delete clean.tool_choice; } - // Smart tool selection (universal, applies to all providers) - if (config.smartToolSelection?.enabled && Array.isArray(clean.tools) && clean.tools.length> 0) { + // Smart tool selection (server mode only). In client/passthrough mode the + // client (e.g. Claude Code) owns tool execution, so stripping its tools would + // make the model emit calls for tools we removed — they then get dropped as + // "hallucinated" and the session makes no progress. Pass tools through intact. + const inClientMode = config.toolExecutionMode === "client" || config.toolExecutionMode === "passthrough"; + if (!inClientMode && config.smartToolSelection?.enabled && Array.isArray(clean.tools) && clean.tools.length> 0) { const classification = classifyRequestType(clean); const selectedTools = selectToolsSmartly(clean.tools, classification, { provider: providerType, @@ -1977,6 +1982,12 @@ IMPORTANT TOOL USAGE RULES: cleanPayload._tenantPolicy = options.tenantPolicy; } + // Thread session id for provider affinity — keeps a tool-bearing + // conversation on one provider so tool_call_id linkage doesn't break. + if (session?.id) { + cleanPayload._sessionId = session.id; + } + // RTK-inspired tool result compression: compress large tool_results // before they reach the model (saves 60-90% on test/git/lint output) if (config.toolResultCompression?.enabled !== false) { @@ -1985,6 +1996,18 @@ IMPORTANT TOOL USAGE RULES: compressToolResults(cleanPayload.messages, { tier }); } + // MCP-aware tool dedup: drop built-in tools superseded by present MCP tools + // (e.g. WebSearch/WebFetch when Exa/Tavily MCP is available). Always on. + const { applyToolDedup } = require("../context/tool-dedup"); + applyToolDedup(cleanPayload); + + // Caveman terse-output injection (opt-in): nudge the model toward shorter + // responses to reduce output tokens. + if (config.caveman?.enabled === true) { + const { injectCaveman } = require("../context/caveman"); + cleanPayload.system = injectCaveman(cleanPayload.system); + } + if (agentTimer) agentTimer.mark("preInvokeModel"); let databricksResponse; try { @@ -3735,6 +3758,14 @@ async function processMessage({ payload, headers, session, cwd, options = {} }) }; } + // === REQUEST BYPASS === + // Claude CLI housekeeping (Warmup pings, topic/title extraction) doesn't + // need a model call — return a canned response and skip the provider. + const bypass = detectBypass({ payload, headers }); + if (bypass) { + return buildBypassResponse(bypass, requestedModel); + } + // === PREFLIGHT CHECK === // If the request supplied preflight_commands and they all pass in // the workspace, the work is already done — short-circuit with a diff --git a/src/routing/index.js b/src/routing/index.js index 93c270b..b760fc3 100644 --- a/src/routing/index.js +++ b/src/routing/index.js @@ -138,7 +138,46 @@ function getBestLocalProvider() { * @param {Object} options - Routing options * @returns {Object} Routing decision with provider and metadata */ +const sessionAffinity = require('./session-affinity'); + +/** + * Provider routing with session affinity. + * + * When a conversation already carries tool history, reuse the provider the + * session first routed to so tool-call IDs don't break across providers. + * Fresh turns route normally and refresh the session's pinned provider. + */ async function determineProviderSmart(payload, options = {}) { + const sessionId = payload?._sessionId || null; + + // Enforce affinity only for in-flight tool exchanges — the turns that 400 + // if the provider changes. Fresh turns keep full per-turn tier routing. + if (sessionId && !options.forceProvider && sessionAffinity.payloadHasToolHistory(payload)) { + const pinned = sessionAffinity.getPinned(sessionId); + if (pinned) { + logger.debug({ sessionId, provider: pinned.provider, tier: pinned.tier }, + '[Routing] Session affinity — reusing provider for tool-bearing turn'); + return { + provider: pinned.provider, + model: pinned.model, + tier: pinned.tier, + method: 'session_affinity', + reason: 'tool_history_provider_pin', + }; + } + } + + const decision = await _determineProviderSmartInner(payload, options); + + // Remember the chosen provider so later tool-bearing turns stay consistent. + if (sessionId && decision?.provider && !options.forceProvider) { + sessionAffinity.setPinned(sessionId, decision); + } + + return decision; +} + +async function _determineProviderSmartInner(payload, options = {}) { const primaryProvider = config.modelProvider?.type ?? 'databricks'; // Risk analysis runs orthogonally to complexity. We compute it once diff --git a/src/routing/model-registry.js b/src/routing/model-registry.js index e52258b..ac87804 100644 --- a/src/routing/model-registry.js +++ b/src/routing/model-registry.js @@ -54,9 +54,41 @@ const DATABRICKS_FALLBACK = { 'databricks-bge-large-en': { input: 0.02, output: 0, context: 512 }, }; -// Default cost for unknown models +// Default cost for unknown models. Returned with `unknown: true` so callers can +// distinguish a real price from a fabricated guess. const DEFAULT_COST = { input: 1.0, output: 3.0, context: 128000 }; +// Curated name aliases (exact, one-directional). Maps a name a caller might use +// to the canonical key likely present in the pricing data. Misses are harmless +// (resolution simply continues down the ladder). +const MODEL_ALIASES = { + 'claude-sonnet-4-5': 'claude-sonnet-4-5-20250929', + 'claude-opus-4-1': 'claude-opus-4-1-20250805', + 'claude-3-5-sonnet': 'claude-3-5-sonnet-20241022', +}; + +/** + * Parse MODEL_PRICE_OVERRIDES env (JSON object of + * { "": { "input": , "output": , "context"?: N } }). + * Lets operators pin correct prices for models the registry doesn't know. + */ +function _loadOverrides() { + const out = new Map(); + const raw = process.env.MODEL_PRICE_OVERRIDES; + if (!raw) return out; + try { + const parsed = JSON.parse(raw); + for (const [name, info] of Object.entries(parsed)) { + if (info && typeof info.input === 'number' && typeof info.output === 'number') { + out.set(name.toLowerCase(), { context: 128000, ...info }); + } + } + } catch (err) { + logger.warn({ err: err.message }, '[ModelRegistry] Failed to parse MODEL_PRICE_OVERRIDES'); + } + return out; +} + class ModelRegistry { constructor() { this.litellmPrices = {}; @@ -64,6 +96,7 @@ class ModelRegistry { this.loaded = false; this.lastFetch = 0; this.modelIndex = new Map(); + this.overrides = _loadOverrides(); } /** @@ -255,40 +288,70 @@ class ModelRegistry { * @returns {Object} Cost info { input, output, context, ... } */ getCost(modelName) { - if (!modelName) return { ...DEFAULT_COST, source: 'default' }; + if (!modelName) return { ...DEFAULT_COST, source: 'default', unknown: true }; - const normalizedName = modelName.toLowerCase(); + const name = String(modelName).toLowerCase().trim(); + const hit = this._resolveCost(name); + if (hit) return hit; - // Direct lookup - if (this.modelIndex.has(normalizedName)) { - return this.modelIndex.get(normalizedName); - } + // Nothing matched — report unknown rather than silently fabricating a price. + logger.debug({ model: modelName }, '[ModelRegistry] Model not found — cost unknown'); + return { ...DEFAULT_COST, source: 'default', unknown: true }; + } - // Try common variations - const variations = [ - normalizedName, - normalizedName.replace('databricks-', ''), - normalizedName.replace('azure/', ''), - normalizedName.replace('bedrock/', ''), - normalizedName.replace('anthropic.', ''), - normalizedName.split('/').pop(), - ]; - - for (const variant of variations) { - if (this.modelIndex.has(variant)) { - return this.modelIndex.get(variant); - } + /** + * Deterministic price resolution. Each step is exact (no bidirectional + * substring matching), and the only loose step (longest-prefix) is + * one-directional and length-bounded, so unrelated names can't false-match. + * Returns a cost object with a `resolution` tag, or null if nothing matched. + * @param {string} name - already lowercased/trimmed + */ + _resolveCost(name) { + const tag = (value, resolution, matchedAs) => ({ + ...value, + resolution, + ...(matchedAs && matchedAs !== name ? { matchedAs } : {}), + }); + + // 1. Operator overrides (exact) — ground truth. + if (this.overrides.has(name)) return tag({ ...this.overrides.get(name), source: 'override' }, 'override'); + + // 2. Exact registry hit. + if (this.modelIndex.has(name)) return tag(this.modelIndex.get(name), 'exact'); + + // 3. Provider-prefix strip (exact). + const stripped = [ + name.replace(/^databricks-/, ''), + name.replace(/^azure\//, ''), + name.replace(/^bedrock\//, ''), + name.replace(/^anthropic\./, ''), + name.replace(/^openai\//, ''), + name.includes('/') ? name.split('/').pop() : null, + ].filter((v) => v && v !== name); + for (const v of stripped) { + if (this.overrides.has(v)) return tag({ ...this.overrides.get(v), source: 'override' }, 'prefix-strip', v); + if (this.modelIndex.has(v)) return tag(this.modelIndex.get(v), 'prefix-strip', v); } - // Fuzzy match for partial names + // 4. Curated alias (exact). + const alias = MODEL_ALIASES[name]; + if (alias && this.modelIndex.has(alias)) return tag(this.modelIndex.get(alias), 'alias', alias); + + // 5. Date/version-suffix normalization (e.g. -20250929, -2025年09月29日, -v2). + const dateless = name.replace(/[-@](\d{8}|\d{4}-\d{2}-\d{2}|v\d+)$/, ''); + if (dateless !== name && this.modelIndex.has(dateless)) return tag(this.modelIndex.get(dateless), 'date-normalize', dateless); + + // 6. Longest registry key that is a prefix of the requested name. Bounded so + // short keys can't grab unrelated names (e.g. "gpt-5.2-chat-2026" → "gpt-5.2-chat"). + let best = null; for (const [key, value] of this.modelIndex.entries()) { - if (key.includes(normalizedName) || normalizedName.includes(key)) { - return value; + if (key.length>= 6 && name.startsWith(key) && (!best || key.length> best.key.length)) { + best = { key, value }; } } + if (best) return tag(best.value, 'longest-prefix', best.key); - logger.debug({ model: modelName }, '[ModelRegistry] Model not found, using default'); - return { ...DEFAULT_COST, source: 'default' }; + return null; } /** diff --git a/src/routing/risk-analyzer.js b/src/routing/risk-analyzer.js index efd8281..78c402c 100644 --- a/src/routing/risk-analyzer.js +++ b/src/routing/risk-analyzer.js @@ -13,13 +13,18 @@ const { extractContent } = require('./complexity-analyzer'); // Substring keywords found in file paths or instruction text. // Matched case-insensitively as raw substrings, so "auth" hits // "src/auth/login.ts" and "authentication". +// NOTE: keywords are matched as case-insensitive *substrings* against file +// paths, so overly generic terms cause false positives. 'session' and 'token' +// were removed because they match benign paths (src/sessions/*, tokenizer.js, +// token-budget.js) and were force-escalating ordinary requests to COMPLEX — +// real secrets/credentials are still covered by the keywords below. const PROTECTED_PATH_KEYWORDS = [ - 'auth', 'oauth', 'jwt', 'session', 'security', 'permission', 'rbac', + 'auth', 'oauth', 'jwt', 'security', 'permission', 'rbac', 'payment', 'payments', 'billing', 'invoice', 'subscription', 'migration', 'migrations', 'schema', 'infra', 'terraform', 'kustomize', 'helm', 'kubernetes', '.github/workflows', '.env', 'secret', 'credential', - 'api-key', 'api_key', 'apikey', 'token', + 'api-key', 'api_key', 'apikey', 'webhook', 'admin', ]; diff --git a/src/routing/session-affinity.js b/src/routing/session-affinity.js new file mode 100644 index 0000000..5f76f82 --- /dev/null +++ b/src/routing/session-affinity.js @@ -0,0 +1,96 @@ +/** + * Session → Provider Affinity + * + * A multi-turn agentic conversation builds up tool_use / tool_result history + * whose tool-call IDs are formatted for the provider that produced them. If a + * later turn re-routes to a *different* provider (because per-turn complexity + * or risk changed), that provider rejects the orphaned tool linkage: + * + * Azure: 400 "No tool call found for function call output with call_id ..." + * Moonshot: 400 "Invalid request: tool_call_id is not found" + * + * To prevent that, once a session has chosen a provider we keep subsequent + * turns on it *while the payload carries tool history*. Fresh turns (no tool + * state) still route normally, so per-turn tier routing is preserved. + * + * @module routing/session-affinity + */ + +const MAX_ENTRIES = 2000; +const TTL_MS = 60 * 60 * 1000; // 1 hour + +/** @type {Map} */ +const pins = new Map(); + +function _evictIfNeeded() { + if (pins.size <= MAX_ENTRIES) return; + // Map preserves insertion order — drop the oldest. + const oldest = pins.keys().next().value; + if (oldest !== undefined) pins.delete(oldest); +} + +/** + * True when the payload contains an in-flight tool exchange — i.e. a prior + * assistant tool_use or a user tool_result. These are the turns whose + * tool-call IDs break if the provider changes. + * @param {object} payload + * @returns {boolean} + */ +function payloadHasToolHistory(payload) { + const messages = payload?.messages; + if (!Array.isArray(messages)) return false; + for (const msg of messages) { + const content = msg?.content; + if (!Array.isArray(content)) continue; + for (const block of content) { + const t = block?.type; + if (t === "tool_use" || t === "tool_result") return true; + } + } + return false; +} + +/** + * Return the pinned routing decision for a session, or null if none / expired. + * @param {string} sessionId + */ +function getPinned(sessionId) { + if (!sessionId) return null; + const entry = pins.get(sessionId); + if (!entry) return null; + if (Date.now() - entry.ts> TTL_MS) { + pins.delete(sessionId); + return null; + } + return entry; +} + +/** + * Record the provider a session routed to, for reuse on later tool-bearing turns. + * @param {string} sessionId + * @param {{provider:string, model?:string|null, tier?:string|null}} decision + */ +function setPinned(sessionId, decision) { + if (!sessionId || !decision?.provider) return; + // Refresh insertion order so active sessions aren't evicted. + pins.delete(sessionId); + pins.set(sessionId, { + provider: decision.provider, + model: decision.model ?? null, + tier: decision.tier ?? null, + ts: Date.now(), + }); + _evictIfNeeded(); +} + +/** Test/maintenance helper. */ +function _clear() { + pins.clear(); +} + +module.exports = { + payloadHasToolHistory, + getPinned, + setPinned, + _clear, +}; diff --git a/src/routing/telemetry.js b/src/routing/telemetry.js index 5d2a504..e606d35 100644 --- a/src/routing/telemetry.js +++ b/src/routing/telemetry.js @@ -94,7 +94,9 @@ function init() { circuit_breaker_state TEXT, quality_score REAL, tokens_per_second REAL, - cost_efficiency REAL + cost_efficiency REAL, + request_text TEXT, + response_text TEXT ); CREATE INDEX IF NOT EXISTS idx_telemetry_provider @@ -110,6 +112,15 @@ function init() { ON routing_telemetry(session_id, timestamp); `); + // Migration: add columns to pre-existing tables (CREATE TABLE IF NOT EXISTS + // won't add them to a DB created before these columns existed). + const existingCols = new Set(db.prepare("PRAGMA table_info(routing_telemetry)").all().map((c) => c.name)); + for (const col of ["request_text", "response_text"]) { + if (!existingCols.has(col)) { + db.exec(`ALTER TABLE routing_telemetry ADD COLUMN ${col} TEXT`); + } + } + logger.info({ dbPath }, "Routing telemetry database initialised"); return true; } catch (err) { @@ -163,14 +174,14 @@ function record(data) { provider, model, routing_method, was_fallback, output_tokens, latency_ms, status_code, error_type, cost_usd, tool_calls_made, retry_count, circuit_breaker_state, quality_score, tokens_per_second, - cost_efficiency + cost_efficiency, request_text, response_text ) VALUES ( @request_id, @session_id, @timestamp, @complexity_score, @tier, @agentic_type, @tool_count, @input_tokens, @message_count, @request_type, @provider, @model, @routing_method, @was_fallback, @output_tokens, @latency_ms, @status_code, @error_type, @cost_usd, @tool_calls_made, @retry_count, @circuit_breaker_state, @quality_score, @tokens_per_second, - @cost_efficiency + @cost_efficiency, @request_text, @response_text )` ); if (!insert) return; @@ -201,6 +212,8 @@ function record(data) { quality_score: data.quality_score ?? null, tokens_per_second: data.tokens_per_second ?? null, cost_efficiency: data.cost_efficiency ?? null, + request_text: data.request_text ?? null, + response_text: data.response_text ?? null, }); } catch (err) { logger.debug({ err: err.message }, "Telemetry record failed"); diff --git a/test/model-registry-cost.test.js b/test/model-registry-cost.test.js new file mode 100644 index 0000000..d0836cd --- /dev/null +++ b/test/model-registry-cost.test.js @@ -0,0 +1,50 @@ +const assert = require("assert"); +const { describe, it } = require("node:test"); + +const { getModelRegistrySync } = require("../src/routing/model-registry"); + +const reg = getModelRegistrySync(); + +describe("model-registry cost resolution ladder", () => { + it("resolves a known model exactly", () => { + const c = reg.getCost("gpt-5.2-chat"); + assert.strictEqual(c.unknown, undefined); + assert.ok(c.input> 0 && c.output> 0); + }); + + it("strips a provider prefix to resolve", () => { + const c = reg.getCost("databricks-claude-sonnet-4-5"); + assert.ok(!c.unknown); + assert.ok(c.input> 0); + }); + + it("matches a dated/suffixed name via longest-prefix", () => { + const base = reg.getCost("gpt-5.2-chat"); + const suffixed = reg.getCost("gpt-5.2-chat-2026"); + assert.ok(!suffixed.unknown); + assert.strictEqual(suffixed.input, base.input); + assert.strictEqual(suffixed.matchedAs, "gpt-5.2-chat"); + }); + + it("returns unknown (not a fabricated price) for a garbage name", () => { + const c = reg.getCost("totally-made-up-model-xyz"); + assert.strictEqual(c.unknown, true); + assert.strictEqual(c.resolution, undefined); + }); + + it("does not false-match a too-short name", () => { + assert.strictEqual(reg.getCost("xx").unknown, true); + }); + + it("treats empty/missing model as unknown", () => { + assert.strictEqual(reg.getCost("").unknown, true); + assert.strictEqual(reg.getCost(null).unknown, true); + }); + + it("never does a bidirectional substring match (the old fuzzy hazard)", () => { + // A name that contains a real key as a *substring* but not as a prefix must + // NOT resolve to that key. + const c = reg.getCost("my-custom-gpt-5.2-chat-wrapper"); + assert.strictEqual(c.unknown, true); + }); +}); diff --git a/test/session-affinity.test.js b/test/session-affinity.test.js new file mode 100644 index 0000000..8533d99 --- /dev/null +++ b/test/session-affinity.test.js @@ -0,0 +1,64 @@ +const assert = require("assert"); +const { describe, it, beforeEach } = require("node:test"); + +const affinity = require("../src/routing/session-affinity"); + +describe("session-affinity: payloadHasToolHistory", () => { + it("is false for a plain text conversation", () => { + const payload = { messages: [{ role: "user", content: "explain this repo" }] }; + assert.strictEqual(affinity.payloadHasToolHistory(payload), false); + }); + + it("is true when an assistant tool_use is present", () => { + const payload = { + messages: [ + { role: "user", content: "read the file" }, + { role: "assistant", content: [{ type: "tool_use", id: "t1", name: "Read", input: {} }] }, + ], + }; + assert.strictEqual(affinity.payloadHasToolHistory(payload), true); + }); + + it("is true when a user tool_result is present", () => { + const payload = { + messages: [ + { role: "user", content: [{ type: "tool_result", tool_use_id: "t1", content: "ok" }] }, + ], + }; + assert.strictEqual(affinity.payloadHasToolHistory(payload), true); + }); + + it("handles missing/!array messages safely", () => { + assert.strictEqual(affinity.payloadHasToolHistory({}), false); + assert.strictEqual(affinity.payloadHasToolHistory(null), false); + assert.strictEqual(affinity.payloadHasToolHistory({ messages: "x" }), false); + }); +}); + +describe("session-affinity: pin lifecycle", () => { + beforeEach(() => affinity._clear()); + + it("returns null when nothing is pinned", () => { + assert.strictEqual(affinity.getPinned("s1"), null); + }); + + it("round-trips a pinned decision", () => { + affinity.setPinned("s1", { provider: "moonshot", model: "moonshot-v1-auto", tier: "COMPLEX" }); + const got = affinity.getPinned("s1"); + assert.strictEqual(got.provider, "moonshot"); + assert.strictEqual(got.model, "moonshot-v1-auto"); + assert.strictEqual(got.tier, "COMPLEX"); + }); + + it("ignores empty session id or provider", () => { + affinity.setPinned("", { provider: "ollama" }); + affinity.setPinned("s2", { provider: undefined }); + assert.strictEqual(affinity.getPinned("s2"), null); + }); + + it("keeps the latest provider for a session", () => { + affinity.setPinned("s1", { provider: "ollama" }); + affinity.setPinned("s1", { provider: "azure-openai" }); + assert.strictEqual(affinity.getPinned("s1").provider, "azure-openai"); + }); +}); diff --git a/test/token-reduction.test.js b/test/token-reduction.test.js new file mode 100644 index 0000000..01363ef --- /dev/null +++ b/test/token-reduction.test.js @@ -0,0 +1,182 @@ +const assert = require("assert"); +const { describe, it } = require("node:test"); + +const { compressToolResults, getMetrics } = require("../src/context/tool-result-compressor"); +const { detectBypass, buildBypassResponse } = require("../src/orchestrator/bypass"); +const { dedupeTools } = require("../src/context/tool-dedup"); +const { injectCaveman } = require("../src/context/caveman"); + +// Helper: wrap a tool_result string in a message and compress it. +function compressOne(text, tier = "SIMPLE") { + const messages = [ + { role: "user", content: [{ type: "tool_result", tool_use_id: "t1", content: text }] }, + ]; + const res = compressToolResults(messages, { tier }); + return { out: messages[0].content[0].content, res }; +} + +describe("RTK filters — grep", () => { + it("groups grep matches by file and caps per-file output", () => { + const lines = []; + for (let i = 1; i <= 30; i++) lines.push(`src/app.js:${i}:const x = ${i};`); + for (let i = 1; i <= 5; i++) lines.push(`src/util.js:${i}:helper(${i});`); + const { out } = compressOne(lines.join("\n")); + assert.ok(out.includes("35 matches in 2F"), `got: ${out.slice(0, 80)}`); + assert.ok(out.includes("[file] src/app.js (30)")); + assert.ok(out.includes("+20"), "should cap at 10 per file and note the rest"); + // tee recovery pointer is appended + assert.ok(/\[full: tee_/.test(out)); + }); + + it("ignores prose that is not grep output", () => { + const text = "This is a normal paragraph.\nNo file:line:content here.\n".repeat(40); + const { out } = compressOne(text); + // grep should not fire; dedup_log collapses the repeated lines instead — but + // the point is the result is still valid text, not a grep summary. + assert.ok(!out.includes("matches in")); + }); +}); + +describe("RTK filters — dedup log", () => { + it("collapses consecutive duplicate lines", () => { + const text = "starting\n" + "retrying connection...\n".repeat(200) + "done\n"; + const { out } = compressOne(text); + assert.ok(out.includes("duplicate lines"), `got: ${out.slice(0, 120)}`); + assert.ok(out.length < text.length * 0.7); + }); +}); + +describe("RTK filters — smart truncate", () => { + it("keeps head and tail of very long unmatched output", () => { + const lines = []; + for (let i = 0; i < 400; i++) lines.push(`unique log line number ${i} ${Math.random()}`); + const { out } = compressOne(lines.join("\n")); + assert.ok(out.includes("lines truncated"), `got tail: ${out.slice(-80)}`); + assert.ok(out.includes("unique log line number 0")); + assert.ok(out.includes("unique log line number 399")); + }); +}); + +describe("request bypass", () => { + const cliHeaders = { "user-agent": "claude-cli/1.0.0" }; + + it("bypasses Warmup pings from the Claude CLI", () => { + const b = detectBypass({ + payload: { messages: [{ role: "user", content: "Warmup" }] }, + headers: cliHeaders, + }); + assert.ok(b, "expected bypass"); + assert.strictEqual(b.kind, "warmup"); + }); + + it("synthesizes a title for topic-extraction requests", () => { + const b = detectBypass({ + payload: { + system: "Analyze if this is a new topic. Respond with isNewTopic and title.", + messages: [{ role: "user", content: "refactor the auth middleware please" }], + }, + headers: cliHeaders, + }); + assert.ok(b); + assert.strictEqual(b.kind, "title_extraction"); + const parsed = JSON.parse(b.text); + assert.strictEqual(parsed.isNewTopic, true); + assert.strictEqual(parsed.title, "refactor the auth"); + }); + + it("handles the '{' title-prefill pattern", () => { + const b = detectBypass({ + payload: { + messages: [ + { role: "user", content: "hi" }, + { role: "assistant", content: [{ type: "text", text: "{" }] }, + ], + }, + headers: cliHeaders, + }); + assert.ok(b); + assert.strictEqual(b.kind, "title_prefill"); + }); + + it("does NOT bypass non-CLI clients", () => { + const b = detectBypass({ + payload: { messages: [{ role: "user", content: "Warmup" }] }, + headers: { "user-agent": "cursor/0.4" }, + }); + assert.strictEqual(b, null); + }); + + it("does NOT bypass a real coding question from the CLI", () => { + const b = detectBypass({ + payload: { messages: [{ role: "user", content: "write a binary search in python" }] }, + headers: cliHeaders, + }); + assert.strictEqual(b, null); + }); + + it("builds a valid Anthropic message response", () => { + const r = buildBypassResponse({ kind: "warmup", text: "OK" }, "claude-x"); + assert.strictEqual(r.status, 200); + assert.strictEqual(r.body.type, "message"); + assert.strictEqual(r.body.content[0].text, "OK"); + assert.strictEqual(r.body.model, "claude-x"); + assert.strictEqual(r.terminationReason, "bypass_warmup"); + }); +}); + +describe("MCP-aware tool dedup", () => { + it("strips built-in web tools when Exa MCP is present", () => { + const tools = [ + { name: "mcp__exa__web_search_exa" }, + { name: "WebSearch" }, + { name: "WebFetch" }, + { name: "Read" }, + ]; + const { tools: out, stripped } = dedupeTools(tools); + assert.deepStrictEqual(stripped.sort(), ["WebFetch", "WebSearch"]); + assert.ok(out.some((t) => t.name === "mcp__exa__web_search_exa")); + assert.ok(out.some((t) => t.name === "Read")); + assert.ok(!out.some((t) => t.name === "WebSearch")); + }); + + it("is a no-op when no trigger MCP tool is present", () => { + const tools = [{ name: "WebSearch" }, { name: "Read" }]; + const { tools: out, stripped } = dedupeTools(tools); + assert.deepStrictEqual(stripped, []); + assert.strictEqual(out.length, 2); + }); + + it("supports OpenAI-shaped tool definitions", () => { + const tools = [ + { type: "function", function: { name: "mcp__tavily__tavily_search" } }, + { type: "function", function: { name: "WebFetch" } }, + ]; + const { stripped } = dedupeTools(tools); + assert.deepStrictEqual(stripped, ["WebFetch"]); + }); +}); + +describe("caveman injector", () => { + it("is a no-op when disabled", () => { + const sys = "You are a helpful assistant."; + assert.strictEqual(injectCaveman(sys, { enabled: false }), sys); + }); + + it("appends a brevity instruction when enabled", () => { + const out = injectCaveman("base prompt", { enabled: true, level: "lite" }); + assert.ok(out.startsWith("base prompt")); + assert.ok(out.includes("[brevity]")); + assert.ok(out.includes("terse")); + }); + + it("is idempotent (no double injection)", () => { + const once = injectCaveman("base", { enabled: true }); + const twice = injectCaveman(once, { enabled: true }); + assert.strictEqual(once, twice); + }); + + it("falls back to lite for an unknown level", () => { + const out = injectCaveman("", { enabled: true, level: "bogus" }); + assert.ok(out.includes("[brevity]")); + }); +});

AltStyle によって変換されたページ (->オリジナル) /