
Pipeline Design 211

Seth Ford edited this page Mar 9, 2026 · 1 revision



Design: Memory Pattern Effectiveness Tracker with Proactive Failure Prevention Scoring

Context

Shipwright's memory system (sw-memory.sh, 2118 lines) captures failure patterns from pipeline runs into ~/.shipwright/memory/<repo-hash>/failures.json. A library module lib/memory-effectiveness.sh (507 lines) already provides core tracking primitives: injection recording, outcome tracking, scoring, ranking, pruning, and reporting. However:

  1. No pre-pipeline scoring: memory is injected blindly. There's no mechanism to match an incoming issue against historical patterns before the pipeline starts, so irrelevant patterns waste context tokens.
  2. Hooks are disconnected: memeff_on_injection() and memeff_on_pipeline_complete() exist but are never called from sw-loop.sh or pipeline-commands.sh.
  3. Outcome tracking is shallow: current tracking records avoided_error: true/false but doesn't distinguish "failure prevented" from "failure unrelated to injected pattern."
  4. No CLI surface: effectiveness data is inaccessible to operators. No shipwright memory effectiveness or shipwright memory score-issue commands exist.

Constraints:

  • Bash 3.2 compatibility (macOS default) — no associative arrays, no readarray
  • JSONL storage (~/.shipwright/optimization/) — append-only, atomic writes via tmp+mv
  • Must not break existing always-inject behavior (new scoring adds an optional filter layer)
  • Cold start: must return score 0 and inject nothing when no historical data exists
  • Must be backwards-compatible with existing JSONL files (additive fields only)

Decision

Approach: Enhance-and-Wire (not rebuild)

Extend lib/memory-effectiveness.sh with one new function (memeff_score_issue) and enhanced outcome fields. Wire the existing but disconnected hooks into three integration points. Expose via CLI subcommands on sw-memory.sh.

Component Architecture

┌────────────────────────────────────────────────────────────────────┐
│ CLI Layer                                                          │
│ sw-memory.sh: effectiveness [text|json|strategic]                  │
│ sw-memory.sh: score-issue <title> [--body] [--labels] [--files]    │
└────────────────────────┬───────────────────────────────────────────┘
                         │ dispatches to
┌────────────────────────▼───────────────────────────────────────────┐
│ lib/memory-effectiveness.sh                                        │
│                                                                    │
│ NEW: memeff_score_issue()                                          │
│ ├─ keyword_overlap_score (title+body vs pattern+root_cause)        │
│ ├─ label_match_score (issue labels vs pattern categories)          │
│ ├─ file_overlap_score (affected files vs pattern file_paths)       │
│ └─ effectiveness_weight (historical prevention_rate boost)         │
│                                                                    │
│ ENHANCED: memeff_track_outcome()                                   │
│ ├─ pattern_injected: bool                                          │
│ ├─ failure_occurred: bool                                          │
│ ├─ failure_type_matched: bool                                      │
│ └─ failure_prevented: bool (derived: injected && !occurred)        │
│                                                                    │
│ ENHANCED: memeff_report()                                          │
│ ├─ injection_success_rate                                          │
│ ├─ patterns_needing_refinement list                                │
│ └─ strategic export format (JSON for agent consumption)            │
│                                                                    │
│ EXISTING (unchanged):                                              │
│   memeff_track_injection, memeff_score_pattern,                    │
│   memeff_rank_patterns, memeff_prune_ineffective,                  │
│   memeff_proactive_score, memeff_on_injection,                     │
│   memeff_on_pipeline_complete                                      │
└───────┬───────────────────────┬────────────────────────────────────┘
        │                       │
        │ Injection Hook        │ Completion Hook
        │                       │
┌───────▼──────────┐  ┌─────────▼────────────────────────────────────┐
│ sw-loop.sh       │  │ lib/pipeline-commands.sh                     │
│                  │  │ lib/daemon-dispatch.sh                       │
│ memory_closed_   │  │                                              │
│   loop_inject()  │  │ pipeline.completed event handler             │
│ memory_inject_   │  │ daemon_reap_completed()                      │
│   context()      │  │                                              │
│ + memeff_on_     │  │ + memeff_on_pipeline_complete()              │
│   injection()    │  │   called with outcome + error_class          │
└──────────────────┘  └──────────────────────────────────────────────┘

Data Flow

Issue arrives
 │
 ▼
memeff_score_issue(title, body, labels, files)
 │
 ├─ Reads: ~/.shipwright/memory/<repo>/failures.json (pattern catalog)
 ├─ Reads: ~/.shipwright/optimization/memory-outcomes.jsonl (historical effectiveness)
 │
 ▼
Score 0-100 returned
 │
 ├─ score < 30 (default threshold) → No injection (cold start / low match)
 ├─ score >= 30 → Inject top-N matching patterns into pipeline prompt
 │
 ▼
memeff_track_injection(memory_id, pipeline_id, stage, context)
 │ Writes: ~/.shipwright/optimization/memory-injections.jsonl
 │
 ▼
Pipeline runs (build → test → review → ...)
 │
 ▼
Pipeline completes (success or failure)
 │
 ▼
memeff_on_pipeline_complete(pipeline_id, outcome, error_context)
 │ Reads: memory-injections.jsonl (find what was injected)
 │ Writes: memory-outcomes.jsonl (with enhanced fields)
 │ Emits: memeff.outcome event
 │
 ▼
memeff_report() / memeff_rank_patterns()
 │ Reads: memory-outcomes.jsonl
 │ Aggregates: prevention_rate, relevance_rate, effectiveness_score
 │ Outputs: text / json / strategic format

Interface Contracts

# NEW — Pre-pipeline issue scoring
# Returns: JSON {"score": 0-100, "matching_patterns": [...], "recommended_injections": [...]}
memeff_score_issue(issue_title, issue_body, issue_labels, affected_files)
# Errors: Returns '{"score":0,"matching_patterns":[],"recommended_injections":[]}' on any failure (cold start safe)
# ENHANCED — Outcome tracking with richer fields
# outcome_record schema (JSONL):
# {
# "memory_id": string,
# "pipeline_id": string,
# "outcome": "success" | "failure" | "inconclusive",
# "was_relevant": boolean,
# "avoided_error": boolean,
# "pattern_injected": boolean, # NEW
# "failure_occurred": boolean, # NEW
# "failure_type_matched": boolean, # NEW
# "failure_prevented": boolean, # NEW (derived)
# "error_description": string,
# "recorded_at": ISO8601
# }
# ENHANCED — Report with strategic export
# format: "text" (human), "json" (machine), "strategic" (agent-consumable)
memeff_report(format)
# strategic format schema:
# {
# "summary": { "total_patterns": N, "average_score": N, ... },
# "injection_success_rate": float,
# "top_patterns": [...],
# "patterns_needing_refinement": [...],
# "recommendations": [string]
# }
# CLI subcommands (dispatched by sw-memory.sh case statement):
# shipwright memory effectiveness [text|json|strategic]
# shipwright memory score-issue <title> [--body "..."] [--labels "a,b"] [--files "x,y"]

Error Boundaries

Component | Error Source | Handling
--- | --- | ---
memeff_score_issue | Missing failures.json, empty patterns | Return {"score":0} (cold start safe)
memeff_score_issue | jq parse error on corrupted JSON | Return the zero-score JSON (any failure is handled like a cold start)
memeff_track_outcome | No matching injection record | Warn + skip (existing behavior)
memeff_on_pipeline_complete | JSONL grep finds no injections | Early return, no outcome written
memeff_report strategic | Empty outcomes file | Return valid JSON with zeroes
Hook wiring in sw-loop.sh | memory-effectiveness.sh not loadable | type memeff_on_injection >/dev/null 2>&1 guard
Hook wiring in pipeline-commands.sh | Module not sourced | Same type guard pattern

Scoring Algorithm (memeff_score_issue)

score = (keyword_overlap * 0.35) + (label_match * 0.20) + (file_overlap * 0.25) + (effectiveness_weight * 0.20)
keyword_overlap: Tokenize issue title+body, match against pattern + root_cause fields.
 Score = (matched_keywords / total_pattern_keywords) * 100
label_match: Map issue labels to pattern categories (e.g., "bug" → "runtime_error").
 Binary: 100 if any match, 0 otherwise.
file_overlap: Intersect affected_files with pattern's file_paths (if available).
 Score = (intersection_count / pattern_file_count) * 100
effectiveness_weight: Historical prevention_rate from memeff_score_pattern().
 Score = prevention_rate (0-100), defaults to 50 if < 3 data points.

Threshold: inject when aggregate score >= 30 (configurable via MEMEFF_INJECTION_THRESHOLD).

Alternatives Considered

1. Separate SQLite-based tracker

Pros: Proper relational queries, JOIN capability, ACID transactions, indexes for fast lookups. Cons: Adds SQLite dependency to memory-effectiveness module (currently pure jq/JSONL); the project already uses sw-db.sh for other storage but memory effectiveness is a lightweight append-only workload. Over-engineered for ~100-1000 records. Breaks the pure-bash-library pattern of lib/.

2. AI-powered semantic matching (Claude API call in memeff_score_issue)

Pros: Much better issue-to-pattern matching than keyword overlap. Could understand semantic intent. Cons: Adds latency (2-5s API call) before every pipeline start. Adds cost ($0.01-0.05 per scoring call). Creates a circular dependency (using Claude to decide what to inject into Claude). Fails when offline/rate-limited, which is exactly when you need the memory system most.

3. Build from scratch — new sw-memory-effectiveness.sh top-level script

Pros: Clean separation, dedicated CLI entry point, no risk of breaking existing module. Cons: Duplicates the 507-line library that already exists. Creates two effectiveness-tracking codepaths. Plan explicitly states "modify only, no new files." The existing module was designed for exactly this extension.

Schema Changes

Forward migration — No SQL schema; JSONL records gain additive fields:

// memory-outcomes.jsonl — existing fields preserved, new fields added
{"memory_id":"auth-fix-1","pipeline_id":"pipe-42","outcome":"success","was_relevant":true,"avoided_error":true,"pattern_injected":true,"failure_occurred":false,"failure_type_matched":false,"failure_prevented":true,"error_description":"","recorded_at":"2026-03-09T12:00:00Z"}

Backwards compatibility: New fields default to false/empty via jq's // false fallback. Old readers that don't query these fields are unaffected.

Rollback: Stop writing the new fields in future records. Old records without the new fields already work with the // false defaults. No migration is needed because JSONL is append-only.

Data backfill: Not required. Existing records missing pattern_injected/failure_prevented are treated as having those fields set to false, which is semantically correct (we didn't track it, so we can't claim prevention).

Idempotency Strategy

  • Injection tracking: memeff_track_injection creates a new JSONL record per injection event. Duplicate calls for the same (memory_id, pipeline_id) pair produce duplicate records. This is acceptable: memeff_on_pipeline_complete processes all matching records, and duplicate injections don't change the outcome scoring (outcome is per-pipeline, not per-injection).
  • Outcome tracking: memeff_track_outcome finds the matching injection by (memory_id, pipeline_id) and appends an outcome. Multiple calls for the same pair append duplicate outcomes, and because memeff_score_pattern counts all outcomes, the score skews slightly toward the duplicated outcome. In practice, memeff_on_pipeline_complete is called once at pipeline end, so duplicates don't occur in normal flow.
  • Side-effect safety: All writes use atomic tmp+mv. Event emissions are fire-and-forget. No external API calls.

Rollback Plan

  1. Revert the commits modifying the 5 files
  2. Existing JSONL files in ~/.shipwright/optimization/ remain valid (new fields are simply ignored by the old code)
  3. No external state to clean up (no database, no API registrations)
  4. CLI subcommands in sw-memory.sh simply become unrecognized — fall through to usage help

Implementation Plan

Files to modify (5 total, no new files):

File | Changes
--- | ---
scripts/lib/memory-effectiveness.sh | Add memeff_score_issue(), enhance memeff_track_outcome() with new fields, enhance memeff_report() with strategic format and injection_success_rate
scripts/sw-memory.sh | Add effectiveness and score-issue case branches in CLI dispatch
scripts/sw-loop.sh | Wire memeff_on_injection call after memory_closed_loop_inject / memory_inject_context (~lines 2162 and 2207)
scripts/lib/pipeline-commands.sh | Wire memeff_on_pipeline_complete call after pipeline.completed emit_event (~lines 792 and 844)
scripts/lib/daemon-dispatch.sh | Wire memeff_on_pipeline_complete in daemon_reap_completed() (~line 399+)

Dependencies

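None new. Uses existing tools only: jq (already used by the current JSONL code) plus grep, sed, date, tr, sort, and comm, which ship with both macOS and Linux and are callable from Bash 3.2.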

Risk Areas

  1. Performance of memeff_score_issue with large failure catalogs: it iterates failures.json entries (typically < 50 per repo). Tokenization uses tr/sort/comm, so each pattern comparison costs O((n+m) log(n+m)) for n issue tokens and m pattern tokens, plus a few process forks per pattern. Should be < 100ms for typical catalogs. Risk: repos with 500+ failure entries could take 1-2s. Mitigation: cap at top-50 patterns by seen_count before scoring.

  2. JSONL file growth: memory-injections.jsonl grows by ~1 record per pipeline run per injected pattern. At 10 pipelines/day with 3 patterns each, that's ~30 records/day, ~11K/year. At ~200 bytes/record, that's ~2MB/year. memeff_prune_ineffective already archives old records. Low risk.

  3. Hook wiring in sw-loop.sh: the loop runs in tight iteration cycles. Adding a type memeff_on_injection >/dev/null 2>&1 && memeff_on_injection ... guard is cheap (type is a builtin, so no fork), but the actual JSONL append inside memeff_track_injection forks jq + mktemp + mv. This adds ~50ms per injection. Acceptable since injection happens once at loop start, not per iteration.

  4. Pipeline-commands.sh completion path: both success and failure paths emit pipeline.completed, so the hook must handle both. Risk: if memeff_on_pipeline_complete errors out, it could interfere with the pipeline exit code. Mitigation: wrap the call in || true so it stays fire-and-forget.
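The top-50 mitigation from risk 1 reduces to a numeric sort and a head. This sketch assumes the catalog has already been flattened to TAB-separated "seen_count, pattern id" lines (for example with jq); memeff_cap_patterns is a hypothetical name.

```shell
# Hypothetical: keep the N most-seen patterns (default 50) from stdin.
# Input lines: "<seen_count>\t<pattern_id>".
memeff_cap_patterns() {
    local limit="${1:-50}"
    # -t sets TAB as the field separator; -k1,1nr sorts field 1 numerically, descending.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr | head -n "$limit"
}
```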

Validation Criteria

  • memeff_score_issue "auth token expired" "Login fails after session timeout" "bug" "src/auth.ts" returns a JSON object with score between 0-100 when failures.json contains auth-related patterns
  • memeff_score_issue returns {"score":0,"matching_patterns":[],"recommended_injections":[]} on cold start (no failures.json)
  • Enhanced outcome records contain pattern_injected, failure_occurred, failure_type_matched, failure_prevented fields
  • Old outcome records without new fields still parse correctly (jq '.failure_prevented // false' returns false)
  • shipwright memory effectiveness prints a text report with injection_success_rate
  • shipwright memory effectiveness json returns valid JSON, and shipwright memory effectiveness strategic returns valid JSON matching the strategic schema
  • shipwright memory score-issue "test title" returns valid JSON with a score
  • memeff_on_injection is called in sw-loop.sh when memory_closed_loop_inject returns a pattern (guarded by type check)
  • memeff_on_pipeline_complete is called in pipeline-commands.sh after both success and failure pipeline.completed events (guarded by || true)
  • Existing 16 tests in sw-memory-effectiveness-test.sh still pass (no regressions)
  • New integration tests cover: score→inject→succeed→outcome and score→inject→fail→outcome flows
  • sw-pipeline-test.sh, sw-memory-test.sh, and sw-daemon-test.sh pass without regressions
  • Scoring threshold (30 default) is configurable via MEMEFF_INJECTION_THRESHOLD environment variable
  • All new functions are Bash 3.2 compatible (no associative arrays, no readarray)
