Pipeline Design 211

Seth Ford edited this page Mar 9, 2026 · 1 revision

Design: Memory Pattern Effectiveness Tracker with Proactive Failure Prevention Scoring

Context

Shipwright's memory system (sw-memory.sh, 2118 lines) captures failure patterns from pipeline runs into ~/.shipwright/memory/<repo-hash>/failures.json. A library module lib/memory-effectiveness.sh (507 lines) already provides core tracking primitives: injection recording, outcome tracking, scoring, ranking, pruning, and reporting. However:

No pre-pipeline scoring — Memory is injected blindly. There's no mechanism to match an incoming issue against historical patterns before the pipeline starts, so irrelevant patterns waste context tokens.
Hooks are disconnected — memeff_on_injection() and memeff_on_pipeline_complete() exist but are never called from sw-loop.sh or pipeline-commands.sh.
Outcome tracking is shallow — Current tracking records avoided_error: true/false but doesn't distinguish "failure prevented" from "failure unrelated to injected pattern."
No CLI surface — Effectiveness data is inaccessible to operators. No shipwright memory effectiveness or shipwright memory score-issue commands exist.

Constraints:

Bash 3.2 compatibility (macOS default) — no associative arrays, no readarray
JSONL storage (~/.shipwright/optimization/) — append-only, atomic writes via tmp+mv
Must not break existing always-inject behavior (new scoring adds an optional filter layer)
Cold start: must return score 0 and inject nothing when no historical data exists
Must be backwards-compatible with existing JSONL files (additive fields only)
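
The append-only, tmp+mv write constraint above can be sketched as follows. This is a minimal illustration, not the code in lib/memory-effectiveness.sh: the helper name jsonl_append_atomic is hypothetical, and it assumes the record is already a single-line JSON string.

```shell
# Hypothetical helper (not part of Shipwright): append one JSON line to a
# JSONL file by building a complete copy in a temp file and renaming it
# into place. Uses only Bash 3.2-safe constructs.
jsonl_append_atomic() {
  local file="$1" record="$2"
  local dir tmp
  dir=$(dirname "$file")
  mkdir -p "$dir" || return 1
  # Temp file lives in the same directory so mv is a rename, not a copy.
  tmp=$(mktemp "$dir/.jsonl.XXXXXX") || return 1
  if [ -f "$file" ]; then
    cat "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  fi
  printf '%s\n' "$record" >> "$tmp" || { rm -f "$tmp"; return 1; }
  mv "$tmp" "$file"
}
```

Readers never observe a half-written file with this pattern, but two concurrent writers can still lose an update, since each one copies the file before renaming. Whether that matters depends on how many pipelines write at once.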

Decision

Approach: Enhance-and-Wire (not rebuild)

Extend lib/memory-effectiveness.sh with one new function (memeff_score_issue) and enhanced outcome fields. Wire the existing but disconnected hooks into three integration points. Expose via CLI subcommands on sw-memory.sh.

Component Architecture

┌─────────────────────────────────────────────────────────────────┐
│ CLI Layer │
│ sw-memory.sh: effectiveness [text|json|strategic] │
│ sw-memory.sh: score-issue <title> [--body] [--labels] [--files] │
└────────────────────────┬────────────────────────────────────────┘
 │ dispatches to
┌────────────────────────▼────────────────────────────────────────┐
│ lib/memory-effectiveness.sh │
│ │
│ NEW: memeff_score_issue() │
│ ├─ keyword_overlap_score (title+body vs pattern+root_cause) │
│ ├─ label_match_score (issue labels vs pattern categories) │
│ ├─ file_overlap_score (affected files vs pattern file_paths) │
│ └─ effectiveness_weight (historical prevention_rate boost) │
│ │
│ ENHANCED: memeff_track_outcome() │
│ ├─ pattern_injected: bool │
│ ├─ failure_occurred: bool │
│ ├─ failure_type_matched: bool │
│ └─ failure_prevented: bool (derived: injected && !occurred) │
│ │
│ ENHANCED: memeff_report() │
│ ├─ injection_success_rate │
│ ├─ patterns_needing_refinement list │
│ └─ strategic export format (JSON for agent consumption) │
│ │
│ EXISTING (unchanged): │
│ memeff_track_injection, memeff_score_pattern, │
│ memeff_rank_patterns, memeff_prune_ineffective, │
│ memeff_proactive_score, memeff_on_injection, │
│ memeff_on_pipeline_complete │
└───────┬──────────────────────────┬──────────────────────────────┘
 │ │
 Injection Hook Completion Hook
 │ │
┌───────▼──────────┐ ┌─────────▼───────────────────────────┐
│ sw-loop.sh │ │ lib/pipeline-commands.sh │
│ │ │ lib/daemon-dispatch.sh │
│ memory_closed_ │ │ │
│ loop_inject() │ │ pipeline.completed event handler │
│ memory_inject_ │ │ daemon_reap_completed() │
│ context() │ │ │
│ + memeff_on_ │ │ + memeff_on_pipeline_complete() │
│ injection() │ │ called with outcome + error_class │
└──────────────────┘ └──────────────────────────────────────┘

Data Flow

Issue arrives
 │
 ▼
memeff_score_issue(title, body, labels, files)
 │
 ├─ Reads: ~/.shipwright/memory/<repo>/failures.json (pattern catalog)
 ├─ Reads: ~/.shipwright/optimization/memory-outcomes.jsonl (historical effectiveness)
 │
 ▼
Score 0-100 returned
 │
 ├─ score < 30 (default threshold) → No injection (cold start / low match)
 ├─ score >= 30 → Inject top-N matching patterns into pipeline prompt
 │
 ▼
memeff_track_injection(memory_id, pipeline_id, stage, context)
 │ Writes: ~/.shipwright/optimization/memory-injections.jsonl
 │
 ▼
Pipeline runs (build → test → review → ...)
 │
 ▼
Pipeline completes (success or failure)
 │
 ▼
memeff_on_pipeline_complete(pipeline_id, outcome, error_context)
 │ Reads: memory-injections.jsonl (find what was injected)
 │ Writes: memory-outcomes.jsonl (with enhanced fields)
 │ Emits: memeff.outcome event
 │
 ▼
memeff_report() / memeff_rank_patterns()
 │ Reads: memory-outcomes.jsonl
 │ Aggregates: prevention_rate, relevance_rate, effectiveness_score
 │ Outputs: text / json / strategic format

Interface Contracts

# NEW — Pre-pipeline issue scoring
# Returns: JSON {"score": 0-100, "matching_patterns": [...], "recommended_injections": [...]}
memeff_score_issue(issue_title, issue_body, issue_labels, affected_files)
# Errors: Returns '{"score":0,"matching_patterns":[],"recommended_injections":[]}' on any failure (cold start safe)
# ENHANCED — Outcome tracking with richer fields
# outcome_record schema (JSONL):
# {
# "memory_id": string,
# "pipeline_id": string,
# "outcome": "success" | "failure" | "inconclusive",
# "was_relevant": boolean,
# "avoided_error": boolean,
# "pattern_injected": boolean, # NEW
# "failure_occurred": boolean, # NEW
# "failure_type_matched": boolean, # NEW
# "failure_prevented": boolean, # NEW (derived)
# "error_description": string,
# "recorded_at": ISO8601
# }
# ENHANCED — Report with strategic export
# format: "text" (human), "json" (machine), "strategic" (agent-consumable)
memeff_report(format)
# strategic format schema:
# {
# "summary": { "total_patterns": N, "average_score": N, ... },
# "injection_success_rate": float,
# "top_patterns": [...],
# "patterns_needing_refinement": [...],
# "recommendations": [string]
# }
# CLI subcommands (dispatched by sw-memory.sh case statement):
# shipwright memory effectiveness [text|json|strategic]
# shipwright memory score-issue <title> [--body "..."] [--labels "a,b"] [--files "x,y"]
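
As an illustration of the enhanced outcome record, the sketch below builds one JSONL line with jq and derives failure_prevented from the two input booleans. The function name sketch_outcome_record is hypothetical, and the record is reduced to the fields needed to show the derivation (was_relevant, avoided_error and error_description are omitted for brevity).

```shell
# Hypothetical sketch: emit one outcome record as compact JSON.
# Arguments 4-6 must be the literal strings "true" or "false".
sketch_outcome_record() {
  local memory_id="$1" pipeline_id="$2" outcome="$3"
  local injected="$4" occurred="$5" matched="$6"
  jq -cn \
    --arg memory_id "$memory_id" \
    --arg pipeline_id "$pipeline_id" \
    --arg outcome "$outcome" \
    --argjson injected "$injected" \
    --argjson occurred "$occurred" \
    --argjson matched "$matched" \
    --arg recorded_at "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    '{
      memory_id: $memory_id,
      pipeline_id: $pipeline_id,
      outcome: $outcome,
      pattern_injected: $injected,
      failure_occurred: $occurred,
      failure_type_matched: $matched,
      # Derived exactly as stated in the design: injected && !occurred
      failure_prevented: ($injected and ($occurred | not)),
      recorded_at: $recorded_at
    }'
}

sketch_outcome_record "auth-fix-1" "pipe-42" "success" true false false
```

Passing the booleans with --argjson keeps them as JSON booleans rather than strings, so later jq filters can test them directly.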

Error Boundaries

Component	Error Source	Handling
`memeff_score_issue`	Missing failures.json, empty patterns	Return `{"score":0}` — cold start safe
`memeff_score_issue`	jq parse error on corrupted JSON	`
`memeff_track_outcome`	No matching injection record	Warn + skip (existing behavior)
`memeff_on_pipeline_complete`	JSONL grep finds no injections	Early return, no outcome written
`memeff_report` strategic	Empty outcomes file	Return valid JSON with zeroes
Hook wiring in sw-loop.sh	`memory-effectiveness.sh` not loadable	`type memeff_on_injection >/dev/null 2>&1` guard
Hook wiring in pipeline-commands.sh	Module not sourced	Same `type` guard pattern
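
The `type` guard in the last two rows can be sketched as a small wrapper. This is an assumption about how the wiring might look, not the actual call sites: sketch_call_hook and demo_hook are hypothetical names.

```shell
# Hypothetical wrapper: call a hook only if it is defined, and never let
# its exit status reach the caller (fire-and-forget).
sketch_call_hook() {
  local fn="$1"
  shift
  if type "$fn" >/dev/null 2>&1; then
    "$fn" "$@" || true
  fi
  return 0
}

# Stand-in hook that fails on purpose, to show the status is swallowed.
demo_hook() {
  echo "hook:$1"
  return 1
}

sketch_call_hook demo_hook "pipe-42"      # prints hook:pipe-42, returns 0
sketch_call_hook hook_that_is_not_loaded  # prints nothing, returns 0
```

Note that `type` also succeeds for external commands and aliases with the same name, so the guard only proves that something callable exists, not that the library was sourced.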

Scoring Algorithm (`memeff_score_issue`)

score = (keyword_overlap * 0.35) + (label_match * 0.20) + (file_overlap * 0.25) + (effectiveness_weight * 0.20)
keyword_overlap: Tokenize issue title+body, match against pattern + root_cause fields.
 Score = (matched_keywords / total_pattern_keywords) * 100
label_match: Map issue labels to pattern categories (e.g., "bug" → "runtime_error").
 Binary: 100 if any match, 0 otherwise.
file_overlap: Intersect affected_files with pattern's file_paths (if available).
 Score = (intersection_count / pattern_file_count) * 100
effectiveness_weight: Historical prevention_rate from memeff_score_pattern().
 Score = prevention_rate (0-100), defaults to 50 if < 3 data points.

Threshold: inject when aggregate score >= 30 (configurable via MEMEFF_INJECTION_THRESHOLD).
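
A minimal sketch of the weighted sum and threshold check, using integer arithmetic only. The function names are hypothetical; the weights and the MEMEFF_INJECTION_THRESHOLD variable come from the formula above. Integer division truncates, which is an assumption about rounding that the design does not specify.

```shell
# Hypothetical sketch: combine four 0-100 component scores.
# Weights are expressed as percentages so the math stays in integers.
sketch_aggregate_score() {
  local kw="$1" label="$2" files="$3" eff="$4"
  echo $(( (kw * 35 + label * 20 + files * 25 + eff * 20) / 100 ))
}

# Returns 0 (true) when the score meets the configurable threshold.
sketch_should_inject() {
  local score="$1"
  local threshold="${MEMEFF_INJECTION_THRESHOLD:-30}"
  [ "$score" -ge "$threshold" ]
}

score=$(sketch_aggregate_score 50 100 0 50)
if sketch_should_inject "$score"; then
  echo "inject (score=$score)"
else
  echo "skip (score=$score)"
fi
```

With keyword_overlap 50, label_match 100, file_overlap 0 and the default effectiveness_weight of 50, the weighted sum is 17.5 + 20 + 0 + 10 = 47.5, which truncates to 47 and clears the default threshold of 30.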

Alternatives Considered

1. Separate SQLite-based tracker

Pros: Proper relational queries, JOIN capability, ACID transactions, indexes for fast lookups. Cons: Adds SQLite dependency to memory-effectiveness module (currently pure jq/JSONL); the project already uses sw-db.sh for other storage but memory effectiveness is a lightweight append-only workload. Over-engineered for ~100-1000 records. Breaks the pure-bash-library pattern of lib/.

2. AI-powered semantic matching (Claude API call in `memeff_score_issue`)

Pros: Much better issue-to-pattern matching than keyword overlap. Could understand semantic intent. Cons: Adds latency (2-5s API call) before every pipeline start. Adds cost ($0.01-0.05 per scoring call). Creates a circular dependency (using Claude to decide what to inject into Claude). Fails when offline/rate-limited, which is exactly when you need the memory system most.

3. Build from scratch — new `sw-memory-effectiveness.sh` top-level script

Pros: Clean separation, dedicated CLI entry point, no risk of breaking existing module. Cons: Duplicates the 507-line library that already exists. Creates two effectiveness-tracking codepaths. Plan explicitly states "modify only, no new files." The existing module was designed for exactly this extension.

Schema Changes

Forward migration — No SQL schema; JSONL records gain additive fields:

// memory-outcomes.jsonl — existing fields preserved, new fields added
{"memory_id":"auth-fix-1","pipeline_id":"pipe-42","outcome":"success","was_relevant":true,"avoided_error":true,"pattern_injected":true,"failure_occurred":false,"failure_type_matched":false,"failure_prevented":true,"error_description":"","recorded_at":"2026-03-09T12:00:00Z"}

Backwards compatibility: New fields default to false/empty via jq's // false fallback. Old readers that don't query these fields are unaffected.

Rollback: Stop writing the new fields in future records. Old records without the new fields already work with the // false defaults. No migration is needed because JSONL is append-only.

Data backfill: Not required. Existing records missing pattern_injected/failure_prevented are treated as having those fields set to false, which is semantically correct (we didn't track it, so we can't claim prevention).
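
The // false fallback can be seen with two sample records, one written before the new fields existed and one after. The helper names below are hypothetical and the records are made up for illustration.

```shell
# Hypothetical sketch: read failure_prevented from JSONL on stdin,
# treating a missing field as false.
sketch_read_prevented() {
  jq -r '.failure_prevented // false'
}

# Count records where prevention was recorded as true.
sketch_count_prevented() {
  jq -s 'map(select(.failure_prevented // false)) | length'
}

old_record='{"memory_id":"m1","pipeline_id":"p1","outcome":"success","avoided_error":true}'
new_record='{"memory_id":"m1","pipeline_id":"p2","outcome":"success","failure_prevented":true}'

printf '%s\n%s\n' "$old_record" "$new_record" | sketch_read_prevented
printf '%s\n%s\n' "$old_record" "$new_record" | sketch_count_prevented
```

jq's // operator yields its right-hand side when the left-hand side is null or false, so a missing field and an explicit false both read as false here.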

Idempotency Strategy

Injection tracking: memeff_track_injection creates a new JSONL record per injection event. Duplicate calls for the same (memory_id, pipeline_id) pair produce duplicate records. This is acceptable: memeff_on_pipeline_complete processes all matching records, and duplicate injections don't change the outcome scoring (outcome is per-pipeline, not per-injection).
Outcome tracking: memeff_track_outcome finds the matching injection by (memory_id, pipeline_id) and appends an outcome. Multiple calls for the same pair append duplicate outcomes, but memeff_score_pattern counts all outcomes — so the score slightly skews toward the duplicated outcome. In practice, memeff_on_pipeline_complete is called once at pipeline end, so duplicates don't occur in normal flow.
Side-effect safety: All writes use atomic tmp+mv. Event emissions are fire-and-forget. No external API calls.

Rollback Plan

Revert the commits modifying the 5 files
Existing JSONL files in ~/.shipwright/optimization/ remain valid (new fields are simply ignored by the old code)
No external state to clean up (no database, no API registrations)
CLI subcommands in sw-memory.sh simply become unrecognized — fall through to usage help

Implementation Plan

Files to modify (5 total, no new files):

File	Changes
`scripts/lib/memory-effectiveness.sh`	Add `memeff_score_issue()`, enhance `memeff_track_outcome()` with new fields, enhance `memeff_report()` with strategic format and injection_success_rate
`scripts/sw-memory.sh`	Add `effectiveness` and `score-issue` case branches in CLI dispatch
`scripts/sw-loop.sh`	Wire `memeff_on_injection` call after `memory_closed_loop_inject` / `memory_inject_context` (~line 2162, 2207)
`scripts/lib/pipeline-commands.sh`	Wire `memeff_on_pipeline_complete` call after `pipeline.completed` emit_event (~line 792, 844)
`scripts/lib/daemon-dispatch.sh`	Wire `memeff_on_pipeline_complete` in `daemon_reap_completed()` (~line 399+)
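
The two CLI branches could be dispatched roughly as below. This is a sketch under assumptions: sketch_memory_cli is a hypothetical stand-in for the case statement in sw-memory.sh, and memeff_report and memeff_score_issue are replaced by stubs that only echo their arguments so the example runs on its own.

```shell
# Stubs standing in for the library functions (illustration only).
memeff_report() { echo "report:${1:-text}"; }
memeff_score_issue() { echo "score:title=$1|body=$2|labels=$3|files=$4"; }

# Hypothetical dispatcher for the two new subcommands.
sketch_memory_cli() {
  local cmd="${1:-}"
  [ $# -gt 0 ] && shift
  case "$cmd" in
    effectiveness)
      memeff_report "${1:-text}"
      ;;
    score-issue)
      local title="${1:-}" body="" labels="" files=""
      [ $# -gt 0 ] && shift
      if [ -z "$title" ]; then
        echo "usage: score-issue <title> [--body ...] [--labels a,b] [--files x,y]" >&2
        return 2
      fi
      while [ $# -gt 0 ]; do
        case "$1" in
          --body)   body="${2:-}";   shift ;;
          --labels) labels="${2:-}"; shift ;;
          --files)  files="${2:-}";  shift ;;
          *) echo "unknown option: $1" >&2; return 2 ;;
        esac
        # Drop the option's value if one was supplied.
        [ $# -gt 0 ] && shift
      done
      memeff_score_issue "$title" "$body" "$labels" "$files"
      ;;
    *)
      echo "usage: memory {effectiveness [text|json|strategic] | score-issue <title> ...}" >&2
      return 1
      ;;
  esac
}

sketch_memory_cli score-issue "test title" --labels "bug,auth"
```

Falling through to the usage message for unknown subcommands matches the rollback behavior described above, where reverted subcommands simply become unrecognized.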

Dependencies

None new. Uses existing jq, grep, sed, date — all available in Bash 3.2 environments.

Risk Areas

Performance of memeff_score_issue with large failure catalogs — Iterates failures.json entries (typically < 50 per repo). Tokenization uses tr/sort/comm — O(n·m) where n=issue tokens, m=pattern tokens. Should be < 100ms for typical catalogs. Risk: repos with 500+ failure entries could take 1-2s. Mitigation: cap at top-50 patterns by seen_count before scoring.
JSONL file growth — memory-injections.jsonl grows by ~1 record per pipeline run per injected pattern. At 10 pipelines/day with 3 patterns each, that's ~30 records/day, ~11K/year. At ~200 bytes/record, that's ~2MB/year. memeff_prune_ineffective already archives old records. Low risk.
Hook wiring in sw-loop.sh — The loop runs in tight iteration cycles. Adding a type memeff_on_injection >/dev/null 2>&1 && memeff_on_injection ... guard is cheap (no fork), but the actual JSONL append inside memeff_track_injection forks jq + mktemp + mv. This adds ~50ms per injection. Acceptable since injection happens once at loop start, not per iteration.
Pipeline-commands.sh completion path — Both success and failure paths emit pipeline.completed. The hook must handle both. Risk: if memeff_on_pipeline_complete errors out, it could interfere with the pipeline exit code. Mitigation: wrap in || true to ensure it's fire-and-forget.
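
The tr/sort/comm tokenization mentioned in the first risk item might look like the sketch below. The function names are hypothetical, and the details (lowercasing, splitting on non-alphanumeric characters, no stop-word removal) are assumptions the design does not spell out. Temp files are used instead of process substitution to keep the example portable.

```shell
# Hypothetical sketch: one lowercase token per line, sorted and unique.
sketch_tokenize() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort -u
}

# Prints (matched_keywords / total_pattern_keywords) * 100 as an integer.
sketch_keyword_overlap() {
  local issue_text="$1" pattern_text="$2"
  local tmpdir total matched
  tmpdir=$(mktemp -d) || return 1
  sketch_tokenize "$issue_text"   > "$tmpdir/issue"
  sketch_tokenize "$pattern_text" > "$tmpdir/pattern"
  total=$(wc -l < "$tmpdir/pattern" | tr -d ' ')
  # comm -12 keeps only lines present in both sorted inputs.
  matched=$(LC_ALL=C comm -12 "$tmpdir/issue" "$tmpdir/pattern" | wc -l | tr -d ' ')
  rm -rf "$tmpdir"
  if [ "$total" -eq 0 ]; then
    echo 0
    return 0
  fi
  echo $(( matched * 100 / total ))
}

sketch_keyword_overlap "Auth token expired: login fails" "auth token refresh failure"
```

In the usage line, the pattern has four distinct tokens and two of them (auth, token) appear in the issue text, so the score is 50.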

Validation Criteria

memeff_score_issue "auth token expired" "Login fails after session timeout" "bug" "src/auth.ts" returns a JSON object with score between 0-100 when failures.json contains auth-related patterns
memeff_score_issue returns {"score":0,"matching_patterns":[],"recommended_injections":[]} on cold start (no failures.json)
Enhanced outcome records contain pattern_injected, failure_occurred, failure_type_matched, failure_prevented fields
Old outcome records without new fields still parse correctly (jq '.failure_prevented // false' returns false)
shipwright memory effectiveness prints a text report with injection_success_rate
shipwright memory effectiveness json returns valid JSON matching the strategic schema
shipwright memory score-issue "test title" returns valid JSON with a score
memeff_on_injection is called in sw-loop.sh when memory_closed_loop_inject returns a pattern (guarded by type check)
memeff_on_pipeline_complete is called in pipeline-commands.sh after both success and failure pipeline.completed events (guarded by || true)
Existing 16 tests in sw-memory-effectiveness-test.sh still pass (no regressions)
New integration tests cover: score→inject→succeed→outcome and score→inject→fail→outcome flows
sw-pipeline-test.sh, sw-memory-test.sh, and sw-daemon-test.sh pass without regressions
Scoring threshold (30 default) is configurable via MEMEFF_INJECTION_THRESHOLD environment variable
All new functions are Bash 3.2 compatible (no associative arrays, no readarray)
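
For the last criterion, a label-to-category lookup can be written with a case statement instead of an associative array. In this sketch the function names are hypothetical, and only the "bug" to "runtime_error" mapping comes from the design; any further mappings would have to be added by the implementer. Labels are assumed to be comma-separated without surrounding spaces.

```shell
# Hypothetical lookup: case statement in place of an associative array.
sketch_label_to_category() {
  case "$1" in
    bug) echo "runtime_error" ;;  # the one mapping named in the design
    *)   echo "" ;;               # unknown labels map to nothing
  esac
}

# Binary score: 100 if any label maps to the pattern's category, else 0.
sketch_label_match_score() {
  local labels="$1" category="$2" label
  local IFS=','
  [ -z "$category" ] && { echo 0; return 0; }
  for label in $labels; do
    if [ "$(sketch_label_to_category "$label")" = "$category" ]; then
      echo 100
      return 0
    fi
  done
  echo 0
}

sketch_label_match_score "bug,auth" "runtime_error"
```

The same loop-over-a-delimited-string approach replaces readarray when a list has to be walked item by item.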
