Pipeline Plan 179

ezigus edited this page Mar 16, 2026 · 3 revisions

Implementation Plan: Success Pattern Learning & Template Recommendation Engine

Brainstorming: Socratic Design Refinement

Requirements Clarity

Minimum viable change: Capture success patterns after pipeline completion and store them in ~/.shipwright/memory/<repo>/success-patterns.json. Match new issues against historical successes using the TF-IDF keyword similarity that already exists in memory_ranked_search(), and display a recommendation with a confidence score at pipeline start.

Implicit requirements:

Must not break existing failure-only memory capture flow
Must integrate with existing select_pipeline_template() hierarchy in daemon-triage.sh
Must work without SQLite (JSON fallback, like all other persistence)
Must be Bash 3.2 compatible (no associative arrays)
Acceptance rate tracking requires recording whether the recommendation was followed

Acceptance criteria (from issue):

Capture success patterns after each successful pipeline
Similarity matching for new issues against historical successes
Recommend template with confidence score
Display recommendation at pipeline start with rationale
Track recommendation acceptance rate
Store in ~/.shipwright/memory/<repo>/success-patterns.json

Design Alternatives

Approach A: Standalone script (sw-success-patterns.sh)

New script with capture/recommend/show/stats subcommands
Called from memory_finalize_pipeline() and select_pipeline_template()
Pros: Clean separation, testable in isolation, follows existing pattern (sw-memory.sh, sw-intelligence.sh)
Cons: Another script to source, slightly more overhead

Approach B: Extend sw-memory.sh with success functions

Add memory_capture_success_pattern(), memory_recommend_template() to existing memory script
Pros: Keeps memory concerns together, reuses memory_ranked_search() directly
Cons: sw-memory.sh is already 2100+ lines, harder to test in isolation

Approach C: Extend sw-self-optimize.sh (Thompson sampling lives there)

Add success patterns alongside existing Thompson sampling and template weights
Pros: Template optimization is already there
Cons: Different concern (learning vs optimization), self-optimize.sh is already complex

Decision: Approach A (standalone script) — Minimizes blast radius, follows the established pattern of modular scripts, and keeps success pattern logic testable in isolation. Integration points are surgical: hook into memory_finalize_pipeline() for capture, and into select_pipeline_template() for recommendation.

Risk Assessment

| Risk | Mitigation |
| --- | --- |
| Cold start (no historical data) | Graceful fallback: no recommendation when < 3 patterns stored |
| TF-IDF similarity too noisy | Use domain keyword expansion (already in sw-memory.sh), require minimum match score |
| Recommendation ignored adds noise | Track acceptance separately; don't count ignored recommendations against confidence |
| JSON file grows unbounded | Cap at 200 patterns, evict oldest when full (like failures.json) |
| Performance: jq over large JSON in hot path | Cache last recommendation per issue hash; patterns file stays small (200 cap) |
| Breaking existing template selection | Insert as a new layer in the hierarchy, after quality memory, before Thompson sampling |

Dependency Analysis

Depends on:

scripts/lib/helpers.sh — emit_event(), info(), success(), warn(), now_iso()
scripts/sw-memory.sh — repo_memory_dir(), ensure_memory_dir(), _expand_domain_keywords()
scripts/sw-db.sh — db_record_outcome() (already captures template+success, we read from it)
config/event-schema.json — needs new event types

Depended on by:

scripts/sw-memory.sh → memory_finalize_pipeline() calls capture
scripts/lib/daemon-triage.sh → select_pipeline_template() calls recommend
scripts/sw-pipeline.sh → displays recommendation at pipeline start

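No circular dependency risks: the new script is a leaf that is sourced by existing scripts. The sw-memory.sh helpers it uses are resolved when its functions are called, not when it is sourced, so source order does not create a cycle.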

Architecture Decision Record

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                     Pipeline Lifecycle                      │
│                                                             │
│  pipeline start ──► display_recommendation()                │
│                         │                                   │
│                         ▼                                   │
│              ┌──────────────────────┐                       │
│              │ sw-success-patterns  │                       │
│              │ .sh (NEW)            │                       │
│              │                      │                       │
│              │ ● capture_success()  │◄── memory_finalize()  │
│              │ ● recommend()        │◄── select_template()  │
│              │ ● match_similar()    │                       │
│              │ ● track_acceptance() │◄── pipeline complete  │
│              │ ● show_stats()       │                       │
│              └──────┬───────────────┘                       │
│                     │                                       │
│         ┌───────────┼───────────────┐                       │
│         ▼           ▼               ▼                       │
│ success-patterns events.jsonl  pipeline_outcomes (DB)       │
│ .json (per-repo)                                            │
└─────────────────────────────────────────────────────────────┘

Interface Contracts

// success-patterns.json schema
interface SuccessPattern {
 id: string; // sha256 hash of goal text
 goal: string; // pipeline goal/issue title
 labels: string[]; // issue labels
 template: string; // template used (fast, standard, full, etc.)
 complexity: string; // low | medium | high
 duration_secs: number; // pipeline duration
 cost_usd: number; // pipeline cost
 stage_results: Record<string, "complete" | "skipped">;
 keywords: string[]; // extracted + expanded keywords for matching
 issue_number: number | null; // GitHub issue number if available
 repo: string; // repo name
 captured_at: string; // ISO timestamp
}
interface SuccessPatternsFile {
 version: 1;
 patterns: SuccessPattern[]; // max 200, FIFO eviction
 stats: {
  total_captured: number;
  recommendations_made: number;
  recommendations_accepted: number;
  last_updated: string;
 };
}
interface TemplateRecommendation {
 template: string; // recommended template name
 confidence: number; // 0-100 percentage
 similar_count: number; // how many similar patterns matched
 success_rate: number; // success rate of matched patterns with this template
 rationale: string; // human-readable explanation
 matched_patterns: string[]; // IDs of matched patterns (for tracking)
}
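
As an illustration of the SuccessPatternsFile shape, here is a minimal sketch of how an empty store could be created. The function name and its path argument are assumptions made for this example; the plan's _success_patterns_init() resolves the path itself and may differ in detail.

```shell
# Hypothetical sketch: create an empty success-patterns.json that matches
# the SuccessPatternsFile interface. The path argument is an assumption.
success_patterns_init_sketch() {
  local file="$1" now tmp
  # Never clobber an existing store.
  if [ -f "$file" ]; then
    return 0
  fi
  mkdir -p "$(dirname "$file")" || return 1
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  tmp="${file}.tmp.$$"
  # Write to a temp file first, then rename (atomic tmp+mv).
  cat > "$tmp" <<EOF
{
  "version": 1,
  "patterns": [],
  "stats": {
    "total_captured": 0,
    "recommendations_made": 0,
    "recommendations_accepted": 0,
    "last_updated": "$now"
  }
}
EOF
  mv "$tmp" "$file"
}
```

Calling it twice is safe: the second call returns without touching the file.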

Data Flow

Capture (pipeline completion):

pipeline completes → memory_finalize_pipeline() → success_capture_pattern()
 → reads pipeline-state.md (goal, template, stages, status)
 → reads pipeline-artifacts/ (duration, cost from events)
 → extracts keywords from goal text + domain expansion
 → writes to success-patterns.json (atomic tmp+mv)
 → emits "success.pattern_captured" event
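
The keyword-extraction step in the capture flow could look roughly like the sketch below. It is an assumption-laden illustration: the function name, the stopword list, and the 3-character minimum are invented here, and the domain expansion done by _expand_domain_keywords() is left out.

```shell
# Hypothetical sketch of keyword extraction (no domain expansion).
# Prints one lowercase keyword per line, in first-seen order.
extract_keywords_sketch() {
  local goal="$1"
  printf '%s\n' "$goal" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '\n' \
    | awk '
        BEGIN {
          # Illustrative stopword list; the real list is not specified in the plan.
          n = split("the and for with from that this into when", w, " ")
          for (i = 1; i <= n; i++) stop[w[i]] = 1
        }
        # Keep words of 3+ characters that are not stopwords, deduplicated.
        length($0) >= 3 && !($0 in stop) && !seen[$0]++
      '
}
```

For example, `extract_keywords_sketch "Fix the login bug in the Auth API"` prints fix, login, bug, auth, and api on separate lines.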

Recommend (pipeline start):

select_pipeline_template() → success_recommend_template()
 → extracts keywords from new issue goal/title
 → scores each stored pattern by keyword overlap (TF-IDF-like)
 → groups top matches by template
 → calculates confidence = (matching_successes / total_matches) * score_weight
 → returns recommendation with rationale
 → emits "success.recommendation" event
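
The scoring and grouping steps above can be sketched as follows. This is a simplified stand-in, not the plan's implementation: plain keyword-overlap counting replaces the TF-IDF-like score, score_weight is assumed to be 1, and the tab-separated input format and function name are invented for the example.

```shell
# Hypothetical sketch of the recommend step.
# stdin: one stored pattern per line, "template<TAB>space-separated keywords"
# $1:    space-separated keywords of the new issue
# $2:    minimum keyword overlap for a pattern to count as similar (default 2)
# Prints "template confidence similar_count", or nothing if no pattern matches.
recommend_sketch() {
  awk -F '\t' -v want="$1" -v min="${2:-2}" '
    BEGIN {
      n = split(want, w, " ")
      for (i = 1; i <= n; i++) query[w[i]] = 1
    }
    {
      hits = 0
      m = split($2, k, " ")
      for (i = 1; i <= m; i++) if (k[i] in query) hits++
      if (hits >= min) { total++; count[$1]++ }
    }
    END {
      best = ""
      for (t in count) if (best == "" || count[t] > count[best]) best = t
      # confidence = matching_successes / total_matches, as a percentage
      if (best != "") printf "%s %d %d\n", best, int(100 * count[best] / total), count[best]
    }'
}
```

With three similar patterns of which two used "fast", the sketch reports "fast" at 66% confidence. Ties between templates are resolved arbitrarily here; a real implementation would need a deterministic rule.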

Display (pipeline start):

sw-pipeline.sh intake stage → success_display_recommendation()
 → calls success_recommend_template()
 → formats boxed output with template, confidence, rationale
 → records recommendation in pipeline-state.md metadata
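
A boxed display like the one described above might be produced with plain printf. The layout, wording, and function name below are assumptions for illustration, and the sketch assumes ASCII-only text so that string length equals display width.

```shell
# Hypothetical sketch of the boxed recommendation output.
# Args: template, confidence (0-100), rationale
display_recommendation_sketch() {
  local template="$1" confidence="$2" rationale="$3"
  local line1="Recommended template: ${template} (confidence ${confidence}%)"
  local line2="Why: ${rationale}"
  local width=${#line1}
  if [ "${#line2}" -gt "$width" ]; then width=${#line2}; fi
  local border
  # Build a border of (width + 2) dashes to cover the padding spaces.
  border="$(printf '%*s' "$((width + 2))" '' | tr ' ' '-')"
  printf '+%s+\n' "$border"
  printf '| %-*s |\n' "$width" "$line1"
  printf '| %-*s |\n' "$width" "$line2"
  printf '+%s+\n' "$border"
}
```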

Track acceptance (pipeline completion):

pipeline completes → success_track_acceptance()
 → compares recommended template vs actually-used template
 → updates stats.recommendations_accepted counter
 → emits "success.recommendation_accepted" / "success.recommendation_rejected" event
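
The counter update and the resulting acceptance rate can be sketched with jq. The function names and the explicit file argument are assumptions, event emission is omitted, and the sketch assumes recommendations_made is incremented elsewhere, when the recommendation is issued.

```shell
# Hypothetical sketch: bump recommendations_accepted when the recommended
# template matches the template actually used.
track_acceptance_sketch() {
  local file="$1" recommended="$2" actual="$3"
  local tmp="${file}.tmp.$$"
  jq --arg rec "$recommended" --arg act "$actual" '
    if $rec == $act then .stats.recommendations_accepted += 1 else . end
  ' "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}

# Acceptance rate as a whole percentage; 0 when nothing was recommended yet.
acceptance_rate_sketch() {
  jq -r '.stats
    | if .recommendations_made > 0
      then (.recommendations_accepted * 100 / .recommendations_made | floor)
      else 0 end' "$1"
}
```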

Error Boundaries

success_capture_pattern(): All errors caught with || true, never fails the pipeline
success_recommend_template(): Returns empty string on any error (caller falls through to next selection method)
success_display_recommendation(): Silent on error (informational only)
JSON parse errors: Reinitialize file with empty structure
File locking: Uses atomic tmp+mv pattern (consistent with rest of codebase)
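
The reinitialize-on-parse-error and atomic tmp+mv rules, together with the 200-pattern cap, could be combined roughly as shown below. The function name, its arguments, and the inline empty structure are assumptions for this sketch.

```shell
# Hypothetical sketch: append one pattern with FIFO eviction.
# Args: store file, pattern as a JSON object, cap (default 200)
append_pattern_sketch() {
  local file="$1" pattern_json="$2" cap="${3:-200}"
  local empty='{"version":1,"patterns":[],"stats":{"total_captured":0,"recommendations_made":0,"recommendations_accepted":0,"last_updated":""}}'
  local tmp="${file}.tmp.$$"
  # Reinitialize when the store is missing, empty, unparseable, or misshapen.
  if ! jq -e '.patterns | type == "array"' "$file" >/dev/null 2>&1; then
    printf '%s\n' "$empty" > "$file" || return 1
  fi
  # Append, keep only the newest $cap patterns, then rename into place.
  jq --argjson p "$pattern_json" --argjson cap "$cap" '
    .patterns += [$p]
    | .patterns |= (if length > $cap then .[(length - $cap):] else . end)
    | .stats.total_captured += 1
  ' "$file" > "$tmp" 2>/dev/null && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}
```

A caller that must never fail the pipeline would invoke it as `append_pattern_sketch "$file" "$json" || true`, matching the error boundary described for success_capture_pattern().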

User Stories

Primary: As a shipwright operator, I want the system to learn from successful pipelines and recommend the best template for new issues, so that I get faster, more reliable pipeline runs without manual template selection.

Secondary: As a shipwright developer, I want to see why a template was recommended and how often recommendations are accepted, so that I can trust and tune the recommendation engine.

Acceptance Criteria (Given/When/Then)

Given a pipeline completes successfully, When memory_finalize_pipeline() runs, Then a success pattern is captured with goal, template, labels, complexity, duration, and keywords in success-patterns.json.
Given 3+ success patterns exist, When a new pipeline starts with a similar goal, Then a template recommendation is displayed with confidence score and rationale (e.g., "12 similar issues succeeded with 'fast' template").
Given no success patterns exist (cold start), When a pipeline starts, Then no recommendation is shown and template selection falls through to existing methods.
Given a recommendation was made, When the pipeline completes, Then the system records whether the recommendation was accepted (template matched) and updates acceptance stats.

Edge Cases from User Perspective

Empty state (cold start): No success-patterns.json exists → function returns empty, no recommendation shown, existing template selection logic takes over.
All patterns are from one template: Confidence will be high for that template. This is correct behavior — if all successes used "standard", recommend "standard".
Goal text is very short/generic: Domain keyword expansion will help, and the minimum keyword-match threshold (2+ matching keywords, as specified for success_recommend_template() in Step 1) prevents noisy recommendations.
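
The cold-start case can be handled by a small guard that runs before any matching. The sketch below assumes a hypothetical helper name and an explicit file argument; the threshold of 3 comes from the plan.

```shell
# Hypothetical sketch: succeed only when the store holds enough patterns.
# Args: store file, minimum pattern count (default 3)
has_enough_patterns_sketch() {
  local file="$1" min="${2:-3}" count
  [ -f "$file" ] || return 1
  count="$(jq -r '.patterns | length' "$file" 2>/dev/null)" || return 1
  # Reject anything that is not a plain non-negative integer.
  case "$count" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$count" -ge "$min" ]
}
```

A recommender could start with `has_enough_patterns_sketch "$file" || return 0`, which prints nothing and lets the caller fall through to the next selection method.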

Endpoint Specification (CLI subcommands)

Since this is a CLI/shell tool (not REST API), the "endpoints" are subcommands:

| Subcommand | Input | Output | Errors |
| --- | --- | --- | --- |
| `capture <state_file> <artifacts_dir>` | Pipeline state + artifacts | Success message | Missing state file, parse error |
| `recommend <goal> [labels] [complexity]` | Issue text + metadata | JSON recommendation | No patterns, no match |
| `show` | None (reads success-patterns.json) | Formatted stats table | File not found |
| `stats` | None | JSON acceptance stats | File not found |
| `accept <recommendation_id> <accepted>` | Tracking data | Updated stats | Invalid ID |

Error format: All errors go to stderr via error() helper, return code 1.

Rate limiting: N/A (local CLI tool).

Versioning: Schema version: 1 in success-patterns.json for future migration.
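
A dispatcher for these subcommands might look like the sketch below. The dispatcher name and success_stats_json() are hypothetical (the plan does not name the function behind `stats`); the other function names come from Step 1. The usage message goes to stderr directly here, whereas the plan routes errors through the error() helper.

```shell
# Hypothetical sketch of the CLI dispatcher.
success_patterns_main_sketch() {
  local cmd="${1:-}"
  if [ $# -gt 0 ]; then shift; fi
  case "$cmd" in
    capture)   success_capture_pattern "$@" ;;
    recommend) success_recommend_template "$@" ;;
    show)      success_show_stats "$@" ;;
    stats)     success_stats_json "$@" ;;   # hypothetical function name
    accept)    success_track_acceptance "$@" ;;
    *)
      echo "Usage: sw-success-patterns.sh {capture|recommend|show|stats|accept}" >&2
      return 1
      ;;
  esac
}
```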

Files to Modify

New Files

scripts/sw-success-patterns.sh — Core success pattern engine (~300 lines)
scripts/sw-success-patterns-test.sh — Test suite (~250 lines)

Modified Files

scripts/sw-memory.sh — Source new script, call success_capture_pattern() from memory_finalize_pipeline()
scripts/lib/daemon-triage.sh — Add success-pattern recommendation layer in select_pipeline_template()
scripts/sw-pipeline.sh — Display recommendation at pipeline start (intake stage)
config/event-schema.json — Add new event types for success patterns

Implementation Steps

Step 1: Create `scripts/sw-success-patterns.sh`

Core script with these functions:

# ─── Storage ──────────────────────────────────────────────
_success_patterns_file() # Returns path to success-patterns.json for current repo
_success_patterns_init() # Initialize empty file if missing
_success_extract_keywords() # Extract keywords from goal text + domain expansion
# ─── Capture ──────────────────────────────────────────────
success_capture_pattern() # Capture success pattern after pipeline completion
 # Args: state_file, artifacts_dir
 # Reads: pipeline status, goal, template, stages, duration, cost
 # Writes: success-patterns.json (atomic tmp+mv, cap at 200)
 # Emits: success.pattern_captured event
# ─── Recommendation ──────────────────────────────────────
success_recommend_template() # Recommend template based on similar past successes
 # Args: goal, labels (optional), complexity (optional)
 # Returns: JSON {template, confidence, similar_count, success_rate, rationale}
 # Minimum: 3 patterns in store, 2+ keyword matches for similarity
 # Emits: success.recommendation event
success_display_recommendation() # Pretty-print recommendation for pipeline start
 # Args: goal, labels, complexity
 # Output: Boxed recommendation with confidence and rationale
# ─── Tracking ─────────────────────────────────────────────
success_track_acceptance() # Record whether recommendation was accepted
 # Args: recommended_template, actual_template
 # Updates: stats counters in success-patterns.json
 # Emits: success.recommendation_accepted or success.recommendation_rejected
# ─── Display ──────────────────────────────────────────────
success_show_stats() # Display success pattern statistics
 # Output: Pattern count, top templates, acceptance rate
# ─── CLI ──────────────────────────────────────────────────
# Subcommands: capture, recommend, show, stats, accept

Step 2: Add event types to `config/event-schema.json`

"success.pattern_captured": {
 "required": ["pattern_id"],
 "optional": ["template", "goal", "repo", "keywords"]
},
"success.recommendation": {
 "required": ["template", "confidence"],
 "optional": ["similar_count", "success_rate", "goal"]
},
"success.recommendation_accepted": {
 "required": ["template"],
 "optional": ["recommended_template", "confidence"]
},
"success.recommendation_rejected": {
 "required": ["template"],
 "optional": ["recommended_template", "actual_template", "confidence"]
}

Step 3: Integrate capture into `sw-memory.sh`

In memory_finalize_pipeline(), after Step 2 (failure capture), add Step 2.5:

# Step 2.5: Capture success pattern if pipeline succeeded
local _pipeline_status
_pipeline_status=$(sed -n 's/^status: *//p' "$state_file" | head -1)
if [[ "$_pipeline_status" == "complete" ]] && type success_capture_pattern >/dev/null 2>&1; then
 success_capture_pattern "$state_file" "$artifacts_dir" 2>/dev/null || true
fi

Also source the new script near the top of sw-memory.sh.

Step 4: Integrate recommendation into `daemon-triage.sh`

In select_pipeline_template(), add a new layer after the quality memory section and before Thompson sampling:

# ── Success pattern recommendation ──
if type success_recommend_template >/dev/null 2>&1; then
    local _sp_goal="${INTELLIGENCE_GOAL:-}"
    local _sp_labels="$labels"
    # Map the triage score onto a complexity bucket (>= 70 is low, < 40 is high)
    local _sp_complexity="medium"
    [[ "$score" -ge 70 ]] && _sp_complexity="low"
    [[ "$score" -lt 40 ]] && _sp_complexity="high"
    local _sp_rec
    _sp_rec=$(success_recommend_template "$_sp_goal" "$_sp_labels" "$_sp_complexity" 2>/dev/null || echo "")
    if [[ -n "$_sp_rec" ]]; then
        local _sp_template _sp_confidence
        _sp_template=$(echo "$_sp_rec" | jq -r '.template // ""' 2>/dev/null || echo "")
        # floor: the -ge test below fails on fractional values such as 72.5
        _sp_confidence=$(echo "$_sp_rec" | jq -r '.confidence // 0 | floor' 2>/dev/null || echo "0")
        # Treat anything that is not a plain integer as zero confidence
        [[ "$_sp_confidence" =~ ^[0-9]+$ ]] || _sp_confidence=0
        if [[ -n "$_sp_template" && "$_sp_confidence" -ge 60 ]]; then
            daemon_log INFO "Success patterns: $_sp_template (confidence=${_sp_confidence}%)" >&2
            echo "$_sp_template"
            return
        fi
    fi
fi

Step 5: Display recommendation at pipeline start

In sw-pipeline.sh, during the intake stage, after issue analysis but before template selection:

# Display success pattern recommendation (informational)
if type success_display_recommendation >/dev/null 2>&1; then
 success_display_recommendation "$goal" "${labels:-}" "${complexity:-medium}" 2>/dev/null || true
fi

Step 6: Track acceptance at pipeline completion

In sw-pipeline.sh, at pipeline completion (where memory_finalize_pipeline is called), add:

# Track recommendation acceptance
if type success_track_acceptance >/dev/null 2>&1; then
    local _rec_template
    # Accept both "recommended_template:fast" and "recommended_template: fast"
    _rec_template=$(sed -n 's/.*recommended_template: *\([^ ]*\).*/\1/p' "$STATE_FILE" 2>/dev/null | head -1 || true)
    if [[ -n "$_rec_template" ]]; then
        success_track_acceptance "$_rec_template" "$PIPELINE_TEMPLATE" 2>/dev/null || true
    fi
fi

Step 7: Write test suite `sw-success-patterns-test.sh`

Tests:

test_success_patterns_init — File creation with correct schema
test_capture_success_pattern — Capture from mock pipeline state
test_capture_skips_failed_pipeline — No capture when pipeline failed
test_keyword_extraction — Keywords extracted and expanded correctly
test_recommend_cold_start — Empty result when < 3 patterns
test_recommend_similar_issue — Correct template recommended for similar goal
test_recommend_confidence_calculation — Confidence reflects match quality
test_recommend_minimum_matches — No recommendation below match threshold
test_pattern_cap_200 — Oldest patterns evicted at cap
test_track_acceptance — Stats updated on accept/reject
test_show_stats — Stats display works with data
test_cli_subcommands — All subcommands dispatch correctly
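
Because the store lives under ~/.shipwright, each test needs an isolated home directory. A minimal harness sketch follows; the helper names assert_eq and run_isolated are assumptions and are not taken from the existing test suites.

```shell
# Hypothetical test harness sketch: counters, one assertion, HOME isolation.
PASS=0
FAIL=0

# Compare expected and actual; record the result without aborting the run.
assert_eq() {
  local desc="$1" expected="$2" actual="$3"
  if [ "$expected" = "$actual" ]; then
    PASS=$((PASS + 1))
  else
    FAIL=$((FAIL + 1))
    printf 'FAIL: %s (expected "%s", got "%s")\n' "$desc" "$expected" "$actual"
  fi
  return 0
}

# Run a test function with HOME pointing at a throwaway directory.
# No subshell is used, so PASS/FAIL updates made by the test survive.
run_isolated() {
  local fn="$1" old_home="${HOME:-}" dir
  dir="$(mktemp -d)" || return 1
  HOME="$dir"
  "$fn"
  HOME="$old_home"
  rm -rf "$dir"
}
```

A test such as test_recommend_cold_start would then be invoked as `run_isolated test_recommend_cold_start`.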

Task Checklist

Task 1: Create scripts/sw-success-patterns.sh with storage helpers, keyword extraction, and file initialization
Task 2: Implement success_capture_pattern() — reads pipeline state, extracts metadata, writes to success-patterns.json
Task 3: Implement success_recommend_template() — TF-IDF keyword matching, confidence scoring, template grouping
Task 4: Implement success_display_recommendation() — formatted output with confidence and rationale
Task 5: Implement success_track_acceptance() and success_show_stats()
Task 6: Add CLI subcommand dispatcher (capture, recommend, show, stats, accept)
Task 7: Add event types to config/event-schema.json
Task 8: Integrate capture into sw-memory.sh memory_finalize_pipeline()
Task 9: Integrate recommendation into daemon-triage.sh select_pipeline_template()
Task 10: Integrate display at pipeline start in sw-pipeline.sh
Task 11: Integrate acceptance tracking at pipeline completion in sw-pipeline.sh
Task 12: Write test suite scripts/sw-success-patterns-test.sh
Task 13: Register test in package.json test scripts
Task 14: Run full test suite and fix any regressions

Testing Approach

Unit tests (sw-success-patterns-test.sh): Mock pipeline state files, test each function in isolation with temp directories
Integration: Verify memory_finalize_pipeline() calls capture, select_pipeline_template() uses recommendation
Existing test suites: Run sw-memory-test.sh, sw-lib-daemon-triage-test.sh, sw-pipeline-test.sh to verify no regressions
Full suite: npm test must pass

Definition of Done

Success patterns captured after every successful pipeline completion
Patterns stored in ~/.shipwright/memory/<repo>/success-patterns.json with documented schema
TF-IDF keyword matching recommends template from similar past successes
Recommendation displayed at pipeline start with confidence score and rationale
Confidence threshold (60%) prevents noisy recommendations
Acceptance rate tracked and available via success_show_stats()
Events emitted for capture, recommendation, acceptance/rejection
Cold start handled gracefully (no recommendation when < 3 patterns)
Pattern cap at 200 prevents unbounded growth
All new code is Bash 3.2 compatible
Test suite passes with 12+ test cases
Full npm test passes with no regressions

Alternatives Considered

| Approach | Complexity | Performance | Maintainability | Blast Radius |
| --- | --- | --- | --- | --- |
| A: Standalone script (chosen) | Low | Good (lazy-sourced) | High (isolated) | Minimal |
| B: Extend sw-memory.sh | Low | Good | Medium (2100+ line file) | Medium |
| C: Extend sw-self-optimize.sh | Medium | Good | Low (mixed concerns) | Medium |
| D: SQLite-only (no JSON) | High | Best | Medium (requires DB) | High |
| E: Embedding-based similarity | High | Slow (API calls) | Low | High |

Approach A was chosen for minimal blast radius and clean testability. It reuses the proven TF-IDF keyword matching from sw-memory.sh, which avoids the complexity of embeddings. JSON storage matches the existing dual-write pattern and works without SQLite.


