Pipeline Plan 179

ezigus edited this page Mar 16, 2026 · 3 revisions

Implementation Plan: Success Pattern Learning & Template Recommendation Engine

Brainstorming: Socratic Design Refinement

Requirements Clarity

Minimum viable change: Capture success patterns after pipeline completion, store them in ~/.shipwright/memory/<repo>/success-patterns.json, match new issues against historical successes using TF-IDF keyword similarity (already exists in memory_ranked_search()), and display a recommendation with confidence at pipeline start.

Implicit requirements:

  • Must not break existing failure-only memory capture flow
  • Must integrate with existing select_pipeline_template() hierarchy in daemon-triage.sh
  • Must work without SQLite (JSON fallback, like all other persistence)
  • Must be Bash 3.2 compatible (no associative arrays)
  • Acceptance rate tracking requires recording whether the recommendation was followed

Acceptance criteria (from issue):

  1. Capture success patterns after each successful pipeline
  2. Similarity matching for new issues against historical successes
  3. Recommend template with confidence score
  4. Display recommendation at pipeline start with rationale
  5. Track recommendation acceptance rate
  6. Store in ~/.shipwright/memory/<repo>/success-patterns.json

Design Alternatives

Approach A: Standalone script (sw-success-patterns.sh)

  • New script with capture/recommend/show/stats subcommands
  • Called from memory_finalize_pipeline() and select_pipeline_template()
  • Pros: Clean separation, testable in isolation, follows existing pattern (sw-memory.sh, sw-intelligence.sh)
  • Cons: Another script to source, slightly more overhead

Approach B: Extend sw-memory.sh with success functions

  • Add memory_capture_success_pattern(), memory_recommend_template() to existing memory script
  • Pros: Keeps memory concerns together, reuses memory_ranked_search() directly
  • Cons: sw-memory.sh is already 2100+ lines, harder to test in isolation

Approach C: Extend sw-self-optimize.sh (Thompson sampling lives there)

  • Add success patterns alongside existing Thompson sampling and template weights
  • Pros: Template optimization is already there
  • Cons: Different concern (learning vs optimization), self-optimize.sh is already complex

Decision: Approach A (standalone script) — Minimizes blast radius, follows the established pattern of modular scripts, and keeps success pattern logic testable in isolation. Integration points are surgical: hook into memory_finalize_pipeline() for capture, and into select_pipeline_template() for recommendation.

Risk Assessment

| Risk | Mitigation |
|---|---|
| Cold start (no historical data) | Graceful fallback: no recommendation when < 3 patterns stored |
| TF-IDF similarity too noisy | Use domain keyword expansion (already in sw-memory.sh), require minimum match score |
| Recommendation ignored adds noise | Track acceptance separately; don't count ignored recommendations against confidence |
| JSON file grows unbounded | Cap at 200 patterns, evict oldest when full (like failures.json) |
| Performance: jq over large JSON in hot path | Cache last recommendation per issue hash; patterns file stays small (200 cap) |
| Breaking existing template selection | Insert as a new layer in the hierarchy, after quality memory, before Thompson sampling |
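
The 200-pattern cap with oldest-first eviction could be sketched roughly as follows. This is only an illustration: the helper name `cap_patterns` is hypothetical, and it assumes jq is installed and that new patterns are appended to the end of `.patterns`, so the oldest entries sit at the front.

```bash
# Hypothetical helper: keep only the newest $2 patterns in file $1.
# Assumes jq is available and that new patterns are appended at the end
# of .patterns, so the oldest entries sit at the front of the array.
cap_patterns() {
  local file="$1" cap="${2:-200}" tmp
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if jq --argjson cap "$cap" \
       '.patterns |= (if length > $cap then .[(length - $cap):] else . end)' \
       "$file" > "$tmp"; then
    mv "$tmp" "$file"        # replace the file in one step
  else
    rm -f "$tmp"             # leave the original file untouched on error
    return 1
  fi
}
```

Calling `cap_patterns "$file" 200` after each capture would keep the file within the cap without touching the `stats` block.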

Dependency Analysis

Depends on:

  • scripts/lib/helpers.sh → emit_event(), info(), success(), warn(), now_iso()
  • scripts/sw-memory.sh → repo_memory_dir(), ensure_memory_dir(), _expand_domain_keywords()
  • scripts/sw-db.sh → db_record_outcome() (already captures template+success; we read from it)
  • config/event-schema.json — needs new event types

Depended on by:

  • scripts/sw-memory.sh → memory_finalize_pipeline() calls capture
  • scripts/lib/daemon-triage.sh → select_pipeline_template() calls recommend
  • scripts/sw-pipeline.sh → displays recommendation at pipeline start

No circular dependency risks — new script is a leaf that's sourced by existing scripts.


Architecture Decision Record

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                     Pipeline Lifecycle                      │
│                                                             │
│  pipeline start ──► display_recommendation()                │
│                         │                                   │
│                         ▼                                   │
│              ┌──────────────────────┐                       │
│              │ sw-success-patterns  │                       │
│              │ .sh (NEW)            │                       │
│              │                      │                       │
│              │ • capture_success()  │◄── memory_finalize()  │
│              │ • recommend()        │◄── select_template()  │
│              │ • match_similar()    │                       │
│              │ • track_acceptance() │◄── pipeline complete  │
│              │ • show_stats()       │                       │
│              └──────┬───────────────┘                       │
│                     │                                       │
│         ┌───────────┼───────────────┐                       │
│         ▼           ▼               ▼                       │
│ success-patterns events.jsonl pipeline_outcomes (DB)        │
│ .json (per-repo)                                            │
└─────────────────────────────────────────────────────────────┘

Interface Contracts

// success-patterns.json schema
interface SuccessPattern {
  id: string;                   // sha256 hash of goal text
  goal: string;                 // pipeline goal/issue title
  labels: string[];             // issue labels
  template: string;             // template used (fast, standard, full, etc.)
  complexity: string;           // low | medium | high
  duration_secs: number;        // pipeline duration
  cost_usd: number;             // pipeline cost
  stage_results: Record<string, "complete" | "skipped">;
  keywords: string[];           // extracted + expanded keywords for matching
  issue_number: number | null;  // GitHub issue number if available
  repo: string;                 // repo name
  captured_at: string;          // ISO timestamp
}

interface SuccessPatternsFile {
  version: 1;
  patterns: SuccessPattern[];   // max 200, FIFO eviction
  stats: {
    total_captured: number;
    recommendations_made: number;
    recommendations_accepted: number;
    last_updated: string;
  };
}

interface TemplateRecommendation {
  template: string;             // recommended template name
  confidence: number;           // 0-100 percentage
  similar_count: number;        // how many similar patterns matched
  success_rate: number;         // success rate of matched patterns with this template
  rationale: string;            // human-readable explanation
  matched_patterns: string[];   // IDs of matched patterns (for tracking)
}
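
As an illustration of the `SuccessPatternsFile` shape, an initializer might write the empty structure like this. The function name `init_patterns_file` and the UTC timestamp format are assumptions for the sketch, not part of the plan.

```bash
# Hypothetical initializer: write an empty SuccessPatternsFile if none exists.
# The timestamp format (UTC, ISO 8601) is an assumption.
init_patterns_file() {
  local file="$1" now
  [ -f "$file" ] && return 0          # never overwrite existing data
  mkdir -p "$(dirname "$file")" || return 1
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  cat > "$file" <<EOF
{
  "version": 1,
  "patterns": [],
  "stats": {
    "total_captured": 0,
    "recommendations_made": 0,
    "recommendations_accepted": 0,
    "last_updated": "$now"
  }
}
EOF
}
```

A heredoc avoids a jq dependency for the empty case; the same structure is what a reinitialization after a parse error would presumably produce.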

Data Flow

  1. Capture (pipeline completion):

    pipeline completes → memory_finalize_pipeline() → success_capture_pattern()
     → reads pipeline-state.md (goal, template, stages, status)
     → reads pipeline-artifacts/ (duration, cost from events)
     → extracts keywords from goal text + domain expansion
     → writes to success-patterns.json (atomic tmp+mv)
     → emits "success.pattern_captured" event
    
  2. Recommend (pipeline start):

    select_pipeline_template() → success_recommend_template()
     → extracts keywords from new issue goal/title
     → scores each stored pattern by keyword overlap (TF-IDF-like)
     → groups top matches by template
     → calculates confidence = (matching_successes / total_matches) * score_weight
     → returns recommendation with rationale
     → emits "success.recommendation" event
    
  3. Display (pipeline start):

    sw-pipeline.sh intake stage → success_display_recommendation()
     → calls success_recommend_template()
     → formats boxed output with template, confidence, rationale
     → records recommendation in pipeline-state.md metadata
    
  4. Track acceptance (pipeline completion):

    pipeline completes → success_track_acceptance()
     → compares recommended template vs actually-used template
     → updates stats.recommendations_accepted counter
     → emits "success.recommendation_accepted" / "success.recommendation_rejected" event
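
The scoring and confidence steps in the recommend flow can be sketched in plain shell arithmetic. The helper names `keyword_overlap` and `confidence_pct` are hypothetical, a simple overlap count stands in for the TF-IDF-like score, and `score_weight` is assumed to be on a 0-100 scale.

```bash
# Count how many keywords in list $1 also appear in list $2.
# Both arguments are space-separated, deduplicated, lowercase words
# (assumed to contain no glob characters).
keyword_overlap() {
  local count=0 w
  for w in $1; do
    case " $2 " in
      *" $w "*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}

# confidence = (matching_successes / total_matches) * score_weight,
# in integer arithmetic; score_weight is assumed to be on a 0-100 scale.
confidence_pct() {
  local matching="$1" total="$2" weight="$3"
  [ "$total" -gt 0 ] || { echo 0; return 0; }
  echo $(( matching * weight / total ))
}
```

For example, `keyword_overlap "fix login bug" "login bug auth"` prints 2, and `confidence_pct 9 12 80` prints 60. Multiplying before dividing keeps precision under integer arithmetic.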
    

Error Boundaries

  • success_capture_pattern(): All errors caught with || true, never fails the pipeline
  • success_recommend_template(): Returns empty string on any error (caller falls through to next selection method)
  • success_display_recommendation(): Silent on error (informational only)
  • JSON parse errors: Reinitialize file with empty structure
  • File locking: Uses atomic tmp+mv pattern (consistent with rest of codebase)
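
The tmp+mv pattern mentioned above might look like the following minimal sketch; `atomic_write` is a hypothetical name, and the temp file is created beside the target on the assumption that both live on the same filesystem.

```bash
# Hypothetical helper: write stdin to $1 atomically.
# The temp file is created next to the target so mv stays on one filesystem.
atomic_write() {
  local target="$1" tmp
  tmp=$(mktemp "${target}.tmp.XXXXXX") || return 1
  if cat > "$tmp"; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would be along the lines of `jq '...' "$file" | atomic_write "$file"`. This keeps readers from seeing a half-written file, but on its own it does not serialize concurrent writers.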

User Stories

Primary: As a shipwright operator, I want the system to learn from successful pipelines and recommend the best template for new issues, so that I get faster, more reliable pipeline runs without manual template selection.

Secondary: As a shipwright developer, I want to see why a template was recommended and how often recommendations are accepted, so that I can trust and tune the recommendation engine.

Acceptance Criteria (Given/When/Then)

  1. Given a pipeline completes successfully, When memory_finalize_pipeline() runs, Then a success pattern is captured with goal, template, labels, complexity, duration, and keywords in success-patterns.json.

  2. Given 3+ success patterns exist, When a new pipeline starts with a similar goal, Then a template recommendation is displayed with confidence score and rationale (e.g., "12 similar issues succeeded with 'fast' template").

  3. Given no success patterns exist (cold start), When a pipeline starts, Then no recommendation is shown and template selection falls through to existing methods.

  4. Given a recommendation was made, When the pipeline completes, Then the system records whether the recommendation was accepted (template matched) and updates acceptance stats.

Edge Cases from User Perspective

  1. Empty state (cold start): No success-patterns.json exists → function returns empty, no recommendation shown, existing template selection logic takes over.
  2. All patterns are from one template: Confidence will be high for that template. This is correct behavior — if all successes used "standard", recommend "standard".
  3. Goal text is very short/generic: Domain keyword expansion will help, but minimum match score threshold (3+ keyword matches) prevents noisy recommendations.
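
To make the keyword step concrete, here is a rough sketch of extraction without domain expansion. The name `extract_keywords`, the stopword list, and the 3-character minimum are all assumptions for illustration; the real `_success_extract_keywords()` may differ.

```bash
# Hypothetical extractor: lowercase the goal, split on non-alphanumerics,
# drop short words and a small (assumed) stopword list, and deduplicate
# while preserving first-seen order. Prints one space-separated line.
extract_keywords() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | awk 'length($0) >= 3 && !seen[$0]++' \
    | grep -vxE 'the|and|for|with|that|this|from' \
    | tr '\n' ' ' \
    | sed 's/ $//'
}
```

With this sketch, `extract_keywords "Fix the Login bug in auth-service"` prints `fix login bug auth service`. A very short goal yields few keywords, which is why a minimum match threshold matters.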

Endpoint Specification (CLI subcommands)

Since this is a CLI/shell tool (not REST API), the "endpoints" are subcommands:

| Subcommand | Input | Output | Errors |
|---|---|---|---|
| capture <state_file> <artifacts_dir> | Pipeline state + artifacts | Success message | Missing state file, parse error |
| recommend <goal> [labels] [complexity] | Issue text + metadata | JSON recommendation | No patterns, no match |
| show | None (reads success-patterns.json) | Formatted stats table | File not found |
| stats | None | JSON acceptance stats | File not found |
| accept <recommendation_id> <accepted> | Tracking data | Updated stats | Invalid ID |

Error format: All errors go to stderr via error() helper, return code 1.

Rate limiting: N/A (local CLI tool).

Versioning: Schema version: 1 in success-patterns.json for future migration.
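
A dispatcher following the error convention above (message on stderr, return code 1) could be sketched as follows. The stub bodies and the name `success_patterns_main` are hypothetical; only the `show` route is wired up, to keep the example short.

```bash
# Stubs standing in for the real functions (hypothetical bodies).
success_show_stats() { echo "patterns: 0"; }
error() { echo "error: $*" >&2; }

# Dispatch a subcommand; unknown commands report to stderr and return 1.
success_patterns_main() {
  local cmd="${1:-}"
  [ $# -gt 0 ] && shift
  case "$cmd" in
    show)  success_show_stats "$@" ;;
    # capture, recommend, stats, and accept would be routed the same way
    *)     error "unknown subcommand: ${cmd:-<none>}"; return 1 ;;
  esac
}
```

Keeping dispatch in a function rather than at file top level means the script can still be sourced by sw-memory.sh without running anything.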


Files to Modify

New Files

  1. scripts/sw-success-patterns.sh — Core success pattern engine (~300 lines)
  2. scripts/sw-success-patterns-test.sh — Test suite (~250 lines)

Modified Files

  1. scripts/sw-memory.sh — Source new script, call success_capture_pattern() from memory_finalize_pipeline()
  2. scripts/lib/daemon-triage.sh — Add success-pattern recommendation layer in select_pipeline_template()
  3. scripts/sw-pipeline.sh — Display recommendation at pipeline start (intake stage)
  4. config/event-schema.json — Add new event types for success patterns

Implementation Steps

Step 1: Create scripts/sw-success-patterns.sh

Core script with these functions:

# ─── Storage ──────────────────────────────────────────────
_success_patterns_file() # Returns path to success-patterns.json for current repo
_success_patterns_init() # Initialize empty file if missing
_success_extract_keywords() # Extract keywords from goal text + domain expansion
# ─── Capture ──────────────────────────────────────────────
success_capture_pattern() # Capture success pattern after pipeline completion
 # Args: state_file, artifacts_dir
 # Reads: pipeline status, goal, template, stages, duration, cost
 # Writes: success-patterns.json (atomic tmp+mv, cap at 200)
 # Emits: success.pattern_captured event
# ─── Recommendation ──────────────────────────────────────
success_recommend_template() # Recommend template based on similar past successes
 # Args: goal, labels (optional), complexity (optional)
 # Returns: JSON {template, confidence, similar_count, success_rate, rationale}
 # Minimum: 3 patterns in store, 2+ keyword matches for similarity
 # Emits: success.recommendation event
success_display_recommendation() # Pretty-print recommendation for pipeline start
 # Args: goal, labels, complexity
 # Output: Boxed recommendation with confidence and rationale
# ─── Tracking ─────────────────────────────────────────────
success_track_acceptance() # Record whether recommendation was accepted
 # Args: recommended_template, actual_template
 # Updates: stats counters in success-patterns.json
 # Emits: success.recommendation_accepted or success.recommendation_rejected
# ─── Display ──────────────────────────────────────────────
success_show_stats() # Display success pattern statistics
 # Output: Pattern count, top templates, acceptance rate
# ─── CLI ──────────────────────────────────────────────────
# Subcommands: capture, recommend, show, stats, accept

Step 2: Add event types to config/event-schema.json

"success.pattern_captured": {
 "required": ["pattern_id"],
 "optional": ["template", "goal", "repo", "keywords"]
},
"success.recommendation": {
 "required": ["template", "confidence"],
 "optional": ["similar_count", "success_rate", "goal"]
},
"success.recommendation_accepted": {
 "required": ["template"],
 "optional": ["recommended_template", "confidence"]
},
"success.recommendation_rejected": {
 "required": ["template"],
 "optional": ["recommended_template", "actual_template", "confidence"]
}

Step 3: Integrate capture into sw-memory.sh

In memory_finalize_pipeline(), after Step 2 (failure capture), add Step 2.5:

# Step 2.5: Capture success pattern if pipeline succeeded
local _pipeline_status
_pipeline_status=$(sed -n 's/^status: *//p' "$state_file" | head -1)
if [[ "$_pipeline_status" == "complete" ]] && type success_capture_pattern >/dev/null 2>&1; then
 success_capture_pattern "$state_file" "$artifacts_dir" 2>/dev/null || true
fi

Also source the new script near the top of sw-memory.sh.
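
The status check above reads a `key: value` line from the state file. A small generalization of that sed call, shown here as a sketch, could serve the other fields that capture needs; `state_field` is a hypothetical name, and only the `status:` line format is confirmed by the snippet above.

```bash
# Hypothetical helper: print the first value for "key: value" in a state file.
# The key is assumed to contain no regex metacharacters or slashes.
state_field() {
  local file="$1" key="$2"
  sed -n "s/^${key}: *//p" "$file" 2>/dev/null | head -1
}
```

For instance, `state_field "$state_file" status` would print `complete` for a finished pipeline and nothing at all if the file is missing.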

Step 4: Integrate recommendation into daemon-triage.sh

In select_pipeline_template(), add a new layer after the quality memory section and before Thompson sampling:

# ── Success pattern recommendation ──
if type success_recommend_template >/dev/null 2>&1; then
  local _sp_goal="${INTELLIGENCE_GOAL:-}"
  local _sp_labels="$labels"
  local _sp_complexity="medium"
  [[ "$score" -ge 70 ]] && _sp_complexity="low"
  [[ "$score" -lt 40 ]] && _sp_complexity="high"
  local _sp_rec
  _sp_rec=$(success_recommend_template "$_sp_goal" "$_sp_labels" "$_sp_complexity" 2>/dev/null || echo "")
  if [[ -n "$_sp_rec" ]]; then
    local _sp_template _sp_confidence
    _sp_template=$(echo "$_sp_rec" | jq -r '.template // ""' 2>/dev/null || echo "")
    # floor keeps the value an integer so the -ge test below cannot fail on e.g. 72.5
    _sp_confidence=$(echo "$_sp_rec" | jq -r '(.confidence // 0) | floor' 2>/dev/null || echo "0")
    if [[ -n "$_sp_template" && "$_sp_confidence" -ge 60 ]]; then
      daemon_log INFO "Success patterns: $_sp_template (confidence=${_sp_confidence}%)" >&2
      echo "$_sp_template"
      return
    fi
  fi
fi
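
The score-to-complexity mapping in the snippet above can be isolated into a small function, which makes the thresholds easier to test. `complexity_from_score` is a hypothetical name; the thresholds are the ones already used above.

```bash
# Map a triage score (0-100) to a complexity bucket, using the same
# thresholds as the snippet above: >= 70 is low, < 40 is high.
complexity_from_score() {
  local score="$1"
  if [ "$score" -ge 70 ]; then
    echo "low"
  elif [ "$score" -lt 40 ]; then
    echo "high"
  else
    echo "medium"
  fi
}
```

The boundaries behave as follows: 70 maps to low, 69 and 40 map to medium, and 39 maps to high.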

Step 5: Display recommendation at pipeline start

In sw-pipeline.sh, during the intake stage, after issue analysis but before template selection:

# Display success pattern recommendation (informational)
if type success_display_recommendation >/dev/null 2>&1; then
 success_display_recommendation "$goal" "${labels:-}" "${complexity:-medium}" 2>/dev/null || true
fi
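
One possible shape for the boxed output is sketched below. The function name `display_box`, the ASCII border, and the exact wording of the two lines are assumptions; the real `success_display_recommendation()` would first call `success_recommend_template()` to obtain these values.

```bash
# Hypothetical renderer: print a recommendation inside an ASCII box.
# Args: template, confidence (integer percent), rationale.
display_box() {
  local line1="Recommended template: $1 (confidence ${2}%)"
  local line2="Rationale: $3"
  local width=${#line1} border
  if [ "${#line2}" -gt "$width" ]; then width=${#line2}; fi
  # A run of spaces turned into dashes gives a border of the right length.
  border=$(printf "%-$((width + 2))s" '' | tr ' ' '-')
  printf '+%s+\n' "$border"
  # printf reuses the format for each argument, so this prints two rows.
  printf "| %-${width}s |\n" "$line1" "$line2"
  printf '+%s+\n' "$border"
}
```

A call such as `display_box fast 72 "12 similar issues succeeded with 'fast' template"` prints four lines of equal width. Since the display is informational only, the caller can discard any error.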

Step 6: Track acceptance at pipeline completion

In sw-pipeline.sh, at pipeline completion (where memory_finalize_pipeline is called), add:

# Track recommendation acceptance
if type success_track_acceptance >/dev/null 2>&1; then
  local _rec_template
  # Accept both "recommended_template:fast" and "recommended_template: fast"
  _rec_template=$(grep -o 'recommended_template: *[^ ]*' "$STATE_FILE" 2>/dev/null | head -1 | sed 's/^recommended_template: *//' || true)
  if [[ -n "$_rec_template" ]]; then
    success_track_acceptance "$_rec_template" "$PIPELINE_TEMPLATE" 2>/dev/null || true
  fi
fi
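
The bookkeeping behind acceptance tracking reduces to a template comparison and a percentage. A minimal sketch, with hypothetical helper names and integer arithmetic only:

```bash
# Integer acceptance rate in percent; 0 when no recommendations were made.
acceptance_rate() {
  local accepted="$1" made="$2"
  [ "$made" -gt 0 ] || { echo 0; return 0; }
  echo $(( accepted * 100 / made ))
}

# Event name for a recommended/actual template pair.
acceptance_event() {
  if [ "$1" = "$2" ]; then
    echo "success.recommendation_accepted"
  else
    echo "success.recommendation_rejected"
  fi
}
```

For example, `acceptance_rate 3 4` prints 75, and `acceptance_event fast standard` prints `success.recommendation_rejected`. The zero guard covers the cold-start case where `recommendations_made` is still 0.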

Step 7: Write test suite sw-success-patterns-test.sh

Tests:

  1. test_success_patterns_init — File creation with correct schema
  2. test_capture_success_pattern — Capture from mock pipeline state
  3. test_capture_skips_failed_pipeline — No capture when pipeline failed
  4. test_keyword_extraction — Keywords extracted and expanded correctly
  5. test_recommend_cold_start — Empty result when < 3 patterns
  6. test_recommend_similar_issue — Correct template recommended for similar goal
  7. test_recommend_confidence_calculation — Confidence reflects match quality
  8. test_recommend_minimum_matches — No recommendation below match threshold
  9. test_pattern_cap_200 — Oldest patterns evicted at cap
  10. test_track_acceptance — Stats updated on accept/reject
  11. test_show_stats — Stats display works with data
  12. test_cli_subcommands — All subcommands dispatch correctly
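
The following sketch shows one way a test in this list might be structured. The `assert_eq` helper and `recommend_stub` are hypothetical stand-ins; the project's existing test scripts may already provide their own helpers, and a real test would run against a temp directory created with `mktemp -d`.

```bash
# Minimal assertion helper and a sample test, in the spirit of the list above.
PASS=0
FAIL=0

assert_eq() {
  # Args: description, expected, actual
  if [ "$2" = "$3" ]; then
    PASS=$((PASS + 1))
  else
    FAIL=$((FAIL + 1))
    echo "FAIL: $1 (expected '$2', got '$3')" >&2
  fi
}

# Hypothetical stand-in for the function under test: prints nothing
# when fewer than 3 patterns are stored (the cold-start rule).
recommend_stub() {
  local pattern_count="$1"
  [ "$pattern_count" -ge 3 ] || return 0
  echo "fast"
}

test_recommend_cold_start() {
  assert_eq "cold start yields no recommendation" "" "$(recommend_stub 2)"
  assert_eq "3 patterns yield a recommendation" "fast" "$(recommend_stub 3)"
}

test_recommend_cold_start
echo "passed=$PASS failed=$FAIL"
```

Counting failures instead of exiting on the first one lets a single run report every broken case.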

Task Checklist

  • Task 1: Create scripts/sw-success-patterns.sh with storage helpers, keyword extraction, and file initialization
  • Task 2: Implement success_capture_pattern() — reads pipeline state, extracts metadata, writes to success-patterns.json
  • Task 3: Implement success_recommend_template() — TF-IDF keyword matching, confidence scoring, template grouping
  • Task 4: Implement success_display_recommendation() — formatted output with confidence and rationale
  • Task 5: Implement success_track_acceptance() and success_show_stats()
  • Task 6: Add CLI subcommand dispatcher (capture, recommend, show, stats, accept)
  • Task 7: Add event types to config/event-schema.json
  • Task 8: Integrate capture into sw-memory.sh memory_finalize_pipeline()
  • Task 9: Integrate recommendation into daemon-triage.sh select_pipeline_template()
  • Task 10: Integrate display at pipeline start in sw-pipeline.sh
  • Task 11: Integrate acceptance tracking at pipeline completion in sw-pipeline.sh
  • Task 12: Write test suite scripts/sw-success-patterns-test.sh
  • Task 13: Register test in package.json test scripts
  • Task 14: Run full test suite and fix any regressions

Testing Approach

  1. Unit tests (sw-success-patterns-test.sh): Mock pipeline state files, test each function in isolation with temp directories
  2. Integration: Verify memory_finalize_pipeline() calls capture, select_pipeline_template() uses recommendation
  3. Existing test suites: Run sw-memory-test.sh, sw-lib-daemon-triage-test.sh, sw-pipeline-test.sh to verify no regressions
  4. Full suite: npm test must pass

Definition of Done

  • Success patterns captured after every successful pipeline completion
  • Patterns stored in ~/.shipwright/memory/<repo>/success-patterns.json with documented schema
  • TF-IDF keyword matching recommends template from similar past successes
  • Recommendation displayed at pipeline start with confidence score and rationale
  • Confidence threshold (60%) prevents noisy recommendations
  • Acceptance rate tracked and available via success_show_stats()
  • Events emitted for capture, recommendation, acceptance/rejection
  • Cold start handled gracefully (no recommendation when < 3 patterns)
  • Pattern cap at 200 prevents unbounded growth
  • All new code is Bash 3.2 compatible
  • Test suite passes with 12+ test cases
  • Full npm test passes with no regressions

Alternatives Considered

| Approach | Complexity | Performance | Maintainability | Blast Radius |
|---|---|---|---|---|
| A: Standalone script (chosen) | Low | Good (lazy-sourced) | High (isolated) | Minimal |
| B: Extend sw-memory.sh | Low | Good | Medium (2100+ line file) | Medium |
| C: Extend sw-self-optimize.sh | Medium | Good | Low (mixed concerns) | Medium |
| D: SQLite-only (no JSON) | High | Best | Medium (requires DB) | High |
| E: Embedding-based similarity | High | Slow (API calls) | Low | High |

Approach A was chosen for minimal blast radius and clean testability. The existing TF-IDF keyword matching from sw-memory.sh is reused (pattern proven), avoiding the complexity of embeddings. JSON storage matches the existing dual-write pattern and works without SQLite.
