Pipeline Plan 179
Minimum viable change: Capture success patterns after pipeline completion, store them in ~/.shipwright/memory/<repo>/success-patterns.json, match new issues against historical successes using TF-IDF keyword similarity (already exists in memory_ranked_search()), and display a recommendation with confidence at pipeline start.
Implicit requirements:
- Must not break existing failure-only memory capture flow
- Must integrate with existing `select_pipeline_template()` hierarchy in `daemon-triage.sh`
- Must work without SQLite (JSON fallback, like all other persistence)
- Must be Bash 3.2 compatible (no associative arrays)
- Acceptance rate tracking requires recording whether the recommendation was followed
Acceptance criteria (from issue):
- Capture success patterns after each successful pipeline
- Similarity matching for new issues against historical successes
- Recommend template with confidence score
- Display recommendation at pipeline start with rationale
- Track recommendation acceptance rate
- Store in `~/.shipwright/memory/<repo>/success-patterns.json`
Approach A: Standalone script (sw-success-patterns.sh)
- New script with capture/recommend/show/stats subcommands
- Called from `memory_finalize_pipeline()` and `select_pipeline_template()`
- Pros: Clean separation, testable in isolation, follows existing pattern (sw-memory.sh, sw-intelligence.sh)
- Cons: Another script to source, slightly more overhead
Approach B: Extend sw-memory.sh with success functions
- Add `memory_capture_success_pattern()`, `memory_recommend_template()` to existing memory script
- Pros: Keeps memory concerns together, reuses `memory_ranked_search()` directly
- Cons: sw-memory.sh is already 2100+ lines, harder to test in isolation
Approach C: Extend sw-self-optimize.sh (Thompson sampling lives there)
- Add success patterns alongside existing Thompson sampling and template weights
- Pros: Template optimization is already there
- Cons: Different concern (learning vs optimization), self-optimize.sh is already complex
Decision: Approach A (standalone script) — Minimizes blast radius, follows the established pattern of modular scripts, and keeps success pattern logic testable in isolation. Integration points are surgical: hook into memory_finalize_pipeline() for capture, and into select_pipeline_template() for recommendation.
| Risk | Mitigation |
|---|---|
| Cold start (no historical data) | Graceful fallback — no recommendation when < 3 patterns stored |
| TF-IDF similarity too noisy | Use domain keyword expansion (already in sw-memory.sh), require minimum match score |
| Recommendation ignored adds noise | Track acceptance separately; don't count ignored recommendations against confidence |
| JSON file grows unbounded | Cap at 200 patterns, evict oldest when full (like failures.json) |
| Performance: jq over large JSON in hot path | Cache last recommendation per issue hash; patterns file stays small (200 cap) |
| Breaking existing template selection | Insert as a new layer in the hierarchy, after quality memory, before Thompson sampling |
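The 200-pattern cap in the table above could be enforced at write time. The following is a minimal sketch, assuming `jq` is installed and that new patterns are appended to the end of `.patterns` so the oldest entries sit at the front; `cap_patterns` is a hypothetical helper name, not an existing function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the newest $2 patterns (default 200).
# Assumes jq is available and that .patterns is ordered oldest-first.
cap_patterns() {
    local file="$1" max="${2:-200}" tmp
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Slice the last $max entries, write to a temp file, then rename over
    # the original so readers never see a half-written file.
    if jq --argjson max "$max" '.patterns |= .[(0 - $max):]' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling `cap_patterns "$patterns_file"` after each append would keep the file bounded; a file with fewer than 200 entries passes through unchanged.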
Depends on:
- `scripts/lib/helpers.sh`: `emit_event()`, `info()`, `success()`, `warn()`, `now_iso()`
- `scripts/sw-memory.sh`: `repo_memory_dir()`, `ensure_memory_dir()`, `_expand_domain_keywords()`
- `scripts/sw-db.sh`: `db_record_outcome()` (already captures template+success, we read from it)
- `config/event-schema.json`: needs new event types
Depended on by:
- `scripts/sw-memory.sh` → `memory_finalize_pipeline()` calls capture
- `scripts/lib/daemon-triage.sh` → `select_pipeline_template()` calls recommend
- `scripts/sw-pipeline.sh` → displays recommendation at pipeline start
No circular dependency risks — new script is a leaf that's sourced by existing scripts.
    ┌─────────────────────────────────────────────────────────────┐
    │                     Pipeline Lifecycle                      │
    │                                                             │
    │  pipeline start ──► display_recommendation()                │
    │                           │                                 │
    │                           ▼                                 │
    │               ┌──────────────────────┐                      │
    │               │ sw-success-patterns  │                      │
    │               │ .sh (NEW)            │                      │
    │               │                      │                      │
    │               │ ● capture_success()  │◄── memory_finalize() │
    │               │ ● recommend()        │◄── select_template() │
    │               │ ● match_similar()    │                      │
    │               │ ● track_acceptance() │◄── pipeline complete │
    │               │ ● show_stats()       │                      │
    │               └───────────┬──────────┘                      │
    │                           │                                 │
    │         ┌─────────────────┼────────────────────┐            │
    │         ▼                 ▼                    ▼            │
    │  success-patterns    events.jsonl    pipeline_outcomes (DB) │
    │  .json (per-repo)                                           │
    └─────────────────────────────────────────────────────────────┘
    // success-patterns.json schema
    interface SuccessPattern {
      id: string;                  // sha256 hash of goal text
      goal: string;                // pipeline goal/issue title
      labels: string[];            // issue labels
      template: string;            // template used (fast, standard, full, etc.)
      complexity: string;          // low | medium | high
      duration_secs: number;       // pipeline duration
      cost_usd: number;            // pipeline cost
      stage_results: Record<string, "complete" | "skipped">;
      keywords: string[];          // extracted + expanded keywords for matching
      issue_number: number | null; // GitHub issue number if available
      repo: string;                // repo name
      captured_at: string;         // ISO timestamp
    }

    interface SuccessPatternsFile {
      version: 1;
      patterns: SuccessPattern[];  // max 200, FIFO eviction
      stats: {
        total_captured: number;
        recommendations_made: number;
        recommendations_accepted: number;
        last_updated: string;
      };
    }

    interface TemplateRecommendation {
      template: string;            // recommended template name
      confidence: number;          // 0-100 percentage
      similar_count: number;       // how many similar patterns matched
      success_rate: number;        // success rate of matched patterns with this template
      rationale: string;           // human-readable explanation
      matched_patterns: string[];  // IDs of matched patterns (for tracking)
    }
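The `SuccessPatternsFile` shape implies an empty starting document with zeroed stats. A minimal sketch of such an initializer follows; `init_patterns_file` is a hypothetical name, it takes the target path as an argument rather than resolving the repo memory directory, and it uses `date` where the real script would presumably call `now_iso()`.

```shell
#!/usr/bin/env bash
# Hypothetical initializer: write an empty SuccessPatternsFile if none exists.
# The path is passed in explicitly; resolving it per repo is left out here.
init_patterns_file() {
    local file="$1" tmp
    # Never overwrite an existing file.
    [ -f "$file" ] && return 0
    mkdir -p "$(dirname "$file")" || return 1
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Write to a temp file first, then rename into place (tmp+mv).
    printf '{"version":1,"patterns":[],"stats":{"total_captured":0,"recommendations_made":0,"recommendations_accepted":0,"last_updated":"%s"}}\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$tmp" && mv "$tmp" "$file"
}
```

Because the function returns early when the file exists, it is safe to call at the top of every capture or recommend path.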
- Capture (pipeline completion): pipeline completes → `memory_finalize_pipeline()` → `success_capture_pattern()`
  - reads `pipeline-state.md` (goal, template, stages, status)
  - reads `pipeline-artifacts/` (duration, cost from events)
  - extracts keywords from goal text + domain expansion
  - writes to `success-patterns.json` (atomic tmp+mv)
  - emits `success.pattern_captured` event
- Recommend (pipeline start): `select_pipeline_template()` → `success_recommend_template()`
  - extracts keywords from new issue goal/title
  - scores each stored pattern by keyword overlap (TF-IDF-like)
  - groups top matches by template
  - calculates confidence = (matching_successes / total_matches) * score_weight
  - returns recommendation with rationale
  - emits `success.recommendation` event
- Display (pipeline start): `sw-pipeline.sh` intake stage → `success_display_recommendation()`
  - calls `success_recommend_template()`
  - formats boxed output with template, confidence, rationale
  - records recommendation in `pipeline-state.md` metadata
- Track acceptance (pipeline completion): pipeline completes → `success_track_acceptance()`
  - compares recommended template vs actually-used template
  - updates `stats.recommendations_accepted` counter
  - emits `success.recommendation_accepted` / `success.recommendation_rejected` event
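The recommend flow can be sketched without associative arrays, which keeps it Bash 3.2 compatible. This is an illustration under stated assumptions, not the planned implementation: it uses plain keyword overlap rather than TF-IDF weighting, reads patterns as `template|keywords` lines instead of JSON, and treats `score_weight` (which the plan does not define) as 1. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count how many words of $1 also appear in $2 (both space-separated).
# Assumes keywords are plain alphanumeric tokens (no glob characters).
keyword_overlap() {
    local a="$1" b=" $2 " n=0 w
    for w in $a; do
        case "$b" in
            *" $w "*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# stdin:  lines of "template|kw1 kw2 ..." (a simplified stand-in for the JSON store)
# $1:     keywords of the new issue, space-separated
# $2:     minimum overlap for a pattern to count as similar (assumed default: 2)
# stdout: "<template> <confidence> <similar_count>", or nothing if no match
recommend_from_patterns() {
    local query="$1" min="${2:-2}" template kws matched="" total=0
    while IFS='|' read -r template kws; do
        if [ "$(keyword_overlap "$query" "$kws")" -ge "$min" ]; then
            matched="${matched}${template}
"
            total=$((total + 1))
        fi
    done
    [ "$total" -gt 0 ] || return 0
    # Group matches by template with sort | uniq -c, then take the most frequent.
    # Confidence here is the winning template's share of all matches, as an integer.
    printf '%s' "$matched" | sort | uniq -c | sort -k1,1nr -k2,2 | head -n 1 |
        awk -v total="$total" '{ printf "%s %d %d\n", $2, ($1 * 100) / total, total }'
}
```

With three similar patterns, two of which used `fast`, the function prints `fast 66 3`: two of three matches, truncated to an integer percentage.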
- `success_capture_pattern()`: All errors caught with `|| true`, never fails the pipeline
- `success_recommend_template()`: Returns empty string on any error (caller falls through to next selection method)
- `success_display_recommendation()`: Silent on error (informational only)
- JSON parse errors: Reinitialize file with empty structure
- File locking: Uses atomic tmp+mv pattern (consistent with rest of codebase)
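The "empty string on error, caller falls through" contract can be illustrated in isolation. In this sketch both function names are hypothetical, and the `standard` default stands in for whatever the next selection layer would return.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a recommender command and print its output,
# or print nothing if the command fails. Never returns non-zero itself.
recommend_or_empty() {
    local out
    out=$("$@" 2>/dev/null) || out=""
    printf '%s' "$out"
}

# Hypothetical caller: use the recommendation when present, otherwise fall
# through. "standard" is an assumed default for illustration only.
select_template() {
    local rec
    rec=$(recommend_or_empty "$@")
    if [ -n "$rec" ]; then
        echo "$rec"
    else
        echo "standard"
    fi
}
```

Partial output from a failed recommender is discarded as well, so a crash midway through cannot leak a half-formed template name into selection.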
Primary: As a shipwright operator, I want the system to learn from successful pipelines and recommend the best template for new issues, so that I get faster, more reliable pipeline runs without manual template selection.
Secondary: As a shipwright developer, I want to see why a template was recommended and how often recommendations are accepted, so that I can trust and tune the recommendation engine.
- Given a pipeline completes successfully, When `memory_finalize_pipeline()` runs, Then a success pattern is captured with goal, template, labels, complexity, duration, and keywords in `success-patterns.json`.
- Given 3+ success patterns exist, When a new pipeline starts with a similar goal, Then a template recommendation is displayed with confidence score and rationale (e.g., "12 similar issues succeeded with 'fast' template").
- Given no success patterns exist (cold start), When a pipeline starts, Then no recommendation is shown and template selection falls through to existing methods.
- Given a recommendation was made, When the pipeline completes, Then the system records whether the recommendation was accepted (template matched) and updates acceptance stats.
- Empty state (cold start): No success-patterns.json exists → function returns empty, no recommendation shown, existing template selection logic takes over.
- All patterns are from one template: Confidence will be high for that template. This is correct behavior — if all successes used "standard", recommend "standard".
- Goal text is very short/generic: Domain keyword expansion will help, but minimum match score threshold (3+ keyword matches) prevents noisy recommendations.
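How short goal text interacts with a keyword threshold is easier to see with a concrete extractor. The sketch below is an assumption-laden stand-in for `_success_extract_keywords()`: the stopword list and the three-character minimum are invented for illustration, and domain keyword expansion is omitted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: turn free-form goal text into a sorted, deduplicated,
# space-separated keyword list. Stopwords and the length cutoff are assumptions.
extract_keywords() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        tr -cs '[:alnum:]' '\n' |
        awk 'length($0) >= 3' |
        grep -v -x -E 'the|and|for|with|from|that|this|into' |
        sort -u |
        tr '\n' ' ' |
        sed 's/ $//'
}
```

For example, `extract_keywords "Fix the login bug in the OAuth flow"` yields `bug fix flow login oauth`. A goal such as "Fix it" yields a single keyword, so it could never reach a multi-keyword match threshold on its own.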
Since this is a CLI/shell tool (not REST API), the "endpoints" are subcommands:
| Subcommand | Input | Output | Errors |
|---|---|---|---|
| `capture <state_file> <artifacts_dir>` | Pipeline state + artifacts | Success message | Missing state file, parse error |
| `recommend <goal> [labels] [complexity]` | Issue text + metadata | JSON recommendation | No patterns, no match |
| `show` | None (reads success-patterns.json) | Formatted stats table | File not found |
| `stats` | None | JSON acceptance stats | File not found |
| `accept <recommendation_id> <accepted>` | Tracking data | Updated stats | Invalid ID |
Error format: All errors go to stderr via error() helper, return code 1.
Rate limiting: N/A (local CLI tool).
Versioning: Schema version: 1 in success-patterns.json for future migration.
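The subcommand table maps naturally onto a `case` dispatcher. The following sketch uses stub functions so it runs on its own; `success_patterns_main` and `_success_stats_json` are hypothetical names, and the local `error()` only mimics the helper the plan refers to.

```shell
#!/usr/bin/env bash
# Stubs standing in for the real functions so the dispatcher is runnable alone.
success_capture_pattern()    { echo "capture: $*"; }
success_recommend_template() { echo "recommend: $*"; }
success_show_stats()         { echo "show"; }
_success_stats_json()        { echo "stats"; }   # hypothetical JSON stats helper
success_track_acceptance()   { echo "accept: $*"; }
error()                      { echo "error: $*" >&2; }

# Hypothetical entry point: route the first argument to the matching function.
success_patterns_main() {
    local cmd="${1:-}"
    [ "$#" -gt 0 ] && shift
    case "$cmd" in
        capture)   success_capture_pattern "$@" ;;
        recommend) success_recommend_template "$@" ;;
        show)      success_show_stats ;;
        stats)     _success_stats_json ;;
        accept)    success_track_acceptance "$@" ;;
        *)         error "unknown subcommand: ${cmd:-<none>}"; return 1 ;;
    esac
}
```

In the real script the last line would presumably be `success_patterns_main "$@"`, guarded so that sourcing the file from `sw-memory.sh` does not trigger the CLI.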
- `scripts/sw-success-patterns.sh`: Core success pattern engine (~300 lines)
- `scripts/sw-success-patterns-test.sh`: Test suite (~250 lines)

- `scripts/sw-memory.sh`: Source new script, call `success_capture_pattern()` from `memory_finalize_pipeline()`
- `scripts/lib/daemon-triage.sh`: Add success-pattern recommendation layer in `select_pipeline_template()`
- `scripts/sw-pipeline.sh`: Display recommendation at pipeline start (intake stage)
- `config/event-schema.json`: Add new event types for success patterns
Core script with these functions:
    # ─── Storage ──────────────────────────────────────────────
    _success_patterns_file()          # Returns path to success-patterns.json for current repo
    _success_patterns_init()          # Initialize empty file if missing
    _success_extract_keywords()       # Extract keywords from goal text + domain expansion

    # ─── Capture ──────────────────────────────────────────────
    success_capture_pattern()         # Capture success pattern after pipeline completion
      # Args: state_file, artifacts_dir
      # Reads: pipeline status, goal, template, stages, duration, cost
      # Writes: success-patterns.json (atomic tmp+mv, cap at 200)
      # Emits: success.pattern_captured event

    # ─── Recommendation ──────────────────────────────────────
    success_recommend_template()      # Recommend template based on similar past successes
      # Args: goal, labels (optional), complexity (optional)
      # Returns: JSON {template, confidence, similar_count, success_rate, rationale}
      # Minimum: 3 patterns in store, 2+ keyword matches for similarity
      # Emits: success.recommendation event

    success_display_recommendation()  # Pretty-print recommendation for pipeline start
      # Args: goal, labels, complexity
      # Output: Boxed recommendation with confidence and rationale

    # ─── Tracking ─────────────────────────────────────────────
    success_track_acceptance()        # Record whether recommendation was accepted
      # Args: recommended_template, actual_template
      # Updates: stats counters in success-patterns.json
      # Emits: success.recommendation_accepted or success.recommendation_rejected

    # ─── Display ──────────────────────────────────────────────
    success_show_stats()              # Display success pattern statistics
      # Output: Pattern count, top templates, acceptance rate

    # ─── CLI ──────────────────────────────────────────────────
    # Subcommands: capture, recommend, show, stats, accept
    "success.pattern_captured": {
      "required": ["pattern_id"],
      "optional": ["template", "goal", "repo", "keywords"]
    },
    "success.recommendation": {
      "required": ["template", "confidence"],
      "optional": ["similar_count", "success_rate", "goal"]
    },
    "success.recommendation_accepted": {
      "required": ["template"],
      "optional": ["recommended_template", "confidence"]
    },
    "success.recommendation_rejected": {
      "required": ["template"],
      "optional": ["recommended_template", "actual_template", "confidence"]
    }
In memory_finalize_pipeline(), after Step 2 (failure capture), add Step 2.5:
    # Step 2.5: Capture success pattern if pipeline succeeded
    local _pipeline_status
    _pipeline_status=$(sed -n 's/^status: *//p' "$state_file" | head -1)
    if [[ "$_pipeline_status" == "complete" ]] && type success_capture_pattern >/dev/null 2>&1; then
        success_capture_pattern "$state_file" "$artifacts_dir" 2>/dev/null || true
    fi
Also source the new script near the top of sw-memory.sh.
In select_pipeline_template(), add a new layer after the quality memory section and before Thompson sampling:
    # ── Success pattern recommendation ──
    if type success_recommend_template >/dev/null 2>&1; then
        local _sp_goal="${INTELLIGENCE_GOAL:-}"
        local _sp_labels="$labels"
        local _sp_complexity="medium"
        [[ "$score" -ge 70 ]] && _sp_complexity="low"
        [[ "$score" -lt 40 ]] && _sp_complexity="high"
        local _sp_rec
        _sp_rec=$(success_recommend_template "$_sp_goal" "$_sp_labels" "$_sp_complexity" 2>/dev/null || echo "")
        if [[ -n "$_sp_rec" ]]; then
            local _sp_template _sp_confidence
            _sp_template=$(echo "$_sp_rec" | jq -r '.template // ""' 2>/dev/null || echo "")
            _sp_confidence=$(echo "$_sp_rec" | jq -r '.confidence // 0' 2>/dev/null || echo "0")
            if [[ -n "$_sp_template" && "$_sp_confidence" -ge 60 ]]; then
                daemon_log INFO "Success patterns: $_sp_template (confidence=${_sp_confidence}%)" >&2
                echo "$_sp_template"
                return
            fi
        fi
    fi
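Two details of the selection layer are worth isolating: the score-to-complexity mapping, and the integer comparison against the 60% threshold. Bash integer tests reject fractional values, so if the recommendation ever emitted a confidence such as `72.5`, the `-ge 60` check would error rather than pass. The sketch below, with hypothetical helper names, shows the mapping and one way to truncate before comparing.

```shell
#!/usr/bin/env bash
# Same thresholds as the selection layer: score >= 70 is low complexity,
# score < 40 is high, anything in between is medium.
score_to_complexity() {
    local score="$1"
    if [ "$score" -ge 70 ]; then
        echo "low"
    elif [ "$score" -lt 40 ]; then
        echo "high"
    else
        echo "medium"
    fi
}

# Hypothetical gate: drop any fractional part before the integer comparison.
# $1: confidence (may be fractional), $2: threshold (assumed default 60).
confidence_passes() {
    local conf="${1%%.*}" threshold="${2:-60}"
    [ -n "$conf" ] && [ "$conf" -ge "$threshold" ]
}
```

An empty confidence fails the gate instead of raising an error, which matches the fall-through behavior expected when the recommender returns nothing.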
In sw-pipeline.sh, during the intake stage, after issue analysis but before template selection:
    # Display success pattern recommendation (informational)
    if type success_display_recommendation >/dev/null 2>&1; then
        success_display_recommendation "$goal" "${labels:-}" "${complexity:-medium}" 2>/dev/null || true
    fi
In sw-pipeline.sh, at pipeline completion (where memory_finalize_pipeline is called), add:
    # Track recommendation acceptance
    if type success_track_acceptance >/dev/null 2>&1; then
        local _rec_template
        _rec_template=$(grep -o 'recommended_template:[^ ]*' "$STATE_FILE" 2>/dev/null | cut -d: -f2 || true)
        if [[ -n "$_rec_template" ]]; then
            success_track_acceptance "$_rec_template" "$PIPELINE_TEMPLATE" 2>/dev/null || true
        fi
    fi
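The tracking step reduces to two small decisions: which event to emit, and how to turn the counters into a rate. A minimal sketch with hypothetical helper names follows; it only computes values and leaves emitting events and updating `success-patterns.json` to the real function.

```shell
#!/usr/bin/env bash
# Pick the event name by comparing recommended vs. actually used template.
acceptance_event() {
    if [ "$1" = "$2" ]; then
        echo "success.recommendation_accepted"
    else
        echo "success.recommendation_rejected"
    fi
}

# Integer acceptance rate in percent; guards against division by zero
# when no recommendations have been made yet.
# $1: recommendations_made, $2: recommendations_accepted
acceptance_rate() {
    local made="$1" accepted="$2"
    if [ "$made" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $(( accepted * 100 / made ))
}
```

One thing to check when wiring this up: the `grep` pattern `recommended_template:[^ ]*` captures nothing if the state file writes a space after the colon, so the metadata format and the pattern need to agree.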
Tests:
- `test_success_patterns_init`: File creation with correct schema
- `test_capture_success_pattern`: Capture from mock pipeline state
- `test_capture_skips_failed_pipeline`: No capture when pipeline failed
- `test_keyword_extraction`: Keywords extracted and expanded correctly
- `test_recommend_cold_start`: Empty result when < 3 patterns
- `test_recommend_similar_issue`: Correct template recommended for similar goal
- `test_recommend_confidence_calculation`: Confidence reflects match quality
- `test_recommend_minimum_matches`: No recommendation below match threshold
- `test_pattern_cap_200`: Oldest patterns evicted at cap
- `test_track_acceptance`: Stats updated on accept/reject
- `test_show_stats`: Stats display works with data
- `test_cli_subcommands`: All subcommands dispatch correctly
- Task 1: Create `scripts/sw-success-patterns.sh` with storage helpers, keyword extraction, and file initialization
- Task 2: Implement `success_capture_pattern()`: reads pipeline state, extracts metadata, writes to success-patterns.json
- Task 3: Implement `success_recommend_template()`: TF-IDF keyword matching, confidence scoring, template grouping
- Task 4: Implement `success_display_recommendation()`: formatted output with confidence and rationale
- Task 5: Implement `success_track_acceptance()` and `success_show_stats()`
- Task 6: Add CLI subcommand dispatcher (capture, recommend, show, stats, accept)
- Task 7: Add event types to `config/event-schema.json`
- Task 8: Integrate capture into `sw-memory.sh` `memory_finalize_pipeline()`
- Task 9: Integrate recommendation into `daemon-triage.sh` `select_pipeline_template()`
- Task 10: Integrate display at pipeline start in `sw-pipeline.sh`
- Task 11: Integrate acceptance tracking at pipeline completion in `sw-pipeline.sh`
- Task 12: Write test suite `scripts/sw-success-patterns-test.sh`
- Task 13: Register test in `package.json` test scripts
- Task 14: Run full test suite and fix any regressions
- Unit tests (`sw-success-patterns-test.sh`): Mock pipeline state files, test each function in isolation with temp directories
- Integration: Verify `memory_finalize_pipeline()` calls capture, `select_pipeline_template()` uses recommendation
- Existing test suites: Run `sw-memory-test.sh`, `sw-lib-daemon-triage-test.sh`, `sw-pipeline-test.sh` to verify no regressions
- Full suite: `npm test` must pass
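A unit test in this style might look like the sketch below: a mock state file in a temp directory, checked with the same `sed` expression the capture hook uses to read the pipeline status. The `assert_eq` helper, the counters, and the `goal:` and `template:` fields in the mock file are assumptions for illustration; the existing test suites may use different conventions.

```shell
#!/usr/bin/env bash
# Minimal assertion helper with pass/fail counters (hypothetical, not the
# project's actual test harness).
PASS=0
FAIL=0
assert_eq() {
    # $1: description, $2: expected, $3: actual
    if [ "$2" = "$3" ]; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1))
        echo "FAIL: $1 (expected '$2', got '$3')" >&2
    fi
}

# Same sed expression the capture hook uses to read the pipeline status.
read_status() {
    sed -n 's/^status: *//p' "$1" | head -1
}

test_status_extraction() {
    local dir
    dir=$(mktemp -d) || return 1
    # Mock state file; only the "status:" line format comes from the plan.
    printf 'goal: Fix typo in README\nstatus: complete\ntemplate: fast\n' > "$dir/pipeline-state.md"
    assert_eq "complete status is read" "complete" "$(read_status "$dir/pipeline-state.md")"
    printf 'status: failed\n' > "$dir/pipeline-state.md"
    assert_eq "failed status is read" "failed" "$(read_status "$dir/pipeline-state.md")"
    rm -rf "$dir"
}

test_status_extraction
echo "passed=$PASS failed=$FAIL"
```

Keeping each test inside its own temp directory means a failed run leaves nothing behind in `~/.shipwright/memory/`.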
- Success patterns captured after every successful pipeline completion
- Patterns stored in `~/.shipwright/memory/<repo>/success-patterns.json` with documented schema
- TF-IDF keyword matching recommends template from similar past successes
- Recommendation displayed at pipeline start with confidence score and rationale
- Confidence threshold (60%) prevents noisy recommendations
- Acceptance rate tracked and available via `success_show_stats()`
- Events emitted for capture, recommendation, acceptance/rejection
- Cold start handled gracefully (no recommendation when < 3 patterns)
- Pattern cap at 200 prevents unbounded growth
- All new code is Bash 3.2 compatible
- Test suite passes with 12+ test cases
- Full `npm test` passes with no regressions
| Approach | Complexity | Performance | Maintainability | Blast Radius |
|---|---|---|---|---|
| A: Standalone script (chosen) | Low | Good (lazy-sourced) | High (isolated) | Minimal |
| B: Extend sw-memory.sh | Low | Good | Medium (2100+ line file) | Medium |
| C: Extend sw-self-optimize.sh | Medium | Good | Low (mixed concerns) | Medium |
| D: SQLite-only (no JSON) | High | Best | Medium (requires DB) | High |
| E: Embedding-based similarity | High | Slow (API calls) | Low | High |
Approach A was chosen for minimal blast radius and clean testability. The existing TF-IDF keyword matching from sw-memory.sh is reused (pattern proven), avoiding the complexity of embeddings. JSON storage matches the existing dual-write pattern and works without SQLite.