Pipeline Plan 175

ezigus edited this page Mar 17, 2026 · 1 revision

Implementation Plan: Issue Pre-flight Validation with Actionability Scoring

Socratic Design Refinement

Requirements Clarity

Minimum viable change: A validate_issue_quality() function in the intake stage that scores issue content on heuristics (description length, code references, acceptance criteria presence, vagueness detection) and blocks issues scoring below a configurable threshold (default: 60). Emits validation results to events.jsonl.

Implicit requirements:

Must not break existing pipelines that use --goal instead of --issue (goal-only pipelines skip validation)
Must integrate cleanly with the existing stage_intake() flow in pipeline-stages-intake.sh
Feedback must be posted back to the GitHub issue so the author knows what to fix
Must work without gh CLI (graceful degradation when GitHub is unavailable)

Acceptance criteria (from issue + inferred):

Issues with minimal content (< 50 chars body) score low and are blocked
Bug issues claiming code problems are checked for file/code references
Vague phrases ("make it better", "improve performance", "fix things") are detected and penalized
Score 0-100 with configurable threshold (default: 60)
Actionable feedback posted to GitHub issue explaining what's missing
Validation results emitted to events.jsonl
Bypass mechanism for urgent/override situations (label or flag)

Design Alternatives

Approach A: Standalone validation module (CHOSEN)

New file scripts/lib/issue-validation.sh with pure scoring functions
Called from stage_intake() after issue metadata is fetched
Pros: Clean separation, independently testable, minimal blast radius
Cons: One more file to source

Approach B: Inline in stage_intake()

Add validation logic directly into pipeline-stages-intake.sh
Pros: No new files
Cons: Bloats an already 116-line function, harder to test individual heuristics

Approach C: AI-powered validation (call Claude to score)

Use ai_run_json() to have Claude evaluate issue quality
Pros: More nuanced understanding of requirements
Cons: Adds latency + cost to every pipeline start, defeats the purpose of preventing wasted budget

Decision: Approach A — standalone module. Minimal blast radius, independently testable, follows the existing scripts/lib/*.sh module pattern with double-source guards.

Risk Assessment

False positives blocking good issues: Mitigated by tunable threshold and bypass label (skip-validation)
Breaking existing --goal pipelines: Goal-only pipelines skip validation entirely (no ISSUE_BODY to validate)
Breaking daemon auto-processing: Daemon spawns pipelines with --issue, so validation will apply. If an issue is blocked, the daemon should mark it and move on rather than retry endlessly. The intake.validation_failed event is the signal the daemon can key on.
Bash 3.2 compatibility: No associative arrays, no ${var,,} — use tr '[:upper:]' '[:lower:]' for case conversion
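
The Bash 3.2 constraint above can be illustrated with a minimal sketch. The helper names `_to_lower` and `_contains_ci` are hypothetical and not part of the plan; they only show `tr` standing in for `${var,,}` and a `case` pattern standing in for a lookup table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): lowercase a string without
# ${var,,}, which Bash 3.2 does not support.
_to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Case-insensitive substring check built on the helper above.
_contains_ci() {
  local haystack needle
  haystack="$(_to_lower "$1")"
  needle="$(_to_lower "$2")"
  case "$haystack" in
    *"$needle"*) return 0 ;;   # quoted needle is matched literally
    *)           return 1 ;;
  esac
}
```

Both functions rely only on features available in Bash 3.2, so they should behave the same on macOS and Linux.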

Dependency Analysis

Depends on: helpers.sh (emit_event, output helpers). Feedback posting uses gh_comment_issue from pipeline-github.sh, which the Step 2 snippet calls from stage_intake() rather than from the module itself
Called by: pipeline-stages-intake.sh (stage_intake function)
No circular dependency risk: the new module is leaf-level and depends only on helpers.sh

Files to Modify

New Files

scripts/lib/issue-validation.sh — Core validation + scoring logic
scripts/sw-issue-validation-test.sh — Test suite for validation

Modified Files

scripts/lib/pipeline-stages-intake.sh — Call validation after issue fetch
config/event-schema.json — Add new event types for validation
scripts/sw-pipeline-test.sh — Add integration test for validation in pipeline flow

Implementation Steps

Step 1: Create `scripts/lib/issue-validation.sh`

New module with double-source guard pattern. Functions:

validate_issue_quality(issue_body, issue_title, issue_labels) → returns 0 (pass) or 2 (fail)
 Sets globals: VALIDATION_SCORE, VALIDATION_FEEDBACK, VALIDATION_THRESHOLD (the last is read by the Step 2 snippet when it reports the result)
_score_description_length(body) → 0-25 points
_score_structure(body) → 0-25 points
_score_specificity(body) → 0-25 points
_score_code_references(body, title) → 0-25 points
_detect_vague_phrases(body) → penalty (0 to -20)
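
A minimal sketch of how the module skeleton and entry point might fit together. The guard variable name `_ISSUE_VALIDATION_LOADED`, the stubbed scorers, and the clamping of the total are assumptions made for illustration; bypass handling and feedback generation are left out.

```bash
#!/usr/bin/env bash
# Sketch of scripts/lib/issue-validation.sh. The guard variable name is an
# assumption; the return only fires when the file is sourced a second time.
[[ -n "${_ISSUE_VALIDATION_LOADED:-}" ]] && return 0
_ISSUE_VALIDATION_LOADED=1

# Placeholder scorers so the sketch runs on its own. The real functions
# would apply the heuristics described in the scoring breakdown.
_score_description_length() { echo 20; }
_score_structure()          { echo 15; }
_score_specificity()        { echo 10; }
_score_code_references()    { echo 10; }
_detect_vague_phrases()     { echo "-5"; }

validate_issue_quality() {
  local body="$1" title="$2" labels="${3:-}"   # labels unused in this sketch
  local len struct spec refs penalty total
  len=$(_score_description_length "$body")
  struct=$(_score_structure "$body")
  spec=$(_score_specificity "$body")
  refs=$(_score_code_references "$body" "$title")
  penalty=$(_detect_vague_phrases "$body")
  total=$(( len + struct + spec + refs + penalty ))

  # Clamp to 0-100 so the penalty cannot push the score negative.
  [[ $total -lt 0 ]] && total=0
  [[ $total -gt 100 ]] && total=100

  VALIDATION_THRESHOLD="${VALIDATION_THRESHOLD:-60}"
  VALIDATION_SCORE=$total
  VALIDATION_FEEDBACK=""   # the real module would fill this in

  [[ $VALIDATION_SCORE -ge $VALIDATION_THRESHOLD ]] && return 0
  return 2
}
```

With the placeholder values the total is 50, so the function returns 2 at the default threshold of 60.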

Scoring breakdown (100 points max):

Description length (0-25): < 50 chars = 0, 50-149 = 10, 150-499 = 20, 500+ = 25
Structure (0-25): Has headings = +5, has bullet points/numbered lists = +5, has acceptance criteria section = +10, has code blocks = +5
Specificity (0-25): References specific files/paths = +10, mentions function/class names = +5, has error messages/stack traces = +5, has reproduction steps = +5
Code references (0-25): Contains file extensions (.js, .sh, .ts, etc.) = +10, references line numbers = +5, has code fences = +5, mentions specific directories = +5
Vagueness penalty (-20 to 0): Each vague phrase detected = -5 (capped at -20). Vague phrases: "make it better", "improve performance", "fix things", "clean up", "refactor everything", "make it work", "update stuff", "do something about"
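
Two of the scorers in the breakdown above can be sketched as follows. This is an illustration under stated assumptions, not the final implementation: a body of exactly 150 or 500 characters is placed in the higher band, and each vague phrase is counted once no matter how often it repeats.

```bash
#!/usr/bin/env bash
# Length band scoring. Placing exactly 150 and 500 characters in the higher
# band is an assumption.
_score_description_length() {
  local len=${#1}
  if   [[ $len -lt 50 ]];  then echo 0
  elif [[ $len -lt 150 ]]; then echo 10
  elif [[ $len -lt 500 ]]; then echo 20
  else                          echo 25
  fi
}

# Vagueness penalty: -5 per distinct phrase found, capped at -20.
_detect_vague_phrases() {
  local body penalty=0 phrase
  body="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  # Plain word list in a for loop: Bash 3.2 has no associative arrays.
  for phrase in "make it better" "improve performance" "fix things" \
                "clean up" "refactor everything" "make it work" \
                "update stuff" "do something about"; do
    case "$body" in
      *"$phrase"*) penalty=$(( penalty - 5 )) ;;
    esac
  done
  [[ $penalty -lt -20 ]] && penalty=-20
  echo "$penalty"
}
```

Substring matching keeps the check fast, at the cost of also flagging phrases such as "clean up" when they appear inside an otherwise specific sentence.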

Bypass conditions (skip validation, return score=100):

Issue has label skip-validation or hotfix or urgent
Pipeline started with --skip-gates flag
SHIPWRIGHT_SKIP_VALIDATION=1 env var
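
The bypass conditions above might be checked along these lines. `_should_bypass_validation` is a hypothetical name, `SKIP_GATES` is a hypothetical variable standing in for however the pipeline records the --skip-gates flag, and labels are assumed to arrive as a comma-separated string.

```bash
#!/usr/bin/env bash
# Returns 0 when validation should be skipped, 1 otherwise.
# SKIP_GATES and the comma-separated label format are assumptions.
_should_bypass_validation() {
  local labels="$1" normalized
  [[ "${SHIPWRIGHT_SKIP_VALIDATION:-0}" == "1" ]] && return 0
  [[ "${SKIP_GATES:-false}" == "true" ]] && return 0

  # Lowercase, strip spaces, and wrap in commas so each label can be
  # matched exactly (a label like "not-urgent" must not match "urgent").
  normalized=",$(printf '%s' "$labels" | tr '[:upper:]' '[:lower:]' | tr -d ' '),"
  case "$normalized" in
    *,skip-validation,*|*,hotfix,*|*,urgent,*) return 0 ;;
  esac
  return 1
}
```

A caller could set the score to 100 and return early whenever this function succeeds, which matches the "return score=100" behavior described above.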

Threshold: Read from pipeline config intake.validation_threshold, default 60.
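
A small sketch of how a raw threshold value could be sanitized before use. `_resolve_threshold` is a hypothetical name; how the raw value is read from the pipeline config is left out, since the config format is not described here.

```bash
#!/usr/bin/env bash
# Normalize a raw threshold value: fall back to 60 when it is missing or
# non-numeric, and cap it at 100. The function name is an assumption.
_resolve_threshold() {
  local raw="${1:-}" value
  case "$raw" in
    ''|*[!0-9]*) echo 60; return 0 ;;   # empty or contains a non-digit
  esac
  # Force base 10 so a value such as "080" is not parsed as octal.
  value=$(( 10#$raw ))
  [[ $value -gt 100 ]] && value=100
  echo "$value"
}
```

Falling back to the default on bad input keeps a misconfigured threshold from blocking every issue or letting every issue through.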

Step 2: Integrate into `stage_intake()`

After issue metadata is fetched (line 48 in current pipeline-stages-intake.sh) and before task type detection (line 51), insert:

# Issue quality validation
if [[ -n "$ISSUE_BODY" ]]; then
  if ! validate_issue_quality "$ISSUE_BODY" "$GOAL" "${ISSUE_LABELS:-}"; then
    # Post feedback to GitHub issue
    if [[ -n "$ISSUE_NUMBER" ]]; then
      local feedback_body="## Issue Validation Failed

**Actionability Score: ${VALIDATION_SCORE}/100** (threshold: ${VALIDATION_THRESHOLD})

${VALIDATION_FEEDBACK}

---
_Please update this issue with the requested information and re-apply the pipeline label._
_Generated by \`shipwright pipeline\` intake validation at $(now_iso)_"
      # Best effort: a missing or failing gh must not change the exit code below
      gh_comment_issue "$ISSUE_NUMBER" "$feedback_body" || true
      gh_add_labels "$ISSUE_NUMBER" "needs-info" || true
    fi
    emit_event "intake.validation_failed" \
      "issue=${ISSUE_NUMBER:-0}" \
      "score=$VALIDATION_SCORE" \
      "threshold=$VALIDATION_THRESHOLD"
    error "Issue #${ISSUE_NUMBER} failed actionability check (score: ${VALIDATION_SCORE}/${VALIDATION_THRESHOLD})"
    return 2
  fi
  emit_event "intake.validation_passed" \
    "issue=${ISSUE_NUMBER:-0}" \
    "score=$VALIDATION_SCORE"
fi

Step 3: Update event schema

Add to config/event-schema.json:

"intake.validation_passed": {
 "required": ["issue", "score"],
 "optional": ["threshold"]
},
"intake.validation_failed": {
 "required": ["issue", "score"],
 "optional": ["threshold", "feedback"]
}

Step 4: Write tests in `scripts/sw-issue-validation-test.sh`

Test cases:

Empty body → score 0, validation fails
Minimal body (< 50 chars) → low score, fails
Well-structured issue with acceptance criteria → high score, passes
Vague issue ("make it better") → penalized, likely fails
Bug issue with code references → scores well on specificity
Issue with skip-validation label → bypasses, returns 100
Threshold override via config works
Feedback message contains actionable suggestions
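
The test cases above could be driven by a very small harness such as the one sketched here. `assert_eq`, the `PASS`/`FAIL` counters, and the test function names are hypothetical, and the scorer is a stand-in so the example runs without the real module.

```bash
#!/usr/bin/env bash
# Tiny assertion harness. All names here are assumptions for illustration.
PASS=0
FAIL=0

assert_eq() {
  local desc="$1" expected="$2" actual="$3"
  if [[ "$expected" == "$actual" ]]; then
    PASS=$(( PASS + 1 ))
    echo "ok   - $desc"
  else
    FAIL=$(( FAIL + 1 ))
    echo "FAIL - $desc (expected '$expected', got '$actual')"
  fi
}

# Stand-in scorer. The real test file would source issue-validation.sh.
_score_description_length() {
  if [[ ${#1} -lt 50 ]]; then echo 0; else echo 10; fi
}

test_empty_body_scores_zero() {
  assert_eq "empty body scores 0" "0" "$(_score_description_length "")"
}

test_minimal_body_scores_zero() {
  assert_eq "body under 50 chars scores 0" "0" "$(_score_description_length "fix it")"
}

test_empty_body_scores_zero
test_minimal_body_scores_zero
echo "passed=$PASS failed=$FAIL"
```

A real test script would typically end by exiting non-zero when `FAIL` is greater than 0 so that the suite fails in CI.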

Step 5: Add pipeline integration test

In sw-pipeline-test.sh, add test_intake_validation_blocks_low_quality that:

Sets up mock gh to return an issue with minimal body
Runs pipeline with --issue 999
Asserts exit code 2 and output contains "failed actionability check"
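
One way to set up the mock gh for this test is to place a fake executable ahead of the real one on PATH. The sketch below is an assumption about how the existing test suite might do it; the JSON shape returned by the mock is illustrative and may not match what the intake stage actually requests.

```bash
#!/usr/bin/env bash
# Put a fake gh first on PATH. The JSON fields below are assumptions.
MOCK_DIR="$(mktemp -d)"
cat > "$MOCK_DIR/gh" <<'EOF'
#!/usr/bin/env bash
# Fake gh: answer "issue view" with a minimal-body issue and silently
# accept every other subcommand (comment, label edits, and so on).
if [[ "$1" == "issue" && "$2" == "view" ]]; then
  echo '{"title":"Fix it","body":"broken","labels":[]}'
fi
exit 0
EOF
chmod +x "$MOCK_DIR/gh"
PATH="$MOCK_DIR:$PATH"
export PATH

# The integration test would now run the pipeline with --issue 999 and
# assert exit code 2. Here we only show that the mock is what gets called.
gh issue view 999 --json title,body,labels
```

Because the mock returns a body well under 50 characters, the pipeline under test would be expected to stop at intake.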

Task Checklist

Task 1: Create scripts/lib/issue-validation.sh with double-source guard, scoring functions, and validate_issue_quality() entry point
Task 2: Implement _score_description_length() — length-based scoring (0-25 points)
Task 3: Implement _score_structure() — structural quality detection (headings, lists, acceptance criteria, code blocks)
Task 4: Implement _score_specificity() — file references, function names, error messages, repro steps
Task 5: Implement _score_code_references() — file extensions, line numbers, code fences, directories
Task 6: Implement _detect_vague_phrases() — pattern matching for vague/non-actionable language
Task 7: Implement bypass logic (labels, flags, env vars) and threshold configuration
Task 8: Implement _build_validation_feedback() — generates human-readable feedback listing what's missing
Task 9: Integrate validate_issue_quality() into stage_intake() in pipeline-stages-intake.sh
Task 10: Add intake.validation_passed and intake.validation_failed to config/event-schema.json
Task 11: Create scripts/sw-issue-validation-test.sh with unit tests for all scoring functions
Task 12: Add integration test test_intake_validation_blocks_low_quality to sw-pipeline-test.sh
Task 13: Run full test suite (npm test) and fix any regressions
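
Task 8 is the only function not described elsewhere in the plan, so a rough sketch may help. The argument order, the cut-off values, and the wording of each suggestion are all assumptions; the plan only names `_build_validation_feedback()` and says it lists what is missing.

```bash
#!/usr/bin/env bash
# Sketch of _build_validation_feedback. Arguments (assumed): the four
# sub-scores followed by the vagueness penalty. Prints one markdown bullet
# per weak area and nothing when every area scores well.
_build_validation_feedback() {
  local len_score="$1" struct_score="$2" spec_score="$3" refs_score="$4" penalty="$5"
  if [[ $len_score -lt 20 ]]; then
    echo "- Expand the description: state the problem and the expected behavior."
  fi
  if [[ $struct_score -lt 10 ]]; then
    echo "- Add an acceptance criteria section, ideally as a checklist."
  fi
  if [[ $spec_score -lt 10 ]]; then
    echo "- Name the affected files, functions, or error messages."
  fi
  if [[ $refs_score -lt 10 ]]; then
    echo "- Reference file paths or include a code block showing the problem."
  fi
  if [[ $penalty -lt 0 ]]; then
    echo "- Replace vague phrases with a concrete, testable outcome."
  fi
  return 0
}
```

The output can be captured into `VALIDATION_FEEDBACK` with command substitution and then embedded in the issue comment.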

Testing Approach

Unit tests (sw-issue-validation-test.sh):

Source issue-validation.sh directly
Test each _score_* function with known inputs
Test composite scoring with realistic issue bodies
Test bypass conditions
Test feedback generation

Integration tests (sw-pipeline-test.sh):

Modify mock gh to return issue bodies with varying quality
Verify pipeline blocks on low-quality issues (exit 2)
Verify pipeline proceeds on high-quality issues
Verify events are emitted

Manual verification:

Run npm test to ensure no regressions
Verify with grep that events appear in correct format

Definition of Done

validate_issue_quality() scores issues 0-100 using heuristics
Issues scoring below threshold (default 60) are blocked in intake
Actionable feedback is posted to GitHub issue explaining deficiencies
Vague phrases are detected and penalized
Code/file references are rewarded in scoring
Bypass via skip-validation, hotfix, or urgent label, --skip-gates flag, or SHIPWRIGHT_SKIP_VALIDATION=1 env var works
Events intake.validation_passed and intake.validation_failed emitted to events.jsonl
All new tests pass
All existing tests pass (npm test)
Bash 3.2 compatible (no associative arrays, no ${var,,})

Endpoint Specification

Not applicable — this is an internal pipeline stage, not an API endpoint.

Error Codes

Exit 0: Validation passed (score >= threshold)
Exit 2: Validation failed (score < threshold) — uses exit 2 per helpers.sh convention for "check condition failed"
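
A caller that needs to tell "blocked by validation" apart from other failures could branch on the exit code as sketched below. `run_intake` is a hypothetical stand-in for the stage, and the action strings are placeholders.

```bash
#!/usr/bin/env bash
# run_intake is a stand-in that simply returns the code it is given.
run_intake() { return "${1:-0}"; }

handle_intake_result() {
  local rc=0
  # Capture the exit code without tripping `set -e` if it is enabled.
  run_intake "$1" || rc=$?
  case "$rc" in
    0) echo "proceed" ;;   # score >= threshold
    2) echo "blocked" ;;   # validation failed: mark the issue, do not retry
    *) echo "error" ;;     # any other code is an unexpected failure
  esac
}
```

Treating exit 2 separately is what lets a daemon skip a blocked issue instead of retrying it.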

Rate Limiting

Not applicable — runs once per pipeline invocation during intake.

Versioning

No API versioning needed. Event schema versioned via config/event-schema.json version field (currently "1").

User Stories

Primary: As a pipeline operator, I want issues to be validated before burning pipeline budget, so that low-quality issues are caught early with actionable feedback instead of wasting 12+ minutes of build time.

Secondary: As an issue author, I want clear feedback when my issue lacks actionable detail, so that I can improve it and re-submit without guessing what's missing.

Edge Cases from User Perspective

Empty state: Issue has no body at all (just a title) → Score 0, feedback says "Issue has no description"
Goal-only pipeline: --goal "Fix the thing" with no --issue → Validation skipped entirely (no issue body to validate)
Override state: Urgent hotfix needs to bypass → hotfix label or --skip-gates flag bypasses validation
Rich issue: Issue with acceptance criteria, code references, reproduction steps → Score 90+, passes immediately
Borderline issue (score ~60): Issue with some structure but missing specifics → Passes at the default threshold only if the score is 60 or higher; the operator can raise the threshold to be stricter

Baseline Metrics

Not applicable — this is a new feature, not a performance optimization. No existing metrics to baseline against.

Optimization Targets

Validation should complete in < 1 second (pure bash string matching, no AI calls)
Zero additional API calls beyond existing GitHub issue fetch

Profiling Strategy

Not applicable — pure bash heuristics with no performance concerns.

Benchmark Plan

Not applicable — validation is instantaneous string processing.

Alternatives Considered

AI-powered validation (Claude scores the issue): More nuanced but adds ~30s latency and ~$0.05 per validation. Defeats the goal of preventing wasted budget. Rejected.
Inline validation in stage_intake(): Simpler but creates a 200+ line function, harder to test. Rejected in favor of standalone module.
External webhook/GitHub Action: Validates before pipeline starts. More decoupled but requires infrastructure setup and doesn't integrate with pipeline event system. Rejected.

Risk Analysis

| Risk | Impact | Mitigation |
| --- | --- | --- |
| False positives blocking good issues | Medium: blocks valid work | Configurable threshold + bypass labels + `--skip-gates` |
| Daemon retry loops on blocked issues | High: wastes daemon slots | Emit specific event; daemon should skip (not retry) blocked issues |
| Bash 3.2 incompatibility | High: breaks on macOS | No associative arrays; use `tr` for case conversion; test on Bash 3.2 |
| Breaking existing tests | Medium: CI failure | Run full suite before PR |

Task Decomposition with Dependencies

1. Create issue-validation.sh module skeleton (blocks all others)
2. Implement _score_description_length() (blocked by 1)
3. Implement _score_structure() (blocked by 1)
4. Implement _score_specificity() (blocked by 1)
5. Implement _score_code_references() (blocked by 1)
6. Implement _detect_vague_phrases() (blocked by 1)
7. Implement bypass logic + threshold config (blocked by 1)
8. Implement _build_validation_feedback() (blocked by 2-6)
9. Implement validate_issue_quality() entry point (blocked by 2-8)
10. Update event schema (independent)
11. Integrate into stage_intake() (blocked by 9, 10)
12. Write unit tests (blocked by 9)
13. Write integration test (blocked by 11)
14. Run full test suite (blocked by 12, 13)
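
The dependency list above can be sanity-checked with the standard `tsort` utility, which prints one valid execution order and reports an error if the graph contains a cycle. The sketch assumes the tasks are numbered 1 to 14 in the order listed; `task_order` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Each "A B" pair means task A must finish before task B can start.
# Numbering (assumed): 1 = module skeleton ... 14 = run full test suite.
task_order() {
  tsort <<'EOF'
1 2
1 3
1 4
1 5
1 6
1 7
2 8
3 8
4 8
5 8
6 8
2 9
3 9
4 9
5 9
6 9
7 9
8 9
9 11
10 11
9 12
11 13
12 14
13 14
EOF
}

task_order
```

Several orders satisfy these constraints, so the exact output may differ between `tsort` implementations. What matters is that every task appears after the tasks that block it.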
