-
Notifications
You must be signed in to change notification settings - Fork 0
Pipeline Plan 175
Minimum viable change: A validate_issue_quality() function in the intake stage that scores issue content on heuristics (description length, code references, acceptance criteria presence, vagueness detection) and blocks issues scoring below a configurable threshold (default: 60). Emits validation results to events.jsonl.
Implicit requirements:
- Must not break existing pipelines that use
--goalinstead of--issue(goal-only pipelines skip validation) - Must integrate cleanly with the existing
stage_intake()flow inpipeline-stages-intake.sh - Feedback must be posted back to the GitHub issue so the author knows what to fix
- Must work without
ghCLI (graceful degradation when GitHub is unavailable)
Acceptance criteria (from issue + inferred):
- Issues with minimal content (< 50 chars body) score low and are blocked
- Bug issues claiming code problems are checked for file/code references
- Vague phrases ("make it better", "improve performance", "fix things") are detected and penalized
- Score 0-100 with configurable threshold (default: 60)
- Actionable feedback posted to GitHub issue explaining what's missing
- Validation results emitted to events.jsonl
- Bypass mechanism for urgent/override situations (label or flag)
Approach A: Standalone validation module (CHOSEN)
- New file
scripts/lib/issue-validation.shwith pure scoring functions - Called from
stage_intake()after issue metadata is fetched - Pros: Clean separation, independently testable, minimal blast radius
- Cons: One more file to source
Approach B: Inline in stage_intake()
- Add validation logic directly into
pipeline-stages-intake.sh - Pros: No new files
- Cons: Bloats an already 116-line function, harder to test individual heuristics
Approach C: AI-powered validation (call Claude to score)
- Use
ai_run_json()to have Claude evaluate issue quality - Pros: More nuanced understanding of requirements
- Cons: Adds latency + cost to every pipeline start, defeats the purpose of preventing wasted budget
Decision: Approach A — standalone module. Minimal blast radius, independently testable, follows the existing scripts/lib/*.sh module pattern with double-source guards.
-
False positives blocking good issues: Mitigated by tunable threshold and bypass label (
skip-validation) -
Breaking existing
--goalpipelines: Goal-only pipelines skip validation entirely (no ISSUE_BODY to validate) -
Breaking daemon auto-processing: Daemon spawns pipelines with
--issue; validation will apply. If an issue is blocked, the daemon should mark it and move on (not retry endlessly). We'll emit a specific event type for this. -
Bash 3.2 compatibility: No associative arrays, no
${var,,}— usetr '[:upper:]' '[:lower:]'for case conversion
-
Depends on:
helpers.sh(emit_event, output helpers),pipeline-github.sh(gh_comment_issue for feedback) -
Called by:
pipeline-stages-intake.sh(stage_intake function) - No circular dependency risk — new module is leaf-level, only depends on helpers
-
scripts/lib/issue-validation.sh— Core validation + scoring logic -
scripts/sw-issue-validation-test.sh— Test suite for validation
-
scripts/lib/pipeline-stages-intake.sh— Call validation after issue fetch -
config/event-schema.json— Add new event types for validation -
scripts/sw-pipeline-test.sh— Add integration test for validation in pipeline flow
New module with double-source guard pattern. Functions:
validate_issue_quality(issue_body, issue_title, issue_labels) → returns 0 (pass) or 2 (fail)
Sets globals: VALIDATION_SCORE, VALIDATION_FEEDBACK
_score_description_length(body) → 0-25 points
_score_structure(body) → 0-25 points
_score_specificity(body) → 0-25 points
_score_code_references(body, title) → 0-25 points
_detect_vague_phrases(body) → penalty (0 to -20)
Scoring breakdown (100 points max):
- Description length (0-25): < 50 chars = 0, 50-150 = 10, 150-500 = 20, 500+ = 25
- Structure (0-25): Has headings = +5, has bullet points/numbered lists = +5, has acceptance criteria section = +10, has code blocks = +5
- Specificity (0-25): References specific files/paths = +10, mentions function/class names = +5, has error messages/stack traces = +5, has reproduction steps = +5
- Code references (0-25): Contains file extensions (.js, .sh, .ts, etc.) = +10, references line numbers = +5, has code fences = +5, mentions specific directories = +5
- Vagueness penalty (-20 to 0): Each vague phrase detected = -5 (capped at -20). Vague phrases: "make it better", "improve performance", "fix things", "clean up", "refactor everything", "make it work", "update stuff", "do something about"
Bypass conditions (skip validation, return score=100):
- Issue has label
skip-validationorhotfixorurgent - Pipeline started with
--skip-gatesflag -
SHIPWRIGHT_SKIP_VALIDATION=1env var
Threshold: Read from pipeline config intake.validation_threshold, default 60.
After issue metadata is fetched (line 48 in current pipeline-stages-intake.sh) and before task type detection (line 51), insert:
# Issue quality validation if [[ -n "$ISSUE_BODY" ]]; then if ! validate_issue_quality "$ISSUE_BODY" "$GOAL" "${ISSUE_LABELS:-}"; then # Post feedback to GitHub issue if [[ -n "$ISSUE_NUMBER" ]]; then local feedback_body="## Issue Validation Failed **Actionability Score: ${VALIDATION_SCORE}/100** (threshold: ${VALIDATION_THRESHOLD}) ${VALIDATION_FEEDBACK} --- _Please update this issue with the requested information and re-apply the pipeline label._ _Generated by \`shipwright pipeline\` intake validation at $(now_iso)_" gh_comment_issue "$ISSUE_NUMBER" "$feedback_body" gh_add_labels "$ISSUE_NUMBER" "needs-info" fi emit_event "intake.validation_failed" \ "issue=${ISSUE_NUMBER:-0}" \ "score=$VALIDATION_SCORE" \ "threshold=$VALIDATION_THRESHOLD" error "Issue #${ISSUE_NUMBER} failed actionability check (score: ${VALIDATION_SCORE}/${VALIDATION_THRESHOLD})" return 2 fi emit_event "intake.validation_passed" \ "issue=${ISSUE_NUMBER:-0}" \ "score=$VALIDATION_SCORE" fi
Add to config/event-schema.json:
"intake.validation_passed": { "required": ["issue", "score"], "optional": ["threshold"] }, "intake.validation_failed": { "required": ["issue", "score"], "optional": ["threshold", "feedback"] }
Test cases:
- Empty body → score 0, validation fails
- Minimal body (< 50 chars) → low score, fails
- Well-structured issue with acceptance criteria → high score, passes
- Vague issue ("make it better") → penalized, likely fails
- Bug issue with code references → scores well on specificity
- Issue with
skip-validationlabel → bypasses, returns 100 - Threshold override via config works
- Feedback message contains actionable suggestions
In sw-pipeline-test.sh, add test_intake_validation_blocks_low_quality that:
- Sets up mock gh to return an issue with minimal body
- Runs pipeline with
--issue 999 - Asserts exit code 2 and output contains "failed actionability check"
- Task 1: Create
scripts/lib/issue-validation.shwith double-source guard, scoring functions, andvalidate_issue_quality()entry point - Task 2: Implement
_score_description_length()— length-based scoring (0-25 points) - Task 3: Implement
_score_structure()— structural quality detection (headings, lists, acceptance criteria, code blocks) - Task 4: Implement
_score_specificity()— file references, function names, error messages, repro steps - Task 5: Implement
_score_code_references()— file extensions, line numbers, code fences, directories - Task 6: Implement
_detect_vague_phrases()— pattern matching for vague/non-actionable language - Task 7: Implement bypass logic (labels, flags, env vars) and threshold configuration
- Task 8: Implement
_build_validation_feedback()— generates human-readable feedback listing what's missing - Task 9: Integrate
validate_issue_quality()intostage_intake()inpipeline-stages-intake.sh - Task 10: Add
intake.validation_passedandintake.validation_failedtoconfig/event-schema.json - Task 11: Create
scripts/sw-issue-validation-test.shwith unit tests for all scoring functions - Task 12: Add integration test
test_intake_validation_blocks_low_qualitytosw-pipeline-test.sh - Task 13: Run full test suite (
npm test) and fix any regressions
Unit tests (sw-issue-validation-test.sh):
- Source
issue-validation.shdirectly - Test each
_score_*function with known inputs - Test composite scoring with realistic issue bodies
- Test bypass conditions
- Test feedback generation
Integration tests (sw-pipeline-test.sh):
- Modify mock
ghto return issue bodies with varying quality - Verify pipeline blocks on low-quality issues (exit 2)
- Verify pipeline proceeds on high-quality issues
- Verify events are emitted
Manual verification:
- Run
npm testto ensure no regressions - Verify with
grepthat events appear in correct format
-
validate_issue_quality()scores issues 0-100 using heuristics - Issues scoring below threshold (default 60) are blocked in intake
- Actionable feedback is posted to GitHub issue explaining deficiencies
- Vague phrases are detected and penalized
- Code/file references are rewarded in scoring
- Bypass via
skip-validationlabel,--skip-gatesflag, or env var works - Events
intake.validation_passedandintake.validation_failedemitted to events.jsonl - All new tests pass
- All existing tests pass (
npm test) - Bash 3.2 compatible (no associative arrays, no
${var,,})
Not applicable — this is an internal pipeline stage, not an API endpoint.
- Exit 0: Validation passed (score >= threshold)
- Exit 2: Validation failed (score < threshold) — uses exit 2 per helpers.sh convention for "check condition failed"
Not applicable — runs once per pipeline invocation during intake.
No API versioning needed. Event schema versioned via config/event-schema.json version field (currently "1").
Primary: As a pipeline operator, I want issues to be validated before burning pipeline budget, so that low-quality issues are caught early with actionable feedback instead of wasting 12+ minutes of build time.
Secondary: As an issue author, I want clear feedback when my issue lacks actionable detail, so that I can improve it and re-submit without guessing what's missing.
- Empty state: Issue has no body at all (just a title) → Score 0, feedback says "Issue has no description"
-
Goal-only pipeline:
--goal "Fix the thing"with no--issue→ Validation skipped entirely (no issue body to validate) -
Override state: Urgent hotfix needs to bypass →
hotfixlabel or--skip-gatesflag bypasses validation - Rich issue that scores perfectly: Issue with acceptance criteria, code references, reproduction steps → Score 90+, passes immediately
- Borderline issue (score ~60): Issue with some structure but missing specifics → Passes at default threshold, but operator can raise threshold
Not applicable — this is a new feature, not a performance optimization. No existing metrics to baseline against.
- Validation should complete in < 1 second (pure bash string matching, no AI calls)
- Zero additional API calls beyond existing GitHub issue fetch
Not applicable — pure bash heuristics with no performance concerns.
Not applicable — validation is instantaneous string processing.
- AI-powered validation (Claude scores the issue): More nuanced but adds ~30s latency and ~0ドル.05 per validation. Defeats the goal of preventing wasted budget. Rejected.
- Inline validation in stage_intake(): Simpler but creates a 200+ line function, harder to test. Rejected in favor of standalone module.
- External webhook/GitHub Action: Validates before pipeline starts. More decoupled but requires infrastructure setup and doesn't integrate with pipeline event system. Rejected.
| Risk | Impact | Mitigation |
|---|---|---|
| False positives blocking good issues | Medium — blocks valid work | Configurable threshold + bypass labels + --skip-gates
|
| Daemon retry loops on blocked issues | High — wastes daemon slots | Emit specific event; daemon should skip (not retry) blocked issues |
| Bash 3.2 incompatibility | High — breaks on macOS | No associative arrays; use tr for case conversion; test on Bash 3.2 |
| Breaking existing tests | Medium — CI failure | Run full suite before PR |
- Create
issue-validation.shmodule skeleton (blocks all others) - Implement
_score_description_length()(blocked by 1) - Implement
_score_structure()(blocked by 1) - Implement
_score_specificity()(blocked by 1) - Implement
_score_code_references()(blocked by 1) - Implement
_detect_vague_phrases()(blocked by 1) - Implement bypass logic + threshold config (blocked by 1)
- Implement
_build_validation_feedback()(blocked by 2-6) - Implement
validate_issue_quality()entry point (blocked by 2-8) - Update event schema (independent)
- Integrate into
stage_intake()(blocked by 9, 10) - Write unit tests (blocked by 9)
- Write integration test (blocked by 11)
- Run full test suite (blocked by 12, 13)