Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Pipeline Design 175

ezigus edited this page Mar 17, 2026 · 1 revision

Now I have enough context. Here's the ADR:

Design: Issue pre-flight validation with actionability scoring

Context

Shipwright pipelines currently accept any GitHub issue regardless of content quality. A poorly written issue ("fix things", empty body, no code references) burns 12+ minutes of pipeline budget before failing at the build stage due to ambiguity. The intake stage (scripts/lib/pipeline-stages-intake.sh) fetches issue metadata (lines 13-48) but performs no quality assessment before proceeding to branch creation and task detection.

Constraints:

  • Bash 3.2 compatibility (no associative arrays, no ${var,,})
  • Must not break --goal-only pipelines (no ISSUE_BODY to validate)
  • Must integrate with existing emit_event() and gh_comment_issue() helpers
  • Must follow the scripts/lib/*.sh double-source guard pattern (see helpers.sh:30)
  • Exit code 2 for "check condition failed" per helpers.sh convention (line 10)
  • Daemon pipelines must handle blocked issues gracefully (skip, not retry)

Decision

Standalone validation module at scripts/lib/issue-validation.sh. Pure bash heuristic scoring (no AI calls) invoked from stage_intake() after issue metadata fetch (line 48) and before task type detection (line 51).

Scoring Model (0-100)

Four dimensions, 25 points each, plus a vagueness penalty:

  • Description length (0-25): Graduated by character count (<50=0, 50-150=10, 150-500=20, 500+=25)
  • Structure (0-25): Headings (+5), bullet/numbered lists (+5), acceptance criteria section (+10), code blocks (+5)
  • Specificity (0-25): File path references (+10), function/class names (+5), error messages/stack traces (+5), reproduction steps (+5)
  • Code references (0-25): File extensions (+10), line numbers (+5), code fences (+5), directory references (+5)
  • Vagueness penalty (0 to -20): -5 per vague phrase detected ("make it better", "improve performance", "fix things", etc.), capped at -20

Data Flow

stage_intake()
 → fetch issue metadata (existing, lines 13-48)
 → validate_issue_quality(ISSUE_BODY, GOAL, ISSUE_LABELS)
 → _score_description_length()
 → _score_structure()
 → _score_specificity()
 → _score_code_references()
 → _detect_vague_phrases()
 → sets VALIDATION_SCORE, VALIDATION_FEEDBACK
 → returns 0 (pass) or 2 (fail)
 → on failure: gh_comment_issue() with feedback, gh_add_labels() "needs-info", emit_event "intake.validation_failed", return 2
 → on pass: emit_event "intake.validation_passed", continue to task type detection

Bypass Conditions (returns score=100, skip all checks)

  • Issue labels contain skip-validation, hotfix, or urgent
  • SHIPWRIGHT_SKIP_VALIDATION=1 environment variable
  • Pipeline flag --skip-gates (existing mechanism)

Threshold

Configurable via pipeline config key intake.validation_threshold, default 60. Read with _config_get_int (existing helper from config.sh).

Error Handling

  • Validation failure returns exit 2 (check-condition-failed convention)
  • GitHub API unavailability: validation still runs locally, feedback posting fails silently (existing gh_comment_issue handles $NO_GITHUB)
  • Empty ISSUE_BODY: scores 0 on all dimensions, feedback says "Issue has no description"
  • Goal-only pipelines: validation block is skipped entirely (if [[ -n "$ISSUE_BODY" ]])

Alternatives Considered

  1. Inline validation in stage_intake() — Pros: No new files, simpler sourcing / Cons: Bloats a 116-line function to 200+, individual heuristics untestable in isolation, violates the module pattern used by all other scripts/lib/*.sh files
  2. AI-powered validation (Claude scores the issue) — Pros: More nuanced semantic understanding / Cons: Adds ~30s latency and ~0ドル.05 cost per pipeline start, defeats the goal of preventing wasted budget, creates a dependency on API availability at the earliest pipeline stage
  3. External GitHub Action/webhook — Pros: Validates before pipeline even starts / Cons: Requires infrastructure setup, doesn't integrate with pipeline event system, can't use bypass labels in pipeline context

Implementation Plan

  • Files to create:

    • scripts/lib/issue-validation.sh — Core validation module with double-source guard, scoring functions, bypass logic, feedback builder
    • scripts/sw-issue-validation-test.sh — Unit test suite for all scoring functions and composite validation
  • Files to modify:

    • scripts/lib/pipeline-stages-intake.sh — Insert validation call after line 48 (issue metadata fetch), before line 51 (task type detection). Source the new module.
    • config/event-schema.json — Add intake.validation_passed and intake.validation_failed event types
    • scripts/sw-pipeline-test.sh — Add integration test test_intake_validation_blocks_low_quality
  • Dependencies: None new. Uses existing helpers.sh (emit_event, output helpers), pipeline-github.sh (gh_comment_issue, gh_add_labels), config.sh (_config_get_int)

  • Risk areas:

    • False positives: Well-intentioned short issues (e.g., "Typo in README line 42") could score below threshold. Mitigated by bypass labels and tunable threshold.
    • Daemon retry loops: If daemon retries a blocked issue, it burns slots. Must emit intake.validation_failed event so daemon dispatch can recognize and skip.
    • Bash string matching edge cases: Vague phrase detection uses grep -i with word boundaries; regex must be portable across BSD/GNU grep. Use grep -iE with \b-free patterns (use [^a-z] boundaries instead for Bash 3.2/BSD compat).

Validation Criteria

  • Empty issue body (title only) produces score 0 and returns exit 2
  • Issue with < 50 char body scores below threshold and is blocked
  • Well-structured issue (headings, acceptance criteria, code refs, 500+ chars) scores >= 80
  • Each vague phrase ("make it better", "fix things") reduces score by 5, capped at -20
  • Issues with skip-validation / hotfix / urgent label bypass validation (score=100)
  • SHIPWRIGHT_SKIP_VALIDATION=1 env var bypasses validation
  • Feedback posted to GitHub issue lists specific deficiencies with actionable suggestions
  • intake.validation_passed event emitted with score on success
  • intake.validation_failed event emitted with score and threshold on failure
  • --goal-only pipelines (no --issue) skip validation entirely
  • All existing tests pass (npm test) with no regressions
  • All scoring functions work under Bash 3.2 (no associative arrays, no ${var,,})

Test Pyramid Breakdown

  • Unit tests (~12 tests in sw-issue-validation-test.sh): Each _score_* function tested with known inputs (empty, minimal, rich), vague phrase detection, bypass conditions, threshold override, feedback message content, composite validate_issue_quality() with realistic issue bodies
  • Integration tests (~2 tests in sw-pipeline-test.sh): Pipeline blocks on low-quality mock issue (exit 2, correct output); pipeline proceeds on high-quality mock issue
  • E2E tests (0): Not applicable — validation is deterministic string processing with no external dependencies beyond mocked gh

Coverage Targets

  • Unit: 100% of scoring functions and bypass paths (this is pure logic with no external deps)
  • Integration: Happy path (pass) and blocking path (fail) through stage_intake()

Critical Paths to Test

  • Happy path: Rich issue with acceptance criteria, code references → score >= 80 → pipeline proceeds
  • Error case 1: Empty body → score 0 → exit 2 → feedback posted → needs-info label added
  • Error case 2: Vague issue ("make it better, improve performance, fix things") → penalty applied → score below threshold → blocked
  • Edge case 1: Issue with hotfix label + empty body → bypass → score 100 → pipeline proceeds
  • Edge case 2: Borderline issue scoring exactly at threshold (60) → passes (>= comparison)

Clone this wiki locally

AltStyle によって変換されたページ (->オリジナル) /