Pipeline Design 327

ezigus edited this page Apr 19, 2026 · 1 revision


Design: feat(ruflo): store test stage results in ruflo and recall historical flakiness patterns

Context

The stage_test() function in scripts/lib/pipeline-stages-build.sh:596–726 currently has two gaps:

No historical recall. Unlike stage_test_first (lines 20–31), which calls ruflo_recall_similar_outcomes before generating tests, stage_test runs blind: it never queries ruflo for past flakiness patterns that could inform the developer reading the log.
Store only fires on pass. The ruflo_store at line 719 sits after the pass-path GitHub comment. The failure branch exits via return 1 at line 665, so failed test runs are never persisted. This drops failures, the single most valuable data point for flakiness tracking.

The stage_test_first integration (lines 20–31, 137–160) provides a proven template: declare -f guards, ruflo_available check, printf '%.2000s' truncation, || true fail-open, and learning-<repo_hash> namespace for semantic recall. Four prior memory entries document hard-won conventions: plain-text recall (not JSON), namespace matching, and declare -f guards.

Constraints

Bash 3.2 compatible (no associative arrays, no ${var,,})
All ruflo calls must be fail-open — test execution must never be blocked by ruflo failures
set -euo pipefail is active; unguarded failures will abort the pipeline
Test file (sw-ruflo-adapter-test.sh, 2531 lines) uses a mock-stub pattern: override functions, log args to temp files, assert against those files

Decision

Data Flow

stage_test() entry
 │
 ├─ 1. RECALL: ruflo_recall("test flakiness patterns failures",
 │ "pipeline-<PIPELINE_ID>")
 │ → log result via info() for human visibility
 │ → does NOT gate test execution
 │
 ├─ 2. RUN TESTS: bash -c "$test_cmd"
 │
 ├─ 3a. ON FAIL (before return 1 at line 665):
 │ ruflo_store("stage-test-result",
 │ "Tests FAILED (exit $test_exit). Count: <N>. Cmd: <cmd>. Coverage: 0%. Time: <ISO8601>.",
 │ "pipeline-<PIPELINE_ID>",
 │ "test,stage_test,failed")
 │
 └─ 3b. ON PASS (replacing lines 718–723):
 ruflo_store("stage-test-result",
 "Tests PASSED. Count: <N>. Cmd: <cmd>. Coverage: <pct>%. Time: <ISO8601>.",
 "pipeline-<PIPELINE_ID>",
 "test,stage_test,passed")
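
Steps 3a and 3b of the flow above can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline's actual code: ruflo_store is replaced by a stand-in stub that only echoes its arguments, the helper name store_test_result is hypothetical, and the argument order (key, value, namespace, tags) is taken from the diagram.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in stub: the real ruflo_store is sourced from ruflo-adapter.sh.
ruflo_store() { printf 'key=%s|value=%s|ns=%s|tags=%s\n' "$1" "$2" "$3" "$4"; }

# Hypothetical helper: builds the summary line and stores it, fail-open.
# Args: outcome (passed|failed), exit code, test count, command, coverage pct
store_test_result() {
  local outcome="$1" test_exit="$2" count="$3" cmd="$4" coverage="$5"
  local ns="pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}"
  local ts summary
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"   # ISO 8601 timestamp in UTC
  if [ "$outcome" = "failed" ]; then
    summary="Tests FAILED (exit ${test_exit}). Count: ${count}. Cmd: ${cmd}. Coverage: 0%. Time: ${ts}."
  else
    summary="Tests PASSED. Count: ${count}. Cmd: ${cmd}. Coverage: ${coverage}%. Time: ${ts}."
  fi
  # || true keeps a store failure from aborting the stage under set -e.
  ruflo_store "stage-test-result" "$summary" "$ns" "test,stage_test,${outcome}" || true
}

SHIPWRIGHT_PIPELINE_ID="demo-1"   # made-up ID for the demonstration
fail_line="$(store_test_result failed 2 14 "npm test" 0)"
pass_line="$(store_test_result passed 0 14 "npm test" 87)"
echo "$fail_line"
echo "$pass_line"
```

On the fail path the helper would be called before return 1, and on the pass path in place of the existing store, so both outcomes land in the same namespace under the same key.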

Key Design Choices

ruflo_recall (not ruflo_recall_similar_outcomes): The recall here is a simple keyword query against the pipeline namespace for prior test results — not a semantic TDD outcome lookup. ruflo_recall is the correct function because we're searching a fixed namespace with a keyword query, not doing cross-namespace semantic matching.
Namespace: pipeline-<PIPELINE_ID>: Stores and recalls against the same pipeline namespace. This differs from stage_test_first which uses learning-<repo_hash> for cross-pipeline TDD recall. Here, the goal is run-to-run flakiness tracking within the pipeline's history, so pipeline-* is correct.
Tags for classification: test,stage_test,passed or test,stage_test,failed enable future queries to filter by outcome. Tags are comma-separated strings matching the existing convention (see stage_test_first at line 157: "tdd,test_first,${TASK_TYPE:-feature}").
Test count via grep: grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' against the test log provides a coarse count. This is imprecise but consistent with how the pipeline already parses coverage (grep -oE '[0-9]+%' at line 711). Not worth a parsing overhaul for a display-only field.
Fail-open at every layer: declare -f + ruflo_available + || true + empty-string fallback. Four layers of protection, matching the established pattern.
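
The guard layers described above might look like this around the recall step. It is a sketch only: info, ruflo_available, and ruflo_recall are stand-in stubs, recall_test_history is a hypothetical helper name, and the sample history text is invented for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in stubs; the real functions are sourced from ruflo-adapter.sh.
info() { echo "INFO: $*"; }
ruflo_available() { return 0; }
ruflo_recall() { echo "sample history line (made up for this demo)"; }

# Hypothetical helper showing the guard layers around a recall.
recall_test_history() {
  local ns="pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}"
  local history=""
  # Layers 1-2: the functions must be defined and ruflo must report available.
  if declare -f ruflo_recall >/dev/null 2>&1 \
     && declare -f ruflo_available >/dev/null 2>&1 \
     && ruflo_available; then
    # Layer 3: || true keeps a failing recall from tripping set -e.
    history="$(ruflo_recall "test flakiness patterns failures" "$ns" 2>/dev/null || true)"
  fi
  # Layer 4: an empty result simply produces no log line.
  if [ -n "$history" ]; then
    info "Ruflo recall: historical test patterns found"
    info "$history"
  fi
  return 0
}

out_ok="$(recall_test_history)"

# Simulate an outage: recall now fails, yet the helper still returns 0.
ruflo_recall() { return 1; }
out_err="$(recall_test_history)"

echo "$out_ok"
```

Because the helper always returns 0 and never writes to anything the test command reads, the recall stays informational and cannot gate test execution.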

Error Handling

Failure Mode	Behavior
`ruflo_recall` returns empty	No info log emitted, test proceeds normally
`ruflo_recall` errors	Caught by `|| true`
`ruflo_store` errors on fail path	Caught by `|| true`
`ruflo_store` errors on pass path	Caught by `|| true`
`ruflo_available` returns false	Entire block skipped
`ruflo_recall`/`ruflo_store` not defined	`declare -f` guard skips block
`SHIPWRIGHT_PIPELINE_ID` unset	Falls back to `pipeline-unknown` namespace
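
The last row matters because set -u is active: referencing an unset variable directly would abort the stage. A minimal sketch of the fallback, assuming the namespace is built with a default-value expansion (the pipeline ID 327 is used purely as an illustration):

```shell
#!/usr/bin/env bash
set -euo pipefail

unset SHIPWRIGHT_PIPELINE_ID || true

# With set -u, a bare "$SHIPWRIGHT_PIPELINE_ID" would abort here.
# The :- expansion substitutes "unknown" when the variable is unset or empty.
ns_unset="pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}"

SHIPWRIGHT_PIPELINE_ID="327"
ns_set="pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}"

echo "$ns_unset"
echo "$ns_set"
```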

Alternatives Considered

Store in learning-<repo_hash> namespace (like stage_test_first) — Pros: cross-pipeline recall; semantic matching works better with learning namespace. Cons: test results are pipeline-specific artifacts, not reusable TDD patterns; mixing them with TDD outcomes pollutes that namespace; the recall query would need different semantics. The pipeline namespace is the right scope for run-to-run flakiness data.
Structured JSON storage instead of plain text — Pros: machine-parseable, enables richer future queries. Cons: ruflo_recall returns plain text (per feedback_ruflo_recall_plain_text.md), so JSON storage would require a separate retrieval+parse path; adds complexity for no current consumer; breaks the convention established by stage_build stores (lines 474, 498, 588) which all use plain text.
Gate test execution on flakiness recall (skip known-flaky tests) — Pros: could reduce CI noise from known-flaky tests. Cons: far beyond scope; dangerous without human confirmation; test-skipping logic would need its own ADR; the recall is purely informational and that's the right starting point.
Store test log content in ruflo — Pros: richer data for future recall. Cons: test logs can be large (thousands of lines); ruflo values should be concise summaries; the full log is already persisted at $ARTIFACTS_DIR/test-results.log.

Implementation Plan

Files to create: None
Files to modify:
- scripts/lib/pipeline-stages-build.sh — stage_test() function: add recall block after line 618, add store in failure path before line 665, replace store in pass path at lines 718–723
- scripts/sw-ruflo-adapter-test.sh — add 4 test sections before print_test_results (line 2530)
Dependencies: None — ruflo_recall, ruflo_store, ruflo_available are already sourced via ruflo-adapter.sh
Risk areas:
- The failure path insertion point (before return 1 at line 665) is inside a nested if [[ -n "$ISSUE_NUMBER" ]] block for GitHub commenting. The ruflo store must be placed after that block but before return 1, so that GitHub comments still fire first.
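- grep -cE exits with code 1 when the count is 0, which aborts the stage under set -e unless guarded. A plain || echo "0" fallback is not safe inside a command substitution: grep -c still prints 0 before exiting non-zero, so the captured value becomes 0\n0 (the known pitfall documented in CLAUDE.md). Verify that the chosen guard yields a single 0.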
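
The grep -c risk can be reproduced in isolation. The sketch below uses a temporary log with no test markers and a simplified pattern (the ✓ and ✗ alternatives are omitted for brevity); the variable names are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

log="$(mktemp)"
printf 'compiling...\nno test markers here\n' > "$log"

# Pitfall: grep -c prints "0" AND exits 1 on zero matches,
# so the fallback echo appends a second "0".
doubled="$(grep -cE 'PASS|FAIL|ok [0-9]' "$log" || echo "0")"

# Safer: swallow the exit status, then default only if the output is empty
# (for example, when the log file is missing and grep prints nothing).
count="$(grep -cE 'PASS|FAIL|ok [0-9]' "$log" 2>/dev/null || true)"
count="${count:-0}"

printf 'doubled=[%s]\n' "$doubled"
printf 'count=[%s]\n' "$count"
rm -f "$log"
```

The second form keeps the count a single token, so it can be interpolated into the stored summary without introducing a stray newline.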

Validation Criteria

ruflo_recall is invoked before bash -c "$test_cmd" with query containing "test flakiness patterns failures" and namespace pipeline-<PIPELINE_ID>
Recall output is logged via info but does not prevent test execution (even on error)
ruflo_store fires on the fail path with tag failed, exit code, test count, cmd, and ISO8601 timestamp
ruflo_store fires on the pass path with tag passed, coverage %, test count, cmd, and timestamp
All ruflo calls guarded: declare -f ruflo_recall or declare -f ruflo_store (as applicable), declare -f ruflo_available, ruflo_available, and || true
When SHIPWRIGHT_PIPELINE_ID is unset, namespace defaults to pipeline-unknown
When ruflo_available returns false, no ruflo calls are made and test execution is unaffected
4 new tests added to sw-ruflo-adapter-test.sh covering recall-happy, recall-logged, store-on-pass, store-on-fail
npm test passes with zero regressions

Test Pyramid Breakdown

Unit tests (4 new): Mock ruflo_recall/ruflo_store/ruflo_available as stub functions logging args to temp files. Assertions check logged arguments for correct namespace, tags, and data fields. Pattern mirrors existing stage_test_first tests at lines 2409–2528.
Integration tests (0): The ruflo adapter integration paths are already covered by existing adapter tests earlier in the file. No new integration boundary is introduced.
E2E tests (0): Pipeline-level E2E is a separate concern and not in scope.
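
The mock-stub pattern named for the unit tests can be sketched like this. Everything here is illustrative: store_failed_result is a hypothetical, stripped-down stand-in for the fail-path store, and assert_logged is a hypothetical helper, not one taken from sw-ruflo-adapter-test.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

call_log="$(mktemp)"

# Stubs override the real adapter functions and record their arguments.
ruflo_available() { return 0; }
ruflo_store() { printf '%s\n' "$*" >> "$call_log"; }

# Hypothetical code under test: a stripped-down fail-path store.
store_failed_result() {
  if declare -f ruflo_store >/dev/null 2>&1 && ruflo_available; then
    ruflo_store "stage-test-result" "Tests FAILED (exit $1)." \
      "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" "test,stage_test,failed" || true
  fi
}

# Hypothetical assertion helper: checks the recorded call for a fixed string.
assert_logged() {
  if grep -qF -- "$1" "$call_log"; then
    echo "ok: $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

unset SHIPWRIGHT_PIPELINE_ID || true
store_failed_result 2

assert_logged "stage-test-result"
assert_logged "pipeline-unknown"
assert_logged "test,stage_test,failed"
assert_logged "exit 2"

logged="$(cat "$call_log")"
rm -f "$call_log"
```

Asserting against the recorded arguments, rather than against ruflo itself, keeps the tests independent of whether a ruflo backend is reachable.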

Coverage Targets

100% branch coverage of new code: recall-available, recall-unavailable, store-on-pass, store-on-fail
The "ruflo unavailable" branch is implicitly covered by the declare -f/ruflo_available guard pattern already tested at lines 2440–2453

Critical Paths to Test

Happy path: Ruflo available → recall returns history → tests pass → store called with passed tag, coverage, count, cmd, timestamp
Error case 1: Tests fail → store called with failed tag, exit code, count, cmd, 0% coverage
Error case 2: Ruflo unavailable → recall and store are both skipped, test execution unaffected
Edge case 1: SHIPWRIGHT_PIPELINE_ID unset → namespace pipeline-unknown
Edge case 2: Recall returns empty string → no info log emitted, execution proceeds

Observability

Monitoring Checklist: Not applicable — this change adds informational logging to an existing pipeline stage. No new services, endpoints, or deployments are introduced. The recall and store results are visible in the existing pipeline log output via info calls.

Anomaly Detection Triggers: Not applicable — ruflo calls are fail-open with || true. A ruflo outage produces no pipeline-visible error, only a missing info line. No alerting threshold exists for "recall returned empty."

Log Analysis: After deployment, verify by searching pipeline logs for:

"Ruflo recall: historical test patterns found" — confirms recall is working
"stage-test-result" in ruflo memory — confirms store is persisting (check via ruflo_recall "stage-test-result" "pipeline-*")

Auto-Rollback Decision Criteria: Not applicable — this is a shell script change deployed via git merge, not a service deployment. Rollback = revert the commit. The fail-open design means a broken ruflo has zero impact on test execution.
