Pipeline Design 327
Now I have the full picture. Here's the ADR:
The `stage_test()` function in `scripts/lib/pipeline-stages-build.sh:596–726` currently has two gaps:
- No historical recall. Unlike `stage_test_first` (lines 20–31), which calls `ruflo_recall_similar_outcomes` before generating tests, `stage_test` runs blind: it never queries ruflo for past flakiness patterns that could inform the developer reading the log.
- Store only fires on pass. The `ruflo_store` at line 719 sits after the pass-path GitHub comment. The failure branch hits `return 1` at line 665, so failed test runs are never persisted. This means the single most valuable data point for flakiness tracking (failures) is dropped.
The `stage_test_first` integration (lines 20–31, 137–160) provides a proven template: `declare -f` guards, a `ruflo_available` check, `printf '%.2000s'` truncation, `|| true` fail-open, and the `learning-<repo_hash>` namespace for semantic recall. Four prior memory entries document hard-won conventions: plain-text recall (not JSON), namespace matching, and `declare -f` guards.
- Bash 3.2 compatible (no associative arrays, no `${var,,}`)
- All ruflo calls must be fail-open: test execution must never be blocked by ruflo failures
- `set -euo pipefail` is active; unguarded failures will abort the pipeline
- Test file (`sw-ruflo-adapter-test.sh`, 2531 lines) uses a mock-stub pattern: override functions, log args to temp files, assert against those files
stage_test() entry
│
├─ 1. RECALL: ruflo_recall("test flakiness patterns failures",
│ "pipeline-<PIPELINE_ID>")
│ → log result via info() for human visibility
│ → does NOT gate test execution
│
├─ 2. RUN TESTS: bash -c "$test_cmd"
│
├─ 3a. ON FAIL (before return 1 at line 665):
│ ruflo_store("stage-test-result",
│ "Tests FAILED (exit $test_exit). Count: <N>. Cmd: <cmd>. Coverage: 0%. Time: <ISO8601>.",
│ "pipeline-<PIPELINE_ID>",
│ "test,stage_test,failed")
│
└─ 3b. ON PASS (replacing lines 718–723):
ruflo_store("stage-test-result",
"Tests PASSED. Count: <N>. Cmd: <cmd>. Coverage: <pct>%. Time: <ISO8601>.",
"pipeline-<PIPELINE_ID>",
"test,stage_test,passed")
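The two summary strings and the namespace in the flow above could be assembled roughly as follows. This is a minimal sketch, not the pipeline's actual code: `build_test_summary` and `test_namespace` are hypothetical helper names introduced only for illustration, and the argument order is an assumption.

```shell
# Sketch only: build_test_summary and test_namespace are hypothetical helpers,
# not functions that exist in pipeline-stages-build.sh.
# Arguments: outcome (passed|failed), exit code, test count, command, coverage percent.
build_test_summary() {
    local outcome="$1" exit_code="$2" count="$3" cmd="$4" coverage="$5"
    local ts
    ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"   # ISO 8601 timestamp in UTC
    if [ "$outcome" = "failed" ]; then
        # The failure summary always reports 0% coverage, as in the flow above.
        printf 'Tests FAILED (exit %s). Count: %s. Cmd: %s. Coverage: 0%%. Time: %s.' \
            "$exit_code" "$count" "$cmd" "$ts"
    else
        printf 'Tests PASSED. Count: %s. Cmd: %s. Coverage: %s%%. Time: %s.' \
            "$count" "$cmd" "$coverage" "$ts"
    fi
}

# Namespace falls back to pipeline-unknown when SHIPWRIGHT_PIPELINE_ID is unset or empty.
test_namespace() {
    printf 'pipeline-%s' "${SHIPWRIGHT_PIPELINE_ID:-unknown}"
}
```

For example, `build_test_summary failed 2 14 "npm test" 0` yields a string starting with `Tests FAILED (exit 2). Count: 14. Cmd: npm test. Coverage: 0%.`, which would then be passed as the value argument of `ruflo_store`.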
- `ruflo_recall` (not `ruflo_recall_similar_outcomes`): The recall here is a simple keyword query against the pipeline namespace for prior test results, not a semantic TDD outcome lookup. `ruflo_recall` is the correct function because we're searching a fixed namespace with a keyword query, not doing cross-namespace semantic matching.
- Namespace `pipeline-<PIPELINE_ID>`: Stores and recalls against the same pipeline namespace. This differs from `stage_test_first`, which uses `learning-<repo_hash>` for cross-pipeline TDD recall. Here, the goal is run-to-run flakiness tracking within the pipeline's history, so `pipeline-*` is correct.
- Tags for classification: `test,stage_test,passed` or `test,stage_test,failed` enable future queries to filter by outcome. Tags are comma-separated strings matching the existing convention (see `stage_test_first` at line 157: `"tdd,test_first,${TASK_TYPE:-feature}"`).
- Test count via grep: `grep -cE 'PASS|FAIL|✓|✗|ok [0-9]'` against the test log provides a coarse count. This is imprecise but consistent with how the pipeline already parses coverage (`grep -oE '[0-9]+%'` at line 711). Not worth a parsing overhaul for a display-only field.
- Fail-open at every layer: `declare -f` + `ruflo_available` + `|| true`, plus an empty-string fallback. Three levels of protection, matching the established pattern.
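The layered guard described in the last bullet might look like the sketch below. The three stub functions exist only so the example runs standalone; in the pipeline they would come from `ruflo-adapter.sh` and the logging helpers. The wrapper name `recall_test_history` and the `(query, namespace)` argument order are assumptions taken from the flow diagram, not confirmed signatures.

```shell
# Stubs so the sketch runs on its own (assumed behavior, not the real adapter).
ruflo_available() { return 0; }
ruflo_recall() { echo "2 prior failures recorded for this pipeline"; }
info() { echo "INFO: $*"; }

# recall_test_history is a hypothetical wrapper name used for illustration.
recall_test_history() {
    local history=""
    # Guard 1: both functions must be defined.
    # Guard 2: ruflo must report itself available.
    if declare -f ruflo_recall >/dev/null 2>&1 \
        && declare -f ruflo_available >/dev/null 2>&1 \
        && ruflo_available; then
        # Guard 3: "|| true" keeps a failing recall from aborting under set -e.
        history="$(ruflo_recall "test flakiness patterns failures" \
            "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)"
    fi
    # Empty-string fallback: log only when something came back, truncated to 2000 chars.
    if [ -n "$history" ]; then
        info "Ruflo recall: historical test patterns found"
        info "$(printf '%.2000s' "$history")"
    fi
    return 0
}
```

Because the function always returns 0 and never inspects the recall result beyond logging it, test execution that follows is unaffected by anything ruflo does.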
| Failure Mode | Behavior |
|---|---|
| `ruflo_recall` returns empty | No info log emitted, test proceeds normally |
| `ruflo_recall` errors | Caught by `\|\| true` |
| `ruflo_store` errors on fail path | Caught by `\|\| true` |
| `ruflo_store` errors on pass path | Caught by `\|\| true` |
| `ruflo_available` returns false | Entire block skipped |
| `ruflo_recall`/`ruflo_store` not defined | `declare -f` guard skips block |
| `SHIPWRIGHT_PIPELINE_ID` unset | Falls back to `pipeline-unknown` namespace |
- Store in `learning-<repo_hash>` namespace (like `stage_test_first`). Pros: cross-pipeline recall; semantic matching works better with the learning namespace. Cons: test results are pipeline-specific artifacts, not reusable TDD patterns; mixing them with TDD outcomes pollutes that namespace; the recall query would need different semantics. The pipeline namespace is the right scope for run-to-run flakiness data.
- Structured JSON storage instead of plain text. Pros: machine-parseable, enables richer future queries. Cons: `ruflo_recall` returns plain text (per `feedback_ruflo_recall_plain_text.md`), so JSON storage would require a separate retrieval+parse path; adds complexity for no current consumer; breaks the convention established by `stage_build` stores (lines 474, 498, 588), which all use plain text.
- Gate test execution on flakiness recall (skip known-flaky tests). Pros: could reduce CI noise from known-flaky tests. Cons: far beyond scope; dangerous without human confirmation; test-skipping logic would need its own ADR; the recall is purely informational and that's the right starting point.
- Store test log content in ruflo. Pros: richer data for future recall. Cons: test logs can be large (thousands of lines); ruflo values should be concise summaries; the full log is already persisted at `$ARTIFACTS_DIR/test-results.log`.
- Files to create: None
- Files to modify:
  - `scripts/lib/pipeline-stages-build.sh`, `stage_test()` function: add recall block after line 618, add store in failure path before line 665, replace store in pass path at lines 718–723
  - `scripts/sw-ruflo-adapter-test.sh`: add 4 test sections before `print_test_results` (line 2530)
- Dependencies: None. `ruflo_recall`, `ruflo_store`, and `ruflo_available` are already sourced via `ruflo-adapter.sh`
- Risk areas:
  - The failure path insertion point (before `return 1` at line 665) is inside a nested `if [[ -n "$ISSUE_NUMBER" ]]` block for GitHub commenting. The ruflo store must be placed after that block but before `return 1`, so that GitHub comments still fire first.
  - `grep -cE` returns exit code 1 when the count is 0, which aborts the script under `set -euo pipefail` unless guarded. The `|| echo "0"` handles this, but double-check it doesn't produce `0\n0` (the known pitfall documented in CLAUDE.md).
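The zero-count pitfall can be avoided with a small variation on the fallback. The sketch below is one possible approach, not necessarily what the pipeline uses; `count_test_lines` is a hypothetical helper name. It relies on the fact that `grep -c` still prints `0` when nothing matches, so the fallback should only suppress the exit status rather than echo a second value.

```shell
set -euo pipefail

# count_test_lines is a hypothetical helper name used for illustration.
count_test_lines() {
    local log_file="$1" count
    # grep -c prints "0" AND exits 1 when nothing matches, so "|| echo 0" would
    # produce two lines. "|| true" keeps the single line grep already printed.
    count="$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$log_file" 2>/dev/null || true)"
    # If grep printed nothing (for example, the log file is missing), default to 0.
    printf '%s' "${count:-0}"
}
```

Note that `grep -c` counts matching lines, not matches, so a line containing both `PASS` and `ok 1` is counted once. That is consistent with the coarse, display-only intent of the field.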
- `ruflo_recall` is invoked before `bash -c "$test_cmd"` with a query containing "test flakiness patterns failures" and namespace `pipeline-<PIPELINE_ID>`
- Recall output is logged via `info` but does not prevent test execution (even on error)
- `ruflo_store` fires on the fail path with tag `failed`, exit code, test count, cmd, and ISO8601 timestamp
- `ruflo_store` fires on the pass path with tag `passed`, coverage %, test count, cmd, and timestamp
- All ruflo calls guarded: `declare -f ruflo_store`, `declare -f ruflo_available`, `ruflo_available`, `|| true`
- When `SHIPWRIGHT_PIPELINE_ID` is unset, namespace defaults to `pipeline-unknown`
- When `ruflo_available` returns false, no ruflo calls are made and test execution is unaffected
- 4 new tests added to `sw-ruflo-adapter-test.sh` covering recall-happy, recall-logged, store-on-pass, store-on-fail
- `npm test` passes with zero regressions
- Unit tests (4 new): Mock `ruflo_recall`/`ruflo_store`/`ruflo_available` as stub functions logging args to temp files. Assertions check logged arguments for correct namespace, tags, and data fields. Pattern mirrors existing `stage_test_first` tests at lines 2409–2528.
- Integration tests (0): The ruflo adapter integration paths are already covered by existing adapter tests earlier in the file. No new integration boundary is introduced.
- E2E tests (0): Pipeline-level E2E is a separate concern and not in scope.
- 100% branch coverage of new code: recall-available, recall-unavailable, store-on-pass, store-on-fail
- The "ruflo unavailable" branch is implicitly covered by the `declare -f`/`ruflo_available` guard pattern already tested at lines 2440–2453
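The mock-stub pattern for a store-on-fail test could be sketched as below. Everything here is illustrative: `record_store_on_fail` stands in for the failure branch of `stage_test`, the `STORE_LOG` variable and the pipe-delimited log format are assumptions, and the real test file's assertion helpers are not shown.

```shell
# Temp file that records every argument the stubbed ruflo_store receives.
STORE_LOG="$(mktemp)"

# Stubs override the real adapter functions and log their arguments.
ruflo_available() { return 0; }
ruflo_store() { printf '%s|%s|%s|%s\n' "$1" "$2" "$3" "$4" >> "$STORE_LOG"; }

# Hypothetical stand-in for the failure branch of stage_test.
record_store_on_fail() {
    local test_exit="$1"
    if declare -f ruflo_store >/dev/null 2>&1 && ruflo_available; then
        ruflo_store "stage-test-result" \
            "Tests FAILED (exit ${test_exit})." \
            "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
            "test,stage_test,failed" || true
    fi
}

# Exercise the edge case where the pipeline ID is unset.
unset SHIPWRIGHT_PIPELINE_ID
record_store_on_fail 1

# Assert against the logged arguments: namespace fallback and failed tag.
if grep -qF 'pipeline-unknown|test,stage_test,failed' "$STORE_LOG"; then
    echo "PASS: store-on-fail logged namespace and tags"
else
    echo "FAIL: store-on-fail did not log expected arguments"
fi
```

Logging arguments to a file rather than asserting inside the stub keeps the stub trivial and lets one test check several fields (key, namespace, tags) with separate `grep` calls.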
- Happy path: Ruflo available → recall returns history → tests pass → store called with `passed` tag, coverage, count, cmd, timestamp
- Error case 1: Tests fail → store called with `failed` tag, exit code, count, cmd, 0% coverage
- Error case 2: Ruflo unavailable → recall returns empty, store skipped, test execution unaffected
- Edge case 1: `SHIPWRIGHT_PIPELINE_ID` unset → namespace `pipeline-unknown`
- Edge case 2: Recall returns empty string → no info log emitted, execution proceeds
Monitoring Checklist: Not applicable — this change adds informational logging to an existing pipeline stage. No new services, endpoints, or deployments are introduced. The recall and store results are visible in the existing pipeline log output via info calls.
Anomaly Detection Triggers: Not applicable — ruflo calls are fail-open with || true. A ruflo outage produces no pipeline-visible error, only a missing info line. No alerting threshold exists for "recall returned empty."
Log Analysis: After deployment, verify by searching pipeline logs for:
- "Ruflo recall: historical test patterns found": confirms recall is working
- "stage-test-result" in ruflo memory: confirms store is persisting (check via `ruflo_recall "stage-test-result" "pipeline-*"`)
Auto-Rollback Decision Criteria: Not applicable — this is a shell script change deployed via git merge, not a service deployment. Rollback = revert the commit. The fail-open design means a broken ruflo has zero impact on test execution.