Pipeline Plan 327

ezigus edited this page Apr 19, 2026 · 1 revision


Implementation Plan: Store Test Stage Results in Ruflo & Recall Flakiness Patterns

Files to Modify

  1. scripts/lib/pipeline-stages-build.sh: stage_test() function (lines 596–726)
  2. scripts/sw-ruflo-adapter-test.sh — add new test sections at end (before print_test_results)

Implementation Steps

1. Add ruflo recall at the start of stage_test() (after line 618, before running tests)

Insert a block after info "Running tests: ..." (line 618) but before bash -c "$test_cmd" (line 620). Pattern follows stage_test_first lines 21–31:

# ── Recall historical flakiness patterns from ruflo ──────────────────
local _ruflo_flakiness_ctx=""
if declare -f ruflo_recall >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
    _ruflo_flakiness_ctx=$(ruflo_recall "test flakiness patterns failures" \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)
    _ruflo_flakiness_ctx=$(printf '%.2000s' "${_ruflo_flakiness_ctx:-}")
    if [[ -n "$_ruflo_flakiness_ctx" ]]; then
        info "Ruflo recall: historical test patterns found"
        info "${DIM}${_ruflo_flakiness_ctx}${RESET}"
    fi
fi

Key points:

  • declare -f guards per memory feedback (feedback_ruflo_declare_f_guard.md)
  • printf '%.2000s' truncation per memory feedback (feedback_ruflo_recall_plain_text.md)
  • Namespace pipeline-<PIPELINE_ID> per acceptance criteria
  • Logged for human visibility, does not gate execution
  • Fail-open: || true on recall, empty string fallback
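
The guard and truncation behaviour listed above can be exercised in isolation. The following is a minimal sketch, not the pipeline's actual code: recall_ctx is a hypothetical wrapper, and the ruflo_available and ruflo_recall bodies are stand-in stubs assumed only for this illustration.

```shell
#!/usr/bin/env bash
# Stand-in stubs (assumptions for this sketch, not the real adapter).
ruflo_available() { return 0; }
ruflo_recall() { printf 'query=%s ns=%s' "$1" "$2"; }

# recall_ctx (hypothetical name): guarded, fail-open recall.
# $1 is the maximum length to keep (default 2000, as in the plan).
recall_ctx() {
  local limit="${1:-2000}" ctx=""
  if declare -f ruflo_recall >/dev/null 2>&1 && \
     declare -f ruflo_available >/dev/null 2>&1 && \
     ruflo_available; then
    # "|| true" keeps a failing recall from aborting the caller.
    ctx=$(ruflo_recall "test flakiness patterns failures" \
      "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)
    # The precision on %s caps how much of the text is kept.
    ctx=$(printf "%.${limit}s" "${ctx:-}")
  fi
  printf '%s' "$ctx"
}

recall_ctx 40; echo
```

If ruflo_recall is not defined at all, the first declare -f check fails and recall_ctx prints nothing, which is the fail-open behaviour the plan asks for.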

2. Move/restructure ruflo store to fire on BOTH pass and fail (lines 635–723)

Current problem: ruflo_store at line 719 only fires on pass, because the failure path returns 1 at line 665 before the store is reached. Fix by:

  • Adding a store call inside the failure branch (before return 1 at line 665)
  • Enriching the existing pass-side store (lines 719–722) with more data

Failure path — insert before return 1 at line 665:

# Store failed test result in ruflo for flakiness tracking
if declare -f ruflo_store >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
    local _fail_test_count
    # grep -c prints 0 and exits 1 when nothing matches, so use || true here
    # (|| echo "0" would capture "0" twice) and default an empty result to 0.
    _fail_test_count=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$test_log" 2>/dev/null || true)
    ruflo_store "stage-test-result" \
        "Tests FAILED (exit $test_exit). Count: ${_fail_test_count:-0}. Cmd: ${test_cmd}. Coverage: 0%. Time: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo unknown)." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
        "test,stage_test,failed" 2>/dev/null || true
fi

Pass path — replace lines 718–723 with enriched store:

# Store test results in ruflo for cross-stage context and flakiness tracking
if declare -f ruflo_store >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
    local _pass_test_count
    # grep -c prints 0 and exits 1 when nothing matches, so use || true here
    # (|| echo "0" would capture "0" twice) and default an empty result to 0.
    _pass_test_count=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$test_log" 2>/dev/null || true)
    ruflo_store "stage-test-result" \
        "Tests PASSED. Count: ${_pass_test_count:-0}. Cmd: ${test_cmd}. Coverage: ${_cov_pct:-0}%. Time: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo unknown)." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
        "test,stage_test,passed" 2>/dev/null || true
fi
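
One detail of the count expression is worth checking on its own: grep -c prints 0 and also exits with status 1 when no line matches, so a fallback that echoes a second value would leave two lines in the captured variable. The sketch below shows one way to get a single integer in every case. The helper name count_test_lines is hypothetical and not part of the plan.

```shell
#!/usr/bin/env bash
# count_test_lines (hypothetical helper): prints exactly one integer.
# Prints 0 when the log has no matching lines or cannot be read.
count_test_lines() {
  local log="$1" n
  # No match: grep prints "0", exits 1, and "|| true" keeps that single "0".
  # Missing file: grep prints nothing on stdout, so n stays empty.
  n=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$log" 2>/dev/null || true)
  # Default the empty (unreadable file) case to 0.
  printf '%s\n' "${n:-0}"
}

log=$(mktemp)
printf 'PASS a\nFAIL b\nnothing here\n' > "$log"
count_test_lines "$log"    # two lines match the pattern
rm -f "$log"
```

Because the pattern counts matching lines rather than test cases, the stored number is an approximation; it is suited to trend tracking rather than exact reporting.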

3. Add tests in sw-ruflo-adapter-test.sh

Insert before print_test_results (line 2531). Add 4 test sections:

  1. stage_test recall: ruflo_recall called at start — mock ruflo_recall, verify it's called with correct namespace and query
  2. stage_test recall: output logged for visibility — verify recall output appears in info log
  3. stage_test store: ruflo_store called on pass — mock test pass, verify store called with passed tag and correct data (coverage, count, cmd, timestamp)
  4. stage_test store: ruflo_store called on fail — mock test fail, verify store called with failed tag

Each test stubs ruflo_available, ruflo_recall, ruflo_store as local functions that log args to temp files, then asserts against those logs. Pattern matches existing stage_test_first tests at lines 2400–2528.
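
The stub-and-assert approach described above could look roughly like this. It is a sketch under assumptions: store_result is a hypothetical stand-in for the store block inside stage_test(), and the PASS/FAIL echo replaces whatever assertion helpers the existing suite provides, which this plan does not show.

```shell
#!/usr/bin/env bash
# store_result (hypothetical): mirrors the guarded store call from the plan.
store_result() {
  local status="$1" tag="$2"
  if declare -f ruflo_store >/dev/null 2>&1 && \
     declare -f ruflo_available >/dev/null 2>&1 && \
     ruflo_available; then
    ruflo_store "stage-test-result" "Tests ${status}." \
      "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
      "test,stage_test,${tag}" 2>/dev/null || true
  fi
}

# Stubs: record each call as one line, arguments joined with "|".
STUB_LOG=$(mktemp)
ruflo_available() { return 0; }
ruflo_store() { (IFS='|'; printf '%s\n' "$*") >> "$STUB_LOG"; }

# Exercise the failure path with a known pipeline ID.
SHIPWRIGHT_PIPELINE_ID=327
store_result "FAILED (exit 1)" "failed"

# Assert against the recorded arguments.
if grep -qF 'pipeline-327|test,stage_test,failed' "$STUB_LOG"; then
  echo "PASS: store called with failed tag"
else
  echo "FAIL: store not called with failed tag"
fi
```

Joining the arguments with a separator keeps each call on one line, so a single grep can check namespace and tags together.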

Task Checklist

  • Task 1: Add ruflo recall block at start of stage_test() — query pipeline-<PIPELINE_ID> for historical test/flakiness patterns
  • Task 2: Log recall results via info for human visibility (no gating)
  • Task 3: Add ruflo store in the failure path (before return 1) with pass/fail, coverage, test count, cmd, timestamp
  • Task 4: Enrich the existing pass-path ruflo store with test count, cmd, timestamp, and tags
  • Task 5: Add declare -f guards + ruflo_available checks on all new ruflo calls
  • Task 6: Add test — recall-before-test: verify ruflo_recall invoked with correct namespace
  • Task 7: Add test — recall output logged for human visibility
  • Task 8: Add test — store-after-test (pass path): verify store args contain passed, coverage, cmd
  • Task 9: Add test — store-after-test (fail path): verify store called with failed tag
  • Task 10: Run npm test and confirm all existing tests pass

Testing Approach

Test Pyramid Breakdown:

  • Unit tests (4 new): All 4 tests are unit-level, mocking ruflo_recall/ruflo_store/ruflo_available as stub functions that log their arguments to temp files. Assertions check those logs.
  • Integration tests (0): Not needed — the ruflo adapter integration is already covered by existing tests.
  • E2E tests (0): Not applicable — pipeline-level E2E would be a separate concern.

Coverage Targets:

  • 100% of the new code paths (recall-at-start, store-on-pass, store-on-fail)
  • Both the "ruflo available" and "ruflo unavailable" branches are covered (the unavailable branch is implicitly tested by existing guard tests)

Critical Paths to Test:

  • Happy path: Ruflo available, recall returns history, tests pass, store called with pass data
  • Error case 1: Tests fail — store still called with failure data (the main gap this issue fixes)
  • Error case 2: Ruflo unavailable — recall returns empty, store skipped, test execution unaffected
  • Edge case 1: SHIPWRIGHT_PIPELINE_ID unset — namespace falls back to pipeline-unknown
  • Edge case 2: Recall returns empty string — no flakiness info logged, execution proceeds normally
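
Two of these paths depend only on standard shell behaviour and can be checked without the pipeline. The sketch below assumes a hypothetical helper, ruflo_namespace, that wraps the plan's namespace expression, and a deliberately failing ruflo_store stub.

```shell
#!/usr/bin/env bash
# ruflo_namespace (hypothetical helper): same expression the plan uses inline.
ruflo_namespace() { printf 'pipeline-%s' "${SHIPWRIGHT_PIPELINE_ID:-unknown}"; }

# Edge case 1: unset and empty both fall back, because ":-" covers both.
unset SHIPWRIGHT_PIPELINE_ID
ruflo_namespace; echo
SHIPWRIGHT_PIPELINE_ID=""
ruflo_namespace; echo
SHIPWRIGHT_PIPELINE_ID=327
ruflo_namespace; echo

# Fail-open: a store that fails, guarded by "|| true", leaves status 0.
ruflo_store() { return 7; }   # stub that always fails
ruflo_store "stage-test-result" "x" "$(ruflo_namespace)" "test" 2>/dev/null || true
echo "exit after guarded store: $?"
```

The first two calls print pipeline-unknown, the third prints pipeline-327, and the last line reports an exit status of 0.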

Definition of Done

  • ruflo_recall() called at start of stage_test() with pipeline-<PIPELINE_ID> namespace
  • Recall results logged via info but do not gate test execution
  • ruflo_store() called after BOTH pass and fail outcomes with: pass/fail, coverage %, test count, test cmd, timestamp
  • Storage namespace: pipeline-<PIPELINE_ID>
  • All ruflo calls guarded with declare -f + ruflo_available (fail-open)
  • 4 new tests in sw-ruflo-adapter-test.sh covering recall and store on pass/fail
  • npm test passes with no regressions

Security/API/Skill Notes

  • Endpoint Specification: N/A — no new API endpoints; this is shell-to-CLI integration.
  • Error Codes: N/A — all ruflo calls are fail-open (|| true).
  • Rate Limiting: N/A — ruflo CLI has built-in circuit-breaker timeout.
  • Versioning: N/A — no API versioning change.
  • Threat Model (STRIDE): N/A — no new attack surface; ruflo data is local, no secrets stored.
  • Auth Flow: N/A — no authentication changes.
  • Input Validation Points: The test_cmd value stored in ruflo is already validated/auto-detected by the existing pipeline config parsing. The recall query is a hardcoded string.
  • Security Checklist: No secrets in stored data (only test counts/coverage/cmd), no user input flows into ruflo keys without existing sanitization.
