CI: typescript-e2e dev (moonwall) suite intermittently times out launching node-subtensor #2755
Summary
The typescript-e2e dev job (moonwall dev environment, ts-tests/suites/dev/**) can fail with Operation timed out, which auto-skips its tests and fails the suite. The node binary itself is fine — the zombienet_* e2e suites run the same target/release/node-subtensor build and pass. This points at moonwall's dev foundation not reaching readiness in time rather than a node or test-logic bug.
Symptom
A failing run looks like:
❯ dev suites/dev/subtensor/staking/test-add-staking.ts (2 tests | 2 skipped) 30010ms
FAIL dev suites/dev/subtensor/staking/test-add-staking.ts
Error: Operation timed out
Terminate orphan process: pid (NNNN) (node-subtensor)
The combination of (2 tests | 2 skipped) and the ~30s Operation timed out means moonwall's dev foundation never became ready: within its readiness window, the node launch, the RPC connection at ws://127.0.0.1:9947, or the first block did not complete. moonwall therefore skipped the tests and failed the suite. The Terminate orphan process line for node-subtensor confirms a node was spawned but never driven. Any suites/dev/** test can surface the failure; test-add-staking.ts is just where it tends to show up first.
The ~30s limit is not the config test timeout
ts-tests/moonwall.config.json already sets defaultTestTimeout: 120000 and the dev env timeout: 120000, yet the failure happens at ~30s. So the limit being hit is moonwall's internal launch/connection readiness wait, not the config test timeout — bumping defaultTestTimeout alone won't fix it. The dev env also runs multiThreads: true, i.e. it spawns several node-subtensor --dev instances in parallel, which can contend for CPU/RAM on slower or cold runners and push first-block/RPC-ready past that ~30s window.
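As a quick sanity check of the reasoning above, the following minimal sketch compares an observed failure duration against the timeouts configured for an environment. It assumes moonwall.config.json has a top-level environments array whose entries carry a name field; the function names configuredTimeouts and isBelowConfiguredTimeouts are hypothetical helpers, not part of moonwall.

```typescript
// Only the fields named in this issue, plus an assumed `environments`
// array with a `name` per entry (an assumption about the config layout).
interface EnvConfig {
  name: string;
  timeout?: number;
  multiThreads?: boolean | number;
}

interface MoonwallConfigSubset {
  defaultTestTimeout?: number;
  environments?: EnvConfig[];
}

// Hypothetical helper: collect the timeout-related settings for one environment.
function configuredTimeouts(config: MoonwallConfigSubset, envName: string) {
  const env = config.environments?.find((e) => e.name === envName);
  return {
    defaultTestTimeout: config.defaultTestTimeout,
    envTimeout: env?.timeout,
    multiThreads: env?.multiThreads,
  };
}

// Hypothetical helper: true when the observed duration is shorter than every
// configured timeout, i.e. some other (internal) limit must have fired first.
function isBelowConfiguredTimeouts(
  observedMs: number,
  config: MoonwallConfigSubset,
  envName: string,
): boolean {
  const { defaultTestTimeout, envTimeout } = configuredTimeouts(config, envName);
  const limits = [defaultTestTimeout, envTimeout].filter(
    (v): v is number => typeof v === "number",
  );
  return limits.length > 0 && limits.every((limit) => observedMs < limit);
}
```

With the values quoted in this issue (both timeouts at 120000 and a failure at 30010ms), isBelowConfiguredTimeouts returns true, which is consistent with a separate readiness limit being hit.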
Suggested fixes (rough order)
- Raise moonwall's launch/connection readiness timeout for the dev foundation (via launchSpec/connection config) so a slower cold start still connects in time.
- Reduce dev-env parallelism (e.g. multiThreads: false or a smaller thread count) so multiple --dev nodes don't contend.
- Confirm the CI runner has enough CPU/RAM and check node-subtensor --dev cold-start → RPC-ready time on it.
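For the last item, one rough way to measure cold-start time is to start the node and poll its RPC port until it accepts a TCP connection. The sketch below uses only Node's built-in net module. The helpers canConnect and waitForPort are hypothetical names, and an accepted TCP connection is only a lower bound on readiness: it does not prove the RPC is serving requests or that a first block exists.

```typescript
import * as net from "net";

// Try a single TCP connection; resolve true if the port accepts it.
function canConnect(host: string, port: number, attemptTimeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    let settled = false;
    const finish = (ok: boolean) => {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(attemptTimeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.on("error", () => finish(false));
  });
}

// Poll until the port accepts a connection; resolve with the elapsed time in
// milliseconds, or reject once timeoutMs has passed.
async function waitForPort(
  host: string,
  port: number,
  timeoutMs: number,
  intervalMs = 250,
): Promise<number> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    if (await canConnect(host, port, intervalMs)) {
      return Date.now() - start;
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`port ${host}:${port} not reachable within ${timeoutMs}ms`);
}
```

A possible use: launch node-subtensor --dev on the runner with the same port settings the dev environment uses, call waitForPort("127.0.0.1", 9947, 120000) immediately afterwards, and compare the result with the ~30s window.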
Repro
Intermittent on CI; reproduces by running pnpm moonwall test dev in ts-tests/ against a target/release/node-subtensor on a resource-constrained / cold runner.