-
Notifications
You must be signed in to change notification settings - Fork 15
WSRouter: heartbeat-depth + cross-server sweep coordination #99
Description
Correction note (added after initial filing): earlier draft of this issue claimed presence GC was unimplemented. It IS implemented — see
src/WSRouter.php:881(writeServerRegistryRow), line 459 (App::tick(SERVER_HEARTBEAT_INTERVAL_MS)refresh), line 892 (runStaleServerGC), line 466 (worker-0-only sweeper selection). External reviewer flagged two REAL gaps that survived the initial impl. This issue tracks those, not the GC itself.
The two real gaps in the shipped GC
Gap 1 — Heartbeat is shallow (Redis-write only, no liveness probe)
writeServerRegistryRow() is just:
Store::set(self::SERVERS_TABLE, self::$serverId, [ 'last_seen' => time(), 'host' => (string) gethostname(), 'pid' => getmypid(), ]);
It says "the worker that runs the tick is alive enough to write Redis." It does NOT say "the WS server can accept connections." If the rest of the server is wedged — FD exhaustion, blocked event loop, long GC pause, full accept backlog, IO worker deadlock — the tick coroutine in worker 0 still fires happily and ws_nodes row stays green. Meanwhile the ws:server:{ID} subscriber may be wedged too, so messages routed to that server silently vanish. The 90-second SERVER_STALE_AFTER_SEC window can't fire because last_seen keeps refreshing.
Fix shape (~5 LOC): probe our own OpenSwoole listen socket before the heartbeat write. If the accept loop can't even accept a self-connect, skip the refresh and let the row go stale.
public static function writeServerRegistryRow(): void { if (self::$serverId === '') { return; } // Liveness probe — TCP self-connect to our own listener with short // timeout. Catches FD exhaustion, blocked event loop, full accept // backlog. If we can't accept our OWN connection, don't claim alive. $port = (int) (App::getServer()?->port ?? 0); if ($port > 0) { $sock = @stream_socket_client( 'tcp://127.0.0.1:' . $port, $errno, $errstr, 0.1, STREAM_CLIENT_CONNECT ); if (!is_resource($sock)) { elog("WSRouter: self-connect failed (\$errno=\$errno) — skipping heartbeat refresh", 'warn'); return; } fclose($sock); } Store::set(self::SERVERS_TABLE, self::\$serverId, [ 'last_seen' => time(), 'host' => (string) gethostname(), 'pid' => getmypid(), ]); }
Gap 2 — No cross-server leader for the GC sweep
runStaleServerGC() selection is if ($workerId === 0) (line 466) — pure within-server convention. Across servers there's NO coordination. If servers A, B, C all notice server D is stale (90s window elapsed), all three concurrently:
- Run
Store::iterate(ws_owner)— three full table scans Store::del()the same rows — three concurrent deletesStore::del()D's registry row — three deletes
Works correctly today because Store::del on the Redis backend is idempotent — operations commute, no double-effect. But:
- Triple Redis work (N scans, ×ばつrows deletes) for 1 dead node in an N-server cluster
- Zero observability: no way to know "server B's sweep won, C's was redundant"
- Doesn't scale — 100-node cluster losing 1 node = ×ばつ wasted iterates
Fix shape (~10 LOC) — deterministic owner over SETNX lease:
Original draft pitched a Redis SETNX lease per sweep. Reviewer pushback (correct): we already have ws_nodes with heartbeats. The lowest live node_id is the natural sweep owner — every node computes the same answer locally from data we already keep, zero new Redis writes per cycle, free observability via the existing table.
public static function runStaleServerGC(int $staleAfter = self::SERVER_STALE_AFTER_SEC): int { // Deterministic owner: lowest live node_id in ws_nodes runs this tick. // Every node reads the same Redis source-of-truth, every node agrees. // No SETNX, no lease TTL to tune, no extra writes per cycle. $liveNodeIds = self::activeNodeIds($staleAfter); if ($liveNodeIds === [] || min($liveNodeIds) !== self::$serverId) { return 0; // not the sweep owner this tick } // ... [existing dead-server detection + per-row delete unchanged] ... }
Why deterministic owner over lease:
- Zero writes per cycle. A 6-node cluster sweeping every 30 s pays 12 SETNX/min for derivable math; deterministic owner pays 0.
- Free observability.
ws_nodesalready records every live node —min(activeNodeIds())tells anyone watching who'd win this tick without a separatelast_swept_bykey. - Owner-mid-die edge handled by Gap 1. When the elected owner's accept-loop probe fails, its heartbeat row goes stale; next tick every node deterministically picks the next-lowest. No election round-trip, no failover gap, no lease TTL to wait out.
- Safe under brief disagreement. During heartbeat-expiry transitions two nodes may both think they're owner for one tick — both run the same
iterate + del, second is no-op.Store::delis idempotent, so we don't even need fencing tokens (Redlock is overkill when double-run does no harm).
activeNodeIds() needs to be exposed (or extracted from the existing runStaleServerGC scan) — already a one-liner over the ws_nodes table read we do today.
Out of scope (deferred to v0.4.0+)
- Network partitions vs hard death — heartbeat alone can't distinguish. Same trade-off Centrifugo/LiveKit make: accept some stale state during partitions, mitigate at the app layer (single-region WS fabric, cross-region via Redis replication).
- Connection draining on planned shutdown — separately tracked. The current
onWorkerStop→sweepThisServer()(line 476) covers the graceful path. - Replacing per-conn cleanup on the sender side — separately worth adding ("PUBLISH returns 0 receivers → mark client offline") for the gap between hard-crash and 90s GC catching up; cheap pre-fix that gives visibility before the full fixes land.
Tests
tests/Unit/WS/WSRouterHeartbeatLivenessTest.php— install uopz override forstream_socket_clientreturning false, assertwriteServerRegistryRow()skips the Store::set.tests/Unit/WS/WSRouterGCSweepOwnerTest.php— node with lowest live id runs the sweep; other live nodes return 0 without iterating; when the elected owner's heartbeat goes stale (Gap 1), the next-lowest takes over deterministically on the following tick.- Cross-host integration: bring two servers, force server B to wedge via deep blocking call, observe server A's GC reaps B within 90s instead of letting B's stale-but-still-refreshing row block detection.
Timeline
- Gap 1 (accept-loop probe): ~5 LOC + 1 test. v0.3.1 candidate.
- Gap 2 (deterministic sweep owner): ~10 LOC + 2 tests. No new Redis surface needed — reads existing
ws_nodesheartbeats. v0.3.1 candidate. - Sender-side "0 receivers → mark offline": ~15 LOC + 1 test. Could ship even sooner as a 'better visibility' patch.
All three are independent — no ordering constraint.
References
- Existing impl:
src/WSRouter.phplines 54-56 (constants), 397-401 (table schema), 408 + 458-460 (heartbeat wiring), 466-468 (worker-0 sweep selection), 471-476 (graceful shutdown), 881-889 (writeServerRegistryRow), 892-928 (runStaleServerGC) - External reviewer notes (HN-prep round): "heartbeat-says-alive while the actual ws server is hung (accept queue full, long gc pause). ws_nodes stays green but messages still vanish, so baking an accept-loop liveness check into the heartbeat itself catches more than just touching redis. what's the leader layer, redlock or plain SETNX?"
- C1 hardening (already shipped): conn_id nonce guard in publisher routing path, prevents FD-reuse cross-tenant leak — separate from this issue.