WSRouter: heartbeat-depth + cross-server sweep coordination #99

Open

Labels

enhancement

@sibidharan

Description

@sibidharan

sibidharan

opened

on May 24, 2026

Correction note (added after initial filing): earlier draft of this issue claimed presence GC was unimplemented. It IS implemented — see src/WSRouter.php:881 (writeServerRegistryRow), line 459 (App::tick(SERVER_HEARTBEAT_INTERVAL_MS) refresh), line 892 (runStaleServerGC), line 466 (worker-0-only sweeper selection). External reviewer flagged two REAL gaps that survived the initial impl. This issue tracks those, not the GC itself.

The two real gaps in the shipped GC

Gap 1 — Heartbeat is shallow (Redis-write only, no liveness probe)

writeServerRegistryRow() is just:

Store::set(self::SERVERS_TABLE, self::$serverId, [
 'last_seen' => time(),
 'host' => (string) gethostname(),
 'pid' => getmypid(),
]);

It says "the worker that runs the tick is alive enough to write Redis." It does NOT say "the WS server can accept connections." If the rest of the server is wedged — FD exhaustion, blocked event loop, long GC pause, full accept backlog, IO worker deadlock — the tick coroutine in worker 0 still fires happily and ws_nodes row stays green. Meanwhile the ws:server:{ID} subscriber may be wedged too, so messages routed to that server silently vanish. The 90-second SERVER_STALE_AFTER_SEC window can't fire because last_seen keeps refreshing.

Fix shape (~5 LOC): probe our own OpenSwoole listen socket before the heartbeat write. If the accept loop can't even accept a self-connect, skip the refresh and let the row go stale.

public static function writeServerRegistryRow(): void
{
 if (self::$serverId === '') { return; }
 // Liveness probe — TCP self-connect to our own listener with short
 // timeout. Catches FD exhaustion, blocked event loop, full accept
 // backlog. If we can't accept our OWN connection, don't claim alive.
 $port = (int) (App::getServer()?->port ?? 0);
 if ($port > 0) {
 $sock = @stream_socket_client(
 'tcp://127.0.0.1:' . $port, $errno, $errstr,
 0.1, STREAM_CLIENT_CONNECT
 );
 if (!is_resource($sock)) {
 elog("WSRouter: self-connect failed (\$errno=\$errno) — skipping heartbeat refresh", 'warn');
 return;
 }
 fclose($sock);
 }
 Store::set(self::SERVERS_TABLE, self::\$serverId, [
 'last_seen' => time(),
 'host' => (string) gethostname(),
 'pid' => getmypid(),
 ]);
}

Gap 2 — No cross-server leader for the GC sweep

runStaleServerGC() selection is if ($workerId === 0) (line 466) — pure within-server convention. Across servers there's NO coordination. If servers A, B, C all notice server D is stale (90s window elapsed), all three concurrently:

Run Store::iterate(ws_owner) — three full table scans
Store::del() the same rows — three concurrent deletes
Store::del() D's registry row — three deletes

Works correctly today because Store::del on the Redis backend is idempotent — operations commute, no double-effect. But:

Triple Redis work (N scans, ×ばつrows deletes) for 1 dead node in an N-server cluster
Zero observability: no way to know "server B's sweep won, C's was redundant"
Doesn't scale — 100-node cluster losing 1 node = ×ばつ wasted iterates

Fix shape (~10 LOC) — deterministic owner over SETNX lease:

Original draft pitched a Redis SETNX lease per sweep. Reviewer pushback (correct): we already have ws_nodes with heartbeats. The lowest live node_id is the natural sweep owner — every node computes the same answer locally from data we already keep, zero new Redis writes per cycle, free observability via the existing table.

public static function runStaleServerGC(int $staleAfter = self::SERVER_STALE_AFTER_SEC): int
{
 // Deterministic owner: lowest live node_id in ws_nodes runs this tick.
 // Every node reads the same Redis source-of-truth, every node agrees.
 // No SETNX, no lease TTL to tune, no extra writes per cycle.
 $liveNodeIds = self::activeNodeIds($staleAfter);
 if ($liveNodeIds === [] || min($liveNodeIds) !== self::$serverId) {
 return 0; // not the sweep owner this tick
 }
 // ... [existing dead-server detection + per-row delete unchanged] ...
}

Why deterministic owner over lease:

Zero writes per cycle. A 6-node cluster sweeping every 30 s pays 12 SETNX/min for derivable math; deterministic owner pays 0.
Free observability. ws_nodes already records every live node — min(activeNodeIds()) tells anyone watching who'd win this tick without a separate last_swept_by key.
Owner-mid-die edge handled by Gap 1. When the elected owner's accept-loop probe fails, its heartbeat row goes stale; next tick every node deterministically picks the next-lowest. No election round-trip, no failover gap, no lease TTL to wait out.
Safe under brief disagreement. During heartbeat-expiry transitions two nodes may both think they're owner for one tick — both run the same iterate + del, second is no-op. Store::del is idempotent, so we don't even need fencing tokens (Redlock is overkill when double-run does no harm).

activeNodeIds() needs to be exposed (or extracted from the existing runStaleServerGC scan) — already a one-liner over the ws_nodes table read we do today.

Out of scope (deferred to v0.4.0+)

Network partitions vs hard death — heartbeat alone can't distinguish. Same trade-off Centrifugo/LiveKit make: accept some stale state during partitions, mitigate at the app layer (single-region WS fabric, cross-region via Redis replication).
Connection draining on planned shutdown — separately tracked. The current onWorkerStop → sweepThisServer() (line 476) covers the graceful path.
Replacing per-conn cleanup on the sender side — separately worth adding ("PUBLISH returns 0 receivers → mark client offline") for the gap between hard-crash and 90s GC catching up; cheap pre-fix that gives visibility before the full fixes land.

Tests

tests/Unit/WS/WSRouterHeartbeatLivenessTest.php — install uopz override for stream_socket_client returning false, assert writeServerRegistryRow() skips the Store::set.
tests/Unit/WS/WSRouterGCSweepOwnerTest.php — node with lowest live id runs the sweep; other live nodes return 0 without iterating; when the elected owner's heartbeat goes stale (Gap 1), the next-lowest takes over deterministically on the following tick.
Cross-host integration: bring two servers, force server B to wedge via deep blocking call, observe server A's GC reaps B within 90s instead of letting B's stale-but-still-refreshing row block detection.

Timeline

Gap 1 (accept-loop probe): ~5 LOC + 1 test. v0.3.1 candidate.
Gap 2 (deterministic sweep owner): ~10 LOC + 2 tests. No new Redis surface needed — reads existing ws_nodes heartbeats. v0.3.1 candidate.
Sender-side "0 receivers → mark offline": ~15 LOC + 1 test. Could ship even sooner as a 'better visibility' patch.

All three are independent — no ordering constraint.

References

Existing impl: src/WSRouter.php lines 54-56 (constants), 397-401 (table schema), 408 + 458-460 (heartbeat wiring), 466-468 (worker-0 sweep selection), 471-476 (graceful shutdown), 881-889 (writeServerRegistryRow), 892-928 (runStaleServerGC)
External reviewer notes (HN-prep round): "heartbeat-says-alive while the actual ws server is hung (accept queue full, long gc pause). ws_nodes stays green but messages still vanish, so baking an accept-loop liveness check into the heartbeat itself catches more than just touching redis. what's the leader layer, redlock or plain SETNX?"
C1 hardening (already shipped): conn_id nonce guard in publisher routing path, prevents FD-reuse cross-tenant leak — separate from this issue.

Metadata

Assignees

No one assigned

Labels

enhancement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WSRouter: heartbeat-depth + cross-server sweep coordination #99

Description

The two real gaps in the shipped GC

Gap 1 — Heartbeat is shallow (Redis-write only, no liveness probe)

Gap 2 — No cross-server leader for the GC sweep

Out of scope (deferred to v0.4.0+)

Tests

Timeline

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions