-
Notifications
You must be signed in to change notification settings - Fork 46
fix: daemon bugs — flush race, readline leak, timeline lock, client retry, file-request scope#117
fix: daemon bugs — flush race, readline leak, timeline lock, client retry, file-request scope #117gusye1234 wants to merge 1 commit into
Conversation
...e lock, client retry) 1. session-runner: Add isFlushing guard to prevent concurrent flushMessages calls from racing and dropping messages (#108) 2. agent backends (claude, codex, opencode): Add rl.close() in process close handlers to prevent readline memory leaks (#109) 3. timeline: Add retry loop (2 retries with spin-wait) in updateEntry when lock acquisition fails, reducing silent update drops (#110) 4. client: Initialize lastError with a meaningful Error instead of undefined, preventing throw-undefined edge case (#115) 5. workspace-file-request: Add optional workspaceId parameter to getRequest for defense-in-depth scoping (#107) Closes #107, #108, #109, #110, #115
@gusye1234
gusye1234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review — PR #117: fix: daemon bugs — flush race, readline leak, timeline lock, client retry, file-request scope
What it does
Fixes five independent daemon bugs:
- Readline leak — closes readline interfaces on process exit across all agent backends
- Flush race — adds an
isFlushingguard to prevent concurrent message batch flushes - Timeline lock — retries lock acquisition with short spin-waits before giving up
- Client retry — initializes
lastErrorwith a descriptive Error so the throw path always has a meaningful message - File-request scope — scopes
getRequestlookup byworkspaceIdso one workspace can't access another's file requests
Code Quality Checklist
Functionality
- Code does what it's supposed to do
- Edge cases handled
- Error handling appropriate
- No logic errors
Code Quality
- Readable and well-structured
- Functions are small and focused
- No code duplication
- Follows project conventions
Code Simplification
- Changes are minimal and targeted
- One concern: The spin-wait in
timeline.ts(lines 157-161) uses a busy loop (while (Date.now() - start < 50 * (retry + 1)) {}) which blocks the Node.js event loop. Since this appears to be in a synchronous file-lock context, it's likely acceptable for 50-100ms, but worth noting. If this ever becomes a hot path, considerAtomics.waitor restructuring to async.
Security
- No security vulnerabilities
- Good catch: The
workspaceIdscoping on file requests closes a potential cross-workspace data access bug - No hardcoded secrets
Tests Coverage
- No new tests added for these fixes. The flush race guard, timeline retry, and workspace-scoped query would benefit from unit tests, especially:
- Concurrent flush prevention
- Lock retry behavior (acquired on 2nd/3rd try vs. still failing)
getRequestwith/without workspaceId
Project Syncing
- All three agent backends (claude, codex, opencode) updated consistently for rl.close()
- Both retry paths in client.ts updated
Observations
-
Flush race — lost messages on failure: When
reportMessagesthrows, the spliced batch is silently lost (only logged at debug level). The previous code had the same issue, so this PR doesn't regress it — but it's worth considering a re-queue or at least a warning-level log for dropped messages. -
Spin-wait justification: The spin-wait is a pragmatic choice for a sync file-lock path, but 50-100ms of event-loop blocking could stall other timers. Since this only triggers under contention (lock already held), it's probably fine in practice.
-
and(...conditions)— whenworkspaceIdis undefined,conditionshas only one element. Verify thatand()with a single argument works correctly in your Drizzle version (it should, but worth a quick check).
Verdict
Solid set of targeted bug fixes. All changes are low-risk and improve correctness. Would be stronger with unit tests for the new behaviors, but the fixes themselves are clean and well-scoped. Looks ready to merge.
Summary
isFlushingguard to prevent concurrentflushMessagescalls from racingrl.close()in process close handlers for all 3 agent backends (claude, codex, opencode)updateEntrywhen lock acquisition failslastErrorwith a meaningful Error in both retry loopsworkspaceIdtogetRequestquery for defense-in-depthTest plan
Closes #107, #108, #109, #110, #115