Background task hangs forever in "queued" when --cwd is a git worktree (detached worker dies silently; no timeout) #367
Summary
A background task (task ... --background, as used by /codex:rescue and the adversarial-review flow) hangs forever in queued and never executes when its --cwd points at a git worktree (a checkout where .git is a file pointing at the real gitdir, e.g. git worktree add). The detached worker dies during startup and, because it is spawned with stdio: "ignore" and there is no heartbeat/timeout, the failure is completely silent and the job is stuck at queued permanently.
Foreground tasks work fine in the same worktree, and background tasks work fine in non-worktree directories — so this is specific to the detached-worker + worktree combination, not to Codex itself (the CLI is installed and authenticated and runs the identical work in the foreground).
Environment
- OS: Windows 11 (10.0.26200), shell: Git-bash / MSYS (also reproduced via the plugin invoked from Claude Code)
- Plugin: `codex@openai-codex` v1.0.4 (`scripts/codex-companion.mjs`); `codex-cli` 0.137.0, authenticated ("Logged in using ChatGPT")
- Node: v20+ (system node)
Reproduction

    CC=".../codex/1.0.4/scripts/codex-companion.mjs"

    # 1) Make a git worktree
    git -C /path/to/repo worktree add /tmp/wt-probe origin/main   # .git is now a FILE

    # 2) Background task in the worktree -> stuck "queued" forever
    node "$CC" task "Reply with exactly: OK" --cwd /tmp/wt-probe --background
    node "$CC" status <jobId>   # phase: queued, never advances; worker pid is dead

    # 3) Same task, FOREGROUND, same worktree -> works
    node "$CC" task "Reply with exactly: OK" --cwd /tmp/wt-probe   # -> OK

    # 4) Background task in a NON-worktree dir -> works
    node "$CC" task "Reply with exactly: OK" --cwd /tmp/plain-dir --background   # -> completes

A/B matrix (all other factors equal):
| cwd | mode | result |
|---|---|---|
| plain dir / main clone | --background | ✅ runs to completion |
| git worktree | --background | ❌ stuck queued forever |
| git worktree | foreground | ✅ runs to completion |
The stuck job's log dead-ends at:
[..] Starting Codex Task.
[..] Queued for background execution.
...and never reaches the `Starting Codex task thread.` line. Re-running the same `task-worker` job in the foreground completes it.
Root cause / analysis
enqueueBackgroundTask → spawnDetachedTaskWorker (codex-companion.mjs ~L641-650) spawns the worker as:

    spawn(process.execPath, [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId], {
      cwd,
      env: process.env,
      detached: true,
      stdio: "ignore",
      windowsHide: true
    });

Two problems combine:
- **Silent death + no timeout.** `stdio: "ignore"` discards the worker's stderr, and there is no heartbeat or watchdog that flips a never-started job to `failed`. So when the detached worker throws during init (which it does in a worktree cwd, before it has wired its own logging to the job `logFile`), the error vanishes and the job is pinned at `queued` indefinitely. `status` keeps reporting `queued` against a dead pid.
- **Worktree-specific worker startup failure.** The detached worker (process cwd = the worktree) dies before logging `Starting Codex task thread`, whereas the identical foreground path in the same worktree succeeds. The only deltas are `detached: true` + `stdio: "ignore"` + the child's process cwd being the worktree. Something in the worker → broker → `codex app-server` spawn chain fails for a worktree checkout under those conditions on Windows.
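
To illustrate the first problem, here is a minimal sketch of how the same spawn could keep the worker detached while still recording a startup crash. It is not the plugin's code: `workerSpawnOptions`, `spawnWorkerWithLog` and the `logFile` argument are hypothetical names, and I'm assuming a per-job log path is already known at enqueue time.

```javascript
import { spawn } from "node:child_process";
import { openSync, closeSync } from "node:fs";

// Build spawn options that keep the worker detached but send its stdout and
// stderr to an already-open log file descriptor instead of discarding them.
function workerSpawnOptions(cwd, logFd) {
  return {
    cwd,
    env: process.env,
    detached: true,
    stdio: ["ignore", logFd, logFd], // stdin ignored, stdout + stderr -> log
    windowsHide: true,
  };
}

// Hypothetical replacement for the spawn call quoted above. `logFile` is an
// assumed per-job log path; the real plugin may name or locate it differently.
function spawnWorkerWithLog(scriptPath, cwd, jobId, logFile) {
  const logFd = openSync(logFile, "a"); // append, so earlier log lines survive
  try {
    const child = spawn(
      process.execPath,
      [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId],
      workerSpawnOptions(cwd, logFd)
    );
    child.unref(); // do not keep the parent alive for the worker
    return child;
  } finally {
    closeSync(logFd); // the spawned child inherited its own copy
  }
}
```

With something like this, an exception thrown before the worker wires up its own logging would land in the job log rather than disappearing.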
Secondary bug (recovery is also broken on Windows/Git-bash)
/codex:cancel <job> cannot kill a stuck job here: the companion's taskkill /PID <pid> /T /F is run through a Git-bash/MSYS context that path-translates the /PID switch into a filesystem path, producing:
taskkill /PID ... : ERROR: Invalid argument/option - 'C:/Program Files/Git/PID'.
so the kill silently fails and the orphaned job/process can't be cleaned via the plugin.
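
A sketch of what a translation-proof kill might look like, assuming the companion currently reaches `taskkill` through a shell. `buildTaskkillInvocation` and `killProcessTree` are hypothetical names. The switches are passed as an args array with `shell: false`, and the two environment variables (`MSYS_NO_PATHCONV` from Git for Windows, `MSYS2_ARG_CONV_EXCL` from MSYS2) only matter if an MSYS shell still sits somewhere in the chain.

```javascript
import { spawnSync } from "node:child_process";

// Describe a taskkill call as command + args array so no shell parses "/PID".
function buildTaskkillInvocation(pid, env = process.env) {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new TypeError(`invalid pid: ${pid}`);
  }
  return {
    command: "taskkill",
    args: ["/PID", String(pid), "/T", "/F"],
    options: {
      shell: false, // spawn taskkill directly, no bash or cmd in between
      windowsHide: true,
      encoding: "utf8",
      // Belt and braces: disable MSYS path conversion if a shell is involved.
      env: { ...env, MSYS_NO_PATHCONV: "1", MSYS2_ARG_CONV_EXCL: "*" },
    },
  };
}

// Windows-only: kill the process tree and surface failures to the caller
// instead of letting the cancel fail silently.
function killProcessTree(pid) {
  const { command, args, options } = buildTaskkillInvocation(pid);
  const result = spawnSync(command, args, options);
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`taskkill exited with ${result.status}: ${result.stderr}`);
  }
}
```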
Tertiary: broker daemons leak
app-server-broker.mjs serve daemons (one per cwd) are never reaped — they accumulated to ~40 over a few days of normal use on this machine, one per working directory/worktree, long after those jobs finished.
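
One possible shape for reaping, as a rough sketch only. I don't know how the plugin tracks its brokers, so the record shape (`pid`, `cwd`, `lastActiveAt`), the 30-minute default and both function names are assumptions made for illustration.

```javascript
// Pick the brokers that have been idle longer than maxIdleMs.
// The record shape ({ pid, cwd, lastActiveAt }) is assumed for illustration.
function selectIdleBrokers(brokers, now = Date.now(), maxIdleMs = 30 * 60_000) {
  return brokers.filter((broker) => now - broker.lastActiveAt > maxIdleMs);
}

// Ask each idle broker to exit; returns the pids that were signalled.
// `kill` is injectable so the selection logic can be exercised without
// touching real processes.
function reapIdleBrokers(
  brokers,
  now = Date.now(),
  maxIdleMs = 30 * 60_000,
  kill = (pid) => process.kill(pid)
) {
  const reaped = [];
  for (const broker of selectIdleBrokers(brokers, now, maxIdleMs)) {
    try {
      kill(broker.pid);
      reaped.push(broker.pid);
    } catch {
      // Already gone (or not ours to signal); nothing to reap.
    }
  }
  return reaped;
}
```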
Suggested fixes
- Don't discard the detached worker's stderr to the void; redirect it to the job log file (or a worker log) so startup failures are recorded. At minimum, mark a job `failed` (with the captured error) if its worker exits/never advances within a timeout.
- Add a heartbeat/timeout so a job that never leaves `queued` (dead worker pid) is surfaced as `failed`, not reported as `queued` forever.
- Fix worktree support in the detached/background worker path (parity with the working foreground path).
- On Windows, invoke `taskkill` via the comspec / an args array that isn't subject to MSYS path translation (e.g. spawn with explicit args, or prefix with the stop-parsing token), so `/codex:cancel` works under Git-bash.
- Reap idle `app-server-broker` daemons.
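
For the heartbeat/timeout item, a minimal sketch of the check `status` could run before reporting a phase. The job fields `workerPid` and `queuedAt`, the 60-second default and the function names are assumptions, not the plugin's actual job schema. `process.kill(pid, 0)` is the standard Node way to test whether a pid exists without signalling it.

```javascript
// Returns true if a process with this pid currently exists.
function isPidAlive(pid) {
  try {
    process.kill(pid, 0); // signal 0 tests for existence without killing
    return true;
  } catch (err) {
    return err.code === "EPERM"; // exists, but we may not signal it
  }
}

// Decide what `status` should report for a job record. A job that is still
// "queued" but whose worker is dead, or that has waited too long, is
// surfaced as "failed" instead of staying "queued" forever.
function effectivePhase(job, now = Date.now(), queueTimeoutMs = 60_000, alive = isPidAlive) {
  if (job.phase !== "queued") return job.phase;
  if (!alive(job.workerPid)) return "failed";
  if (now - job.queuedAt > queueTimeoutMs) return "failed";
  return "queued";
}
```

The `alive` parameter is only there so the decision logic can be tested without real processes.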
Workaround
Run the review/task from the main clone (non-worktree) directory, or run foreground (not --background), when the branch under review lives in a git worktree. Reviewing a worktree branch's diff still works from the main clone via git show <sha> / git diff.
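
Until the background path is fixed, a caller could apply the workaround automatically. This is a sketch under assumptions: `parseGitdirPointer`, `isLinkedWorktree` and `chooseBackground` are hypothetical helpers. The check relies on two things git does document: in a linked worktree `.git` is a file containing `gitdir: <path>`, and that path lives under the main repository's `.git/worktrees/` directory (a submodule also has a `.git` file, but it points under `.git/modules/`).

```javascript
import { statSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Parse the contents of a ".git" FILE ("gitdir: <path>") and return the
// path, or null if the content does not have that shape.
function parseGitdirPointer(content) {
  const match = /^gitdir:\s*(.+?)\s*$/m.exec(content);
  return match ? match[1] : null;
}

// True when <cwd>/.git is a file whose gitdir points into ".git/worktrees/".
function isLinkedWorktree(cwd) {
  const dotGit = join(cwd, ".git");
  try {
    if (!statSync(dotGit).isFile()) return false; // normal clone: a directory
    const gitdir = parseGitdirPointer(readFileSync(dotGit, "utf8"));
    return gitdir !== null && gitdir.replace(/\\/g, "/").includes("/worktrees/");
  } catch {
    return false; // no .git here, or it is unreadable
  }
}

// Fall back to foreground execution when the cwd is a linked worktree.
function chooseBackground(cwd, requestedBackground) {
  return requestedBackground && !isLinkedWorktree(cwd);
}
```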