
Background task hangs forever in "queued" when --cwd is a git worktree (detached worker dies silently; no timeout) #367

Open

Description

Summary

A background task (task ... --background, as used by /codex:rescue and the adversarial-review flow) hangs forever in queued and never executes when its --cwd points at a git worktree (a checkout where .git is a file pointing at the real gitdir, e.g. git worktree add). The detached worker dies during startup and, because it is spawned with stdio: "ignore" and there is no heartbeat/timeout, the failure is completely silent and the job is stuck at queued permanently.

Foreground tasks work fine in the same worktree, and background tasks work fine in non-worktree directories — so this is specific to the detached-worker + worktree combination, not to Codex itself (the CLI is installed and authenticated and runs the identical work in the foreground).
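For readers unfamiliar with the setup: in a linked worktree, `.git` is a small text file containing a line of the form `gitdir: <path>`, not a directory. Below is a minimal sketch of how a script could tell the two cases apart, for example to record in a job log which kind of checkout a background task was started from. `describeCheckout` is a hypothetical helper and is not part of the plugin. It only inspects the given directory and does not walk up to parent directories.

```javascript
import fs from "node:fs";
import path from "node:path";

// Hypothetical helper (not part of codex-companion.mjs): classify `dir` as a
// normal clone ("main"), a linked worktree ("worktree"), or no checkout ("none").
function describeCheckout(dir) {
  const dotGit = path.join(dir, ".git");
  let stat;
  try {
    stat = fs.statSync(dotGit);
  } catch {
    return { kind: "none" };
  }
  if (stat.isDirectory()) {
    return { kind: "main", gitDir: dotGit };
  }
  // Linked worktree: `.git` is a file whose content is "gitdir: <path>".
  const text = fs.readFileSync(dotGit, "utf8");
  const match = /^gitdir:\s*(.+?)\s*$/m.exec(text);
  if (!match) return { kind: "unknown" };
  // The recorded gitdir may be relative to the worktree root.
  return { kind: "worktree", gitDir: path.resolve(dir, match[1]) };
}
```

In the reproduction below, `describeCheckout("/tmp/wt-probe")` would report `kind: "worktree"`, while the main clone reports `kind: "main"`.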

Environment

  • OS: Windows 11 (10.0.26200), shell: Git-bash / MSYS (also reproduced via the plugin invoked from Claude Code)
  • Plugin: codex@openai-codex v1.0.4 (scripts/codex-companion.mjs)
  • codex-cli 0.137.0, authenticated ("Logged in using ChatGPT")
  • Node: v20+ (system node)

Reproduction

```bash
CC=".../codex/1.0.4/scripts/codex-companion.mjs"
# 1) Make a git worktree
git -C /path/to/repo worktree add /tmp/wt-probe origin/main   # .git is now a FILE
# 2) Background task in the worktree -> stuck "queued" forever
node "$CC" task "Reply with exactly: OK" --cwd /tmp/wt-probe --background
node "$CC" status <jobId>   # phase: queued, never advances; worker pid is dead
# 3) Same task, FOREGROUND, same worktree -> works
node "$CC" task "Reply with exactly: OK" --cwd /tmp/wt-probe   # -> OK
# 4) Background task in a NON-worktree dir -> works
node "$CC" task "Reply with exactly: OK" --cwd /tmp/plain-dir --background   # -> completes
```

A/B matrix (all other factors equal):

| cwd | mode | result |
|---|---|---|
| plain dir / main clone | --background | ✅ runs to completion |
| git worktree | --background | stuck queued forever |
| git worktree | foreground | ✅ runs to completion |

The stuck job's log dead-ends at:

```text
[..] Starting Codex Task.
[..] Queued for background execution.
```

...and never reaches "Starting Codex task thread." Re-running the same task-worker job in the foreground completes it.

Root cause / analysis

enqueueBackgroundTask → spawnDetachedTaskWorker (codex-companion.mjs ~L641-650) spawns the worker as:

```js
spawn(process.execPath, [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId], {
  cwd, env: process.env, detached: true, stdio: "ignore", windowsHide: true
});
```

Two problems combine:

  1. Silent death + no timeout. stdio: "ignore" discards the worker's stderr, and there is no heartbeat or watchdog that flips a never-started job to failed. So when the detached worker throws during init (which it does in a worktree cwd, before it has wired its own logging to the job logFile), the error vanishes and the job is pinned at queued indefinitely. status keeps reporting queued against a dead pid.

  2. Worktree-specific worker startup failure. The detached worker (process cwd = the worktree) dies before logging Starting Codex task thread, whereas the identical foreground path in the same worktree succeeds. The only deltas are detached:true + stdio:"ignore" + the child's process cwd being the worktree. Something in the worker→broker→codex app-server spawn chain fails for a worktree checkout under those conditions on Windows.
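A minimal sketch of how the first problem could be closed, assuming the job's log file path is already known when the worker is spawned. `spawnDetachedWorkerWithLog` and its `logFile` parameter are hypothetical names, not the plugin's. It keeps the spawn call quoted above but gives the worker an append-mode file descriptor for stdout and stderr instead of `"ignore"`, which is the pattern Node's `child_process` documentation shows for long-running detached children.

```javascript
import fs from "node:fs";
import { spawn } from "node:child_process";

// Hypothetical variant of spawnDetachedTaskWorker: same arguments as the
// original call, but worker output is appended to `logFile` so a crash during
// worker init leaves its error message behind.
function spawnDetachedWorkerWithLog({ scriptPath, cwd, jobId, logFile }) {
  const fd = fs.openSync(logFile, "a");
  try {
    const child = spawn(
      process.execPath,
      [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId],
      {
        cwd,
        env: process.env,
        detached: true,
        // stdin ignored; stdout and stderr both go to the log file.
        stdio: ["ignore", fd, fd],
        windowsHide: true,
      }
    );
    child.unref(); // let the companion exit without waiting for the worker
    return child.pid;
  } finally {
    fs.closeSync(fd); // the child received its own copy of the descriptor
  }
}
```

The point of redirecting at the descriptor level is that the worker keeps writing to the file after the companion process has exited, so nothing depends on the parent still being alive; an `exit` listener in the companion would not have that property.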

Secondary bug (recovery is also broken on Windows/Git-bash)

/codex:cancel <job> cannot kill a stuck job here: the companion's taskkill /PID <pid> /T /F is run through a Git-bash/MSYS context that path-translates the /PID switch into a filesystem path, producing:

taskkill /PID ... : ERROR: Invalid argument/option - 'C:/Program Files/Git/PID'.

so the kill silently fails and the orphaned job/process can't be cleaned via the plugin.
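A sketch of the "explicit args" approach, assuming the translation happens because the command is handed to a Git-bash/MSYS shell as a single string; we have not confirmed how the companion currently builds the call. `taskkillArgs` and `killProcessTreeWindows` are hypothetical helpers. With `shell: false`, Node starts `taskkill.exe` directly rather than through bash, so no MSYS layer gets a chance to rewrite `/PID`. (From an interactive Git-bash prompt, the usual workarounds are writing the switch as `//PID` or setting `MSYS_NO_PATHCONV=1`.)

```javascript
import { spawnSync } from "node:child_process";

// Build taskkill's argv as separate strings; never interpolate into a shell string.
function taskkillArgs(pid) {
  if (!Number.isInteger(pid) || pid <= 0) throw new RangeError(`bad pid: ${pid}`);
  return ["/PID", String(pid), "/T", "/F"]; // /T = whole tree, /F = force
}

// Hypothetical Windows-only kill path for /codex:cancel.
function killProcessTreeWindows(pid) {
  const result = spawnSync("taskkill", taskkillArgs(pid), {
    shell: false, // no cmd.exe or bash in between
    windowsHide: true,
    encoding: "utf8",
  });
  // Surface failures instead of swallowing them, so cancel can report them.
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`taskkill exited ${result.status}: ${(result.stderr || "").trim()}`);
  }
}
```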

Tertiary: broker daemons leak

app-server-broker.mjs serve daemons (one per cwd) are never reaped — they accumulated to ~40 over a few days of normal use on this machine, one per working directory/worktree, long after those jobs finished.
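One way a broker could reap itself, sketched under the assumption that it accepts client connections through something like a `net.Server`; `IdleTracker` and the 10-minute default are hypothetical, not plugin settings. The daemon records connection open/close events and exits once it has had no clients for `idleMs`.

```javascript
// Hypothetical idle tracker for an app-server-broker daemon. Timestamps are
// injectable so the logic can be checked without real timers.
class IdleTracker {
  constructor(idleMs = 10 * 60 * 1000, now = Date.now()) {
    this.idleMs = idleMs;
    this.active = 0; // currently connected clients
    this.lastActivity = now;
  }
  connectionOpened(now = Date.now()) {
    this.active += 1;
    this.lastActivity = now;
  }
  connectionClosed(now = Date.now()) {
    this.active = Math.max(0, this.active - 1);
    this.lastActivity = now;
  }
  // True once there are no clients and none have come or gone for idleMs.
  shouldExit(now = Date.now()) {
    return this.active === 0 && now - this.lastActivity >= this.idleMs;
  }
}

// Example wiring (assumes a net.Server named `server`):
// const tracker = new IdleTracker();
// server.on("connection", (sock) => {
//   tracker.connectionOpened();
//   sock.on("close", () => tracker.connectionClosed());
// });
// setInterval(() => {
//   if (tracker.shouldExit()) server.close(() => process.exit(0));
// }, 30_000).unref();
```

A later job for the same cwd would then start a fresh broker on demand, which the one-broker-per-cwd design described above already seems to do.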

Suggested fixes

  • Don't discard the detached worker's stderr to the void — redirect it to the job log file (or a worker log) so startup failures are recorded. At minimum, mark a job failed (with the captured error) if its worker exits/never advances within a timeout.
  • Add a heartbeat/timeout so a job that never leaves queued (dead worker pid) is surfaced as failed, not reported as queued forever.
  • Fix worktree support in the detached/background worker path (parity with the working foreground path).
  • On Windows, invoke taskkill via the comspec / an args array that isn't subject to MSYS path translation (e.g. spawn with explicit args, or prefix with the stop-parsing token), so /codex:cancel works under Git-bash.
  • Reap idle app-server-broker daemons.
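A sketch of the heartbeat/timeout check from the second bullet, assuming a job record with `phase`, `pid`, and `queuedAt` (epoch ms) fields; those field names, `checkQueuedJob`, and the 60 s default are hypothetical, not the plugin's schema. `status` (or a periodic sweep) could call it and persist the returned patch instead of reporting `queued` against a dead pid.

```javascript
// True if a process with this pid exists. Signal 0 performs only the
// existence/permission check and delivers nothing.
function pidIsAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === "EPERM"; // exists, but owned by another user
  }
}

// Returns a patch such as { phase: "failed", error } for a stuck job, else null.
function checkQueuedJob(job, { now = Date.now(), timeoutMs = 60_000, isAlive = pidIsAlive } = {}) {
  if (job.phase !== "queued") return null; // only never-started jobs are handled here
  if (job.pid && !isAlive(job.pid)) {
    return { phase: "failed", error: `worker pid ${job.pid} exited before the job started` };
  }
  if (now - job.queuedAt > timeoutMs) {
    return { phase: "failed", error: `job still queued after ${timeoutMs} ms` };
  }
  return null;
}
```

The timeout is kept as a second check because pid reuse can make a dead worker's pid look alive again.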

Workaround

Run the review/task from the main clone (non-worktree) directory, or run foreground (not --background), when the branch under review lives in a git worktree. Reviewing a worktree branch's diff still works from the main clone via git show <sha> / git diff.
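To apply the workaround programmatically, a caller could resolve the main clone from any worktree path before launching a background job. `git worktree list --porcelain` lists the main worktree first, as a `worktree <path>` line; `parseMainWorktree` and `mainWorktreeOf` are hypothetical helpers, not plugin functions.

```javascript
import { execFileSync } from "node:child_process";

// Return the path from the first "worktree <path>" record, i.e. the main
// worktree, or null if the output contains none.
function parseMainWorktree(porcelain) {
  for (const line of porcelain.split(/\r?\n/)) {
    if (line.startsWith("worktree ")) return line.slice("worktree ".length);
  }
  return null;
}

// Works from the main clone or from any linked worktree of it.
function mainWorktreeOf(cwd) {
  const out = execFileSync("git", ["-C", cwd, "worktree", "list", "--porcelain"], {
    encoding: "utf8",
  });
  return parseMainWorktree(out);
}
```

For example, `mainWorktreeOf("/tmp/wt-probe")` should return `/path/to/repo` from the reproduction above, which could then be passed as `--cwd` for a `--background` task.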
