Run Codex live benchmark #1
Open
Description
Track live agent benchmark validation for Codex CLI.

Command target:
ctx leaderboard --hallucination --live --agent codex
Acceptance:
- Run in an external environment where Codex CLI auth/session is available.
- Capture skipped/error output if the CLI is unavailable or blocked.
- Do not use this result in launch copy until it is reproducible.
- Keep offline deterministic benchmark as the primary launch claim.
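
The capture requirement above could be scripted roughly as follows. This is a minimal sketch, not part of the `ctx` tool: the command line is copied from this issue, while `RunRecord`, `run_and_capture`, the status labels, and the 600-second timeout are hypothetical names and values chosen for illustration. It also assumes that a non-zero exit code signals an error, which this issue does not confirm.

```python
import shutil
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Command target copied from the issue; everything else below is illustrative.
COMMAND = ["ctx", "leaderboard", "--hallucination", "--live", "--agent", "codex"]


@dataclass
class RunRecord:
    status: str                # "ok", "error", or "skipped" (hypothetical labels)
    returncode: Optional[int]  # None when the process never ran or timed out
    stdout: str
    stderr: str


def run_and_capture(command: List[str], timeout: int = 600) -> RunRecord:
    """Run a command and keep its output, including when it cannot run at all."""
    # Record a skip instead of raising when the executable is not available.
    if shutil.which(command[0]) is None:
        return RunRecord("skipped", None, "", f"{command[0]} not found on PATH")
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return RunRecord("error", None, "", f"timed out after {timeout}s")
    # Assumption: a non-zero exit code means the run failed or was blocked.
    status = "ok" if proc.returncode == 0 else "error"
    return RunRecord(status, proc.returncode, proc.stdout, proc.stderr)
```

Calling `run_and_capture(COMMAND)` in the external environment and attaching the resulting record to this issue would preserve the skipped or error output even when the Codex CLI is unavailable.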