Run Gemini CLI live benchmark #4
Open
Description
Track live agent benchmark validation for Gemini CLI.
Command target:
ctx leaderboard --hallucination --live --agent gemini
Acceptance:
- Run in an external environment where Gemini CLI auth/session is available.
- Capture skipped/error output if the CLI is unavailable or blocked.
- Do not use this result in launch copy until it is reproducible.
- Keep offline deterministic benchmark as the primary launch claim.
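One way to satisfy the capture requirement is a small wrapper that records a skipped or error outcome instead of failing silently. The sketch below is only an illustration: the command line is the one from this issue, but the `run_live_benchmark` helper, the result dictionary layout, the timeout value, and the assumption that both `ctx` and a `gemini` executable must be on `PATH` are hypothetical and not part of the project.

```python
import shutil
import subprocess

# Command target from this issue. Everything else below is an assumption.
COMMAND = ["ctx", "leaderboard", "--hallucination", "--live", "--agent", "gemini"]

# Assumed executable names; adjust to match the real environment.
REQUIRED = ("ctx", "gemini")


def run_live_benchmark(command=COMMAND, required=REQUIRED, timeout=600):
    """Run the command and return a dict describing what happened.

    status is "skipped" if a required executable is missing,
    "error" on timeout, launch failure, or non-zero exit, else "ok".
    """
    result = {"command": list(command), "status": "", "reason": "",
              "returncode": None, "stdout": "", "stderr": ""}

    # Skip early if a required executable cannot be found.
    missing = [name for name in required if shutil.which(name) is None]
    if missing:
        result["status"] = "skipped"
        result["reason"] = "not found on PATH: " + ", ".join(missing)
        return result

    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        result["status"] = "error"
        result["reason"] = f"timed out after {timeout}s"
        return result
    except OSError as exc:
        result["status"] = "error"
        result["reason"] = f"could not launch: {exc}"
        return result

    # Keep the raw output so a blocked or failing CLI is still documented.
    result["returncode"] = proc.returncode
    result["stdout"] = proc.stdout
    result["stderr"] = proc.stderr
    result["status"] = "ok" if proc.returncode == 0 else "error"
    if proc.returncode != 0:
        result["reason"] = f"exit code {proc.returncode}"
    return result


if __name__ == "__main__":
    import json
    print(json.dumps(run_live_benchmark(), indent=2))
```

Saving the printed JSON from two or more separate runs would give a record to compare before the result is treated as reproducible.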