Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

mingrath/arena-workflow

Repository files navigation

CCG - Claude + Codex + Gemini Competitive Multi-Model System

This is a redesigned fork of fengshao1227/ccg-workflow. See What Changed below.

A competitive multi-model development system where every significant task is dispatched to ALL available models (Codex + Gemini) in parallel. Claude orchestrates, competes, compares outputs using benchmark-weighted evaluation criteria, and synthesizes the best solution.

Why competitive? Our benchmark analysis shows no single model dominates all tasks. Claude leads code quality (SWE-bench 80.8%), Codex leads autonomous execution (Terminal-Bench 77.3%), and Gemini leads visual design (WebDev Arena #1). Competitive dispatch ensures you always get the best output.


What Changed in This Fork

The original CCG uses static routing — frontend tasks always go to Gemini, backend tasks always go to Codex. This fork changes that completely.

Before (Original)

Frontend task → Gemini only (assumed "frontend expert")
Backend task → Codex only (assumed "backend expert")
Claude → Orchestrator (doesn't compete)

The problem: these assumptions aren't backed by benchmarks. SWE-bench scores are nearly tied across all three models (~80%), and each model has different strengths that don't align with a simple frontend/backend split.

After (This Fork)

Any task → Send to ALL models in parallel → Compare outputs with weighted scores → Use the best one

Every task goes to Codex + Gemini + Claude at the same time. Claude then scores all three outputs using benchmark-weighted criteria and picks the best one (or combines the best parts from each).

What's different in practice

Feature Original This Fork
Routing Fixed: Gemini=frontend, Codex=backend All models compete on every task
How to pick the best output Trust rules: "Codex is backend authority" Weighted scoring based on actual benchmarks
Claude's role Orchestrator only (doesn't generate solutions) Orchestrator AND competitor (generates its own solution too)
Code review 2 models (Codex + Gemini) 3 models (Codex + Gemini + Claude), with consensus scoring (3/3, 2/3, 1/3)
Model strengths Assumed (no evidence cited) Documented with benchmark data and sources
Model weaknesses Not mentioned Each model's prompts list known limitations
Cost control Always 2 models 3 modes: Competitive (all 3), Focused (best-match), Quick (Claude only)

Why the change

We fact-checked the original claims against published benchmarks:

Original Claim Reality
"Gemini is the frontend expert" Gemini leads visual appeal (WebDev Arena #1) but Claude produces better responsive and accessible code (Index.dev test)
"Codex is the backend expert" Codex leads terminal workflows (77.3%) but Claude leads architecture planning and code quality (80.8% SWE-bench)
One model is always best for a domain No — each model wins on different dimensions, not domains

Architecture

 ┌─→ Codex CLI ─→ Output A ─┐
 │ │
Task ─→ Claude ─────┼─→ Gemini CLI ─→ Output B ─┼─→ Weighted ─→ Best
 (also │ │ Compare Solution
 generates └─→ Claude Self ─→ Output C ─┘
 Output C)

External models have no write access — they only return patches/analysis, which Claude evaluates and applies.

How It Works

A full /ccg:workflow run goes through 6 phases. At every phase that involves model output, all models compete in parallel:

flowchart TD
 Start["🚀 User runs /ccg:workflow"] --> P1
 subgraph P1["Phase 1 — Research"]
 R1["Analyze requirements"]
 R2["Search codebase via MCP"]
 R3["Score completeness (≥7 to proceed)"]
 R1 --> R2 --> R3
 end
 P1 --> P2
 subgraph P2["Phase 2 — Ideation (Competitive)"]
 direction LR
 C2["Codex analyzes"] & G2["Gemini analyzes"] & CL2["Claude analyzes"]
 end
 P2 --> W2["⚖️ Weighted scoring → Best analysis"]
 W2 --> P3
 subgraph P3["Phase 3 — Planning (Competitive)"]
 direction LR
 C3["Codex plans"] & G3["Gemini plans"] & CL3["Claude plans"]
 end
 P3 --> W3["⚖️ Weighted scoring → Best plan"]
 W3 --> Approve{"👤 User approves plan?"}
 Approve -- No --> P3
 Approve -- Yes --> P4
 subgraph P4["Phase 4 — Execution"]
 E1["Claude implements code"]
 E2["(sole code writer — external models have zero write access)"]
 E1 --> E2
 end
 P4 --> P5
 subgraph P5["Phase 5 — Review (Competitive)"]
 direction LR
 C5["Codex reviews"] & G5["Gemini reviews"] & CL5["Claude reviews"]
 end
 P5 --> W5["⚖️ Weighted consensus → Issues ranked by agreement (3/3, 2/3, 1/3)"]
 W5 --> Fix{"Critical issues?"}
 Fix -- Yes --> FixCode["Claude fixes"] --> P5
 Fix -- No --> P6
 subgraph P6["Phase 6 — Final Review"]
 F1["Verify against plan"]
 F2["Run tests"]
 F3["Report to user"]
 F1 --> F2 --> F3
 end
 P6 --> Done["✅ Done"]
 style P2 fill:#e8f4f8,stroke:#2196F3
 style P3 fill:#e8f4f8,stroke:#2196F3
 style P5 fill:#e8f4f8,stroke:#2196F3
 style W2 fill:#fff3e0,stroke:#FF9800
 style W3 fill:#fff3e0,stroke:#FF9800
 style W5 fill:#fff3e0,stroke:#FF9800
Loading

Weighted Scoring Example (Phase 2 — Analysis)

Each model's output is scored on multiple dimensions. The weights depend on the task type:

×ばつ weight) Winner = highest total score Final output = winner + cherry-picked improvements from others">
 Codex Gemini Claude
 ───── ────── ──────
Tech depth 0.30 0.20 0.35 ← Claude weighted highest
Risk detection 0.35 0.15 0.30 ← Codex weighted highest
UX / visual 0.10 0.40 0.20 ← Gemini weighted highest
Actionability 0.25 0.25 0.25 ← Equal
Score = Σ (dimension rating ×ばつ weight)
Winner = highest total score
Final output = winner + cherry-picked improvements from others

Different task types use different weight tables — see /ccg:routing-guide for all of them.

Installation

npx ccg-workflow

Requirements: Claude Code CLI, Node.js 20+

Important: This project depends on ora@9.x and string-width@8.x, which require Node.js >= 20.

Optional: Codex CLI (competitive dispatch), Gemini CLI (competitive dispatch)

Benchmark Evidence

Routing decisions are based on published benchmarks, not assumptions:

Benchmark Claude Opus 4.6 GPT-5.3-Codex Gemini 3.1 Pro Source
SWE-bench Verified 80.8% 80.0% 80.6% marc0.dev
Terminal-Bench 2.0 74.7% 77.3% 78.4% (hard) SmartScope
WebDev Arena ~1510 ELO 1477 ELO 1487 ELO Google Dev Blog
Code Review Quality #1 tied Below Claude Below Claude Milvus Blog
Responsive Design Leader - Below Claude Index.dev

Model Strength Summary

Model Best At Evidence
Claude Code quality, architecture, responsive/a11y, code review SWE-bench leader, Milvus review #1, Index.dev responsive test
Codex Edge case detection, autonomous execution, Python, terminal workflows Terminal-Bench leader, catches bugs others miss, cloud sandbox
Gemini Visual design, rapid prototyping, large codebase analysis WebDev Arena #1, fastest iteration, 1M token context

Dispatch Modes

Mode Description Cost
Competitive (default) All models compete, weighted comparison ~3x
Focused Best-match model + Claude verification ~1-2x
Quick Claude only 1x

See /ccg:routing-guide for full weighted evaluation criteria per task type.

Commands

Command Description
/ccg:workflow Full 6-phase competitive workflow
/ccg:plan Competitive multi-model planning
/ccg:execute Competitive execution with output comparison
/ccg:feat Smart feature development
/ccg:frontend Frontend tasks (all models, visual weight)
/ccg:backend Backend tasks (all models, logic weight)
/ccg:analyze Technical analysis
/ccg:debug Competitive problem diagnosis
/ccg:optimize Competitive performance optimization
/ccg:test Test generation
/ccg:review Competitive code review (weighted consensus)
/ccg:routing-guide View benchmark evidence and routing criteria
/ccg:commit Git commit
/ccg:rollback Git rollback
/ccg:clean-branches Clean branches
/ccg:worktree Worktree management
/ccg:init Initialize CLAUDE.md
/ccg:enhance Prompt enhancement
/ccg:spec-init Initialize OPSX environment
/ccg:spec-research Requirements → Constraints
/ccg:spec-plan Constraints → Zero-decision plan
/ccg:spec-impl Execute plan + archive
/ccg:spec-review Dual-model cross-review
/ccg:team-research Agent Teams requirements → constraints
/ccg:team-plan Agent Teams constraints → parallel plan
/ccg:team-exec Agent Teams parallel execution
/ccg:team-review Agent Teams weighted consensus review
/ccg:codex-exec Codex full execution (plan → code → review)

Weighted Evaluation

Every competitive dispatch uses task-type-specific weights:

Analysis: Tech depth (Claude .35) | Risk (Codex .35) | UX (Gemini .40) | Action (.25 each)
Planning: Architecture (Claude .40) | Scale (Claude .30) | UI (Gemini .40) | Detail (Codex .30)
Implementation: Correctness (Claude .35) | Visual (Gemini .45) | Edge cases (Codex .35) | Maintain (Claude .35)
Review: Security (Codex .35) | Visual (Gemini .40) | Logic (Claude .35) | Perf (Codex .30)

OPSX Spec-Driven (v1.7.52+)

Integrates OPSX architecture to turn requirements into constraints:

/ccg:spec-init
/ccg:spec-research implement user authentication
/ccg:spec-plan
/ccg:spec-impl
/ccg:spec-review

Agent Teams Parallel Execution (v1.7.60+)

/ccg:team-research implement real-time collaboration kanban API
/ccg:team-plan kanban-api
/ccg:team-exec
/ccg:team-review

Prerequisite: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in settings.json

Configuration

Directory Structure

~/.claude/
├── commands/ccg/ # Slash commands
├── agents/ccg/ # Sub-agents
├── skills/ # Quality gates
├── bin/codeagent-wrapper
└── .ccg/
 ├── config.toml
 └── prompts/{codex,gemini}/

Environment Variables

Variable Description Default
CODEAGENT_POST_MESSAGE_DELAY Wait time after Codex completion (seconds) 5
CODEX_TIMEOUT codeagent-wrapper execution timeout (seconds) 7200
BASH_DEFAULT_TIMEOUT_MS Claude Code Bash default timeout (ms) 120000
BASH_MAX_TIMEOUT_MS Claude Code Bash max timeout (ms) 600000

codeagent-wrapper Auto-Authorization

CCG automatically installs a Hook to auto-authorize codeagent-wrapper commands.

Requirement: jq must be installed.

MCP Configuration

Code retrieval MCP (choose one):

  • ace-tool (recommended)
  • ContextWeaver (alternative)

Optional: Context7, Playwright, DeepWiki, Exa

Update / Uninstall

npx ccg-workflow@latest # Update
npx ccg-workflow # Select "Uninstall"

Credits

Star History

Star History Chart

License

MIT


v1.8.0 (Competitive Multi-Model Redesign) | Issues

About

CCG v1.8.0 — Competitive Multi-Model System. Redesigned from static routing (Gemini=frontend, Codex=backend) to evidence-based competitive dispatch where ALL models compete on every task. Fork of fengshao1227/ccg-workflow.

Resources

License

Stars

Watchers

Forks

Packages

Contributors

Languages

  • Go 57.1%
  • TypeScript 34.2%
  • JavaScript 8.6%
  • Shell 0.1%

AltStyle によって変換されたページ (->オリジナル) /