Run Claude Code Agent Teams as a Kubernetes operator.

kagents is the project brand. The implementation lives in the `claude-teams-operator` repository and ships under the `claude.amcheste.io/v1alpha1` API group. Documentation site: kagents.dev (under construction; see the v0.7.0 milestone).

Claude Code Agent Teams let multiple Claude Code instances collaborate. A lead coordinates work via a shared task list while teammates communicate through peer-to-peer mailboxes. Natively this runs on a single machine using tmux. This operator lifts that pattern into Kubernetes so you can run large-scale agent teams on your cluster.
The operator supports two distinct use cases, selected by which top-level field is set in the AgentTeam spec:
| Mode | Use when | Key field |
|---|---|---|
| Coding | Agents work on a git repository | spec.repository |
| Cowork | Agents produce documents, reports, emails, analysis | spec.workspace |
Both modes share the same coordination protocol (shared PVCs, mailboxes, task lists) and all Cowork extensions (Skills, MCP servers, approval gates).
- Native Agent Teams protocol. Preserves Anthropic's file-based mailbox and task list format over ReadWriteMany PVCs; no protocol translation
- Per-teammate git worktrees. Each coding agent works on an isolated branch to prevent merge conflicts
- Cowork mode. Mount ConfigMap/PVC inputs and collect outputs without requiring a git repo
- Skills as CRD fields. Mount Claude Code skills from ConfigMaps into each agent's `.claude/skills/`
- MCP servers per agent. Configure Model Context Protocol connections per teammate
- Approval gates. Pause spawning specific teammates until a human applies an annotation
- Budget enforcement. Terminate the team if estimated API cost exceeds a configured limit
- Timeout enforcement. Terminate the team after a configurable wall-clock duration
- `dependsOn` ordering. Spawn teammates only after their declared dependencies complete
- Reusable templates. Define team patterns with `AgentTeamTemplate`, instantiate with `AgentTeamRun`

Prerequisites:

- Kubernetes 1.28+
- ReadWriteMany PVC support (NFS, EFS, or a compatible CSI driver; see ARCHITECTURE.md § Storage Requirements for options)
- Claude Code CLI access (Max subscription or API key)
- Opus 4.6 model access (required for Agent Teams)
```shell
# 1. Install the operator (CRDs + controller + RBAC)
helm install claude-teams-operator \
  oci://ghcr.io/amcheste/charts/claude-teams-operator \
  --namespace claude-teams-system --create-namespace

# 2. Create an API key secret in the namespace where your teams will run
kubectl create namespace dev-agents
kubectl create secret generic anthropic-api-key \
  --namespace dev-agents \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-...

# 3. Apply a sample team
kubectl apply -n dev-agents -f \
  https://raw.githubusercontent.com/amcheste/claude-teams-operator/main/config/samples/auth-refactor-team.yaml

# 4. Watch the team progress
kubectl get agentteams -n dev-agents -w
kubectl describe agentteam auth-refactor -n dev-agents
```
For contributors and anyone who wants to run the full stack from source:
```shell
# 1. Create a Kind cluster with NFS provisioner
make kind-create

# 2. Build and load images into Kind
make docker-build docker-build-runner kind-load

# 3. Install CRDs and deploy the operator
make install deploy

# 4. Create your API key secret
kubectl create secret generic anthropic-api-key \
  --namespace dev-agents \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-...

# 5. Apply a sample team
kubectl apply -f config/samples/auth-refactor-team.yaml
```
See CONTRIBUTING.md for the full dev loop (testing, linting, manifest regeneration).
```yaml
apiVersion: claude.amcheste.io/v1alpha1
kind: AgentTeam
metadata:
  name: auth-refactor
  namespace: dev-agents
spec:
  repository:
    url: "git@github.com:acme/backend.git"
    branch: "main"
    credentialsSecret: "git-credentials"
  auth:
    apiKeySecret: "anthropic-api-key"
  lead:
    model: "opus"
    prompt: |
      Coordinate the migration from JWT to OAuth2.
      Assign backend-api, frontend-auth, and test-coverage to their tracks.
      Validate integration when all tracks complete.
  teammates:
    - name: "backend-api"
      model: "sonnet"
      prompt: "Implement OAuth2 endpoints. Remove JWT middleware."
      scope:
        includePaths: ["src/api/auth/", "src/middleware/"]
    - name: "test-coverage"
      model: "sonnet"
      prompt: "Write comprehensive tests for the OAuth2 migration."
      dependsOn: ["backend-api"]
  lifecycle:
    timeout: "2h"
    budgetLimit: "30.00"
    onComplete: "create-pr"
    pullRequest:
      targetBranch: "main"
      titleTemplate: "feat(auth): migrate from JWT to OAuth2"
```
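In the coding-mode manifest above, `test-coverage` declares `dependsOn: ["backend-api"]`, so it is spawned only after `backend-api` completes. The gating rule can be sketched as a small shell helper. This only illustrates the rule as this README describes it; it is not the operator's reconciler code, and the function name `deps_satisfied` and its space-separated list arguments are hypothetical.

```shell
# Hypothetical helper: illustrates the dependsOn rule, not the operator's code.
# $1: space-separated dependsOn names for one teammate
# $2: space-separated names of teammates that have completed
# Exit status 0 means every dependency has completed and the teammate may spawn.
deps_satisfied() {
  for dep in $1; do
    case " $2 " in
      *" $dep "*) ;;      # this dependency has completed
      *) return 1 ;;      # still waiting on at least one dependency
    esac
  done
  return 0
}
```

For example, `deps_satisfied "backend-api" "backend-api frontend-auth"` succeeds, while `deps_satisfied "backend-api" "frontend-auth"` fails because `backend-api` has not completed yet. A teammate with no dependencies always passes.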
```yaml
apiVersion: claude.amcheste.io/v1alpha1
kind: AgentTeam
metadata:
  name: q3-report
  namespace: cowork-agents
spec:
  workspace:
    inputs:
      - configMap: "quarterly-data"
        mountPath: "/workspace/data"
    output:
      mountPath: "/workspace/output"
      size: "5Gi"
  auth:
    apiKeySecret: "anthropic-api-key"
  lead:
    model: "opus"
    prompt: "Coordinate the Q3 business report. Assign research, writing, and design."
  teammates:
    - name: "researcher"
      model: "sonnet"
      prompt: "Analyse the data in /workspace/data and produce a findings summary."
      skills:
        - name: "data-analysis"
          source:
            configMap: "data-analysis-skill"
    - name: "email-drafter"
      model: "sonnet"
      prompt: "Draft follow-up emails for all Q3 prospects."
      mcpServers:
        - name: "gmail"
          url: "https://gmail.mcp.example.com/mcp"
  lifecycle:
    timeout: "3h"
    budgetLimit: "15.00"
    approvalGates:
      - event: "spawn-email-drafter"
        channel: "webhook"
        webhookUrl: "https://hooks.example.com/approvals"
```
Grant approval after reviewing the researcher's output:
```shell
kubectl annotate agentteam q3-report \
  "approved.claude.amcheste.io/spawn-email-drafter=true" \
  -n cowork-agents
```

`AgentTeam` is the primary resource. It defines the full team, its workspace, lifecycle, and observability config.

| Field | Type | Description |
|---|---|---|
| `spec.repository` | `RepositorySpec` | Git repo config (coding mode). Optional when `spec.workspace` is set. |
| `spec.workspace` | `WorkspaceSpec` | Input/output volumes (Cowork mode). Optional when `spec.repository` is set. |
| `spec.auth` | `AuthSpec` | API key or OAuth secret reference. |
| `spec.lead` | `LeadSpec` | Lead agent model, prompt, skills, and MCP servers. |
| `spec.teammates` | `[]TeammateSpec` | Worker agents with optional `dependsOn`, `scope`, `skills`, `mcpServers`. |
| `spec.lifecycle.timeout` | `string` | Max duration, e.g. `"4h"`. Defaults to `"4h"`. |
| `spec.lifecycle.budgetLimit` | `string` | Max USD spend, e.g. `"10.00"`. No limit if unset. |
| `spec.lifecycle.onComplete` | `string` | One of `create-pr`, `push-branch`, `notify`, `none`. |
| `spec.lifecycle.approvalGates` | `[]ApprovalGateSpec` | Human-in-the-loop gates before spawning a teammate. |
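
The `budgetLimit` semantics in the table (terminate when the estimated cost exceeds the limit, no limit if unset) amount to a simple comparison. A minimal sketch, assuming costs are plain decimal USD strings as in the examples; `over_budget` is a hypothetical name, and this is not the operator's implementation:

```shell
# Hypothetical helper: mirrors the budgetLimit rule described in the table.
# $1: estimated cost so far in USD, e.g. 1.42
# $2: configured budgetLimit, e.g. 30.00 (empty or omitted means "unset")
# Exit status 0 means the team is over budget.
over_budget() {
  # An unset limit means no limit, so the team is never over budget.
  [ -n "${2:-}" ] || return 1
  # awk handles the decimal comparison; POSIX shell arithmetic is integer-only.
  awk -v cost="$1" -v limit="$2" 'BEGIN { exit !(cost + 0 > limit + 0) }'
}
```

With these definitions, `over_budget 30.01 30.00` succeeds, while `over_budget 30.00 30.00` does not, because the limit must be exceeded rather than merely reached.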

`AgentTeamTemplate` is a reusable team pattern. It does not run on its own; instantiate it with `AgentTeamRun`.

`AgentTeamRun` instantiates an `AgentTeamTemplate` against a specific repo or workspace.
```yaml
apiVersion: claude.amcheste.io/v1alpha1
kind: AgentTeamRun
metadata:
  name: q4-security-review
spec:
  templateRef:
    name: fullstack-review
  repository:
    url: "git@github.com:acme/platform.git"
    branch: "release/4.0"
    credentialsSecret: "git-credentials"
  auth:
    apiKeySecret: "anthropic-api-key"
  lead:
    model: "opus"
    prompt: "Run a full security, performance, and test quality review."
```
Watch team progress. The Ready column reports running+completed/total teammates, so 2/3 means two of three workers are up (or have finished) while one is still spawning or blocked on a dependency:
```shell
kubectl get agentteams -A
# NAME            PHASE       READY   TASKS DONE   COST    AGE
# auth-refactor   Running     2/3     7            $1.42   14m
# q3-report       Completed   2/2     12           $3.80   2h
```
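The Ready arithmetic described above can be made explicit. This is a small sketch with hypothetical helper names (`ready_column`, `all_ready`); the operator computes the value itself, so the helpers only illustrate how to read it in a script.

```shell
# Hypothetical helpers illustrating how the READY column is derived and read.

# Print READY as (running + completed)/total.
# $1: running teammates, $2: completed teammates, $3: total teammates
ready_column() {
  printf '%s/%s\n' "$(( $1 + $2 ))" "$3"
}

# Succeed when a READY value such as "3/3" shows every teammate
# either running or completed.
# $1: READY column value
all_ready() {
  # ${1%/*} is the part before the slash, ${1#*/} the part after it.
  [ "${1%/*}" = "${1#*/}" ]
}
```

One running and one completed teammate out of three gives `ready_column 1 1 3`, which prints `2/3`; `all_ready "2/3"` then fails, signalling that a teammate is still spawning or blocked.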
Inspect details, including operator events emitted at every phase transition:
```shell
kubectl describe agentteam auth-refactor -n dev-agents
# Status:
#   Phase:           Running
#   Ready:           2/3
#   Estimated Cost:  1.42
#   Lead:
#     Pod Name:  auth-refactor-lead
#     Phase:     Running
#   Teammates:
#     - Name: backend-api, Phase: Running
#     - Name: test-coverage, Phase: Waiting (dependsOn: backend-api)
# Events:
#   Normal  Initializing  5m  agentteam-controller  Provisioned PVCs and launched init Job
#   Normal  Running       4m  agentteam-controller  All agent pods started
```
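For scripts and CI jobs, the phase shown above can be polled until the team reaches a target phase. A minimal sketch, assuming the phase is exposed at `.status.phase` (inferred from the describe output, not confirmed against the CRD schema); `wait_for_phase` and its arguments are hypothetical:

```shell
# Hypothetical helper: poll an AgentTeam until it reaches a target phase.
# Assumes the phase lives at .status.phase (inferred from the describe output).
# $1: team name   $2: namespace   $3: target phase, e.g. Completed
# $4: maximum number of polls     $5: seconds to sleep between polls
# Exit status 0 means the target phase was observed; 1 means polls ran out.
wait_for_phase() {
  attempt=0
  while [ "$attempt" -lt "$4" ]; do
    # Treat a failed lookup as "phase unknown" and keep polling.
    phase=$(kubectl get agentteam "$1" -n "$2" -o jsonpath='{.status.phase}') || phase=""
    if [ "$phase" = "$3" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$5"
  done
  return 1
}
```

For example, `wait_for_phase auth-refactor dev-agents Completed 120 30` polls every 30 seconds for roughly an hour before giving up.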
Approval gates block a teammate from being spawned until a human applies an annotation.
```shell
# Inspect which teammates are waiting
kubectl get agentteam my-team -o jsonpath='{.status.teammates[*].pendingApproval}'

# Grant approval
kubectl annotate agentteam my-team \
  "approved.claude.amcheste.io/spawn-email-drafter=true"
```
If channel: webhook is set, the operator POSTs a JSON payload to webhookUrl when the gate is triggered, allowing an external system to present the approval to a human and then apply the annotation.
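
An external approval system would ultimately apply the same annotation shown above. Here is a sketch of that final step, assuming the annotation key always follows the `approved.claude.amcheste.io/<event>` pattern seen in the examples. The helper names are hypothetical, and the webhook payload format is not covered here because this README does not specify it.

```shell
# Hypothetical helpers for granting an approval gate from a script.
# Assumes the annotation key is approved.claude.amcheste.io/<event>,
# generalized from the spawn-email-drafter example.

# Print the annotation argument for a gate event.
# $1: gate event name, e.g. spawn-email-drafter
approval_annotation() {
  printf 'approved.claude.amcheste.io/%s=true\n' "$1"
}

# Apply the approval annotation to a team.
# $1: team name   $2: namespace   $3: gate event name
approve_gate() {
  kubectl annotate agentteam "$1" -n "$2" "$(approval_annotation "$3")"
}
```

Calling `approve_gate q3-report cowork-agents spawn-email-drafter` runs the same `kubectl annotate` command as the manual example, with the namespace made explicit.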
This README is the entry point. For deeper dives, every topic lives in a dedicated in-repo document:
| Document | Read when you want to... |
|---|---|
| ARCHITECTURE.md | Understand how the operator models Agent Teams: phase state machine, PVC layout, RWX storage backends, coordination protocol, and key design tradeoffs. |
| TESTING.md | See the test strategy (unit / integration / acceptance / E2E), how to run each suite, and what each one actually verifies. |
| CONTRIBUTING.md | Set up a dev environment, run the full build/test loop, follow the branch + PR workflow, and walk through "How to add a new reconciler feature." |
| docs/helm-values.md | Tune the Helm chart. Every value is documented with defaults and production override recipes. |
| SECURITY.md | Report a vulnerability or review the project's security policy. |
| KUBECON.md | See the talk framing and "interesting problems" log, which give context for why specific architectural choices were made. |
Common Makefile targets (full loop in CONTRIBUTING.md):
```shell
make build      # Build operator binary
make test       # Run all tests
make lint       # Run golangci-lint
make manifests  # Regenerate CRD manifests
make generate   # Regenerate deepcopy methods
```
Apache 2.0