trycua/cua

Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).

Cua ("koo-ah") is Docker for Computer-Use Agents: it enables AI agents to control full operating systems in virtual containers and to deploy them locally or to the cloud.


With the Computer SDK, you can:

With the Agent SDK, you can:

  • run computer-use models with a consistent schema
  • benchmark on OSWorld-Verified, SheetBench-V2, and more with a single line of code using HUD (Notebook)
  • combine UI grounding models with any LLM using composed agents
  • use new UI agent models and UI grounding models from the Model Zoo below with just a model string (e.g., ComputerAgent(model="openai/computer-use-preview"))
  • use API or local inference by changing a prefix (e.g., openai/, openrouter/, ollama/, huggingface-local/, mlx/, etc.)
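The prefix convention in the last bullet can be illustrated with a small helper that splits a model string into its routing prefix and the remaining model name. This is a sketch for explanation only, not Cua's internal parser. The `split_model_string` name and the `KNOWN_PREFIXES` set are hypothetical; the prefix set is drawn from the examples in this README.

```python
# Hypothetical helper: NOT part of the cua-agent API.
# Shows how a "prefix/model" string separates into a routing prefix
# (API provider or local backend) and the model name that follows it.
from typing import Optional, Tuple

# Prefixes taken from the examples in this README; the real list may differ.
KNOWN_PREFIXES = {"anthropic", "openai", "openrouter", "ollama", "huggingface-local", "mlx"}

def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Return (prefix, rest); prefix is None when no known prefix is present."""
    head, sep, rest = model.partition("/")
    if sep and head in KNOWN_PREFIXES:
        return head, rest
    # e.g. "moondream3" or "omniparser" carry no provider prefix
    return None, model

print(split_model_string("openai/computer-use-preview"))
# → ('openai', 'computer-use-preview')
print(split_model_string("huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"))
# → ('huggingface-local', 'ByteDance-Seed/UI-TARS-1.5-7B')
print(split_model_string("moondream3"))
# → (None, 'moondream3')
```

Splitting only on the first "/" keeps Hugging Face repository IDs (which contain their own "/") intact.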

Quick Start

Agent Usage

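pip install "cua-agent[all]"

import asyncio
from agent import ComputerAgent
from computer import Computer

async def main():
    # Placeholder sandbox credentials; replace with your own
    async with Computer(os_type="linux", provider_type="cloud", name="your-sandbox-name", api_key="your-api-key") as computer:
        agent = ComputerAgent(
            model="anthropic/claude-3-5-sonnet-20241022",
            tools=[computer],
            max_trajectory_budget=5.0,
        )
        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
        # agent.run yields results as the agent works through the task
        async for result in agent.run(messages):
            for item in result["output"]:
                # Print only the agent's text messages
                if item["type"] == "message":
                    print(item["content"][0]["text"])

asyncio.run(main())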
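The `max_trajectory_budget=5.0` argument presumably caps what a run may spend. The sketch below shows one way a caller could enforce a similar cap on their own side. It sums the `usage.response_cost` field from the output format documented below. The `BudgetTracker` class is hypothetical and is not how cua-agent implements the parameter. It also assumes the budget and `response_cost` share the same unit.

```python
# Hypothetical caller-side budget tracker; NOT the cua-agent implementation
# of max_trajectory_budget. Assumes each result dict carries
# result["usage"]["response_cost"], as in this README's output example.
from dataclasses import dataclass

@dataclass
class BudgetTracker:
    limit: float
    spent: float = 0.0

    def add(self, result: dict) -> bool:
        """Record one result's cost; return True while still within the limit."""
        cost = result.get("usage", {}).get("response_cost", 0.0)
        self.spent += cost
        return self.spent <= self.limit

# Simulated results standing in for what agent.run(...) would yield
tracker = BudgetTracker(limit=0.025)
fake_results = [{"usage": {"response_cost": 0.01}}] * 3
for step, result in enumerate(fake_results, start=1):
    if not tracker.add(result):
        print(f"budget exceeded at step {step}: spent {tracker.spent:.2f}")
        break
# → budget exceeded at step 3: spent 0.03
```

In the real loop, the same check would sit inside `async for result in agent.run(messages):`, breaking out once `tracker.add(result)` returns False.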

Output format

Cua uses the OpenAI Agent response format.

Example
{
  "output": [
    {
      "role": "user",
      "content": "go to trycua on gh"
    },
    {
      "summary": [
        {
          "text": "Searching Firefox for Trycua GitHub",
          "type": "summary_text"
        }
      ],
      "type": "reasoning"
    },
    {
      "action": {
        "text": "Trycua GitHub",
        "type": "type"
      },
      "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
      "status": "completed",
      "type": "computer_call"
    },
    {
      "type": "computer_call_output",
      "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
      "output": {
        "type": "input_image",
        "image_url": "data:image/png;base64,..."
      }
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "text": "Success! The Trycua GitHub page has been opened.",
          "type": "output_text"
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 75,
    "total_tokens": 225,
    "response_cost": 0.01
  }
}
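Because every result follows this schema, consuming one mostly means dispatching on each item's "type" field. The sketch below collects assistant text, the computer actions taken, and total token usage from a response shaped like the example above. `summarize_output` is a hypothetical helper, and the sample response is an abbreviated copy of the example with a placeholder `call_id`.

```python
# Minimal sketch that summarizes a response in the output format above.
# Field names come from the README example; summarize_output is hypothetical.
def summarize_output(response: dict) -> dict:
    texts, actions = [], []
    for item in response.get("output", []):
        kind = item.get("type")  # the echoed user turn has no "type" and is skipped
        if kind == "message" and item.get("role") == "assistant":
            # Assistant messages hold a list of content parts
            texts.extend(p["text"] for p in item["content"] if p.get("type") == "output_text")
        elif kind == "computer_call":
            # Each computer_call carries an action with its own "type" (e.g. "type")
            actions.append(item["action"]["type"])
    usage = response.get("usage", {})
    return {"texts": texts, "actions": actions, "total_tokens": usage.get("total_tokens", 0)}

example = {
    "output": [
        {"role": "user", "content": "go to trycua on gh"},
        {"type": "computer_call", "call_id": "call_1", "status": "completed",
         "action": {"type": "type", "text": "Trycua GitHub"}},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Success! The Trycua GitHub page has been opened."}]},
    ],
    "usage": {"total_tokens": 225},
}
print(summarize_output(example))
# → {'texts': ['Success! The Trycua GitHub page has been opened.'], 'actions': ['type'], 'total_tokens': 225}
```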

Model Configuration

These are the valid model configurations for ComputerAgent(model="..."):

  • {computer-use-model}: a single model performs all computer-use tasks
  • {grounding-model}+{any-vlm-with-tools}: a grounding model handles UI element detection, composed with any tool-calling VLM
  • moondream3+{any-llm-with-tools}: Moondream3 handles captioning and UI element detection, composed with any tool-calling LLM
  • human/human: a human-in-the-loop takes the place of a model
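The configurations above follow a simple pattern: a "+" joins a grounding model (first) with the model that drives the task (second). The hypothetical `parse_config` function below classifies a model string against these four patterns. It is an illustration of the convention, not the cua-agent implementation. Both example model IDs are taken from the Model IDs list in this README.

```python
# Hypothetical parser for the configuration patterns above; NOT the
# cua-agent implementation. "+" separates a grounding model (first)
# from the model that plans and calls tools (second).
from typing import NamedTuple, Optional

class AgentConfig(NamedTuple):
    kind: str                 # "single", "composed" or "human"
    grounding: Optional[str]  # UI element detection model, if composed
    planner: str              # model that drives the task

def parse_config(model: str) -> AgentConfig:
    if model == "human/human":
        return AgentConfig("human", None, model)
    if "+" in model:
        grounding, planner = model.split("+", 1)
        return AgentConfig("composed", grounding, planner)
    return AgentConfig("single", None, model)

print(parse_config("huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-sonnet-4-5"))
# → AgentConfig(kind='composed', grounding='huggingface-local/HelloKKMe/GTA1-7B', planner='anthropic/claude-sonnet-4-5')
print(parse_config("openai/computer-use-preview").kind)
# → single
```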

Model Capabilities

The following table shows which capabilities are supported by each model:

Model Computer-Use Grounding Tools VLM
Claude Sonnet/Haiku
OpenAI CU Preview
GLM-V
Gemini CU Preview
InternVL
UI-TARS
OpenCUA
GTA
Holo
Moondream
OmniParser

Model IDs

Examples of valid model IDs:

  • Claude Sonnet/Haiku: anthropic/claude-sonnet-4-5, anthropic/claude-haiku-4-5
  • OpenAI CU Preview: openai/computer-use-preview
  • GLM-V: openrouter/z-ai/glm-4.5v, huggingface-local/zai-org/GLM-4.5V
  • Gemini CU Preview: gemini-2.5-computer-use-preview
  • InternVL: huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}
  • UI-TARS: huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B
  • OpenCUA: huggingface-local/xlangai/OpenCUA-{7B,32B}
  • GTA: huggingface-local/HelloKKMe/GTA1-{7B,32B,72B}
  • Holo: huggingface-local/Hcompany/Holo1.5-{3B,7B,72B}
  • Moondream: moondream3
  • OmniParser: omniparser
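Several entries above use a "{A,B,...}" shorthand to list available sizes of one model family. The hypothetical helper below expands that shorthand into concrete model strings. It handles a single brace group and drops the "..." placeholder, which stands for sizes not spelled out.

```python
# Illustrative helper (hypothetical): expands the "{A,B,...}" shorthand used
# in the model ID list into concrete IDs. Handles one brace group per pattern.
import re
from typing import List

def expand_model_ids(pattern: str) -> List[str]:
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]  # nothing to expand
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    options = [opt.strip() for opt in match.group(1).split(",")]
    # Skip the "..." placeholder, which stands for sizes not listed explicitly
    return [prefix + opt + suffix for opt in options if opt != "..."]

print(expand_model_ids("huggingface-local/xlangai/OpenCUA-{7B,32B}"))
# → ['huggingface-local/xlangai/OpenCUA-7B', 'huggingface-local/xlangai/OpenCUA-32B']
print(len(expand_model_ids("huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}")))
# → 4
```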

Missing a model? Create a feature request or contribute!

Computer

pip install "cua-computer[all]"

import asyncio
from computer import Computer

async def main():
    # Placeholder sandbox credentials; replace with your own
    async with Computer(
        os_type="linux",
        provider_type="cloud",
        name="your-sandbox-name",
        api_key="your-api-key",
    ) as computer:
        # Take a screenshot
        screenshot = await computer.interface.screenshot()
        # Click at (100, 100), then type text
        await computer.interface.left_click(100, 100)
        await computer.interface.type("Hello!")

asyncio.run(main())

Modules

  • Lume: VM management for macOS/Linux using Apple's Virtualization.framework. Install: curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
  • Lumier: Docker interface for macOS and Linux VMs. Install: docker pull trycua/lumier:latest
  • Computer (Python/TS): interface for controlling virtual machines. Install: pip install "cua-computer[all]" or npm install @trycua/computer
  • Agent: AI agent framework for automating tasks. Install: pip install "cua-agent[all]"
  • MCP Server: MCP server for using Cua with Claude Desktop. Install: pip install cua-mcp-server
  • SOM: Set-of-Mark library for Agent. Install: pip install cua-som
  • Computer Server: server component for Computer. Install: pip install cua-computer-server
  • Core (Python/TS): core utilities. Install: pip install cua-core or npm install @trycua/core

Resources

Community and Contributions

We welcome contributions to Cua! Please refer to our Contributing Guidelines for details.

Join our Discord community to discuss ideas, get assistance, or share your demos!

License

Cua is open-sourced under the MIT License; see the LICENSE file for details.

Portions of this project, specifically components adapted from Kasm Technologies Inc., are also licensed under the MIT License. See libs/kasm/LICENSE for details.

Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). See the OmniParser LICENSE for details.

Third-Party Licenses and Optional Components

Some optional extras for this project depend on third-party packages that are licensed under terms different from the MIT License.

  • The optional "omni" extra (installed via pip install "cua-agent[omni]") installs the cua-som module, which includes ultralytics, a package licensed under AGPL-3.0.

When you choose to install and use such optional extras, your use, modification, and distribution of those third-party components are governed by their respective licenses (e.g., AGPL-3.0 for ultralytics).

Trademarks

Apple, macOS, and Apple Silicon are trademarks of Apple Inc.
Ubuntu and Canonical are registered trademarks of Canonical Ltd.
Microsoft is a registered trademark of Microsoft Corporation.

This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., Microsoft Corporation, or Kasm Technologies.

Stargazers

Thank you to all our supporters!


Sponsors

Thank you to all our GitHub Sponsors!

coderabbit-cli
